Changelog
🏷️ Release Tag: v1.2.1
📅 Release Date: 2025-01-26
- Fix issue with `null` headers in the `CHANGELOG.md` file (by Chris Mahoney)
    - Root cause: GitHub was rate-limiting unauthenticated API calls.
    - Resolution: add authentication to the `curl` command.
- Fix a few logical bugs in the docs for the `info` and `formatting` modules (by Chris Mahoney)
- Move all information about how to contribute out of the `README.md` file and into the `CONTRIBUTING.md` file. Also create a new `Contributing` page in the docs, add instructions for installation using `uv`, and add more questions to the FAQ (by Chris Mahoney)
🏷️ Release Tag: v1.2.0
📅 Release Date: 2025-01-25
- Add the `column_contains_value()` function to the `checks` module (by Chris Mahoney)
- Add missing docstrings for functions in the `checks` module (by Chris Mahoney)
    - `assert_valid_spark_type()`
    - `warn_column_invalid_type()`
    - `warn_columns_invalid_type()`
- For all docs in the `checks` module, ensure they have the `See Also` section, which appropriately references other functions (by Chris Mahoney)
- Fix typos in the `cleaning` module (by Chris Mahoney)
- Fix up the `assert_table_exists()` function to ensure it throws the correct error, and that it is appropriately documented (by Chris Mahoney)
- Add unit tests for the `column_contains_value()` function (by Chris Mahoney)
- Fix missing docs references (by Chris Mahoney)
- Fix logical flaw in the `column_contains_value()` function (by Chris Mahoney)
- Fix missing unit tests (by Chris Mahoney)
- Fix bug in the unit tests for the `checks` module (by Chris Mahoney)
- Fix ref errors in docs (by Chris Mahoney)
- Add new functions to the `dimensions` module (by Chris Mahoney)
    - `make_dimension_table()`
    - `replace_columns_with_dimension_id()`
- Add two new modules, `formatting` and `info`, with the following functions (by Chris Mahoney)
    - `format_numbers()`
    - `display_intermediary_table()`
    - `display_intermediary_schema()`
    - `display_intermediary_columns()`
    - `get_distinct_values()`
- Speed up the execution of the Unit Tests by changing the `setUpClass()` and `tearDownClass()` structure to `setUpModule()` and `tearDownModule()` (by Chris Mahoney)
- Add docstrings to the `make_dimension_table()` and `replace_columns_with_dimension_id()` functions (by Chris Mahoney)
- Fix up a few logical bugs in the `dimensions` module (by Chris Mahoney)
- Add new unit tests to reach 100% coverage of the `dimensions` module (by Chris Mahoney)
- Fix the structure of the tables used in the Unit Tests (by Chris Mahoney)
- Fix missing `TestCase` imports in all Unit Tests (by Chris Mahoney)
- Enhance the Unit Tests setup and functionality (by Chris Mahoney)
    1. Change all `@property` to `@cached_property`
    2. Fix all column IDs to start at index `1`, not `0`
    3. For any DataFrames including columns of type `datetime`, ensure they are cast to the proper type
    4. All unit test classes are sub-classed from both the `PySparkSetup` and `TestCase` classes
- Move the `get_column_values()` function from the `cleaning` module to the `info` module (by Chris Mahoney)
- Fix docstring for the `get_column_values()` function (by Chris Mahoney)
- Rename `get_column_values()` to `extract_column_values()` (by Chris Mahoney)
- Add docstrings to the `get_distinct_values()` function (by Chris Mahoney)
- Finalise docs for the `info` module (by Chris Mahoney)
- Finish adding unit tests for the `info` module (by Chris Mahoney)
- Add all docs for the `formatting` module (by Chris Mahoney)
- Add all unit tests for the `formatting` module (by Chris Mahoney)
🏷️ Release Tag: v1.1.0
📅 Release Date: 2025-01-19
- Enhance initialisation scripts (by Chris Mahoney)
- Refresh the structure of the `pyproject.toml` file (by Chris Mahoney)
- Relabel some key elements in the `io` package (by Chris Mahoney)
    1. `transfer_table()` -> `transfer_table_by_path()`
    2. Change parameter in the `write_to_path()` function: `table` -> `data_frame`
- Fix the `data_format` parameter for the functions in the `io` module so they can use the pre-defined `SPARK_FORMATS` `Literal` values (by Chris Mahoney)
- Add better headers and comments throughout the `io` module (by Chris Mahoney)
- Fix up the default options in both the `read_from_path()` and `write_to_path()` functions (by Chris Mahoney)
- Add three new table-specific functions to the `io` module (by Chris Mahoney)
    - `read_from_table()`
    - `write_to_table()`
    - `transfer_table_by_table()`
- Add three new functions to the `io` module, which are generic switches between the `*_by_path()` and `*_by_table()` functions (by Chris Mahoney)
    - `read()`
    - `write()`
    - `transfer()`
- Add additional aliases to the functions in the `io` module, to make them more transferable and more generic (by Chris Mahoney)
    - `load_from_path()`
    - `save_to_path()`
    - `load_from_table()`
    - `save_to_table()`
    - `load()`
    - `save()`
- Fix some typos (by Chris Mahoney)
- Extend the `checks` module to add the `assert_table_exists()` function (by Chris Mahoney)
- Add `assert_table_exists()` to the unit tests in the `test_io` module (by Chris Mahoney)
- Add unit tests for the `*_by_table()` functions from the `io` module (by Chris Mahoney)
- Add constant values for valid write modes (by Chris Mahoney)
- Relabel parameters in the `transfer()` function (by Chris Mahoney)
- Extend the unit tests for the `io` module to reach 100% code coverage (by Chris Mahoney)
- Fix Unit Test error (by Chris Mahoney)
- Fix some unit tests (by Chris Mahoney)
- Fix docstrings for the Spark `*WRITE_MODES` constants in the `io` module (by Chris Mahoney)
- Add additional examples to some of the functions in the `io` module (by Chris Mahoney)
    1. `read_from_path()`
    2. `write_to_path()`
    3. `transfer_by_path()`
- Add docstrings to the new functions in the `io` module (by Chris Mahoney)
    - `read_from_table()`
    - `write_to_table()`
    - `transfer_by_table()`
    - `read()`
    - `write()`
    - `transfer()`
- Enhance the error messages in the `_validate_table_name()` function (by Chris Mahoney)
- Enhance the initialisation commands (by Chris Mahoney)
🏷️ Release Tag: v1.0.0
📅 Release Date: 2024-12-29
- Add PySpark version check and custom exception if version is insufficient (by Chris Mahoney)
- Enhance the docs (by Chris Mahoney)
    - Pages removed:
        - Roadmap
    - Pages added:
        - Application
        - Exceptions
        - Warnings
        - Whitespaces
- Fix the order of the shields on the README (by Chris Mahoney)
- Fix classifiers to now confirm it is stable (by Chris Mahoney)
- Add process to automatically generate the `CHANGELOG.md` file (by Chris Mahoney)
- Add code complexity checks (by Chris Mahoney)
- Skip code coverage report for the `PySparkVersionError` exception (by Chris Mahoney)
🏷️ Release Tag: v0.12.0
📅 Release Date: 2024-12-27
- Initial commit of `delta` module (by Chris Mahoney)
- Fix typos (by Chris Mahoney)
- Improve the docs for the `delta` module (by Chris Mahoney)
- Add the docs for the `delta` module (by Chris Mahoney)
- Add missing docstrings to the `delta` module (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
- Omit code coverage for the `delta` package (by Chris Mahoney)
🏷️ Release Tag: v0.11.0
📅 Release Date: 2024-12-26
- Initial commit of `schema` module and Unit Tests (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
- Add docs for `schema` module (by Chris Mahoney)
- Change default line length to `95` characters (by Chris Mahoney)
- Enhance the docs for the `schema` module (by Chris Mahoney)
- Make linting and checking better (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
- Allow code annotations in the docs (by Chris Mahoney)
- Fix missing docs (by Chris Mahoney)
🏷️ Release Tag: v0.10.0
📅 Release Date: 2024-12-24
- Initial commit of `duplication` module (by Chris Mahoney)
- Add docs for the `duplication` module (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
🏷️ Release Tag: v0.9.0
📅 Release Date: 2024-12-24
- Add `cleaning` module and Unit Tests (by Chris Mahoney)
- Add docs for the `cleaning` module (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
- Fix type errors (by Chris Mahoney)
- Add more styling to docs pages (by Chris Mahoney)
- Add clearer `Literal` values to the `constants` module (by Chris Mahoney)
- Update the docstrings for the `Column Processes` section of the `cleaning` module (by Chris Mahoney)
- Fix typos (by Chris Mahoney)
- Fix unit test errors (by Chris Mahoney)
- Add `VALID_PYAPARK_JOIN_TYPES` and `ALL_PYSPARK_JOIN_TYPES` to the `constants` module (by Chris Mahoney)
- Add the `constants` module to the docs (by Chris Mahoney)
- Fix a few typos (by Chris Mahoney)
- Enhance the docs for the `cleaning` module (by Chris Mahoney)
- Fix `pylint` errors (by Chris Mahoney)
- Fix the parameters of the `drop_matching_rows()` function, and add additional column assertions (by Chris Mahoney)
- Fix PyTest errors (by Chris Mahoney)
- Fix docs not rendering correctly (by Chris Mahoney)
🏷️ Release Tag: v0.8.0
📅 Release Date: 2024-12-22
- Add `datetime` module and Unit Tests (by Chris Mahoney)
- Add docs for the `datetime` module (by Chris Mahoney)
- Add new step to the `CI` workflow to ensure that code coverage is sufficient (by Chris Mahoney)
- Fix bug (by Chris Mahoney)
- Fix missing docs (by Chris Mahoney)
- Add more info to the docs for the `datetime` module (by Chris Mahoney)
- Refactor type checks in the `get_columns()` function to use `tuple` syntax for improved readability and performance (by Chris Mahoney)
- Refactor `rename_datetime_columns()` to improve readability and streamline column selection logic (by Chris Mahoney)
- Enhance documentation for `rename_datetime_columns()` and `add_local_datetime_columns()` with additional examples and success/failure conclusions (by Chris Mahoney)
- Update frequency parameter in `pd.date_range()` functions from `'H'` to `'h'` for consistency and future-proofing (by Chris Mahoney)
- Enhance documentation for `add_local_datetime_column()` and `add_local_datetime_columns()` by adding examples for error handling and invalid inputs (by Chris Mahoney)
- Refactor type checks across all modules to use the `is_type()` function from the `toolbox_python` package for improved readability, consistency and reliability (by Chris Mahoney)
- Refactor column handling to use the `list()` function for type casting, to ensure consistency and robustness (by Chris Mahoney)
- Fix return type of `is_vaid_spark_type()` to bool and add `assert_valid_spark_type()` function for type validation checks (by Chris Mahoney)
- Replace `AttributeError()` with `ColumnDoesNotExistError()` in column validation functions for improved error handling and clearer readability (by Chris Mahoney)
- Add custom exceptions for improved error handling in PySpark utilities: `ColumnDoesNotExistError()`, `InvalidPySparkDataTypeError()`, `InvalidDataFrameNameError()`, `ColumnDoesNotExistWarning()` (by Chris Mahoney)
- Refactor the `checks` module to add enhanced functionality for checking that columns are of the correct data type, with these new objects (by Chris Mahoney)
    - `ColumnsAreTypeResult()`
    - `_columns_are_type()`
    - `column_is_type()`
    - `columns_are_type()`
    - `assert_column_is_type()`
    - `assert_columns_are_type()`
- Fix unnecessary PyLint checks (by Chris Mahoney)
- Fix PyTest errors (by Chris Mahoney)
- Fix PyTest errors (by Chris Mahoney)
- Finish unit testing the `checks` module. Now 100% coverage and all tests are passing (by Chris Mahoney)
- Add header (by Chris Mahoney)
- Finish adding docs to the `datetime` module (by Chris Mahoney)
- Split out the docs for the `checks` module into separate sections (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
- Move the `_validate_pyspark_datatype()` function from the `types` module to the `checks` module (by Chris Mahoney)
- Fix the type checking, particularly for the `datetime` module, and make it more robust (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
- Complete code coverage for the `datetime` module (by Chris Mahoney)
🏷️ Release Tag: v0.7.0
📅 Release Date: 2024-12-18
- Add `columns` module and Unit Tests (by Chris Mahoney)
- Add docs for the `columns` module (by Chris Mahoney)
- Fix typos (by Chris Mahoney)
- Add `warn_column_missing()` and `warn_columns_missing()` to the `checks` module (by Chris Mahoney)
- Update all docstrings for the `columns` module (by Chris Mahoney)
- Add `mermaid` diagram functionality to docs (by Chris Mahoney)
- Hide additional PyLint checks (by Chris Mahoney)
- Fix bug (by Chris Mahoney)
- Enhance docs for the `columns` module (by Chris Mahoney)
- Add unit tests for the `warn_column_missing()` and `warn_columns_missing()` functions (by Chris Mahoney)
- Fix bug (by Chris Mahoney)
- Fix code coverage of the `dimensions` module (by Chris Mahoney)
- Add check to ensure that the `CI` pipeline will always fail if the code coverage is less than 100% (by Chris Mahoney)
- Add check to ensure that the `CI` pipeline will always fail if the code coverage is less than 99% (by Chris Mahoney)
🏷️ Release Tag: v0.6.0
📅 Release Date: 2024-12-17
- Add `dimensions` module and Unit Tests (by Chris Mahoney)
- Add docs for the `dimensions` module (by Chris Mahoney)
- Fix mixed content (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
🏷️ Release Tag: v0.5.0
📅 Release Date: 2024-12-17
- Add step to the `CD` pipeline to delete the `.gitignore` file from the code coverage docs (by Chris Mahoney)
- Initial commit of `scale` module and Unit Tests (by Chris Mahoney)
- Add docs for the `scale` module (by Chris Mahoney)
- Tidy up the docs for the `scale` module (by Chris Mahoney)
🏷️ Release Tag: v0.4.0
📅 Release Date: 2024-12-16
- Add `keys` module (by Chris Mahoney)
- Add Unit Tests for the `keys` module (by Chris Mahoney)
- Add docs for the `keys` module (by Chris Mahoney)
- Fix failing unit tests for the `keys` module (by Chris Mahoney)
- Tidy up docs for the `keys` module (by Chris Mahoney)
- Tidy up docs for the `types` module (by Chris Mahoney)
- Tidy up docs for the `checks` module (by Chris Mahoney)
- Fix missing code coverage report (by Chris Mahoney)
- Fix bug with `CI` parallel processing config (by Chris Mahoney)
- Add a step to the `CD` workflow to check that the package can be successfully installed after deployment (by Chris Mahoney)
- Add a more robust way to install Java on the different `os` distributions during `CI` and `CD` workflows (by Chris Mahoney)
- Fix bug (by Chris Mahoney)
- Fix `pip` commands in `CI` and `CD` workflows to be more generic across `os` platforms (by Chris Mahoney)
- Fix bug (by Chris Mahoney)
- Add debugging for `.venv` directory setups (by Chris Mahoney)
- Fix missing `poetry` install config (by Chris Mahoney)
- Fix `HADOOP_HOME` config in Unit Tests setup (by Chris Mahoney)
- Add more robust `.venv` dir checks in `CI` workflow (by Chris Mahoney)
- Fix bug (by Chris Mahoney)
- Fix bug with Unit Tests setup for different `os` platforms (by Chris Mahoney)
- Change `CI` checks to only run on `ubuntu` and `macos` (by Chris Mahoney)
🏷️ Release Tag: v0.3.1
📅 Release Date: 2024-12-13
- Add `types` page (by Chris Mahoney)
- Fix styling (by Chris Mahoney)
- Fix install scripts (by Chris Mahoney)
- Fix docstrings for the `types` module (by Chris Mahoney)
- Fix PyTest errors (by Chris Mahoney)
🏷️ Release Tag: v0.3.0
📅 Release Date: 2024-12-13
- Add convenience functions for the parameterised Unit Tests (by Chris Mahoney)
- Add unit tests for the `types` module (by Chris Mahoney)
- Fix code coverage for the `types` module (by Chris Mahoney)
- Fix code coverage for `types` (by Chris Mahoney)
🏷️ Release Tag: v0.2.0
📅 Release Date: 2024-12-13
- Initial commit of all required files for the `checks` module (by Chris Mahoney)
- Enhance the internal `_columns_exists()` check to return a more logical and understandable output (by Chris Mahoney)
- Fix typos (by Chris Mahoney)
- Refresh all docstrings for all functions in the `checks` module (by Chris Mahoney)
- Add the `checks` module to the docs site (by Chris Mahoney)
- Update `CD` workflow (by Chris Mahoney)
- Fix aesthetics on the `Modules` page (by Chris Mahoney)
- Add new `Roadmap` page (by Chris Mahoney)
- Fix `pylint` check (by Chris Mahoney)
🏷️ Release Tag: v0.1.0
📅 Release Date: 2024-12-11
- Turn on `mkdocs` check again during `pre-commit` processes (by Chris Mahoney)
- Add better docs and examples for the `io` module (by Chris Mahoney)
- Fix `pylint` warnings (by Chris Mahoney)