Changelog
🏷️ Release Tag: v1.2.1
📅 Release Date: 2025-01-26
- Fix issue with `null` headers in the `CHANGELOG.md` file (by Chris Mahoney)
    - Root cause: GitHub was rate-limiting unauthenticated API calls.
    - Resolution: add authentication to the `curl` command.
- Fix a few logical bugs in the docs for the `info` and the `formatting` modules (by Chris Mahoney)
- Move all the contribution information out of the `README.md` file and into the `CONTRIBUTING.md` file. Also create a new `Contributing` page in the docs, add instructions for installation using `uv`, and add more questions to the FAQ. (by Chris Mahoney)
🏷️ Release Tag: v1.2.0
📅 Release Date: 2025-01-25
- Add the `column_contains_value()` function to the `checks` module (by Chris Mahoney)
- Add missing docstrings for functions in the `checks` module (by Chris Mahoney)
    - `assert_valid_spark_type()`
    - `warn_column_invalid_type()`
    - `warn_columns_invalid_type()`
- For all docs in the `checks` module, ensure they have the `See Also` section, which appropriately references other functions (by Chris Mahoney)
- Fix typos in the `cleaning` module (by Chris Mahoney)
- Fix up the `assert_table_exists()` function to ensure it throws the correct error, and that it is appropriately documented (by Chris Mahoney)
- Add unit tests for the `column_contains_value()` function (by Chris Mahoney)
- Fix missing docs references (by Chris Mahoney)
- Fix logical flaw in the function `column_contains_value()` (by Chris Mahoney)
- Fix missing unit tests (by Chris Mahoney)
- Fix bug in the unit tests for the `checks` module (by Chris Mahoney)
- Fix ref errors in docs (by Chris Mahoney)
- Add new functions to the `dimensions` module (by Chris Mahoney)
    - `make_dimension_table()`
    - `replace_columns_with_dimension_id()`
- Add two new modules, `formatting` and `info`, with the following functions (by Chris Mahoney)
    - `format_numbers()`
    - `display_intermediary_table()`
    - `display_intermediary_schema()`
    - `display_intermediary_columns()`
    - `get_distinct_values()`
- Speed up the execution of the Unit Tests by changing the `setUpClass()` and `tearDownClass()` structure to `setUpModule()` and `tearDownModule()` (by Chris Mahoney)
- Add docstrings to the `make_dimension_table()` and `replace_columns_with_dimension_id()` functions (by Chris Mahoney)
- Fix up a few logical bugs in the `dimensions` module (by Chris Mahoney)
- Add new unit tests to now have 100% coverage of the `dimensions` module (by Chris Mahoney)
- Fix the structure of the tables used in the Unit Tests (by Chris Mahoney)
- Fix missing `TestCase` imports in all Unit Tests (by Chris Mahoney)
- Enhance the Unit Tests setup and functionality (by Chris Mahoney)
    1. Change all `@property` to `@cached_property`
    2. Fix all column IDs to start at index `1`, not `0`
    3. For any DataFrames including columns of type `datetime`, ensure they are cast to the proper type
    4. All unit test classes are sub-classed from both the `PySparkSetup` and `TestCase` classes
- Move the `get_column_values()` function from the `cleaning` module to the `info` module (by Chris Mahoney)
- Fix docstring for the `get_column_values()` function (by Chris Mahoney)
- Rename `get_column_values()` to `extract_column_values()` (by Chris Mahoney)
- Add docstrings to the `get_distinct_values()` function (by Chris Mahoney)
- Finalise docs for the `info` module (by Chris Mahoney)
- Finish adding unit tests for the `info` module (by Chris Mahoney)
- Add all docs for the `formatting` module (by Chris Mahoney)
- Add all unit tests for the `formatting` module (by Chris Mahoney)
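
The `@property` to `@cached_property` change in this release can be illustrated with a minimal sketch. The `TableFixture` class, its `rows` attribute, and the `build_count` counter are hypothetical names used only for illustration; they are not taken from the package's test suite. The sketch only shows the general idea: an expensive fixture is built once per instance and then reused.

```python
from functools import cached_property


class TableFixture:
    """Hypothetical test fixture that builds a small table of rows."""

    def __init__(self) -> None:
        # Counts how many times the table is actually built.
        self.build_count = 0

    @cached_property
    def rows(self) -> list:
        # With a plain @property this body would run on every access;
        # with @cached_property it runs once and the result is stored
        # on the instance.
        self.build_count += 1
        # Column IDs start at 1, not 0.
        return [{"id": i, "value": chr(96 + i)} for i in range(1, 4)]


fixture = TableFixture()
first = fixture.rows
second = fixture.rows  # served from the cache, no rebuild
```

Because the cached value lives on the instance, each new `TableFixture()` still builds its own table, so tests that need a fresh copy can create a new instance.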
🏷️ Release Tag: v1.1.0
📅 Release Date: 2025-01-19
- Enhance initialisation scripts (by Chris Mahoney)
- Refresh the structure of the `pyproject.toml` file (by Chris Mahoney)
- Relabel some key elements in the `io` package (by Chris Mahoney)
    1. `transfer_table()` -> `transfer_table_by_path()`
    2. Change parameter in the `write_to_path()` function: `table` -> `data_frame`
- Fix the `data_format` parameter for the functions in the `io` module so they can use the pre-defined `SPARK_FORMATS` Literal values (by Chris Mahoney)
- Add better headers and comments throughout the `io` module (by Chris Mahoney)
- Fix up the default options in both the `read_from_path()` and `write_to_path()` functions (by Chris Mahoney)
- Add three new table-specific functions to the `io` module (by Chris Mahoney)
    - `read_from_table()`
    - `write_to_table()`
    - `transfer_table_by_table()`
- Add three new functions to the `io` module, which are generic switches between the `*_by_path()` and `*_by_table()` functions (by Chris Mahoney)
    - `read()`
    - `write()`
    - `transfer()`
- Add additional aliases to the functions in the `io` module, to make them more transferable and more generic (by Chris Mahoney)
    - `load_from_path()`
    - `save_to_path()`
    - `load_from_table()`
    - `save_to_table()`
    - `load()`
    - `save()`
- Fix some typos (by Chris Mahoney)
- Extend the `checks` module to add the `assert_table_exists()` function (by Chris Mahoney)
- Add `assert_table_exists()` to the unit tests in the `test_io` module (by Chris Mahoney)
- Add unit tests for the `*_by_table()` functions from the `io` module (by Chris Mahoney)
- Add constant values for valid write modes (by Chris Mahoney)
- Relabel parameters in the `transfer()` function (by Chris Mahoney)
- Extend the unit tests for the `io` module to now have 100% code coverage (by Chris Mahoney)
- Fix Unit Test error (by Chris Mahoney)
- Fix some unit tests (by Chris Mahoney)
- Fix docstrings for the spark `*WRITE_MODES` constants in the `io` module (by Chris Mahoney)
- Add additional examples to some of the functions in the `io` module (by Chris Mahoney)
    1. `read_from_path()`
    2. `write_to_path()`
    3. `transfer_by_path()`
- Add docstrings to the new functions in the `io` module (by Chris Mahoney)
    - `read_from_table()`
    - `write_to_table()`
    - `transfer_by_table()`
    - `read()`
    - `write()`
    - `transfer()`
- Enhance the error messages in the `_validate_table_name()` function (by Chris Mahoney)
- Enhance the initialisation commands (by Chris Mahoney)
🏷️ Release Tag: v1.0.0
📅 Release Date: 2024-12-29
- Add PySpark version check and custom exception if version is insufficient (by Chris Mahoney)
- Enhance the docs (by Chris Mahoney)
    - Pages removed: Roadmap
    - Pages added: Application, Exceptions, Warnings, Whitespaces
- Fix the order of the shields on the README (by Chris Mahoney)
- Fix classifiers to now confirm it is stable (by Chris Mahoney)
- Add process to automatically generate the `CHANGELOG.md` file (by Chris Mahoney)
- Add code complexity checks (by Chris Mahoney)
- Skip code coverage report for the `PySparkVersionError` exception (by Chris Mahoney)
🏷️ Release Tag: v0.12.0
📅 Release Date: 2024-12-27
- Initial commit of `delta` module (by Chris Mahoney)
- Fix typos (by Chris Mahoney)
- Improve the docs for the `delta` module (by Chris Mahoney)
- Add the docs for the `delta` module (by Chris Mahoney)
- Add missing docstrings to the `delta` module (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
- Omit code coverage for the `delta` package (by Chris Mahoney)
🏷️ Release Tag: v0.11.0
📅 Release Date: 2024-12-26
- Initial commit of `schema` module and Unit Tests (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
- Add docs for `schema` module (by Chris Mahoney)
- Change default line-length to be `95` characters (by Chris Mahoney)
- Enhance the docs for the `schema` module (by Chris Mahoney)
- Make linting and checking better (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
- Allow code annotations in the docs (by Chris Mahoney)
- Fix missing docs (by Chris Mahoney)
🏷️ Release Tag: v0.10.0
📅 Release Date: 2024-12-24
- Initial commit of `duplication` module (by Chris Mahoney)
- Add docs for the `duplication` module (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
🏷️ Release Tag: v0.9.0
📅 Release Date: 2024-12-24
- Add `cleaning` module and Unit Tests (by Chris Mahoney)
- Add docs for the `cleaning` module (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
- Fix type errors (by Chris Mahoney)
- Add more styling to docs pages (by Chris Mahoney)
- Add clearer `Literal` values to the `constants` module (by Chris Mahoney)
- Update the docstrings for the `Column Processes` section of the `cleaning` module (by Chris Mahoney)
- Fix typos (by Chris Mahoney)
- Fix unit test errors (by Chris Mahoney)
- Add `VALID_PYAPARK_JOIN_TYPES` and `ALL_PYSPARK_JOIN_TYPES` to the `constants` module (by Chris Mahoney)
- Add the `constants` module to the docs (by Chris Mahoney)
- Fix a few typos (by Chris Mahoney)
- Enhance the docs for the `cleaning` module (by Chris Mahoney)
- Fix `pylint` errors (by Chris Mahoney)
- Fix the parameters of the `drop_matching_rows()` function, and add additional column assertions (by Chris Mahoney)
- Fix PyTest errors (by Chris Mahoney)
- Fix docs not rendering correctly (by Chris Mahoney)
🏷️ Release Tag: v0.8.0
📅 Release Date: 2024-12-22
- Add `datetime` module and Unit Tests (by Chris Mahoney)
- Add docs for the `datetime` module (by Chris Mahoney)
- Add new step to the `CI` workflow to ensure that code coverage is sufficient (by Chris Mahoney)
- Fix bug (by Chris Mahoney)
- Fix missing docs (by Chris Mahoney)
- Add more info to the docs for the `datetime` module (by Chris Mahoney)
- Refactor type checks in `get_columns()` function to use `tuple` syntax for improved readability and performance (by Chris Mahoney)
- Refactor `rename_datetime_columns()` to improve readability and streamline column selection logic (by Chris Mahoney)
- Enhance documentation for `rename_datetime_columns()` and `add_local_datetime_columns()` with additional examples and success/failure conclusions (by Chris Mahoney)
- Update frequency parameter in `pd.date_range()` functions from `'H'` to `'h'` for consistency and future-proofing (by Chris Mahoney)
- Enhance documentation for `add_local_datetime_column()` and `add_local_datetime_columns()` by adding examples for error handling and invalid inputs (by Chris Mahoney)
- Refactor type checks across all modules to use `is_type()` function from the `toolbox_python` package for improved readability, consistency and reliability (by Chris Mahoney)
- Refactor column handling to use `list()` function for type casting, to ensure consistency and robustness (by Chris Mahoney)
- Fix return type of `is_vaid_spark_type()` to bool and add `assert_valid_spark_type()` function for type validation checks (by Chris Mahoney)
- Replace `AttributeError()` with `ColumnDoesNotExistError()` in column validation functions for improved error handling and clearer readability (by Chris Mahoney)
- Add custom exceptions for improved error handling in PySpark utilities (by Chris Mahoney)
    - `ColumnDoesNotExistError()`
    - `InvalidPySparkDataTypeError()`
    - `InvalidDataFrameNameError()`
    - `ColumnDoesNotExistWarning()`
- Refactor the `checks` module to add enhanced checks that columns are of the correct data type. New objects added (by Chris Mahoney)
    - `ColumnsAreTypeResult()`
    - `_columns_are_type()`
    - `column_is_type()`
    - `columns_are_type()`
    - `assert_column_is_type()`
    - `assert_columns_are_type()`
- Fix unnecessary PyLint checks (by Chris Mahoney)
- Fix PyTest errors (by Chris Mahoney)
- Fix PyTest errors (by Chris Mahoney)
- Finish unit testing the `checks` module. Now 100% coverage and all tests are passing (by Chris Mahoney)
- Add header (by Chris Mahoney)
- Finish adding docs to the `datetime` module (by Chris Mahoney)
- Split out the docs for the `checks` module into separate sections (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
- Move the `_validate_pyspark_datatype()` function from the `types` module to the `checks` module (by Chris Mahoney)
- Fix the type checking, particularly for the `datetime` module, and make it more robust (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
- Complete code coverage for the `datetime` module (by Chris Mahoney)
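
Two of the refactors in this release, using `tuple` syntax in type checks and using `list()` for type casting, can be sketched as follows. The `normalise_columns()` helper is a hypothetical name invented for this illustration; it is not the package's `get_columns()` implementation and makes no claim about how that function is written.

```python
def normalise_columns(columns):
    """Hypothetical helper: return the given column name(s) as a list."""
    # A single string is one column name, so wrap it rather than
    # splitting it into characters.
    if isinstance(columns, str):
        return [columns]
    # One isinstance() call with a tuple of types replaces a chain of
    # `isinstance(x, list) or isinstance(x, tuple) or ...` checks.
    if isinstance(columns, (list, tuple, set)):
        # list() casts any of the accepted collections to one type.
        return list(columns)
    raise TypeError(
        f"Expected str, list, tuple or set, got {type(columns).__name__}"
    )


single = normalise_columns("a")
several = normalise_columns(("a", "b"))
```

Downstream code can then assume it always receives a `list`, whatever the caller passed in.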
🏷️ Release Tag: v0.7.0
📅 Release Date: 2024-12-18
- Add `columns` module and Unit Tests (by Chris Mahoney)
- Add docs for the `columns` module (by Chris Mahoney)
- Fix typos (by Chris Mahoney)
- Add `warn_column_missing()` and `warn_columns_missing()` to the `checks` module (by Chris Mahoney)
- Update all docstrings for the `columns` module (by Chris Mahoney)
- Add `mermaid` diagram functionality to docs (by Chris Mahoney)
- Hide additional PyLint checks (by Chris Mahoney)
- Fix bug (by Chris Mahoney)
- Enhance docs for the `columns` module (by Chris Mahoney)
- Add unit tests for the `warn_column_missing()` and `warn_columns_missing()` functions (by Chris Mahoney)
- Fix bug (by Chris Mahoney)
- Fix code coverage of the `dimensions` module (by Chris Mahoney)
- Add check to ensure that the `CI` pipeline will always fail if the code coverage is less than 100% (by Chris Mahoney)
- Add check to ensure that the `CI` pipeline will always fail if the code coverage is less than 99% (by Chris Mahoney)
🏷️ Release Tag: v0.6.0
📅 Release Date: 2024-12-17
- Add `dimensions` module and Unit Tests (by Chris Mahoney)
- Add docs for the `dimensions` module (by Chris Mahoney)
- Fix mixed content (by Chris Mahoney)
- Fix typo (by Chris Mahoney)
🏷️ Release Tag: v0.5.0
📅 Release Date: 2024-12-17
- Add step to the `CD` pipeline to delete the `.gitignore` file from the code coverage docs (by Chris Mahoney)
- Initial commit of `scale` module and Unit Tests (by Chris Mahoney)
- Add docs for the `scale` module (by Chris Mahoney)
- Tidy up the docs for the `scale` module (by Chris Mahoney)
🏷️ Release Tag: v0.4.0
📅 Release Date: 2024-12-16
- Add `keys` module (by Chris Mahoney)
- Add Unit Tests for the `keys` module (by Chris Mahoney)
- Add docs for the `keys` module (by Chris Mahoney)
- Fix failing unit tests for the `keys` module (by Chris Mahoney)
- Tidy up docs for the `keys` module (by Chris Mahoney)
- Tidy up docs for the `types` module (by Chris Mahoney)
- Tidy up docs for the `checks` module (by Chris Mahoney)
- Fix missing code coverage report (by Chris Mahoney)
- Fix bug with `CI` parallel processing config (by Chris Mahoney)
- Add a step to the `CD` workflow to check that the package can be successfully installed after deployment (by Chris Mahoney)
- Add a more robust way to install Java on the different `os` distributions during `CI` and `CD` workflows (by Chris Mahoney)
- Fix bug (by Chris Mahoney)
- Fix `pip` commands in `CI` and `CD` workflows to be more generic across `os` platforms (by Chris Mahoney)
- Fix bug (by Chris Mahoney)
- Add debugging for `.venv` directory setups (by Chris Mahoney)
- Fix missing `poetry` install config (by Chris Mahoney)
- Fix `HADOOP_HOME` config in Unit Tests setup (by Chris Mahoney)
- Add more robust `.venv` dir checks in `CI` workflow (by Chris Mahoney)
- Fix bug (by Chris Mahoney)
- Fix bug with Unit Tests setup for different `os`'s (by Chris Mahoney)
- Change `CI` checks to only run on `ubuntu` and `macos` (by Chris Mahoney)
🏷️ Release Tag: v0.3.1
📅 Release Date: 2024-12-13
- Add `types` page (by Chris Mahoney)
- Fix styling (by Chris Mahoney)
- Fix install scripts (by Chris Mahoney)
- Fix docstrings for `types` modules (by Chris Mahoney)
- Fix PyTest errors (by Chris Mahoney)
🏷️ Release Tag: v0.3.0
📅 Release Date: 2024-12-13
- Add convenience functions for the parameterised Unit Tests (by Chris Mahoney)
- Add unit tests for the `types` module (by Chris Mahoney)
- Fix code coverage for the `types` module (by Chris Mahoney)
- Fix code coverage for `types` (by Chris Mahoney)
🏷️ Release Tag: v0.2.0
📅 Release Date: 2024-12-13
- Initial commit of all required files for the `checks` module (by Chris Mahoney)
- Enhance the internal `_columns_exists()` check to return a more logical and understandable output (by Chris Mahoney)
- Fix typos (by Chris Mahoney)
- Refresh all docstrings for all functions in the `checks` module (by Chris Mahoney)
- Add the `checks` module to the docs site (by Chris Mahoney)
- Update `CD` workflow (by Chris Mahoney)
- Fix aesthetics on the `Modules` page (by Chris Mahoney)
- Add new `Roadmap` page (by Chris Mahoney)
- Fix `pylint` check (by Chris Mahoney)
🏷️ Release Tag: v0.1.0
📅 Release Date: 2024-12-11
- Turn on `mkdocs` check again during `pre-commit` processes (by Chris Mahoney)
- Add better docs and examples for the `io` module (by Chris Mahoney)
- Fix `pylint` warnings (by Chris Mahoney)