Changelog

🏷️ Release Tag: v1.2.1
📅 Release Date: 2025-01-26

  • Fix issue with null headers in the CHANGELOG.md file. Root cause: GitHub was rate-limiting unauthenticated API calls. Resolution: add authentication to the curl command (by Chris Mahoney) View
  • Fix a few logical bugs in the docs for the info and the formatting modules (by Chris Mahoney) View
  • Move all the information about how to contribute out of the README.md file and into the CONTRIBUTING.md file. Also create a new Contributing page in the docs, add instructions for installation using uv, and add more questions to the FAQ. (by Chris Mahoney) View
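
The v1.2.1 fix above adds authentication to a curl call so that GitHub stops rate-limiting the changelog generation. The project's actual script is not shown here, so the following is only a minimal Python sketch of the same idea: attach a token to the request headers when one is available. The helper name `build_github_request`, the `GITHUB_TOKEN` environment variable, and the `OWNER/REPO` URL are all assumptions made for illustration.

```python
import os
import urllib.request
from typing import Optional


def build_github_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GitHub API request, authenticated only when a token is supplied.

    Hypothetical helper: it illustrates the idea, not the project's script.
    """
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # An authenticated call is subject to a higher rate limit than an anonymous one.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


# Placeholder URL and environment variable name (assumptions).
request = build_github_request(
    "https://api.github.com/repos/OWNER/REPO/releases",
    token=os.environ.get("GITHUB_TOKEN"),
)
```

The request object is only constructed here; passing it to `urllib.request.urlopen()` would perform the call.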

🏷️ Release Tag: v1.2.0
📅 Release Date: 2025-01-25

  • Add the column_contains_value() function to the checks module (by Chris Mahoney) View
  • Add missing docstrings for functions in the checks module: assert_valid_spark_type(), warn_column_invalid_type(), warn_columns_invalid_type() (by Chris Mahoney) View
  • For all docs in the checks module, ensure they have the See Also section, which appropriately references other functions (by Chris Mahoney) View
  • Fix typos in the cleaning module (by Chris Mahoney) View
  • Fix up the assert_table_exists() function to ensure it throws the correct error, and that it is appropriately documented (by Chris Mahoney) View
  • Add unit tests for the column_contains_value() function (by Chris Mahoney) View
  • Fix missing docs references (by Chris Mahoney) View
  • Fix logical flaw in the function column_contains_value() (by Chris Mahoney) View
  • Fix missing unit tests (by Chris Mahoney) View
  • Fix bug in the unit tests for the checks module (by Chris Mahoney) View
  • Fix ref errors in docs (by Chris Mahoney) View
  • Add new functions to the dimensions module: make_dimension_table(), replace_columns_with_dimension_id() (by Chris Mahoney) View
  • Add two new modules, formatting and info, with the functions: format_numbers(), display_intermediary_table(), display_intermediary_schema(), display_intermediary_columns(), get_distinct_values() (by Chris Mahoney) View
  • Speed up the execution of the Unit Tests by changing the setUpClass() and tearDownClass() structure to setUpModule() and tearDownModule() (by Chris Mahoney) View
  • Add docstrings to the make_dimension_table() and replace_columns_with_dimension_id() functions (by Chris Mahoney) View
  • Fix up a few logical bugs in the dimensions module (by Chris Mahoney) View
  • Add new unit tests to now have 100% coverage of the dimensions module (by Chris Mahoney) View
  • Fix the structure of the tables used in the Unit Tests (by Chris Mahoney) View
  • Fix missing TestCase imports in all Unit Tests (by Chris Mahoney) View
  • Enhance the Unit Tests setup and functionality: (1) change all @property to @cached_property; (2) fix all column IDs to start at index 1, not 0; (3) for any DataFrames including columns of type datetime, ensure they are cast to the proper type; (4) sub-class all unit test classes from both the PySparkSetup and TestCase classes (by Chris Mahoney) View
  • Move the get_column_values() function from the cleaning module to the info module (by Chris Mahoney) View
  • Fix docstring for the get_column_values() function (by Chris Mahoney) View
  • Rename get_column_values() to extract_column_values() (by Chris Mahoney) View
  • Add docstrings to the get_distinct_values() function (by Chris Mahoney) View
  • Finalise docs for the info module (by Chris Mahoney) View
  • Finish adding unit tests for the info module (by Chris Mahoney) View
  • Add all docs for the formatting module (by Chris Mahoney) View
  • Add all unit tests for the formatting module (by Chris Mahoney) View

🏷️ Release Tag: v1.1.0
📅 Release Date: 2025-01-19

  • Enhance initialisation scripts (by Chris Mahoney) View
  • Refresh the structure of the pyproject.toml file (by Chris Mahoney) View
  • Relabel some key elements in the io package: (1) transfer_table() -> transfer_table_by_path(); (2) change parameter in the write_to_path() function: table -> data_frame (by Chris Mahoney) View
  • Fix the data_format parameter for the functions in the io module so they can use the pre-defined SPARK_FORMATS Literal values (by Chris Mahoney) View
  • Add better headers and comments throughout the io module (by Chris Mahoney) View
  • Fix up the default options in both the read_from_path() and write_to_path() functions (by Chris Mahoney) View
  • Add three new table-specific functions to the io module: read_from_table(), write_to_table(), transfer_table_by_table() (by Chris Mahoney) View
  • Add three new functions to the io module, which are generic switches between the *_by_path() and *_by_table() functions: read(), write(), transfer() (by Chris Mahoney) View
  • Add additional aliases to the functions in the io module, to make them more transferable and more generic: load_from_path(), save_to_path(), load_from_table(), save_to_table(), load(), save() (by Chris Mahoney) View
  • Fix some typos (by Chris Mahoney) View
  • Extend the checks module to add the assert_table_exists() function (by Chris Mahoney) View
  • Add assert_table_exists() to the unit tests in the test_io module (by Chris Mahoney) View
  • Add unit tests for the *_by_table() functions from the io module (by Chris Mahoney) View
  • Add constant values for valid write modes (by Chris Mahoney) View
  • Relabel parameters in the transfer() function (by Chris Mahoney) View
  • Extend the unit tests for the io module to now have 100% code coverage (by Chris Mahoney) View
  • Fix Unit Test error (by Chris Mahoney) View
  • Fix some unit tests (by Chris Mahoney) View
  • Fix docstrings for the spark *WRITE_MODES constants in the io module (by Chris Mahoney) View
  • Add additional examples to some of the functions in the io module: (1) read_from_path(); (2) write_to_path(); (3) transfer_by_path() (by Chris Mahoney) View
  • Add docstrings to the new functions in the io module: read_from_table(), write_to_table(), transfer_by_table(), read(), write(), transfer() (by Chris Mahoney) View
  • Enhance the error messages in the _validate_table_name() function (by Chris Mahoney) View
  • Enhance the initialisation commands (by Chris Mahoney) View

🏷️ Release Tag: v1.0.0
📅 Release Date: 2024-12-29

🏷️ Release Tag: v0.12.0
📅 Release Date: 2024-12-27

🏷️ Release Tag: v0.11.0
📅 Release Date: 2024-12-26

🏷️ Release Tag: v0.10.0
📅 Release Date: 2024-12-24

🏷️ Release Tag: v0.9.0
📅 Release Date: 2024-12-24

🏷️ Release Tag: v0.8.0
📅 Release Date: 2024-12-22

  • Add datetime module and Unit Tests (by Chris Mahoney) View
  • Add docs for the datetime module (by Chris Mahoney) View
  • Add new step to the CI workflow to ensure that code coverage is sufficient (by Chris Mahoney) View
  • Fix bug (by Chris Mahoney) View
  • Fix missing docs (by Chris Mahoney) View
  • Add more info to the docs for the datetime module (by Chris Mahoney) View
  • Refactor type checks in get_columns() function to use tuple syntax for improved readability and performance (by Chris Mahoney) View
  • Refactor rename_datetime_columns() to improve readability and streamline column selection logic (by Chris Mahoney) View
  • Enhance documentation for rename_datetime_columns() and add_local_datetime_columns() with additional examples and success/failure conclusions (by Chris Mahoney) View
  • Update frequency parameter in pd.date_range() functions from 'H' to 'h' for consistency and future-proofing (by Chris Mahoney) View
  • Enhance documentation for add_local_datetime_column() and add_local_datetime_columns() by adding examples for error handling and invalid inputs (by Chris Mahoney) View
  • Refactor type checks across all modules to use is_type() function from the toolbox_python package for improved readability, consistency and reliability (by Chris Mahoney) View
  • Refactor column handling to use list() function for type casting, to ensure consistency and robustness (by Chris Mahoney) View
  • Fix return type of is_valid_spark_type() to bool and add assert_valid_spark_type() function for type validation checks (by Chris Mahoney) View
  • Replace AttributeError() with ColumnDoesNotExistError() in column validation functions for improved error handling and readability (by Chris Mahoney) View
  • Add custom exceptions for improved error handling in PySpark utilities: ColumnDoesNotExistError(), InvalidPySparkDataTypeError(), InvalidDataFrameNameError(), ColumnDoesNotExistWarning() (by Chris Mahoney) View
  • Refactor the checks module to add enhanced checking functionalities for columns being in the correct data type. New objects added: ColumnsAreTypeResult(), _columns_are_type(), column_is_type(), columns_are_type(), assert_column_is_type(), assert_columns_are_type() (by Chris Mahoney) View
  • Fix unnecessary PyLint checks (by Chris Mahoney) View
  • Fix PyTest errors (by Chris Mahoney) View
  • Fix PyTest errors (by Chris Mahoney) View
  • Finish unit testing the checks module. Now 100% coverage and all tests are passing (by Chris Mahoney) View
  • Add header (by Chris Mahoney) View
  • Finish adding docs to the datetime module (by Chris Mahoney) View
  • Split out the docs for the checks module into separate sections (by Chris Mahoney) View
  • Fix typo (by Chris Mahoney) View
  • Move the _validate_pyspark_datatype() function from the types module to the checks module (by Chris Mahoney) View
  • Fix the type checking, particularly for the datetime module, and make it more robust (by Chris Mahoney) View
  • Fix typo (by Chris Mahoney) View
  • Complete code coverage for the datetime module (by Chris Mahoney) View
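
Two of the v0.8.0 entries above describe type checks written with tuple syntax and column handling that casts with list(). The sketch below illustrates both patterns in one place, assuming a hypothetical helper named `normalise_columns`; it is not the library's get_columns() implementation, and it uses the built-in isinstance() rather than the is_type() function from toolbox_python.

```python
from typing import Iterable, List, Union


def normalise_columns(columns: Union[str, Iterable[str]]) -> List[str]:
    """Return the given column name(s) as a list of strings.

    Hypothetical helper, written only to illustrate the two patterns.
    """
    if isinstance(columns, str):
        # A single column name becomes a one-element list.
        return [columns]
    # Tuple syntax: one isinstance() call checks several accepted types at once.
    if not isinstance(columns, (list, tuple, set)):
        raise TypeError(f"Unsupported type for columns: {type(columns).__name__}")
    # list() gives callers one consistent container type to work with.
    return list(columns)
```

Passing a tuple of types to isinstance() replaces a chain of `or` conditions, which is the readability gain the changelog entry refers to.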

🏷️ Release Tag: v0.7.0
📅 Release Date: 2024-12-18

🏷️ Release Tag: v0.6.0
📅 Release Date: 2024-12-17

🏷️ Release Tag: v0.5.0
📅 Release Date: 2024-12-17

🏷️ Release Tag: v0.4.0
📅 Release Date: 2024-12-16

🏷️ Release Tag: v0.3.1
📅 Release Date: 2024-12-13

🏷️ Release Tag: v0.3.0
📅 Release Date: 2024-12-13

🏷️ Release Tag: v0.2.0
📅 Release Date: 2024-12-13

🏷️ Release Tag: v0.1.0
📅 Release Date: 2024-12-11