
Test the regularity of a given Time-Series Dataset🔗

Introduction🔗

Summary

As stated by Selva Prabhakaran:

The more regular and repeatable patterns a time series has, the easier it is to forecast.

The 'Approximate Entropy' algorithm can be used to quantify the regularity and unpredictability of fluctuations in a time series.

The higher the approximate entropy, the more difficult the series is to forecast.

A better alternative is 'Sample Entropy'.

Sample Entropy is similar to approximate entropy but is more consistent in estimating the complexity even for smaller time series.

For example, a random time series with fewer data points can have a lower 'approximate entropy' than a more 'regular' time series, whereas a longer random time series will have a higher 'approximate entropy'.


For more info, see: Time Series Analysis in Python: A Comprehensive Guide with Examples.

Info

To say that data is 'regular' is to say that the data points are evenly spaced, regularly collected, and not missing (i.e. they do not contain excessive NA values). It is not always necessary to conduct the Test for Regularity on automatically collected data (for example, Energy Prices or Daily Temperature); however, if the data was collected manually, then it is highly recommended. If the data does not meet the requirements of Regularity, then it is necessary to return to the data collection plan and revise the methodology used.
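As a rough, illustrative pre-check of this kind of regularity (a sketch only; the `check_collection_regularity` helper and the pandas DataFrame with a DatetimeIndex are assumptions, not part of this package), the spacing between timestamps and the proportion of missing values can be inspected before running any entropy test:

```python
import pandas as pd

def check_collection_regularity(df: pd.DataFrame, max_na_ratio: float = 0.05) -> dict:
    # Hypothetical helper for illustration only; `df` is assumed to have a DatetimeIndex.
    spacings = df.index.to_series().diff().dropna()
    na_ratio = float(df.isna().mean().max())
    return {
        "evenly_spaced": spacings.nunique() == 1,   # a single, constant time step between observations
        "na_ratio": na_ratio,                       # worst-case proportion of missing values per column
        "acceptable_na": na_ratio <= max_na_ratio,  # within the (assumed) 5% tolerance for NA values
    }
```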

| library | category | algorithm | short | import script | url |
|---|---|---|---|---|---|
| antropy | Regularity | Approximate Entropy | AppEn | from antropy import app_entropy | https://raphaelvallat.com/antropy/build/html/generated/antropy.app_entropy.html |
| antropy | Regularity | Sample Entropy | SampEn | from antropy import sample_entropy | https://raphaelvallat.com/antropy/build/html/generated/antropy.sample_entropy.html |
| antropy | Regularity | Permutation Entropy | PermEn | from antropy import perm_entropy | https://raphaelvallat.com/antropy/build/html/generated/antropy.perm_entropy.html |
| antropy | Regularity | Spectral Entropy | SpecEn | from antropy import spectral_entropy | https://raphaelvallat.com/antropy/build/html/generated/antropy.spectral_entropy.html |
| antropy | Regularity | SVD Entropy | SvdEn | from antropy import svd_entropy | https://raphaelvallat.com/antropy/build/html/generated/antropy.svd_entropy.html |
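The import scripts in the table above map one-to-one onto AntroPy's public functions. A minimal sketch of calling them directly on a NumPy array (the random series below is only a placeholder for your own data) might look like this:

```python
import numpy as np
from antropy import app_entropy, perm_entropy, sample_entropy, spectral_entropy, svd_entropy

rng = np.random.default_rng(42)
x = rng.normal(size=500)  # placeholder series; substitute your own time-series values

print(app_entropy(x))                              # AppEn
print(sample_entropy(x))                           # SampEn
print(perm_entropy(x, normalize=True))             # PermEn, scaled to [0, 1]
print(spectral_entropy(x, sf=1, normalize=True))   # SpecEn; sf is the sampling frequency
print(svd_entropy(x, normalize=True))              # SvdEn
```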

For more info, see: The Future of Australian Energy Prices: Time-Series Analysis of Historic Prices and Forecast for Future Prices.

Source Library

The AntroPy package was chosen because it provides well-tested and efficient implementations of approximate entropy, sample entropy, and related complexity measures for time-series data, is built on top of the scientific Python stack (NumPy/SciPy), and is actively maintained and open source, making it a reliable choice for reproducible statistical analysis.

Source Module

All of the source code can be found within the modules:

Modules🔗

ts_stat_tests.regularity.tests 🔗

Summary

This module contains convenience functions and tests for regularity measures, allowing for easy access to different entropy algorithms.

entropy 🔗

entropy(
    x: ArrayLike,
    algorithm: str = "sample",
    order: int = 2,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
    sf: float = 1,
    normalize: bool = True,
) -> Union[float, NDArray[np.float64]]

Summary

Test for the entropy of a given data set.

Details

This function is a convenience wrapper around the five underlying algorithms:
- approx_entropy()
- sample_entropy()
- spectral_entropy()
- permutation_entropy()
- svd_entropy()

Parameters:

- x (ArrayLike, required):
  The data to be checked. Should be a 1-D or N-D data array.
- algorithm (str, optional):
  Which entropy algorithm to use.
  - sample_entropy(): ["sample", "sampl", "samp"]
  - approx_entropy(): ["app", "approx"]
  - spectral_entropy(): ["spec", "spect", "spectral"]
  - permutation_entropy(): ["perm", "permutation"]
  - svd_entropy(): ["svd", "svd_entropy"]
  Defaults to "sample".
- order (int, optional):
  Embedding dimension.
  Only relevant when algorithm=sample or algorithm=approx.
  Defaults to 2.
- metric (VALID_KDTREE_METRIC_OPTIONS, optional):
  Name of the distance metric function used with sklearn.neighbors.KDTree. Default is to use the Chebyshev distance.
  Only relevant when algorithm=sample or algorithm=approx.
  Defaults to "chebyshev".
- sf (float, optional):
  Sampling frequency, in Hz.
  Only relevant when algorithm=spectral.
  Defaults to 1.
- normalize (bool, optional):
  If True, divide by \(\log_2(\text{psd.size})\) to normalize the spectral entropy to be between \(0\) and \(1\). Otherwise, return the spectral entropy in bits.
  Only relevant when algorithm=spectral.
  Defaults to True.

Raises:

- ValueError: When the given value for algorithm is not valid.

Returns:

- Union[float, NDArray[float64]]: The calculated entropy value.

Credit

All credit goes to the AntroPy library.

Examples
Setup

>>> from ts_stat_tests.regularity.tests import entropy
>>> from ts_stat_tests.utils.data import data_normal
>>> normal = data_normal

Example 1: Sample Entropy

>>> print(entropy(x=normal, algorithm="sample"))
2.2374...

Example 2: Approx Entropy

>>> print(entropy(x=normal, algorithm="approx"))
1.6643...

Example 3: Spectral Entropy

>>> print(entropy(x=normal, algorithm="spectral", sf=1))
0.9329...
References
See Also
Source code in src/ts_stat_tests/regularity/tests.py
@typechecked
def entropy(
    x: ArrayLike,
    algorithm: str = "sample",
    order: int = 2,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
    sf: float = 1,
    normalize: bool = True,
) -> Union[float, NDArray[np.float64]]:
    """
    !!! note "Summary"
        Test for the entropy of a given data set.

    ???+ abstract "Details"
        This function is a convenience wrapper around the five underlying algorithms:<br>
        - [`approx_entropy()`][ts_stat_tests.regularity.algorithms.approx_entropy]<br>
        - [`sample_entropy()`][ts_stat_tests.regularity.algorithms.sample_entropy]<br>
        - [`spectral_entropy()`][ts_stat_tests.regularity.algorithms.spectral_entropy]<br>
        - [`permutation_entropy()`][ts_stat_tests.regularity.algorithms.permutation_entropy]<br>
        - [`svd_entropy()`][ts_stat_tests.regularity.algorithms.svd_entropy]

    Params:
        x (ArrayLike):
            The data to be checked. Should be a `1-D` or `N-D` data array.
        algorithm (str, optional):
            Which entropy algorithm to use.<br>
            - `sample_entropy()`: `["sample", "sampl", "samp"]`<br>
            - `approx_entropy()`: `["app", "approx"]`<br>
            - `spectral_entropy()`: `["spec", "spect", "spectral"]`<br>
            - `permutation_entropy()`: `["perm", "permutation"]`<br>
            - `svd_entropy()`: `["svd", "svd_entropy"]`<br>
            Defaults to `"sample"`.
        order (int, optional):
            Embedding dimension.<br>
            Only relevant when `algorithm=sample` or `algorithm=approx`.<br>
            Defaults to `2`.
        metric (VALID_KDTREE_METRIC_OPTIONS):
            Name of the distance metric function used with [`sklearn.neighbors.KDTree`](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree). Default is to use the [Chebyshev distance](https://en.wikipedia.org/wiki/Chebyshev_distance).<br>
            Only relevant when `algorithm=sample` or `algorithm=approx`.<br>
            Defaults to `"chebyshev"`.
        sf (float, optional):
            Sampling frequency, in Hz.<br>
            Only relevant when `algorithm=spectral`.<br>
            Defaults to `1`.
        normalize (bool, optional):
            If `True`, divide by $log2(psd.size)$ to normalize the spectral entropy to be between $0$ and $1$. Otherwise, return the spectral entropy in bit.<br>
            Only relevant when `algorithm=spectral`.<br>
            Defaults to `True`.

    Raises:
        (ValueError):
            When the given value for `algorithm` is not valid.

    Returns:
        (Union[float, NDArray[np.float64]]):
            The calculated entropy value.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.tests import entropy
        >>> from ts_stat_tests.utils.data import data_normal
        >>> normal = data_normal

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Sample Entropy"}
        >>> print(entropy(x=normal, algorithm="sample"))
        2.2374...

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Approx Entropy"}
        >>> print(entropy(x=normal, algorithm="approx"))
        1.6643...

        ```

        ```pycon {.py .python linenums="1" title="Example 3: Spectral Entropy"}
        >>> print(entropy(x=normal, algorithm="spectral", sf=1))
        0.9329...

        ```

    ??? question "References"
        - Richman, J. S. et al. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6), H2039-H2049.
        - https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html
        - Inouye, T. et al. (1991). Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalography and clinical neurophysiology, 79(3), 204-210.
        - https://en.wikipedia.org/wiki/Spectral_density
        - https://en.wikipedia.org/wiki/Welch%27s_method

    ??? tip "See Also"
        - [`regularity()`][ts_stat_tests.regularity.tests.regularity]
        - [`approx_entropy()`][ts_stat_tests.regularity.algorithms.approx_entropy]
        - [`sample_entropy()`][ts_stat_tests.regularity.algorithms.sample_entropy]
        - [`spectral_entropy()`][ts_stat_tests.regularity.algorithms.spectral_entropy]
        - [`permutation_entropy()`][ts_stat_tests.regularity.algorithms.permutation_entropy]
        - [`svd_entropy()`][ts_stat_tests.regularity.algorithms.svd_entropy]
    """
    options: dict[str, tuple[str, ...]] = {
        "sampl": ("sample", "sampl", "samp"),
        "approx": ("app", "approx"),
        "spect": ("spec", "spect", "spectral"),
        "perm": ("perm", "permutation"),
        "svd": ("svd", "svd_entropy"),
    }
    if algorithm in options["sampl"]:
        return sample_entropy(x=x, order=order, metric=metric)
    if algorithm in options["approx"]:
        return approx_entropy(x=x, order=order, metric=metric)
    if algorithm in options["spect"]:
        return spectral_entropy(x=x, sf=sf, normalize=normalize)
    if algorithm in options["perm"]:
        return permutation_entropy(x=x, order=order, normalize=normalize)
    if algorithm in options["svd"]:
        return svd_entropy(x=x, order=order, normalize=normalize)
    raise ValueError(
        generate_error_message(
            parameter_name="algorithm",
            value_parsed=algorithm,
            options=options,
        )
    )

regularity 🔗

regularity(
    x: ArrayLike,
    algorithm: str = "sample",
    order: int = 2,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
    sf: float = 1,
    normalize: bool = True,
) -> Union[float, NDArray[np.float64]]

Summary

Test for the regularity of a given data set.

Details

This is a pass-through, convenience wrapper around the entropy() function.

Parameters:

- x (ArrayLike, required):
  The data to be checked. Should be a 1-D or N-D data array.
- algorithm (str, optional):
  Which entropy algorithm to use.
  - sample_entropy(): ["sample", "sampl", "samp"]
  - approx_entropy(): ["app", "approx"]
  - spectral_entropy(): ["spec", "spect", "spectral"]
  - permutation_entropy(): ["perm", "permutation"]
  - svd_entropy(): ["svd", "svd_entropy"]
  Defaults to "sample".
- order (int, optional):
  Embedding dimension.
  Only relevant when algorithm=sample or algorithm=approx.
  Defaults to 2.
- metric (VALID_KDTREE_METRIC_OPTIONS, optional):
  Name of the distance metric function used with sklearn.neighbors.KDTree. Default is to use the Chebyshev distance.
  Only relevant when algorithm=sample or algorithm=approx.
  Defaults to "chebyshev".
- sf (float, optional):
  Sampling frequency, in Hz.
  Only relevant when algorithm=spectral.
  Defaults to 1.
- normalize (bool, optional):
  If True, divide by \(\log_2(\text{psd.size})\) to normalize the spectral entropy to be between \(0\) and \(1\). Otherwise, return the spectral entropy in bits.
  Only relevant when algorithm=spectral.
  Defaults to True.

Returns:

- Union[float, NDArray[float64]]: The calculated regularity (entropy) value.

Credit

All credit goes to the AntroPy library.

Examples
Setup

>>> from ts_stat_tests.regularity.tests import regularity
>>> from ts_stat_tests.utils.data import data_normal
>>> normal = data_normal

Example 1: Sample Entropy

>>> print(regularity(x=normal, algorithm="sample"))
2.2374...

Example 2: Approx Entropy

>>> print(regularity(x=normal, algorithm="approx"))
1.6643...

Example 3: Spectral Entropy

>>> print(regularity(x=normal, algorithm="spectral", sf=1))
0.9329...
References
See Also
Source code in src/ts_stat_tests/regularity/tests.py
@typechecked
def regularity(
    x: ArrayLike,
    algorithm: str = "sample",
    order: int = 2,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
    sf: float = 1,
    normalize: bool = True,
) -> Union[float, NDArray[np.float64]]:
    """
    !!! note "Summary"
        Test for the regularity of a given data set.

    ???+ abstract "Details"
        This is a pass-through, convenience wrapper around the [`entropy()`][ts_stat_tests.regularity.tests.entropy] function.

    Params:
        x (ArrayLike):
            The data to be checked. Should be a `1-D` or `N-D` data array.
        algorithm (str, optional):
            Which entropy algorithm to use.<br>
            - `sample_entropy()`: `["sample", "sampl", "samp"]`<br>
            - `approx_entropy()`: `["app", "approx"]`<br>
            - `spectral_entropy()`: `["spec", "spect", "spectral"]`<br>
            - `permutation_entropy()`: `["perm", "permutation"]`<br>
            - `svd_entropy()`: `["svd", "svd_entropy"]`<br>
            Defaults to `"sample"`.
        order (int, optional):
            Embedding dimension.<br>
            Only relevant when `algorithm=sample` or `algorithm=approx`.<br>
            Defaults to `2`.
        metric (VALID_KDTREE_METRIC_OPTIONS):
            Name of the distance metric function used with [`sklearn.neighbors.KDTree`](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree). Default is to use the [Chebyshev distance](https://en.wikipedia.org/wiki/Chebyshev_distance).<br>
            Only relevant when `algorithm=sample` or `algorithm=approx`.<br>
            Defaults to `"chebyshev"`.
        sf (float, optional):
            Sampling frequency, in Hz.<br>
            Only relevant when `algorithm=spectral`.<br>
            Defaults to `1`.
        normalize (bool, optional):
            If `True`, divide by $log2(psd.size)$ to normalize the spectral entropy to be between $0$ and $1$. Otherwise, return the spectral entropy in bit.<br>
            Only relevant when `algorithm=spectral`.<br>
            Defaults to `True`.

    Returns:
        (Union[float, NDArray[np.float64]]):
            The calculated regularity (entropy) value.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.tests import regularity
        >>> from ts_stat_tests.utils.data import data_normal
        >>> normal = data_normal

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Sample Entropy"}
        >>> print(regularity(x=normal, algorithm="sample"))
        2.2374...

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Approx Entropy"}
        >>> print(regularity(x=normal, algorithm="approx"))
        1.6643...

        ```

        ```pycon {.py .python linenums="1" title="Example 3: Spectral Entropy"}
        >>> print(regularity(x=normal, algorithm="spectral", sf=1))
        0.9329...

        ```

    ??? question "References"
        - Richman, J. S. et al. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6), H2039-H2049.
        - https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html
        - Inouye, T. et al. (1991). Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalography and clinical neurophysiology, 79(3), 204-210.
        - https://en.wikipedia.org/wiki/Spectral_density
        - https://en.wikipedia.org/wiki/Welch%27s_method

    ??? tip "See Also"
        - [`entropy()`][ts_stat_tests.regularity.tests.entropy]
        - [`approx_entropy()`][ts_stat_tests.regularity.algorithms.approx_entropy]
        - [`sample_entropy()`][ts_stat_tests.regularity.algorithms.sample_entropy]
        - [`spectral_entropy()`][ts_stat_tests.regularity.algorithms.spectral_entropy]
        - [`permutation_entropy()`][ts_stat_tests.regularity.algorithms.permutation_entropy]
        - [`svd_entropy()`][ts_stat_tests.regularity.algorithms.svd_entropy]
    """
    return entropy(x=x, algorithm=algorithm, order=order, metric=metric, sf=sf, normalize=normalize)

is_regular 🔗

is_regular(
    x: ArrayLike,
    algorithm: str = "sample",
    order: int = 2,
    sf: float = 1,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
    normalize: bool = True,
    tolerance: Union[str, float, int, None] = "default",
) -> dict[str, Union[str, float, bool]]

Summary

Test whether a given data set is regular or not.

Details

This function implements the given algorithm (defined in the parameter algorithm), and returns a dictionary containing the relevant data:

{
    "result": ...,  # The result of the test. Will be `True` if `entropy<tolerance`, and `False` otherwise
    "entropy": ...,  # A `float` value, the result of the `entropy()` function
    "tolerance": ...,  # A `float` value, which is the tolerance used for determining whether or not the `entropy` is `regular` or not
}

Parameters:

- x (ArrayLike, required):
  The data to be checked. Should be a 1-D or N-D data array.
- algorithm (str, optional):
  Which entropy algorithm to use.
  - sample_entropy(): ["sample", "sampl", "samp"]
  - approx_entropy(): ["app", "approx"]
  - spectral_entropy(): ["spec", "spect", "spectral"]
  - permutation_entropy(): ["perm", "permutation"]
  - svd_entropy(): ["svd", "svd_entropy"]
  Defaults to "sample".
- order (int, optional):
  Embedding dimension.
  Only relevant when algorithm=sample or algorithm=approx.
  Defaults to 2.
- metric (VALID_KDTREE_METRIC_OPTIONS, optional):
  Name of the distance metric function used with sklearn.neighbors.KDTree. Default is to use the Chebyshev distance.
  Only relevant when algorithm=sample or algorithm=approx.
  Defaults to "chebyshev".
- sf (float, optional):
  Sampling frequency, in Hz.
  Only relevant when algorithm=spectral.
  Defaults to 1.
- normalize (bool, optional):
  If True, divide by \(\log_2(\text{psd.size})\) to normalize the spectral entropy to be between \(0\) and \(1\). Otherwise, return the spectral entropy in bits.
  Only relevant when algorithm=spectral.
  Defaults to True.
- tolerance (Union[str, float, int, None], optional):
  The tolerance value used to determine whether or not the result is regular.
  - If tolerance is of type int or float, then this value will be used.
  - If tolerance is "default" or None, then tolerance will be derived from x using the calculation: tolerance = 0.2 * np.std(a=x)
  - If any other value is given, then a ValueError will be raised.
  Defaults to "default".

Raises:

- ValueError: If the given tolerance parameter is invalid. Valid options are:
  - A number with type float or int, or
  - A string with value "default", or
  - The value None.

Returns:

- dict[str, Union[str, float, bool]]: A dictionary containing the test results:
  - result (bool): True if entropy < tolerance.
  - entropy (float): The calculated entropy value.
  - tolerance (float): The threshold used for regularity.
Credit

All credit goes to the AntroPy library.

Examples
Setup

>>> from ts_stat_tests.regularity.tests import is_regular
>>> from ts_stat_tests.utils.data import data_normal
>>> normal = data_normal

Example 1: Sample Entropy

>>> print(is_regular(x=normal, algorithm="sample"))
{'result': False, 'entropy': 2.23743099781426, 'tolerance': 0.20294652904313437}

Example 2: Approx Entropy

>>> print(is_regular(x=normal, algorithm="approx", tolerance=0.5))
{'result': False, 'entropy': 1.6643808251518548, 'tolerance': 0.5}
References
See Also
Source code in src/ts_stat_tests/regularity/tests.py
@typechecked
def is_regular(
    x: ArrayLike,
    algorithm: str = "sample",
    order: int = 2,
    sf: float = 1,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
    normalize: bool = True,
    tolerance: Union[str, float, int, None] = "default",
) -> dict[str, Union[str, float, bool]]:
    """
    !!! note "Summary"
        Test whether a given data set is `regular` or not.

    ???+ abstract "Details"
        This function implements the given algorithm (defined in the parameter `algorithm`), and returns a dictionary containing the relevant data:
        ```python
        {
            "result": ...,  # The result of the test. Will be `True` if `entropy<tolerance`, and `False` otherwise
            "entropy": ...,  # A `float` value, the result of the `entropy()` function
            "tolerance": ...,  # A `float` value, which is the tolerance used for determining whether or not the `entropy` is `regular` or not
        }
        ```

    Params:
        x (ArrayLike):
            The data to be checked. Should be a `1-D` or `N-D` data array.
        algorithm (str, optional):
            Which entropy algorithm to use.<br>
            - `sample_entropy()`: `["sample", "sampl", "samp"]`<br>
            - `approx_entropy()`: `["app", "approx"]`<br>
            - `spectral_entropy()`: `["spec", "spect", "spectral"]`<br>
            - `permutation_entropy()`: `["perm", "permutation"]`<br>
            - `svd_entropy()`: `["svd", "svd_entropy"]`<br>
            Defaults to `"sample"`.
        order (int, optional):
            Embedding dimension.<br>
            Only relevant when `algorithm=sample` or `algorithm=approx`.<br>
            Defaults to `2`.
        metric (VALID_KDTREE_METRIC_OPTIONS):
            Name of the distance metric function used with [`sklearn.neighbors.KDTree`](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree). Default is to use the [Chebyshev distance](https://en.wikipedia.org/wiki/Chebyshev_distance).<br>
            Only relevant when `algorithm=sample` or `algorithm=approx`.<br>
            Defaults to `"chebyshev"`.
        sf (float, optional):
            Sampling frequency, in Hz.<br>
            Only relevant when `algorithm=spectral`.<br>
            Defaults to `1`.
        normalize (bool, optional):
            If `True`, divide by $log2(psd.size)$ to normalize the spectral entropy to be between $0$ and $1$. Otherwise, return the spectral entropy in bit.<br>
            Only relevant when `algorithm=spectral`.<br>
            Defaults to `True`.
        tolerance (Union[str, float, int, None], optional):
            The tolerance value used to determine whether or not the result is `regular` or not.<br>
            - If `tolerance` is either type `int` or `float`, then this value will be used.<br>
            - If `tolerance` is either `"default"` or `None`, then `tolerance` will be derived from `x` using the calculation:
                ```python
                tolerance = 0.2 * np.std(a=x)
                ```
            - If any other value is given, then a `ValueError` error will be raised.<br>
            Defaults to `"default"`.

    Raises:
        (ValueError):
            If the given `tolerance` parameter is invalid.

            Valid options are:

            - A number with type `float` or `int`, or
            - A string with value `default`, or
            - The value `None`.

    Returns:
        (dict[str, Union[str, float, bool]]):
            A dictionary containing the test results:

            - `result` (bool): `True` if `entropy < tolerance`.
            - `entropy` (float): The calculated entropy value.
            - `tolerance` (float): The threshold used for regularity.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.tests import is_regular
        >>> from ts_stat_tests.utils.data import data_normal
        >>> normal = data_normal

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Sample Entropy"}
        >>> print(is_regular(x=normal, algorithm="sample"))
        {'result': False, 'entropy': 2.23743099781426, 'tolerance': 0.20294652904313437}

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Approx Entropy"}
        >>> print(is_regular(x=normal, algorithm="approx", tolerance=0.5))
        {'result': False, 'entropy': 1.6643808251518548, 'tolerance': 0.5}

        ```

    ??? question "References"
        - Richman, J. S. et al. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6), H2039-H2049.
        - https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html
        - Inouye, T. et al. (1991). Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalography and clinical neurophysiology, 79(3), 204-210.
        - https://en.wikipedia.org/wiki/Spectral_density
        - https://en.wikipedia.org/wiki/Welch%27s_method

    ??? tip "See Also"
        - [`entropy()`][ts_stat_tests.regularity.tests.entropy]
        - [`regularity()`][ts_stat_tests.regularity.tests.regularity]
        - [`approx_entropy()`][ts_stat_tests.regularity.algorithms.approx_entropy]
        - [`sample_entropy()`][ts_stat_tests.regularity.algorithms.sample_entropy]
        - [`spectral_entropy()`][ts_stat_tests.regularity.algorithms.spectral_entropy]
        - [`permutation_entropy()`][ts_stat_tests.regularity.algorithms.permutation_entropy]
        - [`svd_entropy()`][ts_stat_tests.regularity.algorithms.svd_entropy]
    """
    if isinstance(tolerance, (float, int)):
        tol = tolerance
    elif tolerance in ["default", None]:
        tol = 0.2 * np.std(a=np.asarray(x))
    else:
        raise ValueError(
            f"Invalid option for `tolerance` parameter: {tolerance}.\n"
            f"Valid options are:\n"
            f"- A number with type `float` or `int`,\n"
            f"- A string with value `default`,\n"
            f"- The value `None`."
        )
    value = regularity(x=x, order=order, sf=sf, metric=metric, algorithm=algorithm, normalize=normalize)
    result = value < tol
    return {
        "result": bool(result),
        "entropy": float(value),
        "tolerance": float(tol),
    }

ts_stat_tests.regularity.algorithms 🔗

Summary

This module contains algorithms to compute regularity measures for time series data, including approximate entropy, sample entropy, spectral entropy, and permutation entropy.

VALID_KDTREE_METRIC_OPTIONS module-attribute 🔗

VALID_KDTREE_METRIC_OPTIONS = Literal[
    "euclidean",
    "l2",
    "minkowski",
    "p",
    "manhattan",
    "cityblock",
    "l1",
    "chebyshev",
    "infinity",
]

VALID_SPECTRAL_ENTROPY_METHOD_OPTIONS module-attribute 🔗

VALID_SPECTRAL_ENTROPY_METHOD_OPTIONS = Literal[
    "fft", "welch"
]

approx_entropy 🔗

approx_entropy(
    x: ArrayLike,
    order: int = 2,
    tolerance: Optional[float] = None,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
) -> float

Summary

Approximate entropy is a measure of the amount of regularity or predictability in a time series. It is used to quantify the degree of self-similarity of a signal over different time scales, and can be useful for detecting underlying patterns or trends in data.

This function implements the app_entropy() function from the AntroPy library.

Details

Approximate entropy is a technique used to quantify the amount of regularity and the unpredictability of fluctuations over time-series data. Smaller values indicate that the data is more regular and predictable.

To calculate approximate entropy, we first need to define a window size or scale factor, which determines the length of the subsequences that are used to compare the similarity of the time series. We then compare all possible pairs of subsequences within the time series and calculate the probability that two subsequences are within a certain tolerance level of each other, where the tolerance level is usually expressed as a percentage of the standard deviation of the time series.

The approximate entropy is then defined as the negative natural logarithm of the average probability of similarity across all possible pairs of subsequences, normalized by the length of the time series and the scale factor.

The approximate entropy measure is useful in a variety of applications, such as the analysis of physiological signals, financial time series, and climate data. It can be used to detect changes in the regularity or predictability of a time series over time, and can provide insights into the underlying dynamics or mechanisms that generate the signal.

Parameters:

- x (ArrayLike, required):
  One-dimensional time series of shape (n_times,).
- order (int, optional):
  Embedding dimension.
  Defaults to 2.
- tolerance (Optional[float], optional):
  Tolerance level or similarity criterion. If None (default), it is set to \(0.2 \times \text{std}(x)\).
  Defaults to None.
- metric (VALID_KDTREE_METRIC_OPTIONS, optional):
  Name of the distance metric function used with sklearn.neighbors.KDTree. Default is to use the Chebyshev distance. For a full list of all available metrics, see sklearn.metrics.pairwise.distance_metrics and scipy.spatial.distance.
  Defaults to "chebyshev".

Returns:

- float: The approximate entropy score.

Examples
Setup

>>> from ts_stat_tests.regularity.algorithms import approx_entropy
>>> from ts_stat_tests.utils.data import data_airline, data_random
>>> airline = data_airline.values
>>> random = data_random

Example 1: Airline Passengers Data

>>> print(f"{approx_entropy(x=airline):.4f}")
0.6451

Example 2: Random Data

>>> print(f"{approx_entropy(x=random):.4f}")
1.8177
Calculation

The equation for ApEn is:

\[ \text{ApEn}(m, r, N) = \phi_m(r) - \phi_{m+1}(r) \]

where:

  • \(m\) is the embedding dimension,
  • \(r\) is the tolerance or similarity criterion,
  • \(N\) is the length of the time series, and
  • \(\phi_m(r)\) and \(\phi_{m+1}(r)\) are the logarithms of the probabilities that two sequences of \(m\) data points in the time series that are similar to each other within a tolerance \(r\) remain similar for the next data point, for \(m\) and \(m+1\), respectively.
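To make the \(\phi_m(r)\) terms concrete, here is a deliberately naive NumPy sketch of the formula above. It is not the AntroPy implementation (which is heavily optimised); it simply computes, for each length-\(m\) template, the fraction of templates within Chebyshev distance \(r\), averages the logarithms, and differences the result for \(m\) and \(m+1\):

```python
import numpy as np

def naive_apen(x, m=2, r=None):
    # Illustrative, unoptimised ApEn; r defaults to 0.2 * std(x), matching the usual tolerance convention.
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = 0.2 * np.std(x) if r is None else r

    def phi(order):
        # all overlapping templates of length `order`
        templates = np.array([x[i : i + order] for i in range(n - order + 1)])
        # for each template, the fraction of templates within Chebyshev distance r (self-matches included)
        fractions = np.array(
            [np.mean(np.max(np.abs(templates - t), axis=1) <= r) for t in templates]
        )
        return np.mean(np.log(fractions))

    return phi(m) - phi(m + 1)
```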
Notes
  • Inputs: x is a 1-dimensional array. It represents time-series data, ideally with each element in the array being a measurement or value taken at regular time intervals.
  • Settings: order is the embedding dimension, i.e. the number of values used to construct each compared subsequence (template). If the embedding dimension is too small, we may miss important patterns. If it's too large, we may overfit noise.
  • Metric: The Chebyshev metric is often used because it is a robust and computationally efficient way to measure the distance between two time series.
Credit

All credit goes to the AntroPy library.

References
See Also
Source code in src/ts_stat_tests/regularity/algorithms.py
@typechecked
def approx_entropy(
    x: ArrayLike,
    order: int = 2,
    tolerance: Optional[float] = None,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
) -> float:
    r"""
    !!! note "Summary"
        Approximate entropy is a measure of the amount of regularity or predictability in a time series. It is used to quantify the degree of self-similarity of a signal over different time scales, and can be useful for detecting underlying patterns or trends in data.

        This function implements the [`app_entropy()`](https://raphaelvallat.com/antropy/build/html/generated/antropy.app_entropy.html) function from the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ abstract "Details"
        Approximate entropy is a technique used to quantify the amount of regularity and the unpredictability of fluctuations over time-series data. Smaller values indicate that the data is more regular and predictable.

        To calculate approximate entropy, we first need to define a window size or scale factor, which determines the length of the subsequences that are used to compare the similarity of the time series. We then compare all possible pairs of subsequences within the time series and calculate the probability that two subsequences are within a certain tolerance level of each other, where the tolerance level is usually expressed as a percentage of the standard deviation of the time series.

        The approximate entropy is then defined as the negative natural logarithm of the average probability of similarity across all possible pairs of subsequences, normalized by the length of the time series and the scale factor.

        The approximate entropy measure is useful in a variety of applications, such as the analysis of physiological signals, financial time series, and climate data. It can be used to detect changes in the regularity or predictability of a time series over time, and can provide insights into the underlying dynamics or mechanisms that generate the signal.

    Params:
        x (ArrayLike):
            One-dimensional time series of shape `(n_times,)`.
        order (int, optional):
            Embedding dimension.<br>
            Defaults to `2`.
        tolerance (Optional[float], optional):
            Tolerance level or similarity criterion. If `None` (default), it is set to $0.2 \times \text{std}(x)$.<br>
            Defaults to `None`.
        metric (VALID_KDTREE_METRIC_OPTIONS, optional):
            Name of the distance metric function used with [`sklearn.neighbors.KDTree`](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree). Default is to use the [Chebyshev distance](https://en.wikipedia.org/wiki/Chebyshev_distance). For a full list of all available metrics, see [`sklearn.metrics.pairwise.distance_metrics`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html) and [`scipy.spatial.distance`](https://docs.scipy.org/doc/scipy/reference/spatial.distance.html)<br>
            Defaults to `"chebyshev"`.

    Returns:
        (float):
            The approximate entropy score.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.algorithms import approx_entropy
        >>> from ts_stat_tests.utils.data import data_airline, data_random
        >>> airline = data_airline.values
        >>> random = data_random

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Airline Passengers Data"}
        >>> print(f"{approx_entropy(x=airline):.4f}")
        0.6451

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Random Data"}
        >>> print(f"{approx_entropy(x=random):.4f}")
        1.8177

        ```

    ??? equation "Calculation"
        The equation for ApEn is:

        $$
        \text{ApEn}(m, r, N) = \phi_m(r) - \phi_{m+1}(r)
        $$

        where:

        - $m$ is the embedding dimension,
        - $r$ is the tolerance or similarity criterion,
        - $N$ is the length of the time series, and
        - $\phi_m(r)$ and $\phi_{m+1}(r)$ are the logarithms of the probabilities that two sequences of $m$ data points in the time series that are similar to each other within a tolerance $r$ remain similar for the next data point, for $m$ and $m+1$, respectively.

    ??? note "Notes"
        - **Inputs**: `x` is a 1-dimensional array. It represents time-series data, ideally with each element in the array being a measurement or value taken at regular time intervals.
        - **Settings**: `order` is the embedding dimension, i.e. the number of values used to construct each compared subsequence (template). If the embedding dimension is too small, we may miss important patterns. If it's too large, we may overfit noise.
        - **Metric**: The Chebyshev metric is often used because it is a robust and computationally efficient way to measure the distance between two time series.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ??? question "References"
        - [Richman, J. S. et al. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6), H2039-H2049](https://journals.physiology.org/doi/epdf/10.1152/ajpheart.2000.278.6.H2039)
        - [SK-Learn: Pairwise metrics, Affinities and Kernels](https://scikit-learn.org/stable/modules/metrics.html#metrics)
        - [Spatial data structures and algorithms](https://docs.scipy.org/doc/scipy/tutorial/spatial.html)

    ??? tip "See Also"
        - [`antropy.app_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.app_entropy.html)
        - [`antropy.sample_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.sample_entropy.html)
        - [`antropy.perm_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.perm_entropy.html)
        - [`antropy.spectral_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.spectral_entropy.html)
    """
    return a_app_entropy(
        x=x,
        order=order,
        tolerance=tolerance,
        metric=metric,
    )

sample_entropy 🔗

sample_entropy(
    x: ArrayLike,
    order: int = 2,
    tolerance: Optional[float] = None,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
) -> float

Summary

Sample entropy is a measure of the amount of regularity or predictability in a time series. It is used to quantify the degree of self-similarity of a signal over different time scales, and can be useful for detecting underlying patterns or trends in data.

This function implements the sample_entropy() function from the AntroPy library.

Details

Sample entropy is a modification of approximate entropy, used for assessing the complexity of physiological time-series signals. It has two advantages over approximate entropy: data length independence and a relatively trouble-free implementation. Large values indicate high complexity whereas smaller values characterize more self-similar and regular signals.

The value of SampEn ranges from zero (\(0\)) to infinity (\(\infty\)), with lower values indicating higher regularity or predictability in the time series. A time series with high \(SampEn\) is more unpredictable or irregular, whereas a time series with low \(SampEn\) is more regular or predictable.

Sample entropy is often used in time series forecasting to assess the complexity of the data and to determine whether a time series is suitable for modeling with a particular forecasting method, such as ARIMA or neural networks.

Choosing an appropriate embedding dimension is crucial in ensuring that the sample entropy calculation is robust and reliable, and captures the essential features of the time series in a meaningful way. This allows us to make more accurate and informative inferences about the behavior of the system that generated the data, and can be useful in a wide range of applications, from signal processing to data analysis and beyond.

Parameters:

- x (ArrayLike, required):
  One-dimensional time series of shape (n_times,).
- order (int, optional):
  Embedding dimension.
  Defaults to 2.
- tolerance (Optional[float], optional):
  Tolerance level or similarity criterion. If None (default), it is set to \(0.2 \times \text{std}(x)\).
  Defaults to None.
- metric (VALID_KDTREE_METRIC_OPTIONS, optional):
  Name of the distance metric function used with sklearn.neighbors.KDTree. Default is to use the Chebyshev distance. For a full list of all available metrics, see sklearn.metrics.pairwise.distance_metrics and scipy.spatial.distance.
  Defaults to "chebyshev".

Returns:

- float: The sample entropy score.

Examples
Setup

>>> from ts_stat_tests.regularity.algorithms import sample_entropy
>>> from ts_stat_tests.utils.data import data_airline, data_random
>>> airline = data_airline.values
>>> random = data_random

Example 1: Airline Passengers Data

>>> print(f"{sample_entropy(x=airline):.4f}")
0.6177

Example 2: Random Data

>>> print(f"{sample_entropy(x=random):.4f}")
2.2017
Calculation

The equation for sample entropy (SampEn) is as follows:

\[ \text{SampEn}(m, r, N) = - \log \left( \frac {C_m(r)} {C_{m+1}(r)} \right) \]

where:

  • \(m\) is the embedding dimension,
  • \(r\) is the tolerance or similarity criterion,
  • \(N\) is the length of the time series, and
  • \(C_m(r)\) and \(C_{m+1}(r)\) are the number of \(m\)-tuples (vectors of \(m\) consecutive data points) that have a distance less than or equal to \(r\), and \((m+1)\)-tuples with the same property, respectively.

The calculation of sample entropy involves the following steps:

  1. Choose the values of \(m\) and \(r\).
  2. Construct \(m\)-tuples from the time series data.
  3. Compute the number of \(m\)-tuples that are within a distance \(r\) of each other (\(C_m(r)\)).
  4. Compute the number of \((m+1)\)-tuples that are within a distance \(r\) of each other (\(C_{m+1}(r)\)).
  5. Compute the value of \(SampEn\) using the formula above.
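As an illustration of steps 1-5 (a sketch only, not the fast Numba/mne-features code AntroPy actually uses), matching \(m\)- and \((m+1)\)-tuples can be counted with the Chebyshev distance and plugged into the formula above:

```python
import numpy as np

def naive_sampen(x, m=2, r=None):
    # Illustrative, unoptimised SampEn; unlike ApEn, self-matches are excluded.
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = 0.2 * np.std(x) if r is None else r

    def count_matches(order):
        # use n - m templates for both orders so the two counts are comparable
        templates = np.array([x[i : i + order] for i in range(n - m)])
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # j > i excludes self-matches
                if np.max(np.abs(templates[i] - templates[j])) <= r:
                    matches += 1
        return matches

    return -np.log(count_matches(m + 1) / count_matches(m))
```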
Notes
  • Note that if metric == 'chebyshev' and len(x) < 5000 points, then the sample entropy is computed using a fast custom Numba script. For other distance metric or longer time-series, the sample entropy is computed using a code from the mne-features package by Jean-Baptiste Schiratti and Alexandre Gramfort (requires sklearn).
  • The embedding dimension is important in the calculation of sample entropy because it affects the sensitivity of the measure to different patterns in the data. If the embedding dimension is too small, we may miss important patterns or variations. If it is too large, we may overfit the data.
Credit

All credit goes to the AntroPy library.

References
See Also
Source code in src/ts_stat_tests/regularity/algorithms.py
@typechecked
def sample_entropy(
    x: ArrayLike,
    order: int = 2,
    tolerance: Optional[float] = None,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
) -> float:
    r"""
    !!! note "Summary"
        Sample entropy is a measure of the amount of regularity or predictability in a time series. It is used to quantify the degree of self-similarity of a signal over different time scales, and can be useful for detecting underlying patterns or trends in data.

        This function implements the [`sample_entropy()`](https://raphaelvallat.com/antropy/build/html/generated/antropy.sample_entropy.html) function from the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ abstract "Details"
        Sample entropy is a modification of approximate entropy, used for assessing the complexity of physiological time-series signals. It has two advantages over approximate entropy: data length independence and a relatively trouble-free implementation. Large values indicate high complexity whereas smaller values characterize more self-similar and regular signals.

        The value of SampEn ranges from zero ($0$) to infinity ($\infty$), with lower values indicating higher regularity or predictability in the time series. A time series with high $SampEn$ is more unpredictable or irregular, whereas a time series with low $SampEn$ is more regular or predictable.

        Sample entropy is often used in time series forecasting to assess the complexity of the data and to determine whether a time series is suitable for modeling with a particular forecasting method, such as ARIMA or neural networks.

        Choosing an appropriate embedding dimension is crucial in ensuring that the sample entropy calculation is robust and reliable, and captures the essential features of the time series in a meaningful way. This allows us to make more accurate and informative inferences about the behavior of the system that generated the data, and can be useful in a wide range of applications, from signal processing to data analysis and beyond.

    Params:
        x (ArrayLike):
            One-dimensional time series of shape `(n_times,)`.
        order (int, optional):
            Embedding dimension.<br>
            Defaults to `2`.
        tolerance (Optional[float], optional):
            Tolerance level or similarity criterion. If `None` (default), it is set to $0.2 \times \text{std}(x)$.<br>
            Defaults to `None`.
        metric (VALID_KDTREE_METRIC_OPTIONS, optional):
            Name of the distance metric function used with [`sklearn.neighbors.KDTree`](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree). Default is to use the [Chebyshev distance](https://en.wikipedia.org/wiki/Chebyshev_distance). For a full list of all available metrics, see [`sklearn.metrics.pairwise.distance_metrics`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html) and [`scipy.spatial.distance`](https://docs.scipy.org/doc/scipy/reference/spatial.distance.html)<br>
            Defaults to `"chebyshev"`.

    Returns:
        (float):
            The sample entropy score.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.algorithms import sample_entropy
        >>> from ts_stat_tests.utils.data import data_airline, data_random
        >>> airline = data_airline.values
        >>> random = data_random

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Airline Passengers Data"}
        >>> print(f"{sample_entropy(x=airline):.4f}")
        0.6177

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Random Data"}
        >>> print(f"{sample_entropy(x=random):.4f}")
        2.2017

        ```

    ??? equation "Calculation"
        The equation for sample entropy (SampEn) is as follows:

        $$
        \text{SampEn}(m, r, N) = - \log \left( \frac {C_m(r)} {C_{m+1}(r)} \right)
        $$

        where:

        - $m$ is the embedding dimension,
        - $r$ is the tolerance or similarity criterion,
        - $N$ is the length of the time series, and
        - $C_m(r)$ and $C_{m+1}(r)$ are the number of $m$-tuples (vectors of $m$ consecutive data points) that have a distance less than or equal to $r$, and $(m+1)$-tuples with the same property, respectively.

        The calculation of sample entropy involves the following steps:

        1. Choose the values of $m$ and $r$.
        2. Construct $m$-tuples from the time series data.
        3. Compute the number of $m$-tuples that are within a distance $r$ of each other ($C_m(r)$).
        4. Compute the number of $(m+1)$-tuples that are within a distance $r$ of each other ($C_{m+1}(r)$).
        5. Compute the value of $SampEn$ using the formula above.

    ??? note "Notes"
        - Note that if `metric == 'chebyshev'` and `len(x) < 5000` points, then the sample entropy is computed using a fast custom Numba script. For other distance metric or longer time-series, the sample entropy is computed using a code from the [`mne-features`](https://mne.tools/mne-features/) package by Jean-Baptiste Schiratti and Alexandre Gramfort (requires sklearn).
        - The embedding dimension is important in the calculation of sample entropy because it affects the sensitivity of the measure to different patterns in the data. If the embedding dimension is too small, we may miss important patterns or variations. If it is too large, we may overfit the data.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ??? question "References"
        - [Richman, J. S. et al. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6), H2039-H2049](https://journals.physiology.org/doi/epdf/10.1152/ajpheart.2000.278.6.H2039)
        - [SK-Learn: Pairwise metrics, Affinities and Kernels](https://scikit-learn.org/stable/modules/metrics.html#metrics)
        - [Spatial data structures and algorithms](https://docs.scipy.org/doc/scipy/tutorial/spatial.html)

    ??? tip "See Also"
        - [`antropy.app_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.app_entropy.html)
        - [`antropy.sample_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.sample_entropy.html)
        - [`antropy.perm_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.perm_entropy.html)
        - [`antropy.spectral_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.spectral_entropy.html)
        - [`sklearn.neighbors.KDTree`](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html)
        - [`sklearn.metrics.pairwise_distances`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html)
        - [`scipy.spatial.distance`](https://docs.scipy.org/doc/scipy/reference/spatial.distance.html)
    """
    return a_sample_entropy(
        x=x,
        order=order,
        tolerance=tolerance,
        metric=metric,
    )

permutation_entropy 🔗

permutation_entropy(
    x: ArrayLike,
    order: int = 3,
    delay: Union[int, list, NDArray[int64]] = 1,
    normalize: bool = False,
) -> float

Summary

Permutation entropy is a measure of the complexity or randomness of a time series. It is based on the idea of permuting the order of the values in the time series and calculating the entropy of the resulting permutation patterns.

This function implements the perm_entropy() function from the AntroPy library.

Details

The permutation entropy is a complexity measure for time-series first introduced by Bandt and Pompe in 2002.

It is particularly useful for detecting nonlinear dynamics and nonstationarity in the data. The value of permutation entropy ranges from \(0\) to \(\log_2(\text{order}!)\), where the lower bound is attained for an increasing or decreasing sequence of values, and the upper bound for a completely random system where all possible permutations appear with the same probability.

Choosing an appropriate embedding dimension is crucial in ensuring that the permutation entropy calculation is robust and reliable, and captures the essential features of the time series in a meaningful way.

Parameters:

- x (ArrayLike, required):
  One-dimensional time series of shape (n_times,).
- order (int, optional):
  Order of permutation entropy.
  Defaults to 3.
- delay (Union[int, list, NDArray[int64]], optional):
  Time delay (lag). If multiple values are passed, the average permutation entropy across all these delays is calculated.
  Defaults to 1.
- normalize (bool, optional):
  If True, divide by \(\log_2(\text{order}!)\) to normalize the entropy between \(0\) and \(1\). Otherwise, return the permutation entropy in bits.
  Defaults to False.

Returns:

- Union[float, NDArray[float64]]: The permutation entropy of the data set.

Examples
Setup

>>> from ts_stat_tests.regularity.algorithms import permutation_entropy
>>> from ts_stat_tests.utils.data import data_airline, data_random
>>> airline = data_airline.values
>>> random = data_random

Example 1: Airline Passengers Data

>>> print(f"{permutation_entropy(x=airline):.4f}")
2.3601

Example 2: Random Data (Normalized)

>>> print(f"{permutation_entropy(x=random, normalize=True):.4f}")
0.9997
Calculation

The formula for permutation entropy (\(PE\)) is as follows:

\[ PE(n) = - \sum_{i=0}^{n!} p(i) \times \log_2(p(i)) \]

where:

  • \(n\) is the embedding dimension (order),
  • \(p(i)\) is the probability of the \(i\)-th ordinal pattern.

The embedded matrix \(Y\) is created by:

\[ \begin{align} y(i) &= [x_i, x_{i+\text{delay}}, \dots, x_{i+(\text{order}-1) \times \text{delay}}] \\ Y &= [y(1), y(2), \dots, y(N-(\text{order}-1) \times \text{delay})]^T \end{align} \]
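A small sketch of the formula above (again, not the AntroPy implementation): build each delay-embedded window, map it to its ordinal pattern with argsort, and take the Shannon entropy of the pattern frequencies.

```python
import numpy as np
from math import factorial

def naive_perm_entropy(x, order=3, delay=1, normalize=False):
    # Illustrative permutation entropy in bits, following the PE(n) formula above.
    x = np.asarray(x, dtype=float)
    n = len(x)
    counts = {}
    for i in range(n - (order - 1) * delay):
        window = x[i : i + (order - 1) * delay + 1 : delay]
        pattern = tuple(np.argsort(window))        # ordinal pattern of this window
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    pe = -np.sum(p * np.log2(p))                   # Shannon entropy of the pattern distribution
    return pe / np.log2(factorial(order)) if normalize else pe
```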
Notes
  • The embedding dimension (order) determines the number of values used to construct each permutation pattern. If too small, patterns may be missed. If too large, overfitting to noise may occur.
Credit

All credit goes to the AntroPy library.

References
See Also
Source code in src/ts_stat_tests/regularity/algorithms.py
@typechecked
def permutation_entropy(
    x: ArrayLike,
    order: int = 3,
    delay: Union[int, list, NDArray[np.int64]] = 1,
    normalize: bool = False,
) -> float:
    r"""
    !!! note "Summary"
        Permutation entropy is a measure of the complexity or randomness of a time series. It is based on the idea of permuting the order of the values in the time series and calculating the entropy of the resulting permutation patterns.

        This function implements the [`perm_entropy()`](https://raphaelvallat.com/antropy/build/html/generated/antropy.perm_entropy.html) function from the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ abstract "Details"
        The permutation entropy is a complexity measure for time-series first introduced by Bandt and Pompe in 2002.

        It is particularly useful for detecting nonlinear dynamics and nonstationarity in the data. The value of permutation entropy ranges from $0$ to $\log_2(\text{order}!)$, where the lower bound is attained for an increasing or decreasing sequence of values, and the upper bound for a completely random system where all possible permutations appear with the same probability.

        Choosing an appropriate embedding dimension is crucial in ensuring that the permutation entropy calculation is robust and reliable, and captures the essential features of the time series in a meaningful way.

    Params:
        x (ArrayLike):
            One-dimensional time series of shape `(n_times,)`.
        order (int, optional):
            Order of permutation entropy.<br>
            Defaults to `3`.
        delay (Union[int, list, NDArray[np.int64]], optional):
            Time delay (lag). If multiple values are passed, the average permutation entropy across all these delays is calculated.<br>
            Defaults to `1`.
        normalize (bool, optional):
            If `True`, divide by $\log_2(\text{order}!)$ to normalize the entropy between $0$ and $1$. Otherwise, return the permutation entropy in bits.<br>
            Defaults to `False`.

    Returns:
        (Union[float, NDArray[np.float64]]):
            The permutation entropy of the data set.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.algorithms import permutation_entropy
        >>> from ts_stat_tests.utils.data import data_airline, data_random
        >>> airline = data_airline.values
        >>> random = data_random

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Airline Passengers Data"}
        >>> print(f"{permutation_entropy(x=airline):.4f}")
        2.3601

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Random Data (Normalized)"}
        >>> print(f"{permutation_entropy(x=random, normalize=True):.4f}")
        0.9997

        ```

    ??? equation "Calculation"
        The formula for permutation entropy ($PE$) is as follows:

        $$
        PE(n) = - \sum_{i=0}^{n!} p(i) \times \log_2(p(i))
        $$

        where:

        - $n$ is the embedding dimension (`order`),
        - $p(i)$ is the probability of the $i$-th ordinal pattern.

        The embedded matrix $Y$ is created by:

        $$
        \begin{align}
            y(i) &= [x_i, x_{i+\text{delay}}, \dots, x_{i+(\text{order}-1) \times \text{delay}}] \\
            Y &= [y(1), y(2), \dots, y(N-(\text{order}-1) \times \text{delay})]^T
        \end{align}
        $$

    ??? note "Notes"
        - The embedding dimension (`order`) determines the number of values used to construct each permutation pattern. If too small, patterns may be missed. If too large, overfitting to noise may occur.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ??? question "References"
        - [Bandt, Christoph, and Bernd Pompe. "Permutation entropy: a natural complexity measure for time series." Physical review letters 88.17 (2002): 174102](http://materias.df.uba.ar/dnla2019c1/files/2019/03/permutation_entropy.pdf)

    ??? tip "See Also"
        - [`antropy.perm_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.perm_entropy.html)
        - [`antropy.app_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.app_entropy.html)
        - [`antropy.sample_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.sample_entropy.html)
        - [`antropy.spectral_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.spectral_entropy.html)
    """
    return a_perm_entropy(
        x=x,
        order=order,
        delay=delay,  # type: ignore[arg-type]  # antropy function can handle Union[int, list[int], NDArray[np.int64]], however the function signature is not annotated as such
        normalize=normalize,
    )

spectral_entropy 🔗

spectral_entropy(
    x: ArrayLike,
    sf: float = 1,
    method: VALID_SPECTRAL_ENTROPY_METHOD_OPTIONS = "fft",
    nperseg: Optional[int] = None,
    normalize: bool = False,
    axis: int = -1,
) -> Union[float, NDArray[np.float64]]

Summary

Spectral entropy is a measure of the amount of complexity or unpredictability in a signal's frequency domain representation. It is used to quantify the degree of randomness or regularity in the power spectrum of a signal.

This function implements the spectral_entropy() function from the AntroPy library.

Details

Spectral Entropy is defined to be the Shannon entropy of the power spectral density (PSD) of the data. It is based on the Shannon entropy, which is a measure of the uncertainty or information content of a probability distribution.

The value of spectral entropy ranges from \(0\) to \(\log_2(N)\), where \(N\) is the number of frequency bands. Lower values indicate a more concentrated or regular distribution of power, while higher values indicate a more spread-out or irregular distribution.

Spectral entropy is particularly useful for detecting periodicity and cyclical patterns, as well as changes in the frequency distribution over time.
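
The sketch below (a hedged illustration, not part of the package docs) shows that intuition with this module's spectral_entropy wrapper: a pure sine wave concentrates its power in a single frequency band and scores near the minimum, while white noise spreads power across all bands and scores near the maximum.

```python
import numpy as np

from ts_stat_tests.regularity.algorithms import spectral_entropy

sf = 100                                              # sampling frequency in Hz
t = np.arange(1_000) / sf
sine = np.sin(2 * np.pi * 5 * t)                      # all power concentrated at 5 Hz
noise = np.random.default_rng(0).normal(size=1_000)   # power spread across all bands

print(spectral_entropy(x=sine, sf=sf, normalize=True))   # close to 0
print(spectral_entropy(x=noise, sf=sf, normalize=True))  # high, typically above 0.9
```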

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | ArrayLike | One-dimensional or N-dimensional data array. | required |
| sf | float | Sampling frequency, in Hz. | 1 |
| method | VALID_SPECTRAL_ENTROPY_METHOD_OPTIONS | Spectral estimation method: 'fft' (Fourier transform via scipy.signal.periodogram()) or 'welch' (Welch periodogram via scipy.signal.welch()). | 'fft' |
| nperseg | Optional[int] | Length of each FFT segment for the Welch method. If None, uses scipy's default of 256 samples. | None |
| normalize | bool | If True, divide by \(\log_2(\text{psd.size})\) to normalize the spectral entropy to be between \(0\) and \(1\). Otherwise, return the spectral entropy in bits. | False |
| axis | int | The axis along which the entropy is calculated. Default is the last axis. | -1 |

Returns:

| Type | Description |
| --- | --- |
| Union[float, NDArray[float64]] | The spectral entropy score. Returned as a float for 1D input, or a numpy array for N-dimensional input. |

Examples

Setup

```pycon
>>> import numpy as np
>>> from ts_stat_tests.regularity.algorithms import spectral_entropy
>>> from ts_stat_tests.utils.data import data_airline
>>> airline = data_airline.values
```

Example 1: Airline Passengers Data

```pycon
>>> print(f"{spectral_entropy(x=airline, sf=12):.4f}")
2.6538
```

Example 2: Welch method for spectral entropy

```pycon
>>> data_sine = np.sin(2 * np.pi * 1 * np.arange(400) / 100)
>>> print(f"{spectral_entropy(x=data_sine, sf=100, method='welch'):.4f}")
1.2938
```
Calculation

The spectral entropy (\(SE\)) is defined as:

\[ H(x, f_s) = - \sum_{i=0}^{f_s/2} P(i) \times \log_2(P(i)) \]

where:

  • \(P(i)\) is the normalized power spectral density (PSD) at the \(i\)-th frequency band,
  • \(f_s\) is the sampling frequency.
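
The same quantity can be reproduced by hand from this formula. The hedged sketch below normalizes the periodogram into a probability distribution and takes its Shannon entropy; it assumes scipy is available and illustrates the formula rather than the library's implementation.

```python
import numpy as np
from scipy.signal import periodogram


def spectral_entropy_sketch(x, sf=1.0, normalize=False):
    """Illustrative spectral entropy: Shannon entropy of the normalized PSD."""
    _, psd = periodogram(x, fs=sf)
    p = psd / psd.sum()              # P(i): PSD rescaled into a probability distribution
    nonzero = p > 0                  # convention: 0 * log2(0) = 0
    se = -np.sum(p[nonzero] * np.log2(p[nonzero]))
    return se / np.log2(p.size) if normalize else se
```
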
Notes
  • The power spectrum represents the energy of the signal at different frequencies. High spectral entropy indicates multiple sources or processes with different frequencies, while low spectral entropy suggests a dominant frequency or periodicity.
Credit

All credit goes to the AntroPy library.

References
  • Inouye, T. et al. (1991). Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalography and Clinical Neurophysiology, 79(3), 204-210.
  • Wikipedia: Spectral density
  • Wikipedia: Welch's method
See Also
  • antropy.spectral_entropy
  • antropy.app_entropy
  • antropy.sample_entropy
  • antropy.perm_entropy
Source code in src/ts_stat_tests/regularity/algorithms.py
@typechecked
def spectral_entropy(
    x: ArrayLike,
    sf: float = 1,
    method: VALID_SPECTRAL_ENTROPY_METHOD_OPTIONS = "fft",
    nperseg: Optional[int] = None,
    normalize: bool = False,
    axis: int = -1,
) -> Union[float, NDArray[np.float64]]:
    r"""
    !!! note "Summary"
        Spectral entropy is a measure of the amount of complexity or unpredictability in a signal's frequency domain representation. It is used to quantify the degree of randomness or regularity in the power spectrum of a signal.

        This function implements the [`spectral_entropy()`](https://raphaelvallat.com/antropy/build/html/generated/antropy.spectral_entropy.html) function from the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ abstract "Details"
        Spectral Entropy is defined to be the Shannon entropy of the power spectral density (PSD) of the data. It is based on the Shannon entropy, which is a measure of the uncertainty or information content of a probability distribution.

        The value of spectral entropy ranges from $0$ to $\log_2(N)$, where $N$ is the number of frequency bands. Lower values indicate a more concentrated or regular distribution of power, while higher values indicate a more spread-out or irregular distribution.

        Spectral entropy is particularly useful for detecting periodicity and cyclical patterns, as well as changes in the frequency distribution over time.

    Params:
        x (ArrayLike):
            One-dimensional or N-dimensional data array.
        sf (float, optional):
            Sampling frequency, in Hz.<br>
            Defaults to `1`.
        method (VALID_SPECTRAL_ENTROPY_METHOD_OPTIONS, optional):
            Spectral estimation method: `'fft'` or `'welch'`.<br>
            - `'fft'`: Fourier Transformation ([`scipy.signal.periodogram()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.periodogram.html#scipy.signal.periodogram))<br>
            - `'welch'`: Welch periodogram ([`scipy.signal.welch()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.welch.html#scipy.signal.welch))<br>
            Defaults to `"fft"`.
        nperseg (Optional[int], optional):
            Length of each FFT segment for Welch method. If `None`, uses `scipy`'s default of 256 samples.<br>
            Defaults to `None`.
        normalize (bool, optional):
            If `True`, divide by $\log_2(\text{psd.size})$ to normalize the spectral entropy to be between $0$ and $1$. Otherwise, return the spectral entropy in bits.<br>
            Defaults to `False`.
        axis (int, optional):
            The axis along which the entropy is calculated. Default is the last axis.<br>
            Defaults to `-1`.

    Returns:
        (Union[float, NDArray[np.float64]]):
            The spectral entropy score. Returned as a float for 1D input, or a numpy array for N-dimensional input.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.algorithms import spectral_entropy
        >>> from ts_stat_tests.utils.data import data_airline
        >>> airline = data_airline.values

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Airline Passengers Data"}
        >>> print(f"{spectral_entropy(x=airline, sf=12):.4f}")
        2.6538

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Welch method for spectral entropy"}
        >>> data_sine = np.sin(2 * np.pi * 1 * np.arange(400) / 100)
        >>> print(f"{spectral_entropy(x=data_sine, sf=100, method='welch'):.4f}")
        1.2938

        ```

    ??? equation "Calculation"
        The spectral entropy ($SE$) is defined as:

        $$
        H(x, f_s) = - \sum_{i=0}^{f_s/2} P(i) \times \log_2(P(i))
        $$

        where:

        - $P(i)$ is the normalized power spectral density (PSD) at the $i$-th frequency band,
        - $f_s$ is the sampling frequency.

    ??? note "Notes"
        - The power spectrum represents the energy of the signal at different frequencies. High spectral entropy indicates multiple sources or processes with different frequencies, while low spectral entropy suggests a dominant frequency or periodicity.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ??? question "References"
        - [Inouye, T. et al. (1991). Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalography and clinical neurophysiology, 79(3), 204-210.](https://pubmed.ncbi.nlm.nih.gov/1714811/)
        - [Wikipedia: Spectral density](https://en.wikipedia.org/wiki/Spectral_density)
        - [Wikipedia: Welch's method](https://en.wikipedia.org/wiki/Welch%27s_method)

    ??? tip "See Also"
        - [`antropy.spectral_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.spectral_entropy.html)
        - [`antropy.app_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.app_entropy.html)
        - [`antropy.sample_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.sample_entropy.html)
        - [`antropy.perm_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.perm_entropy.html)
    """
    return a_spectral_entropy(
        x=x,
        sf=sf,
        method=method,
        nperseg=nperseg,
        normalize=normalize,
        axis=axis,
    )

svd_entropy 🔗

svd_entropy(
    x: ArrayLike,
    order: int = 3,
    delay: int = 1,
    normalize: bool = False,
) -> float

Summary

SVD entropy is a measure of the complexity or randomness of a time series based on Singular Value Decomposition (SVD).

This function implements the svd_entropy() function from the AntroPy library.

Details

SVD entropy is calculated by first embedding the time series into a matrix, then performing SVD on that matrix to obtain the singular values. The entropy is then calculated from the normalized singular values.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | ArrayLike | One-dimensional time series of shape (n_times,). | required |
| order | int | Order of the SVD entropy (embedding dimension). | 3 |
| delay | int | Time delay (lag). | 1 |
| normalize | bool | If True, divide by \(\log_2(\text{order}!)\) to normalize the entropy between \(0\) and \(1\). | False |

Returns:

| Type | Description |
| --- | --- |
| float | The SVD entropy of the data set. |

Examples

Setup

```pycon
>>> from ts_stat_tests.regularity.algorithms import svd_entropy
>>> from ts_stat_tests.utils.data import data_random
>>> random = data_random
```

Example 1: Basic SVD entropy

```pycon
>>> print(f"{svd_entropy(random):.4f}")
1.3514
```
Calculation

The SVD entropy is calculated as the Shannon entropy of the normalized singular values of the embedded matrix.
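
A hedged, pure-NumPy sketch of that calculation follows; the embedding mirrors the permutation-entropy construction above, normalization is omitted, and AntroPy's implementation may differ in detail.

```python
import numpy as np


def svd_entropy_sketch(x, order=3, delay=1):
    """Illustrative (unnormalized) SVD entropy of a 1-D time series."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (order - 1) * delay
    # Embed the series into a matrix of delay vectors (one per row)
    Y = np.array([x[i : i + (order - 1) * delay + 1 : delay] for i in range(n_vectors)])
    # Singular values of the embedded matrix, rescaled into a probability distribution
    s = np.linalg.svd(Y, compute_uv=False)
    s_norm = s / s.sum()
    s_norm = s_norm[s_norm > 0]      # guard against log2(0) for rank-deficient embeddings
    return -np.sum(s_norm * np.log2(s_norm))
```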

Notes
  • Singular Value Decomposition (SVD) is a factorization of a real or complex matrix. It is the generalization of the eigendecomposition of a positive semidefinite normal matrix.
Credit

All credit goes to the AntroPy library.

See Also
  • antropy.svd_entropy
  • ts_stat_tests.regularity.algorithms.approx_entropy
  • ts_stat_tests.regularity.algorithms.sample_entropy
  • ts_stat_tests.regularity.algorithms.permutation_entropy
  • ts_stat_tests.regularity.algorithms.spectral_entropy
Source code in src/ts_stat_tests/regularity/algorithms.py
@typechecked
def svd_entropy(
    x: ArrayLike,
    order: int = 3,
    delay: int = 1,
    normalize: bool = False,
) -> float:
    r"""
    !!! note "Summary"
        SVD entropy is a measure of the complexity or randomness of a time series based on Singular Value Decomposition (SVD).

        This function implements the [`svd_entropy()`](https://raphaelvallat.com/antropy/build/html/generated/antropy.svd_entropy.html) function from the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ abstract "Details"
        SVD entropy is calculated by first embedding the time series into a matrix, then performing SVD on that matrix to obtain the singular values. The entropy is then calculated from the normalized singular values.

    Params:
        x (ArrayLike):
            One-dimensional time series of shape `(n_times,)`.
        order (int, optional):
            Order of the SVD entropy (embedding dimension).<br>
            Defaults to `3`.
        delay (int, optional):
            Time delay (lag).<br>
            Defaults to `1`.
        normalize (bool, optional):
            If `True`, divide by $\log_2(\text{order}!)$ to normalize the entropy between $0$ and $1$.<br>
            Defaults to `False`.

    Returns:
        (float):
            The SVD entropy of the data set.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.algorithms import svd_entropy
        >>> from ts_stat_tests.utils.data import data_random
        >>> random = data_random

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Basic SVD entropy"}
        >>> print(f"{svd_entropy(random):.4f}")
        1.3514

        ```

    ??? equation "Calculation"
        The SVD entropy is calculated as the Shannon entropy of the singular values of the embedded matrix.

    ??? note "Notes"
        - Singular Value Decomposition (SVD) is a factorization of a real or complex matrix. It is the generalization of the eigendecomposition of a positive semidefinite normal matrix.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ??? tip "See Also"
        - [`antropy.svd_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.svd_entropy.html)
        - [`ts_stat_tests.regularity.algorithms.approx_entropy`][ts_stat_tests.regularity.algorithms.approx_entropy]
        - [`ts_stat_tests.regularity.algorithms.sample_entropy`][ts_stat_tests.regularity.algorithms.sample_entropy]
        - [`ts_stat_tests.regularity.algorithms.permutation_entropy`][ts_stat_tests.regularity.algorithms.permutation_entropy]
        - [`ts_stat_tests.regularity.algorithms.spectral_entropy`][ts_stat_tests.regularity.algorithms.spectral_entropy]
    """
    return a_svd_entropy(
        x=x,
        order=order,
        delay=delay,
        normalize=normalize,
    )