
Test the regularity of a given Time-Series Dataset🔗

Introduction🔗

Summary

As stated by Selva Prabhakaran:

The more regular and repeatable patterns a time series has, the easier it is to forecast.

The 'Approximate Entropy' algorithm can be used to quantify the regularity and unpredictability of fluctuations in a time series.

The higher the approximate entropy, the more difficult the series is to forecast.

A better alternative is 'Sample Entropy'.

Sample Entropy is similar to approximate entropy but is more consistent in estimating the complexity even for smaller time series.

For example, a random time series with fewer data points can have a lower 'approximate entropy' than a more 'regular' time series, whereas a longer random time series will have a higher 'approximate entropy'.


For more info, see: Time Series Analysis in Python: A Comprehensive Guide with Examples.

Info

To say that data is 'regular' is to say that the data points are evenly spaced, regularly collected, and not missing (i.e. they do not contain excessive NA values). It is not always necessary to conduct the Test for Regularity on automatically collected data (for example, Energy Prices or Daily Temperature); however, if the data was collected manually, then it is highly recommended. If the data does not meet the requirements of Regularity, then it is necessary to return to the data collection plan and revise the methodology used.
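As a rough, illustrative pre-check of this kind of regularity (a sketch only; the `check_collection_regularity` helper and the pandas DataFrame with a DatetimeIndex are assumptions, not part of this package), the spacing between timestamps and the proportion of missing values can be inspected before running any entropy test:

```python
import pandas as pd

def check_collection_regularity(df: pd.DataFrame, max_na_ratio: float = 0.05) -> dict:
    # Hypothetical helper for illustration only; `df` is assumed to have a DatetimeIndex.
    spacings = df.index.to_series().diff().dropna()
    na_ratio = float(df.isna().mean().max())
    return {
        "evenly_spaced": spacings.nunique() == 1,   # a single, constant time step between observations
        "na_ratio": na_ratio,                       # worst-case proportion of missing values per column
        "acceptable_na": na_ratio <= max_na_ratio,  # within the (assumed) 5% tolerance for NA values
    }
```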

| library | category | algorithm | short | import script | url |
|---|---|---|---|---|---|
| antropy | Regularity | Approximate Entropy | AppEn | from antropy import app_entropy | https://raphaelvallat.com/antropy/build/html/generated/antropy.app_entropy.html |
| antropy | Regularity | Sample Entropy | SampEn | from antropy import sample_entropy | https://raphaelvallat.com/antropy/build/html/generated/antropy.sample_entropy.html |
| antropy | Regularity | Permutation Entropy | PermEn | from antropy import perm_entropy | https://raphaelvallat.com/antropy/build/html/generated/antropy.perm_entropy.html |
| antropy | Regularity | Spectral Entropy | SpecEn | from antropy import spectral_entropy | https://raphaelvallat.com/antropy/build/html/generated/antropy.spectral_entropy.html |
| antropy | Regularity | SVD Entropy | SvdEn | from antropy import svd_entropy | https://raphaelvallat.com/antropy/build/html/generated/antropy.svd_entropy.html |
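The import scripts in the table above map one-to-one onto AntroPy's public functions. A minimal sketch of calling them directly on a NumPy array (the random series below is only a placeholder for your own data) might look like this:

```python
import numpy as np
from antropy import app_entropy, perm_entropy, sample_entropy, spectral_entropy, svd_entropy

rng = np.random.default_rng(42)
x = rng.normal(size=500)  # placeholder series; substitute your own time-series values

print(app_entropy(x))                              # AppEn
print(sample_entropy(x))                           # SampEn
print(perm_entropy(x, normalize=True))             # PermEn, scaled to [0, 1]
print(spectral_entropy(x, sf=1, normalize=True))   # SpecEn; sf is the sampling frequency
print(svd_entropy(x, normalize=True))              # SvdEn
```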

For more info, see: The Future of Australian Energy Prices: Time-Series Analysis of Historic Prices and Forecast for Future Prices.

Source Library

The AntroPy package was chosen because it provides well-tested and efficient implementations of approximate entropy, sample entropy, and related complexity measures for time-series data, is built on top of the scientific Python stack (NumPy/SciPy), and is actively maintained and open source, making it a reliable choice for reproducible statistical analysis.

Source Module

All of the source code can be found within the modules:

Modules🔗

ts_stat_tests.regularity.tests 🔗

Summary

This module contains convenience functions and tests for regularity measures, allowing for easy access to different entropy algorithms.

entropy 🔗

entropy(
    x: ArrayLike,
    algorithm: str = "sample",
    order: int = 2,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
    sf: float = 1,
    normalize: bool = True,
) -> Union[float, NDArray[np.float64]]

Summary

Test for the entropy of a given data set.

Details

This function is a convenience wrapper around the five underlying algorithms:
- approx_entropy()
- sample_entropy()
- spectral_entropy()
- permutation_entropy()
- svd_entropy()

Parameters:

- x (ArrayLike, required):
  The data to be checked. Should be a 1-D or N-D data array.
- algorithm (str, optional):
  Which entropy algorithm to use.
  - sample_entropy(): ["sample", "sampl", "samp"]
  - approx_entropy(): ["app", "approx"]
  - spectral_entropy(): ["spec", "spect", "spectral"]
  - permutation_entropy(): ["perm", "permutation"]
  - svd_entropy(): ["svd", "svd_entropy"]
  Defaults to "sample".
- order (int, optional):
  Embedding dimension.
  Only relevant when algorithm=sample or algorithm=approx.
  Defaults to 2.
- metric (VALID_KDTREE_METRIC_OPTIONS, optional):
  Name of the distance metric function used with sklearn.neighbors.KDTree. Default is to use the Chebyshev distance.
  Only relevant when algorithm=sample or algorithm=approx.
  Defaults to "chebyshev".
- sf (float, optional):
  Sampling frequency, in Hz.
  Only relevant when algorithm=spectral.
  Defaults to 1.
- normalize (bool, optional):
  If True, divide by \(\log_2(\text{psd.size})\) to normalize the spectral entropy to be between \(0\) and \(1\). Otherwise, return the spectral entropy in bits.
  Only relevant when algorithm=spectral.
  Defaults to True.

Raises:

- ValueError: When the given value for algorithm is not valid.

Returns:

- Union[float, NDArray[float64]]: The calculated entropy value.

Credit

All credit goes to the AntroPy library.

Examples
Setup

>>> from ts_stat_tests.regularity.tests import entropy
>>> from ts_stat_tests.utils.data import data_normal
>>> normal = data_normal

Example 1: Sample Entropy

>>> print(entropy(x=normal, algorithm="sample"))
2.2374...

Example 2: Approx Entropy

>>> print(entropy(x=normal, algorithm="approx"))
1.6643...

Example 3: Spectral Entropy

>>> print(entropy(x=normal, algorithm="spectral", sf=1))
0.9329...
References
See Also
Source code in src/ts_stat_tests/regularity/tests.py
@typechecked
def entropy(
    x: ArrayLike,
    algorithm: str = "sample",
    order: int = 2,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
    sf: float = 1,
    normalize: bool = True,
) -> Union[float, NDArray[np.float64]]:
    """
    !!! note "Summary"
        Test for the entropy of a given data set.

    ???+ abstract "Details"
        This function is a convenience wrapper around the five underlying algorithms:<br>
        - [`approx_entropy()`][ts_stat_tests.regularity.algorithms.approx_entropy]<br>
        - [`sample_entropy()`][ts_stat_tests.regularity.algorithms.sample_entropy]<br>
        - [`spectral_entropy()`][ts_stat_tests.regularity.algorithms.spectral_entropy]<br>
        - [`permutation_entropy()`][ts_stat_tests.regularity.algorithms.permutation_entropy]<br>
        - [`svd_entropy()`][ts_stat_tests.regularity.algorithms.svd_entropy]

    Params:
        x (ArrayLike):
            The data to be checked. Should be a `1-D` or `N-D` data array.
        algorithm (str, optional):
            Which entropy algorithm to use.<br>
            - `sample_entropy()`: `["sample", "sampl", "samp"]`<br>
            - `approx_entropy()`: `["app", "approx"]`<br>
            - `spectral_entropy()`: `["spec", "spect", "spectral"]`<br>
            - `permutation_entropy()`: `["perm", "permutation"]`<br>
            - `svd_entropy()`: `["svd", "svd_entropy"]`<br>
            Defaults to `"sample"`.
        order (int, optional):
            Embedding dimension.<br>
            Only relevant when `algorithm=sample` or `algorithm=approx`.<br>
            Defaults to `2`.
        metric (VALID_KDTREE_METRIC_OPTIONS):
            Name of the distance metric function used with [`sklearn.neighbors.KDTree`](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree). Default is to use the [Chebyshev distance](https://en.wikipedia.org/wiki/Chebyshev_distance).<br>
            Only relevant when `algorithm=sample` or `algorithm=approx`.<br>
            Defaults to `"chebyshev"`.
        sf (float, optional):
            Sampling frequency, in Hz.<br>
            Only relevant when `algorithm=spectral`.<br>
            Defaults to `1`.
        normalize (bool, optional):
            If `True`, divide by $log2(psd.size)$ to normalize the spectral entropy to be between $0$ and $1$. Otherwise, return the spectral entropy in bit.<br>
            Only relevant when `algorithm=spectral`.<br>
            Defaults to `True`.

    Raises:
        (ValueError):
            When the given value for `algorithm` is not valid.

    Returns:
        (Union[float, NDArray[np.float64]]):
            The calculated entropy value.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.tests import entropy
        >>> from ts_stat_tests.utils.data import data_normal
        >>> normal = data_normal

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Sample Entropy"}
        >>> print(entropy(x=normal, algorithm="sample"))
        2.2374...

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Approx Entropy"}
        >>> print(entropy(x=normal, algorithm="approx"))
        1.6643...

        ```

        ```pycon {.py .python linenums="1" title="Example 3: Spectral Entropy"}
        >>> print(entropy(x=normal, algorithm="spectral", sf=1))
        0.9329...

        ```

    ??? question "References"
        - Richman, J. S. et al. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6), H2039-H2049.
        - https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html
        - Inouye, T. et al. (1991). Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalography and clinical neurophysiology, 79(3), 204-210.
        - https://en.wikipedia.org/wiki/Spectral_density
        - https://en.wikipedia.org/wiki/Welch%27s_method

    ??? tip "See Also"
        - [`regularity()`][ts_stat_tests.regularity.tests.regularity]
        - [`approx_entropy()`][ts_stat_tests.regularity.algorithms.approx_entropy]
        - [`sample_entropy()`][ts_stat_tests.regularity.algorithms.sample_entropy]
        - [`spectral_entropy()`][ts_stat_tests.regularity.algorithms.spectral_entropy]
        - [`permutation_entropy()`][ts_stat_tests.regularity.algorithms.permutation_entropy]
        - [`svd_entropy()`][ts_stat_tests.regularity.algorithms.svd_entropy]
    """
    options: dict[str, tuple[str, ...]] = {
        "sampl": ("sample", "sampl", "samp"),
        "approx": ("app", "approx"),
        "spect": ("spec", "spect", "spectral"),
        "perm": ("perm", "permutation"),
        "svd": ("svd", "svd_entropy"),
    }
    if algorithm in options["sampl"]:
        return sample_entropy(x=x, order=order, metric=metric)
    if algorithm in options["approx"]:
        return approx_entropy(x=x, order=order, metric=metric)
    if algorithm in options["spect"]:
        return spectral_entropy(x=x, sf=sf, normalize=normalize)
    if algorithm in options["perm"]:
        return permutation_entropy(x=x, order=order, normalize=normalize)
    if algorithm in options["svd"]:
        return svd_entropy(x=x, order=order, normalize=normalize)
    raise ValueError(
        generate_error_message(
            parameter_name="algorithm",
            value_parsed=algorithm,
            options=options,
        )
    )

regularity 🔗

regularity(
    x: ArrayLike,
    algorithm: str = "sample",
    order: int = 2,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
    sf: float = 1,
    normalize: bool = True,
) -> Union[float, NDArray[np.float64]]

Summary

Test for the regularity of a given data set.

Details

This is a pass-through, convenience wrapper around the entropy() function.

Parameters:

- x (ArrayLike, required):
  The data to be checked. Should be a 1-D or N-D data array.
- algorithm (str, optional):
  Which entropy algorithm to use.
  - sample_entropy(): ["sample", "sampl", "samp"]
  - approx_entropy(): ["app", "approx"]
  - spectral_entropy(): ["spec", "spect", "spectral"]
  - permutation_entropy(): ["perm", "permutation"]
  - svd_entropy(): ["svd", "svd_entropy"]
  Defaults to "sample".
- order (int, optional):
  Embedding dimension.
  Only relevant when algorithm=sample or algorithm=approx.
  Defaults to 2.
- metric (VALID_KDTREE_METRIC_OPTIONS, optional):
  Name of the distance metric function used with sklearn.neighbors.KDTree. Default is to use the Chebyshev distance.
  Only relevant when algorithm=sample or algorithm=approx.
  Defaults to "chebyshev".
- sf (float, optional):
  Sampling frequency, in Hz.
  Only relevant when algorithm=spectral.
  Defaults to 1.
- normalize (bool, optional):
  If True, divide by \(\log_2(\text{psd.size})\) to normalize the spectral entropy to be between \(0\) and \(1\). Otherwise, return the spectral entropy in bits.
  Only relevant when algorithm=spectral.
  Defaults to True.

Returns:

- Union[float, NDArray[float64]]: The calculated regularity (entropy) value.

Credit

All credit goes to the AntroPy library.

Examples
Setup

>>> from ts_stat_tests.regularity.tests import regularity
>>> from ts_stat_tests.utils.data import data_normal
>>> normal = data_normal

Example 1: Sample Entropy

>>> print(regularity(x=normal, algorithm="sample"))
2.2374...

Example 2: Approx Entropy

>>> print(regularity(x=normal, algorithm="approx"))
1.6643...

Example 3: Spectral Entropy

>>> print(regularity(x=normal, algorithm="spectral", sf=1))
0.9329...
References
See Also
Source code in src/ts_stat_tests/regularity/tests.py
@typechecked
def regularity(
    x: ArrayLike,
    algorithm: str = "sample",
    order: int = 2,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
    sf: float = 1,
    normalize: bool = True,
) -> Union[float, NDArray[np.float64]]:
    """
    !!! note "Summary"
        Test for the regularity of a given data set.

    ???+ abstract "Details"
        This is a pass-through, convenience wrapper around the [`entropy()`][ts_stat_tests.regularity.tests.entropy] function.

    Params:
        x (ArrayLike):
            The data to be checked. Should be a `1-D` or `N-D` data array.
        algorithm (str, optional):
            Which entropy algorithm to use.<br>
            - `sample_entropy()`: `["sample", "sampl", "samp"]`<br>
            - `approx_entropy()`: `["app", "approx"]`<br>
            - `spectral_entropy()`: `["spec", "spect", "spectral"]`<br>
            - `permutation_entropy()`: `["perm", "permutation"]`<br>
            - `svd_entropy()`: `["svd", "svd_entropy"]`<br>
            Defaults to `"sample"`.
        order (int, optional):
            Embedding dimension.<br>
            Only relevant when `algorithm=sample` or `algorithm=approx`.<br>
            Defaults to `2`.
        metric (VALID_KDTREE_METRIC_OPTIONS):
            Name of the distance metric function used with [`sklearn.neighbors.KDTree`](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree). Default is to use the [Chebyshev distance](https://en.wikipedia.org/wiki/Chebyshev_distance).<br>
            Only relevant when `algorithm=sample` or `algorithm=approx`.<br>
            Defaults to `"chebyshev"`.
        sf (float, optional):
            Sampling frequency, in Hz.<br>
            Only relevant when `algorithm=spectral`.<br>
            Defaults to `1`.
        normalize (bool, optional):
            If `True`, divide by $log2(psd.size)$ to normalize the spectral entropy to be between $0$ and $1$. Otherwise, return the spectral entropy in bit.<br>
            Only relevant when `algorithm=spectral`.<br>
            Defaults to `True`.

    Returns:
        (Union[float, NDArray[np.float64]]):
            The calculated regularity (entropy) value.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.tests import regularity
        >>> from ts_stat_tests.utils.data import data_normal
        >>> normal = data_normal

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Sample Entropy"}
        >>> print(regularity(x=normal, algorithm="sample"))
        2.2374...

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Approx Entropy"}
        >>> print(regularity(x=normal, algorithm="approx"))
        1.6643...

        ```

        ```pycon {.py .python linenums="1" title="Example 3: Spectral Entropy"}
        >>> print(regularity(x=normal, algorithm="spectral", sf=1))
        0.9329...

        ```

    ??? question "References"
        - Richman, J. S. et al. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6), H2039-H2049.
        - https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html
        - Inouye, T. et al. (1991). Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalography and clinical neurophysiology, 79(3), 204-210.
        - https://en.wikipedia.org/wiki/Spectral_density
        - https://en.wikipedia.org/wiki/Welch%27s_method

    ??? tip "See Also"
        - [`entropy()`][ts_stat_tests.regularity.tests.entropy]
        - [`approx_entropy()`][ts_stat_tests.regularity.algorithms.approx_entropy]
        - [`sample_entropy()`][ts_stat_tests.regularity.algorithms.sample_entropy]
        - [`spectral_entropy()`][ts_stat_tests.regularity.algorithms.spectral_entropy]
        - [`permutation_entropy()`][ts_stat_tests.regularity.algorithms.permutation_entropy]
        - [`svd_entropy()`][ts_stat_tests.regularity.algorithms.svd_entropy]
    """
    return entropy(x=x, algorithm=algorithm, order=order, metric=metric, sf=sf, normalize=normalize)

is_regular 🔗

is_regular(
    x: ArrayLike,
    algorithm: str = "sample",
    order: int = 2,
    sf: float = 1,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
    normalize: bool = True,
    tolerance: Union[str, float, int, None] = "default",
) -> dict[str, Union[str, float, bool]]

Summary

Test whether a given data set is regular or not.

Details

This function implements the given algorithm (defined in the parameter algorithm), and returns a dictionary containing the relevant data:

{
    "result": ...,  # The result of the test. Will be `True` if `entropy<tolerance`, and `False` otherwise
    "entropy": ...,  # A `float` value, the result of the `entropy()` function
    "tolerance": ...,  # A `float` value, which is the tolerance used for determining whether or not the `entropy` is `regular` or not
}

Parameters:

- x (ArrayLike, required):
  The data to be checked. Should be a 1-D or N-D data array.
- algorithm (str, optional):
  Which entropy algorithm to use.
  - sample_entropy(): ["sample", "sampl", "samp"]
  - approx_entropy(): ["app", "approx"]
  - spectral_entropy(): ["spec", "spect", "spectral"]
  - permutation_entropy(): ["perm", "permutation"]
  - svd_entropy(): ["svd", "svd_entropy"]
  Defaults to "sample".
- order (int, optional):
  Embedding dimension.
  Only relevant when algorithm=sample or algorithm=approx.
  Defaults to 2.
- metric (VALID_KDTREE_METRIC_OPTIONS, optional):
  Name of the distance metric function used with sklearn.neighbors.KDTree. Default is to use the Chebyshev distance.
  Only relevant when algorithm=sample or algorithm=approx.
  Defaults to "chebyshev".
- sf (float, optional):
  Sampling frequency, in Hz.
  Only relevant when algorithm=spectral.
  Defaults to 1.
- normalize (bool, optional):
  If True, divide by \(\log_2(\text{psd.size})\) to normalize the spectral entropy to be between \(0\) and \(1\). Otherwise, return the spectral entropy in bits.
  Only relevant when algorithm=spectral.
  Defaults to True.
- tolerance (Union[str, float, int, None], optional):
  The tolerance value used to determine whether or not the result is regular.
  - If tolerance is of type int or float, then this value will be used.
  - If tolerance is "default" or None, then tolerance will be derived from x using the calculation: tolerance = 0.2 * np.std(a=x)
  - If any other value is given, then a ValueError will be raised.
  Defaults to "default".

Raises:

- ValueError: If the given tolerance parameter is invalid. Valid options are:
  - A number with type float or int, or
  - A string with value "default", or
  - The value None.

Returns:

- dict[str, Union[str, float, bool]]: A dictionary containing the test results:
  - result (bool): True if entropy < tolerance.
  - entropy (float): The calculated entropy value.
  - tolerance (float): The threshold used for regularity.
Credit

All credit goes to the AntroPy library.

Examples
Setup

>>> from ts_stat_tests.regularity.tests import is_regular
>>> from ts_stat_tests.utils.data import data_normal
>>> normal = data_normal

Example 1: Sample Entropy

>>> print(is_regular(x=normal, algorithm="sample"))
{'result': False, 'entropy': 2.23743099781426, 'tolerance': 0.20294652904313437}

Example 2: Approx Entropy

>>> print(is_regular(x=normal, algorithm="approx", tolerance=0.5))
{'result': False, 'entropy': 1.6643808251518548, 'tolerance': 0.5}
References
See Also
Source code in src/ts_stat_tests/regularity/tests.py
@typechecked
def is_regular(
    x: ArrayLike,
    algorithm: str = "sample",
    order: int = 2,
    sf: float = 1,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
    normalize: bool = True,
    tolerance: Union[str, float, int, None] = "default",
) -> dict[str, Union[str, float, bool]]:
    """
    !!! note "Summary"
        Test whether a given data set is `regular` or not.

    ???+ abstract "Details"
        This function implements the given algorithm (defined in the parameter `algorithm`), and returns a dictionary containing the relevant data:
        ```python
        {
            "result": ...,  # The result of the test. Will be `True` if `entropy<tolerance`, and `False` otherwise
            "entropy": ...,  # A `float` value, the result of the `entropy()` function
            "tolerance": ...,  # A `float` value, which is the tolerance used for determining whether or not the `entropy` is `regular` or not
        }
        ```

    Params:
        x (ArrayLike):
            The data to be checked. Should be a `1-D` or `N-D` data array.
        algorithm (str, optional):
            Which entropy algorithm to use.<br>
            - `sample_entropy()`: `["sample", "sampl", "samp"]`<br>
            - `approx_entropy()`: `["app", "approx"]`<br>
            - `spectral_entropy()`: `["spec", "spect", "spectral"]`<br>
            - `permutation_entropy()`: `["perm", "permutation"]`<br>
            - `svd_entropy()`: `["svd", "svd_entropy"]`<br>
            Defaults to `"sample"`.
        order (int, optional):
            Embedding dimension.<br>
            Only relevant when `algorithm=sample` or `algorithm=approx`.<br>
            Defaults to `2`.
        metric (VALID_KDTREE_METRIC_OPTIONS):
            Name of the distance metric function used with [`sklearn.neighbors.KDTree`](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree). Default is to use the [Chebyshev distance](https://en.wikipedia.org/wiki/Chebyshev_distance).<br>
            Only relevant when `algorithm=sample` or `algorithm=approx`.<br>
            Defaults to `"chebyshev"`.
        sf (float, optional):
            Sampling frequency, in Hz.<br>
            Only relevant when `algorithm=spectral`.<br>
            Defaults to `1`.
        normalize (bool, optional):
            If `True`, divide by $log2(psd.size)$ to normalize the spectral entropy to be between $0$ and $1$. Otherwise, return the spectral entropy in bit.<br>
            Only relevant when `algorithm=spectral`.<br>
            Defaults to `True`.
        tolerance (Union[str, float, int, None], optional):
            The tolerance value used to determine whether or not the result is `regular` or not.<br>
            - If `tolerance` is either type `int` or `float`, then this value will be used.<br>
            - If `tolerance` is either `"default"` or `None`, then `tolerance` will be derived from `x` using the calculation:
                ```python
                tolerance = 0.2 * np.std(a=x)
                ```
            - If any other value is given, then a `ValueError` error will be raised.<br>
            Defaults to `"default"`.

    Raises:
        (ValueError):
            If the given `tolerance` parameter is invalid.

            Valid options are:

            - A number with type `float` or `int`, or
            - A string with value `default`, or
            - The value `None`.

    Returns:
        (dict[str, Union[str, float, bool]]):
            A dictionary containing the test results:

            - `result` (bool): `True` if `entropy < tolerance`.
            - `entropy` (float): The calculated entropy value.
            - `tolerance` (float): The threshold used for regularity.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.tests import is_regular
        >>> from ts_stat_tests.utils.data import data_normal
        >>> normal = data_normal

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Sample Entropy"}
        >>> print(is_regular(x=normal, algorithm="sample"))
        {'result': False, 'entropy': 2.23743099781426, 'tolerance': 0.20294652904313437}

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Approx Entropy"}
        >>> print(is_regular(x=normal, algorithm="approx", tolerance=0.5))
        {'result': False, 'entropy': 1.6643808251518548, 'tolerance': 0.5}

        ```

    ??? question "References"
        - Richman, J. S. et al. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6), H2039-H2049.
        - https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html
        - Inouye, T. et al. (1991). Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalography and clinical neurophysiology, 79(3), 204-210.
        - https://en.wikipedia.org/wiki/Spectral_density
        - https://en.wikipedia.org/wiki/Welch%27s_method

    ??? tip "See Also"
        - [`entropy()`][ts_stat_tests.regularity.tests.entropy]
        - [`regularity()`][ts_stat_tests.regularity.tests.regularity]
        - [`approx_entropy()`][ts_stat_tests.regularity.algorithms.approx_entropy]
        - [`sample_entropy()`][ts_stat_tests.regularity.algorithms.sample_entropy]
        - [`spectral_entropy()`][ts_stat_tests.regularity.algorithms.spectral_entropy]
        - [`permutation_entropy()`][ts_stat_tests.regularity.algorithms.permutation_entropy]
        - [`svd_entropy()`][ts_stat_tests.regularity.algorithms.svd_entropy]
    """
    if isinstance(tolerance, (float, int)):
        tol = tolerance
    elif tolerance in ["default", None]:
        tol = 0.2 * np.std(a=np.asarray(x))
    else:
        raise ValueError(
            f"Invalid option for `tolerance` parameter: {tolerance}.\n"
            f"Valid options are:\n"
            f"- A number with type `float` or `int`,\n"
            f"- A string with value `default`,\n"
            f"- The value `None`."
        )
    value = regularity(x=x, order=order, sf=sf, metric=metric, algorithm=algorithm, normalize=normalize)
    result = value < tol
    return {
        "result": bool(result),
        "entropy": float(value),
        "tolerance": float(tol),
    }

ts_stat_tests.regularity.algorithms 🔗

Summary

This module contains algorithms to compute regularity measures for time series data, including approximate entropy, sample entropy, spectral entropy, and permutation entropy.

VALID_KDTREE_METRIC_OPTIONS module-attribute 🔗

VALID_KDTREE_METRIC_OPTIONS = Literal[
    "euclidean",
    "l2",
    "minkowski",
    "p",
    "manhattan",
    "cityblock",
    "l1",
    "chebyshev",
    "infinity",
]

VALID_SPECTRAL_ENTROPY_METHOD_OPTIONS module-attribute 🔗

VALID_SPECTRAL_ENTROPY_METHOD_OPTIONS = Literal[
    "fft", "welch"
]

approx_entropy 🔗

approx_entropy(
    x: ArrayLike,
    order: int = 2,
    tolerance: Optional[float] = None,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
) -> float

Summary

Approximate entropy is a measure of the amount of regularity or predictability in a time series. It is used to quantify the degree of self-similarity of a signal over different time scales, and can be useful for detecting underlying patterns or trends in data.

This function implements the app_entropy() function from the AntroPy library.

Details

Approximate entropy is a technique used to quantify the amount of regularity and the unpredictability of fluctuations over time-series data. Smaller values indicate that the data is more regular and predictable.

To calculate approximate entropy, we first need to define a window size or scale factor, which determines the length of the subsequences that are used to compare the similarity of the time series. We then compare all possible pairs of subsequences within the time series and calculate the probability that two subsequences are within a certain tolerance level of each other, where the tolerance level is usually expressed as a percentage of the standard deviation of the time series.

The approximate entropy is then defined as the negative natural logarithm of the average probability of similarity across all possible pairs of subsequences, normalized by the length of the time series and the scale factor.

The approximate entropy measure is useful in a variety of applications, such as the analysis of physiological signals, financial time series, and climate data. It can be used to detect changes in the regularity or predictability of a time series over time, and can provide insights into the underlying dynamics or mechanisms that generate the signal.

Parameters:

- x (ArrayLike, required):
  One-dimensional time series of shape (n_times,).
- order (int, optional):
  Embedding dimension.
  Defaults to 2.
- tolerance (Optional[float], optional):
  Tolerance level or similarity criterion. If None (default), it is set to \(0.2 \times \text{std}(x)\).
  Defaults to None.
- metric (VALID_KDTREE_METRIC_OPTIONS, optional):
  Name of the distance metric function used with sklearn.neighbors.KDTree. Default is to use the Chebyshev distance. For a full list of all available metrics, see sklearn.metrics.pairwise.distance_metrics and scipy.spatial.distance.
  Defaults to "chebyshev".

Returns:

- float: The approximate entropy score.

Examples
Setup

>>> from ts_stat_tests.regularity.algorithms import approx_entropy
>>> from ts_stat_tests.utils.data import data_airline, data_random
>>> airline = data_airline.values
>>> random = data_random

Example 1: Airline Passengers Data

>>> print(f"{approx_entropy(x=airline):.4f}")
0.6451

Example 2: Random Data

>>> print(f"{approx_entropy(x=random):.4f}")
1.8177
Calculation

The equation for ApEn is:

\[ \text{ApEn}(m, r, N) = \phi_m(r) - \phi_{m+1}(r) \]

where:

  • \(m\) is the embedding dimension,
  • \(r\) is the tolerance or similarity criterion,
  • \(N\) is the length of the time series, and
  • \(\phi_m(r)\) and \(\phi_{m+1}(r)\) are the logarithms of the probabilities that two sequences of \(m\) data points in the time series that are similar to each other within a tolerance \(r\) remain similar for the next data point, for \(m\) and \(m+1\), respectively.
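To make the \(\phi_m(r)\) terms concrete, here is a deliberately naive NumPy sketch of the formula above. It is not the AntroPy implementation (which is heavily optimised); it simply computes, for each length-\(m\) template, the fraction of templates within Chebyshev distance \(r\), averages the logarithms, and differences the result for \(m\) and \(m+1\):

```python
import numpy as np

def naive_apen(x, m=2, r=None):
    # Illustrative, unoptimised ApEn; r defaults to 0.2 * std(x), matching the usual tolerance convention.
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = 0.2 * np.std(x) if r is None else r

    def phi(order):
        # all overlapping templates of length `order`
        templates = np.array([x[i : i + order] for i in range(n - order + 1)])
        # for each template, the fraction of templates within Chebyshev distance r (self-matches included)
        fractions = np.array(
            [np.mean(np.max(np.abs(templates - t), axis=1) <= r) for t in templates]
        )
        return np.mean(np.log(fractions))

    return phi(m) - phi(m + 1)
```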
Notes
  • Inputs: x is a 1-dimensional array. It represents time-series data, ideally with each element in the array being a measurement or value taken at regular time intervals.
  • Settings: order is the embedding dimension, i.e. the number of values used to construct each compared subsequence (template). If the embedding dimension is too small, we may miss important patterns. If it's too large, we may overfit noise.
  • Metric: The Chebyshev metric is often used because it is a robust and computationally efficient way to measure the distance between two time series.
Credit

All credit goes to the AntroPy library.

References
See Also
Source code in src/ts_stat_tests/regularity/algorithms.py
@typechecked
def approx_entropy(
    x: ArrayLike,
    order: int = 2,
    tolerance: Optional[float] = None,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
) -> float:
    r"""
    !!! note "Summary"
        Approximate entropy is a measure of the amount of regularity or predictability in a time series. It is used to quantify the degree of self-similarity of a signal over different time scales, and can be useful for detecting underlying patterns or trends in data.

        This function implements the [`app_entropy()`](https://raphaelvallat.com/antropy/build/html/generated/antropy.app_entropy.html) function from the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ abstract "Details"
        Approximate entropy is a technique used to quantify the amount of regularity and the unpredictability of fluctuations over time-series data. Smaller values indicate that the data is more regular and predictable.

        To calculate approximate entropy, we first need to define a window size or scale factor, which determines the length of the subsequences that are used to compare the similarity of the time series. We then compare all possible pairs of subsequences within the time series and calculate the probability that two subsequences are within a certain tolerance level of each other, where the tolerance level is usually expressed as a percentage of the standard deviation of the time series.

        The approximate entropy is then defined as the negative natural logarithm of the average probability of similarity across all possible pairs of subsequences, normalized by the length of the time series and the scale factor.

        The approximate entropy measure is useful in a variety of applications, such as the analysis of physiological signals, financial time series, and climate data. It can be used to detect changes in the regularity or predictability of a time series over time, and can provide insights into the underlying dynamics or mechanisms that generate the signal.

    Params:
        x (ArrayLike):
            One-dimensional time series of shape `(n_times,)`.
        order (int, optional):
            Embedding dimension.<br>
            Defaults to `2`.
        tolerance (Optional[float], optional):
            Tolerance level or similarity criterion. If `None` (default), it is set to $0.2 \times \text{std}(x)$.<br>
            Defaults to `None`.
        metric (VALID_KDTREE_METRIC_OPTIONS, optional):
            Name of the distance metric function used with [`sklearn.neighbors.KDTree`](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree). Default is to use the [Chebyshev distance](https://en.wikipedia.org/wiki/Chebyshev_distance). For a full list of all available metrics, see [`sklearn.metrics.pairwise.distance_metrics`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html) and [`scipy.spatial.distance`](https://docs.scipy.org/doc/scipy/reference/spatial.distance.html)<br>
            Defaults to `"chebyshev"`.

    Returns:
        (float):
            The approximate entropy score.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.algorithms import approx_entropy
        >>> from ts_stat_tests.utils.data import data_airline, data_random
        >>> airline = data_airline.values
        >>> random = data_random

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Airline Passengers Data"}
        >>> print(f"{approx_entropy(x=airline):.4f}")
        0.6451

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Random Data"}
        >>> print(f"{approx_entropy(x=random):.4f}")
        1.8177

        ```

    ??? equation "Calculation"
        The equation for ApEn is:

        $$
        \text{ApEn}(m, r, N) = \phi_m(r) - \phi_{m+1}(r)
        $$

        where:

        - $m$ is the embedding dimension,
        - $r$ is the tolerance or similarity criterion,
        - $N$ is the length of the time series, and
        - $\phi_m(r)$ and $\phi_{m+1}(r)$ are the logarithms of the probabilities that two sequences of $m$ data points in the time series that are similar to each other within a tolerance $r$ remain similar for the next data point, for $m$ and $m+1$, respectively.

    ??? note "Notes"
        - **Inputs**: `x` is a 1-dimensional array. It represents time-series data, ideally with each element in the array being a measurement or value taken at regular time intervals.
        - **Settings**: `order` is the embedding dimension, i.e. the number of values used to construct each compared subsequence (template). If the embedding dimension is too small, we may miss important patterns. If it's too large, we may overfit noise.
        - **Metric**: The Chebyshev metric is often used because it is a robust and computationally efficient way to measure the distance between two time series.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ??? question "References"
        - [Richman, J. S. et al. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6), H2039-H2049](https://journals.physiology.org/doi/epdf/10.1152/ajpheart.2000.278.6.H2039)
        - [SK-Learn: Pairwise metrics, Affinities and Kernels](https://scikit-learn.org/stable/modules/metrics.html#metrics)
        - [Spatial data structures and algorithms](https://docs.scipy.org/doc/scipy/tutorial/spatial.html)

    ??? tip "See Also"
        - [`antropy.app_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.app_entropy.html)
        - [`antropy.sample_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.sample_entropy.html)
        - [`antropy.perm_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.perm_entropy.html)
        - [`antropy.spectral_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.spectral_entropy.html)
    """
    return a_app_entropy(
        x=x,
        order=order,
        tolerance=tolerance,
        metric=metric,
    )

sample_entropy 🔗

sample_entropy(
    x: ArrayLike,
    order: int = 2,
    tolerance: Optional[float] = None,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
) -> float

Summary

Sample entropy is a measure of the amount of regularity or predictability in a time series. It is used to quantify the degree of self-similarity of a signal over different time scales, and can be useful for detecting underlying patterns or trends in data.

This function implements the sample_entropy() function from the AntroPy library.

Details

Sample entropy is a modification of approximate entropy, used for assessing the complexity of physiological time-series signals. It has two advantages over approximate entropy: data length independence and a relatively trouble-free implementation. Large values indicate high complexity whereas smaller values characterize more self-similar and regular signals.

The value of SampEn ranges from zero (\(0\)) to infinity (\(\infty\)), with lower values indicating higher regularity or predictability in the time series. A time series with high \(SampEn\) is more unpredictable or irregular, whereas a time series with low \(SampEn\) is more regular or predictable.

Sample entropy is often used in time series forecasting to assess the complexity of the data and to determine whether a time series is suitable for modeling with a particular forecasting method, such as ARIMA or neural networks.

Choosing an appropriate embedding dimension is crucial in ensuring that the sample entropy calculation is robust and reliable, and captures the essential features of the time series in a meaningful way. This allows us to make more accurate and informative inferences about the behavior of the system that generated the data, and can be useful in a wide range of applications, from signal processing to data analysis and beyond.

Parameters:

- x (ArrayLike, required):
  One-dimensional time series of shape (n_times,).
- order (int, optional):
  Embedding dimension.
  Defaults to 2.
- tolerance (Optional[float], optional):
  Tolerance level or similarity criterion. If None (default), it is set to \(0.2 \times \text{std}(x)\).
  Defaults to None.
- metric (VALID_KDTREE_METRIC_OPTIONS, optional):
  Name of the distance metric function used with sklearn.neighbors.KDTree. Default is to use the Chebyshev distance. For a full list of all available metrics, see sklearn.metrics.pairwise.distance_metrics and scipy.spatial.distance.
  Defaults to "chebyshev".

Returns:

- float: The sample entropy score.

Examples
Setup

>>> from ts_stat_tests.regularity.algorithms import sample_entropy
>>> from ts_stat_tests.utils.data import data_airline, data_random
>>> airline = data_airline.values
>>> random = data_random

Example 1: Airline Passengers Data

>>> print(f"{sample_entropy(x=airline):.4f}")
0.6177

Example 2: Random Data

>>> print(f"{sample_entropy(x=random):.4f}")
2.2017
Calculation

The equation for sample entropy (SampEn) is as follows:

\[ \text{SampEn}(m, r, N) = - \log \left( \frac {C_m(r)} {C_{m+1}(r)} \right) \]

where:

  • \(m\) is the embedding dimension,
  • \(r\) is the tolerance or similarity criterion,
  • \(N\) is the length of the time series, and
  • \(C_m(r)\) and \(C_{m+1}(r)\) are the number of \(m\)-tuples (vectors of \(m\) consecutive data points) that have a distance less than or equal to \(r\), and \((m+1)\)-tuples with the same property, respectively.

The calculation of sample entropy involves the following steps:

  1. Choose the values of \(m\) and \(r\).
  2. Construct \(m\)-tuples from the time series data.
  3. Compute the number of \(m\)-tuples that are within a distance \(r\) of each other (\(C_m(r)\)).
  4. Compute the number of \((m+1)\)-tuples that are within a distance \(r\) of each other (\(C_{m+1}(r)\)).
  5. Compute the value of \(SampEn\) using the formula above.
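As an illustration of steps 1-5 (a sketch only, not the fast Numba/mne-features code AntroPy actually uses), matching \(m\)- and \((m+1)\)-tuples can be counted with the Chebyshev distance and plugged into the formula above:

```python
import numpy as np

def naive_sampen(x, m=2, r=None):
    # Illustrative, unoptimised SampEn; unlike ApEn, self-matches are excluded.
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = 0.2 * np.std(x) if r is None else r

    def count_matches(order):
        # use n - m templates for both orders so the two counts are comparable
        templates = np.array([x[i : i + order] for i in range(n - m)])
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # j > i excludes self-matches
                if np.max(np.abs(templates[i] - templates[j])) <= r:
                    matches += 1
        return matches

    return -np.log(count_matches(m + 1) / count_matches(m))
```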
Notes
  • Note that if metric == 'chebyshev' and len(x) < 5000 points, then the sample entropy is computed using a fast custom Numba script. For other distance metric or longer time-series, the sample entropy is computed using a code from the mne-features package by Jean-Baptiste Schiratti and Alexandre Gramfort (requires sklearn).
  • The embedding dimension is important in the calculation of sample entropy because it affects the sensitivity of the measure to different patterns in the data. If the embedding dimension is too small, we may miss important patterns or variations. If it is too large, we may overfit the data.
Credit

All credit goes to the AntroPy library.

References
See Also
Source code in src/ts_stat_tests/regularity/algorithms.py
@typechecked
def sample_entropy(
    x: ArrayLike,
    order: int = 2,
    tolerance: Optional[float] = None,
    metric: VALID_KDTREE_METRIC_OPTIONS = "chebyshev",
) -> float:
    r"""
    !!! note "Summary"
        Sample entropy is a measure of the amount of regularity or predictability in a time series. It is used to quantify the degree of self-similarity of a signal over different time scales, and can be useful for detecting underlying patterns or trends in data.

        This function implements the [`sample_entropy()`](https://raphaelvallat.com/antropy/build/html/generated/antropy.sample_entropy.html) function from the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ abstract "Details"
        Sample entropy is a modification of approximate entropy, used for assessing the complexity of physiological time-series signals. It has two advantages over approximate entropy: data length independence and a relatively trouble-free implementation. Large values indicate high complexity whereas smaller values characterize more self-similar and regular signals.

        The value of SampEn ranges from zero ($0$) to infinity ($\infty$), with lower values indicating higher regularity or predictability in the time series. A time series with high $SampEn$ is more unpredictable or irregular, whereas a time series with low $SampEn$ is more regular or predictable.

        Sample entropy is often used in time series forecasting to assess the complexity of the data and to determine whether a time series is suitable for modeling with a particular forecasting method, such as ARIMA or neural networks.

        Choosing an appropriate embedding dimension is crucial in ensuring that the sample entropy calculation is robust and reliable, and captures the essential features of the time series in a meaningful way. This allows us to make more accurate and informative inferences about the behavior of the system that generated the data, and can be useful in a wide range of applications, from signal processing to data analysis and beyond.

    Params:
        x (ArrayLike):
            One-dimensional time series of shape `(n_times,)`.
        order (int, optional):
            Embedding dimension.<br>
            Defaults to `2`.
        tolerance (Optional[float], optional):
            Tolerance level or similarity criterion. If `None` (default), it is set to $0.2 \times \text{std}(x)$.<br>
            Defaults to `None`.
        metric (VALID_KDTREE_METRIC_OPTIONS, optional):
            Name of the distance metric function used with [`sklearn.neighbors.KDTree`](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree). Default is to use the [Chebyshev distance](https://en.wikipedia.org/wiki/Chebyshev_distance). For a full list of all available metrics, see [`sklearn.metrics.pairwise.distance_metrics`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html) and [`scipy.spatial.distance`](https://docs.scipy.org/doc/scipy/reference/spatial.distance.html)<br>
            Defaults to `"chebyshev"`.

    Returns:
        (float):
            The sample entropy score.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.algorithms import sample_entropy
        >>> from ts_stat_tests.utils.data import data_airline, data_random
        >>> airline = data_airline.values
        >>> random = data_random

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Airline Passengers Data"}
        >>> print(f"{sample_entropy(x=airline):.4f}")
        0.6177

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Random Data"}
        >>> print(f"{sample_entropy(x=random):.4f}")
        2.2017

        ```

    ??? equation "Calculation"
        The equation for sample entropy (SampEn) is as follows:

        $$
        \text{SampEn}(m, r, N) = - \log \left( \frac {C_m(r)} {C_{m+1}(r)} \right)
        $$

        where:

        - $m$ is the embedding dimension,
        - $r$ is the tolerance or similarity criterion,
        - $N$ is the length of the time series, and
        - $C_m(r)$ and $C_{m+1}(r)$ are the number of $m$-tuples (vectors of $m$ consecutive data points) that have a distance less than or equal to $r$, and $(m+1)$-tuples with the same property, respectively.

        The calculation of sample entropy involves the following steps:

        1. Choose the values of $m$ and $r$.
        2. Construct $m$-tuples from the time series data.
        3. Compute the number of $m$-tuples that are within a distance $r$ of each other ($C_m(r)$).
        4. Compute the number of $(m+1)$-tuples that are within a distance $r$ of each other ($C_{m+1}(r)$).
        5. Compute the value of $SampEn$ using the formula above.

    ??? note "Notes"
        - Note that if `metric == 'chebyshev'` and `len(x) < 5000` points, then the sample entropy is computed using a fast custom Numba script. For other distance metric or longer time-series, the sample entropy is computed using a code from the [`mne-features`](https://mne.tools/mne-features/) package by Jean-Baptiste Schiratti and Alexandre Gramfort (requires sklearn).
        - The embedding dimension is important in the calculation of sample entropy because it affects the sensitivity of the measure to different patterns in the data. If the embedding dimension is too small, we may miss important patterns or variations. If it is too large, we may overfit the data.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ??? question "References"
        - [Richman, J. S. et al. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6), H2039-H2049](https://journals.physiology.org/doi/epdf/10.1152/ajpheart.2000.278.6.H2039)
        - [SK-Learn: Pairwise metrics, Affinities and Kernels](https://scikit-learn.org/stable/modules/metrics.html#metrics)
        - [Spatial data structures and algorithms](https://docs.scipy.org/doc/scipy/tutorial/spatial.html)

    ??? tip "See Also"
        - [`antropy.app_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.app_entropy.html)
        - [`antropy.sample_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.sample_entropy.html)
        - [`antropy.perm_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.perm_entropy.html)
        - [`antropy.spectral_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.spectral_entropy.html)
        - [`sklearn.neighbors.KDTree`](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html)
        - [`sklearn.metrics.pairwise_distances`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html)
        - [`scipy.spatial.distance`](https://docs.scipy.org/doc/scipy/reference/spatial.distance.html)
    """
    return a_sample_entropy(
        x=x,
        order=order,
        tolerance=tolerance,
        metric=metric,
    )

permutation_entropy 🔗

permutation_entropy(
    x: ArrayLike,
    order: int = 3,
    delay: Union[int, list, NDArray[int64]] = 1,
    normalize: bool = False,
) -> float

Summary

Permutation entropy is a measure of the complexity or randomness of a time series. It is based on the idea of permuting the order of the values in the time series and calculating the entropy of the resulting permutation patterns.

This function implements the perm_entropy() function from the AntroPy library.

Details

The permutation entropy is a complexity measure for time-series first introduced by Bandt and Pompe in 2002.

It is particularly useful for detecting nonlinear dynamics and nonstationarity in the data. The value of permutation entropy ranges from \(0\) to \(\log_2(\text{order}!)\), where the lower bound is attained for an increasing or decreasing sequence of values, and the upper bound for a completely random system where all possible permutations appear with the same probability.

Choosing an appropriate embedding dimension is crucial in ensuring that the permutation entropy calculation is robust and reliable, and captures the essential features of the time series in a meaningful way.

Parameters:

- x (ArrayLike, required):
  One-dimensional time series of shape (n_times,).
- order (int, optional):
  Order of permutation entropy.
  Defaults to 3.
- delay (Union[int, list, NDArray[int64]], optional):
  Time delay (lag). If multiple values are passed, the average permutation entropy across all these delays is calculated.
  Defaults to 1.
- normalize (bool, optional):
  If True, divide by \(\log_2(\text{order}!)\) to normalize the entropy between \(0\) and \(1\). Otherwise, return the permutation entropy in bits.
  Defaults to False.

Returns:

- Union[float, NDArray[float64]]: The permutation entropy of the data set.

Examples
Setup

>>> from ts_stat_tests.regularity.algorithms import permutation_entropy
>>> from ts_stat_tests.utils.data import data_airline, data_random
>>> airline = data_airline.values
>>> random = data_random

Example 1: Airline Passengers Data

>>> print(f"{permutation_entropy(x=airline):.4f}")
2.3601

Example 2: Random Data (Normalized)

>>> print(f"{permutation_entropy(x=random, normalize=True):.4f}")
0.9997
Calculation

The formula for permutation entropy (\(PE\)) is as follows:

\[ PE(n) = - \sum_{i=0}^{n!} p(i) \times \log_2(p(i)) \]

where:

  • \(n\) is the embedding dimension (order),
  • \(p(i)\) is the probability of the \(i\)-th ordinal pattern.

The embedded matrix \(Y\) is created by:

\[ \begin{align} y(i) &= [x_i, x_{i+\text{delay}}, \dots, x_{i+(\text{order}-1) \times \text{delay}}] \\ Y &= [y(1), y(2), \dots, y(N-(\text{order}-1) \times \text{delay})]^T \end{align} \]
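A small sketch of the formula above (again, not the AntroPy implementation): build each delay-embedded window, map it to its ordinal pattern with argsort, and take the Shannon entropy of the pattern frequencies.

```python
import numpy as np
from math import factorial

def naive_perm_entropy(x, order=3, delay=1, normalize=False):
    # Illustrative permutation entropy in bits, following the PE(n) formula above.
    x = np.asarray(x, dtype=float)
    n = len(x)
    counts = {}
    for i in range(n - (order - 1) * delay):
        window = x[i : i + (order - 1) * delay + 1 : delay]
        pattern = tuple(np.argsort(window))        # ordinal pattern of this window
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    pe = -np.sum(p * np.log2(p))                   # Shannon entropy of the pattern distribution
    return pe / np.log2(factorial(order)) if normalize else pe
```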
Notes
  • The embedding dimension (order) determines the number of values used to construct each permutation pattern. If too small, patterns may be missed. If too large, overfitting to noise may occur.
Credit

All credit goes to the AntroPy library.

References
See Also
Source code in src/ts_stat_tests/regularity/algorithms.py
@typechecked
def permutation_entropy(
    x: ArrayLike,
    order: int = 3,
    delay: Union[int, list, NDArray[np.int64]] = 1,
    normalize: bool = False,
) -> float:
    r"""
    !!! note "Summary"
        Permutation entropy is a measure of the complexity or randomness of a time series. It is based on the idea of permuting the order of the values in the time series and calculating the entropy of the resulting permutation patterns.

        This function implements the [`perm_entropy()`](https://raphaelvallat.com/antropy/build/html/generated/antropy.perm_entropy.html) function from the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ abstract "Details"
        The permutation entropy is a complexity measure for time-series first introduced by Bandt and Pompe in 2002.

        It is particularly useful for detecting nonlinear dynamics and nonstationarity in the data. The value of permutation entropy ranges from $0$ to $\log_2(\text{order}!)$, where the lower bound is attained for an increasing or decreasing sequence of values, and the upper bound for a completely random system where all possible permutations appear with the same probability.

        Choosing an appropriate embedding dimension is crucial in ensuring that the permutation entropy calculation is robust and reliable, and captures the essential features of the time series in a meaningful way.

    Params:
        x (ArrayLike):
            One-dimensional time series of shape `(n_times,)`.
        order (int, optional):
            Order of permutation entropy.<br>
            Defaults to `3`.
        delay (Union[int, list, NDArray[np.int64]], optional):
            Time delay (lag). If multiple values are passed, the average permutation entropy across all these delays is calculated.<br>
            Defaults to `1`.
        normalize (bool, optional):
            If `True`, divide by $\log_2(\text{order}!)$ to normalize the entropy between $0$ and $1$. Otherwise, return the permutation entropy in bits.<br>
            Defaults to `False`.

    Returns:
        (Union[float, NDArray[np.float64]]):
            The permutation entropy of the data set.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.algorithms import permutation_entropy
        >>> from ts_stat_tests.utils.data import data_airline, data_random
        >>> airline = data_airline.values
        >>> random = data_random

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Airline Passengers Data"}
        >>> print(f"{permutation_entropy(x=airline):.4f}")
        2.3601

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Random Data (Normalized)"}
        >>> print(f"{permutation_entropy(x=random, normalize=True):.4f}")
        0.9997

        ```

    ??? equation "Calculation"
        The formula for permutation entropy ($PE$) is as follows:

        $$
        PE(n) = - \sum_{i=0}^{n!} p(i) \times \log_2(p(i))
        $$

        where:

        - $n$ is the embedding dimension (`order`),
        - $p(i)$ is the probability of the $i$-th ordinal pattern.

        The embedded matrix $Y$ is created by:

        $$
        \begin{align}
            y(i) &= [x_i, x_{i+\text{delay}}, \dots, x_{i+(\text{order}-1) \times \text{delay}}] \\
            Y &= [y(1), y(2), \dots, y(N-(\text{order}-1) \times \text{delay})]^T
        \end{align}
        $$

    ??? note "Notes"
        - The embedding dimension (`order`) determines the number of values used to construct each permutation pattern. If too small, patterns may be missed. If too large, overfitting to noise may occur.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ??? question "References"
        - [Bandt, Christoph, and Bernd Pompe. "Permutation entropy: a natural complexity measure for time series." Physical review letters 88.17 (2002): 174102](http://materias.df.uba.ar/dnla2019c1/files/2019/03/permutation_entropy.pdf)

    ??? tip "See Also"
        - [`antropy.perm_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.perm_entropy.html)
        - [`antropy.app_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.app_entropy.html)
        - [`antropy.sample_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.sample_entropy.html)
        - [`antropy.spectral_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.spectral_entropy.html)
    """
    return a_perm_entropy(
        x=x,
        order=order,
        delay=delay,  # type: ignore[arg-type]  # antropy function can handle Union[int, list[int], NDArray[np.int64]], however the function signature is not annotated as such
        normalize=normalize,
    )

spectral_entropy 🔗

spectral_entropy(
    x: ArrayLike,
    sf: float = 1,
    method: VALID_SPECTRAL_ENTROPY_METHOD_OPTIONS = "fft",
    nperseg: Optional[int] = None,
    normalize: bool = False,
    axis: int = -1,
) -> Union[float, NDArray[np.float64]]

Summary

Spectral entropy is a measure of the amount of complexity or unpredictability in a signal's frequency domain representation. It is used to quantify the degree of randomness or regularity in the power spectrum of a signal.

This function implements the spectral_entropy() function from the AntroPy library.

Details

Spectral Entropy is defined to be the Shannon entropy of the power spectral density (PSD) of the data. It is based on the Shannon entropy, which is a measure of the uncertainty or information content of a probability distribution.

The value of spectral entropy ranges from \(0\) to \(\log_2(N)\), where \(N\) is the number of frequency bands. Lower values indicate a more concentrated or regular distribution of power, while higher values indicate a more spread-out or irregular distribution.

Spectral entropy is particularly useful for detecting periodicity and cyclical patterns, as well as changes in the frequency distribution over time.
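
The sketch below (a hedged illustration, not part of the package docs) shows that intuition with this module's spectral_entropy wrapper: a pure sine wave concentrates its power in a single frequency band and scores near the minimum, while white noise spreads power across all bands and scores near the maximum.

```python
import numpy as np

from ts_stat_tests.regularity.algorithms import spectral_entropy

sf = 100                                              # sampling frequency in Hz
t = np.arange(1_000) / sf
sine = np.sin(2 * np.pi * 5 * t)                      # all power concentrated at 5 Hz
noise = np.random.default_rng(0).normal(size=1_000)   # power spread across all bands

print(spectral_entropy(x=sine, sf=sf, normalize=True))   # close to 0
print(spectral_entropy(x=noise, sf=sf, normalize=True))  # high, typically above 0.9
```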

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | ArrayLike | One-dimensional or N-dimensional data array. | required |
| sf | float | Sampling frequency, in Hz. | 1 |
| method | VALID_SPECTRAL_ENTROPY_METHOD_OPTIONS | Spectral estimation method: 'fft' (Fourier transform via scipy.signal.periodogram()) or 'welch' (Welch periodogram via scipy.signal.welch()). | 'fft' |
| nperseg | Optional[int] | Length of each FFT segment for the Welch method. If None, uses scipy's default of 256 samples. | None |
| normalize | bool | If True, divide by \(\log_2(\text{psd.size})\) to normalize the spectral entropy to be between \(0\) and \(1\). Otherwise, return the spectral entropy in bits. | False |
| axis | int | The axis along which the entropy is calculated. Default is the last axis. | -1 |

Returns:

| Type | Description |
| --- | --- |
| Union[float, NDArray[float64]] | The spectral entropy score. Returned as a float for 1D input, or a numpy array for N-dimensional input. |

Examples

Setup

```pycon
>>> import numpy as np
>>> from ts_stat_tests.regularity.algorithms import spectral_entropy
>>> from ts_stat_tests.utils.data import data_airline
>>> airline = data_airline.values
```

Example 1: Airline Passengers Data

```pycon
>>> print(f"{spectral_entropy(x=airline, sf=12):.4f}")
2.6538
```

Example 2: Welch method for spectral entropy

```pycon
>>> data_sine = np.sin(2 * np.pi * 1 * np.arange(400) / 100)
>>> print(f"{spectral_entropy(x=data_sine, sf=100, method='welch'):.4f}")
1.2938
```
Calculation

The spectral entropy (\(SE\)) is defined as:

\[ H(x, f_s) = - \sum_{i=0}^{f_s/2} P(i) \times \log_2(P(i)) \]

where:

  • \(P(i)\) is the normalized power spectral density (PSD) at the \(i\)-th frequency band,
  • \(f_s\) is the sampling frequency.
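
The same quantity can be reproduced by hand from this formula. The hedged sketch below normalizes the periodogram into a probability distribution and takes its Shannon entropy; it assumes scipy is available and illustrates the formula rather than the library's implementation.

```python
import numpy as np
from scipy.signal import periodogram


def spectral_entropy_sketch(x, sf=1.0, normalize=False):
    """Illustrative spectral entropy: Shannon entropy of the normalized PSD."""
    _, psd = periodogram(x, fs=sf)
    p = psd / psd.sum()              # P(i): PSD rescaled into a probability distribution
    nonzero = p > 0                  # convention: 0 * log2(0) = 0
    se = -np.sum(p[nonzero] * np.log2(p[nonzero]))
    return se / np.log2(p.size) if normalize else se
```
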
Notes
  • The power spectrum represents the energy of the signal at different frequencies. High spectral entropy indicates multiple sources or processes with different frequencies, while low spectral entropy suggests a dominant frequency or periodicity.
Credit

All credit goes to the AntroPy library.

References
  • Inouye, T. et al. (1991). Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalography and Clinical Neurophysiology, 79(3), 204-210.
  • Wikipedia: Spectral density
  • Wikipedia: Welch's method
See Also
  • antropy.spectral_entropy
  • antropy.app_entropy
  • antropy.sample_entropy
  • antropy.perm_entropy
Source code in src/ts_stat_tests/regularity/algorithms.py
@typechecked
def spectral_entropy(
    x: ArrayLike,
    sf: float = 1,
    method: VALID_SPECTRAL_ENTROPY_METHOD_OPTIONS = "fft",
    nperseg: Optional[int] = None,
    normalize: bool = False,
    axis: int = -1,
) -> Union[float, NDArray[np.float64]]:
    r"""
    !!! note "Summary"
        Spectral entropy is a measure of the amount of complexity or unpredictability in a signal's frequency domain representation. It is used to quantify the degree of randomness or regularity in the power spectrum of a signal.

        This function implements the [`spectral_entropy()`](https://raphaelvallat.com/antropy/build/html/generated/antropy.spectral_entropy.html) function from the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ abstract "Details"
        Spectral Entropy is defined to be the Shannon entropy of the power spectral density (PSD) of the data. It is based on the Shannon entropy, which is a measure of the uncertainty or information content of a probability distribution.

        The value of spectral entropy ranges from $0$ to $\log_2(N)$, where $N$ is the number of frequency bands. Lower values indicate a more concentrated or regular distribution of power, while higher values indicate a more spread-out or irregular distribution.

        Spectral entropy is particularly useful for detecting periodicity and cyclical patterns, as well as changes in the frequency distribution over time.

    Params:
        x (ArrayLike):
            One-dimensional or N-dimensional data array.
        sf (float, optional):
            Sampling frequency, in Hz.<br>
            Defaults to `1`.
        method (VALID_SPECTRAL_ENTROPY_METHOD_OPTIONS, optional):
            Spectral estimation method: `'fft'` or `'welch'`.<br>
            - `'fft'`: Fourier Transformation ([`scipy.signal.periodogram()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.periodogram.html#scipy.signal.periodogram))<br>
            - `'welch'`: Welch periodogram ([`scipy.signal.welch()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.welch.html#scipy.signal.welch))<br>
            Defaults to `"fft"`.
        nperseg (Optional[int], optional):
            Length of each FFT segment for Welch method. If `None`, uses `scipy`'s default of 256 samples.<br>
            Defaults to `None`.
        normalize (bool, optional):
            If `True`, divide by $\log_2(\text{psd.size})$ to normalize the spectral entropy to be between $0$ and $1$. Otherwise, return the spectral entropy in bits.<br>
            Defaults to `False`.
        axis (int, optional):
            The axis along which the entropy is calculated. Default is the last axis.<br>
            Defaults to `-1`.

    Returns:
        (Union[float, NDArray[np.float64]]):
            The spectral entropy score. Returned as a float for 1D input, or a numpy array for N-dimensional input.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.algorithms import spectral_entropy
        >>> from ts_stat_tests.utils.data import data_airline
        >>> airline = data_airline.values

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Airline Passengers Data"}
        >>> print(f"{spectral_entropy(x=airline, sf=12):.4f}")
        2.6538

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Welch method for spectral entropy"}
        >>> data_sine = np.sin(2 * np.pi * 1 * np.arange(400) / 100)
        >>> print(f"{spectral_entropy(x=data_sine, sf=100, method='welch'):.4f}")
        1.2938

        ```

    ??? equation "Calculation"
        The spectral entropy ($SE$) is defined as:

        $$
        H(x, f_s) = - \sum_{i=0}^{f_s/2} P(i) \times \log_2(P(i))
        $$

        where:

        - $P(i)$ is the normalized power spectral density (PSD) at the $i$-th frequency band,
        - $f_s$ is the sampling frequency.

    ??? note "Notes"
        - The power spectrum represents the energy of the signal at different frequencies. High spectral entropy indicates multiple sources or processes with different frequencies, while low spectral entropy suggests a dominant frequency or periodicity.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ??? question "References"
        - [Inouye, T. et al. (1991). Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalography and clinical neurophysiology, 79(3), 204-210.](https://pubmed.ncbi.nlm.nih.gov/1714811/)
        - [Wikipedia: Spectral density](https://en.wikipedia.org/wiki/Spectral_density)
        - [Wikipedia: Welch's method](https://en.wikipedia.org/wiki/Welch%27s_method)

    ??? tip "See Also"
        - [`antropy.spectral_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.spectral_entropy.html)
        - [`antropy.app_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.app_entropy.html)
        - [`antropy.sample_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.sample_entropy.html)
        - [`antropy.perm_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.perm_entropy.html)
    """
    return a_spectral_entropy(
        x=x,
        sf=sf,
        method=method,
        nperseg=nperseg,
        normalize=normalize,
        axis=axis,
    )

svd_entropy 🔗

svd_entropy(
    x: ArrayLike,
    order: int = 3,
    delay: int = 1,
    normalize: bool = False,
) -> float

Summary

SVD entropy is a measure of the complexity or randomness of a time series based on Singular Value Decomposition (SVD).

This function implements the svd_entropy() function from the AntroPy library.

Details

SVD entropy is calculated by first embedding the time series into a matrix, then performing SVD on that matrix to obtain the singular values. The entropy is then calculated from the normalized singular values.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | ArrayLike | One-dimensional time series of shape (n_times,). | required |
| order | int | Order of the SVD entropy (embedding dimension). | 3 |
| delay | int | Time delay (lag). | 1 |
| normalize | bool | If True, divide by \(\log_2(\text{order}!)\) to normalize the entropy between \(0\) and \(1\). | False |

Returns:

| Type | Description |
| --- | --- |
| float | The SVD entropy of the data set. |

Examples

Setup

```pycon
>>> from ts_stat_tests.regularity.algorithms import svd_entropy
>>> from ts_stat_tests.utils.data import data_random
>>> random = data_random
```

Example 1: Basic SVD entropy

```pycon
>>> print(f"{svd_entropy(random):.4f}")
1.3514
```
Calculation

The SVD entropy is calculated as the Shannon entropy of the normalized singular values of the embedded matrix.
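
A hedged, pure-NumPy sketch of that calculation follows; the embedding mirrors the permutation-entropy construction above, normalization is omitted, and AntroPy's implementation may differ in detail.

```python
import numpy as np


def svd_entropy_sketch(x, order=3, delay=1):
    """Illustrative (unnormalized) SVD entropy of a 1-D time series."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (order - 1) * delay
    # Embed the series into a matrix of delay vectors (one per row)
    Y = np.array([x[i : i + (order - 1) * delay + 1 : delay] for i in range(n_vectors)])
    # Singular values of the embedded matrix, rescaled into a probability distribution
    s = np.linalg.svd(Y, compute_uv=False)
    s_norm = s / s.sum()
    s_norm = s_norm[s_norm > 0]      # guard against log2(0) for rank-deficient embeddings
    return -np.sum(s_norm * np.log2(s_norm))
```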

Notes
  • Singular Value Decomposition (SVD) is a factorization of a real or complex matrix. It is the generalization of the eigendecomposition of a positive semidefinite normal matrix.
Credit

All credit goes to the AntroPy library.

See Also
  • antropy.svd_entropy
  • ts_stat_tests.regularity.algorithms.approx_entropy
  • ts_stat_tests.regularity.algorithms.sample_entropy
  • ts_stat_tests.regularity.algorithms.permutation_entropy
  • ts_stat_tests.regularity.algorithms.spectral_entropy
Source code in src/ts_stat_tests/regularity/algorithms.py
@typechecked
def svd_entropy(
    x: ArrayLike,
    order: int = 3,
    delay: int = 1,
    normalize: bool = False,
) -> float:
    r"""
    !!! note "Summary"
        SVD entropy is a measure of the complexity or randomness of a time series based on Singular Value Decomposition (SVD).

        This function implements the [`svd_entropy()`](https://raphaelvallat.com/antropy/build/html/generated/antropy.svd_entropy.html) function from the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ???+ abstract "Details"
        SVD entropy is calculated by first embedding the time series into a matrix, then performing SVD on that matrix to obtain the singular values. The entropy is then calculated from the normalized singular values.

    Params:
        x (ArrayLike):
            One-dimensional time series of shape `(n_times,)`.
        order (int, optional):
            Order of the SVD entropy (embedding dimension).<br>
            Defaults to `3`.
        delay (int, optional):
            Time delay (lag).<br>
            Defaults to `1`.
        normalize (bool, optional):
            If `True`, divide by $\log_2(\text{order}!)$ to normalize the entropy between $0$ and $1$.<br>
            Defaults to `False`.

    Returns:
        (float):
            The SVD entropy of the data set.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.regularity.algorithms import svd_entropy
        >>> from ts_stat_tests.utils.data import data_random
        >>> random = data_random

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Basic SVD entropy"}
        >>> print(f"{svd_entropy(random):.4f}")
        1.3514

        ```

    ??? equation "Calculation"
        The SVD entropy is calculated as the Shannon entropy of the singular values of the embedded matrix.

    ??? note "Notes"
        - Singular Value Decomposition (SVD) is a factorization of a real or complex matrix. It is the generalization of the eigendecomposition of a positive semidefinite normal matrix.

    ??? success "Credit"
        All credit goes to the [`AntroPy`](https://raphaelvallat.com/antropy/) library.

    ??? tip "See Also"
        - [`antropy.svd_entropy`](https://raphaelvallat.com/antropy/build/html/generated/antropy.svd_entropy.html)
        - [`ts_stat_tests.regularity.algorithms.approx_entropy`][ts_stat_tests.regularity.algorithms.approx_entropy]
        - [`ts_stat_tests.regularity.algorithms.sample_entropy`][ts_stat_tests.regularity.algorithms.sample_entropy]
        - [`ts_stat_tests.regularity.algorithms.permutation_entropy`][ts_stat_tests.regularity.algorithms.permutation_entropy]
        - [`ts_stat_tests.regularity.algorithms.spectral_entropy`][ts_stat_tests.regularity.algorithms.spectral_entropy]
    """
    return a_svd_entropy(
        x=x,
        order=order,
        delay=delay,
        normalize=normalize,
    )