Test the `stability` of a given Time-Series Dataset

Introduction

Summary

As stated by Bocharov, Chickering & Heckerman:

In technical terms the cases of long range forecasting instability are characterized by rapid growth of the mean absolute prediction error with time, which may or may not be accompanied by significant growth of the predicted standard deviation. In practice, the cases of instability where predicted standard deviation stays tame are especially misleading, since they can furnish unreliable predictions with little or no visual cues that would characterize them as unreliable.

For more info, see: Stability analysis of time series forecasting with ART models.

Info

The test for stability measures how much the level of the data varies from one time period to the next. A data set is 'stable' if the means of its time periods do not vary dramatically over time. In other words, the higher the variance between the period means, the more unstable the data is.

Two tests can be used: the test for Stability and the test for Lumpiness (see sources here and here). The Stability test measures the variance of the period means, while the Lumpiness test measures the variance of the period variances. Both measures simply indicate how much a series varies from period to period. They are non-negative and have no fixed upper bound; a score of $0$ indicates a perfectly stable (or perfectly smooth) data set, while larger values indicate increasingly unstable (or more sporadic) data.
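The difference between the two measures can be sketched with plain NumPy. The helper `tiled_metrics` below is hypothetical and not part of `ts_stat_tests`; it assumes the sample variance (`ddof=1`) and drops any trailing partial period, which may differ from what the library does. It is meant only to show that a shift in level drives the stability measure, while a shift in volatility drives the lumpiness measure.

```python
import numpy as np


def tiled_metrics(x, width):
    """Return (variance of period means, variance of period variances).

    Hypothetical helper for illustration only; assumes ddof=1 and
    ignores a trailing partial period.
    """
    x = np.asarray(x, dtype=float)
    n_tiles = len(x) // width
    tiles = x[: n_tiles * width].reshape(n_tiles, width)
    means = tiles.mean(axis=1)
    variances = tiles.var(axis=1, ddof=1)
    return means.var(ddof=1), variances.var(ddof=1)


rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, 240)
first_half = np.arange(240) < 120

# Level shift: the mean jumps halfway through, the spread is unchanged.
level_shift = noise + np.where(first_half, 0.0, 10.0)

# Volatility shift: the mean stays near zero, the spread grows fivefold.
vol_shift = noise * np.where(first_half, 1.0, 5.0)

series_by_name = {
    "noise": noise,
    "level shift": level_shift,
    "volatility shift": vol_shift,
}
for name, series in series_by_name.items():
    stab, lump = tiled_metrics(series, width=12)
    print(f"{name:>16}: stability={stab:.3f}  lumpiness={lump:.3f}")
```

In this sketch the level-shifted series has the same lumpiness as the plain noise, because adding a constant within a period does not change that period's variance.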

For more info, see: The Future of Australian Energy Prices: Time-Series Analysis of Historic Prices and Forecast for Future Prices.

Source Library

The tsfeatures package was chosen because it provides well-tested implementations of the stability and lumpiness features described in the literature, closely follows the original R tsfeatures API, and serves as a reliable reference implementation for our time-series stability tests.

Source Module

All of the source code can be found within these modules:

ts_stat_tests.stability.algorithms
ts_stat_tests.stability.tests

Modules

ts_stat_tests.stability.tests

Summary

This module contains convenience functions and tests for stability measures, allowing for easy access to stability and lumpiness algorithms.

is_stable

is_stable(
    data: Union[NDArray[float64], DataFrame, Series],
    freq: int = 1,
    alpha: float = 0.5,
) -> bool

Summary

Determine if a time series is stable based on a variance threshold.

Details

A time series is considered stable if the variance of its windowed means (the stability metric) is below a specified threshold (alpha). High stability values indicate that the mean level of the series changes significantly over time (e.g., due to trends or structural breaks).
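Because the stability metric is a variance of means, it carries the squared units of the data, so a fixed `alpha` means different things for differently scaled series (compare the airline and random-noise values reported for `stability()`). The sketch below illustrates this with hypothetical NumPy helpers, `variance_of_tile_means` and `zscore`, which are not part of `ts_stat_tests` and assume the sample variance (`ddof=1`). Standardising the series before thresholding is shown as one possible approach, not as the library's prescribed one.

```python
import numpy as np


def variance_of_tile_means(x, width):
    """Illustrative stand-in for the stability metric (ddof=1 is assumed)."""
    x = np.asarray(x, dtype=float)
    n_tiles = len(x) // width
    tiles = x[: n_tiles * width].reshape(n_tiles, width)
    return tiles.mean(axis=1).var(ddof=1)


def zscore(x):
    """Centre a series and scale it to unit sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 144)

base = variance_of_tile_means(x, 12)
scaled = variance_of_tile_means(100.0 * x, 12)  # same shape, different units

# Multiplying the data by c multiplies the metric by c**2.
print(f"ratio after scaling by 100: {scaled / base:.1f}")

# After z-scoring, both versions of the series give the same metric,
# so a single threshold can be applied to either of them.
print(variance_of_tile_means(zscore(x), 12))
print(variance_of_tile_means(zscore(100.0 * x), 12))
```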

Parameters:

Name	Type	Description	Default
`data`	`Union[NDArray[float64], DataFrame, Series]`	The time series data to analyse.	required
`freq`	`int`	The number of observations per seasonal period or the desired window size for tiling. Default: `1`	`1`
`alpha`	`float`	The threshold for stability. The series is considered stable if the calculated stability metric is less than this value. Default: `0.5`	`0.5`

Returns:

Type	Description
`bool`	`True` if the stability metric is less than `alpha` (the series is stable), otherwise `False`.

Examples

Setup
>>> import numpy as np
>>> from ts_stat_tests.stability.tests import is_stable
>>> from ts_stat_tests.utils.data import load_airline

Example 1: Check if airline data is stable
>>> data = load_airline().values
>>> is_stable(data, freq=12, alpha=1.0)
False

Example 2: Check if random noise is stable
>>> rng = np.random.RandomState(42)
>>> data_random = rng.normal(0, 1, 144)
>>> is_stable(data_random, freq=12, alpha=1.0)
True

See Also

stability()

Source code in src/ts_stat_tests/stability/tests.py

@typechecked
def is_stable(
    data: Union[NDArray[np.float64], pd.DataFrame, pd.Series],
    freq: int = 1,
    alpha: float = 0.5,
) -> bool:
    r"""
    !!! note "Summary"
        Determine if a time series is stable based on a variance threshold.

    ???+ abstract "Details"
        A time series is considered stable if the variance of its windowed means (the stability metric) is below a specified threshold (`alpha`). High stability values indicate that the mean level of the series changes significantly over time (e.g., due to trends or structural breaks).

    Params:
        data (Union[NDArray[np.float64], pd.DataFrame, pd.Series]):
            The time series data to analyse.
        freq (int):
            The number of observations per seasonal period or the desired window size for tiling.
            Default: `1`
        alpha (float):
            The threshold for stability. The series is considered stable if the calculated stability metric is less than this value.
            Default: `0.5`

    Returns:
        (bool):
            `True` if the stability metric is less than `alpha` (the series is stable), otherwise `False`.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> import numpy as np
        >>> from ts_stat_tests.stability.tests import is_stable
        >>> from ts_stat_tests.utils.data import load_airline

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Check if airline data is stable"}
        >>> data = load_airline().values
        >>> is_stable(data, freq=12, alpha=1.0)
        False

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Check if random noise is stable"}
        >>> rng = np.random.RandomState(42)
        >>> data_random = rng.normal(0, 1, 144)
        >>> is_stable(data_random, freq=12, alpha=1.0)
        True

        ```

    ??? tip "See Also"
        - [`stability()`][ts_stat_tests.stability.algorithms.stability]
    """
    return bool(_stability(data=data, freq=freq) < alpha)

is_lumpy

is_lumpy(
    data: Union[NDArray[float64], DataFrame, Series],
    freq: int = 1,
    alpha: float = 0.5,
) -> bool

Summary

Determine if a time series is lumpy based on a variance threshold.

Details

A time series is considered lumpy if the variance of its windowed variances (the lumpiness metric) exceeds a specified threshold (alpha). High lumpiness values indicate that the volatility of the series is inconsistently distributed across time.
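The lumpiness metric is a variance of variances, so it carries the fourth power of the data's units: multiplying a series by $c$ multiplies the metric by $c^4$. The sketch below shows how the same series can fall on either side of a fixed threshold depending only on its units. The helper `variance_of_tile_variances` is hypothetical, is not part of `ts_stat_tests`, and assumes the sample variance (`ddof=1`).

```python
import numpy as np


def variance_of_tile_variances(x, width):
    """Illustrative stand-in for the lumpiness metric (ddof=1 is assumed)."""
    x = np.asarray(x, dtype=float)
    n_tiles = len(x) // width
    tiles = x[: n_tiles * width].reshape(n_tiles, width)
    return tiles.var(axis=1, ddof=1).var(ddof=1)


rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 144)
alpha = 1.0  # same threshold as in the examples for is_lumpy

for c in (1.0, 10.0, 100.0):
    value = variance_of_tile_variances(c * x, 12)
    print(f"scale={c:>5}: metric={value:.4g}  exceeds alpha: {value > alpha}")
```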

Parameters:

Name	Type	Description	Default
`data`	`Union[NDArray[float64], DataFrame, Series]`	The time series data to analyse.	required
`freq`	`int`	The number of observations per seasonal period or the desired window size for tiling. Default: `1`	`1`
`alpha`	`float`	The threshold for lumpiness. The series is considered lumpy if the calculated lumpiness metric is greater than this value. Default: `0.5`	`0.5`

Returns:

Type	Description
`bool`	`True` if the lumpiness metric is greater than `alpha` (the series is lumpy), otherwise `False`.

Examples

Setup
>>> import numpy as np
>>> from ts_stat_tests.stability.tests import is_lumpy
>>> from ts_stat_tests.utils.data import load_airline

Example 1: Check if airline data is lumpy
>>> data = load_airline().values
>>> is_lumpy(data, freq=12, alpha=1.0)
True

Example 2: Check if random noise is lumpy
>>> rng = np.random.RandomState(42)
>>> data_random = rng.normal(0, 1, 144)
>>> is_lumpy(data_random, freq=12, alpha=1.0)
False

See Also

lumpiness()

Source code in src/ts_stat_tests/stability/tests.py

@typechecked
def is_lumpy(
    data: Union[NDArray[np.float64], pd.DataFrame, pd.Series],
    freq: int = 1,
    alpha: float = 0.5,
) -> bool:
    r"""
    !!! note "Summary"
        Determine if a time series is lumpy based on a variance threshold.

    ???+ abstract "Details"
        A time series is considered lumpy if the variance of its windowed variances (the lumpiness metric) exceeds a specified threshold (`alpha`). High lumpiness values indicate that the volatility of the series is inconsistently distributed across time.

    Params:
        data (Union[NDArray[np.float64], pd.DataFrame, pd.Series]):
            The time series data to analyse.
        freq (int):
            The number of observations per seasonal period or the desired window size for tiling.
            Default: `1`
        alpha (float):
            The threshold for lumpiness. The series is considered lumpy if the calculated lumpiness metric is greater than this value.
            Default: `0.5`

    Returns:
        (bool):
            `True` if the lumpiness metric is greater than `alpha` (the series is lumpy), otherwise `False`.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> import numpy as np
        >>> from ts_stat_tests.stability.tests import is_lumpy
        >>> from ts_stat_tests.utils.data import load_airline

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Check if airline data is lumpy"}
        >>> data = load_airline().values
        >>> is_lumpy(data, freq=12, alpha=1.0)
        True

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Check if random noise is lumpy"}
        >>> rng = np.random.RandomState(42)
        >>> data_random = rng.normal(0, 1, 144)
        >>> is_lumpy(data_random, freq=12, alpha=1.0)
        False

        ```

    ??? tip "See Also"
        - [`lumpiness()`][ts_stat_tests.stability.algorithms.lumpiness]
    """
    return bool(_lumpiness(data=data, freq=freq) > alpha)

ts_stat_tests.stability.algorithms

Summary

This module provides algorithms to measure the stability and lumpiness of time series data.

stability

stability(
    data: Union[NDArray[float64], DataFrame, Series],
    freq: int = 1,
) -> float

Summary

Measure the stability of a time series by calculating the variance of the means across non-overlapping windows.

Details

Stability is a feature extracted from time series data that quantifies how much the mean level of the series changes over time. It is particularly useful for identifying series with structural breaks or varying levels.

The series is divided into non-overlapping "tiles" (windows) of length equal to the specified frequency. The mean of each tile is computed, and the stability is defined as the variance of these means. A higher value indicates lower stability (greater changes in the mean level).

Parameters:

Name	Type	Description	Default
`data`	`Union[NDArray[float64], DataFrame, Series]`	The time series data to analyse.	required
`freq`	`int`	The number of observations per seasonal period or the desired window size for tiling. Default: `1`	`1`

Returns:

Type	Description
`float`	The calculated stability value.

Examples

Setup
>>> import numpy as np
>>> from ts_stat_tests.stability.algorithms import stability
>>> from ts_stat_tests.utils.data import load_airline

Example 1: Measure stability of airline data
>>> data = load_airline().values
>>> res = stability(data, freq=12)
>>> print(f"{res:.2f}")
13428.67

Example 2: Measure stability of random noise
>>> rng = np.random.RandomState(42)
>>> data_random = rng.normal(0, 1, 144)
>>> res = stability(data_random, freq=12)
>>> print(f"{res:.4f}")
0.0547

Calculation

The stability $S$ is calculated by:

1. Dividing the time series $X$ into $k$ non-overlapping windows $W_1, W_2, \dots, W_k$ of size $freq$.
2. Computing the mean $\mu_i$ for each window $W_i$.
3. Calculating the variance of these means: $$ S = \text{Var}(\mu_1, \mu_2, \dots, \mu_k) $$
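The three steps can be traced by hand on a tiny series. This is a minimal NumPy sketch rather than the library's implementation; in particular, the use of the sample variance (`ddof=1`) in the last step is an assumption, and `width` is an illustrative name standing in for `freq`.

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0])
width = 3  # plays the role of `freq`

# Step 1: split the series into k non-overlapping windows (one row each).
windows = x.reshape(-1, width)

# Step 2: compute the mean of each window -> [1., 5., 9.]
means = windows.mean(axis=1)

# Step 3: variance of the window means (sample variance, ddof=1, assumed).
S = means.var(ddof=1)
print(S)  # 16.0
```

Each window here is perfectly flat, yet $S$ is large because the level changes from one window to the next.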

References

Hyndman, R.J., Wang, X., & Laptev, N. (2015). Large-scale unusual time series detection. In Proceedings of the IEEE International Conference on Data Mining (ICDM 2015).

See Also

lumpiness()

Source code in src/ts_stat_tests/stability/algorithms.py

@typechecked
def stability(data: Union[NDArray[np.float64], pd.DataFrame, pd.Series], freq: int = 1) -> float:
    r"""
    !!! note "Summary"
        Measure the stability of a time series by calculating the variance of the means across non-overlapping windows.

    ???+ abstract "Details"
        Stability is a feature extracted from time series data that quantifies how much the mean level of the series changes over time. It is particularly useful for identifying series with structural breaks or varying levels.

        The series is divided into non-overlapping "tiles" (windows) of length equal to the specified frequency. The mean of each tile is computed, and the stability is defined as the variance of these means. A higher value indicates lower stability (greater changes in the mean level).

    Params:
        data (Union[NDArray[np.float64], pd.DataFrame, pd.Series]):
            The time series data to analyse.
        freq (int):
            The number of observations per seasonal period or the desired window size for tiling.
            Default: `1`

    Returns:
        (float):
            The calculated stability value.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> import numpy as np
        >>> from ts_stat_tests.stability.algorithms import stability
        >>> from ts_stat_tests.utils.data import load_airline

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Measure stability of airline data"}
        >>> data = load_airline().values
        >>> res = stability(data, freq=12)
        >>> print(f"{res:.2f}")
        13428.67

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Measure stability of random noise"}
        >>> rng = np.random.RandomState(42)
        >>> data_random = rng.normal(0, 1, 144)
        >>> res = stability(data_random, freq=12)
        >>> print(f"{res:.4f}")
        0.0547

        ```

    ??? equation "Calculation"
        The stability $S$ is calculated by:

        1. Dividing the time series $X$ into $k$ non-overlapping windows $W_1, W_2, \dots, W_k$ of size $freq$.
        2. Computing the mean $\mu_i$ for each window $W_i$.
        3. Calculating the variance of these means:
        $$
        S = \text{Var}(\mu_1, \mu_2, \dots, \mu_k)
        $$

    ??? question "References"
        - Hyndman, R.J., Wang, X., & Laptev, N. (2015). Large-scale unusual time series detection. In Proceedings of the IEEE International Conference on Data Mining (ICDM 2015).

    ??? tip "See Also"
        - [`lumpiness()`][ts_stat_tests.stability.algorithms.lumpiness]
    """
    return ts_stability(x=data, freq=freq)["stability"]

lumpiness

lumpiness(
    data: Union[NDArray[float64], DataFrame, Series],
    freq: int = 1,
) -> float

Summary

Measure the lumpiness of a time series by calculating the variance of the variances across non-overlapping windows.

Details

Lumpiness quantifies the extent to which the variance of a time series changes over time. It is useful for detecting series with "lumpy" patterns, where volatility is concentrated in certain periods.

Similar to stability, the series is divided into non-overlapping tiles of length freq. Instead of means, the variance of each tile is computed. The lumpiness is defined as the variance of these tile variances. A higher value indicates greater "lumpiness" or inconsistent volatility.

Parameters:

Name	Type	Description	Default
`data`	`Union[NDArray[float64], DataFrame, Series]`	The time series data to analyse.	required
`freq`	`int`	The number of observations per seasonal period or the desired window size for tiling. Default: `1`	`1`

Returns:

Type	Description
`float`	The calculated lumpiness value.

Examples

Setup
>>> import numpy as np
>>> from ts_stat_tests.stability.algorithms import lumpiness
>>> from ts_stat_tests.utils.data import load_airline

Example 1: Measure lumpiness of airline data
>>> data = load_airline().values
>>> res = lumpiness(data, freq=12)
>>> print(f"{res:.2f}")
3986791.94

Example 2: Measure lumpiness of random noise
>>> rng = np.random.RandomState(42)
>>> data_random = rng.normal(0, 1, 144)
>>> res = lumpiness(data_random, freq=12)
>>> print(f"{res:.4f}")
0.0925

Calculation

The lumpiness $L$ is calculated by:

1. Dividing the time series $X$ into $k$ non-overlapping windows $W_1, W_2, \dots, W_k$ of size $freq$.
2. Computing the variance $\sigma^2_i$ for each window $W_i$.
3. Calculating the variance of these variances: $$ L = \text{Var}(\sigma^2_1, \sigma^2_2, \dots, \sigma^2_k) $$
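A small hand-checkable series makes the calculation concrete. As before, this is a NumPy sketch under the assumption that both the per-window variances and the final variance use the sample estimator (`ddof=1`); the library's own choice is not stated here, and `width` is an illustrative name standing in for `freq`.

```python
import numpy as np

x = np.array([0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 4.0, 6.0])
width = 3  # plays the role of `freq`

# Step 1: split the series into k non-overlapping windows (one row each).
windows = x.reshape(-1, width)

# Step 2: variance of each window (ddof=1, assumed) -> [0., 1., 4.]
window_vars = windows.var(axis=1, ddof=1)

# Step 3: variance of the window variances (ddof=1, assumed) -> 13/3.
L = window_vars.var(ddof=1)
print(f"{L:.4f}")  # 4.3333
```

The window means also differ in this series, but they do not enter $L$; only the spread within each window does.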

References

Hyndman, R.J., Wang, X., & Laptev, N. (2015). Large-scale unusual time series detection. In Proceedings of the IEEE International Conference on Data Mining (ICDM 2015).

See Also

stability()

Source code in src/ts_stat_tests/stability/algorithms.py

@typechecked
def lumpiness(data: Union[NDArray[np.float64], pd.DataFrame, pd.Series], freq: int = 1) -> float:
    r"""
    !!! note "Summary"
        Measure the lumpiness of a time series by calculating the variance of the variances across non-overlapping windows.

    ???+ abstract "Details"
        Lumpiness quantifies the extent to which the variance of a time series changes over time. It is useful for detecting series with "lumpy" patterns, where volatility is concentrated in certain periods.

        Similar to stability, the series is divided into non-overlapping tiles of length `freq`. Instead of means, the variance of each tile is computed. The lumpiness is defined as the variance of these tile variances. A higher value indicates greater "lumpiness" or inconsistent volatility.

    Params:
        data (Union[NDArray[np.float64], pd.DataFrame, pd.Series]):
            The time series data to analyse.
        freq (int):
            The number of observations per seasonal period or the desired window size for tiling.
            Default: `1`

    Returns:
        (float):
            The calculated lumpiness value.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> import numpy as np
        >>> from ts_stat_tests.stability.algorithms import lumpiness
        >>> from ts_stat_tests.utils.data import load_airline

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Measure lumpiness of airline data"}
        >>> data = load_airline().values
        >>> res = lumpiness(data, freq=12)
        >>> print(f"{res:.2f}")
        3986791.94

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Measure lumpiness of random noise"}
        >>> rng = np.random.RandomState(42)
        >>> data_random = rng.normal(0, 1, 144)
        >>> res = lumpiness(data_random, freq=12)
        >>> print(f"{res:.4f}")
        0.0925

        ```

    ??? equation "Calculation"
        The lumpiness $L$ is calculated by:

        1. Dividing the time series $X$ into $k$ non-overlapping windows $W_1, W_2, \dots, W_k$ of size $freq$.
        2. Computing the variance $\sigma^2_i$ for each window $W_i$.
        3. Calculating the variance of these variances:
        $$
        L = \text{Var}(\sigma^2_1, \sigma^2_2, \dots, \sigma^2_k)
        $$

    ??? question "References"
        - Hyndman, R.J., Wang, X., & Laptev, N. (2015). Large-scale unusual time series detection. In Proceedings of the IEEE International Conference on Data Mining (ICDM 2015).

    ??? tip "See Also"
        - [`stability()`][ts_stat_tests.stability.algorithms.stability]
    """
    return ts_lumpiness(x=data, freq=freq)["lumpiness"]
