Test the stability of a given Time-Series Dataset
Introduction
Summary
As stated by Bocharov, Chickering & Heckerman:
In technical terms the cases of long range forecasting instability are characterized by rapid growth of the mean absolute prediction error with time, which may or may not be accompanied by significant growth of the predicted standard deviation. In practice, the cases of instability where predicted standard deviation stays tame are especially misleading, since they can furnish unreliable predictions with little or no visual cues that would characterize them as unreliable.
For more info, see: Stability analysis of time series forecasting with ART models.
Info
The stability test measures how much the data varies from one period of time to the next. A data set is 'stable' when the means of its time periods do not vary dramatically over time. In other words, the higher the variance between the means of each time period, the more unstable the data is.
Two tests can be used: the test for Stability and the test for Lumpiness. The Stability test measures the variance of the means, while the Lumpiness test measures the variance of the variances. Both measures simply indicate the extent to which a series varies. They are non-negative and have no fixed upper bound; a score of \(0\) indicates a perfectly stable (or perfectly smooth) data set, while larger values indicate increasingly unstable (or more sporadic) data.
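To make the distinction between the two measures concrete, the sketch below computes both with plain NumPy on three synthetic series. The helper names (`split_into_tiles`, `stability_and_lumpiness`), the window size of 10, and the use of the sample variance (`ddof=1`) are assumptions made for illustration. They are not taken from the ts_stat_tests or tsfeatures source, which may handle scaling, short series, or partial windows differently.

```python
import numpy as np


def split_into_tiles(series, window):
    """Split a 1-D series into non-overlapping windows, dropping any partial tail."""
    values = np.asarray(series, dtype=float)
    n_tiles = len(values) // window
    return values[: n_tiles * window].reshape(n_tiles, window)


def stability_and_lumpiness(series, window):
    """Return (variance of tile means, variance of tile variances)."""
    tiles = split_into_tiles(series, window)
    tile_means = tiles.mean(axis=1)
    tile_vars = tiles.var(axis=1, ddof=1)  # assumption: sample variance
    return float(tile_means.var(ddof=1)), float(tile_vars.var(ddof=1))


# Constant level and constant spread: both measures are (numerically) zero.
flat = np.tile([1.0, -1.0], 20)
# Level shift halfway through: the tile means differ, the tile variances do not.
level_shift = flat + np.repeat([0.0, 5.0], 20)
# Volatility burst in the second half: the tile variances differ, the tile means do not.
burst = flat * np.repeat([1.0, 4.0], 20)

examples = {"flat": flat, "level_shift": level_shift, "burst": burst}
for name, values in examples.items():
    stab, lump = stability_and_lumpiness(values, window=10)
    print(f"{name:12s} stability={stab:8.3f} lumpiness={lump:8.3f}")
```

The level shift only moves the stability measure, while the volatility burst only moves the lumpiness measure, which is why the two tests are reported separately.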
For more info, see: The Future of Australian Energy Prices: Time-Series Analysis of Historic Prices and Forecast for Future Prices.
Source Library
The tsfeatures package was chosen because it provides well-tested implementations of the stability and lumpiness features described in the literature, closely follows the original R tsfeatures API, and serves as a reliable reference implementation for our time-series stability tests.
Source Module
All of the source code can be found within these modules:
Modules
ts_stat_tests.stability.tests
Summary
This module contains convenience functions and tests for stability measures, allowing for easy access to stability and lumpiness algorithms.
is_stable
is_stable(
    data: Union[NDArray[float64], DataFrame, Series],
    freq: int = 1,
    alpha: float = 0.5,
) -> bool
Summary
Determine if a time series is stable based on a variance threshold.
Details
A time series is considered stable if the variance of its windowed means (the stability metric) is below a specified threshold (alpha). High values of the stability metric indicate that the mean level of the series changes significantly over time (e.g., due to trends or structural breaks), so a high value means the series is less stable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Union[NDArray[float64], DataFrame, Series] | The time series data to analyse. | required |
| freq | int | The number of observations per seasonal period or the desired window size for tiling. | 1 |
| alpha | float | The threshold for stability. The series is considered stable if the calculated stability metric is less than this value. | 0.5 |
Returns:
| Type | Description |
|---|---|
| bool | True if the calculated stability metric is less than alpha, otherwise False. |
Examples
- Example 1: Check if airline data is stable
- Example 2: Check if random noise is stable
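As a rough illustration of the decision rule described in the Details above, the following sketch re-implements the threshold check with NumPy only. It does not call the library: `stability_metric` and `is_stable_sketch` are hypothetical names, and the use of the sample variance (`ddof=1`), the dropping of a trailing partial window, and the return of `0.0` when fewer than two windows are available are all assumptions. Synthetic noise and a synthetic trend stand in for real data.

```python
import numpy as np


def stability_metric(series, freq):
    """Variance of the means of non-overlapping windows of length `freq`."""
    values = np.asarray(series, dtype=float)
    n_tiles = len(values) // freq
    if n_tiles < 2:
        return 0.0  # assumption: too few windows to measure any variation
    tile_means = values[: n_tiles * freq].reshape(n_tiles, freq).mean(axis=1)
    return float(np.var(tile_means, ddof=1))  # assumption: sample variance


def is_stable_sketch(series, freq=1, alpha=0.5):
    """Hypothetical stand-in for the documented rule: stable if metric < alpha."""
    return stability_metric(series, freq) < alpha


rng = np.random.default_rng(42)
noise = rng.normal(size=120)          # the level stays around zero
trend = np.linspace(0.0, 10.0, 120)   # the level drifts steadily upwards

print("noise stable:", is_stable_sketch(noise, freq=12))
print("trend stable:", is_stable_sketch(trend, freq=12))
```

Because the metric is not scaled in this sketch, the same alpha can give different answers for series measured in different units; whether the library standardises the input first is not stated here.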
Source code in src/ts_stat_tests/stability/tests.py
is_lumpy
is_lumpy(
    data: Union[NDArray[float64], DataFrame, Series],
    freq: int = 1,
    alpha: float = 0.5,
) -> bool
Summary
Determine if a time series is lumpy based on a variance threshold.
Details
A time series is considered lumpy if the variance of its windowed variances (the lumpiness metric) exceeds a specified threshold (alpha). High lumpiness values indicate that the volatility of the series is inconsistently distributed across time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Union[NDArray[float64], DataFrame, Series] | The time series data to analyse. | required |
| freq | int | The number of observations per seasonal period or the desired window size for tiling. | 1 |
| alpha | float | The threshold for lumpiness. The series is considered lumpy if the calculated lumpiness metric is greater than this value. | 0.5 |
Returns:
| Type | Description |
|---|---|
| bool | True if the calculated lumpiness metric is greater than alpha, otherwise False. |
Examples
- Example 1: Check if airline data is lumpy
- Example 2: Check if random noise is lumpy
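The lumpiness rule described in the Details above can be sketched in the same spirit. Again this is a NumPy-only illustration rather than the library's code: `lumpiness_metric` and `is_lumpy_sketch` are hypothetical names, and the sample variance (`ddof=1`), the dropped partial window, and the `0.0` fallback for windows shorter than two observations are assumptions.

```python
import numpy as np


def lumpiness_metric(series, freq):
    """Variance of the variances of non-overlapping windows of length `freq`."""
    values = np.asarray(series, dtype=float)
    n_tiles = len(values) // freq
    if n_tiles < 2 or freq < 2:
        return 0.0  # assumption: a window variance is undefined or cannot vary
    tiles = values[: n_tiles * freq].reshape(n_tiles, freq)
    tile_vars = tiles.var(axis=1, ddof=1)  # assumption: sample variance
    return float(np.var(tile_vars, ddof=1))


def is_lumpy_sketch(series, freq=1, alpha=0.5):
    """Hypothetical stand-in for the documented rule: lumpy if metric > alpha."""
    return lumpiness_metric(series, freq) > alpha


rng = np.random.default_rng(0)
calm = rng.normal(scale=0.5, size=120)  # same volatility throughout
bursty = calm.copy()
bursty[48:60] *= 10.0                   # one window with tenfold volatility

print("calm lumpy:  ", is_lumpy_sketch(calm, freq=12))
print("bursty lumpy:", is_lumpy_sketch(bursty, freq=12))
```

Note that the comparison runs in the opposite direction to the stability check: a series passes `is_lumpy_sketch` when its metric is above alpha.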
Source code in src/ts_stat_tests/stability/tests.py
ts_stat_tests.stability.algorithms
Summary
This module provides algorithms to measure the stability and lumpiness of time series data.
stability
stability(
    data: Union[NDArray[float64], DataFrame, Series],
    freq: int = 1,
) -> float
Summary
Measure the stability of a time series by calculating the variance of the means across non-overlapping windows.
Details
Stability is a feature extracted from time series data that quantifies how much the mean level of the series changes over time. It is particularly useful for identifying series with structural breaks or varying levels.
The series is divided into non-overlapping "tiles" (windows) of length equal to the specified frequency. The mean of each tile is computed, and the stability is defined as the variance of these means. A higher value indicates lower stability (greater changes in the mean level).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Union[NDArray[float64], DataFrame, Series] | The time series data to analyse. | required |
| freq | int | The number of observations per seasonal period or the desired window size for tiling. | 1 |
Returns:
| Type | Description |
|---|---|
| float | The calculated stability value. |
Examples
- Example 1: Measure stability of airline data
- Example 2: Measure stability of random noise
Calculation
The stability \(S\) is calculated by:
- Dividing the time series \(X\) into \(k\) non-overlapping windows \(W_1, W_2, \dots, W_k\) of size \(\text{freq}\).
- Computing the mean \(\mu_i\) for each window \(W_i\).
- Calculating the variance of these means:

$$ S = \text{Var}(\mu_1, \mu_2, \dots, \mu_k) $$
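The three steps above can be traced by hand on a tiny series. The snippet below is a minimal sketch with NumPy, assuming that \(\text{Var}\) denotes the sample variance (`ddof=1`) and that any trailing partial window is dropped; neither choice is confirmed by the text above, and the numbers are made up for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
freq = 3

# Step 1: divide X into k non-overlapping windows of size freq.
k = len(x) // freq
windows = x[: k * freq].reshape(k, freq)  # rows: [1,2,3], [4,5,6], [7,8,9]

# Step 2: compute the mean of each window.
mu = windows.mean(axis=1)  # [2., 5., 8.]

# Step 3: take the variance of the window means (sample variance assumed).
S = np.var(mu, ddof=1)  # ((2-5)^2 + 0 + (8-5)^2) / 2 = 9.0

print(mu, S)
```

With the population variance (`ddof=0`) the same series would give 6.0 instead of 9.0, so the choice of estimator matters when comparing values against a fixed threshold.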
References
- Hyndman, R.J., Wang, X., & Laptev, N. (2015). Large-scale unusual time series detection. In Proceedings of the IEEE International Conference on Data Mining (ICDM 2015).
Source code in src/ts_stat_tests/stability/algorithms.py
lumpiness
lumpiness(
    data: Union[NDArray[float64], DataFrame, Series],
    freq: int = 1,
) -> float
Summary
Measure the lumpiness of a time series by calculating the variance of the variances across non-overlapping windows.
Details
Lumpiness quantifies the extent to which the variance of a time series changes over time. It is useful for detecting series with "lumpy" patterns, where volatility is concentrated in certain periods.
Similar to stability, the series is divided into non-overlapping tiles of length freq. Instead of means, the variance of each tile is computed. The lumpiness is defined as the variance of these tile variances. A higher value indicates greater "lumpiness" or inconsistent volatility.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Union[NDArray[float64], DataFrame, Series] | The time series data to analyse. | required |
| freq | int | The number of observations per seasonal period or the desired window size for tiling. | 1 |
Returns:
| Type | Description |
|---|---|
| float | The calculated lumpiness value. |
Examples
- Example 1: Measure lumpiness of airline data
- Example 2: Measure lumpiness of random noise
Calculation
The lumpiness \(L\) is calculated by:
- Dividing the time series \(X\) into \(k\) non-overlapping windows \(W_1, W_2, \dots, W_k\) of size \(\text{freq}\).
- Computing the variance \(\sigma^2_i\) for each window \(W_i\).
- Calculating the variance of these variances:

$$ L = \text{Var}(\sigma^2_1, \sigma^2_2, \dots, \sigma^2_k) $$
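The steps above can likewise be followed on a small made-up series whose windows have the same shape but growing spread. This is a sketch only, assuming the sample variance (`ddof=1`) at both levels and that a trailing partial window is dropped; the source does not state which estimator the library uses.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 3.0, 6.0, 9.0])
freq = 3

# Step 1: divide X into k non-overlapping windows of size freq.
k = len(x) // freq
windows = x[: k * freq].reshape(k, freq)  # rows: [1,2,3], [2,4,6], [3,6,9]

# Step 2: compute the variance of each window (sample variance assumed).
sigma2 = windows.var(axis=1, ddof=1)  # [1., 4., 9.]

# Step 3: take the variance of the window variances.
L = np.var(sigma2, ddof=1)  # 49/3, roughly 16.33

print(sigma2, L)
```

A window of length 1 has no sample variance, so this sketch only makes sense for a window size of at least 2.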
References
- Hyndman, R.J., Wang, X., & Laptev, N. (2015). Large-scale unusual time series detection. In Proceedings of the IEEE International Conference on Data Mining (ICDM 2015).
Source code in src/ts_stat_tests/stability/algorithms.py