Coverage for src / ts_stat_tests / seasonality / algorithms.py: 100%
76 statements
coverage.py v7.13.2, created at 2026-02-01 09:48 +0000
# ============================================================================ #
# #
# Title: Seasonality Algorithms #
# Purpose: Algorithms for testing seasonality in time series data. #
# #
# ============================================================================ #


# ---------------------------------------------------------------------------- #
# #
# Overview ####
# #
# ---------------------------------------------------------------------------- #


# ---------------------------------------------------------------------------- #
# Description ####
# ---------------------------------------------------------------------------- #
"""
!!! note "Summary"
    Seasonality tests are statistical tests used to determine whether a time series exhibits seasonal patterns or cycles. Seasonality refers to regular, predictable fluctuations that recur at fixed intervals, such as daily, weekly, monthly, or yearly.

    Seasonality tests help identify whether a time series has a seasonal component that needs to be accounted for in forecasting models. By detecting seasonality, analysts can choose models that capture these patterns and improve forecast accuracy.

    Common seasonality tests include the QS test, the OCSB test, and the Canova-Hansen test. The QS test examines the autocorrelation of the series at seasonal lags, whereas the OCSB and Canova-Hansen tests are regression-based tests of whether seasonal differencing is needed. This module also provides three descriptive measures (`seasonal_strength`, `trend_strength`, and `spikiness`) that are computed from an additive decomposition of the series.
"""
# ---------------------------------------------------------------------------- #
# #
# Setup ####
# #
# ---------------------------------------------------------------------------- #


# ---------------------------------------------------------------------------- #
# Imports ####
# ---------------------------------------------------------------------------- #


# ## Python StdLib Imports ----
from typing import Optional, Union

# ## Python Third Party Imports ----
import numpy as np
from numpy.typing import ArrayLike, NDArray
from pmdarima.arima.arima import ARIMA
from pmdarima.arima.auto import auto_arima
from pmdarima.arima.seasonality import CHTest, OCSBTest
from scipy.stats import chi2
from statsmodels.tsa.seasonal import seasonal_decompose  # , STL, DecomposeResult,
from typeguard import typechecked

# ## Local First Party Imports ----
from ts_stat_tests.correlation import acf as _acf


# ---------------------------------------------------------------------------- #
# Exports ####
# ---------------------------------------------------------------------------- #


__all__: list[str] = ["qs", "ocsb", "ch", "seasonal_strength", "trend_strength", "spikiness"]


# ---------------------------------------------------------------------------- #
# #
# Algorithms ####
# #
# ---------------------------------------------------------------------------- #


@typechecked
def qs(
    x: ArrayLike,
    freq: int = 0,
    diff: bool = True,
    residuals: bool = False,
    autoarima: bool = True,
) -> Union[tuple[float, float], tuple[float, float, Optional[ARIMA]]]:
    r"""
86 !!! note "Summary"
        The $QS$ test is a seasonality test based on a modified Ljung-Box statistic that uses only the autocorrelations at the first two seasonal lags (`freq` and `2 * freq`). It is computed from the autocorrelation function (ACF) of the series, or of the residuals of a fitted non-seasonal ARIMA model, optionally after first differencing.
89 ???+ abstract "Details"
        If `residuals=False`, the `autoarima` setting is ignored.

        If `residuals=True`, a non-seasonal ARIMA model is estimated for the time series, and the residuals of the fitted model are used as input to the test statistic. If automatic order selection is used (`autoarima=True`), the Hyndman-Khandakar algorithm is employed with $\max(p) = \max(q) \leq 3$; otherwise an ARIMA(0, 1, 1) model is fitted.

        The null hypothesis is that there is no positive autocorrelation at the seasonal lags, indicating no seasonality. The alternative hypothesis is that there is significant positive autocorrelation at those lags, indicating seasonality.
        Here are the steps for performing the $QS$ test:

        1. If `residuals=True`, fit a non-seasonal ARIMA model to the data and replace the series with the model residuals, which are the differences between the observed values and the fitted values.
        1. If `diff=True`, take first differences of the series.
        1. Calculate the ACF of the resulting series at the seasonal lags `freq` and `2 * freq`. If either autocorrelation is not positive, both are set to zero.
        1. Calculate the $QS$ statistic from the squared autocorrelations at those two lags, using the formula in the Calculation section.
        1. Compare the statistic to the chi-squared distribution with 2 degrees of freedom. If the statistic exceeds the critical value (equivalently, if the p-value is below the chosen significance level), the null hypothesis is rejected, indicating evidence of seasonality.
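
The decision rule in the final step can be sketched without any dependencies. This function computes its p-value with `chi2.sf(QS, 2)`, and the chi-squared distribution with 2 degrees of freedom has the closed-form survival function $e^{-q/2}$, so the critical value at level $\alpha$ is $-2 \ln \alpha$. The helper name `qs_decision` and its `alpha` parameter are illustrative only and are not part of this module.

```python
import math


def qs_decision(stat: float, alpha: float = 0.05) -> tuple[float, float, bool]:
    # Hypothetical helper: turn a QS statistic into a reject / do-not-reject decision.
    # For chi-squared with 2 degrees of freedom, sf(q) = exp(-q / 2).
    critical_value = -2.0 * math.log(alpha)
    p_value = math.exp(-stat / 2.0)
    return critical_value, p_value, stat > critical_value


crit, pval, reject = qs_decision(8.0)
print(f"critical value={crit:.4f}, p-value={pval:.4f}, reject H0={reject}")
```

Comparing the statistic with the critical value and comparing the p-value with `alpha` always give the same decision, so either form can be used.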
105 In summary, the $QS$ test is a useful tool for determining whether a time series forecasting model has adequately accounted for seasonality in the data. By detecting any seasonality present in the residuals, it helps to ensure that the model is capturing all the important patterns in the data and making accurate predictions.
        This function is a Python implementation of the R function [`qs()`](https://rdrr.io/cran/seastests/man/qs.html) from the [`seastests`](https://cran.r-project.org/web/packages/seastests/index.html) library.
109 Params:
110 x (ArrayLike):
111 The univariate time series data to test.
        freq (int, optional):
            The number of observations per seasonal cycle (for example, `12` for monthly data). Must be at least `2`; the default of `0` raises an `AttributeError`.<br>
            Default: `0`
        diff (bool, optional):
            Whether or not to take first differences of the data (using `np.diff()`) before computing the statistic.<br>
            Default: `True`
        residuals (bool, optional):
            Whether or not to fit a non-seasonal ARIMA model and run the test on its residuals. If `True`, the fitted model is also returned.<br>
            Default: `False`
        autoarima (bool, optional):
            Whether or not to select the ARIMA order automatically with `auto_arima()`. If `False`, an ARIMA(0, 1, 1) model is fitted. Only used when `residuals=True`.<br>
            Default: `True`

    Raises:
        (AttributeError):
            If all observations in `x` are `NaN`, or if `freq` is less than `2`.
        (ValueError):
            If the series is constant (possibly after fitting the model and differencing), in which case the QS statistic cannot be computed.
131 Returns:
132 (Union[tuple[float, float], tuple[float, float, Optional[ARIMA]]]):
133 The results of the QS test.
134 - stat (float): The $\text{QS}$ score for the given data set.
            - pval (float): The p-value of the test, calculated using the survival function of the chi-squared distribution with 2 degrees of freedom (equal to $1-\text{cdf}(\ldots)$). For more info, see: [scipy.stats.chi2](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html)
136 - model (Optional[ARIMA]): The ARIMA model used in the calculation of this test. Returned if `residuals` is `True`.
138 ???+ example "Examples"
140 ```pycon {.py .python linenums="1" title="Basic usage"}
141 >>> from ts_stat_tests.utils.data import load_airline
142 >>> from ts_stat_tests.seasonality.algorithms import qs
143 >>> data = load_airline().values
144 >>> qs(data, freq=12)
145 (194.469289..., 5.909223...)
147 ```
149 ```pycon {.py .python linenums="1" title="Advanced usage"}
150 >>> from ts_stat_tests.utils.data import load_airline
151 >>> from ts_stat_tests.seasonality.algorithms import qs
152 >>> data = load_airline().values
153 >>> qs(data, freq=12, diff=True, residuals=True, autoarima=True)
154 The differences of the residuals of a non-seasonal ARIMA model are computed and used. It may be better to either only take the differences or use the residuals.
155 (101.8592..., 7.6126..., ARIMA(order=(1, 1, 1), scoring_args={}, suppress_warnings=True))
157 ```
159 ??? equation "Calculation"
        The $QS$ statistic is given by:

        $$
        QS = n (n+2) \times \sum_{k=1}^{2} \frac{\hat{r}_{k \cdot s}^2}{n - k \cdot s}
        $$

        where:

        - $n$ is the number of non-missing observations (after any differencing),
        - $s$ is the seasonal period (`freq`),
        - $r_{k \cdot s}$ is the autocorrelation at lag $k \cdot s$, and
        - $\hat{r}_{k \cdot s} = r_{k \cdot s}$ if both $r_{s}$ and $r_{2s}$ are positive, and $0$ otherwise.

        ```
        QS = n * (n + 2) * (r_s^2 / (n - s) + r_2s^2 / (n - 2 * s))
        ```
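
As a rough illustration of the statistic computed in the function body, the sketch below evaluates it in plain Python for the two seasonal lags `freq` and `2 * freq`. It assumes the biased sample autocorrelation (lag-k autocovariance divided by the lag-0 autocovariance), which may differ in detail from the `acf` helper imported by this module, and it skips the optional ARIMA and differencing steps. The names `autocorrelation` and `qs_statistic` are hypothetical.

```python
import math


def autocorrelation(y: list[float], lag: int) -> float:
    # Assumption: biased estimator, i.e. both sums are scaled by the same n.
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    numer = sum((y[t] - mean) * (y[t + lag] - mean) for t in range(n - lag))
    return numer / denom


def qs_statistic(y: list[float], freq: int) -> tuple[float, float]:
    n = len(y)
    rho = [autocorrelation(y, freq), autocorrelation(y, 2 * freq)]
    # As in the function body: a non-positive seasonal autocorrelation zeroes both terms.
    if any(r <= 0 for r in rho):
        rho = [0.0, 0.0]
    stat = n * (n + 2) * (rho[0] ** 2 / (n - freq) + rho[1] ** 2 / (n - 2 * freq))
    # chi-squared survival function with 2 degrees of freedom
    return stat, math.exp(-stat / 2.0)


# A pure period-4 cycle versus a period-2 alternation tested at lag 3.
seasonal = [math.sin(2 * math.pi * t / 4) for t in range(40)]
alternating = [(-1.0) ** t for t in range(40)]
stat_seasonal, p_seasonal = qs_statistic(seasonal, freq=4)
stat_alt, p_alt = qs_statistic(alternating, freq=3)
print(stat_seasonal, p_seasonal)
print(stat_alt, p_alt)
```

The cyclic series gives a large statistic and a p-value near zero, while the alternating series has a negative autocorrelation at lag 3, so the statistic collapses to zero and the p-value to one.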
177 ??? success "Credit"
178 - All credit goes to the [`seastests`](https://cran.r-project.org/web/packages/seastests/index.html) library.
180 ??? question "References"
181 1. Hyndman, R. J. and Y. Khandakar (2008). Automatic Time Series Forecasting: The forecast Package for R. Journal of Statistical Software 27 (3), 1-22.
182 1. Maravall, A. (2011). Seasonality Tests and Automatic Model Identification in TRAMO-SEATS. Bank of Spain.
183 1. Ollech, D. and Webel, K. (2020). A random forest-based approach to identifying the most informative seasonality tests. Deutsche Bundesbank's Discussion Paper series 55/2020.
185 ??? tip "See Also"
186 - [github/seastests/qs.R](https://github.com/cran/seastests/blob/master/R/qs.R)
187 - [rdrr/seastests/qs](https://rdrr.io/cran/seastests/man/qs.html)
188 - [rdocumentation/seastests/qs](https://www.rdocumentation.org/packages/seastests/versions/0.15.4/topics/qs)
189 - [Machine Learning Mastery/How to Identify and Remove Seasonality from Time Series Data with Python](https://machinelearningmastery.com/time-series-seasonality-with-python)
190 - [StackOverflow/Simple tests for seasonality in Python](https://stackoverflow.com/questions/62754218/simple-tests-for-seasonality-in-python)
    """

    _x: NDArray[np.float64] = np.asarray(x, dtype=float)
    if np.isnan(_x).all():
        raise AttributeError("All observations are NaN.")
    if diff and residuals:
        print(
            "The differences of the residuals of a non-seasonal ARIMA model are computed and used. "
            "It may be better to either only take the differences or use the residuals."
        )
    if freq < 2:
        raise AttributeError(f"The number of observations per cycle is '{freq}', which is too small.")

    model: Optional[ARIMA] = None

    if residuals:
        if autoarima:
            # Restrict the order search (and allow drift) for short seasonal cycles
            max_order: int = 1 if freq < 8 else 3
            allow_drift: bool = freq < 8
            try:
                model = auto_arima(
                    y=_x,
                    max_P=1,
                    max_Q=1,
                    max_p=3,
                    max_q=3,
                    seasonal=False,
                    stepwise=False,
                    max_order=max_order,
                    allow_drift=allow_drift,
                )
            except (ValueError, RuntimeError, IndexError):
                # Fall back to a fixed ARIMA(0, 1, 1) model
                try:
                    model = ARIMA(order=(0, 1, 1)).fit(y=_x)
                except (ValueError, RuntimeError, IndexError):
                    print("Could not estimate any ARIMA model, original data series is used.")
            if model is not None:
                _x = model.resid()
        else:
            try:
                model = ARIMA(order=(0, 1, 1)).fit(y=_x)
            except (ValueError, RuntimeError, IndexError):
                print("Could not estimate any ARIMA model, original data series is used.")
            if model is not None:
                _x = model.resid()

    # Do diff
    y: NDArray[np.float64] = np.diff(_x) if diff else _x

    # Pre-check
    if np.nanvar(y[~np.isnan(y)]) == 0:
        raise ValueError(
            "The Series is a constant (possibly after transformations). QS-Test cannot be computed on constants."
        )

    # Test Statistic: autocorrelations at the first two seasonal lags only
    acf_output: NDArray[np.float64] = _acf(x=y, nlags=freq * 2, missing="drop")
    rho_output: NDArray[np.float64] = acf_output[[freq, freq * 2]]
    # If either seasonal autocorrelation is not positive, both are set to zero
    rho: NDArray[np.float64] = np.array([0, 0]) if np.any(np.array(rho_output) <= 0) else rho_output
    N: int = len(y[~np.isnan(y)])
    QS: float = float(N * (N + 2) * (rho[0] ** 2 / (N - freq) + rho[1] ** 2 / (N - freq * 2)))
    Pval: float = float(chi2.sf(QS, 2))

    if residuals:
        return QS, Pval, model
    return QS, Pval


@typechecked
def ocsb(x: ArrayLike, m: int, lag_method: str = "aic", max_lag: int = 3) -> int:
    r"""
262 !!! note "Summary"
263 Compute the Osborn, Chui, Smith, and Birchenhall ($OCSB$) test for an input time series to determine whether it needs seasonal differencing. The regression equation may include lags of the dependent variable. When `lag_method="fixed"`, the lag order is fixed to `max_lag`; otherwise, `max_lag` is the maximum number of lags considered in a lag selection procedure that minimizes the `lag_method` criterion, which can be `"aic"`, `"bic"` or corrected AIC `"aicc"`.
265 ???+ abstract "Details"
        The $OCSB$ test is a regression-based test for a seasonal unit root. It is used to decide whether a time series needs seasonal differencing, which makes it a useful diagnostic tool for choosing the seasonal differencing parameter in ARIMA models.

        The null hypothesis is that a seasonal unit root exists, so that seasonal differencing is needed. The test does not produce a p-value. Instead, the test statistic is compared to a critical value.

        The test regresses the series, after both first and seasonal differencing, on lagged seasonal differences and lagged first differences, optionally together with lags of the dependent variable. The test statistic is the $t$-statistic of the coefficient on the lagged first-difference term.

        Critical values for the test are based on simulations, which have been smoothed to produce critical values for all seasonal periods.

        This function returns `1` (seasonal differencing is recommended) if the computed statistic is greater than the critical value, and `0` otherwise.
279 Params:
280 x (ArrayLike):
281 The time series vector.
282 m (int):
            The seasonal period, i.e. the number of observations per seasonal cycle. For monthly data, e.g., this would be 12. For quarterly, 4, etc. For the OCSB test to work, `m` must exceed `1`.
284 lag_method (str, optional):
285 The lag method to use. One of (`"fixed"`, `"aic"`, `"bic"`, `"aicc"`). The metric for assessing model performance after fitting a linear model.<br>
286 Default: `"aic"`
287 max_lag (int, optional):
288 The maximum lag order to be considered by `lag_method`.<br>
289 Default: `3`
291 Returns:
292 (int):
293 The seasonal differencing term. For different values of `m`, the OCSB statistic is compared to an estimated critical value, and returns 1 if the computed statistic is greater than the critical value, or 0 if not.
295 ???+ example "Examples"
297 ```pycon {.py .python linenums="1" title="Basic usage"}
298 >>> from ts_stat_tests.utils.data import load_airline
299 >>> from ts_stat_tests.seasonality.algorithms import ocsb
300 >>> data = load_airline().values
301 >>> ocsb(x=data, m=12)
302 1
304 ```
306 ??? equation "Calculation"
        The $OCSB$ test is based on a regression of the form:

        $$
        \Delta \Delta_m x_t = \beta_1 \Delta_m x_{t-1} + \beta_2 \Delta x_{t-m} + \sum_{i=1}^{p} \alpha_i \Delta \Delta_m x_{t-i} + \varepsilon_t
        $$

        where:

        - $x_t$ is the $t$-th observation in the time series,
        - $m$ is the seasonal period,
        - $\Delta x_t = x_t - x_{t-1}$ is the first difference,
        - $\Delta_m x_t = x_t - x_{t-m}$ is the seasonal difference, and
        - $p$ is the number of lags of the dependent variable, either fixed at `max_lag` or selected by `lag_method`.

        ```
        OCSB = t-statistic of beta_2 in:
            diff(diff(x, m)) ~ beta_1 * lag(diff(x, m), 1) + beta_2 * lag(diff(x), m) + lagged dependent terms
        ```

        The $OCSB$ test statistic is the $t$-statistic of $\beta_2$. Under the null hypothesis of a seasonal unit root, $\beta_2 = 0$.
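
The OCSB test of Osborn et al. (1988) is built on a regression of the doubly differenced series on a lagged seasonal difference and a lagged first difference. The sketch below only constructs and aligns those three terms in plain Python; it does not fit the regression and is not how `pmdarima.arima.OCSBTest` is implemented. The helper names `difference` and `ocsb_regression_terms` are hypothetical.

```python
def difference(series: list[float], lag: int = 1) -> list[float]:
    # series[t] - series[t - lag] for every t where both values exist
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def ocsb_regression_terms(x: list[float], m: int) -> list[tuple[float, float, float]]:
    d_m = difference(x, m)    # seasonal difference, first entry is t = m
    d_1 = difference(x, 1)    # first difference, first entry is t = 1
    dd = difference(d_m, 1)   # first difference of the seasonal difference, first entry is t = m + 1
    # For t = m + 1 + i the three terms share the same list index i:
    #   target           = dd[i]
    #   seasonal diff at t - 1 = d_m[i]
    #   first diff at t - m    = d_1[i]
    # zip() stops at the shortest list, which is dd.
    return list(zip(dd, d_m, d_1))


# Quadratic trend plus a period-4 pattern
x = [t * t + (t % 4) for t in range(12)]
rows = ocsb_regression_terms(x, m=4)
for target, lag_seasonal_diff, lag_first_diff in rows[:3]:
    print(target, lag_seasonal_diff, lag_first_diff)
```

In this example the seasonal difference removes the period-4 pattern and the additional first difference removes the remaining linear term, so every target value is the same constant.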
325 ??? success "Credit"
326 - All credit goes to the [`pmdarima`](http://alkaline-ml.com/pmdarima/index.html) library with the implementation of [`pmdarima.arima.OCSBTest`](http://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.OCSBTest.html).
328 ??? question "References"
329 - Osborn DR, Chui APL, Smith J, and Birchenhall CR (1988) "Seasonality and the order of integration for consumption", Oxford Bulletin of Economics and Statistics 50(4):361-377.
330 - R's forecast::OCSB test source code: https://bit.ly/2QYQHno
332 ??? tip "See Also"
333 - [pmdarima.arima.OCSBTest](http://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.OCSBTest.html)
    """
    return OCSBTest(m=m, lag_method=lag_method, max_lag=max_lag).estimate_seasonal_differencing_term(x)


@typechecked
def ch(x: ArrayLike, m: int) -> int:
340 r"""
341 !!! note "Summary"
342 The Canova-Hansen test for seasonal differences. Canova and Hansen (1995) proposed a test statistic for the null hypothesis that the seasonal pattern is stable. The test statistic can be formulated in terms of seasonal dummies or seasonal cycles. The former allows us to identify seasons (e.g. months or quarters) that are not stable, while the latter tests the stability of seasonal cycles (e.g. cycles of period 2 and 4 quarters in quarterly data).
344 !!! warning "Warning"
345 This test is generally not used directly, but in conjunction with `pmdarima.arima.nsdiffs()`, which directly estimates the number of seasonal differences.
347 ???+ abstract "Details"
        The $CH$ test (also known as the Canova-Hansen test) is a statistical test of whether the seasonal pattern in a time series is stable over time. The null hypothesis is that the seasonal pattern is stable (deterministic seasonality), while the alternative hypothesis is that the seasonal pattern changes over time, in which case seasonal differencing is appropriate.

        The test statistic is compared to a tabulated critical value that depends on the seasonal period `m`. If the test statistic exceeds the critical value, the null hypothesis of stable seasonality is rejected.

        The $CH$ test is based on the following steps:

        1. Regress the time series on a set of seasonal regressors (seasonal dummies or trigonometric seasonal cycles) and compute the residuals.
        1. Form the cumulative sums of the products of the seasonal regressors and the residuals.
        1. Estimate the long-run covariance matrix of those products.
        1. Compute the test statistic $CH$ using the formula in the Calculation section.
        1. Compare the test statistic to the critical value for the seasonal period `m`. If the test statistic exceeds the critical value, reject the null hypothesis of stable seasonality and return `1`; otherwise return `0`.

        The $CH$ test accounts for both the presence and the nature of seasonality. However, it assumes that the time series is stationary apart from its seasonal behaviour, and it may not be effective for non-stationary or irregular time series data. Additionally, it may not work well for short series or for low seasonal amplitudes. Therefore, it should be used in conjunction with other tests and techniques for detecting seasonality in time series data.
363 Params:
364 x (ArrayLike):
365 The time series vector.
366 m (int):
367 The seasonal differencing term. For monthly data, e.g., this would be 12. For quarterly, 4, etc. For the Canova-Hansen test to work, `m` must exceed 1.
369 Returns:
370 (int):
371 The seasonal differencing term.
373 The $CH$ test defines a set of critical values:
375 ```
376 (0.4617146, 0.7479655, 1.0007818,
377 1.2375350, 1.4625240, 1.6920200,
378 1.9043096, 2.1169602, 2.3268562,
379 2.5406922, 2.7391007)
380 ```
382 For different values of `m`, the $CH$ statistic is compared to the corresponding critical value, and returns 1 if the computed statistic is greater than the critical value, or 0 if not.
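
The comparison described above can be sketched as a simple table lookup. The mapping used here, where the first critical value belongs to `m = 2` and the last to `m = 12`, is an assumption inferred from the table having eleven entries, and larger periods are deliberately not handled. The function name `ch_decision` is hypothetical and is not part of `pmdarima`.

```python
CH_CRITICAL_VALUES = (
    0.4617146, 0.7479655, 1.0007818,
    1.2375350, 1.4625240, 1.6920200,
    1.9043096, 2.1169602, 2.3268562,
    2.5406922, 2.7391007,
)


def ch_decision(stat: float, m: int) -> int:
    # Assumption: entry i of the table corresponds to seasonal period m = i + 2.
    if not 2 <= m <= 12:
        raise ValueError("this sketch only covers seasonal periods 2 <= m <= 12")
    return int(stat > CH_CRITICAL_VALUES[m - 2])


print(ch_decision(1.0, m=4))   # statistic below the critical value for m = 4
print(ch_decision(3.0, m=12))  # statistic above the critical value for m = 12
```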
384 ???+ example "Examples"
386 ```pycon {.py .python linenums="1" title="Basic usage"}
387 >>> from ts_stat_tests.utils.data import load_airline
388 >>> from ts_stat_tests.seasonality.algorithms import ch
389 >>> data = load_airline().values
390 >>> ch(x=data, m=12)
391 0
393 ```
395 ??? equation "Calculation"
        The test statistic for the $CH$ test is a Lagrange multiplier statistic:

        $$
        CH = \frac{1}{n^2} \operatorname{tr} \left[ \left( A^\top \hat{\Omega} A \right)^{-1} A^\top \left( \sum_{t=1}^{n} \hat{F}_t \hat{F}_t^\top \right) A \right]
        $$

        where:

        - $n$ is the sample size,
        - $\hat{F}_t = \sum_{i=1}^{t} f_i \hat{e}_i$ is the cumulative sum of the seasonal regressors $f_i$ multiplied by the regression residuals $\hat{e}_i$,
        - $\hat{\Omega}$ is an estimate of the long-run covariance matrix of $f_t \hat{e}_t$, and
        - $A$ is a selection matrix that picks the seasonal frequencies (or seasons) under test.

        ```
        CH = (1 / n^2) * trace( inv(A' * Omega * A) * A' * sum_t(F_t * F_t') * A )
        ```
418 ??? success "Credit"
419 - All credit goes to the [`pmdarima`](http://alkaline-ml.com/pmdarima/index.html) library with the implementation of [`pmdarima.arima.CHTest`](http://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.CHTest.html).
421 ??? question "References"
422 - Testing for seasonal stability using the Canova and Hansen test statistic: http://bit.ly/2wKkrZo
423 - R source code for CH test: https://github.com/robjhyndman/forecast/blob/master/R/arima.R#L148
425 ??? tip "See Also"
426 - [`pmdarima.arima.CHTest`](http://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.CHTest.html)
    """
    return CHTest(m=m).estimate_seasonal_differencing_term(x)


@typechecked
def seasonal_strength(x: ArrayLike, m: int) -> float:
433 r"""
434 !!! note "Summary"
        Seasonal strength is a descriptive measure of how strongly a time series is driven by its seasonal component. It measures how much of the variation that remains after removing the trend is explained by the seasonal component.

    ???+ abstract "Details"

        The measure is the seasonal strength index ($SSI$).

        The $SSI$ ranges between $0$ and $1$, with higher values indicating stronger seasonality in the data. It is not a formal hypothesis test: this function returns the index only and does not compare it to a critical value, so any threshold for calling a series seasonal is a modelling choice.

        The calculation involves the following steps:

        1. Decompose the time series data into its seasonal, trend, and residual components. This function uses the additive moving-average decomposition provided by `statsmodels.tsa.seasonal.seasonal_decompose`.
        1. Compute the variance of the seasonal component $Var(S)$ and the variance of the residual component $Var(R)$, ignoring missing values.
        1. Compute the $SSI$ using the formula in the Calculation section.

        The $SSI$ is simple and intuitive. However, it assumes that the seasonal component is additive and repeats identically from cycle to cycle. It may not be effective for detecting complex seasonal patterns or seasonality in non-stationary or irregular time series data. Therefore, it should be used in conjunction with other tests and techniques for detecting seasonality in time series data.
452 Params:
453 x (ArrayLike):
454 The time series vector.
455 m (int):
            The seasonal period, i.e. the number of observations per seasonal cycle. For monthly data, e.g., this would be 12. For quarterly, 4, etc. For the seasonal strength calculation to work, `m` must exceed 1.
458 Returns:
459 (float):
460 The seasonal strength value.
462 ???+ example "Examples"
464 ```pycon {.py .python linenums="1" title="Basic usage"}
465 >>> from ts_stat_tests.utils.data import load_airline
466 >>> from ts_stat_tests.seasonality.algorithms import seasonal_strength
467 >>> data = load_airline().values
468 >>> seasonal_strength(x=data, m=12)
469 0.778721...
471 ```
473 ??? equation "Calculation"
475 The $SSI$ is computed using the following formula:
477 $$
478 SSI = \frac {Var(S)} {Var(S) + Var(R)}
479 $$
481 where:
483 - $Var(S)$ is the variance of the seasonal component, and
484 - $Var(R)$ is the variance of the residual component obtained after decomposing the time series data into its seasonal, trend, and residual components using a method such as STL or moving average decomposition.
486 ```
487 SSI = Var(S) / (Var(S) + Var(R))
488 ```
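
To make the ratio concrete, the following sketch evaluates it on hand-made seasonal and residual components instead of a real decomposition. It uses the population variance and drops `NaN` values, which is intended to mirror the `np.nanvar` calls in the function body. The helper name `seasonal_strength_index` is hypothetical.

```python
import math
from statistics import pvariance


def seasonal_strength_index(seasonal: list[float], residual: list[float]) -> float:
    # Drop NaN padding first, then take population variances (ddof = 0).
    s = [v for v in seasonal if not math.isnan(v)]
    r = [v for v in residual if not math.isnan(v)]
    var_s, var_r = pvariance(s), pvariance(r)
    return var_s / (var_s + var_r)


# Made-up components: seasonal variance 4, residual variance 1
seasonal = [2.0, -2.0, 2.0, -2.0, 2.0, -2.0]
residual = [float("nan"), 1.0, -1.0, 1.0, -1.0, float("nan")]
ssi = seasonal_strength_index(seasonal, residual)
print(ssi)
```

With a seasonal variance of 4 and a residual variance of 1, the index is 4 / (4 + 1).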
490 ??? success "Credit"
491 - Inspired by the `tsfeatures` library in both [`Python`](https://github.com/Nixtla/tsfeatures) and [`R`](http://pkg.robjhyndman.com/tsfeatures/).
493 ??? question "References"
494 - Wang, X, Hyndman, RJ, Smith-Miles, K (2007) "Rule-based forecasting filters using time series features", Computational Statistics and Data Analysis, 52(4), 2244-2259.
496 ??? tip "See Also"
497 - [`tsfeatures.stl_features`](https://github.com/Nixtla/tsfeatures/blob/main/tsfeatures/tsfeatures.py)
    """
    decomposition = seasonal_decompose(x=x, period=m, model="additive")
    seasonal = np.nanvar(decomposition.seasonal)
    residual = np.nanvar(decomposition.resid)
    return float(seasonal / (seasonal + residual))


@typechecked
def trend_strength(x: ArrayLike, m: int) -> float:
507 r"""
508 !!! note "Summary"
        Trend strength is a descriptive measure of how strongly a time series is driven by its trend component. It measures how much of the variation that remains after removing the seasonal component is explained by the trend.

    ???+ abstract "Details"

        The measure is the trend strength index ($TSI$).

        The $TSI$ ranges between $0$ and $1$, with higher values indicating a stronger trend in the data. It is not a formal hypothesis test: this function returns the index only and does not compare it to a critical value, so any threshold for calling a series trended is a modelling choice.

        The calculation involves the following steps:

        1. Decompose the time series data into its trend, seasonal, and residual components. This function uses the additive moving-average decomposition provided by `statsmodels.tsa.seasonal.seasonal_decompose`.
        1. Compute the variance of the trend component, denoted by $Var(T)$.
        1. Compute the variance of the residual component, denoted by $Var(R)$.
        1. Compute the $TSI$ using the formula in the Calculation section.

        The $TSI$ is a useful summary of how dominant the trend is. However, it depends on the chosen decomposition and on the period `m`, and it may be unreliable for short series, because the moving-average trend is undefined near both ends of the series. Therefore, it should be used in conjunction with other tests and techniques for detecting trend in time series data.
527 Params:
528 x (ArrayLike):
529 The time series vector.
530 m (int):
531 The frequency of the time series data set. For the trend strength test to work, `m` must exceed 1.
533 Returns:
534 (float):
535 The trend strength score.
537 ???+ example "Examples"
539 ```pycon {.py .python linenums="1" title="Basic usage"}
540 >>> from ts_stat_tests.utils.data import load_airline
541 >>> from ts_stat_tests.seasonality.algorithms import trend_strength
542 >>> data = load_airline().values
543 >>> trend_strength(x=data, m=12)
544 0.965679...
546 ```
548 ??? equation "Calculation"
550 The trend strength test involves computing the trend strength index ($TSI$) using the following formula:
552 $$
553 TSI = \frac{ Var(T) } { Var(T) + Var(R) }
554 $$
556 where:
558 - $Var(T)$ is the variance of the trend component, and
559 - $Var(R)$ is the variance of the residual component obtained after decomposing the time series data into its trend, seasonal, and residual components using a method such as STL or moving average decomposition.
561 ```
562 TSI = Var(T) / (Var(T) + Var(R))
563 ```
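
The sketch below walks through the whole calculation in plain Python: a centred moving-average trend, seasonal averages of the detrended series, the residual, and finally the variance ratio. It is meant to resemble a classical additive decomposition and is an assumption about the general approach, not a reproduction of `statsmodels.tsa.seasonal.seasonal_decompose`. All function names are hypothetical.

```python
import math
from statistics import pvariance


def centered_moving_average(x: list[float], m: int) -> list[float]:
    # Even periods use m + 1 points with half weights at both ends; odd periods use m points.
    weights = [0.5] + [1.0] * (m - 1) + [0.5] if m % 2 == 0 else [1.0] * m
    half = len(weights) // 2
    trend = [math.nan] * len(x)  # the trend is undefined near both ends
    for t in range(half, len(x) - half):
        window = x[t - half : t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window)) / m
    return trend


def classical_additive(x: list[float], m: int):
    trend = centered_moving_average(x, m)
    detrended = [v - t for v, t in zip(x, trend)]
    # Average the detrended values for each position within the cycle.
    means = []
    for pos in range(m):
        vals = [d for i, d in enumerate(detrended) if i % m == pos and not math.isnan(d)]
        means.append(sum(vals) / len(vals))
    centre = sum(means) / m  # centre the seasonal effects around zero
    seasonal = [means[i % m] - centre for i in range(len(x))]
    residual = [v - t - s for v, t, s in zip(x, trend, seasonal)]
    return trend, seasonal, residual


def trend_strength_index(trend: list[float], residual: list[float]) -> float:
    t = [v for v in trend if not math.isnan(v)]
    r = [v for v in residual if not math.isnan(v)]
    var_t, var_r = pvariance(t), pvariance(r)
    return var_t / (var_t + var_r)


# Linear trend plus an exact period-4 pattern, with no noise
pattern = [1.0, -1.0, 2.0, -2.0]
x = [0.5 * t + pattern[t % 4] for t in range(24)]
trend, seasonal, residual = classical_additive(x, m=4)
tsi = trend_strength_index(trend, residual)
print(tsi)
```

Because the example series contains no noise, the residual is essentially zero and the index is essentially one. Adding noise to `x` lowers it.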
565 ??? success "Credit"
566 - Inspired by the `tsfeatures` library in both [`Python`](https://github.com/Nixtla/tsfeatures) and [`R`](http://pkg.robjhyndman.com/tsfeatures/).
568 ??? question "References"
569 - Wang, X, Hyndman, RJ, Smith-Miles, K (2007) "Rule-based forecasting filters using time series features", Computational Statistics and Data Analysis, 52(4), 2244-2259.
571 ??? tip "See Also"
572 - [`tsfeatures.stl_features`](https://github.com/Nixtla/tsfeatures/blob/main/tsfeatures/tsfeatures.py)
    """
    decomposition = seasonal_decompose(x=x, period=m, model="additive")
    trend = np.nanvar(decomposition.trend)
    residual = np.nanvar(decomposition.resid)
    return float(trend / (trend + residual))


@typechecked
def spikiness(x: ArrayLike, m: int) -> float:
582 r"""
583 !!! note "Summary"
        Spikiness is a descriptive measure of how large the irregular (residual) movements in a time series are relative to its seasonal movements. It aims to flag series with spikes or sudden changes that may indicate important events or anomalies in the underlying process.

    ???+ abstract "Details"

        The measure is the spikiness index ($SI$). The $SI$ compares the average magnitude of the residual component with the average magnitude of the seasonal component. A higher $SI$ value indicates a more spiky or volatile time series, while a lower $SI$ value indicates a smoother or less volatile time series.

        The calculation involves the following steps:

        1. Decompose the time series data into its seasonal, trend, and residual components. This function uses the additive moving-average decomposition provided by `statsmodels.tsa.seasonal.seasonal_decompose`.
        1. Compute the mean absolute value of the residual component ($MADR$).
        1. Compute the mean absolute value of the seasonal component ($MADS$).
        1. Compute the spikiness index ($SI$) using the formula in the Calculation section.

        The $SI$ can be used in conjunction with other tests and techniques for detecting spikes in time series data, such as change point analysis and outlier detection. Because it is a ratio to the seasonal component, it becomes unstable when the series has little or no seasonality ($MADS$ close to zero).
599 Params:
600 x (ArrayLike):
601 The time series vector.
602 m (int):
603 The frequency of the time series data set. For the spikiness test to work, `m` must exceed 1.
605 Returns:
606 (float):
607 The spikiness score.
609 ???+ example "Examples"
611 ```pycon {.py .python linenums="1" title="Basic usage"}
612 >>> from ts_stat_tests.utils.data import load_airline
613 >>> from ts_stat_tests.seasonality.algorithms import spikiness
614 >>> data = load_airline().values
615 >>> spikiness(x=data, m=12)
616 0.484221...
618 ```
620 ??? equation "Calculation"
622 The spikiness test involves computing the spikiness index ($SI$) using the following formula:
624 $$
625 SI = \frac {MADR} {MADS}
626 $$
628 where:
        - $MADR$ is the mean absolute value of the residuals (their mean absolute deviation about zero), and
        - $MADS$ is the mean absolute value of the seasonal component.
633 ```
634 SI = MADR / MADS
635 ```
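
A minimal sketch of this ratio on hand-made components is shown below. It averages absolute values after dropping `NaN` entries, which is intended to mirror the `np.nanmean(np.abs(...))` calls in the function body. The helper names `mean_abs` and `spikiness_index` are hypothetical.

```python
import math


def mean_abs(values: list[float]) -> float:
    # Mean of absolute values, ignoring NaN padding from the decomposition.
    kept = [abs(v) for v in values if not math.isnan(v)]
    return sum(kept) / len(kept)


def spikiness_index(residual: list[float], seasonal: list[float]) -> float:
    return mean_abs(residual) / mean_abs(seasonal)


# Made-up components: one residual spike of size 3 among small values
residual = [math.nan, 0.5, -0.5, 3.0, -1.0, math.nan]
seasonal = [2.0, -2.0, 2.0, -2.0, 2.0, -2.0]
si = spikiness_index(residual, seasonal)
print(si)
```

Here the residual magnitudes average 1.25 and the seasonal magnitudes average 2, so the index is 0.625. A single large spike raises the numerator only in proportion to its share of the observations.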
637 ??? success "Credit"
        - All credit to the [`tsfeatures`](http://pkg.robjhyndman.com/tsfeatures/) library. This code is a direct copy+paste from the [`tsfeatures.py`](https://github.com/Nixtla/tsfeatures/blob/master/tsfeatures/tsfeatures.py) module.<br>It is not possible to refer directly to a `spikiness` function in the `tsfeatures` package because this calculation is embedded within their `stl_features` function. Therefore, it is necessary to copy it here.
640 ??? question "References"
641 - Wang, X, Hyndman, RJ, Smith-Miles, K (2007) "Rule-based forecasting filters using time series features", Computational Statistics and Data Analysis, 52(4), 2244-2259.
643 ??? tip "See Also"
644 - [`tsfeatures.stl_features`](https://github.com/Nixtla/tsfeatures/blob/main/tsfeatures/tsfeatures.py)
    """
    decomposition = seasonal_decompose(x=x, model="additive", period=m)
    madr = np.nanmean(np.abs(decomposition.resid))
    mads = np.nanmean(np.abs(decomposition.seasonal))
    return float(madr / mads)