Coverage for src / ts_stat_tests / seasonality / algorithms.py: 100%

76 statements  

coverage.py v7.13.2, created at 2026-02-01 09:48 +0000

1# ============================================================================ # 

2# # 

3# Title: Seasonality Algorithms # 

4# Purpose: Algorithms for testing seasonality in time series data. # 

5# # 

6# ============================================================================ # 

7 

8 

9# ---------------------------------------------------------------------------- # 

10# # 

11# Overview #### 

12# # 

13# ---------------------------------------------------------------------------- # 

14 

15 

16# ---------------------------------------------------------------------------- # 

17# Description #### 

18# ---------------------------------------------------------------------------- # 

19 

20 

21""" 

22!!! note "Summary" 

23 Seasonality tests are statistical tests used to determine whether a time series exhibits seasonal patterns or cycles. Seasonality refers to the regular and predictable fluctuations in a time series that occur at specific intervals, such as daily, weekly, monthly, or yearly. 

24 

25 Seasonality tests help identify whether a time series has a seasonal component that needs to be accounted for in forecasting models. By detecting seasonality, analysts can choose appropriate models that capture these patterns and improve the accuracy of their forecasts. 

26 

27 Common seasonality tests include the QS test, OCSB test, Canova-Hansen test, and others. These tests analyze the autocorrelation structure of the time series data to identify significant seasonal patterns. 

28 

29 Overall, seasonality tests are essential tools in time series analysis and forecasting, as they help identify and account for seasonal patterns that can significantly impact the accuracy of predictions. 

30""" 

31 

32 

33# ---------------------------------------------------------------------------- # 

34# # 

35# Setup #### 

36# # 

37# ---------------------------------------------------------------------------- # 

38 

39 

40# ---------------------------------------------------------------------------- # 

41# Imports #### 

42# ---------------------------------------------------------------------------- # 

43 

44 

45# ## Python StdLib Imports ---- 

46from typing import Optional, Union 

47 

48# ## Python Third Party Imports ---- 

49import numpy as np 

50from numpy.typing import ArrayLike, NDArray 

51from pmdarima.arima.arima import ARIMA 

52from pmdarima.arima.auto import auto_arima 

53from pmdarima.arima.seasonality import CHTest, OCSBTest 

54from scipy.stats import chi2 

55from statsmodels.tsa.seasonal import seasonal_decompose # , STL, DecomposeResult, 

56from typeguard import typechecked 

57 

58# ## Local First Party Imports ---- 

59from ts_stat_tests.correlation import acf as _acf 

60 

61 

62# ---------------------------------------------------------------------------- # 

63# Exports #### 

64# ---------------------------------------------------------------------------- # 

65 

66 

67__all__: list[str] = ["qs", "ocsb", "ch", "seasonal_strength", "trend_strength", "spikiness"] 

68 

69 

70# ---------------------------------------------------------------------------- # 

71# # 

72# Algorithms #### 

73# # 

74# ---------------------------------------------------------------------------- # 

75 

76 

77@typechecked 

78def qs( 

79 x: ArrayLike, 

80 freq: int = 0, 

81 diff: bool = True, 

82 residuals: bool = False, 

83 autoarima: bool = True, 

84) -> Union[tuple[float, float], tuple[float, float, Optional[ARIMA]]]: 

85 r""" 

86 !!! note "Summary" 

87 The $QS$ test, a variant of the Ljung-Box test computed on seasonal lags, is a statistical test used to determine whether seasonality is present in a time series or in the residuals of a forecasting model. It is based on the autocorrelation function (ACF), which measures how correlated the values are at different lags. Here only the lags `freq` and `2 * freq` are used, and the statistic is zero unless both of these autocorrelations are positive.

88 

89 ???+ abstract "Details" 

90 

91 If `residuals=False` the `autoarima` settings are ignored. 

92 

93 If `residuals=True`, a non-seasonal ARIMA model is estimated for the time series, and the residuals of the fitted model are used as input to the test statistic. If automatic order selection is used (`autoarima=True`), the Hyndman-Khandakar algorithm is employed with $\max(p) = \max(q) \le 3$; otherwise an ARIMA(0, 1, 1) model is fitted.

94 

95 The null hypothesis is that there is no correlation in the residuals beyond the specified lags, indicating no seasonality. The alternative hypothesis is that there is significant correlation, indicating seasonality. 

96 

97 Here are the steps for performing the $QS$ test: 

98 

99 1. Fit a time series model to your data, such as an ARIMA or SARIMA model. 

100 1. Calculate the residuals, which are the differences between the observed values and the predicted values from the model. 

101 1. Calculate the ACF of the residuals. 

102 1. Calculate the Q statistic, a weighted sum of squared autocorrelations, using the formula in the Calculation section below.

103 1. Compare the Q statistic to the critical value from the chi-squared distribution. This implementation evaluates the statistic with 2 degrees of freedom (`chi2.sf(QS, 2)`). If the Q statistic is greater than the critical value, then the null hypothesis is rejected, indicating that there is evidence of seasonality in the residuals.

104 

105 In summary, the $QS$ test is a useful tool for determining whether a time series forecasting model has adequately accounted for seasonality in the data. By detecting any seasonality present in the residuals, it helps to ensure that the model is capturing all the important patterns in the data and making accurate predictions. 

106 

107 This function is a Python implementation of the R function [`qs()`](https://rdrr.io/cran/seastests/man/qs.html) from the [`seastests`](https://cran.r-project.org/web/packages/seastests/index.html) library.

108 

109 Params: 

110 x (ArrayLike): 

111 The univariate time series data to test. 

112 freq (int, optional): 

113 The frequency of the time series data.<br> 

114 Default: `0` 

115 diff (bool, optional): 

116 Whether or not to run `np.diff()` over the data.<br> 

117 Default: `True` 

118 residuals (bool, optional): 

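119 Whether or not to compute the test on the residuals of a non-seasonal ARIMA model. If `True`, the fitted model is also returned.<br>

120 Default: `False`

121 autoarima (bool, optional):

122 Whether or not to select the ARIMA order automatically with `auto_arima()`. Only used when `residuals=True`.<br>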

123 Default: `True` 

124 

125 Raises: 

126 (AttributeError): 

127 If `x` is empty or contains only `np.nan` values, or if `freq` is less than `2`.

128 (ValueError):

129 If the series is constant (possibly after fitting the ARIMA model and differencing with `np.diff()`), in which case the QS test cannot be computed.

130 

131 Returns: 

132 (Union[tuple[float, float], tuple[float, float, Optional[ARIMA]]]): 

133 The results of the QS test. 

134 - stat (float): The $\text{QS}$ score for the given data set. 

135 - pval (float): The p-value of the given test. Calculated using the survival function of the chi-squared distribution with 2 degrees of freedom (also defined as $1-\text{cdf(...)}$). For more info, see: [scipy.stats.chi2](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html)

136 - model (Optional[ARIMA]): The ARIMA model used in the calculation of this test. Returned if `residuals` is `True`. 

137 

138 ???+ example "Examples" 

139 

140 ```pycon {.py .python linenums="1" title="Basic usage"} 

141 >>> from ts_stat_tests.utils.data import load_airline 

142 >>> from ts_stat_tests.seasonality.algorithms import qs 

143 >>> data = load_airline().values 

144 >>> qs(data, freq=12) 

145 (194.469289..., 5.909223...) 

146 

147 ``` 

148 

149 ```pycon {.py .python linenums="1" title="Advanced usage"} 

150 >>> from ts_stat_tests.utils.data import load_airline 

151 >>> from ts_stat_tests.seasonality.algorithms import qs 

152 >>> data = load_airline().values 

153 >>> qs(data, freq=12, diff=True, residuals=True, autoarima=True) 

154 The differences of the residuals of a non-seasonal ARIMA model are computed and used. It may be better to either only take the differences or use the residuals. 

155 (101.8592..., 7.6126..., ARIMA(order=(1, 1, 1), scoring_args={}, suppress_warnings=True)) 

156 

157 ``` 

158 

159 ??? equation "Calculation" 

160 

161 The $QS$ statistic, as computed by this function, is given by:

162

163 $$

164 QS = n (n+2) \times \sum_{k=1}^{2} \frac{r_{k m}^2}{n - k m}

165 $$

166

167 where:

168

169 - $n$ is the number of non-missing observations,

170 - $m$ is the seasonal period (`freq`), and

171 - $r_{k m}$ is the autocorrelation at lag $k \times m$; if either $r_{m}$ or $r_{2m}$ is not positive, both are set to zero.

172

173 ```

174 QS = n(n+2) * sum(r_(k*m)^2 / (n - k*m)) for k = 1 to 2

175 ``` 
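
The calculation above can be sketched in plain Python. This is only an illustration under stated assumptions: `autocorr` and `qs_statistic` are hypothetical helper names, the autocorrelation uses the standard biased estimator, and missing values are not handled. It relies on the fact that the chi-squared survival function with 2 degrees of freedom equals `exp(-x / 2)`.

```python
import math


def autocorr(y, lag):
    # Sample autocorrelation at `lag` (biased estimator; hypothetical helper).
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    num = sum((y[t] - mean) * (y[t + lag] - mean) for t in range(n - lag))
    return num / denom


def qs_statistic(y, freq):
    # QS statistic and p-value from the autocorrelations at lags freq and 2 * freq.
    n = len(y)
    rho = [autocorr(y, freq), autocorr(y, 2 * freq)]
    # If either seasonal autocorrelation is non-positive, both are set to zero.
    if any(r <= 0 for r in rho):
        rho = [0.0, 0.0]
    stat = n * (n + 2) * (rho[0] ** 2 / (n - freq) + rho[1] ** 2 / (n - 2 * freq))
    # Chi-squared survival function with 2 degrees of freedom.
    pval = math.exp(-stat / 2)
    return stat, pval


# A noiseless period-4 pattern repeated 10 times, differenced once.
series = [1.0, 3.0, 2.0, 5.0] * 10
diffed = [b - a for a, b in zip(series, series[1:])]
stat, pval = qs_statistic(diffed, freq=4)
print(stat, pval)
```

A strongly periodic input such as this one yields a large statistic and a p-value close to zero, whereas a series whose seasonal autocorrelations are not both positive yields a statistic of exactly zero.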

176 

177 ??? success "Credit" 

178 - All credit goes to the [`seastests`](https://cran.r-project.org/web/packages/seastests/index.html) library. 

179 

180 ??? question "References" 

181 1. Hyndman, R. J. and Y. Khandakar (2008). Automatic Time Series Forecasting: The forecast Package for R. Journal of Statistical Software 27 (3), 1-22. 

182 1. Maravall, A. (2011). Seasonality Tests and Automatic Model Identification in TRAMO-SEATS. Bank of Spain. 

183 1. Ollech, D. and Webel, K. (2020). A random forest-based approach to identifying the most informative seasonality tests. Deutsche Bundesbank's Discussion Paper series 55/2020. 

184 

185 ??? tip "See Also" 

186 - [github/seastests/qs.R](https://github.com/cran/seastests/blob/master/R/qs.R) 

187 - [rdrr/seastests/qs](https://rdrr.io/cran/seastests/man/qs.html) 

188 - [rdocumentation/seastests/qs](https://www.rdocumentation.org/packages/seastests/versions/0.15.4/topics/qs) 

189 - [Machine Learning Mastery/How to Identify and Remove Seasonality from Time Series Data with Python](https://machinelearningmastery.com/time-series-seasonality-with-python) 

190 - [StackOverflow/Simple tests for seasonality in Python](https://stackoverflow.com/questions/62754218/simple-tests-for-seasonality-in-python) 

191 """ 

192 

193 _x: NDArray[np.float64] = np.asarray(x, dtype=float) 

194 if np.isnan(_x).all(): 

195 raise AttributeError("All observations are NaN.") 

196 if diff and residuals: 

197 print( 

198 "The differences of the residuals of a non-seasonal ARIMA model are computed and used. " 

199 "It may be better to either only take the differences or use the residuals." 

200 ) 

201 if freq < 2: 

202 raise AttributeError(f"The number of observations per cycle is '{freq}', which is too small.") 

203 

204 model: Optional[ARIMA] = None 

205 

206 if residuals: 

207 if autoarima: 

208 max_order: int = 1 if freq < 8 else 3 

209 allow_drift: bool = freq < 8

210 try: 

211 model = auto_arima( 

212 y=_x, 

213 max_P=1, 

214 max_Q=1, 

215 max_p=3, 

216 max_q=3, 

217 seasonal=False, 

218 stepwise=False, 

219 max_order=max_order, 

220 allow_drift=allow_drift, 

221 ) 

222 except (ValueError, RuntimeError, IndexError): 

223 try: 

224 model = ARIMA(order=(0, 1, 1)).fit(y=_x) 

225 except (ValueError, RuntimeError, IndexError): 

226 print("Could not estimate any ARIMA model, original data series is used.") 

227 if model is not None: 

228 _x = model.resid() 

229 else: 

230 try: 

231 model = ARIMA(order=(0, 1, 1)).fit(y=_x) 

232 except (ValueError, RuntimeError, IndexError): 

233 print("Could not estimate any ARIMA model, original data series is used.") 

234 if model is not None: 

235 _x = model.resid() 

236 

237 # Do diff 

238 y: NDArray[np.float64] = np.diff(_x) if diff else _x 

239 

240 # Pre-check 

241 if np.nanvar(y[~np.isnan(y)]) == 0: 

242 raise ValueError( 

243 "The Series is a constant (possibly after transformations). QS-Test cannot be computed on constants." 

244 ) 

245 

246 # Test Statistic 

247 acf_output: NDArray[np.float64] = _acf(x=y, nlags=freq * 2, missing="drop") 

248 rho_output: NDArray[np.float64] = acf_output[[freq, freq * 2]] 

249 rho: NDArray[np.float64] = np.array([0, 0]) if np.any(np.array(rho_output) <= 0) else rho_output 

250 N: int = len(y[~np.isnan(y)]) 

251 QS: float = float(N * (N + 2) * (rho[0] ** 2 / (N - freq) + rho[1] ** 2 / (N - freq * 2))) 

252 Pval: float = float(chi2.sf(QS, 2)) 

253 

254 if residuals: 

255 return QS, Pval, model 

256 return QS, Pval 

257 

258 

259@typechecked 

260def ocsb(x: ArrayLike, m: int, lag_method: str = "aic", max_lag: int = 3) -> int: 

261 r""" 

262 !!! note "Summary" 

263 Compute the Osborn, Chui, Smith, and Birchenhall ($OCSB$) test for an input time series to determine whether it needs seasonal differencing. The regression equation may include lags of the dependent variable. When `lag_method="fixed"`, the lag order is fixed to `max_lag`; otherwise, `max_lag` is the maximum number of lags considered in a lag selection procedure that minimizes the `lag_method` criterion, which can be `"aic"`, `"bic"` or corrected AIC `"aicc"`. 
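
As a rough illustration of what selecting a lag order by an information criterion means, the sketch below scores hypothetical regressions by AIC, BIC, or corrected AIC and picks the lag with the smallest score. The formulas are the generic Gaussian-likelihood versions for a linear model; the exact constants used inside `pmdarima` may differ, and `information_criterion`, `select_lag`, and `n_fixed_params` are assumed names introduced only for this example.

```python
import math


def information_criterion(rss, n, k, method="aic"):
    # Generic criterion for a linear model with k parameters fitted on n points.
    base = n * math.log(rss / n)
    if method == "aic":
        return base + 2 * k
    if method == "bic":
        return base + k * math.log(n)
    if method == "aicc":
        return base + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
    raise ValueError(f"Unknown method: {method!r}")


def select_lag(rss_by_lag, n, method="aic", n_fixed_params=2):
    # Return the lag order whose regression minimizes the chosen criterion.
    scores = {
        lag: information_criterion(rss, n, k=lag + n_fixed_params, method=method)
        for lag, rss in rss_by_lag.items()
    }
    return min(scores, key=scores.get)


# Hypothetical residual sums of squares for regressions with 1, 2 and 3 lags.
rss_by_lag = {1: 120.0, 2: 100.0, 3: 99.5}
best_lag = select_lag(rss_by_lag, n=100, method="aic")
print(best_lag)
```

The third lag barely reduces the residual sum of squares, so the penalty term outweighs the gain and the two-lag regression is preferred.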

264 

265 ???+ abstract "Details" 

266 

267 The $OCSB$ test is a statistical test that is used to check the presence of seasonality in time series data. Seasonality refers to a pattern in the data that repeats itself at regular intervals. 

268 

269 The null hypothesis of the $OCSB$ test is that a seasonal unit root exists, meaning that the series needs seasonal differencing. This function does not return a p-value; it returns the decision obtained by comparing the test statistic to a critical value.

270

271 The $OCSB$ test fits a regression to the series after first and seasonal differencing, optionally including lags of the dependent variable, and uses the $t$-statistic of one of the regression coefficients as the test statistic.

272

273 The $OCSB$ test is useful for testing seasonality in time series data because it can detect seasonal patterns that are not obvious in the original data. It is also a useful diagnostic tool for determining the appropriate seasonal differencing parameter in ARIMA models.

274

275 Critical values for the test are based on simulations, which have been smoothed over to produce critical values for all seasonal periods.

276

277 The test statistic is compared to the critical value for the given seasonal period `m`. If the test statistic is greater than the critical value, the function returns 1, indicating that seasonal differencing is recommended; otherwise it returns 0.

278 

279 Params: 

280 x (ArrayLike): 

281 The time series vector. 

282 m (int): 

283 The seasonal differencing term. For monthly data, e.g., this would be 12. For quarterly, 4, etc. For the OCSB test to work, `m` must exceed `1`. 

284 lag_method (str, optional): 

285 The lag method to use. One of (`"fixed"`, `"aic"`, `"bic"`, `"aicc"`). The metric for assessing model performance after fitting a linear model.<br> 

286 Default: `"aic"` 

287 max_lag (int, optional): 

288 The maximum lag order to be considered by `lag_method`.<br> 

289 Default: `3` 

290 

291 Returns: 

292 (int): 

293 The seasonal differencing term. For different values of `m`, the OCSB statistic is compared to an estimated critical value, and returns 1 if the computed statistic is greater than the critical value, or 0 if not. 

294 

295 ???+ example "Examples" 

296 

297 ```pycon {.py .python linenums="1" title="Basic usage"} 

298 >>> from ts_stat_tests.utils.data import load_airline 

299 >>> from ts_stat_tests.seasonality.algorithms import ocsb 

300 >>> data = load_airline().values 

301 >>> ocsb(x=data, m=12) 

302 1 

303 

304 ``` 

305 

306 ??? equation "Calculation" 

307 

308 The $OCSB$ test statistic is the $t$-statistic of $\beta_2$ in the regression:

309

310 $$

311 \Delta \Delta_m x_t = \beta_1 Z_{4,t-1} + \beta_2 Z_{5,t-m} + \sum_{i=1}^{p} \alpha_i \Delta \Delta_m x_{t-i} + \varepsilon_t

312 $$

313

314 where:

315

316 - $\Delta$ and $\Delta_m$ are the first and seasonal difference operators, and

317 - $Z_{4,t} = \hat{\lambda}(B) \Delta_m x_t$ and $Z_{5,t} = \hat{\lambda}(B) \Delta x_t$, with $\hat{\lambda}(B)$ an AR($p$) lag polynomial estimated from $\Delta \Delta_m x_t$ and $p$ the lag order chosen via `lag_method` and `max_lag`.

318

319 ```

320 diff(diff_m(x))[t] = b1 * Z4[t-1] + b2 * Z5[t-m] + sum(a_i * diff(diff_m(x))[t-i]) + e[t]

321 ```

322

323 In this regression, the $t$-statistic of $\beta_2$ is the $OCSB$ test statistic. It is compared to the simulated critical value for the seasonal period `m` to decide whether seasonal differencing is needed.

324 

325 ??? success "Credit" 

326 - All credit goes to the [`pmdarima`](http://alkaline-ml.com/pmdarima/index.html) library with the implementation of [`pmdarima.arima.OCSBTest`](http://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.OCSBTest.html). 

327 

328 ??? question "References" 

329 - Osborn DR, Chui APL, Smith J, and Birchenhall CR (1988) "Seasonality and the order of integration for consumption", Oxford Bulletin of Economics and Statistics 50(4):361-377. 

330 - R's forecast::OCSB test source code: https://bit.ly/2QYQHno 

331 

332 ??? tip "See Also" 

333 - [pmdarima.arima.OCSBTest](http://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.OCSBTest.html) 

334 """ 

335 return OCSBTest(m=m, lag_method=lag_method, max_lag=max_lag).estimate_seasonal_differencing_term(x) 

336 

337 

338@typechecked 

339def ch(x: ArrayLike, m: int) -> int: 

340 r""" 

341 !!! note "Summary" 

342 The Canova-Hansen test for seasonal differences. Canova and Hansen (1995) proposed a test statistic for the null hypothesis that the seasonal pattern is stable. The test statistic can be formulated in terms of seasonal dummies or seasonal cycles. The former allows us to identify seasons (e.g. months or quarters) that are not stable, while the latter tests the stability of seasonal cycles (e.g. cycles of period 2 and 4 quarters in quarterly data). 

343 

344 !!! warning "Warning" 

345 This test is generally not used directly, but in conjunction with `pmdarima.arima.nsdiffs()`, which directly estimates the number of seasonal differences. 

346 

347 ???+ abstract "Details" 

348 

349 The $CH$ test (also known as the Canova-Hansen test) is a statistical test for assessing whether the seasonal pattern in time series data is stable over time. The null hypothesis of the $CH$ test is that the seasonal pattern is stable, while the alternative hypothesis is that it changes over time, which suggests that seasonal differencing is needed.

350

351 The test statistic is compared to a critical value that depends on the seasonal period `m`. If the test statistic exceeds the critical value, the null hypothesis of stable seasonality is rejected and the function returns 1.

352 

353 The $CH$ test is based on the following steps: 

354 

355 1. Fit a non-seasonal autoregressive integrated moving average (ARIMA) model to the time series data, using a criterion such as Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC) to determine the optimal model order. 

356 1. Fit a seasonal ARIMA model to the time series data, using the same criterion to determine the optimal model order and seasonal period. 

357 1. Compute the sum of squared residuals (SSR) for both models. 

358 1. Compute the test statistic $CH$ using the formula in the Calculation section below.

359 1. Compare the test statistic to the critical value for the given `m`. If the test statistic exceeds the critical value, return 1; otherwise return 0.

360 

361 The $CH$ test is a powerful test for seasonality in time series data, as it accounts for both the presence and the nature of seasonality. However, it assumes that the time series data is stationary, and it may not be effective for detecting seasonality in non-stationary or irregular time series data. Additionally, it may not work well for time series data with short seasonal periods or with low seasonal amplitudes. Therefore, it should be used in conjunction with other tests and techniques for detecting seasonality in time series data. 

362 

363 Params: 

364 x (ArrayLike): 

365 The time series vector. 

366 m (int): 

367 The seasonal differencing term. For monthly data, e.g., this would be 12. For quarterly, 4, etc. For the Canova-Hansen test to work, `m` must exceed 1. 

368 

369 Returns: 

370 (int): 

371 The seasonal differencing term. 

372 

373 The $CH$ test defines a set of critical values: 

374 

375 ``` 

376 (0.4617146, 0.7479655, 1.0007818, 

377 1.2375350, 1.4625240, 1.6920200, 

378 1.9043096, 2.1169602, 2.3268562, 

379 2.5406922, 2.7391007) 

380 ``` 

381 

382 For different values of `m`, the $CH$ statistic is compared to the corresponding critical value, and returns 1 if the computed statistic is greater than the critical value, or 0 if not. 
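
The comparison described above can be sketched as follows. This is a minimal illustration, not the `pmdarima` implementation: it assumes that the eleven critical values listed above correspond, in order, to seasonal periods `m = 2` through `m = 12`, and `ch_decision` is a hypothetical helper name. Other values of `m` are deliberately left unsupported here.

```python
# Critical values copied from the list above; the m -> index mapping is an assumption.
CH_CRITICAL_VALUES = (
    0.4617146, 0.7479655, 1.0007818,
    1.2375350, 1.4625240, 1.6920200,
    1.9043096, 2.1169602, 2.3268562,
    2.5406922, 2.7391007,
)


def ch_decision(stat, m):
    # Return 1 if the statistic exceeds the critical value for period m, else 0.
    if not 2 <= m <= 12:
        raise ValueError("This sketch only covers seasonal periods from 2 to 12.")
    critical = CH_CRITICAL_VALUES[m - 2]
    return int(stat > critical)


# Quarterly data (m=4) is compared against the third critical value, 1.0007818.
print(ch_decision(1.0, m=4), ch_decision(1.01, m=4))
```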

383 

384 ???+ example "Examples" 

385 

386 ```pycon {.py .python linenums="1" title="Basic usage"} 

387 >>> from ts_stat_tests.utils.data import load_airline 

388 >>> from ts_stat_tests.seasonality.algorithms import ch 

389 >>> data = load_airline().values 

390 >>> ch(x=data, m=12) 

391 0 

392 

393 ``` 

394 

395 ??? equation "Calculation" 

396 

397 The test statistic for the $CH$ test is given by: 

398 

399 $$ 

400 CH = \frac { \left( \frac { SSRns - SSRs } { n - p - 1 } \right) } { \left( \frac { SSRs } { n - p - s - 1 } \right) } 

401 $$ 

402 

403 where: 

404 

405 - $SSRns$ is the $SSR$ for the non-seasonal model, 

406 - $SSRs$ is the $SSR$ for the seasonal model, 

407 - $n$ is the sample size, 

408 - $p$ is the number of parameters in the non-seasonal model, and 

409 - $s$ is the number of parameters in the seasonal model. 

410 

411 ``` 

412 CH = [(SSRns - SSRs) / (n - p - 1)] / (SSRs / (n - p - s - 1)) 

413 ``` 

414 

415 ??? note "Notes" 

416 This test is generally not used directly, but in conjunction with `pmdarima.arima.nsdiffs()`, which directly estimates the number of seasonal differences. 

417 

418 ??? success "Credit" 

419 - All credit goes to the [`pmdarima`](http://alkaline-ml.com/pmdarima/index.html) library with the implementation of [`pmdarima.arima.CHTest`](http://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.CHTest.html). 

420 

421 ??? question "References" 

422 - Testing for seasonal stability using the Canova and Hansen test statistic: http://bit.ly/2wKkrZo 

423 - R source code for CH test: https://github.com/robjhyndman/forecast/blob/master/R/arima.R#L148 

424 

425 ??? tip "See Also" 

426 - [`pmdarima.arima.CHTest`](http://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.CHTest.html) 

427 """ 

428 return CHTest(m=m).estimate_seasonal_differencing_term(x) 

429 

430 

431@typechecked 

432def seasonal_strength(x: ArrayLike, m: int) -> float: 

433 r""" 

434 !!! note "Summary" 

435 The seasonal strength test is a statistical test for detecting the strength of seasonality in time series data. It measures the extent to which the seasonal component of a time series explains the variation in the data. 

436 

437 ???+ abstract "Details" 

438 

439 The seasonal strength test involves computing the seasonal strength index ($SSI$). 

440 

441 The $SSI$ ranges between $0$ and $1$, with higher values indicating stronger seasonality in the data. This function returns the index itself and does not compare it to a critical value, so any threshold for what counts as strong seasonality is left to the caller.

442

443 The seasonal strength test involves the following steps:

444

445 1. Decompose the time series data into its seasonal, trend, and residual components. This function uses the additive moving-average decomposition from `statsmodels.tsa.seasonal.seasonal_decompose`.

446 1. Compute the variance of the seasonal component $Var(S)$ and the variance of the residual component $Var(R)$, ignoring missing values.

447 1. Compute the $SSI$ using the formula in the Calculation section below.

448 1. Return the $SSI$ as a `float`.
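
To make the decomposition step concrete, here is a minimal pure-Python sketch of a classical additive decomposition based on a centered moving average. It is written in the spirit of a moving-average decomposition and is not the `statsmodels` implementation; `decompose_additive` is a hypothetical name, and edge positions where the moving average is undefined are left as `nan`.

```python
import math


def decompose_additive(x, m):
    # Split x into trend, seasonal and residual parts for seasonal period m.
    n = len(x)
    nan = float("nan")
    # Centered moving average; for even m use half weights at both ends.
    if m % 2 == 0:
        weights = [0.5] + [1.0] * (m - 1) + [0.5]
    else:
        weights = [1.0] * m
    half = len(weights) // 2
    trend = [nan] * n
    for t in range(half, n - half):
        window = x[t - half : t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window)) / m
    # Average the detrended values for each position in the cycle.
    detrended = [v - tr for v, tr in zip(x, trend)]
    period_means = []
    for s in range(m):
        vals = [d for d in detrended[s::m] if not math.isnan(d)]
        period_means.append(sum(vals) / len(vals))
    # Center the seasonal pattern so that it sums to zero over one cycle.
    centre = sum(period_means) / m
    seasonal = [period_means[t % m] - centre for t in range(n)]
    resid = [v - tr - se for v, tr, se in zip(x, trend, seasonal)]
    return trend, seasonal, resid


# Linear trend plus a zero-mean pattern of period 4, with no noise.
pattern = [1.0, -1.0, 2.0, -2.0]
x = [float(t) + pattern[t % 4] for t in range(24)]
trend, seasonal, resid = decompose_additive(x, 4)
print(seasonal[:4])
```

On this noiseless input the moving average recovers the linear trend away from the edges, the seasonal component reproduces the pattern, and the residuals are numerically zero.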

449 

450 The seasonal strength test is a simple and intuitive test for seasonality in time series data. However, it assumes that the seasonal component is additive and that the residuals are independent and identically distributed. Moreover, it may not be effective for detecting complex seasonal patterns or seasonality in non-stationary or irregular time series data. Therefore, it should be used in conjunction with other tests and techniques for detecting seasonality in time series data. 

451 

452 Params: 

453 x (ArrayLike): 

454 The time series vector. 

455 m (int): 

456 The seasonal differencing term. For monthly data, e.g., this would be 12. For quarterly, 4, etc. For the seasonal strength test to work, `m` must exceed 1. 

457 

458 Returns: 

459 (float): 

460 The seasonal strength value. 

461 

462 ???+ example "Examples" 

463 

464 ```pycon {.py .python linenums="1" title="Basic usage"} 

465 >>> from ts_stat_tests.utils.data import load_airline 

466 >>> from ts_stat_tests.seasonality.algorithms import seasonal_strength 

467 >>> data = load_airline().values 

468 >>> seasonal_strength(x=data, m=12) 

469 0.778721... 

470 

471 ``` 

472 

473 ??? equation "Calculation" 

474 

475 The $SSI$ is computed using the following formula: 

476 

477 $$ 

478 SSI = \frac {Var(S)} {Var(S) + Var(R)} 

479 $$ 

480 

481 where: 

482 

483 - $Var(S)$ is the variance of the seasonal component, and 

484 - $Var(R)$ is the variance of the residual component obtained after decomposing the time series data into its seasonal, trend, and residual components using a method such as STL or moving average decomposition. 

485 

486 ``` 

487 SSI = Var(S) / (Var(S) + Var(R)) 

488 ``` 
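
The ratio above can be sketched without any third-party dependency. The helpers `nan_variance` and `strength_index` are hypothetical names for illustration; the variance is the population variance over non-missing values, which is what `np.nanvar` computes by default.

```python
import math


def nan_variance(values):
    # Population variance of the non-missing values.
    kept = [v for v in values if not math.isnan(v)]
    mean = sum(kept) / len(kept)
    return sum((v - mean) ** 2 for v in kept) / len(kept)


def strength_index(component, resid):
    # Share of variance attributed to the component relative to the residual.
    var_c = nan_variance(component)
    var_r = nan_variance(resid)
    return var_c / (var_c + var_r)


# Made-up components: Var(S) = 1.0 and Var(R) = 0.25.
nan = float("nan")
seasonal_part = [1.0, -1.0, 1.0, -1.0]
resid_part = [nan, 0.5, -0.5, nan]
ssi = strength_index(seasonal_part, resid_part)
print(ssi)
```

The same helper applies to any decomposed component, since only the numerator's component changes between indices of this form.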

489 

490 ??? success "Credit" 

491 - Inspired by the `tsfeatures` library in both [`Python`](https://github.com/Nixtla/tsfeatures) and [`R`](http://pkg.robjhyndman.com/tsfeatures/). 

492 

493 ??? question "References" 

494 - Wang, X, Hyndman, RJ, Smith-Miles, K (2007) "Rule-based forecasting filters using time series features", Computational Statistics and Data Analysis, 52(4), 2244-2259. 

495 

496 ??? tip "See Also" 

497 - [`tsfeatures.stl_features`](https://github.com/Nixtla/tsfeatures/blob/main/tsfeatures/tsfeatures.py) 

498 """ 

499 decomposition = seasonal_decompose(x=x, period=m, model="additive") 

500 seasonal = np.nanvar(decomposition.seasonal) 

501 residual = np.nanvar(decomposition.resid) 

502 return float(seasonal / (seasonal + residual)) 

503 

504 

505@typechecked 

506def trend_strength(x: ArrayLike, m: int) -> float: 

507 r""" 

508 !!! note "Summary" 

509 The trend strength test is a statistical test for detecting the strength of the trend component in time series data. It measures the extent to which the trend component of a time series explains the variation in the data. 

510 

511 ???+ abstract "Details" 

512 

513 The trend strength test involves computing the trend strength index ($TSI$). 

514 

515 The $TSI$ ranges between $0$ and $1$, with higher values indicating stronger trend in the data. This function returns the index itself and does not compare it to a critical value, so any threshold for what counts as a strong trend is left to the caller.

516

517 The trend strength test involves the following steps:

518

519 1. Decompose the time series data into its trend, seasonal, and residual components. This function uses the additive moving-average decomposition from `statsmodels.tsa.seasonal.seasonal_decompose`.

520 1. Compute the variance of the trend component, denoted by $Var(T)$.

521 1. Compute the variance of the residual component, denoted by $Var(R)$.

522 1. Compute the trend strength index ($TSI$) using the formula in the Calculation section below.

523 1. Return the $TSI$ as a `float`.

524 

525 The trend strength test is a useful tool for identifying the strength of trend in time series data. However, it relies on an additive decomposition, and it may not be effective for time series data with short time spans. Therefore, it should be used in conjunction with other tests and techniques for detecting trend in time series data.

526 

527 Params: 

528 x (ArrayLike): 

529 The time series vector. 

530 m (int): 

531 The frequency of the time series data set. For the trend strength test to work, `m` must exceed 1. 

532 

533 Returns: 

534 (float): 

535 The trend strength score. 

536 

537 ???+ example "Examples" 

538 

539 ```pycon {.py .python linenums="1" title="Basic usage"} 

540 >>> from ts_stat_tests.utils.data import load_airline 

541 >>> from ts_stat_tests.seasonality.algorithms import trend_strength 

542 >>> data = load_airline().values 

543 >>> trend_strength(x=data, m=12) 

544 0.965679... 

545 

546 ``` 

547 

548 ??? equation "Calculation" 

549 

550 The trend strength test involves computing the trend strength index ($TSI$) using the following formula: 

551 

552 $$ 

553 TSI = \frac{ Var(T) } { Var(T) + Var(R) } 

554 $$ 

555 

556 where: 

557 

558 - $Var(T)$ is the variance of the trend component, and 

559 - $Var(R)$ is the variance of the residual component obtained after decomposing the time series data into its trend, seasonal, and residual components using a method such as STL or moving average decomposition. 

560 

561 ``` 

562 TSI = Var(T) / (Var(T) + Var(R)) 

563 ``` 
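
As a purely illustrative calculation with made-up variances, suppose $Var(T) = 9$ and $Var(R) = 1$. Then:

$$
TSI = \frac{9}{9 + 1} = 0.9
$$

Swapping the two variances, so that $Var(T) = 1$ and $Var(R) = 9$, gives $TSI = \frac{1}{1 + 9} = 0.1$, which shows how the index falls toward $0$ as the residual dominates.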

564 

565 ??? success "Credit" 

566 - Inspired by the `tsfeatures` library in both [`Python`](https://github.com/Nixtla/tsfeatures) and [`R`](http://pkg.robjhyndman.com/tsfeatures/). 

567 

568 ??? question "References" 

569 - Wang, X, Hyndman, RJ, Smith-Miles, K (2007) "Rule-based forecasting filters using time series features", Computational Statistics and Data Analysis, 52(4), 2244-2259. 

570 

571 ??? tip "See Also" 

572 - [`tsfeatures.stl_features`](https://github.com/Nixtla/tsfeatures/blob/main/tsfeatures/tsfeatures.py) 

573 """ 

574 decomposition = seasonal_decompose(x=x, period=m, model="additive") 

575 trend = np.nanvar(decomposition.trend) 

576 residual = np.nanvar(decomposition.resid) 

577 return float(trend / (trend + residual)) 

578 

579 

580@typechecked 

581def spikiness(x: ArrayLike, m: int) -> float: 

582 r""" 

583 !!! note "Summary" 

584 The spikiness test is a statistical test that measures the degree of spikiness or volatility in time series data. It aims to detect the presence of spikes or sudden changes in the data that may indicate important events or anomalies in the underlying process.

585 

586 ???+ abstract "Details" 

587 

588 The spikiness test involves computing the spikiness index ($SI$). The $SI$ measures the intensity of spikes or outliers in the data relative to the overall variation. A higher $SI$ value indicates a more spiky or volatile time series, while a lower $SI$ value indicates a smoother or less volatile time series. 

589 

590 The spikiness test involves the following steps: 

591 

592 1. Decompose the time series data into its seasonal, trend, and residual components using a method such as STL or moving average decomposition. 

593 1. Compute the mean absolute deviation of the residual component ($MADR$). 

594 1. Compute the mean absolute deviation of the seasonal component ($MADS$). 

595 1. Compute the spikiness index ($SI$) using the formula in the Calculation section below.

596 

597 The spikiness test can be used in conjunction with other tests and techniques for detecting spikes in time series data, such as change point analysis and outlier detection. However, it assumes that the time series data is stationary and that the spikes are abrupt and sudden. Additionally, it may not be effective for time series data with long-term trends or cyclical patterns.

598 

599 Params: 

600 x (ArrayLike): 

601 The time series vector. 

602 m (int): 

603 The frequency of the time series data set. For the spikiness test to work, `m` must exceed 1. 

604 

605 Returns: 

606 (float): 

607 The spikiness score. 

608 

609 ???+ example "Examples" 

610 

611 ```pycon {.py .python linenums="1" title="Basic usage"} 

612 >>> from ts_stat_tests.utils.data import load_airline 

613 >>> from ts_stat_tests.seasonality.algorithms import spikiness 

614 >>> data = load_airline().values 

615 >>> spikiness(x=data, m=12) 

616 0.484221... 

617 

618 ``` 

619 

620 ??? equation "Calculation" 

621 

622 The spikiness test involves computing the spikiness index ($SI$) using the following formula: 

623 

624 $$ 

625 SI = \frac {MADR} {MADS} 

626 $$ 

627 

628 where: 

629 

630 - $MADR$ is the mean absolute deviation of the residuals, and 

631 - $MADS$ is the mean absolute deviation of the seasonal component. 

632 

633 ``` 

634 SI = MADR / MADS 

635 ``` 
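
A minimal sketch of this ratio, mirroring the function body, which averages the absolute values of each component while skipping missing entries. The helper names `nan_mean_abs` and `spikiness_index` are hypothetical, and the inputs are made-up component values rather than the output of a real decomposition.

```python
import math


def nan_mean_abs(values):
    # Mean of absolute values, ignoring missing entries.
    kept = [abs(v) for v in values if not math.isnan(v)]
    return sum(kept) / len(kept)


def spikiness_index(resid, seasonal):
    # Ratio of the residual magnitude to the seasonal magnitude.
    return nan_mean_abs(resid) / nan_mean_abs(seasonal)


# Made-up components: residual magnitude 2.0, seasonal magnitude 4.0.
nan = float("nan")
resid_part = [nan, 1.0, -3.0, nan]
seasonal_part = [4.0, -4.0, 4.0, -4.0]
si = spikiness_index(resid_part, seasonal_part)
print(si)
```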

636 

637 ??? success "Credit" 

638 - All credit to the [`tsfeatures`](http://pkg.robjhyndman.com/tsfeatures/) library. This code is a direct copy+paste from the [`tsfeatures.py`](https://github.com/Nixtla/tsfeatures/blob/master/tsfeatures/tsfeatures.py) module.<br>It is not possible to refer directly to a `spikiness` function in the `tsfeatures` package because the process to calculate spikiness is embedded within their `stl_features` function. Therefore, it is necessary to copy it here.

639 

640 ??? question "References" 

641 - Wang, X, Hyndman, RJ, Smith-Miles, K (2007) "Rule-based forecasting filters using time series features", Computational Statistics and Data Analysis, 52(4), 2244-2259. 

642 

643 ??? tip "See Also" 

644 - [`tsfeatures.stl_features`](https://github.com/Nixtla/tsfeatures/blob/main/tsfeatures/tsfeatures.py) 

645 """ 

646 decomposition = seasonal_decompose(x=x, model="additive", period=m) 

647 madr = np.nanmean(np.abs(decomposition.resid)) 

648 mads = np.nanmean(np.abs(decomposition.seasonal)) 

649 return float(madr / mads)