Coverage for src/ts_stat_tests/seasonality/algorithms.py: 100%

1# ============================================================================ #

2# #

3# Title: Seasonality Algorithms #

4# Purpose: Algorithms for testing seasonality in time series data. #

5# #

6# ============================================================================ #

9# ---------------------------------------------------------------------------- #

10# #

11# Overview ####

12# #

13# ---------------------------------------------------------------------------- #

16# ---------------------------------------------------------------------------- #

17# Description ####

18# ---------------------------------------------------------------------------- #

21"""

22!!! note "Summary"

23 Seasonality tests are statistical tests used to determine whether a time series exhibits seasonal patterns or cycles. Seasonality refers to the regular and predictable fluctuations in a time series that occur at specific intervals, such as daily, weekly, monthly, or yearly.

25 Seasonality tests help identify whether a time series has a seasonal component that needs to be accounted for in forecasting models. By detecting seasonality, analysts can choose appropriate models that capture these patterns and improve the accuracy of their forecasts.

27 Common seasonality tests include the QS test, OCSB test, Canova-Hansen test, and others. These tests analyze the autocorrelation structure of the time series data to identify significant seasonal patterns.

29 Overall, seasonality tests are essential tools in time series analysis and forecasting, as they help identify and account for seasonal patterns that can significantly impact the accuracy of predictions.

30"""

33# ---------------------------------------------------------------------------- #

34# #

35# Setup ####

36# #

37# ---------------------------------------------------------------------------- #

40# ---------------------------------------------------------------------------- #

41# Imports ####

42# ---------------------------------------------------------------------------- #

# ## Python StdLib Imports ----
from typing import Optional, Union

# ## Python Third Party Imports ----
import numpy as np
from numpy.typing import ArrayLike, NDArray
from pmdarima.arima.arima import ARIMA
from pmdarima.arima.auto import auto_arima
from pmdarima.arima.seasonality import CHTest, OCSBTest
from scipy.stats import chi2
from statsmodels.tsa.seasonal import seasonal_decompose  # , STL, DecomposeResult,
from typeguard import typechecked

# ## Local First Party Imports ----
from ts_stat_tests.correlation import acf as _acf


# ---------------------------------------------------------------------------- #
# Exports ####
# ---------------------------------------------------------------------------- #


__all__: list[str] = ["qs", "ocsb", "ch", "seasonal_strength", "trend_strength", "spikiness"]

70# ---------------------------------------------------------------------------- #

71# #

72# Algorithms ####

73# #

74# ---------------------------------------------------------------------------- #

@typechecked
def qs(
    x: ArrayLike,
    freq: int = 0,
    diff: bool = True,
    residuals: bool = False,
    autoarima: bool = True,
) -> Union[tuple[float, float], tuple[float, float, Optional[ARIMA]]]:

85 r"""

86 !!! note "Summary"

The $QS$ test is a variant of the Ljung-Box test that is evaluated only at seasonal lags. It is used to determine whether seasonality is present in a time series, or remains in the residuals of a forecasting model. It is based on the autocorrelation function (ACF), which measures how strongly the series is correlated with itself at different lags.

89 ???+ abstract "Details"

91 If `residuals=False` the `autoarima` settings are ignored.

If `residuals=True`, a non-seasonal ARIMA model is estimated for the time series, and the residuals of the fitted model are used as input to the test statistic. If automatic order selection is used, the Hyndman-Khandakar algorithm is employed with $\max(p) = \max(q) \le 3$.

95 The null hypothesis is that there is no correlation in the residuals beyond the specified lags, indicating no seasonality. The alternative hypothesis is that there is significant correlation, indicating seasonality.

97 Here are the steps for performing the $QS$ test:

99 1. Fit a time series model to your data, such as an ARIMA or SARIMA model.

100 1. Calculate the residuals, which are the differences between the observed values and the predicted values from the model.

101 1. Calculate the ACF of the residuals.

1. Calculate the $QS$ statistic from the squared autocorrelations at the seasonal lags, using the formula in the Calculation section below.

1. Compare the $QS$ statistic to the critical value from the chi-squared distribution with degrees of freedom equal to the number of lags used (this implementation uses the two lags `freq` and `2 * freq`, so 2 degrees of freedom). If the statistic is greater than the critical value, the null hypothesis is rejected, indicating evidence of seasonality.

104

105 In summary, the $QS$ test is a useful tool for determining whether a time series forecasting model has adequately accounted for seasonality in the data. By detecting any seasonality present in the residuals, it helps to ensure that the model is capturing all the important patterns in the data and making accurate predictions.

106

This function is a Python implementation of the R function [`qs()`](https://rdrr.io/cran/seastests/man/qs.html) from the [`seastests`](https://cran.r-project.org/web/packages/seastests/index.html) library.

108

109 Params:

110 x (ArrayLike):

111 The univariate time series data to test.

112 freq (int, optional):

113 The frequency of the time series data.

114 Default: `0`

115 diff (bool, optional):

116 Whether or not to run `np.diff()` over the data.

117 Default: `True`

118 residuals (bool, optional):

Whether to run the test on the residuals of a non-seasonal ARIMA model. If `True`, the fitted model is also returned.

120 Default: `False`

121 autoarima (bool, optional):

Whether to select the ARIMA order automatically with `auto_arima()`. If `False`, an ARIMA(0, 1, 1) model is fitted instead. Ignored when `residuals=False`.

123 Default: `True`

124

125 Raises:

(AttributeError):

If all observations in `x` are `NaN`, or if `freq` is less than `2`.

(ValueError):

If the series is constant (possibly after differencing with `np.diff()` or after taking ARIMA residuals), because the QS test cannot be computed on constants.

130

131 Returns:

132 (Union[tuple[float, float], tuple[float, float, Optional[ARIMA]]]):

133 The results of the QS test.

134 - stat (float): The $\text{QS}$ score for the given data set.

- pval (float): The p-value of the test, calculated using the survival function of the chi-squared distribution with 2 degrees of freedom (also defined as $1-\text{cdf}(\dots)$). For more info, see: [scipy.stats.chi2](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html)

136 - model (Optional[ARIMA]): The ARIMA model used in the calculation of this test. Returned if `residuals` is `True`.

137

138 ???+ example "Examples"

139

140 ```pycon {.py .python linenums="1" title="Basic usage"}

141 >>> from ts_stat_tests.utils.data import load_airline

142 >>> from ts_stat_tests.seasonality.algorithms import qs

143 >>> data = load_airline().values

144 >>> qs(data, freq=12)

145 (194.469289..., 5.909223...)

146

147 ```

148

149 ```pycon {.py .python linenums="1" title="Advanced usage"}

150 >>> from ts_stat_tests.utils.data import load_airline

151 >>> from ts_stat_tests.seasonality.algorithms import qs

152 >>> data = load_airline().values

153 >>> qs(data, freq=12, diff=True, residuals=True, autoarima=True)

154 The differences of the residuals of a non-seasonal ARIMA model are computed and used. It may be better to either only take the differences or use the residuals.

155 (101.8592..., 7.6126..., ARIMA(order=(1, 1, 1), scoring_args={}, suppress_warnings=True))

156

157 ```

158

159 ??? equation "Calculation"

160

The $QS$ statistic, as computed by this function, is given by:

$$
QS = n (n+2) \sum_{i=1}^{2} \frac{r_{i \cdot s}^2}{n - i \cdot s}
$$

where:

- $n$ is the number of non-missing observations,
- $s$ is the seasonal period (`freq`), and
- $r_{i \cdot s}$ is the autocorrelation at the seasonal lag $i \cdot s$. If either of the two autocorrelations is not positive, both are set to $0$.

```
QS = n(n+2) * (r_s^2 / (n - s) + r_2s^2 / (n - 2s))
```

176
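As a rough illustration of the calculation, the sketch below computes the statistic and its p-value using only the standard library. The helper names `acf_at` and `qs_sketch` are hypothetical and not part of this module; the sketch skips the differencing and ARIMA steps, and it relies on the fact that the chi-squared survival function with 2 degrees of freedom is `exp(-x / 2)`.

```python
import math


def acf_at(y, lag):
    # Sample autocorrelation of y at one positive lag.
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    numer = sum((y[t] - mean) * (y[t - lag] - mean) for t in range(lag, n))
    return numer / denom


def qs_sketch(y, freq):
    # Autocorrelations at the first two seasonal lags.
    r1, r2 = acf_at(y, freq), acf_at(y, 2 * freq)
    # A non-positive seasonal autocorrelation zeroes both terms,
    # which gives QS = 0 and a p-value of 1.
    if r1 <= 0 or r2 <= 0:
        r1 = r2 = 0.0
    n = len(y)
    stat = n * (n + 2) * (r1**2 / (n - freq) + r2**2 / (n - 2 * freq))
    # Chi-squared survival function with 2 degrees of freedom.
    pval = math.exp(-stat / 2)
    return stat, pval


# A pure sine wave with period 4 is strongly seasonal at freq=4.
series = [math.sin(2 * math.pi * t / 4) for t in range(40)]
stat, pval = qs_sketch(series, 4)
```

For this toy series the p-value is far below any usual significance level, so the null hypothesis of no seasonality would be rejected.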

177 ??? success "Credit"

178 - All credit goes to the [`seastests`](https://cran.r-project.org/web/packages/seastests/index.html) library.

179

180 ??? question "References"

181 1. Hyndman, R. J. and Y. Khandakar (2008). Automatic Time Series Forecasting: The forecast Package for R. Journal of Statistical Software 27 (3), 1-22.

182 1. Maravall, A. (2011). Seasonality Tests and Automatic Model Identification in TRAMO-SEATS. Bank of Spain.

183 1. Ollech, D. and Webel, K. (2020). A random forest-based approach to identifying the most informative seasonality tests. Deutsche Bundesbank's Discussion Paper series 55/2020.

184

185 ??? tip "See Also"

186 - [github/seastests/qs.R](https://github.com/cran/seastests/blob/master/R/qs.R)

187 - [rdrr/seastests/qs](https://rdrr.io/cran/seastests/man/qs.html)

188 - [rdocumentation/seastests/qs](https://www.rdocumentation.org/packages/seastests/versions/0.15.4/topics/qs)

189 - [Machine Learning Mastery/How to Identify and Remove Seasonality from Time Series Data with Python](https://machinelearningmastery.com/time-series-seasonality-with-python)

190 - [StackOverflow/Simple tests for seasonality in Python](https://stackoverflow.com/questions/62754218/simple-tests-for-seasonality-in-python)

191 """

192

    _x: NDArray[np.float64] = np.asarray(x, dtype=float)
    if np.isnan(_x).all():
        raise AttributeError("All observations are NaN.")
    if diff and residuals:
        print(
            "The differences of the residuals of a non-seasonal ARIMA model are computed and used. "
            "It may be better to either only take the differences or use the residuals."
        )
    if freq < 2:
        raise AttributeError(f"The number of observations per cycle is '{freq}', which is too small.")

    model: Optional[ARIMA] = None

    if residuals:
        if autoarima:
            max_order: int = 1 if freq < 8 else 3
            allow_drift: bool = freq < 8
            try:
                model = auto_arima(
                    y=_x,
                    max_P=1,
                    max_Q=1,
                    max_p=3,
                    max_q=3,
                    seasonal=False,
                    stepwise=False,
                    max_order=max_order,
                    allow_drift=allow_drift,
                )
            except (ValueError, RuntimeError, IndexError):
                # Fall back to a fixed ARIMA(0, 1, 1) if the automatic search fails
                try:
                    model = ARIMA(order=(0, 1, 1)).fit(y=_x)
                except (ValueError, RuntimeError, IndexError):
                    print("Could not estimate any ARIMA model, original data series is used.")
            if model is not None:
                _x = model.resid()
        else:
            try:
                model = ARIMA(order=(0, 1, 1)).fit(y=_x)
            except (ValueError, RuntimeError, IndexError):
                print("Could not estimate any ARIMA model, original data series is used.")
            if model is not None:
                _x = model.resid()

    # Do diff
    y: NDArray[np.float64] = np.diff(_x) if diff else _x

    # Pre-check
    if np.nanvar(y[~np.isnan(y)]) == 0:
        raise ValueError(
            "The Series is a constant (possibly after transformations). QS-Test cannot be computed on constants."
        )

    # Test Statistic: autocorrelations at the seasonal lags `freq` and `2 * freq`
    acf_output: NDArray[np.float64] = _acf(x=y, nlags=freq * 2, missing="drop")
    rho_output: NDArray[np.float64] = acf_output[[freq, freq * 2]]
    rho: NDArray[np.float64] = np.array([0, 0]) if np.any(np.array(rho_output) <= 0) else rho_output
    N: int = len(y[~np.isnan(y)])
    QS: float = float(N * (N + 2) * (rho[0] ** 2 / (N - freq) + rho[1] ** 2 / (N - freq * 2)))
    Pval: float = float(chi2.sf(QS, 2))

    if residuals:
        return QS, Pval, model
    return QS, Pval

257

258

@typechecked
def ocsb(x: ArrayLike, m: int, lag_method: str = "aic", max_lag: int = 3) -> int:

261 r"""

262 !!! note "Summary"

263 Compute the Osborn, Chui, Smith, and Birchenhall ($OCSB$) test for an input time series to determine whether it needs seasonal differencing. The regression equation may include lags of the dependent variable. When `lag_method="fixed"`, the lag order is fixed to `max_lag`; otherwise, `max_lag` is the maximum number of lags considered in a lag selection procedure that minimizes the `lag_method` criterion, which can be `"aic"`, `"bic"` or corrected AIC `"aicc"`.

264

265 ???+ abstract "Details"

266

267 The $OCSB$ test is a statistical test that is used to check the presence of seasonality in time series data. Seasonality refers to a pattern in the data that repeats itself at regular intervals.

268

The $OCSB$ test is used to decide whether seasonal differencing is required. This implementation does not report a p-value; the decision is made by comparing the test statistic to a critical value.

270

271 The $OCSB$ test involves dividing the data into two halves and calculating the mean of each half. Then, the differences between the means of each pair of halves are calculated for each possible pair of halves. Finally, the mean of these differences is calculated, and a test statistic is computed.

272

273 The $OCSB$ test is useful for testing seasonality in time series data because it can detect seasonal patterns that are not obvious in the original data. It is also a useful diagnostic tool for determining the appropriate seasonal differencing parameter in ARIMA models.

274

Critical values for the test are based on simulations, which have been smoothed to produce critical values for all seasonal periods.

The test statistic is compared to the simulated critical value for the given `m`, not to a chi-squared quantile. If the test statistic is larger than the critical value, the function returns `1`, indicating that seasonal differencing is needed; otherwise it returns `0`.

278

279 Params:

280 x (ArrayLike):

281 The time series vector.

282 m (int):

283 The seasonal differencing term. For monthly data, e.g., this would be 12. For quarterly, 4, etc. For the OCSB test to work, `m` must exceed `1`.

284 lag_method (str, optional):

285 The lag method to use. One of (`"fixed"`, `"aic"`, `"bic"`, `"aicc"`). The metric for assessing model performance after fitting a linear model.

286 Default: `"aic"`

287 max_lag (int, optional):

288 The maximum lag order to be considered by `lag_method`.

289 Default: `3`

290

291 Returns:

292 (int):

293 The seasonal differencing term. For different values of `m`, the OCSB statistic is compared to an estimated critical value, and returns 1 if the computed statistic is greater than the critical value, or 0 if not.
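The 0/1 result is meant to be used as a seasonal differencing order. A minimal sketch of that use follows; the helper names `seasonal_difference` and `apply_seasonal_differencing` are hypothetical and are not part of this module or of `pmdarima`.

```python
def seasonal_difference(x, m):
    # x[t] - x[t - m] for every t that has a full season of history.
    return [x[t] - x[t - m] for t in range(m, len(x))]


def apply_seasonal_differencing(x, m, d_seasonal):
    # d_seasonal is the 0/1 value returned by a call such as ocsb(x, m).
    return seasonal_difference(x, m) if d_seasonal == 1 else list(x)


differenced = apply_seasonal_differencing([1, 2, 3, 4, 5, 6], m=2, d_seasonal=1)
unchanged = apply_seasonal_differencing([1, 2, 3, 4, 5, 6], m=2, d_seasonal=0)
```

Note that one seasonal difference shortens the series by `m` observations.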

294

295 ???+ example "Examples"

296

297 ```pycon {.py .python linenums="1" title="Basic usage"}

298 >>> from ts_stat_tests.utils.data import load_airline

299 >>> from ts_stat_tests.seasonality.algorithms import ocsb

300 >>> data = load_airline().values

301 >>> ocsb(x=data, m=12)

302 1

303

304 ```

305

306 ??? equation "Calculation"

307

308 The equation for the $OCSB$ test statistic for a time series of length n is:

309

310 $$

OCSB = \frac{1}{n-1} \times \sum \left( \left( x[i] - x \left[ \frac{n}{2} + i \right] \right) - \left( x \left[ \frac{n}{2} + i \right] - x \left[ i + \frac{n}{2} + 1 \right] \right) \right) ^2

312 $$

313

314 where:

315

316 - $n$ is the sample size, and

317 - $x[i]$ is the $i$-th observation in the time series.

318

319 ```

320 OCSB = (1 / (n - 1)) * sum( ((x[i] - x[n/2+i]) - (x[n/2+i] - x[i+n/2+1]))^2 )

321 ```

322

323 In this equation, the time series is split into two halves, and the difference between the means of each half is calculated for each possible pair of halves. The sum of the squared differences is then divided by the length of the time series minus one to obtain the $OCSB$ test statistic.

324

325 ??? success "Credit"

326 - All credit goes to the [`pmdarima`](http://alkaline-ml.com/pmdarima/index.html) library with the implementation of [`pmdarima.arima.OCSBTest`](http://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.OCSBTest.html).

327

328 ??? question "References"

329 - Osborn DR, Chui APL, Smith J, and Birchenhall CR (1988) "Seasonality and the order of integration for consumption", Oxford Bulletin of Economics and Statistics 50(4):361-377.

330 - R's forecast::OCSB test source code: https://bit.ly/2QYQHno

331

332 ??? tip "See Also"

333 - [pmdarima.arima.OCSBTest](http://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.OCSBTest.html)

334 """

    return OCSBTest(m=m, lag_method=lag_method, max_lag=max_lag).estimate_seasonal_differencing_term(x)


@typechecked
def ch(x: ArrayLike, m: int) -> int:

340 r"""

341 !!! note "Summary"

342 The Canova-Hansen test for seasonal differences. Canova and Hansen (1995) proposed a test statistic for the null hypothesis that the seasonal pattern is stable. The test statistic can be formulated in terms of seasonal dummies or seasonal cycles. The former allows us to identify seasons (e.g. months or quarters) that are not stable, while the latter tests the stability of seasonal cycles (e.g. cycles of period 2 and 4 quarters in quarterly data).

343

344 !!! warning "Warning"

345 This test is generally not used directly, but in conjunction with `pmdarima.arima.nsdiffs()`, which directly estimates the number of seasonal differences.

346

347 ???+ abstract "Details"

348

The $CH$ test (also known as the Canova-Hansen test) is a statistical test for the stability of the seasonal pattern in time series data. As stated in the summary, the null hypothesis of the $CH$ test is that the seasonal pattern is stable; the alternative hypothesis is that it is not, in which case seasonal differencing is indicated.

The test statistic is compared to the tabulated critical value for the seasonal period `m` (see the Returns section). If the test statistic exceeds the critical value, the null hypothesis of a stable seasonal pattern is rejected and the function returns `1`.

352

353 The $CH$ test is based on the following steps:

354

355 1. Fit a non-seasonal autoregressive integrated moving average (ARIMA) model to the time series data, using a criterion such as Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC) to determine the optimal model order.

356 1. Fit a seasonal ARIMA model to the time series data, using the same criterion to determine the optimal model order and seasonal period.

357 1. Compute the sum of squared residuals (SSR) for both models.

1. Compute the test statistic $CH$ using the formula in the Calculation section below.

1. Compare the test statistic to the critical value for the seasonal period `m` (listed in the Returns section). If the test statistic exceeds the critical value, the function returns `1`; otherwise it returns `0`.

360

361 The $CH$ test is a powerful test for seasonality in time series data, as it accounts for both the presence and the nature of seasonality. However, it assumes that the time series data is stationary, and it may not be effective for detecting seasonality in non-stationary or irregular time series data. Additionally, it may not work well for time series data with short seasonal periods or with low seasonal amplitudes. Therefore, it should be used in conjunction with other tests and techniques for detecting seasonality in time series data.

362

363 Params:

364 x (ArrayLike):

365 The time series vector.

366 m (int):

367 The seasonal differencing term. For monthly data, e.g., this would be 12. For quarterly, 4, etc. For the Canova-Hansen test to work, `m` must exceed 1.

368

369 Returns:

370 (int):

371 The seasonal differencing term.

372

373 The $CH$ test defines a set of critical values:

374

375 ```

376 (0.4617146, 0.7479655, 1.0007818,

377 1.2375350, 1.4625240, 1.6920200,

378 1.9043096, 2.1169602, 2.3268562,

379 2.5406922, 2.7391007)

380 ```

381

382 For different values of `m`, the $CH$ statistic is compared to the corresponding critical value, and returns 1 if the computed statistic is greater than the critical value, or 0 if not.
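The decision rule can be sketched as a table lookup. The mapping from `m` to a table position used here (first value for `m = 2`, last for `m = 12`) is an assumption based on the table having eleven entries; the name `ch_decision` is hypothetical and this is not the `pmdarima` implementation.

```python
CH_CRITICAL_VALUES = (
    0.4617146, 0.7479655, 1.0007818,
    1.2375350, 1.4625240, 1.6920200,
    1.9043096, 2.1169602, 2.3268562,
    2.5406922, 2.7391007,
)


def ch_decision(stat, m):
    # Assumption: the i-th tabulated value belongs to m = i + 2 (m = 2..12).
    if not 2 <= m <= 12:
        raise ValueError("this sketch only covers 2 <= m <= 12")
    # 1 if the statistic exceeds the critical value, 0 otherwise.
    return int(stat > CH_CRITICAL_VALUES[m - 2])


quarterly_like = ch_decision(1.0, 2)
monthly_like = ch_decision(1.0, 12)
```

Under this assumption the same statistic can lead to different decisions for different seasonal periods, because the critical value grows with `m`.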

383

384 ???+ example "Examples"

385

386 ```pycon {.py .python linenums="1" title="Basic usage"}

387 >>> from ts_stat_tests.utils.data import load_airline

388 >>> from ts_stat_tests.seasonality.algorithms import ch

389 >>> data = load_airline().values

390 >>> ch(x=data, m=12)

391 0

392

393 ```

394

395 ??? equation "Calculation"

396

397 The test statistic for the $CH$ test is given by:

398

399 $$

400 CH = \frac { \left( \frac { SSRns - SSRs } { n - p - 1 } \right) } { \left( \frac { SSRs } { n - p - s - 1 } \right) }

401 $$

402

403 where:

404

405 - $SSRns$ is the $SSR$ for the non-seasonal model,

406 - $SSRs$ is the $SSR$ for the seasonal model,

407 - $n$ is the sample size,

408 - $p$ is the number of parameters in the non-seasonal model, and

409 - $s$ is the number of parameters in the seasonal model.

410

411 ```

412 CH = [(SSRns - SSRs) / (n - p - 1)] / (SSRs / (n - p - s - 1))

413 ```

414

415 ??? note "Notes"

416 This test is generally not used directly, but in conjunction with `pmdarima.arima.nsdiffs()`, which directly estimates the number of seasonal differences.

417

418 ??? success "Credit"

419 - All credit goes to the [`pmdarima`](http://alkaline-ml.com/pmdarima/index.html) library with the implementation of [`pmdarima.arima.CHTest`](http://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.CHTest.html).

420

421 ??? question "References"

422 - Testing for seasonal stability using the Canova and Hansen test statistic: http://bit.ly/2wKkrZo

423 - R source code for CH test: https://github.com/robjhyndman/forecast/blob/master/R/arima.R#L148

424

425 ??? tip "See Also"

426 - [`pmdarima.arima.CHTest`](http://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.CHTest.html)

427 """

    return CHTest(m=m).estimate_seasonal_differencing_term(x)


@typechecked
def seasonal_strength(x: ArrayLike, m: int) -> float:

433 r"""

434 !!! note "Summary"

435 The seasonal strength test is a statistical test for detecting the strength of seasonality in time series data. It measures the extent to which the seasonal component of a time series explains the variation in the data.

436

437 ???+ abstract "Details"

438

439 The seasonal strength test involves computing the seasonal strength index ($SSI$).

440

441 The $SSI$ ranges between $0$ and $1$, with higher values indicating stronger seasonality in the data. The critical value for the $SSI$ can be obtained from statistical tables based on the sample size and level of significance. If the $SSI$ value exceeds the critical value, the null hypothesis of no seasonality is rejected in favor of the alternative hypothesis of seasonality.

442

443 The seasonal strength test involves the following steps:

444

1. Decompose the time series data into its seasonal, trend, and residual components using a method such as Seasonal and Trend decomposition using Loess (STL) or moving average decomposition. This implementation uses the additive decomposition from `statsmodels.tsa.seasonal.seasonal_decompose`.

1. Compute the variance of the seasonal component $Var(S)$ and the variance of the residual component $Var(R)$.

1. Compute the $SSI$ using the formula in the Calculation section below.

1. Compare the $SSI$ to a critical value from a statistical table for a given significance level and sample size. If the $SSI$ exceeds the critical value, reject the null hypothesis of no seasonality in favor of the alternative hypothesis of seasonality. This function only returns the $SSI$; the comparison is left to the caller.

449

450 The seasonal strength test is a simple and intuitive test for seasonality in time series data. However, it assumes that the seasonal component is additive and that the residuals are independent and identically distributed. Moreover, it may not be effective for detecting complex seasonal patterns or seasonality in non-stationary or irregular time series data. Therefore, it should be used in conjunction with other tests and techniques for detecting seasonality in time series data.

451

452 Params:

453 x (ArrayLike):

454 The time series vector.

455 m (int):

456 The seasonal differencing term. For monthly data, e.g., this would be 12. For quarterly, 4, etc. For the seasonal strength test to work, `m` must exceed 1.

457

458 Returns:

459 (float):

460 The seasonal strength value.

461

462 ???+ example "Examples"

463

464 ```pycon {.py .python linenums="1" title="Basic usage"}

465 >>> from ts_stat_tests.utils.data import load_airline

466 >>> from ts_stat_tests.seasonality.algorithms import seasonal_strength

467 >>> data = load_airline().values

468 >>> seasonal_strength(x=data, m=12)

469 0.778721...

470

471 ```

472

473 ??? equation "Calculation"

474

475 The $SSI$ is computed using the following formula:

476

477 $$

478 SSI = \frac {Var(S)} {Var(S) + Var(R)}

479 $$

480

481 where:

482

483 - $Var(S)$ is the variance of the seasonal component, and

484 - $Var(R)$ is the variance of the residual component obtained after decomposing the time series data into its seasonal, trend, and residual components using a method such as STL or moving average decomposition.

485

486 ```

487 SSI = Var(S) / (Var(S) + Var(R))

488 ```
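The ratio can be sketched without any third-party library. The helper names `nan_variance` and `strength_index` are hypothetical; the sketch assumes the decomposition has already been done and that missing values are stored as `NaN`, which are skipped in the same spirit as `np.nanvar`.

```python
import math


def nan_variance(values):
    # Population variance over the non-NaN entries.
    kept = [v for v in values if not math.isnan(v)]
    mean = sum(kept) / len(kept)
    return sum((v - mean) ** 2 for v in kept) / len(kept)


def strength_index(component, residual):
    # Var(component) / (Var(component) + Var(residual))
    var_c = nan_variance(component)
    var_r = nan_variance(residual)
    return var_c / (var_c + var_r)


seasonal_part = [1.0, -1.0, 1.0, -1.0]
residual_part = [math.nan, 0.5, -0.5, math.nan]
ssi = strength_index(seasonal_part, residual_part)
```

Here the seasonal variance is `1.0` and the residual variance is `0.25`, so the index is `0.8`.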

489

490 ??? success "Credit"

491 - Inspired by the `tsfeatures` library in both [`Python`](https://github.com/Nixtla/tsfeatures) and [`R`](http://pkg.robjhyndman.com/tsfeatures/).

492

493 ??? question "References"

494 - Wang, X, Hyndman, RJ, Smith-Miles, K (2007) "Rule-based forecasting filters using time series features", Computational Statistics and Data Analysis, 52(4), 2244-2259.

495

496 ??? tip "See Also"

497 - [`tsfeatures.stl_features`](https://github.com/Nixtla/tsfeatures/blob/main/tsfeatures/tsfeatures.py)

498 """

    decomposition = seasonal_decompose(x=x, period=m, model="additive")
    seasonal = np.nanvar(decomposition.seasonal)
    residual = np.nanvar(decomposition.resid)
    return float(seasonal / (seasonal + residual))


@typechecked
def trend_strength(x: ArrayLike, m: int) -> float:

507 r"""

508 !!! note "Summary"

509 The trend strength test is a statistical test for detecting the strength of the trend component in time series data. It measures the extent to which the trend component of a time series explains the variation in the data.

510

511 ???+ abstract "Details"

512

513 The trend strength test involves computing the trend strength index ($TSI$).

514

515 The $TSI$ ranges between $0$ and $1$, with higher values indicating stronger trend in the data. The critical value for the $TSI$ can be obtained from statistical tables based on the sample size and level of significance. If the $TSI$ value exceeds the critical value, the null hypothesis of no trend is rejected in favor of the alternative hypothesis of trend.

516

517 The trend strength test involves the following steps:

518

1. Decompose the time series data into its trend, seasonal, and residual components using a method such as Seasonal and Trend decomposition using Loess (STL) or moving average decomposition. This implementation uses the additive decomposition from `statsmodels.tsa.seasonal.seasonal_decompose`.

1. Compute the variance of the trend component, denoted by $Var(T)$.

1. Compute the variance of the residual component, denoted by $Var(R)$.

1. Compute the trend strength index ($TSI$) using the formula in the Calculation section below.

1. Compare the $TSI$ value to a critical value based on the sample size and level of significance. If the $TSI$ value exceeds the critical value, reject the null hypothesis of no trend in favor of the alternative hypothesis of trend. This function only returns the $TSI$; the comparison is left to the caller.

524

The trend strength test is a useful tool for identifying the strength of trend in time series data. However, it assumes that the time series data is stationary and that the trend component is linear, and it may not be effective for time series data with short time spans or with nonlinear trends. Therefore, it should be used in conjunction with other tests and techniques for detecting trend in time series data.

526

527 Params:

528 x (ArrayLike):

529 The time series vector.

530 m (int):

531 The frequency of the time series data set. For the trend strength test to work, `m` must exceed 1.

532

533 Returns:

534 (float):

535 The trend strength score.

536

537 ???+ example "Examples"

538

539 ```pycon {.py .python linenums="1" title="Basic usage"}

540 >>> from ts_stat_tests.utils.data import load_airline

541 >>> from ts_stat_tests.seasonality.algorithms import trend_strength

542 >>> data = load_airline().values

543 >>> trend_strength(x=data, m=12)

544 0.965679...

545

546 ```

547

548 ??? equation "Calculation"

549

550 The trend strength test involves computing the trend strength index ($TSI$) using the following formula:

551

552 $$

553 TSI = \frac{ Var(T) } { Var(T) + Var(R) }

554 $$

555

556 where:

557

558 - $Var(T)$ is the variance of the trend component, and

559 - $Var(R)$ is the variance of the residual component obtained after decomposing the time series data into its trend, seasonal, and residual components using a method such as STL or moving average decomposition.

560

561 ```

562 TSI = Var(T) / (Var(T) + Var(R))

563 ```
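The steps mention moving average decomposition. As a rough illustration of that idea, the sketch below estimates a trend with a centered moving average in plain Python; for an even period the two end points of the window get half weight. The name `centered_moving_average` is hypothetical, and the sketch shows the general technique rather than the internals of `seasonal_decompose`.

```python
import math


def centered_moving_average(x, m):
    # Equal weights for odd m; half weights at both ends for even m.
    if m % 2 == 0:
        weights = [0.5] + [1.0] * (m - 1) + [0.5]
    else:
        weights = [1.0] * m
    weights = [w / m for w in weights]
    half = len(weights) // 2
    # Positions without a full window stay NaN.
    trend = [math.nan] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half : t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window))
    return trend


line = [float(t) for t in range(10)]
trend = centered_moving_average(line, 4)
```

A straight line is reproduced exactly wherever a full window is available, and the `NaN` padding at both ends is why a NaN-aware variance is needed afterwards.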

564

565 ??? success "Credit"

566 - Inspired by the `tsfeatures` library in both [`Python`](https://github.com/Nixtla/tsfeatures) and [`R`](http://pkg.robjhyndman.com/tsfeatures/).

567

568 ??? question "References"

569 - Wang, X, Hyndman, RJ, Smith-Miles, K (2007) "Rule-based forecasting filters using time series features", Computational Statistics and Data Analysis, 52(4), 2244-2259.

570

571 ??? tip "See Also"

572 - [`tsfeatures.stl_features`](https://github.com/Nixtla/tsfeatures/blob/main/tsfeatures/tsfeatures.py)

573 """

    decomposition = seasonal_decompose(x=x, period=m, model="additive")
    trend = np.nanvar(decomposition.trend)
    residual = np.nanvar(decomposition.resid)
    return float(trend / (trend + residual))


@typechecked
def spikiness(x: ArrayLike, m: int) -> float:

582 r"""

583 !!! note "Summary"

The spikiness test is a statistical test that measures the degree of spikiness or volatility in time series data. It aims to detect the presence of spikes or sudden changes in the data that may indicate important events or anomalies in the underlying process.

585

586 ???+ abstract "Details"

587

588 The spikiness test involves computing the spikiness index ($SI$). The $SI$ measures the intensity of spikes or outliers in the data relative to the overall variation. A higher $SI$ value indicates a more spiky or volatile time series, while a lower $SI$ value indicates a smoother or less volatile time series.

589

590 The spikiness test involves the following steps:

591

1. Decompose the time series data into its seasonal, trend, and residual components using a method such as STL or moving average decomposition. This implementation uses the additive decomposition from `statsmodels.tsa.seasonal.seasonal_decompose`.

1. Compute the mean absolute deviation of the residual component ($MADR$).

1. Compute the mean absolute deviation of the seasonal component ($MADS$).

1. Compute the spikiness index ($SI$) using the formula in the Calculation section below.

596

The spikiness test assumes that the time series data is stationary and that the spikes are abrupt and sudden, and it may not be effective for time series data with long-term trends or cyclical patterns. Therefore, it should be used in conjunction with other tests and techniques for detecting spikes in time series data, such as change point analysis and outlier detection.

598

599 Params:

600 x (ArrayLike):

601 The time series vector.

602 m (int):

603 The frequency of the time series data set. For the spikiness test to work, `m` must exceed 1.

604

605 Returns:

606 (float):

607 The spikiness score.

608

609 ???+ example "Examples"

610

611 ```pycon {.py .python linenums="1" title="Basic usage"}

612 >>> from ts_stat_tests.utils.data import load_airline

613 >>> from ts_stat_tests.seasonality.algorithms import spikiness

614 >>> data = load_airline().values

615 >>> spikiness(x=data, m=12)

616 0.484221...

617

618 ```

619

620 ??? equation "Calculation"

621

622 The spikiness test involves computing the spikiness index ($SI$) using the following formula:

623

624 $$

625 SI = \frac {MADR} {MADS}

626 $$

627

628 where:

629

630 - $MADR$ is the mean absolute deviation of the residuals, and

631 - $MADS$ is the mean absolute deviation of the seasonal component.

632

633 ```

634 SI = MADR / MADS

635 ```
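A small sketch of the ratio, assuming the residual and seasonal components are already available and that missing values are stored as `NaN`. The helper names `nan_mean_abs` and `spikiness_index` are hypothetical; the mean of absolute values is used for both terms, mirroring the function body.

```python
import math


def nan_mean_abs(values):
    # Mean of absolute values over the non-NaN entries.
    kept = [abs(v) for v in values if not math.isnan(v)]
    return sum(kept) / len(kept)


def spikiness_index(residual, seasonal):
    # SI = MADR / MADS
    return nan_mean_abs(residual) / nan_mean_abs(seasonal)


residual_part = [math.nan, 1.0, -3.0, math.nan]
seasonal_part = [4.0, -4.0, 4.0, -4.0]
si = spikiness_index(residual_part, seasonal_part)
```

With a mean absolute residual of `2.0` and a mean absolute seasonal value of `4.0`, the index is `0.5`.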

636

637 ??? success "Credit"

- All credit to the [`tsfeatures`](http://pkg.robjhyndman.com/tsfeatures/) library. This code is a direct copy+paste from the [`tsfeatures.py`](https://github.com/Nixtla/tsfeatures/blob/master/tsfeatures/tsfeatures.py) module. It is not possible to refer directly to a `spikiness` function in the `tsfeatures` package because the calculation is embedded within their `stl_features` function. Therefore, it is necessary to copy it here.

639

640 ??? question "References"

641 - Wang, X, Hyndman, RJ, Smith-Miles, K (2007) "Rule-based forecasting filters using time series features", Computational Statistics and Data Analysis, 52(4), 2244-2259.

642

643 ??? tip "See Also"

644 - [`tsfeatures.stl_features`](https://github.com/Nixtla/tsfeatures/blob/main/tsfeatures/tsfeatures.py)

645 """

    decomposition = seasonal_decompose(x=x, model="additive", period=m)
    madr = np.nanmean(np.abs(decomposition.resid))
    mads = np.nanmean(np.abs(decomposition.seasonal))
    return float(madr / mads)
