Coverage for src / ts_stat_tests / stationarity / algorithms.py: 100%

57 statements  

coverage.py v7.13.2, created at 2026-02-01 09:48 +0000

# ============================================================================ #
#                                                                              #
#     Title: Stationarity Algorithms                                           #
#     Purpose: Algorithms to test for stationarity in time series data.        #
#                                                                              #
# ============================================================================ #


# ---------------------------------------------------------------------------- #
#                                                                              #
# Overview                                                                  ####
#                                                                              #
# ---------------------------------------------------------------------------- #


# ---------------------------------------------------------------------------- #
# Description                                                               ####
# ---------------------------------------------------------------------------- #


"""
!!! note "Summary"
    Stationarity tests are statistical tests used to determine whether a time series is stationary or not. A stationary time series is one whose statistical properties, such as mean and variance, do not change over time. Stationarity is an important assumption in many time series forecasting models, as it allows for the use of techniques such as autoregression and moving averages.

    There are several different types of stationarity tests, including the Augmented Dickey-Fuller (ADF) test, the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, the Phillips-Perron (PP) test, the Elliott-Rothenberg-Stock (ERS) test, and the Variance Ratio (VR) test. Each of these tests has its own strengths and weaknesses, and the choice of which test to use will depend on the specific characteristics of the time series being analyzed.

    Overall, stationarity tests are an important tool in time series analysis and forecasting, as they help identify whether a time series is stationary or non-stationary, which can have implications for the choice of forecasting models and methods.

    For a really good article on ADF & KPSS tests, check: [When A Time Series Only Quacks Like A Duck: Testing for Stationarity Before Running Forecast Models. With Python. And A Duckling Picture.](https://towardsdatascience.com/when-a-time-series-only-quacks-like-a-duck-10de9e165e)
"""

# ---------------------------------------------------------------------------- #
#                                                                              #
# Setup                                                                     ####
#                                                                              #
# ---------------------------------------------------------------------------- #


# ---------------------------------------------------------------------------- #
# Imports                                                                  ####
# ---------------------------------------------------------------------------- #

# ## Python StdLib Imports ----
from typing import Any, Literal, Optional, Union, overload

# ## Python Third Party Imports ----
import numpy as np
from arch.unitroot import (
    DFGLS as _ers,
    PhillipsPerron as _pp,
    VarianceRatio as _vr,
)
from numpy.typing import ArrayLike
from statsmodels.stats.diagnostic import ResultsStore
from statsmodels.tsa.stattools import (
    adfuller as _adfuller,
    kpss as _kpss,
    range_unit_root_test as _rur,
    zivot_andrews as _za,
)
from typeguard import typechecked

# ---------------------------------------------------------------------------- #
# Exports                                                                  ####
# ---------------------------------------------------------------------------- #


__all__: list[str] = ["adf", "kpss", "rur", "za", "pp", "ers", "vr"]

# ---------------------------------------------------------------------------- #
# Constants                                                                ####
# ---------------------------------------------------------------------------- #


VALID_ADF_REGRESSION_OPTIONS = Literal["c", "ct", "ctt", "n"]
VALID_ADF_AUTOLAG_OPTIONS = Literal["AIC", "BIC", "t-stat"]
VALID_KPSS_REGRESSION_OPTIONS = Literal["c", "ct"]
VALID_KPSS_NLAGS_OPTIONS = Literal["auto", "legacy"]
VALID_ZA_REGRESSION_OPTIONS = Literal["c", "t", "ct"]
VALID_ZA_AUTOLAG_OPTIONS = Literal["AIC", "BIC", "t-stat"]
VALID_PP_TREND_OPTIONS = Literal["n", "c", "ct"]
VALID_PP_TEST_TYPE_OPTIONS = Literal["rho", "tau"]
VALID_ERS_TREND_OPTIONS = Literal["c", "ct"]
VALID_ERS_METHOD_OPTIONS = Literal["aic", "bic", "t-stat"]
VALID_VR_TREND_OPTIONS = Literal["c", "n"]


# ---------------------------------------------------------------------------- #
#                                                                              #
# Algorithms                                                                ####
#                                                                              #
# ---------------------------------------------------------------------------- #

@overload
def adf(
    x: ArrayLike,
    maxlag: Optional[int] = None,
    regression: VALID_ADF_REGRESSION_OPTIONS = "c",
    *,
    autolag: Optional[VALID_ADF_AUTOLAG_OPTIONS] = "AIC",
    store: Literal[True],
    regresults: bool = False,
) -> tuple[float, float, dict, ResultsStore]: ...
@overload
def adf(
    x: ArrayLike,
    maxlag: Optional[int] = None,
    regression: VALID_ADF_REGRESSION_OPTIONS = "c",
    *,
    autolag: None,
    store: Literal[False] = False,
    regresults: bool = False,
) -> tuple[float, float, int, int, dict]: ...
@overload
def adf(
    x: ArrayLike,
    maxlag: Optional[int] = None,
    regression: VALID_ADF_REGRESSION_OPTIONS = "c",
    *,
    autolag: VALID_ADF_AUTOLAG_OPTIONS = "AIC",
    store: Literal[False] = False,
    regresults: bool = False,
) -> tuple[float, float, int, int, dict, float]: ...
@typechecked
def adf(
    x: ArrayLike,
    maxlag: Optional[int] = None,
    regression: VALID_ADF_REGRESSION_OPTIONS = "c",
    *,
    autolag: Optional[VALID_ADF_AUTOLAG_OPTIONS] = "AIC",
    store: bool = False,
    regresults: bool = False,
) -> Union[
    tuple[float, float, dict, ResultsStore],
    tuple[float, float, int, int, dict],
    tuple[float, float, int, int, dict, float],
]:
    r"""
    !!! note "Summary"
        The Augmented Dickey-Fuller test can be used to test for a unit root in a univariate process in the presence of serial correlation.

    ???+ abstract "Details"

        The Augmented Dickey-Fuller (ADF) test is a statistical test used to determine whether a time series is stationary or not. Stationarity refers to the property of a time series where the statistical properties, such as mean and variance, remain constant over time. Stationarity is important for time series forecasting as it allows for the use of many popular forecasting models, such as ARIMA.

        The ADF test is an extension of the Dickey-Fuller test and involves regressing the first difference of the time series on its lagged level and its lagged first differences, and then testing whether the coefficient on the lagged level term is significantly negative. If it is, the null of a unit root is rejected and the time series is considered stationary.

        The null hypothesis of the ADF test is that the time series has a unit root, which means that it is non-stationary. The alternative hypothesis is that the time series is stationary. If the p-value of the test is less than a chosen significance level, typically 0.05, then we reject the null hypothesis and conclude that the time series is stationary.

        In practical terms, if a time series is found to be non-stationary by the ADF test, one can apply differencing to the time series until it becomes stationary. This involves taking the difference between consecutive observations and potentially repeating this process until the time series is stationary.
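The differencing mentioned above is a single call in NumPy. A minimal illustrative sketch (the array here is made up for demonstration only):

```python
import numpy as np

y = np.array([1.0, 2.0, 4.0, 7.0, 11.0])  # series with a non-constant trend
d1 = np.diff(y)       # first difference: y_t - y_{t-1} -> array([1., 2., 3., 4.])
d2 = np.diff(y, n=2)  # difference again if still non-stationary -> array([1., 1., 1.])
```

The first difference removes a linear trend; a second difference would be needed for a quadratic one.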


    Params:
        x (ArrayLike):
            The data series to test.
        maxlag (Optional[int]):
            Maximum lag which is included in test, default value of $12 \times (\frac{nobs}{100})^{\frac{1}{4}}$ is used when `None`.
            Default: `None`
        regression (VALID_ADF_REGRESSION_OPTIONS):
            Constant and trend order to include in regression.

            - `"c"`: constant only (default).
            - `"ct"`: constant and trend.
            - `"ctt"`: constant, and linear and quadratic trend.
            - `"n"`: no constant, no trend.

            Default: `"c"`
        autolag (Optional[VALID_ADF_AUTOLAG_OPTIONS]):
            Method to use when automatically determining the lag length among the values $0, 1, ..., maxlag$.

            - If `"AIC"` (default) or `"BIC"`, then the number of lags is chosen to minimize the corresponding information criterion.
            - `"t-stat"` based choice of `maxlag`. Starts with `maxlag` and drops a lag until the t-statistic on the last lag length is significant using a 5%-sized test.
            - If `None`, then the number of included lags is set to `maxlag`.

            Default: `"AIC"`
        store (bool):
            If `True`, then a result instance is returned additionally to the `adf` statistic.
            Default: `False`
        regresults (bool):
            If `True`, the full regression results are returned.
            Default: `False`

    Returns:
        (Union[tuple[float, float, dict, ResultsStore], tuple[float, float, int, int, dict], tuple[float, float, int, int, dict, float]]):
            Depending on parameters, returns a tuple containing:

            - `adf` (float): The test statistic.
            - `pvalue` (float): MacKinnon's approximate p-value.
            - `usedlag` (int): The number of lags used.
            - `nobs` (int): The number of observations used.
            - `critical_values` (dict): Critical values at the 1%, 5%, and 10% levels.
            - `icbest` (float): The maximized information criterion (if `autolag` is not `None`).
            - `resstore` (Optional[ResultsStore]): Result instance (if `store` is `True`).

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.stationarity.algorithms import adf
        >>> from ts_stat_tests.utils.data import data_airline, data_normal
        >>> normal = data_normal
        >>> airline = data_airline.values

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Stationary Series"}
        >>> stat, pvalue, lags, nobs, crit, icbest = adf(x=normal)
        >>> print(f"ADF statistic: {stat:.4f}")
        ADF statistic: -30.7838
        >>> print(f"p-value: {pvalue:.4f}")
        p-value: 0.0000

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Airline Passengers Data"}
        >>> stat, pvalue, lags, nobs, crit, icbest = adf(x=airline)
        >>> print(f"p-value: {pvalue:.4f}")
        p-value: 0.9919

        ```

        ```pycon {.py .python linenums="1" title="Example 3: Store Result Instance"}
        >>> res = adf(x=airline, store=True)
        >>> print(res)
        (0.8153..., 0.9918..., {'1%': np.float64(-3.4816...), '5%': np.float64(-2.8840...), '10%': np.float64(-2.5787...)}, <statsmodels.stats.diagnostic.ResultsStore object at ...>)

        ```

        ```pycon {.py .python linenums="1" title="Example 4: No Autolag"}
        >>> stat, pvalue, lags, nobs, crit = adf(x=airline, autolag=None, maxlag=5)
        >>> print(f"p-value: {pvalue:.4f}")
        p-value: 0.7670

        ```

    ??? equation "Calculation"

        The mathematical equation for the Augmented Dickey-Fuller (ADF) test for stationarity in time series forecasting is:

        $$
        \Delta y_t = \alpha + \beta y_{t-1} + \sum_{i=1}^p \delta_i \Delta y_{t-i} + \epsilon_t
        $$

        where:

        - $y_t$ is the value of the time series at time $t$.
        - $\Delta y_t$ is the first difference of $y_t$, which is defined as $\Delta y_t = y_t - y_{t-1}$.
        - $\alpha$ is the constant term.
        - $\beta$ is the coefficient on $y_{t-1}$.
        - $\delta_i$ are the coefficients on the lagged differences of $y$.
        - $\epsilon_t$ is the error term.

        The ADF test involves testing the null hypothesis that $\beta = 0$, or equivalently, that the time series has a unit root. If $\beta$ is significantly less than $0$, then the null hypothesis can be rejected and the time series is considered stationary.

        Here are the detailed steps for how to calculate the ADF test:

        1. Collect your time series data and plot it to visually check for any trends, seasonal patterns, or other patterns that could make the data non-stationary. If you detect any such patterns, you will need to pre-process your data (e.g., detrending, deseasonalizing, etc.) to remove these effects.

        1. Calculate the first differences of the time series, which is simply the difference between each observation and the previous observation. This step is performed to transform the original data into a stationary process. The first difference of $y_t$ is defined as $\Delta y_t = y_t - y_{t-1}$.

        1. Estimate the parameters $\alpha$, $\beta$, and $\delta_i$ using the least squares method. This involves regressing $\Delta y_t$ on the lagged level $y_{t-1}$ and the lagged differences $\Delta y_{t-1}, \Delta y_{t-2}, \dots, \Delta y_{t-p}$, where $p$ is the number of lags to include in the model. The estimated equation is:

            $$
            \Delta y_t = \alpha + \beta y_{t-1} + \sum_{i=1}^p \delta_i \Delta y_{t-i} + \epsilon_t
            $$

        1. Calculate the test statistic, which is given by:

            $$
            ADF = \frac{\hat{\beta}}{SE(\hat{\beta})}
            $$

            - where $SE(\hat{\beta})$ is the standard error of the coefficient on $y_{t-1}$.

            The test statistic measures the number of standard errors by which $\hat{\beta}$ deviates from $0$. If $ADF$ is less than (i.e., more negative than) the critical values from the ADF distribution table, we can reject the null hypothesis and conclude that the time series is stationary.

        1. Compare the test statistic to the critical values in the ADF distribution table to determine the level of significance. The critical values depend on the sample size, the level of significance, and the number of lags in the model.

        1. Finally, interpret the results and draw conclusions about the stationarity of the time series. If the null hypothesis is rejected, then the time series is stationary and can be used for forecasting. If the null hypothesis is not rejected, then the time series is non-stationary and requires further pre-processing before it can be used for forecasting.
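The regression and test statistic in the Calculation steps above can be sketched with plain NumPy least squares. This is an illustrative sketch only, not this library's implementation; the seed, series length, and lag count `p` are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(500))  # a random walk, i.e. a unit-root process

p = 2                      # arbitrary number of lagged differences
dy = np.diff(y)            # dy[k] = y[k+1] - y[k]
Y = dy[p:]                 # response: Delta y_t

# Regressors: constant alpha, lagged level y_{t-1}, lagged differences Delta y_{t-i}
X = np.column_stack(
    [np.ones_like(Y), y[p:-1]]
    + [dy[p - i : -i] for i in range(1, p + 1)]
)

# Ordinary least squares estimate of (alpha, beta, delta_1, ..., delta_p)
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ coef
sigma2 = resid @ resid / (len(Y) - X.shape[1])            # residual variance
se_beta = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])  # SE of beta_hat
adf_stat = coef[1] / se_beta                              # ADF = beta_hat / SE(beta_hat)
```

Because `y` is a random walk here, the statistic will typically sit well above (i.e. be less negative than) the usual 5% critical value of about -2.86, so the unit-root null is not rejected. Note that under the null the statistic follows the Dickey-Fuller distribution, not a standard t-distribution, which is why the tabulated critical values are needed.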


    ??? note "Notes"
        The null hypothesis of the Augmented Dickey-Fuller is that there is a unit root, with the alternative that there is no unit root. If the p-value is above a critical size, then we cannot reject that there is a unit root.

        The p-values are obtained through regression surface approximation from MacKinnon 1994, but using the updated 2010 tables. If the p-value is close to significant, then the critical values should be used to judge whether to reject the null.

        The `autolag` option and `maxlag` for it are described in Greene.

    ??? success "Credit"
        - All credit goes to the [`statsmodels`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html) library.

    ??? question "References"
        - Greene, W.H. (2003). Econometric Analysis. 5th ed. Prentice Hall.
        - Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press.
        - MacKinnon, J.G. (1994). Approximate asymptotic distribution functions for unit-root and cointegration tests. Journal of Business & Economic Statistics, 12: 167-176.
        - MacKinnon, J.G. (2010). Critical values for cointegration tests. Queen's Economics Department Working Paper.

    ??? tip "See Also"
        - [`statsmodels.tsa.stattools.adfuller`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html): Augmented Dickey-Fuller unit root test.
        - [`statsmodels.tsa.stattools.kpss`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html): Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
        - [`statsmodels.tsa.stattools.range_unit_root_test`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html): Range Unit-Root test.
        - [`statsmodels.tsa.stattools.zivot_andrews`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html): Zivot-Andrews structural break test.
        - [`pmdarima.arima.PPTest`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.PPTest.html): Phillips-Perron unit root test.
        - [`arch.unitroot.DFGLS`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html): Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller.
        - [`arch.unitroot.VarianceRatio`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html): Variance Ratio test of a random walk.
        - [`ts_stat_tests.stationarity.algorithms.adf`][ts_stat_tests.stationarity.algorithms.adf]: Augmented Dickey-Fuller unit root test.
        - [`ts_stat_tests.stationarity.algorithms.kpss`][ts_stat_tests.stationarity.algorithms.kpss]: Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
        - [`ts_stat_tests.stationarity.algorithms.rur`][ts_stat_tests.stationarity.algorithms.rur]: Range Unit-Root test of stationarity.
        - [`ts_stat_tests.stationarity.algorithms.za`][ts_stat_tests.stationarity.algorithms.za]: Zivot-Andrews structural break unit root test.
        - [`ts_stat_tests.stationarity.algorithms.pp`][ts_stat_tests.stationarity.algorithms.pp]: Phillips-Perron unit root test.
        - [`ts_stat_tests.stationarity.algorithms.ers`][ts_stat_tests.stationarity.algorithms.ers]: Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller test.
        - [`ts_stat_tests.stationarity.algorithms.vr`][ts_stat_tests.stationarity.algorithms.vr]: Variance Ratio test of a random walk.
    """
    res: Any = _adfuller(  # Using `Any` to avoid ty issues with statsmodels stubs
        x=x,
        maxlag=maxlag,
        regression=regression,
        autolag=autolag,  # type: ignore[arg-type] # statsmodels stubs are often missing `None`
        store=store,
        regresults=regresults,
    )

    if store:
        # returns (stat, pval, crit, store)
        return float(res[0]), float(res[1]), dict(res[2]), res[3]

    if autolag is None:
        # returns (stat, pval, lags, nobs, crit)
        return (
            float(res[0]),
            float(res[1]),
            int(res[2]),
            int(res[3]),
            dict(res[4]),
        )

    # returns (stat, pval, lags, nobs, crit, icbest)
    return (
        float(res[0]),
        float(res[1]),
        int(res[2]),
        int(res[3]),
        dict(res[4]),
        float(res[5]),
    )


@overload
def kpss(
    x: ArrayLike,
    regression: VALID_KPSS_REGRESSION_OPTIONS = "c",
    nlags: Optional[Union[VALID_KPSS_NLAGS_OPTIONS, int]] = None,
    *,
    store: Literal[True],
) -> tuple[float, float, int, dict, ResultsStore]: ...
@overload
def kpss(
    x: ArrayLike,
    regression: VALID_KPSS_REGRESSION_OPTIONS = "c",
    nlags: Optional[Union[VALID_KPSS_NLAGS_OPTIONS, int]] = None,
    *,
    store: Literal[False] = False,
) -> tuple[float, float, int, dict]: ...
@typechecked
def kpss(
    x: ArrayLike,
    regression: VALID_KPSS_REGRESSION_OPTIONS = "c",
    nlags: Optional[Union[VALID_KPSS_NLAGS_OPTIONS, int]] = None,
    *,
    store: bool = False,
) -> Union[
    tuple[float, float, int, dict, ResultsStore],
    tuple[float, float, int, dict],
]:
    r"""
    !!! note "Summary"
        Computes the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test for the null hypothesis that `x` is level or trend stationary.

    ???+ abstract "Details"

        The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test is another statistical test used to determine whether a time series is stationary or not. The KPSS test is the opposite of the Augmented Dickey-Fuller (ADF) test in the sense that its null hypothesis is stationarity, whereas the ADF test's null hypothesis is the presence of a unit root.

        The KPSS test involves regressing the time series on a constant and a time trend. The null hypothesis of the test is that the time series is stationary. The alternative hypothesis is that the time series has a unit root, which means that it is non-stationary.

        The test statistic is calculated from the cumulative sums of the residuals of the regression. If the test statistic is greater than a critical value at a given significance level, typically 0.05, then we reject the null hypothesis and conclude that the time series is non-stationary. If the test statistic is less than the critical value, then we fail to reject the null hypothesis and conclude that the time series is stationary.

        In practical terms, if a time series is found to be non-stationary by the KPSS test, one can apply differencing to the time series until it becomes stationary. This involves taking the difference between consecutive observations and potentially repeating this process until the time series is stationary.

        Overall, the ADF and KPSS tests are both important tools in time series analysis and forecasting, as they help identify whether a time series is stationary or non-stationary, which can have implications for the choice of forecasting models and methods.

    Params:
        x (ArrayLike):
            The data series to test.
        regression (VALID_KPSS_REGRESSION_OPTIONS, optional):
            The null hypothesis for the KPSS test.

            - `"c"`: The data is stationary around a constant (default).
            - `"ct"`: The data is stationary around a trend.

            Defaults to `"c"`.
        nlags (Optional[Union[VALID_KPSS_NLAGS_OPTIONS, int]], optional):
            Indicates the number of lags to be used.

            - If `None` or `"auto"` (default), `lags` is calculated using the data-dependent method of Hobijn et al. (1998). See also Andrews (1991), Newey & West (1994), and Schwert (1989).
            - If set to `"legacy"`, uses $int(12 \times (\frac{n}{100})^{\frac{1}{4}})$, as outlined in Schwert (1989).
            - If an `int`, that fixed number of lags is used.

            Defaults to `None`.
        store (bool, optional):
            If `True`, then a result instance is returned additionally to the KPSS statistic.<br>
            Defaults to `False`.

    Returns:
        (Union[tuple[float, float, int, dict, ResultsStore], tuple[float, float, int, dict]]):
            Returns a tuple containing:

            - `stat` (float): The KPSS test statistic.
            - `pvalue` (float): The p-value of the test.
            - `lags` (int): The truncation lag parameter.
            - `crit` (dict): The critical values at 10%, 5%, 2.5%, and 1%.
            - `resstore` (Optional[ResultsStore]): Result instance (if `store` is `True`).

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.stationarity.algorithms import kpss
        >>> from ts_stat_tests.utils.data import data_airline, data_normal
        >>> normal = data_normal
        >>> airline = data_airline.values

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Stationary Series"}
        >>> stat, pvalue, lags, crit = kpss(x=normal)
        >>> print(f"KPSS statistic: {stat:.4f}")
        KPSS statistic: 0.0858
        >>> print(f"p-value: {pvalue:.4f}")
        p-value: 0.1000

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Airline Passengers Data"}
        >>> stat, pvalue, lags, crit = kpss(x=airline)
        >>> print(f"p-value: {pvalue:.4f}")
        p-value: 0.0100

        ```

    ??? equation "Calculation"

        The mathematical equation for the KPSS test for stationarity in time series forecasting is:

        $$
        y_t = \mu_t + \epsilon_t
        $$

        where:

        - $y_t$ is the value of the time series at time $t$.
        - $\mu_t$ is the trend component of the time series.
        - $\epsilon_t$ is the error term.

        The KPSS test involves testing the null hypothesis that the time series is trend stationary, which means that the trend component of the time series is stationary over time. If the null hypothesis is rejected, then the time series is non-stationary and requires further pre-processing before it can be used for forecasting.

        Here are the detailed steps for how to calculate the KPSS test:

        1. Collect your time series data and plot it to visually check for any trends, seasonal patterns, or other patterns that could make the data non-stationary. If you detect any such patterns, you will need to pre-process your data (e.g., detrending, deseasonalizing, etc.) to remove these effects.

        1. Estimate the deterministic component $\mu_t$ by an ordinary least squares regression of $y_t$ on a constant (for level stationarity) or on a constant and a linear time trend (for trend stationarity).

        1. Calculate the residual series $\epsilon_t$ by subtracting the estimated component from the original time series:

            $$
            \epsilon_t = y_t - \mu_t
            $$

        1. Estimate the variance of the residual series using a suitable estimator, such as the Newey-West estimator or the Bartlett kernel estimator. This step is necessary to correct for any serial correlation in the residual series.

        1. Calculate the test statistic, which is given by:

            $$
            KPSS = \frac{1}{T^2} \sum_{t=1}^T \frac{S_t^2}{\hat{\sigma}^2}
            $$

            where:

            - $T$ is the number of observations in the time series.
            - $S_t$ is the cumulative sum of the residual series up to time $t$, i.e., $S_t = \sum_{i=1}^t \epsilon_i$.
            - $\hat{\sigma}^2$ is the estimated (long-run) variance of the residual series.

            The test statistic measures the strength of the trend component relative to the residual series. If $KPSS$ is greater than the critical values from the KPSS distribution table, we can reject the null hypothesis and conclude that the time series is non-stationary.

        1. Finally, interpret the results and draw conclusions about the stationarity of the time series. If the null hypothesis is rejected, then the time series is non-stationary and requires further pre-processing before it can be used for forecasting. If the null hypothesis is not rejected, then the time series is trend stationary and can be used for forecasting.
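The statistic above can be sketched in NumPy for the level-stationary case (regressing on a constant only). This is an illustrative sketch only, not this library's implementation; the seed, sample size, and use of the Schwert lag rule for the Bartlett kernel are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(400)        # white noise: level stationary

resid = y - y.mean()                # residuals from regressing y on a constant
s = np.cumsum(resid)                # partial sums S_t

# Long-run variance via a Newey-West estimator with Bartlett weights
T = len(y)
lags = int(12 * (T / 100) ** 0.25)  # the Schwert (1989) "legacy" rule
sigma2 = resid @ resid / T
for k in range(1, lags + 1):
    w = 1.0 - k / (lags + 1)        # Bartlett kernel weight
    sigma2 += 2.0 * w * (resid[:-k] @ resid[k:]) / T

kpss_stat = (s @ s) / (T**2 * sigma2)
```

For white noise the statistic typically falls well below the 5% critical value of 0.463, so the null of level stationarity is not rejected.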


    ??? note "Notes"
        To estimate $\sigma^2$ the Newey-West estimator is used. If `lags` is `"legacy"`, the truncation lag parameter is set to $int(12 \times (\frac{n}{100})^{\frac{1}{4}})$, as outlined in Schwert (1989). The p-values are interpolated from Table 1 of Kwiatkowski et al. (1992). If the computed statistic is outside the table of critical values, then a warning message is generated.

        Missing values are not handled.

    ??? success "Credit"
        - All credit goes to the [`statsmodels`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html) library.

    ??? question "References"
        - Andrews, D.W.K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59: 817-858.
        - Hobijn, B., Franses, P.H., & Ooms, M. (2004). Generalizations of the KPSS-test for stationarity. Statistica Neerlandica, 58: 483-502.
        - Kwiatkowski, D., Phillips, P.C.B., Schmidt, P., & Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics, 54: 159-178.
        - Newey, W.K., & West, K.D. (1994). Automatic lag selection in covariance matrix estimation. Review of Economic Studies, 61: 631-653.
        - Schwert, G.W. (1989). Tests for unit roots: A Monte Carlo investigation. Journal of Business & Economic Statistics, 7(2): 147-159.

    ??? tip "See Also"
        - [`statsmodels.tsa.stattools.adfuller`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html): Augmented Dickey-Fuller unit root test.
        - [`statsmodels.tsa.stattools.kpss`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html): Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
        - [`statsmodels.tsa.stattools.range_unit_root_test`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html): Range Unit-Root test.
        - [`statsmodels.tsa.stattools.zivot_andrews`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html): Zivot-Andrews structural break test.
        - [`pmdarima.arima.PPTest`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.PPTest.html): Phillips-Perron unit root test.
        - [`arch.unitroot.DFGLS`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html): Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller.
        - [`arch.unitroot.VarianceRatio`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html): Variance Ratio test of a random walk.
        - [`ts_stat_tests.stationarity.algorithms.adf`][ts_stat_tests.stationarity.algorithms.adf]: Augmented Dickey-Fuller unit root test.
        - [`ts_stat_tests.stationarity.algorithms.kpss`][ts_stat_tests.stationarity.algorithms.kpss]: Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
        - [`ts_stat_tests.stationarity.algorithms.rur`][ts_stat_tests.stationarity.algorithms.rur]: Range Unit-Root test of stationarity.
        - [`ts_stat_tests.stationarity.algorithms.za`][ts_stat_tests.stationarity.algorithms.za]: Zivot-Andrews structural break unit root test.
        - [`ts_stat_tests.stationarity.algorithms.pp`][ts_stat_tests.stationarity.algorithms.pp]: Phillips-Perron unit root test.
        - [`ts_stat_tests.stationarity.algorithms.ers`][ts_stat_tests.stationarity.algorithms.ers]: Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller test.
        - [`ts_stat_tests.stationarity.algorithms.vr`][ts_stat_tests.stationarity.algorithms.vr]: Variance Ratio test of a random walk.
    """
    _nlags: Union[VALID_KPSS_NLAGS_OPTIONS, int] = nlags if nlags is not None else "auto"
    return _kpss(x=x, regression=regression, nlags=_nlags, store=store)


@overload
def rur(x: ArrayLike, *, store: Literal[True]) -> tuple[float, float, dict, ResultsStore]: ...
@overload
def rur(x: ArrayLike, *, store: Literal[False] = False) -> tuple[float, float, dict]: ...
@typechecked
def rur(x: ArrayLike, *, store: bool = False) -> Union[
    tuple[float, float, dict, ResultsStore],
    tuple[float, float, dict],
]:
539 r""" 

540 !!! note "Summary" 

541 Computes the Range Unit-Root (RUR) test for the null hypothesis that x is stationary. 

542 

543 ???+ abstract "Details" 

544 

545 The Range Unit-Root (RUR) test is a statistical test used to determine whether a time series is stationary or not. It is based on the range of the time series and does not require any knowledge of the underlying stochastic process. 

546 

547 The RUR test involves dividing the time series into non-overlapping windows of a fixed size and calculating the range of each window. Then, the range of the entire time series is calculated. If the time series is stationary, the range of the entire time series should be proportional to the square root of the window size. If the time series is non-stationary, the range of the entire time series will grow with the window size. 

548 

549 The null hypothesis of the RUR test is that the time series is non-stationary (unit root). The alternative hypothesis is that the time series is stationary. If the test statistic is less than a critical value at a given significance level, typically 0.05, then we reject the null hypothesis and conclude that the time series is stationary. If the test statistic is greater than the critical value, then we fail to reject the null hypothesis and conclude that the time series is non-stationary. 

550 

551 In practical terms, if a time series is found to be non-stationary by the RUR test, one can apply differencing to the time series until it becomes stationary. This involves taking the difference between consecutive observations and potentially repeating this process until the time series is stationary. 
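Differencing as described here is just a lag-1 subtraction, repeated as needed. A minimal `numpy` sketch (the helper name is ours):

```python
import numpy as np

def difference(y: np.ndarray, order: int = 1) -> np.ndarray:
    """Apply first differencing `order` times: d_t = y_t - y_{t-1}."""
    y = np.asarray(y, dtype=float)
    for _ in range(order):
        y = np.diff(y)
    return y

# A random walk is non-stationary, but differencing once recovers its
# stationary increments exactly (the first one is lost to the lag).
rng = np.random.default_rng(42)
steps = rng.normal(size=500)
walk = np.cumsum(steps)
print(np.allclose(difference(walk), steps[1:]))  # → True
```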

552 

553 The RUR test is a simple and computationally efficient test for stationarity, but it may not be as powerful as other unit root tests in detecting non-stationarity in some cases. It is important to use multiple tests to determine the stationarity of a time series, as no single test is perfect in all situations. 

554 

555 Params: 

556 x (ArrayLike): 

557 The data series to test. 

558 store (bool, optional): 

559 If `True`, then a result instance is returned additionally to the RUR statistic.<br> 

560 Defaults to `False`. 

561 

562 Returns: 

563 (Union[tuple[float, float, dict, ResultsStore], tuple[float, float, dict]]): 

564 Returns a tuple containing: 

565 - `stat` (float): The RUR test statistic. 

566 - `pvalue` (float): The p-value of the test. 

567 - `crit` (dict): The critical values at 10%, 5%, 2.5%, and 1%. 

568 - `resstore` (Optional[ResultsStore]): Result instance (if `store` is `True`). 

569 

570 ???+ example "Examples" 

571 

572 ```pycon {.py .python linenums="1" title="Setup"} 

573 >>> from ts_stat_tests.utils.data import data_airline, data_normal, data_trend, data_sine 

574 >>> from ts_stat_tests.stationarity.algorithms import rur 

575 >>> normal = data_normal 

576 >>> trend = data_trend 

577 >>> seasonal = data_sine 

578 >>> airline = data_airline.values 

579 

580 ``` 

581 

582 ```pycon {.py .python linenums="1" title="Example 1: Stationary Series"} 

583 >>> stat, pvalue, crit = rur(x=normal) 

584 >>> print(f"RUR statistic: {stat:.4f}") 

585 RUR statistic: 0.3479 

586 >>> print(f"p-value: {pvalue:.4f}") 

587 p-value: 0.0100 

588 

589 ``` 

590 

591 ```pycon {.py .python linenums="1" title="Example 2: Trend-Stationary Series"} 

592 >>> stat, pvalue, crit = rur(x=trend) 

593 >>> print(f"RUR statistic: {stat:.4f}") 

594 RUR statistic: 31.5912 

595 >>> print(f"p-value: {pvalue:.4f}") 

596 p-value: 0.9500 

597 

598 ``` 

599 

600 ```pycon {.py .python linenums="1" title="Example 3: Seasonal Series"} 

601 >>> stat, pvalue, crit = rur(x=seasonal) 

602 >>> print(f"RUR statistic: {stat:.4f}") 

603 RUR statistic: 0.9129 

604 >>> print(f"p-value: {pvalue:.4f}") 

605 p-value: 0.0100 

606 

607 ``` 

608 

609 ```pycon {.py .python linenums="1" title="Example 4: Real-World Time Series"} 

610 >>> stat, pvalue, crit = rur(x=airline) 

611 >>> print(f"RUR statistic: {stat:.4f}") 

612 RUR statistic: 2.3333 

613 >>> print(f"p-value: {pvalue:.4f}") 

614 p-value: 0.9000 

615 

616 ``` 

617 

618 ??? equation "Calculation" 

619 

620 The mathematical equation for the RUR test is: 

621 

622 $$ 

623 y_t = \rho y_{t-1} + \epsilon_t 

624 $$ 

625 

626 where: 

627 

628 - $y_t$ is the value of the time series at time $t$. 

629 - $\rho$ is the parameter of the unit root process. 

630 - $y_{t-1}$ is the value of the time series at time $t-1$. 

631 - $\epsilon_t$ is a stationary error term with mean zero and constant variance. 

632 

633 The null hypothesis of the RUR test is that the time series is non-stationary with a unit root, and the alternative hypothesis is that the time series is stationary. 

634 

635 Here are the detailed steps for how to calculate the RUR test: 

636 

637 1. Collect your time series data and plot it to visually check for any trends, seasonal patterns, or other patterns that could make the data non-stationary. If you detect any such patterns, you will need to pre-process your data (e.g., detrending, deseasonalizing, etc.) to remove these effects. 

638 

639 1. Estimate the parameter $\rho$ using the ordinary least squares method. This involves regressing $y_t$ on $y_{t-1}$. The estimated equation is: 

640 

641 $$ 

642 y_t = \alpha + \rho y_{t-1} + \epsilon_t 

643 $$ 

644 

645 where: 

646 

647 - $\alpha$ is the intercept. 

648 - $\epsilon_t$ is the error term. 

649 

650 1. Calculate the range of the time series, which is the difference between the maximum and minimum values of the time series: 

651 

652 $$ 

653 R = \max(y_t) - \min(y_t) 

654 $$ 

655 

656 1. Calculate the expected range of the time series under the null hypothesis of stationarity, which is given by: 

657 

658 $$ 

659 E(R) = \frac {T - 1} {2 \sqrt{T}} 

660 $$ 

661 

662 where: 

663 

664 - $T$ is the sample size. 

665 

666 1. Calculate the test statistic, which is given by: 

667 

668 $$ 

669 RUR = \frac {R - E(R)} {E(R)} 

670 $$ 

671 

672 1. Compare the test statistic to the critical values in the RUR distribution table to determine the level of significance. The critical values depend on the sample size and the level of significance. 

673 

674 1. Finally, interpret the results and draw conclusions about the stationarity of the time series. If the null hypothesis is rejected, then the time series is stationary. If the null hypothesis is not rejected, then the time series is non-stationary with a unit root. 

675 

676 In practice, the RUR test is often conducted using software packages such as R, Python, or MATLAB, which automate the estimation of parameters and calculation of the test statistic. 
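The arithmetic in steps 3-5 can be followed literally. This toy walk-through implements the formulas exactly as written above, for illustration of the ratio only (it is not the statistic `statsmodels` computes):

```python
import numpy as np

def rur_from_steps(y: np.ndarray) -> float:
    """Toy statistic from the steps above: (R - E(R)) / E(R)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    R = y.max() - y.min()                    # step 3: observed range
    expected_R = (T - 1) / (2 * np.sqrt(T))  # step 4: expected range
    return (R - expected_R) / expected_R     # step 5: relative excess range

# For y = 0, 1, ..., 99: R = 99 and E(R) = 99 / 20 = 4.95, so the ratio is 19.
print(round(rur_from_steps(np.arange(100.0)), 6))  # → 19.0
```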

677 

678 ??? note "Notes" 

679 The p-values are interpolated from Table 1 of Aparicio et al. (2006). If the computed statistic is outside the table of critical values, then a warning message is generated. 

680 

681 Missing values are not handled. 

682 

683 !!! success "Credit" 

684 - All credit goes to the [`statsmodels`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html) library. 

685 

686 ??? question "References" 

687 - Aparicio, F., Escribano A., Sipols, A.E. (2006). Range Unit-Root (RUR) tests: robust against nonlinearities, error distributions, structural breaks and outliers. Journal of Time Series Analysis, 27 (4): 545-576. 

688 

689 ??? tip "See Also" 

690 - [`statsmodels.tsa.stattools.adfuller`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html): Augmented Dickey-Fuller unit root test. 

691 - [`statsmodels.tsa.stattools.kpss`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html): Kwiatkowski-Phillips-Schmidt-Shin stationarity test. 

692 - [`statsmodels.tsa.stattools.range_unit_root_test`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html): Range Unit-Root test. 

693 - [`statsmodels.tsa.stattools.zivot_andrews`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html): Zivot-Andrews structural break test. 

694 - [`pmdarima.arima.PPTest`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.PPTest.html): Phillips-Perron unit root test. 

695 - [`arch.unitroot.DFGLS`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html): Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller. 

696 - [`arch.unitroot.VarianceRatio`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html): Variance Ratio test of a random walk. 

697 - [`ts_stat_tests.stationarity.algorithms.adf`][ts_stat_tests.stationarity.algorithms.adf]: Augmented Dickey-Fuller unit root test. 

698 - [`ts_stat_tests.stationarity.algorithms.kpss`][ts_stat_tests.stationarity.algorithms.kpss]: Kwiatkowski-Phillips-Schmidt-Shin stationarity test. 

699 - [`ts_stat_tests.stationarity.algorithms.rur`][ts_stat_tests.stationarity.algorithms.rur]: Range Unit-Root test of stationarity. 

700 - [`ts_stat_tests.stationarity.algorithms.za`][ts_stat_tests.stationarity.algorithms.za]: Zivot-Andrews structural break unit root test. 

701 - [`ts_stat_tests.stationarity.algorithms.pp`][ts_stat_tests.stationarity.algorithms.pp]: Phillips-Perron unit root test. 

702 - [`ts_stat_tests.stationarity.algorithms.ers`][ts_stat_tests.stationarity.algorithms.ers]: Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller test. 

703 - [`ts_stat_tests.stationarity.algorithms.vr`][ts_stat_tests.stationarity.algorithms.vr]: Variance Ratio test of a random walk. 

704 """ 

705 return _rur(x=x, store=store) 

706 

707 

708@typechecked 

709def za( 

710 x: ArrayLike, 

711 trim: float = 0.15, 

712 maxlag: Optional[int] = None, 

713 regression: VALID_ZA_REGRESSION_OPTIONS = "c", 

714 autolag: Optional[VALID_ZA_AUTOLAG_OPTIONS] = "AIC", 

715) -> tuple[float, float, dict, int, int]: 

716 r""" 

717 !!! note "Summary" 

718 The Zivot-Andrews (ZA) test tests for a unit root in a univariate process in the presence of serial correlation and a single structural break. 

719 

720 ???+ abstract "Details" 

721 The Zivot-Andrews (ZA) test is a statistical test used to determine whether a time series is stationary or not in the presence of structural breaks. Structural breaks refer to significant changes in the underlying stochastic process of the time series, which can cause non-stationarity. 

722 

723 The ZA test runs, for each candidate break date, a regression of the time series on a constant, a linear time trend, a lagged level, and break dummies. The null hypothesis of the test is that the time series contains a unit root, while the alternative hypothesis is that the time series is trend-stationary with a single break point. 

724 

725 The test statistic is computed for every candidate break date, and the break point is chosen as the date giving the most negative statistic, i.e. the strongest evidence against the unit-root null. If that minimum statistic is more negative than the critical value at a given significance level, typically 0.05, then we reject the null hypothesis and conclude that the time series is stationary with a structural break. If the test statistic is less negative than the critical value, then we fail to reject the null hypothesis and conclude that the time series is non-stationary. 

726 

727 In practical terms, if a time series is found to be non-stationary with a structural break by the ZA test, one can apply methods to account for the structural break, such as including dummy variables in the regression or using time series models that allow for structural breaks. 

728 

729 Overall, the ZA test is a useful tool in time series analysis and forecasting when there is a suspicion of structural breaks in the data. However, it is important to note that the test may not detect multiple break points or breaks that are not well-separated in time. 

730 

731 Params: 

732 x (ArrayLike): 

733 The data series to test. 

734 trim (float): 

735 The percentage of series at begin/end to exclude. 

736 Default: `0.15` 

737 maxlag (Optional[int]): 

738 The maximum lag which is included in test. 

739 Default: `None` 

740 regression (VALID_ZA_REGRESSION_OPTIONS): 

741 Constant and trend order to include in regression. 

742 

743 - `"c"`: constant only (default). 

744 - `"t"`: trend only. 

745 - `"ct"`: constant and trend. 

746 

747 Default: `"c"` 

748 autolag (Optional[VALID_ZA_AUTOLAG_OPTIONS]): 

749 The method to select the lag length. 

750 

751 - If `None`, then `maxlag` lags are used. 

752 - If `"AIC"` (default) or `"BIC"`, then the number of lags is chosen. 

753 

754 Default: `"AIC"` 

755 

756 Returns: 

757 (tuple[float, float, dict, int, int]): 

758 Returns a tuple containing: 

759 - `zastat` (float): The test statistic. 

760 - `pvalue` (float): The p-value. 

761 - `cvdict` (dict): Critical values at the $1\%$, $5\%$, and $10\%$ levels. 

762 - `baselag` (int): Lags used for period regressions. 

763 - `pbidx` (int): Break period index. 

764 

765 ???+ example "Examples" 

766 

767 ```pycon {.py .python linenums="1" title="Setup"} 

768 >>> from ts_stat_tests.utils.data import data_airline, data_normal, data_noise 

769 >>> from ts_stat_tests.stationarity.algorithms import za 

770 >>> normal = data_normal 

771 >>> noise = data_noise 

772 >>> airline = data_airline.values 

773 

774 ``` 

775 

776 ```pycon {.py .python linenums="1" title="Example 1: Stationary Series"} 

777 >>> stat, pvalue, crit, lags, break_idx = za(x=normal) 

778 >>> print(f"ZA statistic: {stat:.4f}") 

779 ZA statistic: -30.8800 

780 >>> print(f"p-value: {pvalue:.4e}") 

781 p-value: 1.0000e-05 

782 

783 ``` 

784 

785 ```pycon {.py .python linenums="1" title="Example 2: Noisy Series"} 

786 >>> stat, pvalue, crit, lags, break_idx = za(x=noise) 

787 >>> print(f"ZA statistic: {stat:.4f}") 

788 ZA statistic: -32.4316 

789 >>> print(f"p-value: {pvalue:.4e}") 

790 p-value: 1.0000e-05 

791 

792 ``` 

793 

794 ```pycon {.py .python linenums="1" title="Example 3: Real-World Time Series"} 

795 >>> stat, pvalue, crit, lags, break_idx = za(x=airline) 

796 >>> print(f"ZA statistic: {stat:.4f}") 

797 ZA statistic: -3.6508 

798 >>> print(f"p-value: {pvalue:.4f}") 

799 p-value: 0.5808 

800 

801 ``` 

802 

803 ??? equation "Calculation" 

804 

805 The mathematical equation for the Zivot-Andrews test is: 

806 

807 $$ 

808 y_t = \alpha + \beta t + \gamma y_{t-1} + \delta_1 D_t + \delta_2 t D_t + \epsilon_t 

809 $$ 

810 

811 where: 

812 

813 - $y_t$ is the value of the time series at time $t$. 

814 - $\alpha$ is the intercept. 

815 - $\beta$ is the slope coefficient of the time trend. 

816 - $\gamma$ is the coefficient of the lagged dependent variable. 

817 - $D_t$ is a dummy variable that takes a value of 1 after the suspected structural break point, and 0 otherwise. 

818 - $\delta_1$ and $\delta_2$ are the coefficients of the dummy variable and the interaction term of the dummy variable and time trend, respectively. 

819 - $\epsilon_t$ is a stationary error term with mean zero and constant variance. 

820 

821 The null hypothesis of the Zivot-Andrews test is that the time series is non-stationary, and the alternative hypothesis is that the time series is stationary with a single structural break. 

822 

823 Here are the detailed steps for how to calculate the Zivot-Andrews test: 

824 

825 1. Collect your time series data and plot it to visually check for any trends, seasonal patterns, or other patterns that could make the data non-stationary. If you detect any such patterns, you will need to pre-process your data (e.g., detrending, deseasonalizing, etc.) to remove these effects. 

826 

827 1. Estimate the parameters of the model using the least squares method. This involves regressing $y_t$ on $t$, $y_{t-1}$, $D_t$, and $t D_t$. The estimated equation is: 

828 

829 $$ 

830 y_t = \alpha + \beta t + \gamma y_{t-1} + \delta_1 D_t + \delta_2 t D_t + \epsilon_t 

831 $$ 

832 

833 1. Perform a unit root test on the residuals to check for stationarity. The most commonly used unit root tests for this purpose are the Augmented Dickey-Fuller (ADF) test and the Phillips-Perron (PP) test. 

834 

835 1. Calculate the test statistic, which is based on the largest root of the following equation: 

836 

837 $$ 

838 \Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \delta_1 D_t + \delta_2 t D_t + \epsilon_t 

839 $$ 

840 

841 where: 

842 

843 - $\Delta$ is the first difference operator. 

844 

845 1. Determine the critical values of the test statistic from the Zivot-Andrews distribution table. The critical values depend on the sample size, the level of significance, and the number of lagged dependent variables in the model. 

846 

847 1. Finally, interpret the results and draw conclusions about the stationarity of the time series. If the null hypothesis is rejected, then the time series is stationary with a structural break. If the null hypothesis is not rejected, then the time series is non-stationary and may require further processing to make it stationary. 

848 

849 In practice, the Zivot-Andrews test is often conducted using software packages such as R, Python, or MATLAB, which automate the estimation of parameters and calculation of the test statistic. 
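The regression in step 2 can be set up explicitly. The sketch below builds the design matrix $[1, t, y_{t-1}, D_t, t D_t]$ for a single candidate break date and fits it by ordinary least squares; the function name and the synthetic data are purely illustrative (the `statsmodels` implementation additionally searches over break dates and lag lengths):

```python
import numpy as np

def za_design(y: np.ndarray, break_idx: int) -> tuple[np.ndarray, np.ndarray]:
    """Design matrix [1, t, y_{t-1}, D_t, t*D_t] and target y_t for one break date."""
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y))               # time index for observations 1..T-1
    lagged = y[:-1]                        # y_{t-1}
    dummy = (t > break_idx).astype(float)  # D_t: 1 after the suspected break
    X = np.column_stack([np.ones_like(lagged), t, lagged, dummy, t * dummy])
    return X, y[1:]

# Generate data exactly from known coefficients (no noise) and recover them.
alpha, beta, gamma, d1, d2 = 1.0, 0.02, 0.5, 3.0, -0.01
y_sim = np.zeros(200)
for t in range(1, 200):
    D = 1.0 if t > 120 else 0.0
    y_sim[t] = alpha + beta * t + gamma * y_sim[t - 1] + d1 * D + d2 * t * D
X, target = za_design(y_sim, break_idx=120)
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
print(np.round(coef, 4).tolist())  # → [1.0, 0.02, 0.5, 3.0, -0.01]
```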

850 

851 ??? note "Notes" 

852 H0 = unit root with a single structural break 

853 

854 Algorithm follows Baum (2004/2015) approximation to original Zivot-Andrews method. Rather than performing an autolag regression at each candidate break period (as per the original paper), a single autolag regression is run up-front on the base model (constant + trend with no dummies) to determine the best lag length. This lag length is then used for all subsequent break-period regressions. This results in significant run time reduction but also slightly more pessimistic test statistics than the original Zivot-Andrews method, although no attempt has been made to characterize the size/power trade-off. 

855 

856 ??? success "Credit" 

857 - All credit goes to the [`statsmodels`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html) library. 

858 

859 ??? question "References" 

860 - Baum, C.F. (2004). "ZANDREWS: Stata module to calculate Zivot-Andrews unit root test in presence of structural break". Statistical Software Components S437301, Boston College Department of Economics, revised 2015. 

861 - Schwert, G.W. (1989). Tests for unit roots: A Monte Carlo investigation. Journal of Business & Economic Statistics, 7: 147-159. 

862 - Zivot, E., and Andrews, D.W.K. (1992). Further evidence on the great crash, the oil-price shock, and the unit-root hypothesis. Journal of Business & Economic Statistics, 10: 251-270. 

863 

864 ??? tip "See Also" 

865 - [`statsmodels.tsa.stattools.adfuller`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html): Augmented Dickey-Fuller unit root test. 

866 - [`statsmodels.tsa.stattools.kpss`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html): Kwiatkowski-Phillips-Schmidt-Shin stationarity test. 

867 - [`statsmodels.tsa.stattools.range_unit_root_test`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html): Range Unit-Root test. 

868 - [`statsmodels.tsa.stattools.zivot_andrews`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html): Zivot-Andrews structural break test. 

869 - [`pmdarima.arima.PPTest`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.PPTest.html): Phillips-Perron unit root test. 

870 - [`arch.unitroot.DFGLS`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html): Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller. 

871 - [`arch.unitroot.VarianceRatio`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html): Variance Ratio test of a random walk. 

872 - [`ts_stat_tests.stationarity.algorithms.adf`][ts_stat_tests.stationarity.algorithms.adf]: Augmented Dickey-Fuller unit root test. 

873 - [`ts_stat_tests.stationarity.algorithms.kpss`][ts_stat_tests.stationarity.algorithms.kpss]: Kwiatkowski-Phillips-Schmidt-Shin stationarity test. 

874 - [`ts_stat_tests.stationarity.algorithms.rur`][ts_stat_tests.stationarity.algorithms.rur]: Range Unit-Root test of stationarity. 

875 - [`ts_stat_tests.stationarity.algorithms.za`][ts_stat_tests.stationarity.algorithms.za]: Zivot-Andrews structural break unit root test. 

876 - [`ts_stat_tests.stationarity.algorithms.pp`][ts_stat_tests.stationarity.algorithms.pp]: Phillips-Perron unit root test. 

877 - [`ts_stat_tests.stationarity.algorithms.ers`][ts_stat_tests.stationarity.algorithms.ers]: Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller test. 

878 - [`ts_stat_tests.stationarity.algorithms.vr`][ts_stat_tests.stationarity.algorithms.vr]: Variance Ratio test of a random walk. 

879 """ 

880 res: Any = _za( 

881 x=x, 

882 trim=trim, 

883 maxlag=maxlag, 

884 regression=regression, 

885 autolag=autolag, # type: ignore[arg-type] # statsmodels stubs are often missing None 

886 ) 

887 return ( 

888 float(res[0]), 

889 float(res[1]), 

890 dict(res[2]), 

891 int(res[3]), 

892 int(res[4]), 

893 ) 

894 

895 

896@typechecked 

897def pp( 

898 x: ArrayLike, 

899 lags: Optional[int] = None, 

900 trend: VALID_PP_TREND_OPTIONS = "c", 

901 test_type: VALID_PP_TEST_TYPE_OPTIONS = "tau", 

902) -> tuple[float, float, int, dict]: 

903 r""" 

904 !!! note "Summary" 

905 Conduct a Phillips-Perron (PP) test for stationarity. 

906 

907 In statistics, the Phillips-Perron test (named after Peter C. B. Phillips and Pierre Perron) is a unit root test. It is used in time series analysis to test the null hypothesis that a time series is integrated of order $1$. It builds on the Dickey-Fuller test of the null hypothesis $\rho = 1$. 

908 

909 ???+ abstract "Details" 

910 

911 The Phillips-Perron (PP) test is a statistical test used to determine whether a time series is stationary or not. It is similar to the Augmented Dickey-Fuller (ADF) test, but it has some advantages, especially in the presence of autocorrelation and heteroscedasticity. 

912 

913 The PP test involves regressing the time series on a constant and a linear time trend, and testing whether the residuals of the regression are stationary or not. The null hypothesis of the test is that the time series is non-stationary, while the alternative hypothesis is that the time series is stationary. 

914 

915 The test statistic is calculated by taking the sum of the squared residuals of the regression, which is adjusted for autocorrelation and heteroscedasticity. The PP test also accounts for the bias in the standard errors of the test statistic, which can lead to incorrect inference in small samples. 

916 

917 If the test statistic is less than a critical value at a given significance level, typically 0.05, then we reject the null hypothesis and conclude that the time series is stationary. If the test statistic is greater than the critical value, then we fail to reject the null hypothesis and conclude that the time series is non-stationary. 

918 

919 In practical terms, if a time series is found to be non-stationary by the PP test, one can apply differencing to the time series until it becomes stationary. This involves taking the difference between consecutive observations and potentially repeating this process until the time series is stationary. 

920 

921 Overall, the PP test is a powerful and robust test for stationarity, and it is widely used in time series analysis and forecasting. However, it is important to use multiple tests and diagnostic tools to determine the stationarity of a time series, as no single test is perfect in all situations. 
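A common cross-check pairs a unit-root test such as ADF or PP (null: non-stationary) with a stationarity test such as KPSS (null: stationary). The small helper below is purely illustrative of that decision logic; the labels and the default threshold are our own choices:

```python
def combine_tests(unit_root_p: float, stationarity_p: float, alpha: float = 0.05) -> str:
    """Cross-check a unit-root test (H0: unit root) against a stationarity test (H0: stationary)."""
    rejects_unit_root = unit_root_p < alpha        # evidence FOR stationarity
    rejects_stationarity = stationarity_p < alpha  # evidence AGAINST stationarity
    if rejects_unit_root and not rejects_stationarity:
        return "stationary"
    if not rejects_unit_root and rejects_stationarity:
        return "non-stationary"
    if rejects_unit_root and rejects_stationarity:
        return "conflicting evidence: inspect for breaks or heteroscedasticity"
    return "inconclusive: both nulls retained"

print(combine_tests(unit_root_p=0.001, stationarity_p=0.20))  # → stationary
```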

922 

923 Params: 

924 x (ArrayLike): 

925 The data series to test. 

926 lags (Optional[int], optional): 

927 The number of lags to use in the Newey-West estimator of the variance. If omitted or `None`, the lag length is selected automatically.<br> 

928 Defaults to `None`. 

929 trend (VALID_PP_TREND_OPTIONS, optional): 

930 The trend component to include in the test. 

931 

932 - `"n"`: No constant, no trend. 

933 - `"c"`: Include a constant (default). 

934 - `"ct"`: Include a constant and linear time trend. 

935 

936 Defaults to `"c"`. 

937 test_type (VALID_PP_TEST_TYPE_OPTIONS, optional): 

938 The type of test statistic to compute: 

939 

940 - `"tau"`: The t-statistic based on the augmented regression (default). 

941 - `"rho"`: The normalized autocorrelation coefficient (also known as the $Z(\alpha)$ test). 

942 

943 Defaults to `"tau"`. 

944 

945 Returns: 

946 (tuple[float, float, int, dict]): 

947 Returns a tuple containing: 

948 - `stat` (float): The test statistic. 

949 - `pvalue` (float): The p-value for the test statistic. 

950 - `lags` (int): The number of lags used in the test. 

951 - `crit` (dict): The critical values at 1%, 5%, and 10%. 

952 

953 ???+ example "Examples" 

954 

955 ```pycon {.py .python linenums="1" title="Setup"} 

956 >>> from ts_stat_tests.stationarity.algorithms import pp 

957 >>> from ts_stat_tests.utils.data import data_airline, data_normal, data_trend, data_sine 

958 >>> normal = data_normal 

959 >>> trend = data_trend 

960 >>> seasonal = data_sine 

961 >>> airline = data_airline.values 

962 

963 ``` 

964 

965 ```pycon {.py .python linenums="1" title="Example 1: Stationary Series"} 

966 >>> stat, pvalue, lags, crit = pp(x=normal) 

967 >>> print(f"PP statistic: {stat:.4f}") 

968 PP statistic: -30.7758 

969 >>> print(f"p-value: {pvalue:.4f}") 

970 p-value: 0.0000 

971 

972 ``` 

973 

974 ```pycon {.py .python linenums="1" title="Example 2: Trend-Stationary Series"} 

975 >>> stat, pvalue, lags, crit = pp(x=trend, trend="ct") 

976 >>> print(f"p-value: {pvalue:.4f}") 

977 p-value: 0.0000 

978 

979 ``` 

980 

981 ```pycon {.py .python linenums="1" title="Example 3: Seasonal Series"} 

982 >>> stat, pvalue, lags, crit = pp(x=seasonal) 

983 >>> print(f"PP statistic: {stat:.4f}") 

984 PP statistic: -8.0571 

985 >>> print(f"p-value: {pvalue:.4f}") 

986 p-value: 0.0000 

987 

988 ``` 

989 

990 ```pycon {.py .python linenums="1" title="Example 4: Real-World Time Series"} 

991 >>> stat, pvalue, lags, crit = pp(x=airline) 

992 >>> print(f"PP statistic: {stat:.4f}") 

993 PP statistic: -1.3511 

994 >>> print(f"p-value: {pvalue:.4f}") 

995 p-value: 0.6055 

996 

997 ``` 

998 

999 ```pycon {.py .python linenums="1" title="Example 5: PP test with excessive lags (coverage check)"} 

1000 >>> from ts_stat_tests.stationarity.algorithms import pp 

1001 >>> from ts_stat_tests.utils.data import data_normal 

1002 >>> # data_normal has 1000 observations. Force lags = 1000 to trigger adjustment. 

1003 >>> res = pp(data_normal, lags=1000) 

1004 >>> print(f"stat: {res[0]:.4f}, lags: {res[2]}") 

1005 stat: -43.6895, lags: 998 

1006 

1007 ``` 

1008 

1009 ??? equation "Calculation" 

1010 

1011 The Phillips-Perron (PP) test is a commonly used test for stationarity in time series forecasting. The mathematical equation for the PP test is: 

1012 

1013 $$ 

1014 y_t = \delta + \pi t + \rho y_{t-1} + \epsilon_t 

1015 $$ 

1016 

1017 where: 

1018 

1019 - $y_t$ is the value of the time series at time $t$. 

1020 - $\delta$ is a constant term. 

1021 - $\pi$ is a coefficient that captures the trend in the data. 

1022 - $\rho$ is a coefficient that captures the autocorrelation in the data. 

1023 - $y_{t-1}$ is the lagged value of the time series at time $t-1$. 

1024 - $\epsilon_t$ is a stationary error term with mean zero and constant variance. 

1025 

1026 The PP test is based on the idea that if the time series contains a unit root, then the coefficient $\rho$ equals one. Therefore, the null hypothesis of the PP test is that the time series is non-stationary ($\rho = 1$), and the alternative hypothesis is that the time series is stationary ($|\rho| < 1$). 

1027 

1028 Here are the detailed steps for how to calculate the PP test: 

1029 

1030 1. Collect your time series data and plot it to visually check for any trends, seasonal patterns, or other patterns that could make the data non-stationary. If you detect any such patterns, you will need to pre-process your data (e.g., detrending, deseasonalizing, etc.) to remove these effects. 

1031 

1032 1. Estimate the regression model by regressing $y_t$ on a constant, a linear trend, and the lagged value of $y_{t-1}$. The regression equation is: 

1033 

1034 $$ 

1035 y_t = \delta + \pi t + \rho y_{t-1} + \epsilon_t 

1036 $$ 

1037 

1038 1. Calculate the test statistic, which is based on the following equation: 

1039 

1040 $$ 

1041 z = \left( T^{-\frac{1}{2}} \right) \times \left( \sum_{t=1}^T \left( y_t - \delta - \pi t - \rho y_{t-1} \right) - \left( \frac{1}{T} \right) \times \sum_{t=1}^T \sum_{s=1}^T K \left( \frac{s-t}{h} \right) (y_s - \delta - \pi s - \rho y_{s-1}) \right) 

1042 $$ 

1043 

1044 where: 

1045 

1046 - $T$ is the sample size. 

1047 - $K(\dots)$ is the kernel function, which determines the weight of each observation in the smoothed series. The choice of the kernel function depends on the degree of serial correlation in the data. Typically, a Gaussian kernel or a Bartlett kernel is used. 

1048 - $h$ is the bandwidth parameter, which controls the degree of smoothing of the series. The optimal value of $h$ depends on the sample size and the noise level of the data. 

1049 

1050 1. Determine the critical values of the test statistic from the PP distribution table. The critical values depend on the sample size and the level of significance. 

1051 

1052 1. Finally, interpret the results and draw conclusions about the stationarity of the time series. If the null hypothesis is rejected, then the time series is stationary. If the null hypothesis is not rejected, then the time series is non-stationary with a unit root ($\rho = 1$). 

1053 

1054 In practice, the PP test is often conducted using software packages such as R, Python, or MATLAB, which automate the estimation of the regression model and calculation of the test statistic. 
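The kernel-weighted adjustment in step 3 is, in practice, a Newey-West style long-run variance estimate of the regression residuals. A minimal sketch with a Bartlett kernel (the helper name and this simplified form are our illustration, not the exact internals of the library):

```python
import numpy as np

def newey_west_lrv(e: np.ndarray, lags: int) -> float:
    """Long-run variance: gamma_0 + 2 * sum_j w_j * gamma_j with Bartlett weights."""
    e = np.asarray(e, dtype=float)
    n = len(e)
    lrv = np.dot(e, e) / n                   # gamma_0: variance at lag 0
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)           # Bartlett kernel weight
        gamma_j = np.dot(e[j:], e[:-j]) / n  # autocovariance at lag j
        lrv += 2.0 * w * gamma_j
    return lrv

# Alternating residuals: gamma_0 = 1, gamma_1 = -0.75, w_1 = 0.5,
# so the long-run variance is 1 + 2 * 0.5 * (-0.75) = 0.25.
e = np.array([1.0, -1.0, 1.0, -1.0])
print(newey_west_lrv(e, lags=1))  # → 0.25
```

The Bartlett weights shrink higher-order autocovariances toward zero, which guarantees a non-negative variance estimate.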

1055 

1056 ??? note "Notes" 

1057 This test is generally used indirectly via the [`pmdarima.arima.ndiffs()`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.ndiffs.html) function, which computes the differencing term, `d`. 

1058 

1059 The R code allows for two types of tests: `'Z(alpha)'` and `'Z(t_alpha)'`. Since sklearn does not allow extraction of std errors from the linear model fit, `t_alpha` is much more difficult to achieve, so we do not allow that variant. 

1060 

1061 !!! success "Credit" 

1062 - All credit goes to the [`arch`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.PhillipsPerron.html) library. 

1063 

1064 ??? question "References" 

1065 - Phillips, P. C. B.; Perron, P. (1988). Testing for a Unit Root in Time Series Regression. Biometrika. 75 (2): 335-346. 

1066 

1067 ??? tip "See Also" 

1068 - [`statsmodels.tsa.stattools.adfuller`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html): Augmented Dickey-Fuller unit root test. 

1069 - [`statsmodels.tsa.stattools.kpss`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html): Kwiatkowski-Phillips-Schmidt-Shin stationarity test. 

1070 - [`statsmodels.tsa.stattools.range_unit_root_test`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html): Range Unit-Root test. 

1071 - [`statsmodels.tsa.stattools.zivot_andrews`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html): Zivot-Andrews structural break test. 

1072 - [`pmdarima.arima.PPTest`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.PPTest.html): Phillips-Perron unit root test. 

1073 - [`arch.unitroot.DFGLS`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html): Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller. 

1074 - [`arch.unitroot.VarianceRatio`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html): Variance Ratio test of a random walk. 

1075 - [`ts_stat_tests.stationarity.algorithms.adf`][ts_stat_tests.stationarity.algorithms.adf]: Augmented Dickey-Fuller unit root test. 

1076 - [`ts_stat_tests.stationarity.algorithms.kpss`][ts_stat_tests.stationarity.algorithms.kpss]: Kwiatkowski-Phillips-Schmidt-Shin stationarity test. 

1077 - [`ts_stat_tests.stationarity.algorithms.rur`][ts_stat_tests.stationarity.algorithms.rur]: Range Unit-Root test of stationarity. 

1078 - [`ts_stat_tests.stationarity.algorithms.za`][ts_stat_tests.stationarity.algorithms.za]: Zivot-Andrews structural break unit root test. 

1079 - [`ts_stat_tests.stationarity.algorithms.pp`][ts_stat_tests.stationarity.algorithms.pp]: Phillips-Perron unit root test. 

1080 - [`ts_stat_tests.stationarity.algorithms.ers`][ts_stat_tests.stationarity.algorithms.ers]: Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller test. 

1081 - [`ts_stat_tests.stationarity.algorithms.vr`][ts_stat_tests.stationarity.algorithms.vr]: Variance Ratio test of a random walk. 

1082 """ 

1083 _x = np.asarray(x) 

1084 nobs = _x.shape[0] 

1085 _lags = lags 

1086 if _lags is None: 

1087 _lags = int(np.ceil(12.0 * np.power(nobs / 100.0, 1 / 4.0))) 

1088 

1089 # arch PP test requires lags < nobs-1 

1090 if _lags >= nobs - 1: 

1091 _lags = max(0, nobs - 2) 

1092 

1093 res = _pp(y=_x, lags=_lags, trend=trend, test_type=test_type) 

1094 return (float(res.stat), float(res.pvalue), int(res.lags), dict(res.critical_values)) 

1095 

1096 

1097@typechecked 

1098def ers( 

1099 y: ArrayLike, 

1100 lags: Optional[int] = None, 

1101 trend: VALID_ERS_TREND_OPTIONS = "c", 

1102 max_lags: Optional[int] = None, 

1103 method: VALID_ERS_METHOD_OPTIONS = "aic", 

1104 low_memory: Optional[bool] = None, 

1105) -> tuple[float, float, int, dict]: 

1106 r""" 

1107 !!! note "Summary" 

1108 Elliott, Rothenberg and Stock's GLS detrended Dickey-Fuller. 

1109 

1110 ???+ abstract "Details" 

1111 

1112 The Elliott-Rothenberg-Stock (ERS) test, also known as the DF-GLS test, is a statistical test used to determine whether a time series contains a unit root. It is a modification of the Augmented Dickey-Fuller test with substantially better power when the series has an unknown mean or linear trend, particularly against stationary alternatives that are close to the unit-root boundary. 

1113 

1114 The test first removes the deterministic component by generalised least squares (GLS): the data and the deterministic regressors are quasi-differenced using a local-to-unity parameter, the trend coefficients are estimated on the quasi-differenced data, and the fitted deterministic component is subtracted from the series. An Augmented Dickey-Fuller regression with no deterministic terms is then run on the detrended series, and the test statistic is the t-ratio on the lagged level. 

1115 

1116 If the test statistic is more negative than the critical value at a given significance level, typically 0.05, then we reject the null hypothesis of a unit root and conclude that the time series is stationary. If the test statistic is above the critical value, then we fail to reject the null hypothesis and treat the time series as non-stationary. 

1117 

1118 In practical terms, if a time series is found to be non-stationary by the ERS test, one can apply differencing to the time series until it becomes stationary. This involves taking the difference between consecutive observations and potentially repeating this process until the time series is stationary. 

1119 
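The difference-until-stationary procedure described above can be sketched as a small helper (an illustrative function, not part of this module; the `is_stationary` callable is a stand-in for any unit-root test wrapped to return a boolean, and `low_autocorr` below is a deliberately crude placeholder check):

```python
import numpy as np

def difference_until_stationary(y, is_stationary, max_d: int = 2):
    """Difference `y` until `is_stationary(y)` is True, at most `max_d` times."""
    y = np.asarray(y, dtype=float)
    for d in range(max_d + 1):
        if is_stationary(y):
            return y, d  # the (possibly differenced) series and the order applied
        y = np.diff(y)   # take first differences and test again
    return y, max_d

# Crude stand-in check: treat low lag-1 autocorrelation as "stationary".
def low_autocorr(y) -> bool:
    return abs(np.corrcoef(y[:-1], y[1:])[0, 1]) < 0.5

rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(2000))  # a random walk needs one difference
_, d = difference_until_stationary(walk, low_autocorr)
print(d)
```

In practice the boolean check would wrap a proper test's p-value (for example, reject at 5%), but the control flow is the same.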

1120 Overall, the ERS test is a powerful and flexible test for stationarity, and it is widely used in time series analysis and forecasting. However, it is important to use multiple tests and diagnostic tools to determine the stationarity of a time series, as no single test is perfect in all situations. 

1121 

1122 Params: 

1123 y (ArrayLike): 

1124 The data to test for a unit root. 

1125 lags (Optional[int], optional): 

1126 The number of lags to use in the ADF regression. If omitted or `None`, the lag length is selected automatically using `method`, with no more than `max_lags` lags included.<br> 

1127 Defaults to `None`. 

1128 trend (VALID_ERS_TREND_OPTIONS, optional): 

1129 The trend component to include in the test 

1130 

1131 - `"c"`: Include a constant (Default) 

1132 - `"ct"`: Include a constant and linear time trend 

1133 

1134 Defaults to `"c"`. 

1135 max_lags (Optional[int], optional): 

1136 The maximum number of lags to use when selecting lag length. When using automatic lag length selection, the lag is selected using OLS detrending rather than GLS detrending.<br> 

1137 Defaults to `None`. 

1138 method (VALID_ERS_METHOD_OPTIONS, optional): 

1139 The method to use when selecting the lag length 

1140 

1141 - `"aic"`: Select the minimum of the Akaike IC (Default) 

1142 - `"bic"`: Select the minimum of the Schwarz/Bayesian IC 

1143 - `"t-stat"`: Select the lag length based on the significance of the last included lag (general-to-specific) 

1144 

1145 Defaults to `"aic"`. 

1146 low_memory (Optional[bool], optional): 

1147 Flag indicating whether to use the low-memory algorithm for lag-length selection. 

1148 Defaults to `None`. 

1149 

1150 Returns: 

1151 (tuple[float, float, int, dict]): 

1152 Returns a tuple containing: 

1153 - `stat` (float): The test statistic for a unit root. 

1154 - `pvalue` (float): The p-value for the test statistic. 

1155 - `lags` (int): The number of lags used in the test. 

1156 - `crit` (dict): The critical values for the test statistic at the 1%, 5%, and 10% levels. 

1157 

1158 ???+ example "Examples" 

1159 

1160 ```pycon {.py .python linenums="1" title="Setup"} 

1161 >>> from ts_stat_tests.stationarity.algorithms import ers 

1162 >>> from ts_stat_tests.utils.data import data_airline, data_normal, data_noise 

1163 >>> normal = data_normal 

1164 >>> noise = data_noise 

1165 >>> airline = data_airline.values 

1166 

1167 ``` 

1168 

1169 ```pycon {.py .python linenums="1" title="Example 1: Stationary Series"} 

1170 >>> stat, pvalue, lags, crit = ers(y=normal) 

1171 >>> print(f"ERS statistic: {stat:.4f}") 

1172 ERS statistic: -30.1517 

1173 >>> print(f"p-value: {pvalue:.4f}") 

1174 p-value: 0.0000 

1175 

1176 ``` 

1177 

1178 ```pycon {.py .python linenums="1" title="Example 2: Noisy Series"} 

1179 >>> stat, pvalue, lags, crit = ers(y=noise) 

1180 >>> print(f"ERS statistic: {stat:.4f}") 

1181 ERS statistic: -12.6897 

1182 >>> print(f"p-value: {pvalue:.4e}") 

1183 p-value: 1.0956e-21 

1184 

1185 ``` 

1186 

1187 ```pycon {.py .python linenums="1" title="Example 3: Real-World Time Series"} 

1188 >>> stat, pvalue, lags, crit = ers(y=airline) 

1189 >>> print(f"ERS statistic: {stat:.4f}") 

1190 ERS statistic: 0.9918 

1191 >>> print(f"p-value: {pvalue:.4f}") 

1192 p-value: 0.9232 

1193 

1194 ``` 

1195 

1196 ??? equation "Calculation" 

1197 

1198 The DFGLS statistic is computed from a GLS-detrended version of the series. The deterministic component is removed under a local alternative defined by: 

1199 

1200 $$ 

1201 \bar{\alpha} = 1 + \frac{\bar{c}}{T} 

1202 $$ 

1203 

1204 where: 

1205 

1206 - $T$ is the sample size. 

1207 - $\bar{c}$ is a constant defining the local alternative: $\bar{c} = -7$ when `trend="c"` and $\bar{c} = -13.5$ when `trend="ct"`. 

1208 

1209 Here are the detailed steps for how to calculate the DFGLS test: 

1210 

1211 1. Quasi-difference the data and the deterministic regressors $z_t$ (a constant, or a constant and a linear trend): $\bar{y}_1 = y_1$ and $\bar{y}_t = y_t - \bar{\alpha} y_{t-1}$ for $t > 1$, and likewise for $z_t$. 

1212 

1213 1. Regress $\bar{y}_t$ on $\bar{z}_t$ by ordinary least squares to obtain the GLS trend estimates $\hat{\delta}$, and detrend the original series: 

1214 

1215 $$ 

1216 \tilde{y}_t = y_t - z_t' \hat{\delta} 

1217 $$ 

1218 

1219 1. Run an Augmented Dickey-Fuller regression on the detrended series, with no deterministic terms: 

1220 

1221 $$ 

1222 \Delta \tilde{y}_t = \gamma \tilde{y}_{t-1} + \sum_{j=1}^{p} b_j \Delta \tilde{y}_{t-j} + \epsilon_t 

1223 $$ 

1224 

1225 where: 

1226 

1227 - $p$ is the lag length, fixed via `lags` or selected automatically via `method`. 

1228 - $\gamma$ is the coefficient of interest: $\gamma = 0$ under the null hypothesis of a unit root. 

1229 

1230 1. The test statistic is the t-ratio on $\gamma$. When `trend="c"` the critical values are those of the ADF test; when `trend="ct"` they are taken from Elliott, Rothenberg, and Stock (1996). 

1231 

1232 1. If the statistic is more negative than the critical value at the chosen significance level, reject the null hypothesis of a unit root and conclude that the series is stationary; otherwise the null cannot be rejected. 

1233 

1234 In practice, the DFGLS test is usually run using software packages such as R, Python, or MATLAB, which automate the detrending, the lag-length selection, and the lookup of critical values. 

1235 
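The GLS detrending step that distinguishes DFGLS from the plain ADF test can be sketched in NumPy for the constant-only case (a simplified illustration using the standard $\bar{c} = -7$; `gls_detrend_constant` is a hypothetical helper for exposition, and `arch.unitroot.DFGLS` remains the production implementation):

```python
import numpy as np

def gls_detrend_constant(y, c_bar: float = -7.0):
    """GLS-detrend a series for the constant-only case (c_bar = -7)."""
    y = np.asarray(y, dtype=float)
    T = y.shape[0]
    alpha = 1.0 + c_bar / T                 # local-to-unity parameter
    z = np.ones(T)                          # deterministic regressor: a constant
    # Quasi-difference both the data and the regressor.
    y_q = np.concatenate(([y[0]], y[1:] - alpha * y[:-1]))
    z_q = np.concatenate(([z[0]], z[1:] - alpha * z[:-1]))
    # OLS of the quasi-differenced data on the quasi-differenced regressor.
    delta = np.dot(z_q, y_q) / np.dot(z_q, z_q)
    return y - delta * z                    # detrended series

rng = np.random.default_rng(0)
y = 5.0 + rng.standard_normal(200)          # stationary series with mean 5
y_tilde = gls_detrend_constant(y)
print(round(float(y.mean()), 1), round(float(abs(y_tilde.mean())), 1))
```

The trend-less ADF regression is then run on `y_tilde`; note the detrended series has had its estimated mean removed.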

1236 ??? note "Notes" 

1237 The null hypothesis of the Dickey-Fuller GLS is that there is a unit root, with the alternative that there is no unit root. If the p-value is above the chosen significance level, then the null cannot be rejected and the series appears to contain a unit root. 

1238 

1239 DFGLS differs from the ADF test in that an initial GLS detrending step is used before a trend-less ADF regression is run. 

1240 

1241 Critical values and p-values when trend is `"c"` are identical to the ADF. When trend is set to `"ct"`, they are from Elliott, Rothenberg, and Stock (1996). 

1242 

1243 !!! success "Credit" 

1244 - All credit goes to the [`arch`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html) library. 

1245 

1246 ??? question "References" 

1247 - Elliott, G., Rothenberg, T. J., & Stock, J. H. (1996). Efficient tests for an autoregressive unit root. Econometrica, 64(4), 813-836. 

1248 - Perron, P., & Qu, Z. (2007). A simple modification to improve the finite sample properties of Ng and Perron’s unit root tests. Economics letters, 94(1), 12-19. 

1249 

1250 ??? tip "See Also" 

1251 - [`statsmodels.tsa.stattools.adfuller`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html): Augmented Dickey-Fuller unit root test. 

1252 - [`statsmodels.tsa.stattools.kpss`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html): Kwiatkowski-Phillips-Schmidt-Shin stationarity test. 

1253 - [`statsmodels.tsa.stattools.range_unit_root_test`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html): Range Unit-Root test. 

1254 - [`statsmodels.tsa.stattools.zivot_andrews`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html): Zivot-Andrews structural break test. 

1255 - [`pmdarima.arima.PPTest`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.PPTest.html): Phillips-Perron unit root test. 

1256 - [`arch.unitroot.DFGLS`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html): Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller. 

1257 - [`arch.unitroot.VarianceRatio`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html): Variance Ratio test of a random walk. 

1258 - [`ts_stat_tests.stationarity.algorithms.adf`][ts_stat_tests.stationarity.algorithms.adf]: Augmented Dickey-Fuller unit root test. 

1259 - [`ts_stat_tests.stationarity.algorithms.kpss`][ts_stat_tests.stationarity.algorithms.kpss]: Kwiatkowski-Phillips-Schmidt-Shin stationarity test. 

1260 - [`ts_stat_tests.stationarity.algorithms.rur`][ts_stat_tests.stationarity.algorithms.rur]: Range Unit-Root test of stationarity. 

1261 - [`ts_stat_tests.stationarity.algorithms.za`][ts_stat_tests.stationarity.algorithms.za]: Zivot-Andrews structural break unit root test. 

1262 - [`ts_stat_tests.stationarity.algorithms.pp`][ts_stat_tests.stationarity.algorithms.pp]: Phillips-Perron unit root test. 

1263 - [`ts_stat_tests.stationarity.algorithms.ers`][ts_stat_tests.stationarity.algorithms.ers]: Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller test. 

1264 - [`ts_stat_tests.stationarity.algorithms.vr`][ts_stat_tests.stationarity.algorithms.vr]: Variance Ratio test of a random walk. 

1265 """ 

1266 res = _ers( 

1267 y=np.asarray(y), 

1268 lags=lags, 

1269 trend=trend, 

1270 max_lags=max_lags, 

1271 method=method, 

1272 low_memory=low_memory, 

1273 ) 

1274 return (float(res.stat), float(res.pvalue), int(res.lags), dict(res.critical_values)) 

1275 

1276 

1277@typechecked 

1278def vr( 

1279 y: ArrayLike, 

1280 lags: int = 2, 

1281 trend: VALID_VR_TREND_OPTIONS = "c", 

1282 debiased: bool = True, 

1283 robust: bool = True, 

1284 overlap: bool = True, 

1285) -> tuple[float, float, float]: 

1286 r""" 

1287 !!! note "Summary" 

1288 Variance Ratio test of a random walk. 

1289 

1290 ???+ abstract "Details" 

1291 

1292 The Variance Ratio (VR) test is a statistical test used to determine whether a time series is stationary or not based on the presence of long-term dependence in the series. It is a non-parametric test that can be used to test for the presence of a unit root or a trend in the series. 

1293 

1294 The VR test involves calculating the ratio of the variance of the differences of the logarithms of the time series over different time intervals. The variance of the differences of the logarithms is a measure of the volatility of the series, and the ratio of the variances over different intervals is a measure of the long-term dependence in the series. 

1295 

1296 If the series is a random walk, the variance of its $k$-period differences grows linearly in $k$, so the variance ratio stays close to one for all intervals. For a mean-reverting (stationary) series the ratio falls below one as the interval lengthens, while positively correlated (trending) increments push the ratio above one. 

1297 

1298 The VR test involves comparing the observed variance ratio to the distribution of variance ratios expected under the null hypothesis of a random walk. If the standardized test statistic falls outside the critical values at a given significance level, typically 0.05, then we reject the null hypothesis of a random walk. If it falls inside the critical values, then we fail to reject the null hypothesis and treat the series as a random walk (non-stationary). 

1299 

1300 In practical terms, if a time series is found to be non-stationary by the VR test, one can apply differencing to the time series until it becomes stationary. This involves taking the difference between consecutive observations and potentially repeating this process until the time series is stationary. 

1301 

1302 Overall, the VR test is a useful and relatively simple test for stationarity that can be applied to a wide range of time series. However, it is important to use multiple tests and diagnostic tools to confirm the stationarity of a time series, as no single test is perfect in all situations. 

1303 

1304 Params: 

1305 y (ArrayLike): 

1306 The data to test for a random walk. 

1307 lags (int): 

1308 The number of periods to use in the multi-period variance, which is the numerator of the test statistic. Must be at least 2.<br> 

1309 Defaults to `2`. 

1310 trend (VALID_VR_TREND_OPTIONS, optional): 

1311 `"c"` allows for a non-zero drift in the random walk, while `"n"` requires that the increments to `y` are mean `0`.<br> 

1312 Defaults to `"c"`. 

1313 debiased (bool, optional): 

1314 Indicates whether to use a debiased version of the test. Only applicable if `overlap` is `True`.<br> 

1315 Defaults to `True`. 

1316 robust (bool, optional): 

1317 Indicates whether to use heteroskedasticity robust inference.<br> 

1318 Defaults to `True`. 

1319 overlap (bool, optional): 

1320 Indicates whether to use all overlapping blocks. If `False`, the number of observations in `y`, minus 1, must be an exact multiple of `lags`. If this condition is not satisfied, some values at the end of `y` will be discarded.<br> 

1321 Defaults to `True`. 

1322 

1323 Returns: 

1324 (tuple[float, float, float]): 

1325 Returns a tuple containing: 

1326 - `stat` (float): The test statistic for a unit root. 

1327 - `pvalue` (float): The p-value for the test statistic. 

1328 - `vr` (float): The estimated variance ratio: the `lags`-period variance divided by `lags` times the one-period variance. 

1329 

1330 ???+ example "Examples" 

1331 

1332 ```pycon {.py .python linenums="1" title="Setup"} 

1333 >>> from ts_stat_tests.stationarity.algorithms import vr 

1334 >>> from ts_stat_tests.utils.data import data_airline, data_normal, data_noise, data_sine 

1335 >>> normal = data_normal 

1336 >>> noise = data_noise 

1337 >>> seasonal = data_sine 

1338 >>> airline = data_airline.values 

1339 

1340 ``` 

1341 

1342 ```pycon {.py .python linenums="1" title="Example 1: Stationary Series"} 

1343 >>> stat, pvalue, variance_ratio = vr(y=normal) 

1344 >>> print(f"VR statistic: {stat:.4f}") 

1345 VR statistic: -12.8518 

1346 >>> print(f"p-value: {pvalue:.4f}") 

1347 p-value: 0.0000 

1348 >>> print(f"Variance ratio: {variance_ratio:.4f}") 

1349 Variance ratio: 0.5202 

1350 

1351 ``` 

1352 

1353 ```pycon {.py .python linenums="1" title="Example 2: Noisy Series"} 

1354 >>> stat, pvalue, variance_ratio = vr(y=noise) 

1355 >>> print(f"VR statistic: {stat:.4f}") 

1356 VR statistic: -11.5007 

1357 >>> print(f"p-value: {pvalue:.4f}") 

1358 p-value: 0.0000 

1359 >>> print(f"Variance ratio: {variance_ratio:.4f}") 

1360 Variance ratio: 0.5094 

1361 

1362 ``` 

1363 

1364 ```pycon {.py .python linenums="1" title="Example 3: Seasonal Series"} 

1365 >>> stat, pvalue, variance_ratio = vr(y=seasonal) 

1366 >>> print(f"VR statistic: {stat:.4f}") 

1367 VR statistic: 44.7019 

1368 >>> print(f"p-value: {pvalue:.4f}") 

1369 p-value: 0.0000 

1370 >>> print(f"Variance ratio: {variance_ratio:.4f}") 

1371 Variance ratio: 1.9980 

1372 

1373 ``` 

1374 

1375 ```pycon {.py .python linenums="1" title="Example 4: Real-World Time Series"} 

1376 >>> stat, pvalue, variance_ratio = vr(y=airline) 

1377 >>> print(f"VR statistic: {stat:.4f}") 

1378 VR statistic: 3.1511 

1379 >>> print(f"p-value: {pvalue:.4f}") 

1380 p-value: 0.0016 

1381 >>> print(f"Variance ratio: {variance_ratio:.4f}") 

1382 Variance ratio: 1.3163 

1383 

1384 ``` 

1385 

1386 ??? equation "Calculation" 

1387 

1388 The Variance Ratio (VR) test is a statistical test for stationarity in time series forecasting that is based on the idea that if the time series is stationary, then the variance of the returns should be constant over time. The mathematical equation for the VR test is: 

1389 

1390 $$ 

1391 VR(k) = \frac {\sigma^2(k)} {k\sigma^2(1)} 

1392 $$ 

1393 

1394 where: 

1395 

1396 - $VR(k)$ is the variance ratio for the time series over $k$ periods. 

1397 - $\sigma^2(k)$ is the variance of the returns over $k$ periods. 

1398 - $\sigma^2(1)$ is the variance of the returns over $1$ period. 

1399 

1400 The VR test involves comparing the variance ratio to a critical value, which is derived from the null distribution of the variance ratio under the assumption of a random walk with drift. 

1401 

1402 Here are the detailed steps for how to calculate the VR test: 

1403 

1404 1. Collect your time series data and compute the log returns, which are defined as: 

1405 

1406 $$ 

1407 r_t = \log(y_t) - \log(y_{t-1}) 

1408 $$ 

1409 

1410 where: 

1411 

1412 - $y_t$ is the value of the time series at time $t$. 

1413 

1414 1. Compute the mean of the one-period returns: 

1415 

1416 $$ 

1417 \hat{\mu} = \frac{1}{n-1} \sum_{t=2}^{n} r_t 

1418 $$ 

1419 

1420 where: 

1421 

1422 - $n$ is the sample size. 

1423 

1424 1. Compute the variance of the one-period returns: 

1425 

1426 $$ 

1427 \sigma^2(1) = \frac{1}{n-1} \sum_{t=2}^{n} (r_t - \hat{\mu})^2 

1428 $$ 

1429 

1430 1. Compute the variance of the overlapping $k$-period returns, where each $k$-period return is the sum of $k$ consecutive one-period returns: 

1431 

1432 $$ 

1433 \sigma^2(k) = \frac{1}{n-k} \sum_{t=k+1}^{n} (r_t + r_{t-1} + \cdots + r_{t-k+1} - k\hat{\mu})^2 

1434 $$ 

1435 

1436 Using $k$-period sums, rather than re-using the one-period deviations, is what lets the ratio pick up serial correlation in the returns. 

1437 

1438 

1439 1. Compute the variance ratio for each value of $k$, which is defined as: 

1440 

1441 $$ 

1442 VR(k) = \frac {\sigma^2(k)} {k\sigma^2(1)} 

1443 $$ 

1444 

1445 1. Standardize the variance ratio using its asymptotic standard error; under the null hypothesis of a random walk the standardized statistic is asymptotically standard normal, so the critical values come from the normal distribution at the chosen significance level. 

1446 

1447 1. Finally, compare the standardized statistic to the critical values. If it falls outside them, the null hypothesis of a random walk is rejected: a variance ratio below one points to mean reversion (stationary behaviour), while a ratio above one points to positive serial correlation. Otherwise, the null of a random walk cannot be rejected. 

1448 

1449 In practice, the VR test is often conducted using software packages such as R, Python, or MATLAB, which automate the calculation of the variance ratio and the determination of the critical value. 

1450 
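The steps above can be sketched directly in NumPy (a naive illustration on level differences, without the debiasing or heteroskedasticity corrections that `arch.unitroot.VarianceRatio` applies):

```python
import numpy as np

def variance_ratio(y, k: int = 2) -> float:
    """Naive VR(k): k-period variance over k times the 1-period variance."""
    y = np.asarray(y, dtype=float)
    r = np.diff(y)                      # one-period differences
    n = r.shape[0]
    mu = r.mean()
    var1 = np.sum((r - mu) ** 2) / n
    rk = y[k:] - y[:-k]                 # overlapping k-period differences
    vark = np.sum((rk - k * mu) ** 2) / (n - k + 1)
    return float(vark / (k * var1))

rng = np.random.default_rng(1)
walk = np.cumsum(rng.standard_normal(5000))  # random walk: VR(2) near 1
noise = rng.standard_normal(5000)            # iid levels: VR(2) near 0.5
vr_walk, vr_noise = variance_ratio(walk), variance_ratio(noise)
print(round(vr_walk, 1), round(vr_noise, 1))
```

The iid case lands near 0.5 because differencing white noise induces negative lag-1 correlation, halving the two-period variance relative to twice the one-period variance.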

1451 ??? note "Notes" 

1452 The null hypothesis of a VR is that the process is a random walk, possibly plus drift. Rejection of the null with a positive test statistic indicates the presence of positive serial correlation in the time series. 

1453 

1454 !!! success "Credit" 

1455 - All credit goes to the [`arch`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html) library. 

1456 

1457 ??? question "References" 

1458 - Campbell, John Y., Lo, Andrew W. and MacKinlay, A. Craig. (1997) The Econometrics of Financial Markets. Princeton, NJ: Princeton University Press. 

1459 

1460 ??? tip "See Also" 

1461 - [`statsmodels.tsa.stattools.adfuller`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html): Augmented Dickey-Fuller unit root test. 

1462 - [`statsmodels.tsa.stattools.kpss`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html): Kwiatkowski-Phillips-Schmidt-Shin stationarity test. 

1463 - [`statsmodels.tsa.stattools.range_unit_root_test`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html): Range Unit-Root test. 

1464 - [`statsmodels.tsa.stattools.zivot_andrews`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html): Zivot-Andrews structural break test. 

1465 - [`pmdarima.arima.PPTest`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.PPTest.html): Phillips-Perron unit root test. 

1466 - [`arch.unitroot.DFGLS`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html): Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller. 

1467 - [`arch.unitroot.VarianceRatio`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html): Variance Ratio test of a random walk. 

1468 - [`ts_stat_tests.stationarity.algorithms.adf`][ts_stat_tests.stationarity.algorithms.adf]: Augmented Dickey-Fuller unit root test. 

1469 - [`ts_stat_tests.stationarity.algorithms.kpss`][ts_stat_tests.stationarity.algorithms.kpss]: Kwiatkowski-Phillips-Schmidt-Shin stationarity test. 

1470 - [`ts_stat_tests.stationarity.algorithms.rur`][ts_stat_tests.stationarity.algorithms.rur]: Range Unit-Root test of stationarity. 

1471 - [`ts_stat_tests.stationarity.algorithms.za`][ts_stat_tests.stationarity.algorithms.za]: Zivot-Andrews structural break unit root test. 

1472 - [`ts_stat_tests.stationarity.algorithms.pp`][ts_stat_tests.stationarity.algorithms.pp]: Phillips-Perron unit root test. 

1473 - [`ts_stat_tests.stationarity.algorithms.ers`][ts_stat_tests.stationarity.algorithms.ers]: Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller test. 

1474 - [`ts_stat_tests.stationarity.algorithms.vr`][ts_stat_tests.stationarity.algorithms.vr]: Variance Ratio test of a random walk. 

1475 """ 

1476 res = _vr( 

1477 y=np.asarray(y), 

1478 lags=lags, 

1479 trend=trend, 

1480 debiased=debiased, 

1481 robust=robust, 

1482 overlap=overlap, 

1483 ) 

1484 return float(res.stat), float(res.pvalue), float(res.vr)