Coverage for src/ts_stat_tests/stationarity/algorithms.py: 100%
57 statements
coverage.py v7.13.2, created at 2026-02-01 09:48 +0000
# ============================================================================ #
#                                                                              #
#     Title: Stationarity Algorithms                                           #
#     Purpose: Algorithms to test for stationarity in time series data.        #
#                                                                              #
# ============================================================================ #


# ---------------------------------------------------------------------------- #
#                                                                              #
#     Overview                                                              ####
#                                                                              #
# ---------------------------------------------------------------------------- #


# ---------------------------------------------------------------------------- #
#  Description                                                              ####
# ---------------------------------------------------------------------------- #
"""
!!! note "Summary"
    Stationarity tests are statistical tests used to determine whether a time series is stationary or not. A stationary time series is one whose statistical properties, such as mean and variance, do not change over time. Stationarity is an important assumption in many time series forecasting models, as it allows for the use of techniques such as autoregression and moving averages.

    There are several different types of stationarity tests, including the Augmented Dickey-Fuller (ADF) test, the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, the Phillips-Perron (PP) test, the Elliott-Rothenberg-Stock (ERS) test, and the Variance Ratio (VR) test. Each of these tests has its own strengths and weaknesses, and the choice of which test to use will depend on the specific characteristics of the time series being analyzed.

    Overall, stationarity tests are an important tool in time series analysis and forecasting, as they help identify whether a time series is stationary or non-stationary, which can have implications for the choice of forecasting models and methods.

    For a really good article on ADF & KPSS tests, check: [When A Time Series Only Quacks Like A Duck: Testing for Stationarity Before Running Forecast Models. With Python. And A Duckling Picture.](https://towardsdatascience.com/when-a-time-series-only-quacks-like-a-duck-10de9e165e)
"""


# ---------------------------------------------------------------------------- #
#                                                                              #
#     Setup                                                                 ####
#                                                                              #
# ---------------------------------------------------------------------------- #


# ---------------------------------------------------------------------------- #
#  Imports                                                                  ####
# ---------------------------------------------------------------------------- #


# ## Python StdLib Imports ----
from typing import Any, Literal, Optional, Union, overload

# ## Python Third Party Imports ----
import numpy as np
from arch.unitroot import (
    DFGLS as _ers,
    PhillipsPerron as _pp,
    VarianceRatio as _vr,
)
from numpy.typing import ArrayLike
from statsmodels.stats.diagnostic import ResultsStore
from statsmodels.tsa.stattools import (
    adfuller as _adfuller,
    kpss as _kpss,
    range_unit_root_test as _rur,
    zivot_andrews as _za,
)
from typeguard import typechecked


# ---------------------------------------------------------------------------- #
#  Exports                                                                  ####
# ---------------------------------------------------------------------------- #


__all__: list[str] = ["adf", "kpss", "rur", "za", "pp", "ers", "vr"]


# ---------------------------------------------------------------------------- #
#  Constants                                                                ####
# ---------------------------------------------------------------------------- #


VALID_ADF_REGRESSION_OPTIONS = Literal["c", "ct", "ctt", "n"]
VALID_ADF_AUTOLAG_OPTIONS = Literal["AIC", "BIC", "t-stat"]
VALID_KPSS_REGRESSION_OPTIONS = Literal["c", "ct"]
VALID_KPSS_NLAGS_OPTIONS = Literal["auto", "legacy"]
VALID_ZA_REGRESSION_OPTIONS = Literal["c", "t", "ct"]
VALID_ZA_AUTOLAG_OPTIONS = Literal["AIC", "BIC", "t-stat"]
VALID_PP_TREND_OPTIONS = Literal["n", "c", "ct"]
VALID_PP_TEST_TYPE_OPTIONS = Literal["rho", "tau"]
VALID_ERS_TREND_OPTIONS = Literal["c", "ct"]
VALID_ERS_METHOD_OPTIONS = Literal["aic", "bic", "t-stat"]
VALID_VR_TREND_OPTIONS = Literal["c", "n"]


# ---------------------------------------------------------------------------- #
#                                                                              #
#     Algorithms                                                            ####
#                                                                              #
# ---------------------------------------------------------------------------- #


@overload
def adf(
    x: ArrayLike,
    maxlag: Optional[int] = None,
    regression: VALID_ADF_REGRESSION_OPTIONS = "c",
    *,
    autolag: Optional[VALID_ADF_AUTOLAG_OPTIONS] = "AIC",
    store: Literal[True],
    regresults: bool = False,
) -> tuple[float, float, dict, ResultsStore]: ...
@overload
def adf(
    x: ArrayLike,
    maxlag: Optional[int] = None,
    regression: VALID_ADF_REGRESSION_OPTIONS = "c",
    *,
    autolag: None,
    store: Literal[False] = False,
    regresults: bool = False,
) -> tuple[float, float, int, int, dict]: ...
@overload
def adf(
    x: ArrayLike,
    maxlag: Optional[int] = None,
    regression: VALID_ADF_REGRESSION_OPTIONS = "c",
    *,
    autolag: VALID_ADF_AUTOLAG_OPTIONS = "AIC",
    store: Literal[False] = False,
    regresults: bool = False,
) -> tuple[float, float, int, int, dict, float]: ...
@typechecked
def adf(
    x: ArrayLike,
    maxlag: Optional[int] = None,
    regression: VALID_ADF_REGRESSION_OPTIONS = "c",
    *,
    autolag: Optional[VALID_ADF_AUTOLAG_OPTIONS] = "AIC",
    store: bool = False,
    regresults: bool = False,
) -> Union[
    tuple[float, float, dict, ResultsStore],
    tuple[float, float, int, int, dict],
    tuple[float, float, int, int, dict, float],
]:
    r"""
    !!! note "Summary"
        The Augmented Dickey-Fuller test can be used to test for a unit root in a univariate process in the presence of serial correlation.

    ???+ abstract "Details"

        The Augmented Dickey-Fuller (ADF) test is a statistical test used to determine whether a time series is stationary or not. Stationarity refers to the property of a time series where the statistical properties, such as mean and variance, remain constant over time. Stationarity is important for time series forecasting as it allows for the use of many popular forecasting models, such as ARIMA.

        The ADF test is an extension of the Dickey-Fuller test. It involves regressing the first difference of the time series on its own lagged level and lagged differences, and then testing whether the coefficient on the lagged level term is statistically significant. If that coefficient is significantly negative, the unit-root null is rejected and the series is considered stationary; otherwise the series is treated as non-stationary.

        The null hypothesis of the ADF test is that the time series has a unit root, which means that it is non-stationary. The alternative hypothesis is that the time series is stationary. If the p-value of the test is less than a chosen significance level, typically 0.05, then we reject the null hypothesis and conclude that the time series is stationary.

        In practical terms, if a time series is found to be non-stationary by the ADF test, one can apply differencing to the time series until it becomes stationary. This involves taking the difference between consecutive observations and potentially repeating this process until the time series is stationary.
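The differencing step described above can be sketched with plain `numpy`. This is illustrative only; the variable names here are not part of this module's API.

```python
import numpy as np

# A random walk is non-stationary by construction: its variance grows with time.
rng = np.random.default_rng(42)
random_walk = np.cumsum(rng.standard_normal(100))

# First differencing recovers the stationary increments: d_t = y_t - y_{t-1}.
differenced = np.diff(random_walk)
```

If one round of differencing is not enough, `np.diff` can be applied again to the result.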

    Params:
        x (ArrayLike):
            The data series to test.
        maxlag (Optional[int]):
            Maximum lag which is included in the test; the default value of $12 \times (\frac{nobs}{100})^{\frac{1}{4}}$ is used when `None`.
            Default: `None`
        regression (VALID_ADF_REGRESSION_OPTIONS):
            Constant and trend order to include in regression.

            - `"c"`: constant only (default).
            - `"ct"`: constant and trend.
            - `"ctt"`: constant, and linear and quadratic trend.
            - `"n"`: no constant, no trend.

            Default: `"c"`
        autolag (Optional[VALID_ADF_AUTOLAG_OPTIONS]):
            Method to use when automatically determining the lag length among the values $0, 1, ..., maxlag$.

            - If `"AIC"` (default) or `"BIC"`, then the number of lags is chosen to minimize the corresponding information criterion.
            - `"t-stat"` based choice of `maxlag`. Starts with `maxlag` and drops a lag until the t-statistic on the last lag length is significant using a 5%-sized test.
            - If `None`, then the number of included lags is set to `maxlag`.

            Default: `"AIC"`
        store (bool):
            If `True`, then a result instance is returned additionally to the `adf` statistic.
            Default: `False`
        regresults (bool):
            If `True`, the full regression results are returned.
            Default: `False`

    Returns:
        (Union[tuple[float, float, dict, ResultsStore], tuple[float, float, int, int, dict], tuple[float, float, int, int, dict, float]]):
            Depending on the parameters, returns a tuple containing:
            - `adf` (float): The test statistic.
            - `pvalue` (float): MacKinnon's approximate p-value.
            - `usedlag` (int): The number of lags used.
            - `nobs` (int): The number of observations used.
            - `critical_values` (dict): Critical values at the 1%, 5%, and 10% levels.
            - `icbest` (float): The maximized information criterion (if `autolag` is not `None`).
            - `resstore` (Optional[ResultsStore]): Result instance (if `store` is `True`).

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.stationarity.algorithms import adf
        >>> from ts_stat_tests.utils.data import data_airline, data_normal
        >>> normal = data_normal
        >>> airline = data_airline.values

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Stationary Series"}
        >>> stat, pvalue, lags, nobs, crit, icbest = adf(x=normal)
        >>> print(f"ADF statistic: {stat:.4f}")
        ADF statistic: -30.7838
        >>> print(f"p-value: {pvalue:.4f}")
        p-value: 0.0000

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Airline Passengers Data"}
        >>> stat, pvalue, lags, nobs, crit, icbest = adf(x=airline)
        >>> print(f"p-value: {pvalue:.4f}")
        p-value: 0.9919

        ```

        ```pycon {.py .python linenums="1" title="Example 3: Store Result Instance"}
        >>> res = adf(x=airline, store=True)
        >>> print(res)
        (0.8153..., 0.9918..., {'1%': np.float64(-3.4816...), '5%': np.float64(-2.8840...), '10%': np.float64(-2.5787...)}, <statsmodels.stats.diagnostic.ResultsStore object at ...>)

        ```

        ```pycon {.py .python linenums="1" title="Example 4: No Autolag"}
        >>> stat, pvalue, lags, nobs, crit = adf(x=airline, autolag=None, maxlag=5)
        >>> print(f"p-value: {pvalue:.4f}")
        p-value: 0.7670

        ```

    ??? equation "Calculation"

        The mathematical equation for the Augmented Dickey-Fuller (ADF) test for stationarity in time series forecasting is:

        $$
        \Delta y_t = \alpha + \beta y_{t-1} + \sum_{i=1}^p \delta_i \Delta y_{t-i} + \epsilon_t
        $$

        where:

        - $y_t$ is the value of the time series at time $t$.
        - $\Delta y_t$ is the first difference of $y_t$, which is defined as $\Delta y_t = y_t - y_{t-1}$.
        - $\alpha$ is the constant term.
        - $\beta$ is the coefficient on $y_{t-1}$.
        - $\delta_i$ are the coefficients on the lagged differences of $y$.
        - $\epsilon_t$ is the error term.

        The ADF test involves testing the null hypothesis that $\beta = 0$, or equivalently, that the time series has a unit root. If $\beta$ is significantly less than $0$, then the null hypothesis can be rejected and the time series is considered stationary.

        Here are the detailed steps for how to calculate the ADF test:

        1. Collect your time series data and plot it to visually check for any trends, seasonal patterns, or other patterns that could make the data non-stationary. If you detect any such patterns, you will need to pre-process your data (e.g., detrending, deseasonalizing, etc.) to remove these effects.

        1. Calculate the first differences of the time series, which is simply the difference between each observation and the previous observation. This step is performed to transform the original data into a stationary process. The first difference of $y_t$ is defined as $\Delta y_t = y_t - y_{t-1}$.

        1. Estimate the parameters $\alpha$, $\beta$, and $\delta_i$ using the least squares method. This involves regressing $\Delta y_t$ on the lagged level $y_{t-1}$ and the lagged differences $\Delta y_{t-1}, \Delta y_{t-2}, \dots, \Delta y_{t-p}$, where $p$ is the number of lags to include in the model. The estimated equation is:

            $$
            \Delta y_t = \alpha + \beta y_{t-1} + \sum_{i=1}^p \delta_i \Delta y_{t-i} + \epsilon_t
            $$

        1. Calculate the test statistic, which is given by:

            $$
            ADF = \frac{\hat{\beta}}{SE(\hat{\beta})}
            $$

            - where $SE(\hat{\beta})$ is the standard error of the estimated coefficient on $y_{t-1}$.

            The test statistic measures the number of standard errors by which $\hat{\beta}$ deviates from $0$. If ADF is less than (more negative than) the critical values from the ADF distribution table, we can reject the null hypothesis and conclude that the time series is stationary.

        1. Compare the test statistic to the critical values in the ADF distribution table to determine the level of significance. The critical values depend on the sample size, the level of significance, and the number of lags in the model.

        1. Finally, interpret the results and draw conclusions about the stationarity of the time series. If the null hypothesis is rejected, then the time series is stationary and can be used for forecasting. If the null hypothesis is not rejected, then the time series is non-stationary and requires further pre-processing before it can be used for forecasting.
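The steps above can be sketched as a plain `numpy` OLS regression. This is a toy for intuition only: the `adf_stat_sketch` name and the simplified lag handling are assumptions of this sketch, not this module's API nor the `statsmodels` implementation (which also supplies the non-standard critical values and p-values).

```python
import numpy as np


def adf_stat_sketch(y, p=1):
    # t-statistic on the lagged level in the ADF regression
    # with a constant and p lagged differences.
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    rows, targets = [], []
    for t in range(p, len(dy)):
        # Regressors: constant, lagged level y_{t-1}, and p lagged differences.
        rows.append([1.0, y[t]] + [dy[t - i] for i in range(1, p + 1)])
        targets.append(dy[t])
    X, z = np.asarray(rows), np.asarray(targets)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    sigma2 = resid @ resid / (len(z) - X.shape[1])  # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])  # SE of the lagged-level coefficient
    return float(beta[1] / se)
```

A strongly negative statistic, compared against the Dickey-Fuller critical values (not the usual t-table), favours rejecting the unit-root null.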

    ??? note "Notes"
        The null hypothesis of the Augmented Dickey-Fuller test is that there is a unit root, with the alternative that there is no unit root. If the p-value is above a critical size, then we cannot reject that there is a unit root.

        The p-values are obtained through regression surface approximation from MacKinnon 1994, but using the updated 2010 tables. If the p-value is close to significant, then the critical values should be used to judge whether to reject the null.

        The `autolag` option and `maxlag` for it are described in Greene.

    ??? success "Credit"
        - All credit goes to the [`statsmodels`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html) library.

    ??? question "References"
        - Greene, W.H. (2003). Econometric Analysis (5th ed.). Prentice Hall.
        - Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press.
        - MacKinnon, J.G. (1994). Approximate asymptotic distribution functions for unit-root and cointegration tests. Journal of Business & Economic Statistics, 12: 167-176.
        - MacKinnon, J.G. (2010). Critical values for cointegration tests. Queen's University, Department of Economics, Working Paper No. 1227.

    ??? tip "See Also"
        - [`statsmodels.tsa.stattools.adfuller`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html): Augmented Dickey-Fuller unit root test.
        - [`statsmodels.tsa.stattools.kpss`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html): Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
        - [`statsmodels.tsa.stattools.range_unit_root_test`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html): Range Unit-Root test.
        - [`statsmodels.tsa.stattools.zivot_andrews`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html): Zivot-Andrews structural break test.
        - [`pmdarima.arima.PPTest`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.PPTest.html): Phillips-Perron unit root test.
        - [`arch.unitroot.DFGLS`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html): Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller.
        - [`arch.unitroot.VarianceRatio`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html): Variance Ratio test of a random walk.
        - [`ts_stat_tests.stationarity.algorithms.adf`][ts_stat_tests.stationarity.algorithms.adf]: Augmented Dickey-Fuller unit root test.
        - [`ts_stat_tests.stationarity.algorithms.kpss`][ts_stat_tests.stationarity.algorithms.kpss]: Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
        - [`ts_stat_tests.stationarity.algorithms.rur`][ts_stat_tests.stationarity.algorithms.rur]: Range Unit-Root test of stationarity.
        - [`ts_stat_tests.stationarity.algorithms.za`][ts_stat_tests.stationarity.algorithms.za]: Zivot-Andrews structural break unit root test.
        - [`ts_stat_tests.stationarity.algorithms.pp`][ts_stat_tests.stationarity.algorithms.pp]: Phillips-Perron unit root test.
        - [`ts_stat_tests.stationarity.algorithms.ers`][ts_stat_tests.stationarity.algorithms.ers]: Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller test.
        - [`ts_stat_tests.stationarity.algorithms.vr`][ts_stat_tests.stationarity.algorithms.vr]: Variance Ratio test of a random walk.
    """
    res: Any = _adfuller(  # Using `Any` to avoid ty issues with statsmodels stubs
        x=x,
        maxlag=maxlag,
        regression=regression,
        autolag=autolag,  # type: ignore[arg-type] # statsmodels stubs are often missing `None`
        store=store,
        regresults=regresults,
    )

    if store:
        # returns (stat, pval, crit, store)
        return float(res[0]), float(res[1]), dict(res[2]), res[3]

    if autolag is None:
        # returns (stat, pval, lags, nobs, crit)
        return (
            float(res[0]),
            float(res[1]),
            int(res[2]),
            int(res[3]),
            dict(res[4]),
        )

    # returns (stat, pval, lags, nobs, crit, icbest)
    return (
        float(res[0]),
        float(res[1]),
        int(res[2]),
        int(res[3]),
        dict(res[4]),
        float(res[5]),
    )


@overload
def kpss(
    x: ArrayLike,
    regression: VALID_KPSS_REGRESSION_OPTIONS = "c",
    nlags: Optional[Union[VALID_KPSS_NLAGS_OPTIONS, int]] = None,
    *,
    store: Literal[True],
) -> tuple[float, float, int, dict, ResultsStore]: ...
@overload
def kpss(
    x: ArrayLike,
    regression: VALID_KPSS_REGRESSION_OPTIONS = "c",
    nlags: Optional[Union[VALID_KPSS_NLAGS_OPTIONS, int]] = None,
    *,
    store: Literal[False] = False,
) -> tuple[float, float, int, dict]: ...
@typechecked
def kpss(
    x: ArrayLike,
    regression: VALID_KPSS_REGRESSION_OPTIONS = "c",
    nlags: Optional[Union[VALID_KPSS_NLAGS_OPTIONS, int]] = None,
    *,
    store: bool = False,
) -> Union[
    tuple[float, float, int, dict, ResultsStore],
    tuple[float, float, int, dict],
]:
    r"""
    !!! note "Summary"
        Computes the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test for the null hypothesis that `x` is level or trend stationary.

    ???+ abstract "Details"

        The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test is another statistical test used to determine whether a time series is stationary or not. The KPSS test is the opposite of the Augmented Dickey-Fuller (ADF) test in the sense that its hypotheses are reversed: the ADF test's null hypothesis is the presence of a unit root, whereas the KPSS test's null hypothesis is stationarity.

        The KPSS test involves regressing the time series on a constant (and, for trend stationarity, a time trend). The null hypothesis of the test is that the time series is stationary. The alternative hypothesis is that the time series has a unit root, which means that it is non-stationary.

        The test statistic is calculated from the partial sums of the regression residuals. If the test statistic is greater than a critical value at a given significance level, typically 0.05, then we reject the null hypothesis and conclude that the time series is non-stationary. If the test statistic is less than the critical value, then we fail to reject the null hypothesis and conclude that the time series is stationary.

        In practical terms, if a time series is found to be non-stationary by the KPSS test, one can apply differencing to the time series until it becomes stationary. This involves taking the difference between consecutive observations and potentially repeating this process until the time series is stationary.

        Overall, the ADF and KPSS tests are both important tools in time series analysis and forecasting, as they help identify whether a time series is stationary or non-stationary, which can have implications for the choice of forecasting models and methods.

    Params:
        x (ArrayLike):
            The data series to test.
        regression (VALID_KPSS_REGRESSION_OPTIONS, optional):
            The null hypothesis for the KPSS test.

            - `"c"`: The data is stationary around a constant (default).
            - `"ct"`: The data is stationary around a trend.

            Defaults to `"c"`.
        nlags (Optional[Union[VALID_KPSS_NLAGS_OPTIONS, int]], optional):
            Indicates the number of lags to be used.

            - If `None` (default) or `"auto"`, `lags` is calculated using the data-dependent method of Hobijn et al. (1998). See also Andrews (1991), Newey & West (1994), and Schwert (1989).
            - If `"legacy"`, uses $int(12 \times (\frac{n}{100})^{\frac{1}{4}})$, as outlined in Schwert (1989).
            - If an `int`, that number of lags is used directly.

            Defaults to `None`.
        store (bool, optional):
            If `True`, then a result instance is returned additionally to the KPSS statistic.<br>
            Defaults to `False`.

    Returns:
        (Union[tuple[float, float, int, dict, ResultsStore], tuple[float, float, int, dict]]):
            Returns a tuple containing:
            - `stat` (float): The KPSS test statistic.
            - `pvalue` (float): The p-value of the test.
            - `lags` (int): The truncation lag parameter.
            - `crit` (dict): The critical values at 10%, 5%, 2.5%, and 1%.
            - `resstore` (Optional[ResultsStore]): Result instance (if `store` is `True`).

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> from ts_stat_tests.stationarity.algorithms import kpss
        >>> from ts_stat_tests.utils.data import data_airline, data_normal
        >>> normal = data_normal
        >>> airline = data_airline.values

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Stationary Series"}
        >>> stat, pvalue, lags, crit = kpss(x=normal)
        >>> print(f"KPSS statistic: {stat:.4f}")
        KPSS statistic: 0.0858
        >>> print(f"p-value: {pvalue:.4f}")
        p-value: 0.1000

        ```

        ```pycon {.py .python linenums="1" title="Example 2: Airline Passengers Data"}
        >>> stat, pvalue, lags, crit = kpss(x=airline)
        >>> print(f"p-value: {pvalue:.4f}")
        p-value: 0.0100

        ```

    ??? equation "Calculation"

        The mathematical equation for the KPSS test for stationarity in time series forecasting is:

        $$
        y_t = \mu_t + \epsilon_t
        $$

        where:

        - $y_t$ is the value of the time series at time $t$.
        - $\mu_t$ is the trend component of the time series.
        - $\epsilon_t$ is the error term.

        The KPSS test involves testing the null hypothesis that the time series is trend stationary, which means that the trend component of the time series is stationary over time. If the null hypothesis is rejected, then the time series is non-stationary and requires further pre-processing before it can be used for forecasting.

        Here are the detailed steps for how to calculate the KPSS test:

        1. Collect your time series data and plot it to visually check for any trends, seasonal patterns, or other patterns that could make the data non-stationary. If you detect any such patterns, you will need to pre-process your data (e.g., detrending, deseasonalizing, etc.) to remove these effects.

        1. Estimate the deterministic component $\mu_t$ by regressing the time series on a constant (for level stationarity) or on a constant and a linear time trend (for trend stationarity).

        1. Calculate the residual series $\epsilon_t$ by subtracting the fitted component from the original time series:

            $$
            \epsilon_t = y_t - \mu_t
            $$

        1. Estimate the long-run variance of the residual series using a suitable estimator, such as the Newey-West estimator or the Bartlett kernel estimator. This step is necessary to correct for any serial correlation in the residual series.

        1. Calculate the test statistic, which is given by:

            $$
            KPSS = \frac{1}{T^2} \sum_{t=1}^T \frac{S_t^2}{\hat{\sigma}^2}
            $$

            where:

            - $T$ is the number of observations in the time series.
            - $S_t$ is the cumulative sum of the residual series up to time $t$, i.e., $S_t = \sum_{i=1}^t \epsilon_i$.
            - $\hat{\sigma}^2$ is the estimated long-run variance of the residual series.

            The test statistic measures how much the partial sums of the residuals drift relative to their long-run variance. If KPSS is greater than the critical values from the KPSS distribution table, we can reject the null hypothesis and conclude that the time series is non-stationary.

        1. Finally, interpret the results and draw conclusions about the stationarity of the time series. If the null hypothesis is rejected, then the time series is non-stationary and requires further pre-processing before it can be used for forecasting. If the null hypothesis is not rejected, then the time series is trend stationary and can be used for forecasting.
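For intuition, the statistic above can be computed directly with plain `numpy` for the zero-lag, level-stationarity (`"c"`) case. This is a hedged sketch: the `kpss_stat_sketch` name is hypothetical, and real implementations such as `statsmodels` use a Newey-West long-run variance estimate rather than the naive zero-lag estimate used here.

```python
import numpy as np


def kpss_stat_sketch(y):
    # KPSS statistic for level stationarity with a zero-lag variance estimate.
    y = np.asarray(y, dtype=float)
    e = y - y.mean()  # residuals from regressing on a constant ("c")
    s = np.cumsum(e)  # partial sums S_t
    sigma2 = (e @ e) / len(y)  # naive (zero-lag) variance estimate
    return float((s @ s) / (len(y) ** 2 * sigma2))
```

Large values (for the `"c"` case, above the 5% critical value of roughly 0.463) reject the stationarity null.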

    ??? note "Notes"
        To estimate $\sigma^2$, the Newey-West estimator is used. If `lags` is `"legacy"`, the truncation lag parameter is set to $int(12 \times (\frac{n}{100})^{\frac{1}{4}})$, as outlined in Schwert (1989). The p-values are interpolated from Table 1 of Kwiatkowski et al. (1992). If the computed statistic is outside the table of critical values, then a warning message is generated.

        Missing values are not handled.

    ??? success "Credit"
        - All credit goes to the [`statsmodels`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html) library.

    ??? question "References"
        - Andrews, D.W.K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59: 817-858.
        - Hobijn, B., Franses, P.H., & Ooms, M. (2004). Generalizations of the KPSS-test for stationarity. Statistica Neerlandica, 52: 483-502.
        - Kwiatkowski, D., Phillips, P.C.B., Schmidt, P., & Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics, 54: 159-178.
        - Newey, W.K., & West, K.D. (1994). Automatic lag selection in covariance matrix estimation. Review of Economic Studies, 61: 631-653.
        - Schwert, G.W. (1989). Tests for unit roots: A Monte Carlo investigation. Journal of Business and Economic Statistics, 7 (2): 147-159.

    ??? tip "See Also"
        - [`statsmodels.tsa.stattools.adfuller`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html): Augmented Dickey-Fuller unit root test.
        - [`statsmodels.tsa.stattools.kpss`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html): Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
        - [`statsmodels.tsa.stattools.range_unit_root_test`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html): Range Unit-Root test.
        - [`statsmodels.tsa.stattools.zivot_andrews`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html): Zivot-Andrews structural break test.
        - [`pmdarima.arima.PPTest`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.PPTest.html): Phillips-Perron unit root test.
        - [`arch.unitroot.DFGLS`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html): Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller.
        - [`arch.unitroot.VarianceRatio`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html): Variance Ratio test of a random walk.
        - [`ts_stat_tests.stationarity.algorithms.adf`][ts_stat_tests.stationarity.algorithms.adf]: Augmented Dickey-Fuller unit root test.
        - [`ts_stat_tests.stationarity.algorithms.kpss`][ts_stat_tests.stationarity.algorithms.kpss]: Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
        - [`ts_stat_tests.stationarity.algorithms.rur`][ts_stat_tests.stationarity.algorithms.rur]: Range Unit-Root test of stationarity.
        - [`ts_stat_tests.stationarity.algorithms.za`][ts_stat_tests.stationarity.algorithms.za]: Zivot-Andrews structural break unit root test.
        - [`ts_stat_tests.stationarity.algorithms.pp`][ts_stat_tests.stationarity.algorithms.pp]: Phillips-Perron unit root test.
        - [`ts_stat_tests.stationarity.algorithms.ers`][ts_stat_tests.stationarity.algorithms.ers]: Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller test.
        - [`ts_stat_tests.stationarity.algorithms.vr`][ts_stat_tests.stationarity.algorithms.vr]: Variance Ratio test of a random walk.
    """
    _nlags: Union[VALID_KPSS_NLAGS_OPTIONS, int] = nlags if nlags is not None else "auto"
    return _kpss(x=x, regression=regression, nlags=_nlags, store=store)


@overload
def rur(x: ArrayLike, *, store: Literal[True]) -> tuple[float, float, dict, ResultsStore]: ...
@overload
def rur(x: ArrayLike, *, store: Literal[False] = False) -> tuple[float, float, dict]: ...
@typechecked
def rur(x: ArrayLike, *, store: bool = False) -> Union[
    tuple[float, float, dict, ResultsStore],
    tuple[float, float, dict],
]:
    r"""
    !!! note "Summary"
        Computes the Range Unit-Root (RUR) test for the null hypothesis that `x` contains a unit root, against the alternative that it is stationary.

    ???+ abstract "Details"

        The Range Unit-Root (RUR) test is a statistical test used to determine whether a time series is stationary or not. It is based on the range of the time series and does not require any knowledge of the underlying stochastic process.

        The RUR test tracks how the range of the series (the difference between its running maximum and running minimum) grows as more observations are included. Under a unit root, the range keeps growing roughly in proportion to the square root of the sample size, whereas for a stationary series the range levels off quickly.

        The null hypothesis of the RUR test is that the time series is non-stationary (unit root). The alternative hypothesis is that the time series is stationary. If the test statistic is less than a critical value at a given significance level, typically 0.05, then we reject the null hypothesis and conclude that the time series is stationary. If the test statistic is greater than the critical value, then we fail to reject the null hypothesis and conclude that the time series is non-stationary.

        In practical terms, if a time series is found to be non-stationary by the RUR test, one can apply differencing to the time series until it becomes stationary. This involves taking the difference between consecutive observations and potentially repeating this process until the time series is stationary.

        The RUR test is a simple and computationally efficient test for stationarity, but it may not be as powerful as other unit root tests in detecting non-stationarity in some cases. It is important to use multiple tests to determine the stationarity of a time series, as no single test is perfect in all situations.
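The range-growth intuition above can be illustrated with two simulated series. This is a sketch only: `running_range` is a hypothetical helper, not this module's API, and it is not the RUR test statistic itself.

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.standard_normal(1000)            # stationary white noise
walk = np.cumsum(rng.standard_normal(1000))  # random walk (unit root)


def running_range(y):
    # Range of y[:t] for each t: running maximum minus running minimum.
    return np.maximum.accumulate(y) - np.minimum.accumulate(y)


# The walk's range keeps widening as observations accumulate,
# while the stationary series' range levels off early.
```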

    Params:
        x (ArrayLike):
            The data series to test.
        store (bool, optional):
            If `True`, then a result instance is returned additionally to the RUR statistic.<br>
            Defaults to `False`.

    Returns:
        (Union[tuple[float, float, dict, ResultsStore], tuple[float, float, dict]]):
            Returns a tuple containing:
            - `stat` (float): The RUR test statistic.
            - `pvalue` (float): The p-value of the test.
            - `crit` (dict): The critical values at 10%, 5%, 2.5%, and 1%.
            - `resstore` (Optional[ResultsStore]): Result instance (if `store` is `True`).
570 ???+ example "Examples"
572 ```pycon {.py .python linenums="1" title="Setup"}
573 >>> from ts_stat_tests.utils.data import data_airline, data_normal, data_trend, data_sine
574 >>> from ts_stat_tests.stationarity.algorithms import rur
575 >>> normal = data_normal
576 >>> trend = data_trend
577 >>> seasonal = data_sine
578 >>> airline = data_airline.values
580 ```
582 ```pycon {.py .python linenums="1" title="Example 1: Stationary Series"}
583 >>> stat, pvalue, crit = rur(x=normal)
584 >>> print(f"RUR statistic: {stat:.4f}")
585 RUR statistic: 0.3479
586 >>> print(f"p-value: {pvalue:.4f}")
587 p-value: 0.0100
589 ```
591 ```pycon {.py .python linenums="1" title="Example 2: Trend-Stationary Series"}
592 >>> stat, pvalue, crit = rur(x=trend)
593 >>> print(f"RUR statistic: {stat:.4f}")
594 RUR statistic: 31.5912
595 >>> print(f"p-value: {pvalue:.4f}")
596 p-value: 0.9500
598 ```
600 ```pycon {.py .python linenums="1" title="Example 3: Seasonal Series"}
601 >>> stat, pvalue, crit = rur(x=seasonal)
602 >>> print(f"RUR statistic: {stat:.4f}")
603 RUR statistic: 0.9129
604 >>> print(f"p-value: {pvalue:.4f}")
605 p-value: 0.0100
607 ```
609 ```pycon {.py .python linenums="1" title="Example 4: Real-World Time Series"}
610 >>> stat, pvalue, crit = rur(x=airline)
611 >>> print(f"RUR statistic: {stat:.4f}")
612 RUR statistic: 2.3333
613 >>> print(f"p-value: {pvalue:.4f}")
614 p-value: 0.9000
616 ```
618 ??? equation "Calculation"
620 The mathematical equation for the RUR test is:
622 $$
623 y_t = \rho y_{t-1} + \epsilon_t
624 $$
626 where:
628 - $y_t$ is the value of the time series at time $t$.
629 - $\rho$ is the parameter of the unit root process.
630 - $y_{t-1}$ is the value of the time series at time $t-1$.
631 - $\epsilon_t$ is a stationary error term with mean zero and constant variance.
633 The null hypothesis of the RUR test is that the time series is non-stationary with a unit root, and the alternative hypothesis is that the time series is stationary.
635 Here are the detailed steps for how to calculate the RUR test:
637 1. Collect your time series data and plot it to visually check for any trends, seasonal patterns, or other patterns that could make the data non-stationary. If you detect any such patterns, you will need to pre-process your data (e.g., detrending, deseasonalizing, etc.) to remove these effects.
639 1. Estimate the parameter $\rho$ using the ordinary least squares method. This involves regressing $y_t$ on $y_{t-1}$. The estimated equation is:
641 $$
642 y_t = \alpha + \rho y_{t-1} + \epsilon_t
643 $$
645 where:
647 - $\alpha$ is the intercept.
648 - $\epsilon_t$ is the error term.
650 1. Calculate the range of the time series, which is the difference between the maximum and minimum values of the time series:
652 $$
653 R = \max(y_t) - \min(y_t)
654 $$
656 1. Calculate the expected range that a stationary series of the same length would exhibit, which is given by:
658 $$
659 E(R) = \frac {T - 1} {2 \sqrt{T}}
660 $$
662 where:
664 - $T$ is the sample size.
666 1. Calculate the test statistic, which is given by:
668 $$
669 RUR = \frac {R - E(R)} {E(R)}
670 $$
672 1. Compare the test statistic to the critical values in the RUR distribution table to determine the level of significance. The critical values depend on the sample size and the level of significance.
674 1. Finally, interpret the results and draw conclusions about the stationarity of the time series. If the null hypothesis is rejected, then the time series is stationary. If the null hypothesis is not rejected, then the time series is non-stationary with a unit root.
676 In practice, the RUR test is often conducted using software packages such as R, Python, or MATLAB, which automate the estimation of parameters and calculation of the test statistic.
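The range-based steps above can be sketched in a few lines of plain Python. This is illustrative only, following the simplified formulas in this docstring: the actual `statsmodels.tsa.stattools.range_unit_root_test` implementation computes its statistic differently and uses interpolated critical values from Aparicio et al. (2006).

```python
import math

def rur_statistic_sketch(y: list[float]) -> float:
    """Illustrative range-based statistic following the steps above.

    NOTE: a simplified sketch, not the statsmodels implementation.
    """
    T = len(y)
    # Step 3: observed range of the series.
    R = max(y) - min(y)
    # Step 4: expected range for a stationary series, E(R) = (T - 1) / (2 * sqrt(T)).
    expected_R = (T - 1) / (2 * math.sqrt(T))
    # Step 5: standardised test statistic, (R - E(R)) / E(R).
    return (R - expected_R) / expected_R

# Example: a short series with range 4 over T = 5 observations.
print(round(rur_statistic_sketch([1.0, 3.0, 2.0, 5.0, 4.0]), 4))  # 3.4721
```

A large positive value indicates the observed range is far above what a stationary series of that length would produce.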
678 ??? note "Notes"
679 The p-values are interpolated from Table 1 of Aparicio et al. (2006). If the computed statistic is outside the table of critical values, then a warning message is generated.
681 Missing values are not handled.
683 !!! success "Credit"
684 - All credit goes to the [`statsmodels`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html) library.
686 ??? question "References"
687 - Aparicio, F., Escribano A., Sipols, A.E. (2006). Range Unit-Root (RUR) tests: robust against nonlinearities, error distributions, structural breaks and outliers. Journal of Time Series Analysis, 27 (4): 545-576.
689 ??? tip "See Also"
690 - [`statsmodels.tsa.stattools.adfuller`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html): Augmented Dickey-Fuller unit root test.
691 - [`statsmodels.tsa.stattools.kpss`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html): Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
692 - [`statsmodels.tsa.stattools.range_unit_root_test`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html): Range Unit-Root test.
693 - [`statsmodels.tsa.stattools.zivot_andrews`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html): Zivot-Andrews structural break test.
694 - [`pmdarima.arima.PPTest`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.PPTest.html): Phillips-Perron unit root test.
695 - [`arch.unitroot.DFGLS`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html): Elliot, Rothenberg and Stock's GLS-detrended Dickey-Fuller.
696 - [`arch.unitroot.VarianceRatio`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html): Variance Ratio test of a random walk.
697 - [`ts_stat_tests.stationarity.algorithms.adf`][ts_stat_tests.stationarity.algorithms.adf]: Augmented Dickey-Fuller unit root test.
698 - [`ts_stat_tests.stationarity.algorithms.kpss`][ts_stat_tests.stationarity.algorithms.kpss]: Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
699 - [`ts_stat_tests.stationarity.algorithms.rur`][ts_stat_tests.stationarity.algorithms.rur]: Range Unit-Root test of stationarity.
700 - [`ts_stat_tests.stationarity.algorithms.za`][ts_stat_tests.stationarity.algorithms.za]: Zivot-Andrews structural break unit root test.
701 - [`ts_stat_tests.stationarity.algorithms.pp`][ts_stat_tests.stationarity.algorithms.pp]: Phillips-Perron unit root test.
702 - [`ts_stat_tests.stationarity.algorithms.ers`][ts_stat_tests.stationarity.algorithms.ers]: Elliot, Rothenberg and Stock's GLS-detrended Dickey-Fuller test.
703 - [`ts_stat_tests.stationarity.algorithms.vr`][ts_stat_tests.stationarity.algorithms.vr]: Variance Ratio test of a random walk.
704 """
705 return _rur(x=x, store=store)
708@typechecked
709def za(
710 x: ArrayLike,
711 trim: float = 0.15,
712 maxlag: Optional[int] = None,
713 regression: VALID_ZA_REGRESSION_OPTIONS = "c",
714 autolag: Optional[VALID_ZA_AUTOLAG_OPTIONS] = "AIC",
715) -> tuple[float, float, dict, int, int]:
716 r"""
717 !!! note "Summary"
718 The Zivot-Andrews (ZA) test tests for a unit root in a univariate process in the presence of serial correlation and a single structural break.
720 ???+ abstract "Details"
721 The Zivot-Andrews (ZA) test is a statistical test used to determine whether a time series is stationary or not in the presence of structural breaks. Structural breaks refer to significant changes in the underlying stochastic process of the time series, which can cause non-stationarity.
723 The ZA test involves running a regression of the time series on a constant, a linear time trend, and structural-break dummy variables, and testing for a unit root in this augmented regression. The null hypothesis of the test is that the time series is non-stationary (contains a unit root), while the alternative hypothesis is that the time series is trend-stationary with a single structural break.
725 The test statistic is calculated by estimating the model at each candidate break date and selecting the break date that yields the most negative (least favourable to the null) unit-root t-statistic. If the test statistic is less than (more negative than) the critical value at a given significance level, typically 0.05, then we reject the null hypothesis and conclude that the time series is trend-stationary with a single structural break. If the test statistic is greater than the critical value, then we fail to reject the null hypothesis and conclude that the time series is non-stationary.
727 In practical terms, if a time series is found to be non-stationary with a structural break by the ZA test, one can apply methods to account for the structural break, such as including dummy variables in the regression or using time series models that allow for structural breaks.
729 Overall, the ZA test is a useful tool in time series analysis and forecasting when there is a suspicion of structural breaks in the data. However, it is important to note that the test may not detect multiple break points or breaks that are not well-separated in time.
731 Params:
732 x (ArrayLike):
733 The data series to test.
734 trim (float):
735 The percentage of series at begin/end to exclude.
736 Default: `0.15`
737 maxlag (Optional[int]):
738 The maximum lag which is included in test.
739 Default: `None`
740 regression (VALID_ZA_REGRESSION_OPTIONS):
741 Constant and trend order to include in regression.
743 - `"c"`: constant only (default).
744 - `"t"`: trend only.
745 - `"ct"`: constant and trend.
747 Default: `"c"`
748 autolag (Optional[VALID_ZA_AUTOLAG_OPTIONS]):
749 The method to select the lag length.
751 - If `None`, then `maxlag` lags are used.
752 - If `"AIC"` (default) or `"BIC"`, then the number of lags is chosen to minimise the corresponding information criterion.
754 Default: `"AIC"`
756 Returns:
757 (tuple[float, float, dict, int, int]):
758 Returns a tuple containing:
759 - `zastat` (float): The test statistic.
760 - `pvalue` (float): The p-value.
761 - `cvdict` (dict): Critical values at the $1\%$, $5\%$, and $10\%$ levels.
762 - `baselag` (int): Lags used for period regressions.
763 - `pbidx` (int): Break period index.
765 ???+ example "Examples"
767 ```pycon {.py .python linenums="1" title="Setup"}
768 >>> from ts_stat_tests.utils.data import data_airline, data_normal, data_noise
769 >>> from ts_stat_tests.stationarity.algorithms import za
770 >>> normal = data_normal
771 >>> noise = data_noise
772 >>> airline = data_airline.values
774 ```
776 ```pycon {.py .python linenums="1" title="Example 1: Stationary Series"}
777 >>> stat, pvalue, crit, lags, break_idx = za(x=normal)
778 >>> print(f"ZA statistic: {stat:.4f}")
779 ZA statistic: -30.8800
780 >>> print(f"p-value: {pvalue:.4e}")
781 p-value: 1.0000e-05
783 ```
785 ```pycon {.py .python linenums="1" title="Example 2: Noisy Series"}
786 >>> stat, pvalue, crit, lags, break_idx = za(x=noise)
787 >>> print(f"ZA statistic: {stat:.4f}")
788 ZA statistic: -32.4316
789 >>> print(f"p-value: {pvalue:.4e}")
790 p-value: 1.0000e-05
792 ```
794 ```pycon {.py .python linenums="1" title="Example 3: Real-World Time Series"}
795 >>> stat, pvalue, crit, lags, break_idx = za(x=airline)
796 >>> print(f"ZA statistic: {stat:.4f}")
797 ZA statistic: -3.6508
798 >>> print(f"p-value: {pvalue:.4f}")
799 p-value: 0.5808
801 ```
803 ??? equation "Calculation"
805 The mathematical equation for the Zivot-Andrews test is:
807 $$
808 y_t = \alpha + \beta t + \gamma y_{t-1} + \delta_1 D_t + \delta_2 t D_t + \epsilon_t
809 $$
811 where:
813 - $y_t$ is the value of the time series at time $t$.
814 - $\alpha$ is the intercept.
815 - $\beta$ is the slope coefficient of the time trend.
816 - $\gamma$ is the coefficient of the lagged dependent variable.
817 - $D_t$ is a dummy variable that takes a value of 1 after the suspected structural break point, and 0 otherwise.
818 - $\delta_1$ and $\delta_2$ are the coefficients of the dummy variable and the interaction term of the dummy variable and time trend, respectively.
819 - $\epsilon_t$ is a stationary error term with mean zero and constant variance.
821 The null hypothesis of the Zivot-Andrews test is that the time series is non-stationary, and the alternative hypothesis is that the time series is stationary with a single structural break.
823 Here are the detailed steps for how to calculate the Zivot-Andrews test:
825 1. Collect your time series data and plot it to visually check for any trends, seasonal patterns, or other patterns that could make the data non-stationary. If you detect any such patterns, you will need to pre-process your data (e.g., detrending, deseasonalizing, etc.) to remove these effects.
827 1. Estimate the parameters of the model using the least squares method. This involves regressing $y_t$ on $t$, $y_{t-1}$, $D_t$, and $t D_t$. The estimated equation is:
829 $$
830 y_t = \alpha + \beta t + \gamma y_{t-1} + \delta_1 D_t + \delta_2 t D_t + \epsilon_t
831 $$
833 1. Perform a unit root test on the residuals to check for stationarity. The most commonly used unit root tests for this purpose are the Augmented Dickey-Fuller (ADF) test and the Phillips-Perron (PP) test.
835 1. Calculate the test statistic, which is based on the largest root of the following equation:
837 $$
838 \Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \delta_1 D_t + \delta_2 t D_t + \epsilon_t
839 $$
841 where:
843 - $\Delta$ is the first difference operator.
845 1. Determine the critical values of the test statistic from the Zivot-Andrews distribution table. The critical values depend on the sample size, the level of significance, and the number of lagged dependent variables in the model.
847 1. Finally, interpret the results and draw conclusions about the stationarity of the time series. If the null hypothesis is rejected, then the time series is stationary with a structural break. If the null hypothesis is not rejected, then the time series is non-stationary and may require further processing to make it stationary.
849 In practice, the Zivot-Andrews test is often conducted using software packages such as R, Python, or MATLAB, which automate the estimation of parameters and calculation of the test statistic.
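As a minimal sketch of the regressors introduced in step 2, the break dummy $D_t$ and its interaction with the trend can be built for one candidate break date as follows. This is illustrative only: statsmodels repeats this over every candidate break period inside the trimmed sample, as described in the Notes below.

```python
def za_break_regressors(T: int, break_period: int) -> tuple[list[int], list[int]]:
    """Build the structural-break dummy D_t and the interaction term t * D_t.

    D_t is 1 for t >= break_period (1-indexed) and 0 before it; the
    interaction term allows the trend slope to change after the break.
    NOTE: an illustrative helper, not part of the statsmodels API.
    """
    D = [1 if t >= break_period else 0 for t in range(1, T + 1)]
    tD = [t * d for t, d in zip(range(1, T + 1), D)]
    return D, tD

D, tD = za_break_regressors(T=6, break_period=4)
print(D)   # [0, 0, 0, 1, 1, 1] - dummy switches on at t = 4
print(tD)  # [0, 0, 0, 4, 5, 6] - trend interaction after the break
```

These two columns, alongside the constant, trend, and lagged level, form the design matrix of the break-period regression.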
851 ??? note "Notes"
852 H0 = unit root with a single structural break
854 Algorithm follows Baum (2004/2015) approximation to original Zivot-Andrews method. Rather than performing an autolag regression at each candidate break period (as per the original paper), a single autolag regression is run up-front on the base model (constant + trend with no dummies) to determine the best lag length. This lag length is then used for all subsequent break-period regressions. This results in significant run time reduction but also slightly more pessimistic test statistics than the original Zivot-Andrews method, although no attempt has been made to characterize the size/power trade-off.
856 ??? success "Credit"
857 - All credit goes to the [`statsmodels`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html) library.
859 ??? question "References"
860 - Baum, C.F. (2004). "ZANDREWS: Stata module to calculate Zivot-Andrews unit root test in presence of structural break." Statistical Software Components S437301, Boston College Department of Economics, revised 2015.
861 - Schwert, G.W. (1989). Tests for unit roots: A Monte Carlo investigation. Journal of Business & Economic Statistics, 7: 147-159.
862 - Zivot, E., and Andrews, D.W.K. (1992). Further evidence on the great crash, the oil-price shock, and the unit-root hypothesis. Journal of Business & Economic Statistics, 10: 251-270.
864 ??? tip "See Also"
865 - [`statsmodels.tsa.stattools.adfuller`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html): Augmented Dickey-Fuller unit root test.
866 - [`statsmodels.tsa.stattools.kpss`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html): Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
867 - [`statsmodels.tsa.stattools.range_unit_root_test`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html): Range Unit-Root test.
868 - [`statsmodels.tsa.stattools.zivot_andrews`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html): Zivot-Andrews structural break test.
869 - [`pmdarima.arima.PPTest`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.PPTest.html): Phillips-Perron unit root test.
870 - [`arch.unitroot.DFGLS`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html): Elliot, Rothenberg and Stock's GLS-detrended Dickey-Fuller.
871 - [`arch.unitroot.VarianceRatio`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html): Variance Ratio test of a random walk.
872 - [`ts_stat_tests.stationarity.algorithms.adf`][ts_stat_tests.stationarity.algorithms.adf]: Augmented Dickey-Fuller unit root test.
873 - [`ts_stat_tests.stationarity.algorithms.kpss`][ts_stat_tests.stationarity.algorithms.kpss]: Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
874 - [`ts_stat_tests.stationarity.algorithms.rur`][ts_stat_tests.stationarity.algorithms.rur]: Range Unit-Root test of stationarity.
875 - [`ts_stat_tests.stationarity.algorithms.za`][ts_stat_tests.stationarity.algorithms.za]: Zivot-Andrews structural break unit root test.
876 - [`ts_stat_tests.stationarity.algorithms.pp`][ts_stat_tests.stationarity.algorithms.pp]: Phillips-Perron unit root test.
877 - [`ts_stat_tests.stationarity.algorithms.ers`][ts_stat_tests.stationarity.algorithms.ers]: Elliot, Rothenberg and Stock's GLS-detrended Dickey-Fuller test.
878 - [`ts_stat_tests.stationarity.algorithms.vr`][ts_stat_tests.stationarity.algorithms.vr]: Variance Ratio test of a random walk.
879 """
880 res: Any = _za(
881 x=x,
882 trim=trim,
883 maxlag=maxlag,
884 regression=regression,
885 autolag=autolag, # type: ignore[arg-type] # statsmodels stubs are often missing None
886 )
887 return (
888 float(res[0]),
889 float(res[1]),
890 dict(res[2]),
891 int(res[3]),
892 int(res[4]),
893 )
896@typechecked
897def pp(
898 x: ArrayLike,
899 lags: Optional[int] = None,
900 trend: VALID_PP_TREND_OPTIONS = "c",
901 test_type: VALID_PP_TEST_TYPE_OPTIONS = "tau",
902) -> tuple[float, float, int, dict]:
903 r"""
904 !!! note "Summary"
905 Conduct a Phillips-Perron (PP) test for stationarity.
907 In statistics, the Phillips-Perron test (named after Peter C. B. Phillips and Pierre Perron) is a unit root test. It is used in time series analysis to test the null hypothesis that a time series is integrated of order $1$. It builds on the Dickey-Fuller test of the null hypothesis $\rho = 1$, i.e. that the autoregressive coefficient equals one.
909 ???+ abstract "Details"
911 The Phillips-Perron (PP) test is a statistical test used to determine whether a time series is stationary or not. It is similar to the Augmented Dickey-Fuller (ADF) test, but it has some advantages, especially in the presence of autocorrelation and heteroscedasticity.
913 The PP test involves regressing the time series on a constant and a linear time trend, and testing whether the residuals of the regression are stationary or not. The null hypothesis of the test is that the time series is non-stationary, while the alternative hypothesis is that the time series is stationary.
915 The test statistic is calculated by taking the sum of the squared residuals of the regression, which is adjusted for autocorrelation and heteroscedasticity. The PP test also accounts for the bias in the standard errors of the test statistic, which can lead to incorrect inference in small samples.
917 If the test statistic is less than a critical value at a given significance level, typically 0.05, then we reject the null hypothesis and conclude that the time series is stationary. If the test statistic is greater than the critical value, then we fail to reject the null hypothesis and conclude that the time series is non-stationary.
919 In practical terms, if a time series is found to be non-stationary by the PP test, one can apply differencing to the time series until it becomes stationary. This involves taking the difference between consecutive observations and potentially repeating this process until the time series is stationary.
921 Overall, the PP test is a powerful and robust test for stationarity, and it is widely used in time series analysis and forecasting. However, it is important to use multiple tests and diagnostic tools to determine the stationarity of a time series, as no single test is perfect in all situations.
923 Params:
924 x (ArrayLike):
925 The data series to test.
926 lags (Optional[int], optional):
927 The number of lags to use in the Newey-West estimator of the variance. If omitted or `None`, the lag length is selected automatically.<br>
928 Defaults to `None`.
929 trend (VALID_PP_TREND_OPTIONS, optional):
930 The trend component to include in the test.
932 - `"n"`: No constant, no trend.
933 - `"c"`: Include a constant (default).
934 - `"ct"`: Include a constant and linear time trend.
936 Defaults to `"c"`.
937 test_type (VALID_PP_TEST_TYPE_OPTIONS, optional):
938 The type of test statistic to compute:
940 - `"tau"`: The t-statistic based on the augmented regression (default).
941 - `"rho"`: The normalized autocorrelation coefficient (also known as the $Z(\alpha)$ test).
943 Defaults to `"tau"`.
945 Returns:
946 (tuple[float, float, int, dict]):
947 Returns a tuple containing:
948 - `stat` (float): The test statistic.
949 - `pvalue` (float): The p-value for the test statistic.
950 - `lags` (int): The number of lags used in the test.
951 - `crit` (dict): The critical values at 1%, 5%, and 10%.
953 ???+ example "Examples"
955 ```pycon {.py .python linenums="1" title="Setup"}
956 >>> from ts_stat_tests.stationarity.algorithms import pp
957 >>> from ts_stat_tests.utils.data import data_airline, data_normal, data_trend, data_sine
958 >>> normal = data_normal
959 >>> trend = data_trend
960 >>> seasonal = data_sine
961 >>> airline = data_airline.values
963 ```
965 ```pycon {.py .python linenums="1" title="Example 1: Stationary Series"}
966 >>> stat, pvalue, lags, crit = pp(x=normal)
967 >>> print(f"PP statistic: {stat:.4f}")
968 PP statistic: -30.7758
969 >>> print(f"p-value: {pvalue:.4f}")
970 p-value: 0.0000
972 ```
974 ```pycon {.py .python linenums="1" title="Example 2: Trend-Stationary Series"}
975 >>> stat, pvalue, lags, crit = pp(x=trend, trend="ct")
976 >>> print(f"p-value: {pvalue:.4f}")
977 p-value: 0.0000
979 ```
981 ```pycon {.py .python linenums="1" title="Example 3: Seasonal Series"}
982 >>> stat, pvalue, lags, crit = pp(x=seasonal)
983 >>> print(f"PP statistic: {stat:.4f}")
984 PP statistic: -8.0571
985 >>> print(f"p-value: {pvalue:.4f}")
986 p-value: 0.0000
988 ```
990 ```pycon {.py .python linenums="1" title="Example 4: Real-World Time Series"}
991 >>> stat, pvalue, lags, crit = pp(x=airline)
992 >>> print(f"PP statistic: {stat:.4f}")
993 PP statistic: -1.3511
994 >>> print(f"p-value: {pvalue:.4f}")
995 p-value: 0.6055
997 ```
999 ```pycon {.py .python linenums="1" title="Example 5: PP test with excessive lags (coverage check)"}
1000 >>> from ts_stat_tests.stationarity.algorithms import pp
1001 >>> from ts_stat_tests.utils.data import data_normal
1002 >>> # data_normal has 1000 observations. Force lags = 1000 to trigger adjustment.
1003 >>> res = pp(data_normal, lags=1000)
1004 >>> print(f"stat: {res[0]:.4f}, lags: {res[2]}")
1005 stat: -43.6895, lags: 998
1007 ```
1009 ??? equation "Calculation"
1011 The Phillips-Perron (PP) test is a commonly used test for stationarity in time series forecasting. The mathematical equation for the PP test is:
1013 $$
1014 y_t = \delta + \pi t + \rho y_{t-1} + \epsilon_t
1015 $$
1017 where:
1019 - $y_t$ is the value of the time series at time $t$.
1020 - $\delta$ is a constant term.
1021 - $\pi$ is a coefficient that captures the trend in the data.
1022 - $\rho$ is a coefficient that captures the autocorrelation in the data.
1023 - $y_{t-1}$ is the lagged value of the time series at time $t-1$.
1024 - $\epsilon_t$ is a stationary error term with mean zero and constant variance.
1026 The PP test is based on the idea that if the time series has a unit root, then the coefficient $\rho$ equals one, whereas a stationary series has $|\rho| < 1$. Therefore, the null hypothesis of the PP test is that the time series is non-stationary with $\rho = 1$, and the alternative hypothesis is that the time series is stationary with $|\rho| < 1$.
1028 Here are the detailed steps for how to calculate the PP test:
1030 1. Collect your time series data and plot it to visually check for any trends, seasonal patterns, or other patterns that could make the data non-stationary. If you detect any such patterns, you will need to pre-process your data (e.g., detrending, deseasonalizing, etc.) to remove these effects.
1032 1. Estimate the regression model by regressing $y_t$ on a constant, a linear trend, and the lagged value of $y_{t-1}$. The regression equation is:
1034 $$
1035 y_t = \delta + \pi t + \rho y_{t-1} + \epsilon_t
1036 $$
1038 1. Calculate the test statistic, which is based on the following equation:
1040 $$
1041 z = \left( T^{-\frac{1}{2}} \right) \times \left( \sum_{t=1}^T \left( y_t - \delta - \pi t - \rho y_{t-1} \right) - \left( \frac{1}{T} \right) \times \sum_{t=1}^T \sum_{s=1}^T K \left( \frac{s-t}{h} \right) (y_s - \delta - \pi s - \rho y_{s-1}) \right)
1042 $$
1044 where:
1046 - $T$ is the sample size.
1047 - $K(\dots)$ is the kernel function, which determines the weight of each observation in the smoothed series. The choice of the kernel function depends on the degree of serial correlation in the data. Typically, a Gaussian kernel or a Bartlett kernel is used.
1048 - $h$ is the bandwidth parameter, which controls the degree of smoothing of the series. The optimal value of $h$ depends on the sample size and the noise level of the data.
1050 1. Determine the critical values of the test statistic from the PP distribution table. The critical values depend on the sample size and the level of significance.
1052 1. Finally, interpret the results and draw conclusions about the stationarity of the time series. If the null hypothesis is rejected, then the time series is stationary. If the null hypothesis is not rejected, then the time series is non-stationary with a unit root ($\rho = 1$).
1054 In practice, the PP test is often conducted using software packages such as R, Python, or MATLAB, which automate the estimation of the regression model and calculation of the test statistic.
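When `lags` is omitted, this wrapper falls back to the common Newey-West rule of thumb, $\lceil 12 \times (T/100)^{1/4} \rceil$, before handing off to `arch`. The rule can be checked in isolation:

```python
import math

def default_pp_lags(nobs: int) -> int:
    """Newey-West rule of thumb used when `lags` is not supplied:
    ceil(12 * (nobs / 100) ** (1 / 4)).
    """
    return int(math.ceil(12.0 * (nobs / 100.0) ** 0.25))

print(default_pp_lags(1000))  # 22 - e.g. a 1000-point series
print(default_pp_lags(144))   # 14 - e.g. the 144-point airline series
```

The quarter-power growth keeps the bandwidth small relative to the sample, so the long-run variance estimate stays consistent as $T$ grows.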
1056 ??? note "Notes"
1057 This test is generally used indirectly via the [`pmdarima.arima.ndiffs()`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.ndiffs.html) function, which computes the differencing term, `d`.
1059 The R code allows for two types of tests: `'Z(alpha)'` and `'Z(t_alpha)'`. Since sklearn does not allow extraction of std errors from the linear model fit, `t_alpha` is much more difficult to achieve, so we do not allow that variant.
1061 !!! success "Credit"
1062 - All credit goes to the [`arch`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.PhillipsPerron.html) library.
1064 ??? question "References"
1065 - Phillips, P. C. B.; Perron, P. (1988). Testing for a Unit Root in Time Series Regression. Biometrika. 75 (2): 335-346.
1067 ??? tip "See Also"
1068 - [`statsmodels.tsa.stattools.adfuller`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html): Augmented Dickey-Fuller unit root test.
1069 - [`statsmodels.tsa.stattools.kpss`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html): Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
1070 - [`statsmodels.tsa.stattools.range_unit_root_test`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html): Range Unit-Root test.
1071 - [`statsmodels.tsa.stattools.zivot_andrews`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html): Zivot-Andrews structural break test.
1072 - [`pmdarima.arima.PPTest`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.PPTest.html): Phillips-Perron unit root test.
1073 - [`arch.unitroot.DFGLS`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html): Elliot, Rothenberg and Stock's GLS-detrended Dickey-Fuller.
1074 - [`arch.unitroot.VarianceRatio`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html): Variance Ratio test of a random walk.
1075 - [`ts_stat_tests.stationarity.algorithms.adf`][ts_stat_tests.stationarity.algorithms.adf]: Augmented Dickey-Fuller unit root test.
1076 - [`ts_stat_tests.stationarity.algorithms.kpss`][ts_stat_tests.stationarity.algorithms.kpss]: Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
1077 - [`ts_stat_tests.stationarity.algorithms.rur`][ts_stat_tests.stationarity.algorithms.rur]: Range Unit-Root test of stationarity.
1078 - [`ts_stat_tests.stationarity.algorithms.za`][ts_stat_tests.stationarity.algorithms.za]: Zivot-Andrews structural break unit root test.
1079 - [`ts_stat_tests.stationarity.algorithms.pp`][ts_stat_tests.stationarity.algorithms.pp]: Phillips-Perron unit root test.
1080 - [`ts_stat_tests.stationarity.algorithms.ers`][ts_stat_tests.stationarity.algorithms.ers]: Elliot, Rothenberg and Stock's GLS-detrended Dickey-Fuller test.
1081 - [`ts_stat_tests.stationarity.algorithms.vr`][ts_stat_tests.stationarity.algorithms.vr]: Variance Ratio test of a random walk.
1082 """
1083 _x = np.asarray(x)
1084 nobs = _x.shape[0]
1085 _lags = lags
1086 if _lags is None:
1087 _lags = int(np.ceil(12.0 * np.power(nobs / 100.0, 1 / 4.0)))
1089 # arch PP test requires lags < nobs-1
1090 if _lags >= nobs - 1:
1091 _lags = max(0, nobs - 2)
1093 res = _pp(y=_x, lags=_lags, trend=trend, test_type=test_type)
1094 return (float(res.stat), float(res.pvalue), int(res.lags), dict(res.critical_values))
1097@typechecked
1098def ers(
1099 y: ArrayLike,
1100 lags: Optional[int] = None,
1101 trend: VALID_ERS_TREND_OPTIONS = "c",
1102 max_lags: Optional[int] = None,
1103 method: VALID_ERS_METHOD_OPTIONS = "aic",
1104 low_memory: Optional[bool] = None,
1105) -> tuple[float, float, int, dict]:
1106 r"""
1107 !!! note "Summary"
1108 Elliott, Rothenberg and Stock's GLS detrended Dickey-Fuller.
1110 ???+ abstract "Details"
1112 The Elliott-Rothenberg-Stock (ERS) test is a statistical test used to determine whether a time series is stationary or not. It is a robust test that is able to handle a wide range of non-stationary processes, including ones with structural breaks, heteroscedasticity, and autocorrelation.
1114 The ERS test first detrends the time series by generalised least squares (GLS), removing a constant (and, optionally, a linear time trend) under a local-to-unity alternative. An Augmented Dickey-Fuller style regression is then run on the GLS-detrended series, and the test statistic is the t-statistic on the lagged level in that regression. This GLS detrending step gives the test substantially better power than the standard ADF test when the series is close to having a unit root.
1116 If the test statistic is less than a critical value at a given significance level, typically 0.05, then we reject the null hypothesis and conclude that the time series is stationary. If the test statistic is greater than the critical value, then we fail to reject the null hypothesis and conclude that the time series is non-stationary.
1118 In practical terms, if a time series is found to be non-stationary by the ERS test, one can apply differencing to the time series until it becomes stationary. This involves taking the difference between consecutive observations and potentially repeating this process until the time series is stationary.
1120 Overall, the ERS test is a powerful and flexible test for stationarity, and it is widely used in time series analysis and forecasting. However, it is important to use multiple tests and diagnostic tools to determine the stationarity of a time series, as no single test is perfect in all situations.
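The GLS quasi-differencing step that gives DF-GLS its name can be sketched as below. This is an illustrative sketch only, assuming the standard ERS non-centrality constants ($\bar{c} = -7$ for the constant-only case, $-13.5$ when a linear trend is included); the `arch` library performs this internally.

```python
def gls_quasi_difference(y: list[float], c_bar: float = -7.0) -> list[float]:
    """Quasi-difference a series for GLS detrending (DF-GLS sketch).

    With a_bar = 1 + c_bar / T, the transformed series is
    [y_1, y_2 - a_bar * y_1, ..., y_T - a_bar * y_{T-1}].
    NOTE: illustrative helper; c_bar = -7.0 is the standard ERS choice
    for the constant-only case (-13.5 with a linear trend).
    """
    T = len(y)
    a_bar = 1.0 + c_bar / T
    # First observation is kept as-is; the rest are quasi-differenced.
    return [y[0]] + [y[t] - a_bar * y[t - 1] for t in range(1, T)]

print(gls_quasi_difference([2.0, 4.0, 6.0, 8.0]))  # [2.0, 5.5, 9.0, 12.5]
```

The deterministic regressors are quasi-differenced the same way, and the detrended residuals then feed the ADF-style regression.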
1122 Params:
1123 y (ArrayLike):
1124 The data to test for a unit root.
1125 lags (Optional[int], optional):
1126 The number of lags to use in the ADF regression. If omitted or `None`, `method` is used to automatically select the lag length, with no more than `max_lags` lags included.<br>
1127 Defaults to `None`.
1128 trend (VALID_ERS_TREND_OPTIONS, optional):
1129 The trend component to include in the test
1131 - `"c"`: Include a constant (Default)
1132 - `"ct"`: Include a constant and linear time trend
1134 Defaults to `"c"`.
1135 max_lags (Optional[int], optional):
1136 The maximum number of lags to use when selecting lag length. When using automatic lag length selection, the lag is selected using OLS detrending rather than GLS detrending.<br>
1137 Defaults to `None`.
1138 method (VALID_ERS_METHOD_OPTIONS, optional):
1139 The method to use when selecting the lag length
1141 - `"aic"`: Select the minimum of the Akaike IC
1142 - `"bic"`: Select the minimum of the Schwarz/Bayesian IC
1143 - `"t-stat"`: Select lags based on the significance of the last lag's t-statistic
1145 Defaults to `"aic"`.
1146 low_memory (Optional[bool], optional):
1147 Flag indicating whether to use the low-memory algorithm for lag-length selection.
1148 Defaults to `None`.
1150 Returns:
1151 (tuple[float, float, int, dict]):
1152 Returns a tuple containing:
1153 - `stat` (float): The test statistic for a unit root.
1154 - `pvalue` (float): The p-value for the test statistic.
1155 - `lags` (int): The number of lags used in the test.
1156 - `crit` (dict): The critical values for the test statistic at the 1%, 5%, and 10% levels.
1158 ???+ example "Examples"
1160 ```pycon {.py .python linenums="1" title="Setup"}
1161 >>> from ts_stat_tests.stationarity.algorithms import ers
1162 >>> from ts_stat_tests.utils.data import data_airline, data_normal, data_noise
1163 >>> normal = data_normal
1164 >>> noise = data_noise
1165 >>> airline = data_airline.values
1167 ```
1169 ```pycon {.py .python linenums="1" title="Example 1: Stationary Series"}
1170 >>> stat, pvalue, lags, crit = ers(y=normal)
1171 >>> print(f"ERS statistic: {stat:.4f}")
1172 ERS statistic: -30.1517
1173 >>> print(f"p-value: {pvalue:.4f}")
1174 p-value: 0.0000
1176 ```
1178 ```pycon {.py .python linenums="1" title="Example 2: Noisy Series"}
1179 >>> stat, pvalue, lags, crit = ers(y=noise)
1180 >>> print(f"ERS statistic: {stat:.4f}")
1181 ERS statistic: -12.6897
1182 >>> print(f"p-value: {pvalue:.4e}")
1183 p-value: 1.0956e-21
1185 ```
1187 ```pycon {.py .python linenums="1" title="Example 3: Real-World Time Series"}
1188 >>> stat, pvalue, lags, crit = ers(y=airline)
1189 >>> print(f"ERS statistic: {stat:.4f}")
1190 ERS statistic: 0.9918
1191 >>> print(f"p-value: {pvalue:.4f}")
1192 p-value: 0.9232
1194 ```
1196 ??? equation "Calculation"
1198 The mathematical equation for the ERS test is:
1200 $$
1201 y_t = \mu_t + \epsilon_t
1202 $$
1204 where:
1206 - $y_t$ is the value of the time series at time $t$.
1207 - $\mu_t$ is a time-varying mean function.
1208 - $\epsilon_t$ is a stationary error term with mean zero and constant variance.
1210 The ERS test is based on the idea that if the time series is stationary, then the mean function should be a constant over time. Therefore, the null hypothesis of the ERS test is that the time series is non-stationary (unit root), and the alternative hypothesis is that the time series is stationary.
1212 Here are the detailed steps for how to calculate the ERS test:
1214 1. Collect your time series data and plot it to visually check for any trends, seasonal patterns, or other patterns that could make the data non-stationary. If you detect any such patterns, you will need to pre-process your data (e.g., detrending, deseasonalizing, etc.) to remove these effects.
1216 1. Estimate the time-varying mean function using a local polynomial regression method. The choice of the polynomial degree depends on the complexity of the mean function and the sample size. Typically, a quadratic or cubic polynomial is used. The estimated mean function is denoted as $\mu_t$.
1218 1. Calculate the test statistic, which is based on the following equation:
1220 $$
1221 z = \left( \frac {T-1} {( \frac {1} {12\pi^2 \times \Delta^2} )} \right) ^{\frac{1}{2}} \times \left( \sum_{t=1}^T \frac {(y_t - \mu_t)^2} {T-1} \right)
1222 $$
1224 where:
1226 - $T$ is the sample size
1227 - $\Delta$ is the bandwidth parameter, which controls the degree of smoothing of the mean function. The optimal value of $\Delta$ depends on the sample size and the noise level of the data.
1228 - $\pi$ is the constant pi.
1230 1. Determine the critical values of the test statistic from the ERS distribution table. The critical values depend on the sample size and the level of significance.
1232 1. Finally, interpret the results and draw conclusions about the stationarity of the time series. If the null hypothesis is rejected, then the time series is stationary. If the null hypothesis is not rejected, then the time series is non-stationary (it contains a unit root).
1234 In practice, the ERS test is often conducted using software packages such as R, Python, or MATLAB, which automate the estimation of the time-varying mean function and calculation of the test statistic.
1236 ??? note "Notes"
1237 The null hypothesis of the Dickey-Fuller GLS is that there is a unit root, with the alternative that there is no unit root. If the p-value is above a critical size, then the null cannot be rejected and the series appears to have a unit root.
1239 DFGLS differs from the ADF test in that an initial GLS detrending step is used before a trend-less ADF regression is run.
1241 Critical values and p-values when trend is `"c"` are identical to the ADF. When trend is set to `"ct"`, they are from Elliott, Rothenberg, and Stock (1996).
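The GLS detrending step mentioned above can be sketched in plain Python for the constant-only (`trend="c"`) case, using the standard local-to-unity coefficient $\bar{c} = -7$, i.e. $\rho = 1 - 7/T$. `gls_detrend_constant` is a hypothetical helper for illustration only; real implementations such as `arch.unitroot.DFGLS` should be preferred.

```python
def gls_detrend_constant(y: list[float]) -> list[float]:
    """GLS-detrend a series against a constant, as in the first step of
    the DFGLS test (constant-only case, local-to-unity rho = 1 - 7/T)."""
    T = len(y)
    rho = 1.0 - 7.0 / T
    # Quasi-difference both the series and the constant regressor.
    y_q = [y[0]] + [y[t] - rho * y[t - 1] for t in range(1, T)]
    z_q = [1.0] + [1.0 - rho] * (T - 1)
    # OLS of y_q on z_q (single regressor, no intercept).
    delta = sum(a * b for a, b in zip(z_q, y_q)) / sum(z * z for z in z_q)
    # Subtract the fitted constant from the original (not quasi-differenced) series.
    return [v - delta for v in y]

# Sanity check: a constant series detrends to exactly zero.
assert all(abs(v) < 1e-9 for v in gls_detrend_constant([5.0] * 100))
```

The trend-less ADF regression is then run on the detrended output, which is what distinguishes DFGLS from a plain ADF test.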
1243 !!! success "Credit"
1244 - All credit goes to the [`arch`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html) library.
1246 ??? question "References"
1247 - Elliott, G. R., T. J. Rothenberg, and J. H. Stock. 1996. Efficient tests for an autoregressive unit root. Econometrica 64: 813-836.
1248 - Perron, P., & Qu, Z. (2007). A simple modification to improve the finite sample properties of Ng and Perron’s unit root tests. Economics letters, 94(1), 12-19.
1250 ??? tip "See Also"
1251 - [`statsmodels.tsa.stattools.adfuller`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html): Augmented Dickey-Fuller unit root test.
1252 - [`statsmodels.tsa.stattools.kpss`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html): Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
1253 - [`statsmodels.tsa.stattools.range_unit_root_test`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html): Range Unit-Root test.
1254 - [`statsmodels.tsa.stattools.zivot_andrews`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html): Zivot-Andrews structural break test.
1255 - [`pmdarima.arima.PPTest`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.PPTest.html): Phillips-Perron unit root test.
1256 - [`arch.unitroot.DFGLS`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html): Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller.
1257 - [`arch.unitroot.VarianceRatio`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html): Variance Ratio test of a random walk.
1258 - [`ts_stat_tests.stationarity.algorithms.adf`][ts_stat_tests.stationarity.algorithms.adf]: Augmented Dickey-Fuller unit root test.
1259 - [`ts_stat_tests.stationarity.algorithms.kpss`][ts_stat_tests.stationarity.algorithms.kpss]: Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
1260 - [`ts_stat_tests.stationarity.algorithms.rur`][ts_stat_tests.stationarity.algorithms.rur]: Range Unit-Root test of stationarity.
1261 - [`ts_stat_tests.stationarity.algorithms.za`][ts_stat_tests.stationarity.algorithms.za]: Zivot-Andrews structural break unit root test.
1262 - [`ts_stat_tests.stationarity.algorithms.pp`][ts_stat_tests.stationarity.algorithms.pp]: Phillips-Perron unit root test.
1263 - [`ts_stat_tests.stationarity.algorithms.ers`][ts_stat_tests.stationarity.algorithms.ers]: Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller test.
1264 - [`ts_stat_tests.stationarity.algorithms.vr`][ts_stat_tests.stationarity.algorithms.vr]: Variance Ratio test of a random walk.
1265 """
1266 res = _ers(
1267 y=np.asarray(y),
1268 lags=lags,
1269 trend=trend,
1270 max_lags=max_lags,
1271 method=method,
1272 low_memory=low_memory,
1273 )
1274 return (float(res.stat), float(res.pvalue), int(res.lags), dict(res.critical_values))
1277@typechecked
1278def vr(
1279 y: ArrayLike,
1280 lags: int = 2,
1281 trend: VALID_VR_TREND_OPTIONS = "c",
1282 debiased: bool = True,
1283 robust: bool = True,
1284 overlap: bool = True,
1285) -> tuple[float, float, float]:
1286 r"""
1287 !!! note "Summary"
1288 Variance Ratio test of a random walk.
1290 ???+ abstract "Details"
1292 The Variance Ratio (VR) test is a statistical test used to determine whether a time series is stationary or not based on the presence of long-term dependence in the series. It is a non-parametric test that can be used to test for the presence of a unit root or a trend in the series.
1294 The VR test involves calculating the ratio of the variance of the differences of the logarithms of the time series over different time intervals. The variance of the differences of the logarithms is a measure of the volatility of the series, and the ratio of the variances over different intervals is a measure of the long-term dependence in the series.
1296 If the series is a random walk, the variance of its increments grows linearly with the length of the interval, so the variance ratio will be close to one for all intervals. If the series is stationary (mean-reverting), the variance of its increments grows more slowly than the interval length, and the variance ratio will fall below one as the interval lengthens; trending or positively autocorrelated series instead produce variance ratios above one.
1298 The VR test involves comparing the observed variance ratio to the distribution expected under the null hypothesis of a random walk (non-stationary). The test statistic is asymptotically standard normal, so if its absolute value exceeds the critical value at a given significance level, typically 0.05, then we reject the null hypothesis and conclude that the series is not a random walk. Otherwise, we fail to reject the null hypothesis and treat the series as a random walk (non-stationary).
1300 In practical terms, if a time series is found to be non-stationary by the VR test, one can apply differencing to the time series until it becomes stationary. This involves taking the difference between consecutive observations and potentially repeating this process until the time series is stationary.
1302 Overall, the VR test is a useful and relatively simple test for stationarity that can be applied to a wide range of time series. However, it is important to use multiple tests and diagnostic tools to confirm the stationarity of a time series, as no single test is perfect in all situations.
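The intuition above can be checked with a stdlib-only sketch: for a random walk the lag-2 variance ratio sits near one, while for i.i.d. (stationary) levels it sits near one half. The `vr2` helper is illustrative only and omits the debiasing, overlap, and robust-inference refinements of the real test.

```python
import random
from itertools import accumulate

random.seed(0)

def variance(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def vr2(y):
    # Lag-2 variance ratio: variance of 2-period increments over
    # twice the variance of 1-period increments.
    d1 = [y[t] - y[t - 1] for t in range(1, len(y))]
    d2 = [y[t] - y[t - 2] for t in range(2, len(y))]
    return variance(d2) / (2.0 * variance(d1))

noise = [random.gauss(0.0, 1.0) for _ in range(5000)]
walk = list(accumulate(noise))

assert 0.9 < vr2(walk) < 1.1    # random walk: VR near 1
assert 0.4 < vr2(noise) < 0.6   # i.i.d. levels: VR near 0.5
```

The real test then asks whether the observed ratio's deviation from one is statistically significant.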
1304 Params:
1305 y (ArrayLike):
1306 The data to test for a random walk.
1307 lags (int):
1308 The number of periods to use in the multi-period variance, which is the numerator of the test statistic. Must be at least 2.<br>
1309 Defaults to `2`.
1310 trend (VALID_VR_TREND_OPTIONS, optional):
1311 `"c"` allows for a non-zero drift in the random walk, while `"n"` requires that the increments to `y` are mean `0`.<br>
1312 Defaults to `"c"`.
1313 debiased (bool, optional):
1314 Indicates whether to use a debiased version of the test. Only applicable if `overlap` is `True`.<br>
1315 Defaults to `True`.
1316 robust (bool, optional):
1317 Indicates whether to use heteroskedasticity robust inference.<br>
1318 Defaults to `True`.
1319 overlap (bool, optional):
1320 Indicates whether to use all overlapping blocks. If `False`, the number of observations in `y`, minus 1, must be an exact multiple of `lags`. If this condition is not satisfied, some values at the end of `y` will be discarded.<br>
1321 Defaults to `True`.
1323 Returns:
1324 (tuple[float, float, float]):
1325 Returns a tuple containing:
1326 - `stat` (float): The test statistic for a unit root.
1327 - `pvalue` (float): The p-value for the test statistic.
1328 - `vr` (float): The ratio of the long block lags-period variance.
1330 ???+ example "Examples"
1332 ```pycon {.py .python linenums="1" title="Setup"}
1333 >>> from ts_stat_tests.stationarity.algorithms import vr
1334 >>> from ts_stat_tests.utils.data import data_airline, data_normal, data_noise, data_sine
1335 >>> normal = data_normal
1336 >>> noise = data_noise
1337 >>> seasonal = data_sine
1338 >>> airline = data_airline.values
1340 ```
1342 ```pycon {.py .python linenums="1" title="Example 1: Stationary Series"}
1343 >>> stat, pvalue, variance_ratio = vr(y=normal)
1344 >>> print(f"VR statistic: {stat:.4f}")
1345 VR statistic: -12.8518
1346 >>> print(f"p-value: {pvalue:.4f}")
1347 p-value: 0.0000
1348 >>> print(f"Variance ratio: {variance_ratio:.4f}")
1349 Variance ratio: 0.5202
1351 ```
1353 ```pycon {.py .python linenums="1" title="Example 2: Noisy Series"}
1354 >>> stat, pvalue, variance_ratio = vr(y=noise)
1355 >>> print(f"VR statistic: {stat:.4f}")
1356 VR statistic: -11.5007
1357 >>> print(f"p-value: {pvalue:.4f}")
1358 p-value: 0.0000
1359 >>> print(f"Variance ratio: {variance_ratio:.4f}")
1360 Variance ratio: 0.5094
1362 ```
1364 ```pycon {.py .python linenums="1" title="Example 3: Seasonal Series"}
1365 >>> stat, pvalue, variance_ratio = vr(y=seasonal)
1366 >>> print(f"VR statistic: {stat:.4f}")
1367 VR statistic: 44.7019
1368 >>> print(f"p-value: {pvalue:.4f}")
1369 p-value: 0.0000
1370 >>> print(f"Variance ratio: {variance_ratio:.4f}")
1371 Variance ratio: 1.9980
1373 ```
1375 ```pycon {.py .python linenums="1" title="Example 4: Real-World Time Series"}
1376 >>> stat, pvalue, variance_ratio = vr(y=airline)
1377 >>> print(f"VR statistic: {stat:.4f}")
1378 VR statistic: 3.1511
1379 >>> print(f"p-value: {pvalue:.4f}")
1380 p-value: 0.0016
1381 >>> print(f"Variance ratio: {variance_ratio:.4f}")
1382 Variance ratio: 1.3163
1384 ```
1386 ??? equation "Calculation"
1388 The Variance Ratio (VR) test is a statistical test for stationarity in time series forecasting that is based on the idea that, for a random walk, the variance of the $k$-period increments grows linearly in $k$, so the ratio below equals one. The mathematical equation for the VR test is:
1390 $$
1391 VR(k) = \frac {\sigma^2(k)} {k\sigma^2(1)}
1392 $$
1394 where:
1396 - $VR(k)$ is the variance ratio for the time series over $k$ periods.
1397 - $\sigma^2(k)$ is the variance of the returns over $k$ periods.
1398 - $\sigma^2(1)$ is the variance of the returns over $1$ period.
1400 The VR test involves comparing the variance ratio to a critical value, which is derived from the null distribution of the variance ratio under the assumption of a random walk with drift.
1402 Here are the detailed steps for how to calculate the VR test:
1404 1. Collect your time series data and compute the log returns, which are defined as:
1406 $$
1407 r_t = \log(y_t) - \log(y_{t-1})
1408 $$
1410 where:
1412 - $y_t$ is the value of the time series at time $t$.
1414 1. Compute the variance of the $k$-period returns, $r_t(k) = \log(y_t) - \log(y_{t-k})$, which is defined as:
1416 $$
1417 \sigma^2(k) = \left( \frac {1} {n-k} \right) \times \sum_{t=k+1}^n (r_t(k) - \mu_k)^2
1418 $$
1420 where:
1422 - $n$ is the sample size.
1423 - $\mu_k$ is the mean of the $k$-period returns, which is defined as:
1425 $\mu_k = \left( \frac{1} {n-k} \right) \times \sum_{t=k+1}^n r_t(k)$
1427 1. Compute the variance of the returns over $1$ period, which is defined as:
1429 $$
1430 \sigma^2(1) = \left( \frac{1} {n-1} \right) \times \sum_{t=2}^n (r_t - \mu_1)^2
1431 $$
1433 where:
1435 - $\mu_1$ is the mean of the returns over $1$ period, which is defined as:
1437 $\mu_1 = \left( \frac{1} {n-1} \right) \times \sum_{t=2}^n r_t$
1439 1. Compute the variance ratio for each value of $k$, which is defined as:
1441 $$
1442 VR(k) = \frac {\sigma^2(k)} {k\sigma^2(1)}
1443 $$
1445 1. Determine the critical values of the variance ratio from the null distribution table of the VR test, which depend on the sample size, the level of significance, and the lag length $k$.
1447 1. Finally, compare the test statistic to the critical value. If the variance ratio differs significantly from one, then the null hypothesis of a random walk with drift is rejected: a ratio below one points to mean reversion, while a ratio above one points to positive serial correlation (trending behaviour). If the difference is not significant, then the null hypothesis cannot be rejected, and the series is treated as a random walk.
1449 In practice, the VR test is often conducted using software packages such as R, Python, or MATLAB, which automate the calculation of the variance ratio and the determination of the critical value.
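The steps above can be transcribed directly into plain Python. This sketch computes $VR(k)$ from log returns with no overlap or bias corrections; `vr_k` is a hypothetical helper for illustration, not the `vr` function in this module.

```python
import math
import random
from itertools import accumulate

def vr_k(y, k):
    """Variance ratio VR(k): variance of k-period log returns over
    k times the variance of 1-period log returns."""
    logs = [math.log(v) for v in y]

    def variance(x):
        m = sum(x) / len(x)
        return sum((v - m) ** 2 for v in x) / len(x)

    r1 = [logs[t] - logs[t - 1] for t in range(1, len(logs))]
    rk = [logs[t] - logs[t - k] for t in range(k, len(logs))]
    return variance(rk) / (k * variance(r1))

# A geometric random walk: its log is a pure random walk, so VR(k) is near 1.
random.seed(1)
steps = [random.gauss(0.0, 0.01) for _ in range(5000)]
prices = [math.exp(s) for s in accumulate(steps)]
assert 0.8 < vr_k(prices, 4) < 1.2
```

Production code should instead use `arch.unitroot.VarianceRatio` (wrapped by this module's `vr`), which adds the debiasing, overlap, and heteroskedasticity-robust corrections.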
1451 ??? note "Notes"
1452 The null hypothesis of a VR is that the process is a random walk, possibly plus drift. Rejection of the null with a positive test statistic indicates the presence of positive serial correlation in the time series.
1454 !!! success "Credit"
1455 - All credit goes to the [`arch`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html) library.
1457 ??? question "References"
1458 - Campbell, John Y., Lo, Andrew W. and MacKinlay, A. Craig. (1997) The Econometrics of Financial Markets. Princeton, NJ: Princeton University Press.
1460 ??? tip "See Also"
1461 - [`statsmodels.tsa.stattools.adfuller`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html): Augmented Dickey-Fuller unit root test.
1462 - [`statsmodels.tsa.stattools.kpss`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html): Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
1463 - [`statsmodels.tsa.stattools.range_unit_root_test`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.range_unit_root_test.html): Range Unit-Root test.
1464 - [`statsmodels.tsa.stattools.zivot_andrews`](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.zivot_andrews.html): Zivot-Andrews structural break test.
1465 - [`pmdarima.arima.PPTest`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.PPTest.html): Phillips-Perron unit root test.
1466 - [`arch.unitroot.DFGLS`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.DFGLS.html): Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller.
1467 - [`arch.unitroot.VarianceRatio`](https://arch.readthedocs.io/en/latest/unitroot/generated/arch.unitroot.VarianceRatio.html): Variance Ratio test of a random walk.
1468 - [`ts_stat_tests.stationarity.algorithms.adf`][ts_stat_tests.stationarity.algorithms.adf]: Augmented Dickey-Fuller unit root test.
1469 - [`ts_stat_tests.stationarity.algorithms.kpss`][ts_stat_tests.stationarity.algorithms.kpss]: Kwiatkowski-Phillips-Schmidt-Shin stationarity test.
1470 - [`ts_stat_tests.stationarity.algorithms.rur`][ts_stat_tests.stationarity.algorithms.rur]: Range Unit-Root test of stationarity.
1471 - [`ts_stat_tests.stationarity.algorithms.za`][ts_stat_tests.stationarity.algorithms.za]: Zivot-Andrews structural break unit root test.
1472 - [`ts_stat_tests.stationarity.algorithms.pp`][ts_stat_tests.stationarity.algorithms.pp]: Phillips-Perron unit root test.
1473 - [`ts_stat_tests.stationarity.algorithms.ers`][ts_stat_tests.stationarity.algorithms.ers]: Elliott, Rothenberg and Stock's GLS-detrended Dickey-Fuller test.
1474 - [`ts_stat_tests.stationarity.algorithms.vr`][ts_stat_tests.stationarity.algorithms.vr]: Variance Ratio test of a random walk.
1475 """
1476 res = _vr(
1477 y=np.asarray(y),
1478 lags=lags,
1479 trend=trend,
1480 debiased=debiased,
1481 robust=robust,
1482 overlap=overlap,
1483 )
1484 return float(res.stat), float(res.pvalue), float(res.vr)