Coverage for src / ts_stat_tests / heteroscedasticity / algorithms.py: 100%
28 statements
« prev ^ index » next coverage.py v7.13.2, created at 2026-02-01 09:48 +0000
« prev ^ index » next coverage.py v7.13.2, created at 2026-02-01 09:48 +0000
1# ============================================================================ #
2# #
3# Title: Heteroscedasticity Algorithms #
4# Purpose: Implementation of heteroscedasticity tests. #
5# #
6# ============================================================================ #
9# ---------------------------------------------------------------------------- #
10# #
11# Overview ####
12# #
13# ---------------------------------------------------------------------------- #
16# ---------------------------------------------------------------------------- #
17# Description ####
18# ---------------------------------------------------------------------------- #
21"""
22!!! note "Summary"
23 This module implements various heteroscedasticity tests including:
24 - ARCH Test
25 - Breusch-Pagan Test
26 - Goldfeld-Quandt Test
27 - White's Test
28"""
31# ---------------------------------------------------------------------------- #
32# #
33# Setup ####
34# #
35# ---------------------------------------------------------------------------- #
38# ---------------------------------------------------------------------------- #
39# Imports ####
40# ---------------------------------------------------------------------------- #
43# ## Python StdLib Imports ----
44from typing import (
45 Literal,
46 Optional,
47 Union,
48 cast,
49 overload,
50)
52# ## Python Third Party Imports ----
53from numpy.typing import ArrayLike
54from statsmodels.stats.diagnostic import (
55 ResultsStore,
56 het_arch,
57 het_breuschpagan,
58 het_goldfeldquandt,
59 het_white,
60)
61from typeguard import typechecked
64# ---------------------------------------------------------------------------- #
65# Exports ####
66# ---------------------------------------------------------------------------- #
69__all__: list[str] = ["arch", "bpl", "gq", "wlm"]
72## --------------------------------------------------------------------------- #
73## Constants ####
74## --------------------------------------------------------------------------- #
77VALID_GQ_ALTERNATIVES_OPTIONS = Literal["two-sided", "increasing", "decreasing"]
80# ---------------------------------------------------------------------------- #
81# #
82# Algorithms ####
83# #
84# ---------------------------------------------------------------------------- #
87@overload
88def arch(
89 resid: ArrayLike, nlags: Optional[int] = None, ddof: int = 0, *, store: Literal[False] = False
90) -> tuple[float, float, float, float]: ...
91@overload
92def arch(
93 resid: ArrayLike, nlags: Optional[int] = None, ddof: int = 0, *, store: Literal[True]
94) -> tuple[float, float, float, float, ResultsStore]: ...
95@typechecked
96def arch(resid: ArrayLike, nlags: Optional[int] = None, ddof: int = 0, *, store: bool = False) -> Union[
97 tuple[float, float, float, float],
98 tuple[float, float, float, float, ResultsStore],
99]:
100 r"""
101 !!! note "Summary"
102 Engle's Test for Autoregressive Conditional Heteroscedasticity (ARCH).
104 ???+ abstract "Details"
105 This test is used to determine whether the residuals of a time-series model exhibit ARCH effects. ARCH effects are characterized by clusters of volatility, where periods of high volatility are followed by periods of high volatility, and vice versa. The test is essentially a Lagrange Multiplier (LM) test for autocorrelation in the squared residuals.
107 Params:
108 resid (ArrayLike):
109 The residuals from a linear regression model.
110 nlags (Optional[int]):
111 The number of lags to include in the test regression. If `None`, the number of lags is determined based on the number of observations.
112 Default: `None`
113 ddof (int):
114 Degrees of freedom to adjust for in the calculation of the F-statistic.
115 Default: `0`
116 store (bool):
117 Whether to return a `ResultsStore` object containing additional test results.
118 Default: `False`
120 Returns:
121 (Union[tuple[float, float, float, float], tuple[float, float, float, float, ResultsStore]]):
122 A tuple containing:
123 - `lmstat` (float): The Lagrange Multiplier statistic.
124 - `lmpval` (float): The p-value for the LM statistic.
125 - `fstat` (float): The F-statistic.
126 - `fpval` (float): The p-value for the F-statistic.
127 - `resstore` (ResultsStore, optional): Returned only if `store` is `True`.
129 ???+ example "Examples"
131 ```pycon {.py .python linenums="1" title="Setup"}
132 >>> import statsmodels.api as sm
133 >>> from ts_stat_tests.heteroscedasticity.algorithms import arch
134 >>> from ts_stat_tests.utils.data import data_line, data_random
135 >>> X = sm.add_constant(data_line)
136 >>> y = 2 * data_line + data_random
137 >>> res = sm.OLS(y, X).fit()
138 >>> resid = res.resid
140 ```
142 ```pycon {.py .python linenums="1" title="Example 1: Basic ARCH test"}
143 >>> lm, lmp, f, fp = arch(resid)
144 >>> print(f"LM p-value: {lmp:.4f}")
145 LM p-value: 0.9124
147 ```
149 ??? equation "Calculation"
150 The test is performed by regressing the squared residuals $e_t^2$ on a constant and $q$ lags of the squared residuals:
152 $$
153 e_t^2 = \gamma_0 + \gamma_1 e_{t-1}^2 + \gamma_2 e_{t-2}^2 + \dots + \gamma_q e_{t-q}^2 + \nu_t
154 $$
156 The null hypothesis of no ARCH effects is:
158 $$
159 H_0: \gamma_1 = \gamma_2 = \dots = \gamma_q = 0
160 $$
162 The LM statistic is calculated as $T \times R^2$ from this regression, where $T$ is the number of observations and $R^2$ is the coefficient of determination.
164 ??? success "Credit"
165 Calculations are performed by `statsmodels`.
167 ??? question "References"
168 - Engle, R. F. (1982). Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica, 50(4), 987-1007.
169 """
170 if store:
171 res_5 = cast(
172 tuple[float, float, float, float, ResultsStore],
173 het_arch(resid=resid, nlags=nlags, store=True, ddof=ddof),
174 )
175 return (
176 float(res_5[0]),
177 float(res_5[1]),
178 float(res_5[2]),
179 float(res_5[3]),
180 res_5[4],
181 )
183 res_4 = cast(
184 tuple[float, float, float, float],
185 het_arch(resid=resid, nlags=nlags, store=False, ddof=ddof),
186 )
187 return (float(res_4[0]), float(res_4[1]), float(res_4[2]), float(res_4[3]))
190@typechecked
191def bpl(resid: ArrayLike, exog_het: ArrayLike, robust: bool = True) -> tuple[float, float, float, float]:
192 r"""
193 !!! note "Summary"
194 Breusch-Pagan Lagrange Multiplier Test for Heteroscedasticity.
196 ???+ abstract "Details"
197 This test checks whether the variance of the errors in a regression model depends on the values of the independent variables. If it does, the errors are heteroscedastic. The null hypothesis assumes homoscedasticity (constant variance).
199 Params:
200 resid (ArrayLike):
201 The residuals from a linear regression model.
202 exog_het (ArrayLike):
203 The explanatory variables for the variance (heteroscedasticity). Usually, these are the same as the original regression's exogenous variables.
204 robust (bool):
205 Whether to use a robust version of the test that does not assume the errors are normally distributed (Koenker's version).
206 Default: `True`
208 Returns:
209 (tuple[float, float, float, float]):
210 A tuple containing:
211 - `lmstat` (float): The Lagrange Multiplier statistic.
212 - `lmpval` (float): The p-value for the LM statistic.
213 - `fstat` (float): The F-statistic.
214 - `fpval` (float): The p-value for the F-statistic.
216 ???+ example "Examples"
218 ```pycon {.py .python linenums="1" title="Setup"}
219 >>> import statsmodels.api as sm
220 >>> from ts_stat_tests.heteroscedasticity.algorithms import bpl
221 >>> from ts_stat_tests.utils.data import data_line, data_random
222 >>> X = sm.add_constant(data_line)
223 >>> y = 2 * data_line + data_random
224 >>> res = sm.OLS(y, X).fit()
225 >>> resid, exog = res.resid, X
227 ```
229 ```pycon {.py .python linenums="1" title="Example 1: Basic Breusch-Pagan test"}
230 >>> lm, lmp, f, fp = bpl(resid, exog)
231 >>> print(f"LM p-value: {lmp:.4f}")
232 LM p-value: 0.2461
234 ```
236 ??? equation "Calculation"
237 The test first fits a regression of squared residuals (or standardized version) on the specified exogenous variables:
239 $$
240 e_t^2 = \delta_0 + \delta_1 z_{t1} + \dots + \delta_k z_{tk} + u_t
241 $$
243 The null hypothesis is:
245 $$
246 H_0: \delta_1 = \dots = \delta_k = 0
247 $$
249 Koenker's robust version uses the scores of the likelihood function and does not require the normality assumption.
251 ??? success "Credit"
252 Calculations are performed by `statsmodels`.
254 ??? question "References"
255 - Breusch, T. S., & Pagan, A. R. (1979). A Simple Test for Heteroscedasticity and Random Coefficient Variation. Econometrica, 47(5), 1287-1294.
256 - Koenker, R. (1981). A Note on Studentizing a Test for Heteroscedasticity. Journal of Econometrics, 17(1), 107-112.
257 """
258 res = het_breuschpagan(resid=resid, exog_het=exog_het, robust=robust)
259 return (float(res[0]), float(res[1]), float(res[2]), float(res[3]))
262@overload
263def gq(
264 y: ArrayLike,
265 x: ArrayLike,
266 idx: Optional[int] = None,
267 split: Optional[Union[int, float]] = None,
268 drop: Optional[Union[int, float]] = None,
269 alternative: VALID_GQ_ALTERNATIVES_OPTIONS = "increasing",
270 *,
271 store: Literal[False] = False,
272) -> tuple[float, float, str]: ...
273@overload
274def gq(
275 y: ArrayLike,
276 x: ArrayLike,
277 idx: Optional[int] = None,
278 split: Optional[Union[int, float]] = None,
279 drop: Optional[Union[int, float]] = None,
280 alternative: VALID_GQ_ALTERNATIVES_OPTIONS = "increasing",
281 *,
282 store: Literal[True],
283) -> tuple[float, float, str, ResultsStore]: ...
284@typechecked
285def gq(
286 y: ArrayLike,
287 x: ArrayLike,
288 idx: Optional[int] = None,
289 split: Optional[Union[int, float]] = None,
290 drop: Optional[Union[int, float]] = None,
291 alternative: VALID_GQ_ALTERNATIVES_OPTIONS = "increasing",
292 *,
293 store: bool = False,
294) -> Union[
295 tuple[float, float, str],
296 tuple[float, float, str, ResultsStore],
297]:
298 r"""
299 !!! note "Summary"
300 Goldfeld-Quandt Test for Heteroscedasticity.
302 ???+ abstract "Details"
303 The Goldfeld-Quandt test checks for heteroscedasticity by dividing the dataset into two subsets (usually at the beginning and end of the sample) and comparing the variance of the residuals in each subset using an F-test.
305 Params:
306 y (ArrayLike):
307 The dependent variable (endogenous).
308 x (ArrayLike):
309 The independent variables (exogenous).
310 idx (Optional[int]):
311 The column index of the variable to sort by. If `None`, the data is assumed to be ordered.
312 Default: `None`
313 split (Optional[Union[int, float]]):
314 The index at which to split the sample. If a float between 0 and 1, it represents the fraction of observations.
315 Default: `None`
316 drop (Optional[Union[int, float]]):
317 The number of observations to drop in the middle. If a float between 0 and 1, it represents the fraction of observations.
318 Default: `None`
319 alternative (VALID_GQ_ALTERNATIVES_OPTIONS):
320 The alternative hypothesis. Options are `"increasing"`, `"decreasing"`, or `"two-sided"`.
321 Default: `"increasing"`
322 store (bool):
323 Whether to return a `ResultsStore` object.
324 Default: `False`
326 Returns:
327 (Union[tuple[float, float, str], tuple[float, float, str, ResultsStore]]):
328 A tuple containing:
329 - `fstat` (float): The F-statistic.
330 - `fpval` (float): The p-value for the F-statistic.
331 - `alternative` (str): The alternative hypothesis used.
332 - `resstore` (ResultsStore, optional): Returned only if `store` is `True`.
334 ???+ example "Examples"
336 ```pycon {.py .python linenums="1" title="Setup"}
337 >>> import statsmodels.api as sm
338 >>> from ts_stat_tests.utils.data import data_line, data_random
339 >>> from ts_stat_tests.heteroscedasticity.algorithms import gq
340 >>> X = sm.add_constant(data_line)
341 >>> y = 2 * data_line + data_random
343 ```
345 ```pycon {.py .python linenums="1" title="Example 1: Basic Goldfeld-Quandt test"}
346 >>> f, p, alt = gq(y, X)
347 >>> print(f"F p-value: {p:.4f}")
348 F p-value: 0.2269
350 ```
352 ??? equation "Calculation"
353 The dataset is split into two samples after sorting by an independent variable (or using the natural order). Separate regressions are run on each sample:
355 $$
356 RSS_1 = \sum e_{1,t}^2, \quad RSS_2 = \sum e_{2,t}^2
357 $$
359 The test statistic is the ratio of variances:
361 $$
362 F = \frac{RSS_2 / df_2}{RSS_1 / df_1}
363 $$
365 where $RSS_i$ are the residual sum of squares and $df_i$ are the degrees of freedom.
367 ??? success "Credit"
368 Calculations are performed by `statsmodels`.
370 ??? question "References"
371 - Goldfeld, S. M., & Quandt, R. E. (1965). Some Tests for Homoscedasticity. Journal of the American Statistical Association, 60(310), 539-547.
372 """
373 if store:
374 res_4 = cast(
375 tuple[float, float, str, ResultsStore],
376 het_goldfeldquandt(
377 y=y,
378 x=x,
379 idx=idx,
380 split=split,
381 drop=drop,
382 alternative=alternative,
383 store=True,
384 ),
385 )
386 return (float(res_4[0]), float(res_4[1]), str(res_4[2]), res_4[3])
388 res_3 = cast(
389 tuple[float, float, str],
390 het_goldfeldquandt(
391 y=y,
392 x=x,
393 idx=idx,
394 split=split,
395 drop=drop,
396 alternative=alternative,
397 store=False,
398 ),
399 )
400 return (float(res_3[0]), float(res_3[1]), str(res_3[2]))
403@typechecked
404def wlm(resid: ArrayLike, exog_het: ArrayLike) -> tuple[float, float, float, float]:
405 r"""
406 !!! note "Summary"
407 White's Test for Heteroscedasticity.
409 ???+ abstract "Details"
410 White's test is a general test for heteroscedasticity that does not require a specific functional form for the variance of the error terms. It is essentially a test of whether the squared residuals can be explained by the levels, squares, and cross-products of the independent variables.
412 Params:
413 resid (ArrayLike):
414 The residuals from a linear regression model.
415 exog_het (ArrayLike):
416 The explanatory variables for the variance. Usually, these are the original exogenous variables; the test internally handles adding their squares and cross-products.
418 Returns:
419 (tuple[float, float, float, float]):
420 A tuple containing:
421 - `lmstat` (float): The Lagrange Multiplier statistic.
422 - `lmpval` (float): The p-value for the LM statistic.
423 - `fstat` (float): The F-statistic.
424 - `fpval` (float): The p-value for the F-statistic.
426 ???+ example "Examples"
428 ```pycon {.py .python linenums="1" title="Setup"}
429 >>> import statsmodels.api as sm
430 >>> from ts_stat_tests.heteroscedasticity.algorithms import wlm
431 >>> from ts_stat_tests.utils.data import data_line, data_random
432 >>> X = sm.add_constant(data_line)
433 >>> y = 2 * data_line + data_random
434 >>> res = sm.OLS(y, X).fit()
435 >>> resid, exog = res.resid, X
437 ```
439 ```pycon {.py .python linenums="1" title="Example 1: Basic White's test"}
440 >>> lm, lmp, f, fp = wlm(resid, exog)
441 >>> print(f"White p-value: {lmp:.4f}")
442 White p-value: 0.4558
444 ```
446 ??? equation "Calculation"
447 Squared residuals are regressed on all distinct variables in the cross-product of the original exogenous variables (including constant, linear terms, squares, and interactions):
449 $$
450 e_t^2 = \delta_0 + \sum \delta_i z_{it} + \sum \delta_{ij} z_{it} z_{jt} + u_t
451 $$
453 The LM statistic is $T \times R^2$ from this auxiliary regression, where $T$ is the number of observations.
455 ??? success "Credit"
456 Calculations are performed by `statsmodels`.
458 ??? question "References"
459 - White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica, 48(4), 817-838.
460 """
461 res = het_white(resid=resid, exog=exog_het)
462 return (float(res[0]), float(res[1]), float(res[2]), float(res[3]))