Coverage for src/ts_stat_tests/heteroscedasticity/algorithms.py: 100%

28 statements  

coverage.py v7.13.2, created at 2026-02-01 09:48 +0000

1# ============================================================================ # 

2# # 

3# Title: Heteroscedasticity Algorithms # 

4# Purpose: Implementation of heteroscedasticity tests. # 

5# # 

6# ============================================================================ # 

7 

8 

9# ---------------------------------------------------------------------------- # 

10# # 

11# Overview #### 

12# # 

13# ---------------------------------------------------------------------------- # 

14 

15 

16# ---------------------------------------------------------------------------- # 

17# Description #### 

18# ---------------------------------------------------------------------------- # 

19 

20 

21""" 

22!!! note "Summary" 

23 This module implements various heteroscedasticity tests including: 

24 - ARCH Test 

25 - Breusch-Pagan Test 

26 - Goldfeld-Quandt Test 

27 - White's Test 

28""" 

29 

30 

31# ---------------------------------------------------------------------------- # 

32# # 

33# Setup #### 

34# # 

35# ---------------------------------------------------------------------------- # 

36 

37 

38# ---------------------------------------------------------------------------- # 

39# Imports #### 

40# ---------------------------------------------------------------------------- # 

41 

42 

43# ## Python StdLib Imports ---- 

44from typing import ( 

45 Literal, 

46 Optional, 

47 Union, 

48 cast, 

49 overload, 

50) 

51 

52# ## Python Third Party Imports ---- 

53from numpy.typing import ArrayLike 

54from statsmodels.stats.diagnostic import ( 

55 ResultsStore, 

56 het_arch, 

57 het_breuschpagan, 

58 het_goldfeldquandt, 

59 het_white, 

60) 

61from typeguard import typechecked 

62 

63 

64# ---------------------------------------------------------------------------- # 

65# Exports #### 

66# ---------------------------------------------------------------------------- # 

67 

68 

69__all__: list[str] = ["arch", "bpl", "gq", "wlm"] 

70 

71 

72## --------------------------------------------------------------------------- # 

73## Constants #### 

74## --------------------------------------------------------------------------- # 

75 

76 

77VALID_GQ_ALTERNATIVES_OPTIONS = Literal["two-sided", "increasing", "decreasing"] 

78 

79 

80# ---------------------------------------------------------------------------- # 

81# # 

82# Algorithms #### 

83# # 

84# ---------------------------------------------------------------------------- # 

85 

86 

87@overload 

88def arch( 

89 resid: ArrayLike, nlags: Optional[int] = None, ddof: int = 0, *, store: Literal[False] = False 

90) -> tuple[float, float, float, float]: ... 

91@overload 

92def arch( 

93 resid: ArrayLike, nlags: Optional[int] = None, ddof: int = 0, *, store: Literal[True] 

94) -> tuple[float, float, float, float, ResultsStore]: ... 

95@typechecked 

96def arch(resid: ArrayLike, nlags: Optional[int] = None, ddof: int = 0, *, store: bool = False) -> Union[ 

97 tuple[float, float, float, float], 

98 tuple[float, float, float, float, ResultsStore], 

99]: 

100 r""" 

101 !!! note "Summary" 

102 Engle's Test for Autoregressive Conditional Heteroscedasticity (ARCH). 

103 

104 ???+ abstract "Details" 

105 This test is used to determine whether the residuals of a time-series model exhibit ARCH effects. ARCH effects are characterized by volatility clustering: periods of high volatility tend to be followed by periods of high volatility, and periods of low volatility by periods of low volatility. The test is essentially a Lagrange Multiplier (LM) test for autocorrelation in the squared residuals. 

106 

107 Params: 

108 resid (ArrayLike): 

109 The residuals from a fitted model, for example a linear regression or another time-series model. 

110 nlags (Optional[int]): 

111 The number of lags to include in the test regression. If `None`, the number of lags is determined based on the number of observations. 

112 Default: `None` 

113 ddof (int): 

114 Degrees of freedom to adjust for in the calculation of the F-statistic. 

115 Default: `0` 

116 store (bool): 

117 Whether to return a `ResultsStore` object containing additional test results. 

118 Default: `False` 

119 

120 Returns: 

121 (Union[tuple[float, float, float, float], tuple[float, float, float, float, ResultsStore]]): 

122 A tuple containing: 

123 - `lmstat` (float): The Lagrange Multiplier statistic. 

124 - `lmpval` (float): The p-value for the LM statistic. 

125 - `fstat` (float): The F-statistic. 

126 - `fpval` (float): The p-value for the F-statistic. 

127 - `resstore` (ResultsStore, optional): Returned only if `store` is `True`. 

128 

129 ???+ example "Examples" 

130 

131 ```pycon {.py .python linenums="1" title="Setup"} 

132 >>> import statsmodels.api as sm 

133 >>> from ts_stat_tests.heteroscedasticity.algorithms import arch 

134 >>> from ts_stat_tests.utils.data import data_line, data_random 

135 >>> X = sm.add_constant(data_line) 

136 >>> y = 2 * data_line + data_random 

137 >>> res = sm.OLS(y, X).fit() 

138 >>> resid = res.resid 

139 

140 ``` 

141 

142 ```pycon {.py .python linenums="1" title="Example 1: Basic ARCH test"} 

143 >>> lm, lmp, f, fp = arch(resid) 

144 >>> print(f"LM p-value: {lmp:.4f}") 

145 LM p-value: 0.9124 

146 

147 ``` 

148 

149 ??? equation "Calculation" 

150 The test is performed by regressing the squared residuals $e_t^2$ on a constant and $q$ lags of the squared residuals: 

151 

152 $$ 

153 e_t^2 = \gamma_0 + \gamma_1 e_{t-1}^2 + \gamma_2 e_{t-2}^2 + \dots + \gamma_q e_{t-q}^2 + \nu_t 

154 $$ 

155 

156 The null hypothesis of no ARCH effects is: 

157 

158 $$ 

159 H_0: \gamma_1 = \gamma_2 = \dots = \gamma_q = 0 

160 $$ 

161 

162 The LM statistic is calculated as $T \times R^2$ from this regression, where $T$ is the number of observations and $R^2$ is the coefficient of determination. Under $H_0$, the LM statistic is asymptotically $\chi^2$-distributed with $q$ degrees of freedom. 
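For a single lag ($q = 1$), the auxiliary regression is a simple regression, so its $R^2$ equals the squared correlation between $e_t^2$ and $e_{t-1}^2$. The following is a minimal pure-Python sketch of that special case, not the `statsmodels` implementation. The helper names `simple_r2` and `arch_lm_one_lag` are hypothetical, and the sketch assumes the observation count is the number of rows in the auxiliary regression.

```python
import math
import random


def simple_r2(x, y):
    # R^2 of an OLS regression of y on a constant and x,
    # which equals the squared sample correlation of x and y.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def arch_lm_one_lag(resid):
    # Regress e_t^2 on a constant and e_{t-1}^2, then return n * R^2.
    # Assumption: n counts the rows of the auxiliary regression.
    squared = [e * e for e in resid]
    lagged, current = squared[:-1], squared[1:]
    return len(current) * simple_r2(lagged, current)


random.seed(0)
resid = [random.gauss(0.0, 1.0) for _ in range(200)]  # simulated residuals

lm_stat = arch_lm_one_lag(resid)
# Survival function of a chi-square variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(lm_stat / 2.0))
print(f"LM statistic: {lm_stat:.4f}, p-value: {p_value:.4f}")
```

With more than one lag the auxiliary regression needs a multiple-regression solver, which is what `het_arch` handles.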

163 

164 ??? success "Credit" 

165 Calculations are performed by `statsmodels`. 

166 

167 ??? question "References" 

168 - Engle, R. F. (1982). Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica, 50(4), 987-1007. 

169 """ 

170 if store: 

171 res_5 = cast( 

172 tuple[float, float, float, float, ResultsStore], 

173 het_arch(resid=resid, nlags=nlags, store=True, ddof=ddof), 

174 ) 

175 return ( 

176 float(res_5[0]), 

177 float(res_5[1]), 

178 float(res_5[2]), 

179 float(res_5[3]), 

180 res_5[4], 

181 ) 

182 

183 res_4 = cast( 

184 tuple[float, float, float, float], 

185 het_arch(resid=resid, nlags=nlags, store=False, ddof=ddof), 

186 ) 

187 return (float(res_4[0]), float(res_4[1]), float(res_4[2]), float(res_4[3])) 

188 

189 

190@typechecked 

191def bpl(resid: ArrayLike, exog_het: ArrayLike, robust: bool = True) -> tuple[float, float, float, float]: 

192 r""" 

193 !!! note "Summary" 

194 Breusch-Pagan Lagrange Multiplier Test for Heteroscedasticity. 

195 

196 ???+ abstract "Details" 

197 This test checks whether the variance of the errors in a regression model depends on the values of the independent variables. If it does, the errors are heteroscedastic. The null hypothesis assumes homoscedasticity (constant variance). 

198 

199 Params: 

200 resid (ArrayLike): 

201 The residuals from a linear regression model. 

202 exog_het (ArrayLike): 

203 The explanatory variables for the variance (heteroscedasticity). Usually, these are the same as the original regression's exogenous variables. 

204 robust (bool): 

205 Whether to use a robust version of the test that does not assume the errors are normally distributed (Koenker's version). 

206 Default: `True` 

207 

208 Returns: 

209 (tuple[float, float, float, float]): 

210 A tuple containing: 

211 - `lmstat` (float): The Lagrange Multiplier statistic. 

212 - `lmpval` (float): The p-value for the LM statistic. 

213 - `fstat` (float): The F-statistic. 

214 - `fpval` (float): The p-value for the F-statistic. 

215 

216 ???+ example "Examples" 

217 

218 ```pycon {.py .python linenums="1" title="Setup"} 

219 >>> import statsmodels.api as sm 

220 >>> from ts_stat_tests.heteroscedasticity.algorithms import bpl 

221 >>> from ts_stat_tests.utils.data import data_line, data_random 

222 >>> X = sm.add_constant(data_line) 

223 >>> y = 2 * data_line + data_random 

224 >>> res = sm.OLS(y, X).fit() 

225 >>> resid, exog = res.resid, X 

226 

227 ``` 

228 

229 ```pycon {.py .python linenums="1" title="Example 1: Basic Breusch-Pagan test"} 

230 >>> lm, lmp, f, fp = bpl(resid, exog) 

231 >>> print(f"LM p-value: {lmp:.4f}") 

232 LM p-value: 0.2461 

233 

234 ``` 

235 

236 ??? equation "Calculation" 

237 The test first fits a regression of the squared residuals (or a standardized version of them) on the specified explanatory variables: 

238 

239 $$ 

240 e_t^2 = \delta_0 + \delta_1 z_{t1} + \dots + \delta_k z_{tk} + u_t 

241 $$ 

242 

243 The null hypothesis is: 

244 

245 $$ 

246 H_0: \delta_1 = \dots = \delta_k = 0 

247 $$ 

248 

249 Koenker's robust version studentizes the test statistic, so it does not require the errors to be normally distributed. 
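As a hedged sketch of how the two variants differ, the textbook forms of the statistics can be written as follows. The symbols $n$, $R^2$, $ESS$ and $\hat{\sigma}^2$ are introduced here for illustration and are not taken from the module; details of the `statsmodels` computation may differ.

$$
LM_{\text{Koenker}} = n R^2, \qquad LM_{\text{BP}} = \tfrac{1}{2} ESS
$$

Here $R^2$ is the coefficient of determination from the regression of $e_t^2$ on the $z$ variables, and $ESS$ is the explained sum of squares from regressing $e_t^2 / \hat{\sigma}^2$ on the same variables, with $\hat{\sigma}^2 = \frac{1}{n} \sum e_t^2$. Under $H_0$, both statistics are asymptotically $\chi^2$-distributed with $k$ degrees of freedom.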

250 

251 ??? success "Credit" 

252 Calculations are performed by `statsmodels`. 

253 

254 ??? question "References" 

255 - Breusch, T. S., & Pagan, A. R. (1979). A Simple Test for Heteroscedasticity and Random Coefficient Variation. Econometrica, 47(5), 1287-1294. 

256 - Koenker, R. (1981). A Note on Studentizing a Test for Heteroscedasticity. Journal of Econometrics, 17(1), 107-112. 

257 """ 

258 res = het_breuschpagan(resid=resid, exog_het=exog_het, robust=robust) 

259 return (float(res[0]), float(res[1]), float(res[2]), float(res[3])) 

260 

261 

262@overload 

263def gq( 

264 y: ArrayLike, 

265 x: ArrayLike, 

266 idx: Optional[int] = None, 

267 split: Optional[Union[int, float]] = None, 

268 drop: Optional[Union[int, float]] = None, 

269 alternative: VALID_GQ_ALTERNATIVES_OPTIONS = "increasing", 

270 *, 

271 store: Literal[False] = False, 

272) -> tuple[float, float, str]: ... 

273@overload 

274def gq( 

275 y: ArrayLike, 

276 x: ArrayLike, 

277 idx: Optional[int] = None, 

278 split: Optional[Union[int, float]] = None, 

279 drop: Optional[Union[int, float]] = None, 

280 alternative: VALID_GQ_ALTERNATIVES_OPTIONS = "increasing", 

281 *, 

282 store: Literal[True], 

283) -> tuple[float, float, str, ResultsStore]: ... 

284@typechecked 

285def gq( 

286 y: ArrayLike, 

287 x: ArrayLike, 

288 idx: Optional[int] = None, 

289 split: Optional[Union[int, float]] = None, 

290 drop: Optional[Union[int, float]] = None, 

291 alternative: VALID_GQ_ALTERNATIVES_OPTIONS = "increasing", 

292 *, 

293 store: bool = False, 

294) -> Union[ 

295 tuple[float, float, str], 

296 tuple[float, float, str, ResultsStore], 

297]: 

298 r""" 

299 !!! note "Summary" 

300 Goldfeld-Quandt Test for Heteroscedasticity. 

301 

302 ???+ abstract "Details" 

303 The Goldfeld-Quandt test checks for heteroscedasticity by dividing the dataset into two subsets (usually at the beginning and end of the sample) and comparing the variance of the residuals in each subset using an F-test. 

304 

305 Params: 

306 y (ArrayLike): 

307 The dependent variable (endogenous). 

308 x (ArrayLike): 

309 The independent variables (exogenous). 

310 idx (Optional[int]): 

311 The column index of the variable to sort by. If `None`, the data is assumed to be ordered. 

312 Default: `None` 

313 split (Optional[Union[int, float]]): 

314 If an integer, the index at which to split the sample. If a float between 0 and 1, the fraction of observations placed in the first sample. 

315 Default: `None` 

316 drop (Optional[Union[int, float]]): 

317 The number of observations to drop in the middle. If a float between 0 and 1, it represents the fraction of observations. 

318 Default: `None` 

319 alternative (VALID_GQ_ALTERNATIVES_OPTIONS): 

320 The alternative hypothesis. Options are `"increasing"`, `"decreasing"`, or `"two-sided"`. 

321 Default: `"increasing"` 

322 store (bool): 

323 Whether to return a `ResultsStore` object. 

324 Default: `False` 

325 

326 Returns: 

327 (Union[tuple[float, float, str], tuple[float, float, str, ResultsStore]]): 

328 A tuple containing: 

329 - `fstat` (float): The F-statistic. 

330 - `fpval` (float): The p-value for the F-statistic. 

331 - `alternative` (str): The alternative hypothesis used. 

332 - `resstore` (ResultsStore, optional): Returned only if `store` is `True`. 

333 

334 ???+ example "Examples" 

335 

336 ```pycon {.py .python linenums="1" title="Setup"} 

337 >>> import statsmodels.api as sm 

338 >>> from ts_stat_tests.utils.data import data_line, data_random 

339 >>> from ts_stat_tests.heteroscedasticity.algorithms import gq 

340 >>> X = sm.add_constant(data_line) 

341 >>> y = 2 * data_line + data_random 

342 

343 ``` 

344 

345 ```pycon {.py .python linenums="1" title="Example 1: Basic Goldfeld-Quandt test"} 

346 >>> f, p, alt = gq(y, X) 

347 >>> print(f"F p-value: {p:.4f}") 

348 F p-value: 0.2269 

349 

350 ``` 

351 

352 ??? equation "Calculation" 

353 The dataset is split into two samples after sorting by an independent variable (or using the natural order). A separate regression is run on each sample, giving the residual sums of squares: 

354 

355 $$ 

356 RSS_1 = \sum e_{1,t}^2, \quad RSS_2 = \sum e_{2,t}^2 

357 $$ 

358 

359 The test statistic is the ratio of the two estimated residual variances: 

360 

361 $$ 

362 F = \frac{RSS_2 / df_2}{RSS_1 / df_1} 

363 $$ 

364 

365 where $RSS_i$ is the residual sum of squares and $df_i$ is the residual degrees of freedom of subsample $i$. 
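The ratio above can be sketched in pure Python for a model with a constant and one regressor. This is an illustrative sketch rather than the `statsmodels` routine: the helper names `ols_rss` and `gq_f_stat` are hypothetical, the data are assumed to be already ordered, no middle observations are dropped, and each subsample loses two degrees of freedom (intercept and slope).

```python
import random


def ols_rss(x, y):
    # Residual sum of squares from regressing y on a constant and x.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


def gq_f_stat(x, y, split):
    # F = (RSS_2 / df_2) / (RSS_1 / df_1), one regression per subsample.
    rss_1 = ols_rss(x[:split], y[:split])
    rss_2 = ols_rss(x[split:], y[split:])
    df_1 = split - 2            # two estimated parameters per regression
    df_2 = len(x) - split - 2
    return (rss_2 / df_2) / (rss_1 / df_1)


random.seed(1)
x = [float(i) for i in range(100)]
# Simulated data whose noise standard deviation grows with x.
y = [2.0 * xi + random.gauss(0.0, 1.0 + 0.1 * xi) for xi in x]

f_stat = gq_f_stat(x, y, split=50)
print(f"F statistic: {f_stat:.4f}")
```

Because the noise variance increases along the sample, the second subsample has the larger residual variance, which is the pattern the `"increasing"` alternative is meant to detect.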

366 

367 ??? success "Credit" 

368 Calculations are performed by `statsmodels`. 

369 

370 ??? question "References" 

371 - Goldfeld, S. M., & Quandt, R. E. (1965). Some Tests for Homoscedasticity. Journal of the American Statistical Association, 60(310), 539-547. 

372 """ 

373 if store: 

374 res_4 = cast( 

375 tuple[float, float, str, ResultsStore], 

376 het_goldfeldquandt( 

377 y=y, 

378 x=x, 

379 idx=idx, 

380 split=split, 

381 drop=drop, 

382 alternative=alternative, 

383 store=True, 

384 ), 

385 ) 

386 return (float(res_4[0]), float(res_4[1]), str(res_4[2]), res_4[3]) 

387 

388 res_3 = cast( 

389 tuple[float, float, str], 

390 het_goldfeldquandt( 

391 y=y, 

392 x=x, 

393 idx=idx, 

394 split=split, 

395 drop=drop, 

396 alternative=alternative, 

397 store=False, 

398 ), 

399 ) 

400 return (float(res_3[0]), float(res_3[1]), str(res_3[2])) 

401 

402 

403@typechecked 

404def wlm(resid: ArrayLike, exog_het: ArrayLike) -> tuple[float, float, float, float]: 

405 r""" 

406 !!! note "Summary" 

407 White's Test for Heteroscedasticity. 

408 

409 ???+ abstract "Details" 

410 White's test is a general test for heteroscedasticity that does not require a specific functional form for the variance of the error terms. It is essentially a test of whether the squared residuals can be explained by the levels, squares, and cross-products of the independent variables. 

411 

412 Params: 

413 resid (ArrayLike): 

414 The residuals from a linear regression model. 

415 exog_het (ArrayLike): 

416 The explanatory variables for the variance. Usually, these are the original exogenous variables; the test internally handles adding their squares and cross-products. 

417 

418 Returns: 

419 (tuple[float, float, float, float]): 

420 A tuple containing: 

421 - `lmstat` (float): The Lagrange Multiplier statistic. 

422 - `lmpval` (float): The p-value for the LM statistic. 

423 - `fstat` (float): The F-statistic. 

424 - `fpval` (float): The p-value for the F-statistic. 

425 

426 ???+ example "Examples" 

427 

428 ```pycon {.py .python linenums="1" title="Setup"} 

429 >>> import statsmodels.api as sm 

430 >>> from ts_stat_tests.heteroscedasticity.algorithms import wlm 

431 >>> from ts_stat_tests.utils.data import data_line, data_random 

432 >>> X = sm.add_constant(data_line) 

433 >>> y = 2 * data_line + data_random 

434 >>> res = sm.OLS(y, X).fit() 

435 >>> resid, exog = res.resid, X 

436 

437 ``` 

438 

439 ```pycon {.py .python linenums="1" title="Example 1: Basic White's test"} 

440 >>> lm, lmp, f, fp = wlm(resid, exog) 

441 >>> print(f"White p-value: {lmp:.4f}") 

442 White p-value: 0.4558 

443 

444 ``` 

445 

446 ??? equation "Calculation" 

447 Squared residuals are regressed on all distinct variables in the cross-product of the original exogenous variables (including constant, linear terms, squares, and interactions): 

448 

449 $$ 

450 e_t^2 = \delta_0 + \sum \delta_i z_{it} + \sum \delta_{ij} z_{it} z_{jt} + u_t 

451 $$ 

452 

453 The LM statistic is $T \times R^2$ from this auxiliary regression, where $T$ is the number of observations. 
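As a worked special case, stated under the assumption that the original regression has a constant and a single regressor $x_t$, the only distinct auxiliary terms are the constant, $x_t$ and $x_t^2$, so the auxiliary regression reduces to:

$$
e_t^2 = \delta_0 + \delta_1 x_t + \delta_2 x_t^2 + u_t
$$

Under the null hypothesis $H_0: \delta_1 = \delta_2 = 0$, the LM statistic is asymptotically $\chi^2$-distributed with 2 degrees of freedom. More generally, with $k$ non-constant regressors and no redundant columns, there are $k$ linear terms, $k$ squares and $k(k-1)/2$ cross-products, which gives $k(k+3)/2$ degrees of freedom.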

454 

455 ??? success "Credit" 

456 Calculations are performed by `statsmodels`. 

457 

458 ??? question "References" 

459 - White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica, 48(4), 817-838. 

460 """ 

461 res = het_white(resid=resid, exog=exog_het) 

462 return (float(res[0]), float(res[1]), float(res[2]), float(res[3]))