Coverage for src/ts_stat_tests/heteroscedasticity/algorithms.py: 100%

# ============================================================================ #
#                                                                              #
#     Title: Heteroscedasticity Algorithms                                     #
#     Purpose: Implementation of heteroscedasticity tests.                     #
#                                                                              #
# ============================================================================ #


# ---------------------------------------------------------------------------- #
#                                                                              #
#     Overview                                                              ####
#                                                                              #
# ---------------------------------------------------------------------------- #


# ---------------------------------------------------------------------------- #
#     Description                                                           ####
# ---------------------------------------------------------------------------- #


"""
!!! note "Summary"
    This module implements various heteroscedasticity tests including:
    - ARCH Test
    - Breusch-Pagan Test
    - Goldfeld-Quandt Test
    - White's Test
"""


# ---------------------------------------------------------------------------- #
#                                                                              #
#     Setup                                                                 ####
#                                                                              #
# ---------------------------------------------------------------------------- #


# ---------------------------------------------------------------------------- #
#     Imports                                                               ####
# ---------------------------------------------------------------------------- #


# ## Python StdLib Imports ----
from typing import (
    Literal,
    Optional,
    Union,
    cast,
    overload,
)

# ## Python Third Party Imports ----
from numpy.typing import ArrayLike
from statsmodels.stats.diagnostic import (
    ResultsStore,
    het_arch,
    het_breuschpagan,
    het_goldfeldquandt,
    het_white,
)
from typeguard import typechecked


# ---------------------------------------------------------------------------- #
#     Exports                                                               ####
# ---------------------------------------------------------------------------- #


__all__: list[str] = ["arch", "bpl", "gq", "wlm"]


## --------------------------------------------------------------------------- #
##     Constants                                                            ####
## --------------------------------------------------------------------------- #


VALID_GQ_ALTERNATIVES_OPTIONS = Literal["two-sided", "increasing", "decreasing"]


# ---------------------------------------------------------------------------- #
#                                                                              #
#     Algorithms                                                            ####
#                                                                              #
# ---------------------------------------------------------------------------- #


@overload
def arch(
    resid: ArrayLike, nlags: Optional[int] = None, ddof: int = 0, *, store: Literal[False] = False
) -> tuple[float, float, float, float]: ...
@overload
def arch(
    resid: ArrayLike, nlags: Optional[int] = None, ddof: int = 0, *, store: Literal[True]
) -> tuple[float, float, float, float, ResultsStore]: ...
@typechecked
def arch(resid: ArrayLike, nlags: Optional[int] = None, ddof: int = 0, *, store: bool = False) -> Union[
    tuple[float, float, float, float],
    tuple[float, float, float, float, ResultsStore],
]:
    r"""
    !!! note "Summary"
        Engle's Test for Autoregressive Conditional Heteroscedasticity (ARCH).

    ???+ abstract "Details"
        This test is used to determine whether the residuals of a time-series model exhibit ARCH effects. ARCH effects show up as volatility clustering: periods of high volatility tend to be followed by further high volatility, and periods of low volatility by further low volatility. The test is a Lagrange Multiplier (LM) test for autocorrelation in the squared residuals.

    Params:
        resid (ArrayLike):
            The residuals from a linear regression model.
        nlags (Optional[int]):
            The number of lags to include in the test regression. If `None`, the number of lags is determined based on the number of observations.
            Default: `None`
        ddof (int):
            Degrees of freedom to adjust for in the calculation of the F-statistic.
            Default: `0`
        store (bool):
            Whether to return a `ResultsStore` object containing additional test results.
            Default: `False`

    Returns:
        (Union[tuple[float, float, float, float], tuple[float, float, float, float, ResultsStore]]):
            A tuple containing:
            - `lmstat` (float): The Lagrange Multiplier statistic.
            - `lmpval` (float): The p-value for the LM statistic.
            - `fstat` (float): The F-statistic.
            - `fpval` (float): The p-value for the F-statistic.
            - `resstore` (ResultsStore, optional): Returned only if `store` is `True`.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> import statsmodels.api as sm
        >>> from ts_stat_tests.heteroscedasticity.algorithms import arch
        >>> from ts_stat_tests.utils.data import data_line, data_random
        >>> X = sm.add_constant(data_line)
        >>> y = 2 * data_line + data_random
        >>> res = sm.OLS(y, X).fit()
        >>> resid = res.resid

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Basic ARCH test"}
        >>> lm, lmp, f, fp = arch(resid)
        >>> print(f"LM p-value: {lmp:.4f}")
        LM p-value: 0.9124

        ```

    ??? equation "Calculation"
        The test is performed by regressing the squared residuals $e_t^2$ on a constant and $q$ lags of the squared residuals:

        $$
        e_t^2 = \gamma_0 + \gamma_1 e_{t-1}^2 + \gamma_2 e_{t-2}^2 + \dots + \gamma_q e_{t-q}^2 + \nu_t
        $$

        The null hypothesis of no ARCH effects is:

        $$
        H_0: \gamma_1 = \gamma_2 = \dots = \gamma_q = 0
        $$

        The LM statistic is calculated as $T \times R^2$ from this regression, where $T$ is the number of observations used in the auxiliary regression and $R^2$ is its coefficient of determination.
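
The auxiliary regression above can be sketched directly with NumPy. This is a minimal illustration under stated assumptions, not the way `statsmodels` implements `het_arch`: the helper name `arch_lm_sketch`, the simulated residuals, and the choice `q=2` are all made up for the example, and the `ddof` adjustment is ignored.

```python
import numpy as np


def arch_lm_sketch(resid, q):
    # Hypothetical helper (not part of this module): LM = T * R^2 from
    # regressing e_t^2 on a constant and q of its own lags.
    e2 = np.asarray(resid, dtype=float) ** 2
    n = len(e2)
    # Response: e_t^2 for the observations that have q earlier values available.
    y = e2[q:]
    # Regressors: a constant plus e_{t-1}^2, ..., e_{t-q}^2, aligned with y.
    lags = [e2[q - k : n - k] for k in range(1, q + 1)]
    X = np.column_stack([np.ones(len(y))] + lags)
    # Ordinary least squares fit of the auxiliary regression.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    # T is the number of rows left after dropping the first q observations.
    return len(y) * r2


# Assumed input: simulated white-noise residuals, so no ARCH effects are expected.
rng = np.random.default_rng(0)
lm_stat = arch_lm_sketch(rng.standard_normal(200), q=2)
```

Under the null hypothesis the statistic is compared against a chi-squared distribution with $q$ degrees of freedom.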

    ??? success "Credit"
        Calculations are performed by `statsmodels`.

    ??? question "References"
        - Engle, R. F. (1982). Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica, 50(4), 987-1007.
    """
    if store:
        res_5 = cast(
            tuple[float, float, float, float, ResultsStore],
            het_arch(resid=resid, nlags=nlags, store=True, ddof=ddof),
        )
        return (
            float(res_5[0]),
            float(res_5[1]),
            float(res_5[2]),
            float(res_5[3]),
            res_5[4],
        )

    res_4 = cast(
        tuple[float, float, float, float],
        het_arch(resid=resid, nlags=nlags, store=False, ddof=ddof),
    )
    return (float(res_4[0]), float(res_4[1]), float(res_4[2]), float(res_4[3]))


@typechecked
def bpl(resid: ArrayLike, exog_het: ArrayLike, robust: bool = True) -> tuple[float, float, float, float]:
    r"""
    !!! note "Summary"
        Breusch-Pagan Lagrange Multiplier Test for Heteroscedasticity.

    ???+ abstract "Details"
        This test checks whether the variance of the errors in a regression model depends on the values of the independent variables. If it does, the errors are heteroscedastic. The null hypothesis assumes homoscedasticity (constant variance).

    Params:
        resid (ArrayLike):
            The residuals from a linear regression model.
        exog_het (ArrayLike):
            The explanatory variables for the variance (heteroscedasticity). Usually, these are the same as the original regression's exogenous variables.
        robust (bool):
            Whether to use a robust version of the test that does not assume the errors are normally distributed (Koenker's version).
            Default: `True`

    Returns:
        (tuple[float, float, float, float]):
            A tuple containing:
            - `lmstat` (float): The Lagrange Multiplier statistic.
            - `lmpval` (float): The p-value for the LM statistic.
            - `fstat` (float): The F-statistic.
            - `fpval` (float): The p-value for the F-statistic.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> import statsmodels.api as sm
        >>> from ts_stat_tests.heteroscedasticity.algorithms import bpl
        >>> from ts_stat_tests.utils.data import data_line, data_random
        >>> X = sm.add_constant(data_line)
        >>> y = 2 * data_line + data_random
        >>> res = sm.OLS(y, X).fit()
        >>> resid, exog = res.resid, X

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Basic Breusch-Pagan test"}
        >>> lm, lmp, f, fp = bpl(resid, exog)
        >>> print(f"LM p-value: {lmp:.4f}")
        LM p-value: 0.2461

        ```

    ??? equation "Calculation"
        The test first fits a regression of the squared residuals (or a standardized version of them) on the specified exogenous variables:

        $$
        e_t^2 = \delta_0 + \delta_1 z_{t1} + \dots + \delta_k z_{tk} + u_t
        $$

        The null hypothesis is:

        $$
        H_0: \delta_1 = \dots = \delta_k = 0
        $$

        Koenker's robust (studentized) version computes the LM statistic as $n \times R^2$ from this auxiliary regression, where $n$ is the number of observations, and does not require the errors to be normally distributed.
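
The studentized (Koenker) form of the statistic can be sketched with NumPy as $n \times R^2$ from the auxiliary regression. This is an illustrative sketch only: the helper names `r_squared` and `koenker_lm_sketch` and the simulated data are assumptions for the example, and the errors are simulated directly instead of being taken from a fitted model.

```python
import numpy as np


def r_squared(y, X):
    # Coefficient of determination from an OLS fit of y on X
    # (X is assumed to contain a constant column).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def koenker_lm_sketch(resid, exog_het):
    # Hypothetical helper: regress e_t^2 on the variance regressors
    # and return n * R^2.
    e2 = np.asarray(resid, dtype=float) ** 2
    Z = np.asarray(exog_het, dtype=float)
    return len(e2) * r_squared(e2, Z)


# Assumed data: one variance regressor z plus a constant.
rng = np.random.default_rng(1)
z = np.linspace(1.0, 10.0, 300)
Z = np.column_stack([np.ones_like(z), z])
homo = rng.standard_normal(300)  # constant error variance
hetero = homo * z                # error standard deviation grows with z

lm_homo = koenker_lm_sketch(homo, Z)
lm_hetero = koenker_lm_sketch(hetero, Z)
```

With these simulated errors the statistic for `hetero` should come out far larger than the one for `homo`, which is the pattern the test is designed to detect.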

    ??? success "Credit"
        Calculations are performed by `statsmodels`.

    ??? question "References"
        - Breusch, T. S., & Pagan, A. R. (1979). A Simple Test for Heteroscedasticity and Random Coefficient Variation. Econometrica, 47(5), 1287-1294.
        - Koenker, R. (1981). A Note on Studentizing a Test for Heteroscedasticity. Journal of Econometrics, 17(1), 107-112.
    """
    res = het_breuschpagan(resid=resid, exog_het=exog_het, robust=robust)
    return (float(res[0]), float(res[1]), float(res[2]), float(res[3]))


@overload
def gq(
    y: ArrayLike,
    x: ArrayLike,
    idx: Optional[int] = None,
    split: Optional[Union[int, float]] = None,
    drop: Optional[Union[int, float]] = None,
    alternative: VALID_GQ_ALTERNATIVES_OPTIONS = "increasing",
    *,
    store: Literal[False] = False,
) -> tuple[float, float, str]: ...
@overload
def gq(
    y: ArrayLike,
    x: ArrayLike,
    idx: Optional[int] = None,
    split: Optional[Union[int, float]] = None,
    drop: Optional[Union[int, float]] = None,
    alternative: VALID_GQ_ALTERNATIVES_OPTIONS = "increasing",
    *,
    store: Literal[True],
) -> tuple[float, float, str, ResultsStore]: ...
@typechecked
def gq(
    y: ArrayLike,
    x: ArrayLike,
    idx: Optional[int] = None,
    split: Optional[Union[int, float]] = None,
    drop: Optional[Union[int, float]] = None,
    alternative: VALID_GQ_ALTERNATIVES_OPTIONS = "increasing",
    *,
    store: bool = False,
) -> Union[
    tuple[float, float, str],
    tuple[float, float, str, ResultsStore],
]:
    r"""
    !!! note "Summary"
        Goldfeld-Quandt Test for Heteroscedasticity.

    ???+ abstract "Details"
        The Goldfeld-Quandt test checks for heteroscedasticity by dividing the dataset into two subsets (usually at the beginning and end of the sample) and comparing the variance of the residuals in each subset using an F-test.

    Params:
        y (ArrayLike):
            The dependent variable (endogenous).
        x (ArrayLike):
            The independent variables (exogenous).
        idx (Optional[int]):
            The column index of the variable to sort by. If `None`, the data are assumed to be already ordered.
            Default: `None`
        split (Optional[Union[int, float]]):
            The index at which to split the sample. If a float between 0 and 1, it represents the fraction of observations.
            Default: `None`
        drop (Optional[Union[int, float]]):
            The number of observations to drop in the middle. If a float between 0 and 1, it represents the fraction of observations.
            Default: `None`
        alternative (VALID_GQ_ALTERNATIVES_OPTIONS):
            The alternative hypothesis. Options are `"increasing"`, `"decreasing"`, or `"two-sided"`.
            Default: `"increasing"`
        store (bool):
            Whether to return a `ResultsStore` object.
            Default: `False`

    Returns:
        (Union[tuple[float, float, str], tuple[float, float, str, ResultsStore]]):
            A tuple containing:
            - `fstat` (float): The F-statistic.
            - `fpval` (float): The p-value for the F-statistic.
            - `alternative` (str): The alternative hypothesis used.
            - `resstore` (ResultsStore, optional): Returned only if `store` is `True`.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> import statsmodels.api as sm
        >>> from ts_stat_tests.utils.data import data_line, data_random
        >>> from ts_stat_tests.heteroscedasticity.algorithms import gq
        >>> X = sm.add_constant(data_line)
        >>> y = 2 * data_line + data_random

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Basic Goldfeld-Quandt test"}
        >>> f, p, alt = gq(y, X)
        >>> print(f"F p-value: {p:.4f}")
        F p-value: 0.2269

        ```

    ??? equation "Calculation"
        The dataset is split into two samples after sorting by an independent variable (or using the natural order). Separate regressions are run on each sample:

        $$
        RSS_1 = \sum e_{1,t}^2, \quad RSS_2 = \sum e_{2,t}^2
        $$

        The test statistic is the ratio of variances:

        $$
        F = \frac{RSS_2 / df_2}{RSS_1 / df_1}
        $$

        where $RSS_i$ is the residual sum of squares of sample $i$ and $df_i$ is its residual degrees of freedom.
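
The ratio above can be sketched with NumPy by fitting the two sub-sample regressions separately. This is a minimal illustration, not the `statsmodels` implementation: the helper names `rss_and_df` and `gq_f_sketch`, the simulated data, and the integer split point are assumptions for the example, and no middle observations are dropped.

```python
import numpy as np


def rss_and_df(y, X):
    # OLS fit on one sub-sample; returns its residual sum of squares and
    # residual degrees of freedom (rows minus estimated coefficients).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), X.shape[0] - X.shape[1]


def gq_f_sketch(y, X, split):
    # Hypothetical helper: F = (RSS_2 / df_2) / (RSS_1 / df_1), with the
    # data assumed to be already ordered and split at row `split`.
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    rss1, df1 = rss_and_df(y[:split], X[:split])
    rss2, df2 = rss_and_df(y[split:], X[split:])
    return (rss2 / df2) / (rss1 / df1)


# Assumed data: the error standard deviation grows with x, so the second
# half of the sample is noisier than the first.
rng = np.random.default_rng(2)
x = np.linspace(1.0, 10.0, 200)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 * x + rng.standard_normal(200) * x

f_stat = gq_f_sketch(y, X, split=100)
```

A value well above 1 points toward the `"increasing"` alternative, meaning a larger error variance in the second sample.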

    ??? success "Credit"
        Calculations are performed by `statsmodels`.

    ??? question "References"
        - Goldfeld, S. M., & Quandt, R. E. (1965). Some Tests for Homoscedasticity. Journal of the American Statistical Association, 60(310), 539-547.
    """
    if store:
        res_4 = cast(
            tuple[float, float, str, ResultsStore],
            het_goldfeldquandt(
                y=y,
                x=x,
                idx=idx,
                split=split,
                drop=drop,
                alternative=alternative,
                store=True,
            ),
        )
        return (float(res_4[0]), float(res_4[1]), str(res_4[2]), res_4[3])

    res_3 = cast(
        tuple[float, float, str],
        het_goldfeldquandt(
            y=y,
            x=x,
            idx=idx,
            split=split,
            drop=drop,
            alternative=alternative,
            store=False,
        ),
    )
    return (float(res_3[0]), float(res_3[1]), str(res_3[2]))


@typechecked
def wlm(resid: ArrayLike, exog_het: ArrayLike) -> tuple[float, float, float, float]:
    r"""
    !!! note "Summary"
        White's Test for Heteroscedasticity.

    ???+ abstract "Details"
        White's test is a general test for heteroscedasticity that does not require a specific functional form for the variance of the error terms. It is essentially a test of whether the squared residuals can be explained by the levels, squares, and cross-products of the independent variables.

    Params:
        resid (ArrayLike):
            The residuals from a linear regression model.
        exog_het (ArrayLike):
            The explanatory variables for the variance. Usually, these are the original exogenous variables; the test internally handles adding their squares and cross-products.

    Returns:
        (tuple[float, float, float, float]):
            A tuple containing:
            - `lmstat` (float): The Lagrange Multiplier statistic.
            - `lmpval` (float): The p-value for the LM statistic.
            - `fstat` (float): The F-statistic.
            - `fpval` (float): The p-value for the F-statistic.

    ???+ example "Examples"

        ```pycon {.py .python linenums="1" title="Setup"}
        >>> import statsmodels.api as sm
        >>> from ts_stat_tests.heteroscedasticity.algorithms import wlm
        >>> from ts_stat_tests.utils.data import data_line, data_random
        >>> X = sm.add_constant(data_line)
        >>> y = 2 * data_line + data_random
        >>> res = sm.OLS(y, X).fit()
        >>> resid, exog = res.resid, X

        ```

        ```pycon {.py .python linenums="1" title="Example 1: Basic White's test"}
        >>> lm, lmp, f, fp = wlm(resid, exog)
        >>> print(f"White p-value: {lmp:.4f}")
        White p-value: 0.4558

        ```

    ??? equation "Calculation"
        Squared residuals are regressed on all distinct variables in the cross-product of the original exogenous variables (including constant, linear terms, squares, and interactions):

        $$
        e_t^2 = \delta_0 + \sum \delta_i z_{it} + \sum \delta_{ij} z_{it} z_{jt} + u_t
        $$

        The LM statistic is $T \times R^2$ from this auxiliary regression, where $T$ is the number of observations.
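
The auxiliary regression above can be sketched with NumPy by building every pairwise product of the regressor columns. This is an illustrative sketch under stated assumptions: the helper name `white_lm_sketch` and the simulated data are made up for the example, the errors are simulated directly instead of being taken from a fitted model, and `exog` is assumed to contain a constant column so that the products include the constant, levels, squares, and interactions.

```python
import numpy as np


def white_lm_sketch(resid, exog):
    # Hypothetical helper: LM = T * R^2 from regressing e_t^2 on all
    # products x_i * x_j (i <= j) of the columns of exog.
    e2 = np.asarray(resid, dtype=float) ** 2
    X = np.asarray(exog, dtype=float)
    # Index pairs (i, j) with i <= j over the columns of X.
    i, j = np.triu_indices(X.shape[1])
    Z = X[:, i] * X[:, j]
    # OLS fit of the auxiliary regression.
    beta, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    ss_res = np.sum((e2 - Z @ beta) ** 2)
    ss_tot = np.sum((e2 - e2.mean()) ** 2)
    return len(e2) * (1.0 - ss_res / ss_tot)


# Assumed data: a constant plus one regressor x.
rng = np.random.default_rng(3)
x = np.linspace(1.0, 10.0, 300)
X = np.column_stack([np.ones_like(x), x])
homo = rng.standard_normal(300)  # constant error variance
hetero = homo * x                # error standard deviation grows with x

lm_homo = white_lm_sketch(homo, X)
lm_hetero = white_lm_sketch(hetero, X)
```

With a constant and a single regressor `x`, the auxiliary design reduces to the columns `1`, `x`, and `x**2`.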

    ??? success "Credit"
        Calculations are performed by `statsmodels`.

    ??? question "References"
        - White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica, 48(4), 817-838.
    """
    res = het_white(resid=resid, exog=exog_het)
    return (float(res[0]), float(res[1]), float(res[2]), float(res[3]))
