Refining the central limit theorem approximation via extreme value theory

https://doi.org/10.1016/j.spl.2019.108564

Abstract

We suggest approximating the distribution of the sum of independent and identically distributed random variables with a Pareto-like tail by combining extreme value approximations for the largest summands with a normal approximation for the sum of the smaller summands. If the tail is well approximated by a Pareto density, then this new approximation has substantially smaller error rates compared to the usual normal approximation for underlying distributions with finite variance and less than three moments. It can also provide an accurate approximation for some infinite variance distributions.

Introduction

Consider approximations to the distribution of the sum $S_n=\sum_{i=1}^{n}X_i$ of independent mean-zero random variables $X_i$ with distribution function $F$. If $\sigma_0^2=\int x^2\,dF(x)$ exists, then $n^{-1/2}S_n$ is asymptotically normal by the central limit theorem. The quality of this approximation is poor if $\max_{i\le n}|X_i|$ is not much smaller than $n^{1/2}$, since then a single non-normal random variable has non-negligible influence on $n^{-1/2}S_n$. Extreme value theory provides large sample approximations to the behavior of the largest observations, suggesting that it may be fruitfully employed in the derivation of better approximations to the distribution of $S_n$.

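For simplicity, consider the case where $F$ has a light left tail and a heavy right tail. Specifically, assume $\int_{-\infty}^{0}|x|^3\,dF(x)<\infty$ and

$$\lim_{x\to\infty}\frac{1-F(x)}{x^{-1/\xi}}=\omega^{1/\xi},\quad \omega>0 \tag{1}$$

for $1/3<\xi<1$, so that the right tail of $F$ is approximately Pareto with shape parameter $1/\xi$ and scale parameter $\omega$. Let $X_{i:n}$ be the order statistics. For a given sequence $k=k(n)$, $1\le k<n$, split $S_n$ into two pieces

$$S_n=\sum_{i=1}^{n-k}X_{i:n}+\sum_{i=1}^{k}X_{n-i+1:n}. \tag{2}$$

Note that conditional on the $(n-k+1)$th order statistic $T_n=X_{n-k+1:n}$, $\sum_{i=1}^{n-k}X_{i:n}$ has the same distribution as $\sum_{i=1}^{n-k}\tilde X_i$, where the $\tilde X_i$ are i.i.d. from the truncated distribution $\tilde F_{T_n}(x)$ with $\tilde F_t(x)=F(x)/F(t)$ for $x\le t$ and $\tilde F_t(x)=1$ otherwise. Let $\mu(t)$ and $\sigma^2(t)$ be the mean and variance of $\tilde F_t$. Since $\tilde F_{T_n}$ is less skewed than $F$, one would expect the distributional approximation (denoted by "$\overset{a}{\sim}$") of the central limit theorem,

$$\sum_{i=1}^{n-k}X_{i:n}\,\Big|\,T_n\ \overset{a}{\sim}\ (n-k)\mu(T_n)+(n-k)^{1/2}\sigma(T_n)Z\quad\text{for } Z\sim N(0,1), \tag{3}$$

to be relatively accurate. At the same time, extreme value theory implies that under (1),

$$\sum_{i=1}^{k}X_{n-i+1:n}\ \overset{a}{\sim}\ n^{\xi}\omega\sum_{i=1}^{k}\Gamma_i^{-\xi}\quad\text{for } \Gamma_i=\sum_{j=1}^{i}E_j,\ E_j\sim\text{i.i.d. exponential}. \tag{4}$$

Combining (3), (4) suggests

$$S_n\ \overset{a}{\sim}\ n^{1/2}\sigma\!\left(n^{\xi}\omega\Gamma_k^{-\xi}\right)Z+(n-k)\mu\!\left(n^{\xi}\omega\Gamma_k^{-\xi}\right)+n^{\xi}\omega\sum_{i=1}^{k}\Gamma_i^{-\xi} \tag{5}$$

with $Z$ independent of $(\Gamma_i)_{i=1}^{k}$.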
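The split in (2) and the right-hand side of (4) can be sketched numerically. The following is a minimal illustration under stated assumptions, not code from the paper: it takes $F$ to be a mean-centered Pareto distribution (one example satisfying (1)), and the function names `sample_centered_pareto`, `split_sum` and `evt_top_sum` are hypothetical.

```python
import math
import random


def sample_centered_pareto(n, xi, omega, rng):
    """Draw n i.i.d. mean-zero variables: omega * U**(-xi) minus its mean.

    Assumption of this sketch: F is an exact Pareto shifted to mean zero,
    which requires xi < 1 so that the mean omega / (1 - xi) is finite.
    """
    mean = omega / (1.0 - xi)
    # 1 - rng.random() lies in (0, 1], so the power is always defined.
    return [omega * (1.0 - rng.random()) ** (-xi) - mean for _ in range(n)]


def split_sum(xs, k):
    """Split the sum as in (2).

    Returns (sum of the n-k smallest, sum of the k largest, T_n), where
    T_n = X_{n-k+1:n} is the smallest of the k largest observations.
    """
    if not 1 <= k < len(xs):
        raise ValueError("need 1 <= k < n")
    ordered = sorted(xs)
    lower, upper = ordered[:-k], ordered[-k:]
    return math.fsum(lower), math.fsum(upper), upper[0]


def evt_top_sum(n, k, xi, omega, rng):
    """One draw of n**xi * omega * sum_{i<=k} Gamma_i**(-xi), as in (4)."""
    gamma_i, total = 0.0, 0.0
    for _ in range(k):
        gamma_i += rng.expovariate(1.0)  # Gamma_i = E_1 + ... + E_i
        total += gamma_i ** (-xi)
    return n ** xi * omega * total
```

Comparing many draws of `split_sum(xs, k)[1]` with draws of `evt_top_sum(n, k, xi, omega, rng)` for the same parameter values gives an informal check of (4), up to lower-order terms such as the mean shift in this particular example.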

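If $\xi<1/2$, the approximate Pareto tail (1) and $E[X_1]=0$ imply

$$\mu(x)\approx-\frac{\omega^{1/\xi}x^{1-1/\xi}}{(1-\xi)\left(1-(x/\omega)^{-1/\xi}\right)}\quad\text{and}\quad\sigma^2(x)\approx\sigma_0^2-\omega^{1/\xi}\frac{1}{1-2\xi}x^{2-1/\xi}$$

for $x$ large. From $(n-k)/(n-\Gamma_k)\overset{a}{\sim}1$, this further yields

$$S_n\ \overset{a}{\sim}\ n^{1/2}\left(\sigma_0^2-\frac{\omega^2}{1-2\xi}\left(\frac{\Gamma_k}{n}\right)^{1-2\xi}\right)^{1/2}Z-n^{\xi}\frac{\omega}{1-\xi}\Gamma_k^{1-\xi}+n^{\xi}\omega\sum_{i=1}^{k}\Gamma_i^{-\xi} \tag{6}$$

which depends on $F$ only through the unconditional variance $\sigma_0^2$ and the two tail parameters $(\omega,\xi)$. Note that $E[\Gamma_i^{-\xi}]=\Gamma(i-\xi)/\Gamma(i)$ and $E[\Gamma_k^{1-\xi}]=\Gamma(1+k-\xi)/\Gamma(k)=(1-\xi)\sum_{i=1}^{k}\Gamma(i-\xi)/\Gamma(i)$, so the right-hand side of (6) is the sum of a mean-zero right skewed random variable, and a (dependent) random-scale mean-zero normal variable.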
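The Gamma-function identity behind the mean-zero claim, and the structure of the right-hand side of (6), can be checked with a short sketch. This is an illustration only: the function names are hypothetical, and the `max(..., 0.0)` guard on the variance term is an addition of this sketch for draws where $\Gamma_k/n$ is too large for the approximation to be meaningful.

```python
import math
import random


def mean_gamma_power(i, xi):
    """E[Gamma_i**(-xi)] = Gamma(i - xi) / Gamma(i) for Gamma_i ~ Gamma(i, 1)."""
    return math.exp(math.lgamma(i - xi) - math.lgamma(i))


def skew_term_mean(k, xi):
    """Mean of sum_i Gamma_i**(-xi) - Gamma_k**(1-xi) / (1-xi); zero in theory."""
    centering = math.exp(math.lgamma(1 + k - xi) - math.lgamma(k)) / (1.0 - xi)
    series = sum(mean_gamma_power(i, xi) for i in range(1, k + 1))
    return series - centering


def draw_rhs6(n, k, xi, omega, sigma0_sq, rng):
    """One draw from the right-hand side of (6); intended for xi < 1/2."""
    gammas, g = [], 0.0
    for _ in range(k):
        g += rng.expovariate(1.0)  # partial sums of unit exponentials
        gammas.append(g)
    g_k = gammas[-1]
    var = sigma0_sq - omega ** 2 / (1.0 - 2.0 * xi) * (g_k / n) ** (1.0 - 2.0 * xi)
    var = max(var, 0.0)  # guard added for this sketch, not part of (6)
    # Z is drawn independently of the Gamma_i, as required in (5) and (6).
    normal_part = math.sqrt(n * var) * rng.gauss(0.0, 1.0)
    skew_part = n ** xi * omega * (
        sum(x ** (-xi) for x in gammas) - g_k ** (1.0 - xi) / (1.0 - xi)
    )
    return normal_part + skew_part
```

Dividing draws of `draw_rhs6` by `math.sqrt(n)` puts them on the scale of $n^{-1/2}S_n$, where they can be compared with a standard normal approximation scaled by $\sigma_0$.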

Our main Theorem 1 provides an upper bound on the convergence rate of the error in the approximation (6). The proof combines the Berry–Esseen bound for the central limit theorem approximation in (3) and the rate result in Corollary 5.5.5 of Reiss (1989) for the extreme value approximation in (4). If the tail of F is such that the approximation in (4) is accurate, then for both fixed and diverging k the error in (6) converges to zero faster than the error in the usual mean-zero normal approximation. The approximation (6) thus helps illuminate the nature and origin of the leading error terms in the first order normal approximation, as derived in Chapter 2 of Hall (1982), for such F. We also provide a characterization of the bound minimizing choice of k.

If $\xi>1/2$, then the distribution of $n^{-\xi}S_n$ converges to a one-sided stable law with index $\xi$. An elegant argument by LePage et al. (1981) shows that this limiting law can be written as $\omega\sum_{i=1}^{\infty}\Gamma_i^{-\xi}$. The approximation (5) thus remains potentially accurate under $k\to\infty$ also for infinite variance distributions. To obtain a further approximation akin to (6), note that (1) implies $\sigma^2(\omega x)-\sigma^2(\omega y)\approx(\omega^2/\xi)\int_y^x t^{1-1/\xi}\,dt$ for large $x,y$. Let $u_n=(n/k)^{\xi}$. Then

$$S_n\ \overset{a}{\sim}\ n^{1/2}\left(\sigma^2(\omega u_n)+\frac{\omega^2}{\xi}\int_{u_n}^{(n/\Gamma_k)^{\xi}}y^{1-1/\xi}\,dy\right)^{1/2}Z-n^{\xi}\frac{\omega}{1-\xi}\Gamma_k^{1-\xi}+n^{\xi}\omega\sum_{i=1}^{k}\Gamma_i^{-\xi} \tag{7}$$

which depends on $F$ only through the tail parameters $(\omega,\xi)$ and the sequence of truncated variances $\sigma^2(\omega u_n)$. The approximation (7) could also be applied to the case $\xi<1/2$, so that one obtains a unifying approximation for values of $\xi$ both smaller and larger than $1/2$. Indeed, for $F$ mean-centered Pareto of index $\xi$, the results below imply that for suitable choice of $k$, this approximation has an error that converges to zero much faster than the error from the first order approximation via the normal or non-normal stable limit for $\xi$ close to $1/2$. The approach here thus also sheds light on the nature of the leading error terms of the non-normal stable limit, such as those derived by Christoph and Wolf (1992).
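The integral in the variance term of (7) has a closed form, which makes the link to (6) easy to verify. The sketch below is an illustration with hypothetical function names; the truncated variance $\sigma^2(\omega u_n)$ is treated as a user-supplied input (`sigma_sq_at_un`), since it depends on $F$.

```python
import math


def integral_power(u, v, xi):
    """Integral of y**(1 - 1/xi) over [u, v], for u, v > 0."""
    p = 2.0 - 1.0 / xi  # exponent after integrating
    if abs(p) < 1e-12:  # xi = 1/2: the integrand is 1/y
        return math.log(v / u)
    return (v ** p - u ** p) / p


def variance_term7(n, k, gamma_k, xi, omega, sigma_sq_at_un):
    """Term multiplying n inside the square root of (7).

    sigma_sq_at_un stands for sigma^2(omega * u_n) with u_n = (n / k)**xi;
    it must be supplied by the caller (assumption of this sketch).
    """
    u_n = (n / k) ** xi
    upper = (n / gamma_k) ** xi
    return sigma_sq_at_un + omega ** 2 / xi * integral_power(u_n, upper, xi)
```

For $\xi<1/2$, substituting the large-$x$ approximation of $\sigma^2(x)$ for `sigma_sq_at_un` reproduces the variance term $\sigma_0^2-\frac{\omega^2}{1-2\xi}(\Gamma_k/n)^{1-2\xi}$ of (6), because the $u_n$ terms cancel.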

For $\xi>1/2$, the idea of splitting up $S_n$ as in (2) and jointly analyzing the asymptotic behavior of the pieces is already pursued in Csörgö et al. (1988). The contribution here is to derive error rates for the resulting approximation to the distribution of the sum, especially for $1/3<\xi<1/2$, and to develop the additional approximation of the truncated mean and variance induced by the approximate Pareto tail.

The next section formalizes these arguments and discusses various forms of writing the variance term and the approximation for the case where both tails are heavy. Section 3 contains the proofs.

Assumptions and main results

The following condition requires the right tail of $F$ to lie in the $\delta$-neighborhood of the Pareto distribution with index $\xi$, as defined in Chapter 2 of Falk et al. (2004).

Condition 1

For some $x_0,\delta,\omega,L_F>0$ and $1/3<\xi<1$, $F(x)$ admits a density for all $x\ge x_0$ of the form

$$f(x)=(\omega\xi)^{-1}(x/\omega)^{-1/\xi-1}\,(1+h(x))$$

with $|h(x)|\le L_F\,x^{-\delta/\xi}$ uniformly in $x\ge x_0$.
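As a concrete instance of Condition 1, consider the mean-centered Pareto distribution. The sketch below is an illustration, not taken from the paper: shifting an exact Pareto by its mean $m=\omega/(1-\xi)$ gives $1+h(x)=(1+m/x)^{-1/\xi-1}$, so $|h(x)|\le(1/\xi+1)\,m/x$, which matches the condition with the assumed choices $\delta=\xi$ and $L_F=(1/\xi+1)m$. All function names are hypothetical.

```python
def pareto_density(x, xi, omega):
    """Exact Pareto density (omega*xi)**(-1) * (x/omega)**(-1/xi - 1)."""
    return (x / omega) ** (-1.0 / xi - 1.0) / (omega * xi)


def h_centered_pareto(x, xi, omega):
    """Relative deviation h(x) of the mean-centered Pareto density."""
    m = omega / (1.0 - xi)  # mean of the uncentered Pareto, finite for xi < 1
    return pareto_density(x + m, xi, omega) / pareto_density(x, xi, omega) - 1.0


def check_condition1(xi, omega, x0, xs):
    """Check |h(x)| <= L_F * x**(-delta/xi) on the grid xs, for x >= x0.

    delta = xi and L_F = (1/xi + 1) * omega / (1 - xi) are choices made
    for this example, not values stated in the source.
    """
    delta = xi
    l_f = (1.0 / xi + 1.0) * omega / (1.0 - xi)
    return all(
        abs(h_centered_pareto(x, xi, omega)) <= l_f * x ** (-delta / xi)
        for x in xs
        if x >= x0
    )
```

A grid check of this kind cannot prove the uniform bound, but it is a quick sanity test for a candidate pair $(\delta, L_F)$.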

As discussed in Falk et al. (2004), Condition 1 can be motivated by considering the remainder in the von Mises condition for extreme value theory. It is also closely related to the

Proofs

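Let $X_n^e=(X_{n-k+1:n},X_{n-k+2:n},\ldots,X_{n:n})$. The proof of Theorem 1 relies heavily on Corollary 5.5.5 of Reiss (1989) (also see Theorem 2.2.4 of Falk et al. (2004)), which implies that under Condition 1,

$$\sup_{B_k}\left|P\!\left(n^{-\xi}\omega^{-1}X_n^e\in B_k\right)-P\!\left((\Gamma_k^{-\xi},\Gamma_{k-1}^{-\xi},\ldots,\Gamma_1^{-\xi})\in B_k\right)\right|\le C\left((k/n)^{\delta}k^{1/2}+k/n\right)$$

where the supremum is over Borel sets $B_k$ in $\mathbb{R}^k$.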
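To get a feel for how the right-hand side of this bound behaves in $k$ and $n$, the rate term can be evaluated directly. The sketch below is illustrative only: the constant $C$ is unspecified in the source and is omitted, the function names are hypothetical, and the calculation concerns only this extreme value bound, not the full bound of Theorem 1.

```python
import math


def reiss_rate(n, k, delta):
    """Rate term (k/n)**delta * k**0.5 + k/n, without the constant C."""
    return (k / n) ** delta * math.sqrt(k) + k / n


def rate_exponent(a, delta):
    """Exponent of n in the rate term when k = n**a with 0 <= a < 1.

    (k/n)**delta * k**0.5 = n**((a - 1) * delta + a / 2) and k/n = n**(a - 1),
    so the term vanishes as n grows exactly when the returned value is negative.
    """
    return max((a - 1.0) * delta + a / 2.0, a - 1.0)
```

Under the parametrization $k=n^{a}$, the exponent is negative when $a<2\delta/(1+2\delta)$, so the rate term allows $k$ to diverge only at a sufficiently slow polynomial rate.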

Without loss of generality, assume $x_0>e$, $1-(x_0/\omega)^{-1/\xi}>0$ and $\sigma_0^2-\omega^{1/\xi}x_0^{2-1/\xi}/(1-2\xi)>0$. We first prove two elementary lemmas. Let $L$ denote a generic positive constant that does not depend on $x$ or

Funding via NSF grant SES-1627660 is gratefully acknowledged.
