[MathStat] 2. Statistical Inequalities

This post is an overall summary of Chapters 3 and 4 of the textbook Statistical Inference by Casella and Berger.

Table of Contents

2.1. Inequalities for expectations

  • Jensen’s Inequality
  • Hölder’s Inequality
  • Minkowski’s Inequality
  • Association Inequality

2.2. Inequalities for variances

  • Efron-Stein Inequality

2.3. Inequalities for probabilities

  • Markov’s Inequality
  • Chebyshev’s Inequality
  • Hoeffding’s Inequality
  • Dvoretzky-Kiefer-Wolfowitz (DKW) Inequality

 


2. Statistical Inequalities

One of the main goals in statistics and related fields is to make inferences about unknown parameters. For this purpose, we define an estimator that can serve as a reasonable surrogate for the true parameter.

However, the problem is that there are many possible choices of estimators. So we often consider a risk function between the true parameter $\theta$ and its estimator $\hat{\theta}$ to evaluate the estimator's performance.

Nevertheless, the problem remains that computing a risk function exactly is not so simple. Thus, we make use of various inequalities to upper bound the risk function by a quantity that is easier to manipulate, and to study how fast the risk function goes to zero as the sample size increases.

 


2.1 Inequalities for expectations

Jensen’s Inequality

Let $g$ be a convex function, i.e.,

$$\lambda g(x) + (1 - \lambda) g(y) \ge g(\lambda x + (1 - \lambda) y)$$

for all $\lambda \in (0, 1)$ and $x, y \in \mathbb{R}$. Then, provided that both expectations exist,

$$E[g(X)] \ge g(E[X])$$
  • An application of Jensen’s Inequality is the relationship between different means.
$$\text{Arithmetic Mean (AM)} = \frac{\sum_{i=1}^n x_i}{n}, \qquad \text{Geometric Mean (GM)} = \left( \prod_{i=1}^n x_i \right)^{1/n}, \qquad \text{Harmonic Mean (HM)} = \frac{1}{\frac{1}{n} \sum_{i=1}^n \frac{1}{x_i}}$$
$$\text{HM} \le \text{GM} \le \text{AM}$$
  • Another famous application is the non-negativity of the Kullback-Leibler divergence.
$$D_{KL}(P \,\|\, Q) = \sum_x p(x) \log\left(\frac{p(x)}{q(x)}\right) = E\left[\log\left(\frac{p(X)}{q(X)}\right)\right] = -E\left[\log\left(\frac{q(X)}{p(X)}\right)\right] \ge -\log E\left[\frac{q(X)}{p(X)}\right] = -\log 1 = 0$$
where the inequality follows from Jensen’s Inequality applied to the convex function $-\log$, and the last step uses $E\left[\frac{q(X)}{p(X)}\right] = \sum_x p(x)\,\frac{q(x)}{p(x)} = 1$.
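
As a quick numerical sanity check (my own sketch, not from the textbook), the snippet below verifies Jensen's Inequality with the convex function $g(x) = e^x$ and the HM $\le$ GM $\le$ AM ordering on a randomly generated positive sample; the distributions and sample sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Jensen's Inequality with the convex function g(x) = exp(x):
# E[g(X)] >= g(E[X]), checked by Monte Carlo.
x = rng.normal(loc=0.0, scale=1.0, size=100_000)
print(np.mean(np.exp(x)) >= np.exp(np.mean(x)))   # True

# HM <= GM <= AM for a positive sample.
y = rng.uniform(0.1, 10.0, size=1_000)
am = y.mean()
gm = np.exp(np.log(y).mean())        # geometric mean via the average of logs
hm = 1.0 / np.mean(1.0 / y)          # harmonic mean
print(hm <= gm <= am)                # True
```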

 

Hölder’s Inequality

For $p, q \in (1, \infty)$ with $\frac{1}{p} + \frac{1}{q} = 1$,

$$|E[XY]| \le E[|XY|] \le (E[|X|^p])^{1/p} (E[|Y|^q])^{1/q}$$
  • A special case of Hölder’s Inequality with $p = q = 2$ is the Cauchy-Schwarz inequality:
$$|E[XY]| \le E[|XY|] \le \sqrt{E[X^2]}\,\sqrt{E[Y^2]}$$
  • By applying the Cauchy-Schwarz inequality, we obtain the covariance inequality (illustrated numerically below):
$$\text{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] \le \{E[(X - \mu_X)^2]\}^{1/2}\{E[(Y - \mu_Y)^2]\}^{1/2}$$
$$\{\text{Cov}(X,Y)\}^2 \le \text{Var}(X)\,\text{Var}(Y)$$
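
As an illustration (my own addition), the covariance inequality can be checked on simulated correlated data; the bivariate normal parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated (X, Y) pairs from a bivariate normal with arbitrary parameters.
cov_matrix = [[2.0, 1.2], [1.2, 1.5]]
x, y = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_matrix, size=50_000).T

# Covariance inequality: Cov(X, Y)^2 <= Var(X) * Var(Y).
cov_xy = np.cov(x, y)[0, 1]
print(cov_xy**2 <= np.var(x) * np.var(y))   # True
```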

 

Minkowski’s Inequality

For two random variables $X$ and $Y$ and a constant $1 \le p < \infty$,

$$(E[|X + Y|^p])^{1/p} \le (E[|X|^p])^{1/p} + (E[|Y|^p])^{1/p}$$
$$\|X + Y\|_p \le \|X\|_p + \|Y\|_p$$

which is the triangle inequality for the $L^p$ norm $\|X\|_p = (E[|X|^p])^{1/p}$.
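
A minimal Monte Carlo sketch of the $L^p$ triangle inequality (my own illustration; the distributions and the choice $p = 3$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

def lp_norm(z, p):
    """Monte Carlo estimate of the L^p norm (E|Z|^p)^(1/p)."""
    return np.mean(np.abs(z) ** p) ** (1.0 / p)

# Arbitrary random variables defined on the same samples.
x = rng.normal(size=200_000)
y = rng.exponential(size=200_000)
p = 3
print(lp_norm(x + y, p) <= lp_norm(x, p) + lp_norm(y, p))   # True
```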

 

Association Inequality

Suppose we have a random variable $X$ and functions $f, g: \mathbb{R} \to \mathbb{R}$. Assuming that all expectations are well-defined, we have:

1. $f, g$ non-decreasing implies $E[f(X)g(X)] \ge E[f(X)]\,E[g(X)]$
2. $f, g$ non-increasing implies $E[f(X)g(X)] \ge E[f(X)]\,E[g(X)]$
3. $f$ non-decreasing and $g$ non-increasing implies $E[f(X)g(X)] \le E[f(X)]\,E[g(X)]$
  • By a direct application of the association inequality, we have the following bounds (the second one is checked numerically after this list):
$$E[X^4] \ge E[X]\,E[X^3]$$
$$E[X e^X] \ge E[X]\,E[e^X]$$
$$E[X\,\mathbf{1}(X \ge a)] \ge E[X]\,P(X \ge a)$$
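
For instance, the second bound above can be checked numerically (my own sketch; a standard normal $X$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)

# f(x) = x and g(x) = exp(x) are both non-decreasing,
# so the association inequality gives E[X e^X] >= E[X] E[e^X].
print(np.mean(x * np.exp(x)) >= np.mean(x) * np.mean(np.exp(x)))   # True
```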

 


2.2 Inequalities for variances

Efron-Stein Inequality

Suppose that $X_1, \dots, X_n, X_1', \dots, X_n'$ are independent, with $X_i$ and $X_i'$ having the same distribution for all $i \in \{1, \dots, n\}$. Let $X = (X_1, \dots, X_n)$ and $X^{(i)} = (X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n)$. Then,

$$\text{Var}[f(X)] \le \frac{1}{2} \sum_{i=1}^n E\left[\{f(X) - f(X^{(i)})\}^2\right]$$
  • As an example, we can apply the Efron-Stein Inequality to the sample mean:
$$\text{For } f(X) = \frac{1}{n}\sum_{i=1}^n X_i, \qquad \text{Var}(f(X)) \le \sum_{i=1}^n \frac{1}{2n^2} E[(X_i - X_i')^2] = \frac{\text{Var}(X_1)}{n}$$
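
To see the inequality at work for a nonlinear statistic (my own illustration, not from the textbook), the sketch below estimates both sides of the Efron-Stein bound for $f(X) = \max_i X_i$ with Uniform$(0,1)$ inputs; the sample size and distribution are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_reps = 10, 50_000

# Independent copies X and X' with the same distribution, Uniform(0, 1).
x = rng.uniform(size=(n_reps, n))
x_prime = rng.uniform(size=(n_reps, n))

def f(samples):
    """The statistic f(X) = max_i X_i, applied row-wise."""
    return samples.max(axis=1)

# Left-hand side: Var[f(X)], estimated by simulation.
lhs = np.var(f(x))

# Right-hand side: (1/2) * sum_i E[(f(X) - f(X^(i)))^2],
# where X^(i) replaces the i-th coordinate of X with X_i'.
rhs = 0.0
for i in range(n):
    x_i = x.copy()
    x_i[:, i] = x_prime[:, i]
    rhs += 0.5 * np.mean((f(x) - f(x_i)) ** 2)

print(lhs <= rhs)   # True (up to Monte Carlo error)
```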

 


2.3 Inequalities for probabilities

Markov’s Inequality

For any $t > 0$ and an integrable nonnegative random variable $X$, we have:

$$P(X \ge t) \le \frac{E[X]}{t}$$
$$E[X] \ge t\,P(X \ge t)$$

Markov’s Inequality usually gives the weakest bound, though it cannot be improved in general unless we put some restrictions on the shape of the distribution function. For example, if we assume that the density $f_X$ is non-increasing on $[0, \infty)$, we have:

$$P(X \ge t) \le \frac{E[X]}{2t}$$
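
A quick empirical comparison of the exact tail with the Markov bound (my own sketch; the Exponential(1) distribution is an arbitrary nonnegative example):

```python
import numpy as np

rng = np.random.default_rng(5)

# Exponential(1) is nonnegative with E[X] = 1, so the Markov bound is 1 / t.
x = rng.exponential(scale=1.0, size=200_000)

for t in [0.5, 1.0, 2.0, 5.0]:
    tail = np.mean(x >= t)                                 # empirical P(X >= t)
    print(t, tail, x.mean() / t, tail <= x.mean() / t)     # the bound holds, but is loose
```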

 

Chebyshev’s Inequality

Let $X$ be a random variable with finite mean $\mu$ and finite variance $\sigma^2 > 0$. Then, for any $t > 0$,

$$P(|X - \mu| \ge t) \le \frac{\text{Var}(X)}{t^2}$$
  • The proof is simple: since $|X - \mu| \ge t$ if and only if $(X - \mu)^2 \ge t^2$, applying Markov’s Inequality to the nonnegative random variable $(X - \mu)^2$ gives $P(|X - \mu| \ge t) = P((X - \mu)^2 \ge t^2) \le \frac{E[(X - \mu)^2]}{t^2}$.

 

Hoeffding’s Inequality

Before digging into Hoeffding’s Inequality, let’s recall Hoeffding’s Lemma: for a random variable $X$ with $E[X] = 0$ and $a \le X \le b$ almost surely, and for any $t \in \mathbb{R}$,

$$E[\exp(tX)] \le \exp\left(\tfrac{1}{8} t^2 (b - a)^2\right)$$

Based on this lemma, suppose $X_1, \dots, X_n$ are independent bounded random variables with $X_i \in [a_i, b_i]$ for each $i$. Then, for any $t > 0$, we have

$$P\left\{\left|\sum_{i=1}^n (X_i - E[X_i])\right| \ge t\right\} \le 2\exp\left(\frac{-2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$$

Note that, for a fixed confidence level $\delta$, the deviation guaranteed by Chebyshev’s Inequality grows like $1/\sqrt{\delta}$, whereas the deviation obtained from Hoeffding’s Inequality grows only like $\sqrt{\log(1/\delta)}$. To see this, suppose we have i.i.d. Rademacher random variables, which satisfy $E[X_i] = 0$ and $\text{Var}(X_i) = 1$, and let’s apply both inequalities to the sample mean of $X_1, \dots, X_n$. Then we have:

$$\text{Chebyshev:} \quad P\left(|\bar{X} - E[\bar{X}]| \ge t\right) \le \frac{1}{nt^2} \;\Longrightarrow\; P\left(|\bar{X} - E[\bar{X}]| \ge \frac{1}{\sqrt{n\delta}}\right) \le \delta, \qquad \left(\delta = \frac{1}{nt^2}\right)$$
$$\text{Hoeffding:} \quad P\left(|\bar{X} - E[\bar{X}]| \ge t\right) \le 2\exp\left(-\frac{nt^2}{2}\right) \;\Longrightarrow\; P\left(|\bar{X} - E[\bar{X}]| \ge \sqrt{\frac{2}{n}\log\left(\frac{2}{\delta}\right)}\right) \le \delta, \qquad \left(\delta = 2\exp\left(-\frac{nt^2}{2}\right)\right)$$

Thus, suppose we set $\delta = 0.001$. Then, with probability $0.999$, the sample mean is within:

$$\text{Chebyshev:} \quad \frac{1}{\sqrt{n \times 0.001}} \approx \frac{31.6}{\sqrt{n}}, \qquad \text{Hoeffding:} \quad \sqrt{\frac{2}{n}\log\left(\frac{2}{0.001}\right)} \approx \frac{3.9}{\sqrt{n}}$$

Now it is clear that Hoeffding’s Inequality is a huge improvement over the Chebyshev bound.
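
A small simulation (my own addition) comparing the actual tail probability of a Rademacher sample mean with the two bounds; the choices $n = 100$ and $t = 0.3$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
n, n_reps, t = 100, 100_000, 0.3

# Rademacher samples: +1 or -1 with probability 1/2, so E[X] = 0 and Var(X) = 1.
x = rng.choice([-1, 1], size=(n_reps, n))
sample_means = x.mean(axis=1)

empirical_tail = np.mean(np.abs(sample_means) >= t)
chebyshev_bound = 1.0 / (n * t**2)                  # 1 / (n t^2)
hoeffding_bound = 2.0 * np.exp(-n * t**2 / 2.0)     # 2 exp(-n t^2 / 2)

# Expect: empirical <= Hoeffding <= Chebyshev for this choice of n and t.
print(empirical_tail, hoeffding_bound, chebyshev_bound)
```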

 

Dvoretzky-Kiefer-Wolfowitz (DKW) Inequality

The DKW Inequality can be useful when we want to upper bound the deviation of an empirical CDF from the true CDF.

Specifically, letting $F_n$ denote the empirical CDF of an i.i.d. sample of size $n$ drawn from $F$, there exists a finite positive constant $C$ (in fact, $C = 2$ works) such that

$$\Pr\left(\sup_{x \in \mathbb{R}} |F_n(x) - F(x)| > t\right) \le C e^{-2nt^2} \qquad \text{for all } t \ge 0$$
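
As a rough numerical illustration (my own sketch), the DKW inequality with $C = 2$ yields a uniform confidence band of half-width $\sqrt{\log(2/\delta)/(2n)}$ for the CDF; below it is checked on a single Uniform$(0,1)$ sample, whose true CDF is $F(x) = x$, with arbitrary choices of $n$ and $\delta$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, delta = 500, 0.01

# One i.i.d. sample from Uniform(0, 1), so the true CDF is F(x) = x.
sample = np.sort(rng.uniform(size=n))

# DKW with C = 2: with probability at least 1 - delta,
# sup_x |F_n(x) - F(x)| <= sqrt(log(2 / delta) / (2 n)).
eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))

# sup |F_n - F| is attained at the sample points (check both sides of each jump).
upper = np.arange(1, n + 1) / n      # F_n just after each order statistic
lower = np.arange(0, n) / n          # F_n just before each order statistic
sup_dev = max(np.max(upper - sample), np.max(sample - lower))

print(sup_dev, eps, sup_dev <= eps)  # True with probability at least 0.99
```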

 


Reference

  • Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Pacific Grove, CA: Thomson Learning.
