[MathStat] 2. Statistical Inequalities
This post is an overall summary of Chapters 3 and 4 of the textbook Statistical Inference by Casella and Berger.
Table of Contents
2.1. Inequalities for expectations
- Jensen’s Inequality
- Hölder’s Inequality
- Minkowski’s Inequality
- Association Inequality
2.2. Inequalities for variances
- Efron-Stein Inequality
2.3. Inequalities for probabilities
- Markov’s Inequality
- Chebyshev’s Inequality
- Hoeffding’s Inequality
- Dvoretzky-Kiefer-Wolfowitz (DKW) Inequality
2. Statistical Inequalities
One of the main goals in statistics and related fields is to make an inference about unknown parameters. For this purpose, we define an estimator that can serve as a reasonable surrogate for the true parameter.
However, there are many possible choices of estimator, so we often consider a risk function between the true parameter $\theta$ and an estimator $\hat \theta$ to evaluate the estimator's performance.
Nevertheless, computing a risk function exactly is often not simple. Thus, we make use of various inequalities to upper bound the risk function by a quantity that is easier to manipulate, and to study how fast the risk function goes to zero as the sample size increases.
2.1 Inequalities for expectations
Jensen’s Inequality
For a convex function $g$, i.e., a function such that

$$g(\lambda x + (1-\lambda) y) \leq \lambda g(x) + (1-\lambda) g(y)$$

for all $\lambda \in (0,1)$ and $x, y \in \mathbb{R}$, and a random variable $X$, provided that both expectations exist,

$$E[g(X)] \geq g(E[X]).$$
- An application of Jensen’s Inequality is the relationship among the arithmetic, geometric, and harmonic means (AM $\geq$ GM $\geq$ HM).
- Another famous application is the nonnegativity of the Kullback-Leibler divergence, sketched below.
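As a quick sketch of that second application (for densities $p$ and $q$ of $P$ and $Q$): since $-\log$ is convex, Jensen's Inequality applied to the likelihood ratio gives

$$D_{\mathrm{KL}}(P \,\|\, Q) = E_P\left[-\log \frac{q(X)}{p(X)}\right] \geq -\log E_P\left[\frac{q(X)}{p(X)}\right] = -\log \int_{\{p > 0\}} q(x)\, dx \geq -\log 1 = 0.$$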
Hölder’s Inequality
For $p, q \in (1,\infty)$ with $\frac{1}{p} + \frac{1}{q} = 1$ and any two random variables $X$ and $Y$,

$$E|XY| \leq \left(E|X|^p\right)^{1/p} \left(E|Y|^q\right)^{1/q}.$$
- A special case of Hölder’s Inequality when $p=q=2$ is the Cauchy-Schwarz inequality: $E|XY| \leq \sqrt{E[X^2]\, E[Y^2]}$.
- By applying the Cauchy-Schwarz inequality to the centered variables $X - E[X]$ and $Y - E[Y]$ (spelled out below), we have the covariance inequality: $\mathrm{Cov}(X, Y)^2 \leq \mathrm{Var}(X)\, \mathrm{Var}(Y)$.
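Spelling out that last bullet, write $\mu_X = E[X]$ and $\mu_Y = E[Y]$ and apply the Cauchy-Schwarz inequality to the centered variables:

$$\mathrm{Cov}(X, Y)^2 = \left(E\left[(X - \mu_X)(Y - \mu_Y)\right]\right)^2 \leq E\left[(X - \mu_X)^2\right] E\left[(Y - \mu_Y)^2\right] = \mathrm{Var}(X)\,\mathrm{Var}(Y).$$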
Minkowski’s Inequality
For two random variables $X$ and $Y$ and a constant $1 < p < \infty$,

$$\left(E|X+Y|^p\right)^{1/p} \leq \left(E|X|^p\right)^{1/p} + \left(E|Y|^p\right)^{1/p},$$

which is the triangle inequality for the $L^p$ norm $\|X\|_p = \left(E|X|^p\right)^{1/p}$.
Association Inequality
Suppose we have a random variable $X$ and functions $f, g: \mathbb{R} \to \mathbb{R}$. Assuming that all expectations are well-defined, we have:

$$E[f(X)g(X)] \geq E[f(X)]\, E[g(X)] \quad \text{if } f \text{ and } g \text{ are both nondecreasing or both nonincreasing},$$

$$E[f(X)g(X)] \leq E[f(X)]\, E[g(X)] \quad \text{if one of them is nondecreasing and the other is nonincreasing}.$$
- By a direct application of the association inequality with $f(x) = x^p$ and $g(x) = x^q$ for a nonnegative random variable $X$ and $p, q > 0$, we have $E[X^{p+q}] \geq E[X^p]\, E[X^q]$.
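A quick numerical sanity check of that consequence (a minimal sketch, assuming NumPy; the choice of $X \sim \mathrm{Exp}(1)$ with $p = 1$, $q = 2$ is my own, where the exact moments are $E[X^3] = 6$ and $E[X]\,E[X^2] = 2$):

```python
import numpy as np

# Monte Carlo check of E[X^(p+q)] >= E[X^p] * E[X^q] for X ~ Exp(1), p = 1, q = 2.
# Exact values: E[X^3] = 3! = 6 and E[X] * E[X^2] = 1 * 2 = 2.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)

lhs = np.mean(x ** 3)              # estimate of E[X^(p+q)]
rhs = np.mean(x) * np.mean(x ** 2)  # estimate of E[X^p] * E[X^q]
print(f"E[X^3] ~ {lhs:.3f} >= E[X] * E[X^2] ~ {rhs:.3f}: {lhs >= rhs}")
```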
2.2 Inequalities for variances
Efron-Stein Inequality
Suppose that $X_1, \dots, X_n, X_1^\prime, \dots, X_n^\prime$ are independent with $X_i$ and $X_i^\prime$ having the same distribution for all $i \in \{1,\dots,n\}$. Let $X = (X_1, \dots, X_n)$ and $X^{(i)} = (X_1, \dots, X_{i-1}, X_i^\prime, X_{i+1}, \dots, X_n)$. Then, for any function $f: \mathbb{R}^n \to \mathbb{R}$ with $E[f(X)^2] < \infty$,

$$\mathrm{Var}\left(f(X)\right) \leq \frac{1}{2} \sum_{i=1}^n E\left[\left(f(X) - f(X^{(i)})\right)^2\right].$$
- As an example, applying the Efron-Stein Inequality to the sample mean $f(X) = \bar X_n = \frac{1}{n}\sum_{i=1}^n X_i$ gives $f(X) - f(X^{(i)}) = (X_i - X_i^\prime)/n$, so the bound becomes $\frac{1}{2}\sum_{i=1}^n E[(X_i - X_i^\prime)^2]/n^2 = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}(X_i)$, which is exactly $\mathrm{Var}(\bar X_n)$; the bound is tight in this case. A simulated check for a nonlinear function follows below.
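To see the inequality at work for a function where it is not tight, here is a minimal Monte Carlo sketch (my own illustration, assuming NumPy; the choice $f(x) = \max_i x_i$ with $X_i \sim \mathrm{Unif}(0,1)$ is arbitrary):

```python
import numpy as np

# Monte Carlo check of the Efron-Stein inequality for f(x) = max(x_1, ..., x_n)
# with X_i ~ Unif(0, 1) i.i.d.
rng = np.random.default_rng(0)
n, trials = 10, 200_000

x = rng.uniform(size=(trials, n))        # samples of X
x_prime = rng.uniform(size=(trials, n))  # independent copies X'

f_x = x.max(axis=1)
var_f = f_x.var()  # left-hand side: Var(f(X))

# Right-hand side: (1/2) * sum_i E[(f(X) - f(X^{(i)}))^2]
rhs = 0.0
for i in range(n):
    x_i = x.copy()
    x_i[:, i] = x_prime[:, i]            # replace coordinate i with X_i'
    rhs += 0.5 * np.mean((f_x - x_i.max(axis=1)) ** 2)

print(f"Var(f(X)) ~ {var_f:.5f} <= Efron-Stein bound ~ {rhs:.5f}")
```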
2.3 Inequalities for probabilities
Markov’s Inequality
For any $t > 0$ and integrable nonnegative random variable $X$, we have:

$$P(X \geq t) \leq \frac{E[X]}{t}.$$
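The standard one-line proof is worth keeping in mind: bound the indicator of the event by $X/t$ on $\{X \geq t\}$ and use the nonnegativity of $X$,

$$P(X \geq t) = E\left[\mathbf{1}\{X \geq t\}\right] \leq E\left[\frac{X}{t}\,\mathbf{1}\{X \geq t\}\right] \leq \frac{E[X]}{t}.$$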
Markov’s Inequality typically gives the weakest bound, since it uses only the first moment; yet it cannot be improved in general unless we put some restrictions on the shape of the distribution. For example, if we assume that the density is non-increasing on $[0, \infty)$ (i.e. $f_X^\prime(x) \leq 0$ for all $x \geq 0$), we have:

$$P(X \geq t) \leq \frac{E[X]}{2t}.$$
Chebyshev’s Inequality
Let $X$ be a random variable with finite mean $\mu$ and finite variance $\sigma^2 > 0$. Then, for any $t > 0$,

$$P(|X - \mu| \geq t) \leq \frac{\sigma^2}{t^2}.$$

- The proof is simple: square both sides of the inequality inside the probability and apply Markov’s inequality to the nonnegative random variable $(X-\mu)^2$, as written out below.
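Written out, the argument from the bullet above is:

$$P(|X - \mu| \geq t) = P\left((X - \mu)^2 \geq t^2\right) \leq \frac{E\left[(X - \mu)^2\right]}{t^2} = \frac{\sigma^2}{t^2}.$$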
Hoeffding’s Inequality
Before digging into Hoeffding’s Inequality, let’s recall Hoeffding’s lemma: for a random variable $X$ with $E[X] = 0$ and $a \leq X \leq b$ almost surely,

$$E\left[e^{sX}\right] \leq \exp\left(\frac{s^2 (b-a)^2}{8}\right) \quad \text{for all } s \in \mathbb{R}.$$

Based on this lemma, suppose $X_1, \dots, X_n$ are independent bounded random variables with $X_i \in [a_i, b_i]$ for each $i$. Then, for any $t > 0$, we have

$$P\left(\left|\sum_{i=1}^n \left(X_i - E[X_i]\right)\right| \geq t\right) \leq 2\exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$
Note that Chebyshev’s Inequality only gives a bound that decays polynomially in the deviation $t$, while Hoeffding’s bound decays exponentially; equivalently, at a fixed confidence level $1 - \delta$, the deviation guaranteed by Chebyshev grows polynomially in $1/\delta$, whereas the one from Hoeffding grows only logarithmically. To see this, suppose that we have i.i.d. Rademacher random variables, so that $E[X] = 0$, $\mathrm{Var}(X) = 1$, and $X_i \in [-1, 1]$, and let’s apply both inequalities to the sample mean $\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then we have:

$$P\left(|\bar X_n| \geq t\right) \leq \frac{1}{n t^2} \quad \text{(Chebyshev)}, \qquad P\left(|\bar X_n| \geq t\right) \leq 2\exp\left(-\frac{n t^2}{2}\right) \quad \text{(Hoeffding)}.$$

Setting each bound equal to $\delta$ and solving for $t$, with probability at least $1 - \delta$ the sample mean satisfies

$$|\bar X_n| \leq \sqrt{\frac{1}{n\delta}} \quad \text{(Chebyshev)}, \qquad |\bar X_n| \leq \sqrt{\frac{2\log(2/\delta)}{n}} \quad \text{(Hoeffding)}.$$

Thus, suppose we assume $\delta = 0.001$. Then with probability 0.999, the sample mean is within roughly $31.6/\sqrt{n}$ of zero under Chebyshev’s bound, but within roughly $3.9/\sqrt{n}$ under Hoeffding’s bound.
Now it is clear that Hoeffding’s inequality is a huge improvement over Chebyshev’s bound.
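To make the comparison concrete, here is a minimal simulation sketch (my own illustration, assuming NumPy; the values of $n$, $t$, the number of trials, and the seed are arbitrary choices) that estimates the tail probability of the Rademacher sample mean and compares it with the two bounds above:

```python
import numpy as np

# Compare the empirical tail P(|X_bar| >= t) for Rademacher sample means
# with the Chebyshev bound 1/(n t^2) and the Hoeffding bound 2 exp(-n t^2 / 2).
rng = np.random.default_rng(0)
n, trials, t = 1000, 50_000, 0.1

x = 2 * rng.integers(0, 2, size=(trials, n), dtype=np.int8) - 1  # i.i.d. Rademacher samples
means = x.mean(axis=1)

empirical = np.mean(np.abs(means) >= t)
chebyshev = 1 / (n * t**2)
hoeffding = 2 * np.exp(-n * t**2 / 2)

print(f"empirical tail  : {empirical:.5f}")   # around 0.0016 for these settings
print(f"Chebyshev bound : {chebyshev:.5f}")   # 0.1
print(f"Hoeffding bound : {hoeffding:.5f}")   # about 0.0135
```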
Dvoretzky-Kiefer-Wolfowitz (DKW) Inequality
The DKW inequality is useful when we want to upper bound the deviation of an empirical CDF from the true CDF.
Specifically, letting $\hat F_n$ be the empirical CDF of an i.i.d. sample $X_1, \dots, X_n$ from $F$, there exists a finite positive constant $C$ such that, for every $t > 0$,

$$P\left(\sup_{x \in \mathbb{R}} \left|\hat F_n(x) - F(x)\right| > t\right) \leq C e^{-2 n t^2}.$$
- The best possible constant is known to be $C = 2$, due to Massart (1990).
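A common use of the DKW inequality with $C = 2$ is a distribution-free confidence band for the CDF: setting $2e^{-2n\varepsilon^2} = \alpha$ gives $\varepsilon = \sqrt{\log(2/\alpha)/(2n)}$. Here is a minimal sketch (my own illustration, assuming NumPy; the helper name `dkw_band` and the example data are hypothetical choices) that builds such a band from a sample:

```python
import numpy as np

def dkw_band(sample, alpha=0.05):
    """Return sorted sample points and a (1 - alpha) DKW confidence band for the CDF."""
    x = np.sort(np.asarray(sample))
    n = x.size
    ecdf = np.arange(1, n + 1) / n                 # empirical CDF at the order statistics
    eps = np.sqrt(np.log(2 / alpha) / (2 * n))     # DKW half-width with Massart's C = 2
    lower = np.clip(ecdf - eps, 0.0, 1.0)
    upper = np.clip(ecdf + eps, 0.0, 1.0)
    return x, lower, upper

# Example: a 95% band from 500 standard normal draws.
rng = np.random.default_rng(0)
x, lo, hi = dkw_band(rng.normal(size=500))
print(f"band half-width: {np.sqrt(np.log(2 / 0.05) / (2 * 500)):.4f}")
```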
Reference
- Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Pacific Grove, CA: Thomson Learning.