[MathStat] 3. Large Sample Theory
This post is an overall summary of Chapter 5 of the textbook Statistical Inference by Casella and Berger.
Table of Contents
3.1. Convergence of Random Variables
- Convergence of Sequences
- Almost-sure convergence
- Convergence in probability
- Convergence in quadratic mean
- Convergence in distribution
- Relationships among various convergences
- Slutsky’s Theorem
3.2. Central Limit Theorem
- Classical CLT
- Lyapunov CLT
- Multivariate CLT
3.3. Delta Methods
- First-order delta method
- Second-order delta method
- Multivariate delta method
3.4. Supplements
- Dominated Convergence Theorem
- Continuous Mapping Theorem
- Stochastic Order Notation
3. Large Sample Theory
In this section, we’ll review some of the fundamental concepts and techniques in asymptotic statistics.
Statistical inference often requires knowledge of the sampling distribution of a statistic, which frequently cannot be derived in closed form.
One of the main goals of asymptotic statistics is to address this issue by studying the behavior of the statistic as the sample size grows without bound (i.e. $n \to \infty$). In other words, we seek the limiting distribution of a given statistic, which is often much easier to handle than the exact distribution, yet accurate enough to be useful in practice.
3.1 Convergence of Random Variables
Convergence of Sequences
As a building block, let’s briefly review the concept of convergence in deterministic real numbers.
We say that a sequence of real numbers $a_1, a_2, \dots $ converges to a fixed real number $a$, if for every positive number $\epsilon$ there exists a natural number $N(\epsilon)$ such that for all $n \geq N(\epsilon), \quad|a_n - a| < \epsilon$.
If this is the case, then we call $a$ the limit of the sequence and write $\lim_{n \to \infty} a_n = a$.
Now we focus on how to extend this notion to "sequences" of random variables. Specifically, we will see the following modes of convergence:
- almost sure convergence
- convergence in probability
- convergence in quadratic mean
- convergence in distribution
Almost-sure convergence
<Definition>
A sequence of random variables $X_1, X_2, \dots$ converges almost surely to a random variable $X$, written $X_n \overset{a.s.}{\to} X$, if
$$P\left(\lim_{n \to \infty} X_n = X\right) = 1.$$
Conceptually, this can be thought of as follows: there may be some "exceptional" set of outcomes on which $X_n$ fails to converge to $X$, but this exceptional set has probability 0.
Convergence in probability
A sequence of random variables $X_1, X_2, \dots$ converges in probability to a random variable $X$, written $X_n \overset{p}{\to} X$, if for every $\epsilon > 0$, we have that
$$\lim_{n \to \infty} P(|X_n - X| > \epsilon) = 0.$$
Conceptually, convergence in probability implies that as $n$ gets large, the distribution of $X_n$ gets more peaked around the value of convergence. Hence the random variable converges in a “probabilistic” sense.
A famous example of convergence in probability is the weak law of large numbers (WLLN). Suppose that $Y_1, Y_2, \dots$ are i.i.d. with $E[Y_i] = \mu$ and $Var(Y_i) = \sigma^2 < \infty$; then
$$\bar Y_n = \frac{1}{n} \sum_{i=1}^n Y_i \overset{p}{\to} \mu.$$
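As a quick numerical sketch of the WLLN (the exponential distribution, seed, and sample sizes here are arbitrary choices for illustration), we can estimate $P(|\bar Y_n - \mu| > \epsilon)$ by Monte Carlo and watch it shrink as $n$ grows:

```python
import numpy as np

# WLLN simulation: for Exponential draws with mean mu = 2, the probability
# P(|Ybar_n - mu| > eps) should shrink toward 0 as n grows.
rng = np.random.default_rng(0)
mu, eps = 2.0, 0.1
probs = []
for n in [100, 1_000, 10_000]:
    # 1,000 replications of the sample mean at each n
    means = rng.exponential(scale=mu, size=(1_000, n)).mean(axis=1)
    probs.append(np.mean(np.abs(means - mu) > eps))
print(probs)
```

The estimated probabilities decrease toward 0, exactly as convergence in probability requires.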
Convergence in quadratic mean
<Definition>
We say that a sequence $X_1, X_2, \dots$ converges to $X$ in quadratic mean, written $X_n \overset{qm}{\to} X$, if
$$\lim_{n \to \infty} E\left[(X_n - X)^2\right] = 0.$$
This convergence is often used to prove convergence in probability as it is a stronger condition.
Convergence in distribution
<Definition>
We say that a sequence $X_1, X_2, \dots$ converges to $X$ in distribution, written $X_n \overset{d}{\to} X$, if
$$\lim_{n \to \infty} F_{X_n}(t) = F_X(t)$$
at every point $t$ where the CDF $F_X$ is continuous.
This is by far the weakest of the four forms of convergence.
A fundamental example of convergence in distribution is the Central Limit Theorem (CLT) and we will see this in a second.
Relationships among various convergences
1. Convergence in probability does not imply almost sure convergence (though the converse implication does hold).
2. Convergence in quadratic mean implies convergence in probability (the reverse is not true). Indeed, suppose that $X_1, X_2, \dots$ converges in quadratic mean to $X$. Then, by Markov's inequality,
$$P(|X_n - X| > \epsilon) \leq \frac{E\left[(X_n - X)^2\right]}{\epsilon^2} \to 0.$$
3. Convergence in probability implies convergence in distribution (the reverse is not true). Although the mathematical detail is omitted here, this statement can be proved with the idea of "trapping" the CDF of $X_n$ by the CDF of $X$ evaluated on an interval of length converging to 0.
Slutsky’s Theorem
<Definition>
(Slutsky's theorem) Suppose that $X_n \overset{d}{\to} X$ and $Y_n \overset{p}{\to} c$ for a constant $c$. Then:
$$X_n + Y_n \overset{d}{\to} X + c, \qquad X_n Y_n \overset{d}{\to} cX, \qquad X_n / Y_n \overset{d}{\to} X / c \quad (c \neq 0).$$
Note that in general, $X_n \overset{d}{\to} X$ and $Y_n \overset{d}{\to} Y$ do not guarantee that the sum and product also converge; the constant limit of $Y_n$ is essential.
3.2 Central Limit Theorem
The central limit theorem is one of the most famous and important examples of convergence in distribution.
Classical CLT
<Definition>
Let $X_1, \dots, X_n$ be an i.i.d. sequence of random variables from an arbitrary probability distribution with finite mean $\mu$ and variance $\sigma^2$. Then for the sample mean $\bar X = \sum_{i=1}^n X_i / n$,
$$\frac{\sqrt{n}(\bar X - \mu)}{\sigma} \overset{d}{\to} N(0, 1).$$
The variance $\sigma^2$ can be estimated by the sample variance $\hat \sigma^2$ and the CLT still holds (CLT with estimated variance).
Note that the CLT is remarkably general, in that it applies to any distribution with finite mean and variance.
Consider performing a statistical test for the unknown mean $\mu$. To find a confidence interval, we have to know the underlying distribution of the test statistic $T$. Since the CLT guarantees a normal approximation for the sample mean (or an equivalent summation), an asymptotic $1 - \alpha$ confidence interval is given by:
$$\bar X \pm z_{\alpha/2} \frac{\hat \sigma}{\sqrt{n}}.$$
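We can check this interval empirically (a minimal sketch; the exponential population, seed, nominal level 95%, and sample size are arbitrary assumptions). Even on skewed data, the empirical coverage of $\bar X \pm 1.96\,\hat\sigma/\sqrt{n}$ should approach 0.95:

```python
import numpy as np

# Coverage check of the asymptotic 95% CI on skewed (exponential) data.
rng = np.random.default_rng(1)
mu, n, reps = 1.0, 500, 5_000
x = rng.exponential(scale=mu, size=(reps, n))
half = 1.96 * x.std(axis=1, ddof=1) / np.sqrt(n)  # half-width with estimated sd
covered = np.abs(x.mean(axis=1) - mu) < half
coverage = covered.mean()
print(coverage)  # close to 0.95
```

Note that the estimated standard deviation is justified by Slutsky's theorem: $\hat\sigma \overset{p}{\to} \sigma$, so the studentized statistic still converges to $N(0,1)$.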
<Proof>
The CLT can be proved by using properties of mgfs. Specifically, we are going to use the following facts:
1. If the mgfs of a sequence of random variables converge, for all $t$ in a neighborhood of 0, to the mgf of a random variable $X$, then the sequence converges in distribution to $X$.
2. The mgf of a sum of independent random variables is the product of their individual mgfs.
3. A second-order Taylor expansion of the mgf around 0, together with the limit $\lim_{n \to \infty} (1 + a_n/n)^n = e^a$ whenever $a_n \to a$.
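As a sketch of the key step, write $Z_i = (X_i - \mu)/\sigma$ for the standardized variables, with common mgf $M_Z$ satisfying $M_Z(0) = 1$, $M_Z^\prime(0) = 0$, and $M_Z^{\prime\prime}(0) = 1$. Then

```latex
M_{\sqrt{n}\,\bar Z_n}(t)
  = \left[ M_Z\!\left( \frac{t}{\sqrt{n}} \right) \right]^n
  = \left[ 1 + \frac{t^2}{2n} + o\!\left( \frac{1}{n} \right) \right]^n
  \;\longrightarrow\; e^{t^2/2},
```

where the second equality is the Taylor expansion of $M_Z$ around 0, and $e^{t^2/2}$ is the mgf of the standard normal.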
Thus, we have shown that the mgf converges to that of the standard normal, which completes the proof.
Lyapunov CLT
Suppose that $X_1, X_2, \dots$ are independent, but not necessarily identically distributed.
The CLT for the sum can still hold, but it requires some extra conditions to ensure that one (or a small number) of abnormal random variables does not dominate the sum.
In this sense, the Lyapunov central limit theorem guarantees the normal approximation for these more general cases.
<Definition>
Suppose $X_1, \dots, X_n$ are independent with $E[X_i] = \mu_i$ and $Var(X_i) = \sigma_i^2 < \infty$. Let:
$$s_n^2 = \sum_{i=1}^n \sigma_i^2.$$
Suppose that the following Lyapunov condition is satisfied for some $\delta > 0$:
$$\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^n E\left[ |X_i - \mu_i|^{2+\delta} \right] = 0.$$
Then, it holds that:
$$\frac{1}{s_n} \sum_{i=1}^n (X_i - \mu_i) \overset{d}{\to} N(0, 1).$$
Intuitively, what can go wrong for non-identically distributed random variables is that a single random variable may dominate the sum, so we are not really averaging diverse information across random variables. The Lyapunov condition is added to rule out such extreme cases.
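Here is a small simulation in the spirit of the Lyapunov CLT (an illustrative sketch; the uniform distributions and their range of scales are arbitrary assumptions). The summands are independent but not identically distributed, yet no single term dominates, and the standardized sum still looks standard normal:

```python
import numpy as np

# Independent but non-identical summands: X_i ~ Uniform(0, a_i) with
# varying scales a_i. The standardized sum (S_n - E S_n) / s_n should
# still be approximately N(0, 1).
rng = np.random.default_rng(2)
n, reps = 1_000, 5_000
a = np.linspace(1.0, 10.0, n)              # per-term scales (assumed setup)
x = rng.uniform(0.0, 1.0, size=(reps, n)) * a
mean_sum = (a / 2).sum()                   # E[Uniform(0, a)] = a / 2
s_n = np.sqrt((a**2 / 12).sum())           # Var[Uniform(0, a)] = a^2 / 12
z = (x.sum(axis=1) - mean_sum) / s_n
frac = np.mean(np.abs(z) < 1.96)           # should be close to 0.95
print(frac)
```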
Multivariate CLT
As its name suggests, the multivariate central limit theorem is an extension of the classical CLT to random vectors.
<Definition>
If $X_1, \dots, X_n$ are i.i.d. random vectors with mean vector $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$, then
$$\sqrt{n}(\bar X - \mu) \overset{d}{\to} N(0, \Sigma).$$
Although the dimension $d$ can be sufficiently larger than 1, it is taken to be fixed as $n \to \infty$.
For the cases where $d$ is also allowed to grow with the sample size $n$, such high-dimensional CLTs are considerably more complicated and remain an active topic of research.
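A quick sanity check in $d = 2$ (an illustrative sketch; the bivariate construction from exponential variables is an arbitrary assumption): the empirical covariance of $\sqrt{n}(\bar X - \mu)$ should match $\Sigma$ for large $n$.

```python
import numpy as np

# Multivariate CLT check in d = 2 with dependent coordinates (U, U + V),
# U, V ~ Exp(1) independent, so mu = (1, 2) and Sigma = [[1, 1], [1, 2]].
rng = np.random.default_rng(3)
n, reps = 1_000, 2_000
u = rng.exponential(size=(reps, n))
v = rng.exponential(size=(reps, n))
xbar = np.stack([u.mean(axis=1), (u + v).mean(axis=1)], axis=1)
mu = np.array([1.0, 2.0])
z = np.sqrt(n) * (xbar - mu)
sigma_hat = np.cov(z.T)
print(sigma_hat)  # close to [[1, 1], [1, 2]]
```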
3.3 Delta Method
Building on top of the CLT, a natural question is what happens to a function $g$ of a statistic that converges in distribution to a normal (e.g., what is the limiting distribution of $g(\bar X)$?).
First-order delta method
By applying a Taylor expansion together with Slutsky's theorem and the continuous mapping theorem (see section 3.4 of this post), we can derive the following (first-order) delta method: if $\sqrt{n}(X_n - \mu) \overset{d}{\to} N(0, \sigma^2)$ and $g$ is differentiable at $\mu$ with $g^\prime(\mu) \neq 0$, then
$$\sqrt{n}\left( g(X_n) - g(\mu) \right) \overset{d}{\to} N\left( 0, \sigma^2 \left[ g^\prime(\mu) \right]^2 \right).$$
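As a numerical illustration (a sketch; the choice $g(x) = x^2$ and the exponential population are arbitrary assumptions): with $\mu = 2$ and $\sigma^2 = 4$, the delta method predicts that $\sqrt{n}(\bar X^2 - \mu^2)$ has asymptotic variance $\sigma^2 [g^\prime(\mu)]^2 = 4 \cdot 16 = 64$.

```python
import numpy as np

# First-order delta method with g(x) = x^2 on Exponential data (mean mu = 2,
# variance sigma^2 = 4): sqrt(n) * (Xbar^2 - mu^2) ->d N(0, 4 * (2*mu)^2) = N(0, 64).
rng = np.random.default_rng(4)
mu, n, reps = 2.0, 1_000, 5_000
xbar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar**2 - mu**2)
var_hat = z.var()
print(var_hat)  # close to 64
```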
Second-order delta method
However, there might be some instances where $g^\prime(\mu) = 0$. For these cases, provided $g^{\prime\prime}(\mu) \neq 0$, the (second-order) delta method gives:
$$n\left( g(X_n) - g(\mu) \right) \overset{d}{\to} \sigma^2 \, \frac{g^{\prime\prime}(\mu)}{2} \, \chi^2_1.$$
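A quick check (a sketch with assumed toy choices): take $g(x) = x^2$ and standard normal data with $\mu = 0$, so $g^\prime(\mu) = 0$, $\sigma^2 = 1$, and $g^{\prime\prime}(\mu)/2 = 1$. Then $n \bar X^2$ should be (here, exactly) $\chi^2_1$ distributed:

```python
import numpy as np

# Second-order delta method: g(x) = x^2 at mu = 0 with N(0,1) data,
# so n * (g(Xbar) - g(mu)) = n * Xbar^2 ~ chi-squared with 1 df.
rng = np.random.default_rng(5)
n, reps = 1_000, 5_000
xbar = rng.normal(0.0, 1.0, size=(reps, n)).mean(axis=1)
w = n * xbar**2
frac = np.mean(w < 3.841)  # 3.841 is the chi2_1 95th percentile
print(frac)  # close to 0.95
```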
Multivariate delta method
Suppose we have i.i.d. random vectors $X_1, \dots, X_n \in \mathbb{R}^d$ with mean $\mu$ and covariance $\Sigma$, and $g: \mathbb{R}^d \to \mathbb{R}$ is a continuously differentiable function with $\nabla g(\mu) \neq 0$.
Then,
$$\sqrt{n}\left( g(\bar X) - g(\mu) \right) \overset{d}{\to} N\left( 0, \nabla g(\mu)^\top \, \Sigma \, \nabla g(\mu) \right).$$
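To illustrate (a sketch; the ratio function and the two independent exponential populations are arbitrary assumptions): for $g(x, y) = x/y$ at $\mu = (1, 2)$ with $\Sigma = \mathrm{diag}(1, 4)$, the gradient is $(1/2, -1/4)$ and the asymptotic variance is $(1/2)^2 \cdot 1 + (1/4)^2 \cdot 4 = 0.5$.

```python
import numpy as np

# Multivariate delta method for g(x, y) = x / y: asymptotic variance
# is grad^T Sigma grad = 0.25 * 1 + 0.0625 * 4 = 0.5 at mu = (1, 2).
rng = np.random.default_rng(6)
n, reps = 1_000, 4_000
x = rng.exponential(scale=1.0, size=(reps, n))  # mean 1, variance 1
y = rng.exponential(scale=2.0, size=(reps, n))  # mean 2, variance 4
ratio = x.mean(axis=1) / y.mean(axis=1)
z = np.sqrt(n) * (ratio - 0.5)
print(z.var())  # close to 0.5
```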
3.4 Supplements
Dominated Convergence Theorem
If $\{ f_n : \mathbb{R} \to \mathbb{R} \}$ is a sequence of measurable functions which converge pointwise almost everywhere to $f$, and if there exists an integrable function $g$ such that $|f_n(x)| \leq |g(x)|$ for all $n$ and for all $x$, then $f$ is integrable and
$$\lim_{n \to \infty} \int f_n \, dx = \int f \, dx.$$
Continuous Mapping Theorem
If a sequence $X_1, X_2, \dots$ converges in probability to $X$, then for any continuous function $h$:
$$h(X_n) \overset{p}{\to} h(X).$$
(The analogous statements hold for almost sure convergence and convergence in distribution.) A direct consequence of this theorem in terms of convergence in probability is: if $X_n \overset{p}{\to} X$ and $Y_n \overset{p}{\to} Y$, then
$$X_n + Y_n \overset{p}{\to} X + Y \quad \text{and} \quad X_n Y_n \overset{p}{\to} XY.$$
Stochastic Order Notation
The stochastic order notation is defined as follows:
- $X_n = o_P(a_n)$ means that $X_n / a_n \overset{p}{\to} 0$.
- $X_n = O_P(a_n)$ means that $X_n / a_n$ is bounded in probability: for every $\epsilon > 0$ there exist $M$ and $N$ such that $P(|X_n / a_n| > M) < \epsilon$ for all $n \geq N$.

Typical use cases of the stochastic order notation are the WLLN and the CLT: the WLLN says that $\bar X_n - \mu = o_P(1)$, while the CLT implies $\bar X_n - \mu = O_P(n^{-1/2})$.
Reference
- Casella, G., & Berger, R. L. (2002). *Statistical Inference* (2nd ed.). Pacific Grove, CA: Thomson Learning.