
Laplace’s Central “Limit” Theorem

Sums of independent random variables played an important role in Laplace’s probabilistic work from the very beginning. Especially important were the problem of calculating the probability distribution of the sum of angles of inclination, which were assumed to be determined randomly, and the related problem of calculating the probabilities of deviations between the arithmetic mean of data afflicted by observational errors and the underlying “true value.” In one of his first published papers, Laplace had already set out to determine the probability that the sum of the angles of inclination of comet orbits (or, equivalently, their arithmetic mean) lies within given limits. He assumed that all angles, which had to be measured against the ecliptic, were uniformly distributed between $0^{\circ}$ and $90^{\circ}$ (and tacitly presupposed that all angles were stochastically independent). In his 1781 paper, Laplace introduced a general, though very intricate, method based on convolutions of density functions for determining exactly the probability that a sum of independent random variables (“quantités variables,” as Laplace put it) lies within given limits. In the simplest case, each of the $n$ variables has the same rectangular distribution between $0$ and $h$. For the probability $P$ that the sum of these variables lies between $a$ and $b$, with $0 \leq a<b \leq n h$, Laplace obtained (in modern notation)
$$P=\frac{1}{h^n n!}\left(\sum_{i=0}^N\binom{n}{i}(-1)^i(b-i h)^n-\sum_{i=0}^M\binom{n}{i}(-1)^i(a-i h)^n\right),$$
where $N=\min \left(n,\left[\frac{b}{h}\right]\right)$ and $M=\min \left(n,\left[\frac{a}{h}\right]\right)$. Formulae of this kind were too complicated for direct numerical evaluation once the number of random variables exceeded a relatively small value. The arithmetic mean of the observed angles of inclination of the 63 comets then known was $46^{\circ} 16^{\prime}$. With this formula alone, Laplace was therefore unable to test the hypothesis that the comets’ planes of motion arose at “random.” At this stage of his mathematical work, Laplace could not yet develop usable approximations.
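Laplace’s closed form is easy to evaluate for small $n$. The following sketch is an illustrative modern check, not Laplace’s own computation; the function name and the test values are chosen for the example. It implements the formula and compares it with a seeded Monte Carlo estimate:

```python
import math
import random

def laplace_sum_prob(n, h, a, b):
    """Laplace's closed form for P(a <= X_1 + ... + X_n <= b),
    with X_i i.i.d. uniform on (0, h)."""
    def partial(u, limit):
        return sum((-1) ** i * math.comb(n, i) * (u - i * h) ** n
                   for i in range(limit + 1))
    N = min(n, math.floor(b / h))
    M = min(n, math.floor(a / h))
    return (partial(b, N) - partial(a, M)) / (h ** n * math.factorial(n))

# Symmetric sanity check: P(0 <= X_1 + X_2 <= 1) = 1/2 for uniforms on (0, 1).
print(laplace_sum_prob(2, 1.0, 0.0, 1.0))

# Agreement with a seeded Monte Carlo estimate (parameters chosen arbitrarily).
random.seed(0)
n, h, a, b = 5, 2.0, 4.0, 6.0
mc = sum(a <= sum(random.uniform(0, h) for _ in range(n)) <= b
         for _ in range(100_000)) / 100_000
print(laplace_sum_prob(n, h, a, b), mc)
```

The nested sums make the cost grow only linearly in $n$, but the alternating terms $(b-ih)^n$ become astronomically large for big $n$, which is precisely why the formula was numerically useless to Laplace for, say, 63 comets.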

From
$$\frac{1}{2 \pi} \int_{-\pi}^\pi \mathrm{e}^{-\mathrm{i} t x} \mathrm{e}^{\mathrm{i} s x} d x=\delta_{t s} \quad(t, s \in \mathbb{Z})$$
it follows that
$$P(j)=\frac{1}{2 \pi} \int_{-\pi}^\pi \mathrm{e}^{-\mathrm{i} j x}\left[\sum_{k=-m}^m p_k \mathrm{e}^{\mathrm{i} k x}\right]^n d x .$$
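This inversion formula can be checked numerically. The sketch below is my own illustrative example (with $p_k=1/3$ on $k\in\{-1,0,1\}$ and $n=3$): an equidistant midpoint rule reproduces the exact probability $P(0)=7/27$ obtained by counting the $27$ equally likely triples that sum to zero:

```python
import cmath
import math

def P_inversion(j, p, n, M=4096):
    """Midpoint Riemann sum for (1/2pi) * integral over [-pi, pi] of
    e^{-ijx} * (sum_k p_k e^{ikx})^n dx.  The equidistant rule is exact
    here (up to rounding) because the integrand is a trigonometric
    polynomial of degree far below M."""
    dx = 2 * math.pi / M
    total = 0.0
    for s in range(M):
        x = -math.pi + (s + 0.5) * dx
        phi = sum(pk * cmath.exp(1j * k * x) for k, pk in p.items())
        total += (cmath.exp(-1j * j * x) * phi ** n).real
    return total * dx / (2 * math.pi)

# p_k = 1/3 on k in {-1, 0, 1}, n = 3: direct counting of the triples
# (0,0,0) and the six permutations of (-1,0,1) gives P(0) = 7/27.
p = {-1: 1/3, 0: 1/3, 1: 1/3}
print(P_inversion(0, p, 3))
```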
The last integral above was at least formally accessible to Laplace’s method of approximation. A certain modification was necessary, however, since Laplace did not expand the whole integrand around its maximum at $x=0$, but only the $n$th power of the characteristic function (the bracketed sum). Expanding $\mathrm{e}^{\mathrm{i} k x}$ into powers of $x$ gives

$$\begin{aligned} P(j) &=\frac{1}{2 \pi} \int_{-\pi}^\pi \mathrm{e}^{-\mathrm{i} j x}\left[\sum_{k=-m}^m p_k \mathrm{e}^{\mathrm{i} k x}\right]^n d x \\ &=\frac{1}{2 \pi} \int_{-\pi}^\pi \mathrm{e}^{-\mathrm{i} j x}\left[\sum_{k=-m}^m p_k\left(1+\mathrm{i} k x-\frac{k^2 x^2}{2}-\frac{\mathrm{i} k^3 x^3}{6}+\cdots\right)\right]^n d x. \end{aligned}$$
Taking into consideration that $\sum_{k=-m}^m p_k k=0$, and with the substitution $m^2 \sigma^2=$ $\sum_{k=-m}^m p_k k^2$, we obtain
$$P(j)=\frac{1}{2 \pi} \int_{-\pi}^\pi \mathrm{e}^{-\mathrm{i} j x}\left[1-\frac{m^2 \sigma^2 x^2}{2}-\mathrm{i} A x^3+\cdots\right]^n d x,$$
where $A$ is a constant depending on $\sum_{k=-m}^m p_k k^3$. The formal expansion of
$$\log \left[1-\frac{m^2 \sigma^2 x^2}{2}-\mathrm{i} A x^3+\cdots\right]^n=: \log z$$
into a series of powers of $x$ leads to
$$\log z=-\frac{m^2 \sigma^2 n x^2}{2}-\mathrm{i} A n x^3+\cdots,$$
and therefrom to
$$z=\mathrm{e}^{-\frac{m^2 \sigma^2 n x^2}{2}-\mathrm{i} A n x^3+\cdots}=\mathrm{e}^{-\frac{m^2 \sigma^2 n x^2}{2}}\left(1-\mathrm{i} A n x^3+\cdots\right) .$$

After transforming the variable of integration according to $x=\frac{y}{\sqrt{n}}$, the result is
$$P(j)=\frac{1}{2 \pi \sqrt{n}} \int_{-\pi \sqrt{n}}^{\pi \sqrt{n}} \mathrm{e}^{-\mathrm{i} j \frac{y}{\sqrt{n}}} \mathrm{e}^{-\frac{m^2 \sigma^2 y^2}{2}}\left(1-\frac{\mathrm{i} A y^3}{\sqrt{n}}+\cdots\right) d y .$$
For an approximation with a “very large” $n$ we ignore, like Laplace, all series terms with a power of $\sqrt{n}$ in the denominator, and at the same time, set the limits of integration equal to $\pm \infty$. In this way we get
$$P(j) \approx \frac{1}{2 \pi \sqrt{n}} \int_{-\infty}^{\infty} \mathrm{e}^{-\mathrm{i} j \frac{y}{\sqrt{n}}} \mathrm{e}^{-\frac{m^2 \sigma^2 y^2}{2}} d y,$$
where the last integral is, as Laplace showed in different ways, equal to
$$\frac{1}{m \sigma \sqrt{2 \pi n}} \mathrm{e}^{-\frac{j^2}{2 m^2 \sigma^2 n}} .$$
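The quality of this local approximation can be illustrated numerically. The sketch below is my own example, not in the source: for steps uniform on $\{-1,0,1\}$ (so $m=1$ and $m^2\sigma^2=2/3$), the exact $P(0)$ for $n=200$, computed by repeated convolution, is compared with Laplace’s approximating expression:

```python
import math

# Steps uniform on {-1, 0, 1}: m = 1 and m^2 sigma^2 = sum_k p_k k^2 = 2/3.
step = [1 / 3, 1 / 3, 1 / 3]

def convolve(u, v):
    """Distribution of the sum of two independent lattice variables."""
    w = [0.0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            w[i + j] += a * b
    return w

n, var = 200, 2 / 3
dist = [1.0]
for _ in range(n):
    dist = convolve(dist, step)

exact = dist[n]                                 # index n corresponds to j = 0
approx = 1 / math.sqrt(2 * math.pi * var * n)   # Laplace's local approximation
print(exact, approx)
```

For this symmetric distribution the skewness term vanishes, so the two values already agree closely at $n=200$.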
Summing up (2.4) for $\frac{j}{m} \in\left[r_1 \sqrt{n} ; r_2 \sqrt{n}\right]$, which can be approximated by integration $\left(d x \approx \frac{1}{\sqrt{n}}\right)$, leads to the result

$$\begin{aligned} P\left(r_1 \sqrt{n} \leq \sum X_l \leq r_2 \sqrt{n}\right) & \approx \sum_{j \in\left[m r_1 \sqrt{n} ; m r_2 \sqrt{n}\right]} \frac{1}{m \sigma \sqrt{2 \pi n}} \mathrm{e}^{-\frac{j^2}{2 m^2 \sigma^2 n}} \\ & \approx \int_{m r_1}^{m r_2} \frac{1}{m \sigma \sqrt{2 \pi}} \mathrm{e}^{-\frac{x^2}{2 m^2 \sigma^2}} d x=\int_{r_1}^{r_2} \frac{1}{\sigma \sqrt{2 \pi}} \mathrm{e}^{-\frac{x^2}{2 \sigma^2}} d x, \end{aligned}$$
which corresponds to the integral form of the CLT. With only one exception (see Sect. 2.1.5.3), Laplace dealt with independent, identically distributed, and bounded random variables with densities. ${ }^7$ To this end he first treated the range of values of these random variables as discrete (as described above), and then assumed $m$ “infinitely large.”

Nowhere in his work did Laplace state a general theorem corresponding to the CLT in today’s sense. He only treated particular problems concerning the approximation of probabilities of sums or linear combinations of a great number of random variables (in many cases errors of observation), by methods which in principle correspond to the procedure described above. In modern notation, Laplace’s most general version of the CLT was as follows: Let $\epsilon_1, \ldots, \epsilon_n$ be a large number of independent errors of observation, each having the same density with mean $\mu$ and variance $\sigma^2$. If $\lambda_1, \ldots, \lambda_n$ are constant multipliers and $a>0$, then
$$P\left(\left|\sum_{j=1}^n \lambda_j\left(\epsilon_j-\mu\right)\right| \leq a \sqrt{\sum_{j=1}^n \lambda_j^2}\right) \approx \frac{2}{\sigma \sqrt{2 \pi}} \int_0^a \mathrm{e}^{-\frac{x^2}{2 \sigma^2}} d x .$$
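A quick simulation illustrates this statement. The sketch below is an illustrative setup of my own (uniform errors on $(0,1)$, so $\mu=1/2$ and $\sigma^2=1/12$, with multipliers $\lambda_j=j$; all numbers are arbitrary choices): a seeded Monte Carlo estimate of the left-hand side is compared with the Gaussian right-hand side, written via the error function.

```python
import math
import random

# Illustrative setup: uniform errors on (0, 1), so mu = 1/2, sigma^2 = 1/12,
# with multipliers lambda_j = j (all values chosen arbitrarily).
random.seed(1)
n, a = 50, 0.5
lam = list(range(1, n + 1))
mu, sigma = 0.5, math.sqrt(1 / 12)
threshold = a * math.sqrt(sum(l * l for l in lam))

trials = 40_000
hits = 0
for _ in range(trials):
    s = sum(l * (random.random() - mu) for l in lam)
    hits += abs(s) <= threshold
lhs = hits / trials

# Right-hand side: (2/(sigma*sqrt(2*pi))) * integral_0^a e^{-x^2/(2 sigma^2)} dx,
# which equals erf(a / (sigma * sqrt(2))).
rhs = math.erf(a / (sigma * math.sqrt(2)))
print(lhs, rhs)
```

Note that the $\lambda_j$ may grow with $j$; what matters for the approximation is that no single $\lambda_j^2$ dominates $\sum \lambda_j^2$.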
Laplace treated the special case of a CLT for the binomial distribution, on the basis of Stirling’s formula, in a separate section of his TAP, by methods which are in principle due to de Moivre and are still employed in modern textbooks.

Lindeberg: An Entirely New Method

Lindeberg considered the convolution of the function $U$ with an auxiliary function $f$ having derivatives up to a certain order. He assumed that $\left|f^{\prime \prime \prime}(x)\right| \leq k$ for all $x \in \mathbb{R}$ with a suitable positive constant $k$. In his first paper, Lindeberg had still taken a normal distribution function for $f$. Although this trick is strongly reminiscent of Lyapunov’s procedure, Lindeberg, who by his own account initially had no knowledge of Lyapunov’s works, developed the idea of an auxiliary distribution independently. It is quite possible, however, that Lindeberg was influenced by the account of Crofton’s method in the standard monograph on error theory, in which the use of an auxiliary distribution was described.
Under the general assumptions of “Theorem III,” Lindeberg, in the first part of his paper, discussed distributions with finite third-order moments. He began with a general estimate for an arbitrary distribution $V$ with zero expectation and variance $\sigma^2$. Let
$$F(x):=\int_{-\infty}^{\infty} f(x-t) d V(t),$$
and, with the abbreviation $\varphi(x, \sigma):=\frac{1}{\sigma \sqrt{2 \pi}} \mathrm{e}^{-\frac{x^2}{2 \sigma^2}}$, let
$$\Phi(x):=\int_{-\infty}^{\infty} f(x-t) \varphi(t, \sigma) d t .$$
Using the Taylor expansion of $f$ up to the third power (there is a certain similarity with Crofton’s procedure again) Lindeberg showed that
$$|F(x)-\Phi(x)|<k \int_{-\infty}^{\infty}|x|^3 d V(x) .$$
By repeated application of this estimate to
$$\begin{gathered} F_1(x):=\int_{-\infty}^{\infty} f(x-t)\, d U_1(t), \quad F_2(x):=\int_{-\infty}^{\infty} F_1(x-t)\, d U_2(t), \quad \ldots, \\ F_n(x):=\int_{-\infty}^{\infty} F_{n-1}(x-t)\, d U_n(t) \end{gathered}$$

and
$$\begin{gathered} \Phi_1(x):=\int_{-\infty}^{\infty} f(x-t) \varphi\left(t, \sigma_1\right) d t, \quad \Phi_2(x):=\int_{-\infty}^{\infty} \Phi_1(x-t) \varphi\left(t, \sigma_2\right) d t, \quad \ldots, \\ \Phi_n(x):=\int_{-\infty}^{\infty} \Phi_{n-1}(x-t) \varphi\left(t, \sigma_n\right) d t, \end{gathered}$$
respectively, Lindeberg obtained
$$\left|\int_{-\infty}^{\infty} f(x-t)\, d U(t)-\int_{-\infty}^{\infty} f(x-t)\, \varphi(t, 1)\, d t\right| < k \sum_{i=1}^n \int_{-\infty}^{\infty}|x|^3\, d U_i(x).$$
With the aid of special, piecewise defined functions $f$, Lindeberg [1922c, 216 f.] deduced inequalities from (5.42) whose combination yielded the estimate
$$\left|U(x)-\int_{-\infty}^x \varphi(t, 1) d t\right|<k \sum_{i=1}^n \int_{-\infty}^{\infty}|x|^3 d U_i(x)+\frac{4}{\sqrt{2 \pi} \cdot \sqrt[3]{2 k}} .$$
By choosing $k$ such that
$$\frac{1}{\sqrt[3]{2 k}}:=\left(\sum_{i=1}^n \int_{-\infty}^{\infty}|x|^3 d U_i(x)\right)^{\frac{1}{4}}$$
the inequality
$$\left|U(x)-\int_{-\infty}^x \varphi(t, 1) d t\right|<3\left(\sum_{i=1}^n \int_{-\infty}^{\infty}|x|^3 d U_i(x)\right)^{\frac{1}{4}}$$
followed. Lindeberg had already reached an analogous inequality, but with a far more complicated right-hand side.
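As a sanity check of this final bound, consider a hypothetical example of mine, not Lindeberg’s: $n$ fair coin flips scaled to $X_i=\pm 1/\sqrt{n}$, so that each $\int|x|^3\, dU_i = n^{-3/2}$ and the sum of third absolute moments is $n^{-1/2}$. The sketch computes the actual Kolmogorov distance between the standardized binomial and the standard normal and verifies that it lies below $3\,(n^{-1/2})^{1/4}$:

```python
import math

n = 100                                   # fair coin flips, X_i = +-1/sqrt(n)
third_moments = n * n ** -1.5             # sum of E|X_i|^3 = n * n^(-3/2)
bound = 3 * third_moments ** 0.25         # Lindeberg's right-hand side

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# U is the CDF of the standardized sum (2k - n)/sqrt(n), k ~ Binomial(n, 1/2);
# the supremum of |U - Phi| is attained at the jump points.
sup_diff, cdf = 0.0, 0.0
for k in range(n + 1):
    x = (2 * k - n) / math.sqrt(n)
    sup_diff = max(sup_diff, abs(cdf - Phi(x)))   # left limit at the jump
    cdf += math.comb(n, k) / 2 ** n
    sup_diff = max(sup_diff, abs(cdf - Phi(x)))   # value at the jump
print(sup_diff, bound)
```

The bound is very loose here (the fourth root decays slowly in $n$), which is consistent with Lindeberg’s own remark that his earlier, more complicated estimate was of a similar order.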
