Central Limit Theorem: Introduction

There's this theorem that explains why the Normal (or Gaussian) distribution crops up so often.
>Don't tell me! It's the central limit thing, right?
Yes. The Central Limit Theorem.
Consider some (unknown!) probability density function f(x). (We'll refer to it as pdf).
We assume that the Mean M = 0 and the Standard Deviation is S.


Then we note the following, in Table 1:
... where the integration is over all possible values of the (random) variable x, that is: −∞ to +∞.
>Okay, all those guys have names. So?
Patience.

 1. ∫ f(x) dx = 1
 2. Expected value of g(x): E[g(x)] = ∫ g(x) f(x) dx
 3. Expected value of x, the Mean: M = ∫ x f(x) dx = 0
 4. Variance: S^{2} = ∫ x^{2} f(x) dx
 5. Third moment (giving the Skew) = ∫ x^{3} f(x) dx
 6. Fourth moment (giving the Kurtosis) = ∫ x^{4} f(x) dx
 ... etc. etc. for other "moments" of f

Table 1
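(If you'd like to check the integrals in Table 1 yourself, here's a quick sketch in Python — not part of the original discussion — using a made-up zero-mean pdf: the triangle f(x) = 1 − |x| on (−1, 1), whose variance works out to exactly 1/6 and whose fourth moment is exactly 1/15.)

```python
import numpy as np

# A hypothetical zero-mean pdf: triangular on (-1, 1), f(x) = 1 - |x|
x = np.linspace(-1.0, 1.0, 200001)
f = 1.0 - np.abs(x)

def moment(k):
    # k-th moment: the integral of x^k f(x) dx, via the trapezoidal rule
    y = x**k * f
    return float(np.sum((y[1:] + y[:-1]) / 2) * (x[1] - x[0]))

total    = moment(0)   # should be 1 (f is a pdf)
mean     = moment(1)   # the Mean M: should be 0, by symmetry
variance = moment(2)   # the Variance S^2: exactly 1/6 for this pdf
fourth   = moment(4)   # the fourth moment: exactly 1/15 for this pdf
```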

Okay, now consider the following integral:

[1] The Expected value of e^{tx}: M_{x}(t) = E[e^{tx}] = ∫ e^{tx} f(x) dx

Of course, we assume that f(x) is respectable and well-behaved, so that the integral converges.
Expanding e^{tx} in a Taylor series, we get:
[2a] M_{x}(t) = ∫ (1 + tx + t^{2}x^{2}/2 + ...) f(x) dx
= 1 + Mt + S^{2}t^{2}/2 + ...
= 1 + S^{2}t^{2}/2 + ...
using Table 1, noting that M = 0 for the case we're considering.

See? Ain't that neat? The various "moments" of the distribution function f(x) are just the coefficients of the powers of t in this guy, M_{x}(t).
For that reason M_{x}(t) is called the Moment Generating Function.
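(You can watch [2a] happen numerically. A sketch in Python — my own stand-in, not part of the original discussion — using a uniform pdf on (−1, 1), which has M = 0, S^{2} = 1/3 and, handily, the exact closed-form MGF sinh(t)/t for comparison:)

```python
import numpy as np

# Stand-in pdf: uniform on (-1, 1), so f(x) = 1/2, Mean = 0, S^2 = 1/3
x = np.linspace(-1.0, 1.0, 100001)
f = np.full_like(x, 0.5)
t = 0.1    # a small t, so the neglected terms in [2a] are tiny

# M_x(t) = integral of e^{tx} f(x) dx, via the trapezoidal rule
y = np.exp(t * x) * f
mgf = float(np.sum((y[1:] + y[:-1]) / 2) * (x[1] - x[0]))

series = 1.0 + (1.0 / 3.0) * t**2 / 2.0    # 1 + S^2 t^2/2, the start of [2a]
exact  = float(np.sinh(t) / t)             # the known closed form, for comparison
```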
>Moments? Why are they called moments?
Uh ... let's recall something about moments of a force about a point.
Imagine a bunch of weights on a board, like so:
The "moment" of each about the left end of the board is: (weight)*(distance to the left end).
The total moment of all the weights is the sum of these moments.
Suppose the weights (at distances x_{1}, x_{2}, etc.) were f_{1}, f_{2} ...
[Figure: moments of forces about a point]
The total moment is then: f_{1} x_{1} + f_{2} x_{2} + ... and we want to know where to put a fulcrum in order to balance all them weights on the board.
It'd be a place where the total weight would have the same total moment. That is, we'd want:
[A] (f_{1} + f_{2} + ... )*X = f_{1} x_{1} + f_{2} x_{2} + ...
See that right side? It looks just like #3 in Table 1, eh?
>Except that you're summing instead of integrating, right?
Yes. However, if the weights were not discrete but uniformly distributed along our board, we'd integrate.
If all the weights were equal to 1 (so the sum equals n), then [A] says:
[B] n X = x_{1} + x_{2} + ... +x_{n}
which says we should place our fulcrum at:
X = ( x_{1} + x_{2} + ... +x_{n}) / n.
>That's the average distance. I knew that!
Good for you.
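(To make [A] and [B] concrete, here's a tiny Python sketch with some made-up weights and positions — purely illustrative:)

```python
# Made-up weights f_j sitting at distances x_j from the left end of the board
weights   = [2.0, 1.0, 3.0]
positions = [0.5, 1.5, 2.0]

# [A]: the fulcrum X satisfies (sum of weights) * X = total moment
total_moment = sum(f * x for f, x in zip(weights, positions))
X = total_moment / sum(weights)

# With all weights equal, [B] says X is just the plain average of positions
X_equal = sum(positions) / len(positions)
```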
In [A] we've summed f_{j} x_{j}, but we could also sum f_{j} x_{j}^{2}
(the second moment) or maybe f_{j} x_{j}^{3} or maybe ...
>Yeah, I get it.
That second moment (for the weights on our board) has a name: Moment of Inertia, and engineers use it all the time to determine the strength of beams.
For example, the I-beam is designed so that ...
>Could you get back to ... uh, what were we talking about?
Sorry. Let's continue.
The magic formulas [1] and [2a] define the Moment Generating Function. Note the following:
[i] If we know the distribution function f we can calculate M_{x}(t).
[ii] Further, if we know M_{x}(t) we can calculate the pdf, f.

It's [ii] that'll really come in handy ... soon.
Now let's enumerate some interesting characteristics of our Moment Generating Function:
 1. M_{(ax + b)}(t) = ∫ e^{(ax + b)t} f(x) dx
    = e^{bt} ∫ e^{(ax)t} f(x) dx
    = e^{bt} M_{(ax)}(t)
    = e^{bt} ∫ e^{x(at)} f(x) dx
    = e^{bt} M_{x}(at)
 2. M_{(x + y)}(t) = M_{x}(t) M_{y}(t)
    where x and y are independent random variables.
    (That is, the value of one doesn't affect the value of the other.)
 3. Similarly: M_{(x1 + x2 + ...)}(t) = M_{x1}(t) M_{x2}(t) ...

Table 2
>Wait! What's that about? When you add random variables you multiply the generating functions together?
This relationship, where sums get replaced by products or products get replaced by sums ... it's familiar.
Remember that the logarithm of a product is the sum of logarithms.
Check out Stat Stuff 4.
If you multiply uncorrelated, independent random variables, the Mean of the product is the product of the Means.
Hence, the Mean (or Expected) value of the product e^{x t}e^{y t} is the product of the two Expected values:
M_{x}(t) and M_{y}(t).
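(Don't believe it? Here's a quick Monte Carlo check in Python — my own sketch, not part of the original discussion — estimating both sides of the product rule for two independent uniforms on (−1, 1):)

```python
import numpy as np

# Monte Carlo check of the product rule for independent x and y
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.uniform(-1.0, 1.0, n)    # x uniform on (-1, 1)
y = rng.uniform(-1.0, 1.0, n)    # y independent of x
t = 0.5

lhs = float(np.exp(t * (x + y)).mean())                             # estimates M_{x+y}(t)
rhs = float(np.exp(t * x).mean()) * float(np.exp(t * y).mean())     # M_x(t) * M_y(t)
```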
Central Limit Theorem: Proof

Okay, now consider the Variance of an Average (or Mean) of n independent / uncorrelated random variables selected from the same distribution.
We assume that distribution has Variance S^{2} (where S is the Standard Deviation).
>Huh?
We're picking n random values and averaging them. Then we pick another n and average them. Then another and another.
We look carefully at all the averages we've calculated and ask: "What's the distribution of all these averages?"
>But you need to know the distribution of the random values ... don't you?
Patience. The result is really quite remarkable.
We first inspect Stat Stuff 8.
If they're independent, the Variance of a sum is the sum of the Variances. Hence:
[C1] Var[ (x_{1} + x_{2} + ... + x_{n}) / n ]
= Var[ x_{1}/n ] + Var[ x_{2}/n ] + ... + Var[ x_{n}/n ]
= (1/n^{2}) { Var[x_{1}] + Var[x_{2}] + ... + Var[x_{n}] } ... using Stat Stuff 2
Continuing:
[C2]
Var[ (x_{1} + x_{2} + ... + x_{n}) / n ] = (1/n^{2}) (n S^{2}) = S^{2} / n
... since the variance for all selected random variables is the same, namely S^{2}.
If the Variance is S^{2} / n, then the Standard Deviation is S / √n.
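(You can check [C2] empirically. A Python sketch — not part of the original Excel setup — averaging n = 50 uniform random numbers, 5000 times, and comparing the variance of the averages to S^{2}/n:)

```python
import numpy as np

# 5000 averages, each of n = 50 values drawn uniformly from (0, 1)
rng = np.random.default_rng(1)
n, trials = 50, 5000
averages = rng.uniform(0.0, 1.0, (trials, n)).mean(axis=1)

S2 = 1.0 / 12.0                    # the variance of uniform(0, 1)
expected = S2 / n                  # [C2] predicts Var = S^2 / n
empirical = float(averages.var())  # the variance we actually observe
```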
We're interested in the pdf of these averages. Let's call the collection of averages Y_{n}.
A "typical" value would look like: Y_{n} = (x_{1} + x_{2} + ... + x_{n}) / n.
Now, if we knew the Moment Generating Function for these averages we would know their distribution.
To this end we consider related "normalized" variables:
[D1] Z_{n}
= Y_{n} / (S/√n)
... where a typical value looks like:
Z_{n} = { (x_{1} + x_{2} + ... + x_{n}) / n } / (S/√n)
= (x_{1} + x_{2} + ... + x_{n}) / (S√n).
Note that the Z_{n} are sums of n terms each looking like: x_{j} / (S√n).
Then their Moment Generating Function is a product of n Moment Generating Functions, each looking like:
[D2] M_{x/(S√n)}(t) = M_{x}(t/(S√n)) ... using Table 2, item 1, with a = 1/(S√n) and b = 0.
>zzzZZZ
Don't you see? It's just a matter of changing the scale for t. We replace t by t / (S√n).
Remember that we want the product of all n of these generating functions.
Since all the xs are chosen from the same distribution, the generating functions will all be the same.
Hence and therefore (I love that phrase!):
[D3]
M_{Z_{n}}(t) = { M_{x}(t/(S√n)) }^{n}
Do you remember [2a]?
>zzzZZZ
Well, now it comes in handy. We're going to change to t / (S√n) and consider n to be large.
That means that t / (S√n) is small. That means that we can rewrite [2a] like so:
[2b] M_{Z_{n}}(t) = { 1 + t^{2}/(2n) + ... }^{n}
where we've replaced t by t / (S√n) and set the Mean M = 0. 
Now here's a neat thing to know:
The limit of (1 + k/n)^{n} as n → infinity is e^{k}.
Now stare carefully at [2b] and assume all them neglected terms are really, really small compared to the term t^{2}/2n
... then wave our magic wand and get, for large samples:
[3] M_{Z_{n}}(t) → e^{t^{2}/2} as n → ∞.
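(That "neat thing to know" is easy to watch in action. A Python sketch — my own illustration — of (1 + t^{2}/(2n))^{n} marching toward e^{t^{2}/2} as n grows:)

```python
import numpy as np

# Watch (1 + t^2/(2n))^n approach e^{t^2/2} as n grows
t = 1.0
target = float(np.exp(t**2 / 2))    # e^{1/2}
gaps = [abs((1 + t**2 / (2 * n))**n - target) for n in (10, 100, 10_000)]
# each gap should be smaller than the last
```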

>zzz ... huh? Is that it?
Yes. We have the Moment Generating Function of the distribution of averages (for large samples), taken from the same (almost) arbitrary distribution.
>And that gives the Central Limit Theorem?
Yes, because we know the generating function (for large n) hence we know the probability density distribution (for large n) ... and that's the Normal Distribution.
I forgot to mention that the Moment Generating Function for the (standard) Normal distribution is nobody else but e^{t^{2}/2}.
Remember, Normal distribution pdfs contain a factor like: e^{-x^{2}/2} ... and I leave the integration of e^{xt} e^{-x^{2}/2}
= e^{t^{2}/2} e^{-(x-t)^{2}/2} up to you.
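(Too lazy to do the integration? Here's a numerical check in Python — my own sketch — that (1/√(2π)) ∫ e^{xt} e^{-x^{2}/2} dx really does come out to e^{t^{2}/2}:)

```python
import numpy as np

# (1/sqrt(2 pi)) * integral of e^{xt} e^{-x^2/2} dx should equal e^{t^2/2}
t = 0.7
x = np.linspace(-20.0, 20.0, 400001)   # the tails beyond +/-20 are negligible
y = np.exp(x * t - x**2 / 2) / np.sqrt(2 * np.pi)
mgf = float(np.sum((y[1:] + y[:-1]) / 2) * (x[1] - x[0]))   # trapezoidal rule
target = float(np.exp(t**2 / 2))
```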
Central Limit Theorem: Examples

I should also mention an example or two.
First, pick n = 50 random numbers, uniformly distributed in (0,1) ... using Excel's RAND() ... and average them. Then repeat the whole business 5000 times and plot the distribution of the 5000 averages.
>So you regard 5000 as a large value for n?
No, I regard n = 50 as a large value for n. Those 5000 average calculations are just to get the distribution of the n-value averages.
Have you been paying attention? For large n, the distribution of these averages approaches a Normal distribution.
The more averages you compute, the better the picture of the distribution:
>zzzZZZ
Now we consider random numbers that are either 0 or 1, with equal probability
... using IF(RAND()<0.5, 1, 0).
It's like tossing a coin and assigning a 0 for a Head and 1 for a Tail.
The pdf for these numbers is just a pair of spikes: probability 1/2 at 0 and probability 1/2 at 1.
Again, we select 50 at random and average them.
We repeat this 5000 times and plot the distribution of the averages ... and, once again, it looks Normal.
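(The coin-toss experiment, sketched in Python rather than Excel — same recipe, different tool:)

```python
import numpy as np

# Toss 50 coins (0 or 1, equal probability), average, repeat 5000 times
rng = np.random.default_rng(2)
n, trials = 50, 5000
averages = rng.integers(0, 2, size=(trials, n)).mean(axis=1)

# The CLT predicts Mean 0.5 and Standard Deviation S/sqrt(n) = 0.5/sqrt(50)
mean_of_averages = float(averages.mean())
sd_of_averages = float(averages.std())
```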
 
Now we consider random variables that are generated like so:
 Pick five random numbers, uniformly distributed in (0,1) ... using RAND().
 Call them R_{1}, R_{2}, R_{3}, R_{4} and R_{5}.
 If R_{1} < R_{2}, then select R_{3} as our random variable.
 Otherwise, select our random variable from a normal distribution with Mean = R_{4} and Standard Deviation R_{5}.
 Calculate the average of 50 such random variables.
 Repeat the above steps 5000 times and plot the distribution of averages.
We can use the Excel function IF(RAND()<RAND(), RAND(), NORMINV(RAND(), RAND(), RAND())). (Note it's NORMINV, which turns a uniform RAND() into a random value from the Normal distribution; NORMDIST would give the density, not a random number.)
It'll give the random variables described above.
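(Here's the same strange recipe sketched in Python — my own translation of the Excel formula, not the original spreadsheet:)

```python
import numpy as np

# The mixed recipe above: uniform value half the time, Normal value otherwise
rng = np.random.default_rng(3)

def odd_random_variable():
    r1, r2, r3, r4, r5 = rng.uniform(0.0, 1.0, 5)
    if r1 < r2:
        return float(r3)                 # take the uniform value R3
    return float(rng.normal(r4, r5))     # Normal with Mean R4, SD R5

# Average 50 such variables, repeat 5000 times
averages = [np.mean([odd_random_variable() for _ in range(50)])
            for _ in range(5000)]
mean_of_averages = float(np.mean(averages))
```

Despite the weird recipe, the histogram of these averages still piles up in a Normal-looking bell around 0.5.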
The pdf for these variables looks something like ... uh ... I have no idea.
When I do this, I get Figure 1. Does it look normal?
 Figure 1 
