Nobel 2003 : Part III ... GARCH Models, a continuation of
Part II
We're talking about the lack of constancy in parameters extracted from historical data and ...
>You've said that already.
Okay, we're assuming that some set of variables, y, depends upon another set, x, according to:
Expected value of yk, given the values of the xj = E[yk | x]
Note: (x1, y1), (x2, y2) etc. are the values at times t1, t2 ...
Since this Expected value will have some error, we introduce error terms, ek ... sometimes called "residuals" :
yk = E[yk | x] + ek
To "see" the relationship between the y's and the x's we can plot
(x1, y1), (x2, y2) etc., as in Figure 1.
Suppose the "best" straight line fit to the data (the regression line) has equation:
y = A x + B
The actual points will rarely be exactly ON this line ... so there's an error:
yk = A xk + B + ek
Figure 1
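Here's a minimal Python sketch of that regression line and its residuals. The five data points are made up for illustration, and A and B come from the usual least-squares formulas (an assumption: the text only says "best" straight line fit).

```python
# Least-squares line y = A x + B and its residuals e_k = y_k - (A x_k + B).
# The data points below are hypothetical, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope A = sum of cross-deviations / sum of squared x-deviations.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
A = s_xy / s_xx
B = mean_y - A * mean_x  # the fitted line passes through the point of means

# Residuals: how far each actual point sits above or below the line.
residuals = [y - (A * x + B) for x, y in zip(xs, ys)]
print(A, B, residuals)
```

With a least-squares fit that includes the constant B, the residuals always add up to zero, which fits the zero-mean assumption made for the errors below.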
The simplest GARCH model assumes that:
e_k = z_k h_k^{1/2}
where the z's are random variables with Mean = 0
(we'll write M[z] = 0) and Standard Deviation = 1
(we'll write VAR[z] = 1).
Notes: See Stat stuff
Variance = (Standard Deviation)^2 and we'll call it VAR.
If two variables are independent / uncorrelated, then the mean of their product is the product of their means: M[xy] = M[x]M[y].
Finally, VAR[x] = M[x^2] - (M[x])^2, so VAR[x] = M[x^2] whenever M[x] = 0.
We now show that the volatility of the residuals/errors is buried in the h's.
In fact, if we assume that the z's and the h's are uncorrelated then we get the following:
- The Mean of the z's is M[z] = 0, so M[e] = M[z] M[h^{1/2}] = 0 as well.
Conclusion? The Mean of the residuals/errors is zero.
- The Variance of the residuals e_k is the mean of e_k^2 (since M[e] = 0), hence is given by:
VAR[e] = M[e^2] = M[z^2 h]
= M[z^2] M[h] = M[h] ... since M[z^2] = VAR[z] = 1
Conclusion? The h's provide the Variance for the residuals/errors !
So, we're trying to find the appropriate prescription for the h's.
Indeed, we're trying to predict the best estimate of the (time-varying) variance of the residuals/errors !!
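A quick numerical check of those two conclusions, as a sketch only. It assumes the z's are standard Normal and draws some hypothetical h-values (uniform between 0.5 and 1.5) independently of the z's; neither choice comes from the text.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n = 200_000

# Hypothetical variances h, drawn independently of z (an assumption for this check).
h = [random.uniform(0.5, 1.5) for _ in range(n)]
# z: mean 0, standard deviation 1 (assumed Normal here).
z = [random.gauss(0.0, 1.0) for _ in range(n)]

# Residuals e = z * sqrt(h)
e = [zk * hk ** 0.5 for zk, hk in zip(z, h)]

mean_e = sum(e) / n
var_e = sum(ek ** 2 for ek in e) / n - mean_e ** 2
mean_h = sum(h) / n

# mean_e should be near 0, and var_e should be near mean_h.
print(mean_e, var_e, mean_h)
```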
Note that h depends upon previous values of e's and h's, like so:
h_k = a_0 + [ a_1 e_{k-1}^2 + a_2 e_{k-2}^2 + ... + a_q e_{k-q}^2 ] + [ b_1 h_{k-1} + b_2 h_{k-2} + ... + b_p h_{k-p} ]
This is called a GARCH(p,q) model. It depends upon q previous "residuals" and p
previous variances. (Them's the h's, eh?)
>You've said all this before!
Yes. I'm just reviewing.
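As a sketch, one step of the GARCH(p,q) prescription might look like this in Python. The function name `garch_variance` and the sample numbers are hypothetical; the lists hold the most recent value first.

```python
def garch_variance(a0, a, b, past_e, past_h):
    """Return h_k = a0 + sum a[i]*e_{k-1-i}^2 + sum b[j]*h_{k-1-j}.

    a      : [a1, ..., aq]  weights on the q previous squared residuals
    b      : [b1, ..., bp]  weights on the p previous variances
    past_e : previous residuals, most recent first
    past_h : previous variances, most recent first
    """
    if len(past_e) < len(a) or len(past_h) < len(b):
        raise ValueError("need q past residuals and p past variances")
    residual_part = sum(ai * ei ** 2 for ai, ei in zip(a, past_e))
    variance_part = sum(bj * hj for bj, hj in zip(b, past_h))
    return a0 + residual_part + variance_part


# Hypothetical example with q = 2 residuals and p = 1 variance:
# 0.1 + (0.2 * 1.0^2 + 0.1 * 2.0^2) + (0.5 * 0.4)
h_next = garch_variance(0.1, [0.2, 0.1], [0.5], past_e=[1.0, 2.0], past_h=[0.4])
print(h_next)
```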
The problem now is to deduce the "best" parameters a0,
a1, b1, a2, b2 etc. etc.
Note:
We introduced the numbers ek as the "errors" in a regression plot (as in Figure 1).
However, in order to introduce "shocks" in the sequence of returns, they are sometimes just the change in daily returns, rk, like so:
e_k = r_k - r_{k-1}
In this case the Variance of the "residuals" is just the Variance of the changes in daily returns.
As well, some analysts use e_k = log[ P_k / P_{k-1} ], where the P's are stock prices.
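Both choices for the e's are easy to compute. Here's a small Python sketch using a made-up price series (the prices are hypothetical, and "daily return" is taken here to mean the simple return P_k / P_{k-1} - 1, which is an assumption).

```python
import math

prices = [100.0, 102.0, 101.0, 105.0]  # hypothetical daily closing prices

# Choice 1: e_k = log(P_k / P_{k-1})
log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Choice 2: e_k = r_k - r_{k-1}, the change in daily returns,
# where r_k is taken to be the simple return P_k / P_{k-1} - 1.
simple_returns = [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]
return_changes = [r1 - r0 for r0, r1 in zip(simple_returns, simple_returns[1:])]

print(log_returns)
print(return_changes)
```

Note that n prices give n-1 returns and only n-2 changes in returns.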
We'll consider the simplest model : GARCH(1,1).
We must determine the "best" values for a0, a1 and b1 where
[1] h_k = a_0 + a_1 e_{k-1}^2 + b_1 h_{k-1}
Note that today's Variance (that's hk) is a linear combination of:
some (as yet unknown) constant a0
yesterday's squared error (that's e_{k-1}^2)
yesterday's Variance (that's h_{k-1})
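Equation [1] can be run day by day over a list of residuals. A minimal sketch, assuming the parameters are already known and that some starting variance h0 is supplied (the function name, the residuals and h0 are all hypothetical):

```python
def garch11_variances(residuals, a0, a1, b1, h0):
    """Apply h_k = a0 + a1*e_{k-1}^2 + b1*h_{k-1} along a residual series.

    Returns [h_0, h_1, ..., h_n]; the last entry is the variance
    predicted for the day after the final residual.
    """
    h = [h0]
    for e_prev in residuals:
        h.append(a0 + a1 * e_prev ** 2 + b1 * h[-1])
    return h


# Hypothetical residuals (daily log returns) and starting variance.
residuals = [0.01, -0.02, 0.015]
variances = garch11_variances(residuals, a0=1e-6, a1=0.09, b1=0.90, h0=1e-4)
print(variances)
```

The big residual on the second day (-0.02) pushes the next variance up, and the b1 term keeps it elevated on the day after.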
A popular technique for determining the "best" parameters is to maximize the log-likelihood function
>Huh?
Yeah ... well, we gotta talk about that guy.
Suppose we have a set of numbers u which depend upon some parameters a.
We ask: "What values for a would be most likely to generate the observed numbers u?"
(The symbols u and a stand for sets of numbers: u1, u2 ... and a1, a2 ...)
Suppose that, for each choice of a, we could calculate the probability of getting the numbers u.
We'll call this probability p(u | a) ... the probability of getting u, given the value of a
Then we'd want to determine that a-value which maximizes p(u | a)
... so we'd get the maximum probability, eh?
>But why log-likelihood?
The probability (a number between 0 and 1) can be maximized by maximizing
its logarithm ... and that's often simpler (mathematically).
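To see that maximizing the logarithm picks out the same parameter value, here's a small Python sketch. It assumes a Normal distribution with the standard deviation held fixed at 1 and tries a grid of candidate means; the four sample values are made up.

```python
import math

# Hypothetical sample; S is held fixed so that only the mean M varies.
u = [1.2, 0.8, 1.1, 0.9]
S = 1.0


def normal_density(x, M, S):
    return math.exp(-(x - M) ** 2 / (2 * S ** 2)) / (S * math.sqrt(2 * math.pi))


def likelihood(M):
    # Product of the individual probabilities.
    return math.prod(normal_density(x, M, S) for x in u)


def log_likelihood(M):
    # Sum of the individual log-probabilities.
    return sum(math.log(normal_density(x, M, S)) for x in u)


candidates = [i / 100 for i in range(0, 201)]  # M = 0.00, 0.01, ..., 2.00
best_by_product = max(candidates, key=likelihood)
best_by_log = max(candidates, key=log_likelihood)
print(best_by_product, best_by_log)
```

Both searches land on the same M, because the logarithm is an increasing function. Sums are also friendlier than products on a computer: a product of thousands of small probabilities can underflow to zero.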
Here's an example with two parameters:
1. Suppose the set of observed values u = { u1, u2 ... un} are from some (unknown) Normal distribution.
2. Suppose, too, that the Normal distribution has (unknown) mean M and Standard Deviation S.
   It's the values of the parameters M and S that we're looking for, eh?
3. The probability of observing the value u1 is then
   p(u1 | M, S) = [ 1 / (S sqrt(2π)) ] exp[ -(u1 - M)^2 / (2 S^2) ] ... that's the Normal distribution
4. The probability of observing u2, u3 etc. is similar
   ... where we replace u1 by u2 then u3 etc.
5. The probability of observing all the u-values is the product of the individual probabilities, namely
   p(u | M, S) = p(u1 | M, S) p(u2 | M, S) ... p(un | M, S) ... a function of the two parameters M and S
6. Our problem is now to determine the values of the two parameters M and S so this probability is a maximum.
7. To do this we'd set to zero the derivatives with respect to M and S and, to do this, it's easier to work with the logarithm of p.
8. The two derivatives are ...
>Can't you just give the answer? Calculus ain't my best suit and ...
Well, at least we want to note the following:
- Call p(1,M,S) the probability of observing the value u1 ... as in #3, above.
- Then the probability of observing all the u-values is the product p(1,M,S) p(2,M,S) ... p(n,M,S).
- The logarithm of this product is log[p(1,M,S)] + log[p(2,M,S)] + ... + log[p(n,M,S)].
- Hence we need only apply the standard calculus ritual to each log-term ... differentiating with respect to M and S.
- Each log-term has the form: log[p(M,S)] = (-1/2)log(2π) - log(S) - (u-M)^2/(2S^2).
- We then differentiate with respect to M then add all these derivatives together and set the sum to zero.
- We do the same with derivatives with respect to S ...setting the sum to zero.
- Then we solve these two equations for M and S and ...
>Can't you just give the answer!!
Okay, it turns out that the "best" value for M is the mean of the set {u1, u2, ... un}
and
the "best" value for S is the standard deviation of {u1, u2, ... un}
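For anyone who does want the calculus, here's a sketch of the two derivative equations, written in LaTeX and using the log-term given above. Note that the standard deviation that comes out divides by n, not n-1.

```latex
\begin{aligned}
\log p(u \mid M, S) &= \sum_{i=1}^{n}\left[-\tfrac{1}{2}\log(2\pi) - \log S - \frac{(u_i - M)^2}{2S^2}\right] \\[4pt]
\frac{\partial}{\partial M}\log p &= \frac{1}{S^2}\sum_{i=1}^{n}(u_i - M) = 0
  \quad\Longrightarrow\quad M = \frac{1}{n}\sum_{i=1}^{n} u_i \\[4pt]
\frac{\partial}{\partial S}\log p &= \sum_{i=1}^{n}\left[-\frac{1}{S} + \frac{(u_i - M)^2}{S^3}\right] = 0
  \quad\Longrightarrow\quad S^2 = \frac{1}{n}\sum_{i=1}^{n}(u_i - M)^2
\end{aligned}
```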
>Isn't that obvious?
Pay attention.
Suppose we had a bunch of returns, say
16.2%, 40.1%, 32.5%, 18.7%, 12.0%, -16.2%, 30.6%, 25.1%, and 20.6%
Them's our u1, u2 ... u9
We assume these came from a Normal distribution and calculate the product of the nine probabilities.
That'd give a formula as noted in step #5 above.
We have those two unknown values for M and S, but we generate a bunch of probability charts for various M- and S-values as in Figure 2.
In other words, we plot the function given in step #5 above, for various values of M and S. Each one is a Normal curve.
> The infamous Bell curve, eh?
Figure 2
Perhaps it's better if we gaze in awe at a chart, as in Figure 2A, a 3D version of Figure 2.
>I'd say the maximum is at M = 20% and ... uh, S = 16%. Am I right?
Well, we'd need to close in on the value for S. It certainly seems to be around 15% or 16%.
In fact, it turns out to be 15.2%.
>Hold on! The average of those nine returns is 20% and ...
The standard deviation is 15.2%.
>What a coincidence!!
Isn't it?
Figure 2A
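The "coincidence" can be checked with a few lines of Python. This sketch uses the nine returns listed above, computes their mean and their standard deviation (dividing by n, not n-1), then confirms that the log-likelihood is lower at a handful of nearby (M, S) pairs. The choice of nearby pairs is arbitrary.

```python
import math

returns = [16.2, 40.1, 32.5, 18.7, 12.0, -16.2, 30.6, 25.1, 20.6]  # percent
n = len(returns)

M_hat = sum(returns) / n
S_hat = math.sqrt(sum((u - M_hat) ** 2 for u in returns) / n)  # divide by n, not n - 1


def log_likelihood(M, S):
    # Sum of the log-terms: (-1/2)log(2*pi) - log(S) - (u - M)^2 / (2 S^2)
    return sum(-0.5 * math.log(2 * math.pi) - math.log(S) - (u - M) ** 2 / (2 * S ** 2)
               for u in returns)


peak = log_likelihood(M_hat, S_hat)

# Step one percentage point away in each direction and compare.
nearby = [log_likelihood(M_hat + dM, S_hat + dS)
          for dM in (-1.0, 0.0, 1.0) for dS in (-1.0, 0.0, 1.0)
          if (dM, dS) != (0.0, 0.0)]

print(round(M_hat, 1), round(S_hat, 1), all(value < peak for value in nearby))
```

Rounded to one decimal, M_hat and S_hat come out at 20.0 and 15.2, matching the peak read off the charts.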
In general, we might expect the probability surface to look something like this:
We consider the GARCH(1,1) model ...
>Please, remind me of what we're doing.
We're trying to predict tomorrow's volatility using the GARCH(1,1) model, so we remind ourselves of [1] above:
[1] h_k = a_0 + a_1 e_{k-1}^2 + b_1 h_{k-1}
We've shown, above, that M[e^2] = M[h] but
M[h] = M[ a_0 + a_1 e_{k-1}^2 + b_1 h_{k-1} ]
= M[a_0] + a_1 M[e_{k-1}^2] + b_1 M[h_{k-1}]
... since Mean of a SUM = SUM of Means
Hence we can write:
M[h] = a_0 + a_1 M[e^2] + b_1 M[h]
Now replace M[h] by M[e^2] and this becomes:
M[e^2] = a_0 + a_1 M[e^2] + b_1 M[e^2]
Hence we conclude:
M[e^2] = M[h] = a_0 / (1 - a_1 - b_1)
... asymptotically speaking
Note that we're in BIG trouble if a1 + b1 = 1.
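In code, the long-run variance is a one-liner, plus a guard for the "BIG trouble" case. A minimal sketch; the function name is hypothetical and the sample parameters are the typical values quoted later on this page.

```python
def long_run_variance(a0, a1, b1):
    """Return a0 / (1 - a1 - b1), the asymptotic mean of h (and of e^2)."""
    persistence = a1 + b1
    if persistence >= 1.0:
        # Division by zero (or a negative "variance"): no finite long-run value.
        raise ValueError("a1 + b1 must be less than 1")
    return a0 / (1.0 - persistence)


v = long_run_variance(a0=1e-6, a1=0.09, b1=0.90)
print(v, v ** 0.5)  # long-run variance and the matching standard deviation
```

With those parameters the long-run variance is about 0.0001, so the long-run standard deviation is about 0.01, or 1% per day.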
>Asymptotically speaking?
Over an infinite time period so there are an infinite number of past e-values and ...
>Does that make sense?
If we look at 2000 market days for the S&P500 (see Figure 3, below) we'd get
the variance of log(daily returns) looking like this
After a long while the Variance seems to approach some particular value.
>Hey! That's pretty simple, eh?
Well, it assumes we're dealing with a process that started infinitely far in the past so that there are an infinite number of ek values.
>That makes little sense to me because ...
Okay. The guy we're calling M[e^2] stands for the mean of all possible values for e^2.
However, M[e_{k-1}^2] stands for the mean of recent historical values.
Replacing one by the other assumes an infinite number of past e_k^2 values.
>So what if you just have a few past values?
We'll get to that ... soon.
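In the meantime, here's a sketch of what "asymptotically" means in practice. Taking means in equation [1] and using M[e^2] = M[h] at each step gives the recursion M[h_k] = a0 + (a1 + b1) M[h_{k-1}]. Starting from some arbitrary value (the starting value below is hypothetical, and the parameters are typical ones of the kind discussed on this page), the expected variance creeps toward a0 / (1 - a1 - b1).

```python
a0, a1, b1 = 1e-6, 0.09, 0.90
limit = a0 / (1 - a1 - b1)

# Hypothetical starting value: four times the long-run variance.
expected_h = [4e-4]
for _ in range(2000):
    # Mean of [1], with M[e^2] replaced by M[h] at the previous step.
    expected_h.append(a0 + (a1 + b1) * expected_h[-1])

# Distance from the limit shrinks by the factor (a1 + b1) each day.
gaps = [abs(m - limit) for m in expected_h]
print(gaps[0], gaps[100], gaps[500], gaps[2000])
```

Because a1 + b1 = 0.99 is so close to 1, the gap closes slowly: it takes roughly 70 days to halve.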
Typically, the "best" choices for a0, a1 and b1 are something like:
a_0 = 10^{-6}, a_1 = 0.09 and b_1 = 0.90
Note that a1+b1 = 0.99 is very close to (but less than) 1 ... and that's typical.
>Typical? What's that mean?
There are many applications of GARCH(1,1) in the literature. For example, the "best" parameters for S&P500 returns
during the period May, 1995 to April, 2003 (as noted in the Nobel 2003 info)
are similar to those noted above.
In fact, in the Bank of Sweden advanced information on the 2003 Nobel prize, there's a neat pair of charts.
The graph of log(returns) = log[ value(today) / value(yesterday) ] is the upper chart in Figure 3 below
and the lower chart is the GARCH(1,1) variance estimate, day by day. Note that there were about 2000 market days during that time period.
Figure 3
>It predicted those large changes in returns?
Well, I think the GARCH model predicted a short-term continuation of large variance, once the historical data initiated the big change
... but those large variances soon die out.
Remember this chart from Part II ?
The z's are random variables with mean = 0 and variance = 1, so z^2 is usually less than 1.
Hence the slope of each of those lines, a1 z^2 + b1, is usually less than a1 + b1.
Since a1 + b1 is less than 1, those lines usually intersect the line y = x.
Hence the successive variance values usually approach some intersection.
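How usual is "usually"? Here's a rough Python sketch that puts numbers on it. It assumes the z's are standard Normal (the text only requires mean 0 and variance 1) and uses the typical values a1 = 0.09 and b1 = 0.90.

```python
import math


def prob_abs_z_below(c):
    """P(|z| < c) for a standard Normal z (Normality is an assumption here)."""
    return math.erf(c / math.sqrt(2.0))


a1, b1 = 0.09, 0.90

# Slope a1*z^2 + b1 is below a1 + b1 exactly when |z| < 1.
p_slope_below_sum = prob_abs_z_below(1.0)

# Slope exceeds 1 exactly when z^2 > (1 - b1) / a1.
threshold = math.sqrt((1.0 - b1) / a1)
p_slope_above_one = 1.0 - prob_abs_z_below(threshold)

print(p_slope_below_sum, threshold, p_slope_above_one)
```

Under those assumptions the slope is below a1 + b1 about 68% of the time, and above 1 a bit less than 30% of the time.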
If, for some LARGE z-value, we get lines with slopes larger than 1,
we'd get something like this
... and to see an example of the kind of variance plots one can generate with GARCH(1,1),
click here.
>Okay, we have a model with some parameters, but how ...?
How to deduce the "best" values ... from historical data?
Wait until I've finished reading all the stuff that's on the Net.
In the meantime, I was interested in the distribution of h-values using GARCH(1,1) so I generated 32,000 values using:
a_0 = 0.001, a_1 = 0.095, b_1 = 0.90 and starting with h = 0.1 and repeating equation [1] (with e_{k-1}^2 = z_{k-1}^2 h_{k-1}), like so :
h_k = a_0 + [ a_1 z_{k-1}^2 + b_1 ] h_{k-1}
= 0.001 + [ 0.095 z_{k-1}^2 + 0.90 ] h_{k-1}
where the sequence of random numbers zk-1 were generated via Excel's NORMINV(RAND(),0,1) command.
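The same experiment can be sketched in Python, with `random.gauss(0, 1)` standing in for Excel's NORMINV(RAND(),0,1). The seed is arbitrary, so the particular values won't match the spreadsheet run.

```python
import random

random.seed(2003)  # arbitrary seed, only so the run is repeatable

a0, a1, b1 = 0.001, 0.095, 0.90
n_values = 32_000

h = [0.1]  # starting value
for _ in range(n_values - 1):
    z = random.gauss(0.0, 1.0)  # standard Normal random number
    # h_k = a0 + (a1 * z_{k-1}^2 + b1) * h_{k-1}
    h.append(a0 + (a1 * z * z + b1) * h[-1])

sorted_h = sorted(h)
print(len(h), sorted_h[0], sorted_h[len(h) // 2], sorted_h[-1])  # count, min, median, max
```

Plotting `h` against its index gives the sequence picture, and a histogram of `h` gives the distribution.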
This is a picture of the sequence of 32,000 h-values:
... and their distribution:

>That distribution looks familiar.
Doesn't it? (The faint grey curve is a lognormal distribution.)
>So why don't you just assume a distribution like that and select them at random and ...?
How would I guarantee that large (or small) variances stick around for a while, as is the case for actually observed variances?
>I give up.
for Part IV