Johnson Distributions: Part II ... a continuation of Part I

One thing about inventing a density distribution, say f(x) (where x can vary from -infinity to infinity), it must satisfy:

  1. f(x) ≥ 0 for all values of x   ... since f(x) dx is the probability that x lies between x and x+dx and probabilities ain't negative
  2. ∫ f(x) dx = 1 (integrating from -∞ to ∞)   ... since it's guaranteed that x lies between -∞ and ∞ ... so probability = 1
Conclusion?
If we invent a distribution such as:
[A]       f(x) =   where z = h(x) is some (invented) increasing function of x, (like the Johnson distributions)
then
[B]       ∫ f(x) dx = 1   ... and that puts restrictions on the function z = h(x)

Indeed, we might just as well do this:

  • Consider g(x) = e^(-z^2)   ... instead of f(x)
  • Replace z by h(x)   ... where h(x) is our increasing function of x
  • Evaluate ∫ g(x) dx = ∫ e^(-z^2) dx = ∫ e^(-h^2(x)) dx = C   ... some constant that'll depend upon our choice of h(x)
  • Then write our distribution as f(x) = (1/C) g(x).
Then we'll guarantee that ∫ f(x) dx = (1/C) ∫ g(x) dx = (1/C) C = 1

>So you'll be talking about e^(-z^2), right?
Right and ...
>But you'll need to evaluate that integral ∫ e^(-h^2(x)) dx, right?
Yes, but ...
>I suspect that won't be easy since ...
Be quiet and I'll explain!

We can invent z = h(x) so that the integration can be performed explicitly.
For example:

  • If z = sinh^(-1)(x)   then x = sinh(z) = (1/2)(e^z - e^(-z))
  • Then dx = cosh(z) dz = (1/2)(e^z + e^(-z)) dz
  • Then e^(-z^2) dx = (1/2) e^(-z^2) (e^z + e^(-z)) dz   ... and the integrals that arise are pretty easy to evaluate.

>Easy? That's easy for you to say!
Oh, I've long ago forgotten how to do these guys. I just look them up  
In the case of sinh(z), the hyperbolic sine, we just get a bunch of integrals like ∫ e^(-(z-k)^2) dz = SQRT(π), integrating from -∞ to ∞.
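For anyone who'd rather check than look things up, here's a minimal sketch of the recipe for z = sinh^(-1)(x). It assumes we integrate in the z variable with a plain trapezoid rule (the helper `integrate` and the names `C_numeric` and `C_closed` are made up for this illustration). Completing the square in e^(-z^2) e^(±z) suggests the constant should be C = SQRT(π) e^(1/4), and the code compares the two.

```python
import math

def integrate(func, lo, hi, n=20000):
    """Composite trapezoid rule on [lo, hi] using n panels."""
    step = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * step)
    return total * step

def integrand_z(z):
    # g(x) dx rewritten in z:  e^(-z^2) dx = e^(-z^2) cosh(z) dz
    return math.exp(-z * z) * math.cosh(z)

# The integrand is negligible beyond |z| = 10, so a finite range is enough.
C_numeric = integrate(integrand_z, -10.0, 10.0)

# Closed form from completing the square: e^(-z^2 + z) = e^(1/4) e^(-(z - 1/2)^2)
C_closed = math.sqrt(math.pi) * math.exp(0.25)

def f(x):
    """Normalized density f(x) = (1/C) e^(-z^2) with z = asinh(x)."""
    z = math.asinh(x)
    return math.exp(-z * z) / C_numeric

print(C_numeric, C_closed)
```

The two printed values should agree to many decimal places. If they don't, suspect the integration range or the number of panels before suspecting the algebra.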

>So you invent your probability functions so the integrals are easy? Shouldn't you invent them so that ... ?
So they match stock data? Sure. Go ahead. I'll wait ...
>Why can't I just use the spreadsheet in Part I?
Sure. Match historical data using the ol' EB-method.
>Huh?
You mean you don't know the ol' EyeBall method?

I asked Jay, my son-in-law, to use the inverse hyperbolic sine distribution
      z = A + B sinh^(-1)((x-m)/s)
and pick the parameters A, B, m and s, to match the distribution of thirty years of S&P500 returns ... using the ol' EB-method.

Then I determined the parameters again so that the mean squared error (between the invented distribution and the actual returns) was a minimum.

It turned out that ...
>Jay was bang on, right?
You got it ... which says something interesting about abstruse mathematical rituals and the ol' EyeBall, eh?

Here's a couple of EyeBall fits to thirty years of monthly S&P500 data ... using two functions:

 
Note:
The average absolute error (between the EB-fit and the actual returns) is minimized in each case.
Although J(u) = u gives a smaller error (that's the normal distribution), J(u) = asinh(u) has fatter tails
However, there's not that much difference, eh?
That's because there's little difference between J(u) = u and J(u) = asinh(u) when u is small
... and u measures the distance of returns from their Mean (measured in units of Standard Deviation).

Indeed, they differ when u is larger (them's the tails, eh?)

>Why not invent J(u) so it looks like u for small values and ... ?
And deviates for larger u? Good idea.


Figure 1A
Aside:
Note that sinh(u) = (1/2)(e^u - e^(-u)) and the inverse is asinh(u) = log{ u + SQRT(1 + u^2) }.
For small values of u, SQRT(1 + u^2) ≈ 1 + (1/2) u^2
so log{ u + SQRT(1 + u^2) } ≈ log{ 1 + u + (1/2)u^2 }.
Further, for small values of w, log(1 + w) ≈ w - (1/2) w^2 + (1/3) w^3
so, for w = u + (1/2)u^2
log{ 1 + u + (1/2)u^2 } ≈ u + (1/2)u^2 - (1/2)[ u + (1/2)u^2 ]^2 + (1/3)[ u + (1/2)u^2 ]^3
Ignoring all the higher powers of the small number u, we get: asinh(u) ≈ u - (1/6) u^3
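The small-u approximation in this aside is easy to test numerically. Here's a quick sketch (the function names `asinh_log` and `asinh_cubic` are just labels invented for the comparison) that puts the log formula, the cubic approximation and Python's own `math.asinh` side by side:

```python
import math

def asinh_log(u):
    """asinh(u) written as log{ u + SQRT(1 + u^2) }."""
    return math.log(u + math.sqrt(1.0 + u * u))

def asinh_cubic(u):
    """Small-u approximation: asinh(u) ≈ u - (1/6) u^3."""
    return u - u ** 3 / 6.0

for u in (0.1, 0.3, 0.6, 1.0, 2.0):
    exact = math.asinh(u)
    print(f"u={u:4.1f}  asinh={exact:.6f}  log form={asinh_log(u):.6f}  "
          f"cubic={asinh_cubic(u):.6f}  u - asinh={u - exact:.6f}")
```

The cubic should track asinh(u) closely for small u and drift away as u grows, which is the same story as the tails: u and asinh(u) only part company when u gets large.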

Figure 1B
>But Figure 1B looks like Figure 1A.
Yes.


more Distributions

It'd be nice if we could find a J(u) so that:

  1. J(u) ≈ u when u is small
  2. J(u) is smaller, in absolute value, when u is large
    This'd make J(u) like the asinh(u) function, as in Figure 1, above. It deviates from u as |u| increases.
  3. There are several parameters to select, to match historical data.
  4. The invented J(u) gives fatter tails.
  5. AND ... e^(-z^2) can be integrated if z = A + B J(u)
>You're dreaming, right?
Well ... I'm not sure.
For example, we could choose:
      J(u) = u   when 0 < u < 0.6
      J(u) = 0.6 + 0.5*(u - 0.6) when u > 0.6
and give J(u) odd symmetry ... as in Figure 2A.
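In code, that broken line might look like the sketch below. The break point 0.6 and the slope 0.5 come from the definition above; the parameter names `knee` and `slope` are invented here so the two numbers can be adjusted, and odd symmetry is handled by working with |u| and restoring the sign at the end.

```python
import math

def J(u, knee=0.6, slope=0.5):
    """Broken-line J(u): equal to u up to the knee, flatter beyond it, odd in u."""
    a = abs(u)
    if a <= knee:
        value = a
    else:
        value = knee + slope * (a - knee)
    # Odd symmetry: J(-u) = -J(u)
    return math.copysign(value, u)

for u in (-2.0, -0.6, -0.3, 0.0, 0.3, 0.6, 2.0):
    print(u, J(u))
```

Since the slope beyond the knee is less than 1, |J(u)| < |u| for large |u|, which is the property we wanted from asinh(u).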

>Is it any good?
It's simple, right? It's just a broken line, right?

>Yeah, but ...
It gives a pretty good match, right? Look at Figure 2B.

>And how about integrating? If you invent f(z), can you easily ... ?
Can I calculate F(x) = ∫ f(z) dz? Well, not exactly, but ...

>Then why don't you invent F(x)?
Huh?
Well ... uh ...
Why didn't I think of that?

>Do you really want to know? It involves cerebral prowess and ...


Figure 2A

Figure 2B
Okay, I have another idea. Here's what I'll do:
  1. Invent F(u), where u = (x - m)/s, such that it goes from 0 to 1 as x goes from -∞ to +∞.
    ... and introduce a few parameters so F can be "adjusted".
  2. Calculate f(x) = dF/dx and adjust parameters so that ∫ f(x) dx = 1.

    (Note that u = (x - m)/s is the x-deviation from the mean, in units of standard deviation.
    Hence our f(u) and F(u) are really functions of x, eh?)
  3. Adjust the parameters to match historical data.
  4. Pray that you get a good match ... with fat tails  

>So?
So we'll try F(u) = (1/2)(1 + tanh(u)), so F'(u) = f(u) = (1/2)/cosh^2(u).

>Huh?
tanh(u) = (e^u - e^(-u)) / (e^u + e^(-u))
cosh(u) = (1/2) (e^u + e^(-u))

>Yeah, but what do they look like?
Like this:

Now pay attention:
      F(u) = (1/2)(1+tanh(u)) where u = (x-m)/s, so
      f(x) = dF/dx = (dF/du)(du/dx) = (1/2)/cosh^2(u) (1/s) = (1/(2s))/cosh^2(u) and
      ∫ f(x) dx = F(u) = (1/2)(1+tanh(u)) evaluated between -∞ and +∞, which is 1 - 0 = 1

>And your probability density is ... what?

      f(x) = (1/(2s)) / cosh^2(u)   where u = (x-m)/s.

and the cumulative probability is:

      F(x) = (1/2)(1 + tanh(u))   where u = (x-m)/s.
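Here's a small sketch of this pair of functions, with two sanity checks: that f really is the slope of F, and that the total probability comes out to 1. The values of m and s are made-up placeholders, not fitted to any data, and `total_probability` is just a hypothetical helper using the trapezoid rule.

```python
import math

def F(x, m=0.0, s=1.0):
    """Cumulative probability F(x) = (1/2)(1 + tanh(u)), u = (x - m)/s."""
    u = (x - m) / s
    return 0.5 * (1.0 + math.tanh(u))

def f(x, m=0.0, s=1.0):
    """Density f(x) = (1/(2s)) / cosh^2(u), u = (x - m)/s."""
    u = (x - m) / s
    # math.cosh overflows for very large |u|; fine for the ranges used here.
    return 0.5 / (s * math.cosh(u) ** 2)

def total_probability(m, s, half_width=20.0, n=20000):
    """Trapezoid estimate of the integral of f over m ± half_width*s."""
    lo, hi = m - half_width * s, m + half_width * s
    step = (hi - lo) / n
    total = 0.5 * (f(lo, m, s) + f(hi, m, s))
    for i in range(1, n):
        total += f(lo + i * step, m, s)
    return total * step

m, s = 0.01, 0.04   # hypothetical values, for illustration only
x, h = 0.03, 1e-6

slope_of_F = (F(x + h, m, s) - F(x - h, m, s)) / (2 * h)
print(slope_of_F, f(x, m, s))      # these two should nearly agree
print(total_probability(m, s))     # should be very close to 1
```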

for Part III