Logistic Growth

I've recently been trying to understand and (in particular) track the worldwide cases of Swine Flu (or H1N1 flu).
One model that seems to fit the reported cases of H1N1 virus is the Logistic Curve.
This curve is used to model a variety of things, like tumour growth, the cost of an annuity and (most commonly) population growth.
It begins with an exponential growth, then levels off and eventually suggests a constant population.

>A constant population? You kidding?
Well, it's often applied to populations -- like fish in a lake.
The lake is only so big and has only so much food so one might expect some maximum population that the lake can support.
In fact, the eventual population is sometimes called the "carrying capacity". In Figure 1, it's 1000.

>So you dump a few fish in the lake and the population can grow to 1000, right?
Something like that.


Figure 1
Anyway, there's a simple differential equation describing such a growth, namely:

[1]         dP/dt = r P (1- P/K)

Note that there are two obvious constant solutions: P = 0 and P = K.
When 0 < P < K, dP/dt > 0 so the population increases.
When P > K, dP/dt < 0 and the population decreases.

For the case K = 1000, we can see that in Figure 2.
The solutions to [1] are:

[2]         P(t) = K / {1 + (K/Po - 1) e^(-rt)}


Figure 2
Note that:

  • When t = 0, P(0) = Po.
  • When t → ∞, P(t) → K.
  • When P/K is small, then [1] looks like: dP/dt = r P which has solution P(t) = P(0) e^(rt) ... and that's exponential growth.
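
Here's a minimal Python sketch that checks those three notes numerically. The function names are mine, and P0 = 10 and r = 0.5 are made-up illustrative values (only K = 1000 comes from Figure 1). It compares a finite-difference slope of [2] with the right-hand side of [1].

```python
import math

def logistic(t, K, P0, r):
    """Closed-form solution [2] of dP/dt = r P (1 - P/K)."""
    return K / (1.0 + (K / P0 - 1.0) * math.exp(-r * t))

def rhs(P, K, r):
    """Right-hand side of the differential equation [1]."""
    return r * P * (1.0 - P / K)

# Illustrative values only: K = 1000 as in Figure 1, P0 and r are assumptions.
K, P0, r = 1000.0, 10.0, 0.5

h = 1e-5  # step for the central-difference estimate of dP/dt
for t in (0.0, 5.0, 10.0, 20.0):
    P = logistic(t, K, P0, r)
    slope = (logistic(t + h, K, P0, r) - logistic(t - h, K, P0, r)) / (2 * h)
    print(f"t={t:5.1f}  P={P:9.3f}  dP/dt={slope:9.4f}  rP(1-P/K)={rhs(P, K, r):9.4f}")

# Early on the curve should hug the pure exponential P0 * e^(rt).
print(logistic(1.0, K, P0, r), P0 * math.exp(r * 1.0))
```

The last two columns should agree to several decimal places, and the final line shows the logistic value sitting just under the exponential one while P/K is still small.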

>So what about that swine flu thing?
Ah, yes ... the H1N1 virus.
I looked at the reported cases (according to W.H.O.) and tried to fit a logistic curve to the data.
For the first few days I got a reasonable exponential fit as in Figure 3a. Then, using more data as they were reported, I tried to fit a logistic curve ... as in Figure 3b.

Figure 3a

Figure 3b
>Is that a good fit?
Actually, I was surprised how good!

>So are you predicting some eventual, maximum number of cases?
Uh ... not quite.
For me the interesting thing was how one might go about finding the "best" logistic curve.
That meant defining "best".
I looked at the deviations: (Number of Cases) - (Logistic Value) as in Figure 4. Then I tried three things:
  1. Minimizing the Sum of the Squares of the deviations.
  2. Minimizing the Maximum |deviation|. (That || thing means the absolute value of the deviation.)
  3. Minimizing the Average |deviation|.
I got different values for K = maximum cases.
Using the reported cases for the first 18 days I got these guys:
  1. Minimizing the Sum of the Squares: 7350 / (1 + 79.5 e^(-0.340t))
  2. Minimizing the Maximum |deviation| : 7820 / (1 + 60.0 e^(-0.355t))
  3. Minimizing the Average |deviation| : 7125 / (1 + 79.0 e^(-0.345t))
Interesting, eh?
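
To see why the three definitions of "best" give different answers, here's a rough Python sketch. Everything in it is an assumption for illustration: the data are SYNTHETIC (a logistic curve with a wobble added, not the W.H.O. counts), the starting guess is arbitrary, and the minimiser is a crude compass search that I've swapped in because it needs no derivatives. It is not the method used to get the numbers above.

```python
import math

def curve(t, K, A, r):
    """Logistic curve written as K / (1 + A e^(-r t))."""
    return K / (1.0 + A * math.exp(-r * t))

def deviations(params, days, cases):
    """(Number of Cases) - (Logistic Value) for each day."""
    K, A, r = params
    return [c - curve(t, K, A, r) for t, c in zip(days, cases)]

def sum_squares(params, days, cases):
    return sum(d * d for d in deviations(params, days, cases))

def max_abs(params, days, cases):
    return max(abs(d) for d in deviations(params, days, cases))

def mean_abs(params, days, cases):
    devs = deviations(params, days, cases)
    return sum(abs(d) for d in devs) / len(devs)

def compass_search(loss, start, days, cases, step=0.1, tol=1e-6, max_iter=2000):
    """Nudge one parameter at a time by a relative step, keep any
    improvement, and halve the step when no nudge helps."""
    best = list(start)
    best_loss = loss(best, days, cases)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(best)):
            for sign in (1.0, -1.0):
                trial = list(best)
                trial[i] *= 1.0 + sign * step
                trial_loss = loss(trial, days, cases)
                if trial_loss < best_loss:
                    best, best_loss, improved = trial, trial_loss, True
        if not improved:
            step /= 2.0
    return best, best_loss

# SYNTHETIC data for 18 days: a logistic curve plus a deterministic wobble.
days = list(range(1, 19))
cases = [curve(t, 7000.0, 80.0, 0.35) * (1.0 + 0.05 * math.sin(t)) for t in days]

start = [5000.0, 50.0, 0.3]  # arbitrary starting guess for (K, A, r)
fits = {}
for name, loss in (("sum of squares", sum_squares),
                   ("max |deviation|", max_abs),
                   ("average |deviation|", mean_abs)):
    fits[name] = compass_search(loss, start, days, cases)
    (K, A, r), err = fits[name]
    print(f"{name:>20}: K={K:8.1f}  A={A:7.2f}  r={r:.4f}  error={err:.3f}")
```

A search like this can stall at a local minimum, especially for the maximum |deviation| criterion, so the printed values of K are only as good as the starting guess.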

Figure 4

>So which is the best predicted value for the maximum number of cases?
I'll let you know in a month or three.

Early results

Later ...


Note:
In the spreadsheet described here, I've added another criterion for "best parameters".
The errors are weighted so that recent case numbers are weighted more heavily than ancient ones.

Note, too, that [1] has the form:   dP/dt = a P - b P^2.
I've found that a better fit to Swine Flu cases is generated using:   dP/dt = a P - b P^3.
In that case, the solutions look like [2] with the denominator replaced by a square root ... like so:

[3]         P(t) = K / (1 + A e^(-rt))^(1/2).

That suggests that we might try:

[4]         P(t) = K / (1 + A e^(-rt))^rr   ... and vary rr as well.
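
As a sanity check on the square-root form, here's a small Python sketch. The coefficients a, b and A are made-up illustrative values, and the function name is mine. Working through the cubic equation gives K = sqrt(a/b) and r = 2a, and the code confirms numerically that [4] with rr = 1/2 then satisfies dP/dt = a P - b P^3.

```python
import math

def generalized(t, K, A, r, rr):
    """Equation [4]: K / (1 + A e^(-r t))^rr."""
    return K / (1.0 + A * math.exp(-r * t)) ** rr

# Assumed illustrative coefficients for dP/dt = a P - b P^3.
a, b = 0.2, 2.0e-8
K = math.sqrt(a / b)  # steady state: a P = b P^3, so P = sqrt(a/b)
r = 2.0 * a           # the rate in [3] that goes with this a
A = 50.0              # fixed by the initial condition; arbitrary here

h = 1e-5  # step for the central-difference estimate of dP/dt
for t in (0.0, 10.0, 20.0, 40.0):
    P = generalized(t, K, A, r, 0.5)
    slope = (generalized(t + h, K, A, r, 0.5) - generalized(t - h, K, A, r, 0.5)) / (2 * h)
    print(f"t={t:5.1f}  P={P:9.3f}  dP/dt={slope:9.4f}  aP-bP^3={a * P - b * P ** 3:9.4f}")
```

Setting rr = 1 in the same function gives back the ordinary logistic curve [2], with A = K/Po - 1.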

That spreadsheet looks like this:


"Other" Growth Functions

Logistic Growth, though very common, doesn't seem to provide a very good fit for the Swine Flu cases we're considering.
Indeed, the growth rate in flu cases is pretty variable.
However, if we plot the logarithm of the number of cases, the curve is much smoother (as in Figure 5).

That suggests that we might try to fit a curve to the logarithm ... the natural logarithm.

To that end we assume some "Sigmoidal" function -- that is, S-shaped -- such as Equation [5], below:
[5]         f(x) = log(cases) = K { 1 - A*exp(-rr*x) }^b

Then we vary K, A, b and rr to fit f(x) to log(cases) like Figure 6.
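
Here's a minimal Python sketch of Equation [5], to show what the fitted curve implies once the logarithm is undone. The parameter values are illustrative assumptions, not fitted to anything, and I've assumed 0 < A < 1 so that the bracket stays positive for x >= 0. As x grows, f(x) approaches K, so the eventual number of cases is e^K.

```python
import math

def log_cases(x, K, A, rr, b):
    """Equation [5]: f(x) = K * (1 - A*exp(-rr*x))^b, the natural log of cases."""
    return K * (1.0 - A * math.exp(-rr * x)) ** b

def cases(x, K, A, rr, b):
    """Undo the logarithm to get back to a case count."""
    return math.exp(log_cases(x, K, A, rr, b))

# Illustrative parameters only (assumed, not fitted to the flu data).
K, A, rr, b = 9.5, 0.9, 0.1, 1.5

for x in (0, 10, 20, 40, 80):
    print(f"day {x:3d}: log(cases) = {log_cases(x, K, A, rr, b):6.3f}   cases = {cases(x, K, A, rr, b):10.1f}")

print("eventual value e^K =", round(math.exp(K)))
```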


Figure 6


Figure 5
This spreadsheet attempts to do just that:

The lower chart, on the right, has a couple of grey curves.
They illustrate the sensitivity of the Eventual Value to changes in the parameters.
Indeed, the parameters are changed (up or down) by 1% to generate those curves.
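
A rough Python sketch of that kind of sensitivity check follows. The parameter values are illustrative assumptions and the exact procedure in the spreadsheet may differ. Here each parameter in turn is moved down and up by 1% while the others are held fixed. Because the curve lives in log space, a 1% change in K multiplies the eventual case count by e^(0.01 K), which is a good deal more than 1%.

```python
import math

def log_cases(x, K, A, rr, b):
    """Equation [5]: f(x) = K * (1 - A*exp(-rr*x))^b."""
    return K * (1.0 - A * math.exp(-rr * x)) ** b

def predicted(params, x):
    """Case count on day x for a dict of parameters K, A, rr, b."""
    return math.exp(log_cases(x, **params))

def sensitivity(params, x, pct=0.01):
    """For each parameter, return (cases with it lowered, cases with it raised)."""
    out = {}
    for name in params:
        lo = dict(params)
        hi = dict(params)
        lo[name] *= 1.0 - pct
        hi[name] *= 1.0 + pct
        out[name] = (predicted(lo, x), predicted(hi, x))
    return out

# Illustrative parameters only (assumed, not fitted to the flu data).
base = {"K": 9.5, "A": 0.9, "rr": 0.1, "b": 1.5}
day = 200  # late enough that the curve has essentially levelled off

print(f"baseline: {predicted(base, day):10.1f}")
for name, (down, up) in sensitivity(base, day).items():
    print(f"{name:>3} -1%: {down:10.1f}   +1%: {up:10.1f}")
```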

You can try your own "Growth Function" in cell G10 ... tho' the display of that equation may not be proper.
That is, cell D28 will still show log(y) = K*(1-A*EXP(-rr*x))^b.
Nevertheless, column T will have your new Growth Function and the charts will be okay.

Note, too, that you can weight the errors so that recent cases have more weight than ancient cases.
That is, you stick the weights associated with each error in column T and the computed error is calculated according to:

[6]         Error = Σ W(n) | Y(n) - f(n)| / Σ W(n) ... where f(n) is the curve that you're fitting to Y(n) = log(y(n)) and W(n) are the weights you stuck in column T.

Note that f(n) changes as rr, K, A and b change. The eventual "best" choice for these parameters is the one that minimizes the Error.
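
Equation [6] is just a weighted average of the absolute deviations. Here's a minimal Python sketch of it. The geometric weighting scheme in recency_weights is a hypothetical example of "recent counts more", not necessarily what the spreadsheet uses, and the Y and f values are made up.

```python
def weighted_error(Y, f, W):
    """Equation [6]: sum(W(n) * |Y(n) - f(n)|) / sum(W(n))."""
    if not (len(Y) == len(f) == len(W)):
        raise ValueError("Y, f and W must have the same length")
    total = sum(W)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * abs(y - fv) for y, fv, w in zip(Y, f, W)) / total

def recency_weights(n_points, decay=0.9):
    """Hypothetical scheme: the newest point gets weight 1 and each
    older point gets `decay` times the weight of the one after it."""
    return [decay ** (n_points - 1 - n) for n in range(n_points)]

# Made-up numbers, oldest first: Y = log(cases), f = fitted curve.
Y = [1.0, 2.0, 3.0, 4.0]
f = [1.5, 2.0, 2.5, 4.5]

print(weighted_error(Y, f, [1.0] * 4))           # equal weights: plain average |deviation|
print(weighted_error(Y, f, recency_weights(4)))  # recent errors count for more
```

With equal weights this reduces to the "Average |deviation|" criterion used earlier.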

Further, in column C, the numbers are interpolated for days that have missing data. (They're coloured grey.)


If we were eager to generate some differential equation (along the lines of the SIR model), we could write:

[7]         dS/dt = -a S I + b I     dI/dt = a S I - b I

where S measures the Susceptible population and I measures the Infected population.
Note that d/dt (S + I) = 0 so S(t) + I(t) = constant, N, the entire population.

Setting S = N - I, then the Infected population satisfies:

[7a]         dI/dt = I { a (N - I) - b } = (aN-b) I { 1 - a I/(aN-b) }   which is the equation for Logistic Growth ... as in [1], with r = aN - b and K = (aN-b)/a = N - b/a.
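
To check that claim numerically, here's a Python sketch that integrates the pair of equations [7] with a standard fourth-order Runge-Kutta step and compares I(t) with the logistic formula [2], using r = aN - b and K = N - b/a (read off by matching [7a] to [1]). The values of N, a, b and the initial infected count are made up for illustration.

```python
import math

def rhs(S, I, a, b):
    """Equations [7]: returns (dS/dt, dI/dt)."""
    flow = a * S * I - b * I
    return -flow, flow

def rk4_step(S, I, a, b, dt):
    """One classical Runge-Kutta step for the pair (S, I)."""
    k1 = rhs(S, I, a, b)
    k2 = rhs(S + 0.5 * dt * k1[0], I + 0.5 * dt * k1[1], a, b)
    k3 = rhs(S + 0.5 * dt * k2[0], I + 0.5 * dt * k2[1], a, b)
    k4 = rhs(S + dt * k3[0], I + dt * k3[1], a, b)
    S += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
    I += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return S, I

def logistic(t, K, P0, r):
    """Closed-form solution [2]."""
    return K / (1.0 + (K / P0 - 1.0) * math.exp(-r * t))

# Illustrative values only (assumptions, not fitted to anything).
N, I0 = 10000.0, 10.0
a, b = 5.0e-5, 0.1
r = a * N - b   # growth rate, from matching [7a] to [1]
K = r / a       # carrying capacity, the same as N - b/a

dt, steps = 0.01, 4000
S, I = N - I0, I0
for n in range(steps):
    S, I = rk4_step(S, I, a, b, dt)
    if (n + 1) % 1000 == 0:
        t = (n + 1) * dt
        print(f"t={t:5.1f}  I(numeric)={I:10.4f}  I(logistic)={logistic(t, K, I0, r):10.4f}  S+I={S + I:10.4f}")
```

The two I columns should agree closely, and S + I should stay at N throughout.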