Standard Errors of Volatility Estimates
Sample Variance
The sample variance is given by the formula
$$ S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2 \label{sample_var}\tag{1} $$where \(\bar X\) is the sample mean
$$ \bar X = \frac{1}{n} \sum_{i=1}^n X_i $$Equation \eqref{sample_var} is an unbiased estimator for the population variance \(\sigma^2\), meaning
$$\EE{S^2} = \sigma^2$$where we have used the fact that the \(X_i\) are iid, so \(\mathbb E[X_i X_j] = \mathbb E[X_i]\mathbb E[X_j] = \mathbb E[X_i]^2\) for \(i\neq j.\)
The variance of the sample variance is given by
$$\var{S^2} = \frac{\sigma^4}{n}\left(\kappa - 1 + \frac{2}{n-1}\right)$$where \(\kappa\) is the population kurtosis (raw, not excess!). Note that \(\kappa \geq 1\) always.
The standard error of the sample variance is
$$\varepsilon({S^2})=\frac{\std{S^2}}{\sigma^2} = \frac{\sqrt{\var{S^2}}}{\sigma^2} = \sqrt{\frac{\kappa-1}{n} + \frac{2}{n^2-n}}$$For large \(n\) this becomes
$$\varepsilon(S^2) \approx \sqrt{\frac{\kappa-1}{n}}$$Ultimately the standard error of the sample variance scales as \(\varepsilon \sim n^{-1/2},\) similarly to the standard error of the sample mean.
Sample Standard Deviation
The population standard deviation - \(\sigma\) - is given by the square root of the population variance. Similarly, the sample standard deviation - \(S\) - is given by the square root of the sample variance.
We know that \(S^2\) is an unbiased estimator for \(\sigma^2,\) so Jensen’s inequality tells us that \(S\) must be a biased estimator for \(\sigma;\) specifically it is an underestimate. Jensen’s inequality states that, for any concave function \(\phi,\) we have
$$\mathbb E[\phi(X)] \leq \phi(\mathbb E[X])$$The square-root function is concave and so
$$\mathbb E[S] = \mathbb E[\sqrt{S^2}] \,\,\leq\,\, \sqrt{\mathbb E[S^2]} = \sqrt{\sigma^2} = \sigma$$Although \(S\) is a biased estimator for \(\sigma,\) it is consistent. That is, as \(n \rightarrow \infty\) we do get \(S \rightarrow \sigma.\)
What is the standard error on the sample standard deviation? Jensen’s inequality makes it hard to derive any fully general results. However, for large \(n\) we can derive a result.
Note that, because \(\varepsilon \sim n^{-1/2},\) as \(n\) gets large the distribution of \(S^2\) gets tighter around \(\sigma^2.\) Taylor expand \(\sqrt{S^2}\) around \(\sigma^2:\)
$$ \sqrt{S^2} = \sqrt{\sigma^2} + \frac{S^2 - \sigma^2}{2\sqrt{\sigma^2}} + O\big((S^2 - \sigma^2)^2\big) $$Keeping only the first order term, we get
$$S = \sigma + \frac{S^2 - \sigma^2}{2\sigma} \label{S_large_n}\tag{2} $$Now we can calculate the variance of this first order estimate of \(S:\)
$$\begin{align} \mathrm{Var}(S) &= \mathrm{Var}\Big(\sigma + \frac{S^2 - \sigma^2}{2\sigma}\Big) \\ &= \mathrm{Var}\Big(\frac{S^2}{2\sigma}\Big) \\ &= \frac{1}{4\sigma^2}\mathrm{Var}(S^2) \\ &= \sigma^2 \frac{\kappa - 1}{4n} \end{align}$$On the last line we have used our large \(n\) approximation for \(\mathrm{Var}(S^2),\) because we already put ourselves in that regime in order to use the Taylor expansion approximation.
Taking the square-root and dividing by the population standard deviation yields
$$\varepsilon(S) = \sqrt{\frac{\kappa - 1}{4n}}\qquad\qquad n\gg 1 \label{err_S}\tag{3} $$For a Gaussian, where \(\kappa = 3,\) this tells us
\[\varepsilon(S) = \frac{1}{\sqrt{2n}}\]The argument above is a bit sloppy: I have not said under what conditions the first-order approximation, equation \eqref{S_large_n}, is valid, nor proven that its variance equals the variance we seek (i.e. that of the unapproximated \(S\)). To make the argument rigorous see the “Delta Method”.
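As a quick Monte Carlo check of equation \eqref{err_S} in the Gaussian case, here is a minimal sketch (assuming NumPy; the sample size and trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 250, 200_000

# Many independent samples of size n from a standard Gaussian (sigma = 1, kappa = 3).
samples = rng.standard_normal((trials, n))
S = samples.std(axis=1, ddof=1)   # the sample standard deviation of each sample

print(S.std())                    # empirical standard error of S
print(np.sqrt(1 / (2 * n)))       # equation (3): 1/sqrt(2n), since sigma = 1
```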
Sums of IID Random Variables
Let \(X_i\) be iid random variables with mean and standard deviation \(\mu_X\) and \(\sigma_X\) respectively. Define
$$Y = \sum_{i=1}^n X_i$$We know
$$\mathbb E[Y] = n\mu_X \equiv \mu_Y \qquad\text{and}\qquad \mathrm{Var}(Y) = n\sigma_X^2 \equiv \sigma_Y^2 $$Further, we get a similar relation for the kurtosis
$$ \kappa_Y - 3 = \frac{1}{n}(\kappa_X - 3) $$
Sanity Check
As \(n\) grows, the central limit theorem tells us that \(Y\) approaches being normally distributed, so we expect \(\kappa_Y \rightarrow 3.\) This is indeed what we see for this relationship.
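The relation itself is also easy to check numerically; a quick sketch using SciPy (exponential variables have raw kurtosis 9, i.e. excess kurtosis 6):

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
n, m = 10, 1_000_000

X = rng.exponential(size=(m, n))  # X_i with excess kurtosis 6
Y = X.sum(axis=1)                 # m draws of Y

print(kurtosis(Y))                # empirical excess kurtosis of Y
print(6 / n)                      # predicted: (kappa_X - 3) / n = 0.6
```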
We want to estimate the underlying standard deviation \(\sigma_X\) from observations of \(Y.\) We know \(\sigma_X = \sigma_Y / \sqrt{n},\) so we can use \(S_X = S_Y / \sqrt{n}\) as an estimate. Say we have \(m\) observations of \(Y,\) and \(m \gg 1,\) then from above we know
$$\std{S_Y} = \sigma_Y \sqrt{\frac{\kappa_Y - 1}{4m}}$$Substituting in our known relationships we get
$$\std{S_Y} = \sqrt{n}\,\sigma_X \sqrt{\frac{(\kappa_X-3)/n + 2}{4m}}$$We also know
$$\std{S_X} = \std{\frac{S_Y}{\sqrt n}} = \frac{1}{\sqrt n}\std{S_Y}$$so
$$\std{S_X} = \sigma_X \sqrt{\frac{(\kappa_X-3)/n + 2}{4m}} \label{S_err_nm}\tag{4} $$The kurtosis of \(X_i\) can either reduce the standard error on our estimate (for \(\kappa_X < 3\)) or increase it (for \(\kappa_X > 3\).)
Sanity Check
For \(n = 1\) we recover the result we would have got by applying equation \eqref{err_S} directly to one \(X_i,\) as we should.
The dependence on \(n\) arises purely because of the effect of the summation on the overall kurtosis of \(Y.\) Additionally, as \(n\) grows the kurtosis of \(X\) affects the standard error less. We should expect this, because the central limit theorem tells us that \(Y\) approaches a normal distribution in the limit \(n \rightarrow \infty.\) Note that for zero excess kurtosis (\(\kappa_X = 3\)) the standard error of \(S_X\) does not depend on \(n\) anyway.
Volatility Estimates
Now imagine a process
$$Z_t = \sum_{i=1}^t X_i$$Let’s say we sample this process every \(n\) time units, i.e. \(Z_{jn}.\) The differences of this process give us back a series of iid realizations of \(Y:\)
$$Z_{(j+1)n} - Z_{jn} = \sum_{i=jn+1}^{jn+n} X_i \equiv Y_{j}$$If we have observed up to time \(t = mn\) then we have \(m\) observations of \(Y,\) each consisting of the sum of \(n\) draws of \(X\). Thus, we can use \(S_X = S_Y / \sqrt{n}\) to estimate the volatility, and we can directly apply our formula equation \eqref{S_err_nm} to calculate the error on this estimate.
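In code the estimate is only a couple of lines; a minimal sketch (assuming the realized process is held in a NumPy array `Z`):

```python
import numpy as np

def estimate_vol(Z: np.ndarray, n: int) -> float:
    """Estimate sigma_X from a path Z_t sampled every n time units."""
    Y = np.diff(Z[::n])                # the differences Y_j
    return Y.std(ddof=1) / np.sqrt(n)  # S_X = S_Y / sqrt(n)
```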
Effect of the Sampling Period
A natural question to ask is: for a fixed observation time, how does sampling frequency affect the accuracy of our estimate? The number of samples \(m\) is related to the total time \(t\) and sampling period \(n\) by \(m = t/n.\) Thus
$$\varepsilon = \frac{\Std{S_X}}{\sigma_X} = \sqrt{\frac{(\kappa_X - 3) + 2n}{4t}} \label{S_err_tn}\tag{5} $$Firstly, let’s note that
$$\varepsilon \propto \frac{1}{\sqrt{t}}$$So, as one might hope, sampling for a longer period always gives a better estimate. How about for the sampling period \(n?\)
Define \(c = (\kappa_X - 3)/2\). Remember \(\kappa_X \geq 1\) so \(c \geq -1,\) also note \(n \geq 1.\) Then we can write
$$\varepsilon \propto \sqrt{c + n}$$Because the square-root function is monotonically increasing, this tells us it is always beneficial to reduce \(n,\) i.e. to increase the sampling frequency. In fact, because the square-root function is concave, the smaller \(n\) is, the greater the (relative) benefit in reducing it further.
However, as we increase the kurtosis, the overall benefit of increasing the sampling frequency decreases. This is because we move into a flatter part of the square-root curve. It should be clear visually, but you can also show it by calculating the gradient of \(\varepsilon\) wrt \(n\) and finding
$$\frac{\partial\varepsilon}{\partial n} \propto \frac{1}{\sqrt{c+n}}$$Not only does increasing kurtosis decrease the benefit of sampling faster, it increases the errors we will see at the very fastest time-scales, for the same reason.
Effect of the Sample Count
We can similarly ask, still for fixed \(t,\) how the standard error scales with the number of samples. By substituting \(n = t/m\) into our formula above, and rearranging, we get
$$\varepsilon = \frac{1}{2}\sqrt{\frac{\kappa_X - 3}{t} + \frac{2}{m}}$$We can also calculate the gradient of this wrt \(m,\) but it’s not as clean as the gradient wrt \(n,\) above. In this case, let’s do the following; say we have a number of samples \(m,\) what happens to the standard error if we increase (or decrease) this by a factor \(f,\) giving \(m^\prime = mf\)? We find
$$\frac{\varepsilon^\prime}{\varepsilon} = \sqrt{\frac{cm + t / f}{cm + t}}$$For \(\kappa_X=3\) this gives \(\varepsilon^\prime/\varepsilon = 1/\sqrt{f}.\) This makes sense: \(f\) is directly related to the number of samples. The relative benefit of one more sample, in this case, is
$$\frac{\varepsilon - \varepsilon^\prime}{\varepsilon} = 1 - \sqrt{\frac{m}{m + 1}}$$This follows because, for one extra sample, \(f = 1+1/m\). This function approaches zero as \(\sim m^{-1},\) i.e. as we get more samples the relative benefit of adding more drops away. (Though actually fairly slowly.)
If \(\kappa_X > 3\) then we find, for large \(m\), that the relative benefit instead goes as \(\sim m^{-2} / c,\) meaning it drops off faster than in the mesokurtic case. Further, the higher the kurtosis, the smaller the benefit of each increase in sample count.
Despite this, the point above stands: it is always beneficial (at least mathematically) to sample more frequently. In practice there may be other reasons this becomes a trade-off, e.g. the ability to fit your dataset in memory.
Empirical Verification
Let’s empirically test our formula to check it’s correct.
For a set time \(T\) we draw a realization of \(\{Z_t: t\leq T\}\), then for a given sampling period \(n\) we slice each realization into \(m = \lfloor T/n \rfloor\) samples and use \(S_X = S_Y/\sqrt{n}\) to estimate the volatility. We repeat this process until we have \(D\) estimates of our volatility, then calculate the empirical standard deviation of these estimates. We can compare this to our formula equation \eqref{S_err_nm}.
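A minimal sketch of one point of this experiment, reusing `estimate_vol` from the sketch above (the values of \(T\) and \(D\) here are illustrative; the actual script is linked below):

```python
import numpy as np

rng = np.random.default_rng(2)
T, D = 1000, 2000

def empirical_std_error(n: int) -> float:
    """Empirical standard deviation of D volatility estimates at period n.

    With standard-normal X we have sigma_X = 1, so the standard deviation
    of the estimates is directly the standard error epsilon.
    """
    estimates = [
        estimate_vol(np.cumsum(rng.standard_normal(T)), n)  # fresh Z_t each time
        for _ in range(D)
    ]
    return np.std(estimates)
```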
Repeating this for all sampling periods in the range \(1 \leq n < 150\) gives us a dataset we can plot:
We see that in the large \(m\) limit we get good agreement between equation \eqref{S_err_nm} and the observed values. (Remember, small \(n\) means higher frequency and so more samples.) As the sampling period increases, \(m\) decreases and we start to see that the theoretical value is not such a good estimate. Here \(T = 1000\), therefore at the upper end of the graph we have only \(\lfloor 1000 / 150 \rfloor = 6\) samples. Unsurprisingly, for such a small number our “large \(m\)” approximation breaks down!
The code that produced these plots is available here.
Vortex Shedding
I have been working on my Lattice-Boltzmann simulation code recently. (Okay, given how long this article has been a draft, ‘recently’ is perhaps a stretch.) Simulating the vortex shedding of flow past a cylinder is practically a rite-of-passage for CFD codes. So let’s do it!
All the results shown here were produced with my code, which is available on GitHub; specifically the script at vortex_street.py.
Setup
Flow Geometry
The flow setup (shown in the diagram) is a rectangular domain of height \(W\) and length \(L\). At the left and right ends in- and out-flow boundaries are enforced by prescribing that the velocity be equal to \(\mathbf u = (u, 0)\). The top and bottom boundaries are periodic. In the (vertical) centre of the domain is a circular wall boundary of diameter \(D\). Let’s define the y-axis such that \(y=0\) corresponds to the midline of our simulation.
This setup is called flow past a cylinder because we imagine the 2D simulation is a slice of a 3D system which is symmetric under translations in the z-direction.
Characteristic Scales
The suitable characteristic length-scale is the diameter of the cylinder \(D\), and the characteristic velocity is the prescribed in-flow velocity \(u\). Combined with the fluid’s kinematic viscosity - \(\nu\) - these give us the Reynolds number
$$\mathrm{Re} = \frac{uD}{\nu}$$The flow displays different behaviour depending on the value of \(\mathrm{Re}\). We can control this by changing any of \(u\), \(D\) or \(\nu\). In the simulations below I have done this by fixing \(D\) and \(\nu\), then inferring \(u\).
The characteristic timescale of the system is the time it takes the fluid to flow past the cylinder
$$T = \frac{D}{u}$$We define the ‘dimensionless time’ to be \(\tau = t/T\). When comparing results across different Reynolds numbers we will plot them at equal values of dimensionless time.
Because our simulation domain is finite in size we, strictly, also have to consider the dimensionless ratio \(D/W\) - the ‘obstruction’. However, we will sweep that under the rug for now.
Flow Regimes
Depending on the Reynolds number, this setup displays four distinct behaviours.
| Re | Behaviour |
| --- | --- |
| < 10 | Creeping |
| 10 - 100 | Recirculation |
| 100 - 1000 | Vortex Shedding |
| > 1000 | Turbulence |
These ranges are only approximate; I have seen various figures quoted in different sources. For instance, it is sometimes stated that shedding starts at around Re = 40. In my results Re = 50 was stable for the whole simulation, unless I manually induced shedding by perturbing the initial velocity field.
Simulation
Initial Conditions
When setting up the simulation we could set the initial velocity to be equal to \(u\) everywhere. However, directly up- & downstream of the cylinder the velocities will be pointing respectively into, and out of, the wall boundary. This means that, at these points, the velocity field is not divergence free.
At low velocities this causes a large “sound” wave to propagate outwards from the cylinder (compression at the leading edge and rarefaction at the trailing edge.) The velocities induced by this can be of similar order to the flow features we’re interested in. They will eventually propagate out of the domain but it’s needlessly ugly.
Here is what the initial disturbance looks like for \(Re = 100:\)

The compression/rarefaction seen here is larger than 4%!
More problematically, at higher velocities (Reynolds numbers) these pressure waves can even cause the simulation to become unstable and blow up. We have to do something better!
Stream Function
The initial pressure waves are mostly caused by the non-zero divergence of the velocity field. Therefore we can naturally think to start with a velocity field that is divergence free.
There are a few approaches we could try to get such initial conditions. One is to construct a stream function, and to use this to get the velocity field. This works because stream functions exist only for divergence free flow fields, thus any velocity field derived from a stream function is guaranteed to be divergence free.
The stream function is a 2D scalar field - \(\psi\) - whose derivatives in each direction specify the velocity vector:
$$u_x = \frac{\partial \psi}{\partial y} \qquad u_y = -\frac{\partial \psi}{\partial x}$$Because the velocity is specified by the gradients of \(\psi,\) it is not unique; any constant offset will yield the same velocity field.
A uniform velocity field \((u, 0)\) corresponds to stream functions of the form \(\psi_\text{uniform} = u y + C.\) Without loss of generality, choose \(C = 0,\) so that the stream function is zero along the mid-line of our simulation. Next, we can notice that if the stream function’s gradients are zero at the cylinder walls then we have zero velocity there. Thus if we perturb \(\psi_\text{uniform}\) near the wall such that this is the case, we will get a velocity field which has the desired value far from the cylinder, tends to zero near the cylinder, and is divergence free. Nice!
Obviously the stream function \(\psi = 0\) has zero gradient everywhere, and conveniently matches \(uy\) on the midline. So we can imagine interpolating between \(\psi = 0\) near the wall and \(\psi = uy\) far from the wall:
$$\psi = f(d)\cdot uy + \cancel{(1-f(d)) \cdot 0}$$where \(d\) is the distance from the solid wall.
The interpolation function must have the following properties:
- \(f(\infty) = 1\) - One far from the walls.
- \(f(0) = 0\) - Zero near the walls.
- \(f^\prime(0) = 0\) - Zero gradient near the walls.
The last property is required because otherwise we would get a non-zero x-velocity at the wall:
$$u_x\big|_{d=0} = \frac{\partial \psi}{\partial y}\bigg|_{d=0} = uy\, f^\prime(0)\,\partial_y d + f(0)\, u$$An obvious candidate is
$$f(d) = 1-e^{-(d/D)^2}$$because \(e^{-x^2}\) has zero gradient at the origin. Moreover the gradient is actually linear around zero, which matches the expected velocity profile near a wall (see Wikipedia.)
If we plug in \(f\) we can evaluate the exact form of \(u_x\) along the vertical line extending from the cylinder. We find that there is a region (\(d > D/2\)) where \(u_x > u\). This makes sense physically: we need the overall flow rate to remain constant (mass conservation), so we must have a higher-velocity region to compensate for the zero velocity near the wall and the channel-width reduction due to the cylinder.
Distance Function
To calculate the stream function above we need to know the distance to our cylinder - \(d.\) This is very simple to calculate using the signed distance function for a circle. However, we can keep it more general by using the Euclidean distance transform. This lets us perform the same calculations for arbitrary geometries.
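To make this concrete, here is a minimal sketch of the whole construction (assuming NumPy and SciPy; the domain and cylinder parameters are illustrative, not the values used in the runs below):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

W, L, D, u = 200, 600, 40, 0.1   # illustrative height, length, diameter, velocity
y, x = np.meshgrid(np.arange(W) - W / 2, np.arange(L), indexing="ij")

solid = (x - L / 4) ** 2 + y**2 < (D / 2) ** 2  # cylinder mask, centred at x = L/4
d = distance_transform_edt(~solid)              # distance of each fluid cell to the wall

f = 1 - np.exp(-((d / D) ** 2))                 # the interpolation function
psi = f * u * y                                 # the perturbed stream function

# Velocities from the stream function: guaranteed divergence free.
ux = np.gradient(psi, axis=0)    # d(psi)/dy
uy = -np.gradient(psi, axis=1)   # -d(psi)/dx
```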
Here is an example of the stream function initial conditions applied to a semi-circular boundary:

Putting this together gives us the divergence free initial conditions for our simulation. The updated simulation start-up is shown below, with the same colour-scale as above:

The density fluctuations are much reduced, showing a maximum compression/rarefaction of under 1%. More importantly, the simulation remains stable at higher Reynolds numbers.
Density Initial Conditions
But why do we still see any density fluctuations? Ultimately it’s because the Lattice-Boltzmann method (LBM) solves for approximately incompressible flows.
In an incompressible fluid the pressure acts to keep the flow exactly divergence free. Fluid in the LBM is slightly compressible, with the pressure being related to the density via the equation of state
$$P = \rho c_s^2$$But the flow is still almost incompressible so we expect corresponding pressure gradients, and these pressure gradients imply density gradients via the equation of state. Therefore, even without the initial transients, we will always see some density fluctuations in our LBM simulations.
We can use Bernoulli’s principle and the equation of state to relate the density to the velocity as
$$\frac{\rho}{\rho_0} = 1 - \frac{v^2 - u^2}{2c_s^2} \label{density_bernoulli}\tag{1} $$where I have used the fact that all streamlines in our initial velocity field end at the in-flow boundary, letting us fix the constant in Bernoulli’s relation. Here \(\rho_0\) is the arbitrary density at the in-flow.
We can use equation \eqref{density_bernoulli} to set the initial condition for the density. Even doing this, however, we would still expect some initial transients. This is because the stream function we used is arbitrary (up to the constraints), and is not a solution to the Navier-Stokes equation (probably).
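Concretely, this would be a one-liner given the initial velocity field (continuing the sketch above; \(c_s^2 = 1/3\) in lattice units for the standard D2Q9 lattice):

```python
cs2 = 1 / 3                          # lattice speed of sound squared (D2Q9)
v2 = ux**2 + uy**2                   # squared speed of the initial field
rho = 1 - (v2 - u**2) / (2 * cs2)    # equation (1), with rho_0 = 1
```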
Surprisingly, I found that setting the initial density in this way actually made the transients marginally worse. Therefore, for my simulations I have simply let the density be uniform throughout the domain.
Plotting
Choosing Colour-scales
In order to accurately compare the results across different Reynolds numbers it is important for the colour-scales to match. All the results shown here have a velocity magnitude colour-scale from 0 to \(2u\), and a vorticity colour-scale from \(-50u/D\) to \(+50u/D.\) These numerical prefactors were determined empirically to give usable scales.
Streamlines
When visualizing flow fields, streamlines can be a useful addition, even for time-varying flows. One difficulty with plotting them is choosing the seed points from which to integrate the velocity field. For example, if we choose some seed points at the inflow boundary, then we will not see any streamlines in the recirculation regions behind the cylinder. To get around this we can actually make use of the stream function again.
The contours of the stream function correspond to the stream lines of the velocity field. Therefore if we calculate the stream function, we can easily plot streamlines including in disconnected parts of the flow. (This is going the other way from the initial conditions where we went from the stream function to the velocity.)
One definition of the stream function is that it is the flux through any path from a reference point to the current position. This is given by the line integral
$$\psi(\mathbf x) = \int_{\mathbf x_\mathrm{ref}}^{\mathbf x} \mathbf v \cdot \hat{\mathbf n} \,dl$$Because the exact path is not important we can arbitrarily say we will take a path which goes first vertically, and then horizontally from \(\mathbf x_\mathrm{ref}\) to \(\mathbf x.\) Thus the integral becomes
$$\psi(\mathbf x) = \int_{y_\mathrm{ref}}^y v_x \,dy - \int_{x_\mathrm{ref}}^x v_y \,dx$$which is something we can easily calculate numerically. (The minus sign follows from \(u_y = -\partial\psi/\partial x.\))
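A sketch of that integration, taking the first row and column of the arrays as the lower-left reference corner and the same array layout as the earlier sketches (`cumulative_trapezoid` is SciPy's cumulative trapezoidal rule):

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

def stream_function(vx: np.ndarray, vy: np.ndarray) -> np.ndarray:
    """psi = int vx dy - int vy dx, integrated from the lower-left corner."""
    # Leg 1: vertically up the left edge, integrating vx in y.
    psi_left = cumulative_trapezoid(vx[:, 0], initial=0)
    # Leg 2: horizontally across the domain at each height, integrating vy in x.
    psi_across = cumulative_trapezoid(vy, axis=1, initial=0)
    return psi_left[:, None] - psi_across
```

Contours of the result, e.g. via matplotlib's `contour`, are then the streamlines.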
It is important to note that although \(\mathbf x_{\mathrm{ref}}\) does not affect the velocity field we would get back from \(\psi,\) it obviously affects the value of \(\psi\) itself. We need our contour levels to be consistent over time, so we need to choose a reference point whose velocity does not change. I have used the lower-left corner, which is in the inflow boundary and thus has velocity fixed at \(u.\)
Results
This section shows the results of the simulation for various Reynolds numbers, at a late dimensionless time \(\tau=100.\) This is enough time to let the initialization effects fully propagate out of the domain, and for the flow to establish its long-time behaviour. The chosen Reynolds numbers display the four regimes mentioned above: creeping flow, recirculation, vortex shedding and, finally, turbulence.
The simulations for Reynolds numbers 5 and 50 have domains with grid sizes 1000 × 3000, i.e. 3 million cells. The higher Reynolds numbers require a higher resolution, and so were run at 4000 × 12,000, i.e. 48 million cells.
If you’re interested in seeing the time evolution you can head over to YouTube, for a video showing the simulation for Re = 500 and Re = 5000. Re = 5 and Re = 50 are not worth a video since they settle into a steady-state!
I have not embedded the video here, primarily, because of the cookie implications. But also because, unasked, YouTube has decided the video is a “Short”, which affects the embedding.
Cylinder Frame
First, let’s look at the fluid velocity (magnitude) in the rest frame of the cylinder. This is how the simulation is actually run, and matches the set-up described above with in- and out-flow boundary conditions.

We can see, as expected, the four flow regimes for the different Reynolds numbers.
For Re \(<\) 100 we see that, after the initial transients have advected out of the domain, the flow settles into a steady-state with no vortex shedding. In both cases there is a region of “stagnant” flow trapped just behind the cylinder, which has zero net velocity.
For Re = 50, if we look closely, we can see a slight non-zero velocity along the midline in the middle of this stagnant region. In fact, the velocity is not only non-zero, it is heading in the direction opposite to the bulk flow! This is because the flow is recirculating. It’s hard to see when looking at the whole domain with the chosen colour scale, though, so let’s look at a zoom of just the recirculation region.

Here I have not blended the streamlines with the velocity field, as above, but just overlain them directly to make them more visible. Because the simulation is steady-state for Re = 50, the streamlines do correspond to pathlines of the flow, and we see that just downwind of the cylinder they are closed loops. This means that fluid is trapped, and being pushed around in a vortex motion, i.e. it is recirculating.
At higher Reynolds numbers the recirculation zone grows, and eventually becomes unstable. First one of the recirculation vortices is shed, then while it reforms the other is shed, and so on. When looking at the plot for Re = 500 above, however, it is not obvious that the periodic flow feature behind the cylinder is a sequence of vortices. The flow is not steady-state, so the streamlines do not correspond to path- or streak-lines of the flow; even so, the fact that they are not closed here means they do not look much like vortices. This can be remedied by considering the fluid’s own rest-frame.
Fluid Frame
Instead of considering the cylinder fixed, and the fluid as flowing past it with velocity \(u\), we could equally well consider the fluid as being fixed and the cylinder moving through it with velocity \(-u.\) This is the ‘fluid frame’. The fluid frame velocity is given by \(v_x^\prime = v_x - u\) and \(v^\prime_y = v_y,\) and the magnitude is correspondingly changed.

In this frame we see that for Re = 5 and Re = 50, the stagnant region behind the cylinder is, instead, an entrained pocket of fluid. In the fluid rest frame the streamlines do not coincide with the pathlines even for Re = 5 and Re = 50, because in this frame the cylinder is moving, and therefore the flow cannot be steady-state.
For Re = 500 and Re = 5000, despite the flow also not being steady-state, the streamlines do form closed loops behind the cylinder, showing that we have formed vortices.
At Reynolds number 500 they form a regular line, with each vortex rotating in the opposite sense to its neighbours. The vortices are not moving relative to the surrounding fluid; the non-zero velocity in this reference frame purely shows their rotation. Although it is tempting to think it looks like the entrained fluid pocket we saw at lower Reynolds numbers, it is not. The high velocity in the band behind the cylinder is actually perpendicular to the cylinder’s motion. Picture the cylinder moving through stationary fluid, leaving a line of vortices, whose centres are stationary, in its wake.
By the time we get to Re = 5000, turbulence has set in and the vortices shoot off in random directions, tending to form pairs of oppositely rotating vortices. If you watch the video, you can even see that these sometimes collide and the pairs swap vortices.
From the velocity magnitude + streamlines we cannot tell which way the vortices are rotating. This, we can visualize with the vorticity.
Vorticity
The vorticity is given by the curl of the velocity field: \(\nabla\times\mathbf v.\) It is unchanged by a constant offset of the velocity
$$\boldsymbol{\omega}^\prime = \nabla\times\mathbf v^\prime = \nabla\times(\mathbf v + \mathbf u) =\nabla\times\mathbf v + \nabla\times\mathbf u = \nabla\times\mathbf v = \boldsymbol{\omega}$$because for a constant vector \(\nabla\times\mathbf u = 0.\) Therefore, it does not matter if we take the curl of the cylinder-frame or the fluid-frame velocity field.
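Numerically this is just two more gradients (with `vx`, `vy` the velocity components on the grid, laid out as in the earlier sketches):

```python
import numpy as np

# z-component of the curl: omega = d(vy)/dx - d(vx)/dy.
omega = np.gradient(vy, axis=1) - np.gradient(vx, axis=0)
```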

Orange shows positive vorticity (i.e. anticlockwise rotation) and blue, negative vorticity (clockwise rotation).
Mechanisms
Creeping Flow
At very low Reynolds numbers, creeping flow shows neither time evolution nor eddies. In this regime the flow is governed by the Stokes equations, which have no time dependence. The fluid’s inertia is so weak compared to viscosity that it responds instantly, everywhere, to the boundary conditions.
Recirculation
For slightly higher Reynolds numbers the approximation behind Stokes flow - that viscosity completely dominates inertia - starts to break down, and we do get eddies forming.
These are formed behind the cylinder by flow separation. As the fluid flows past the cylinder it is at first constricted, which means the velocity must increase (mass conservation) and thus the pressure decreases (Bernoulli’s principle). Once past the midpoint, however, the channel becomes less constricted again and the velocity drops, and thus the pressure increases. The overall flow is therefore going against the pressure gradient at this point. Near the boundary the flow is slower, so this adverse pressure gradient actually causes the flow here to reverse, and go against the bulk flow, causing the eddies.
An obvious question to ask is: why is the recirculation stable at these intermediate Reynolds numbers? Why do we not see vortices being shed here? Unfortunately I could not find a good answer in the literature. (One must exist, surely?)
If I were to hazard a guess I’d say that, although we now have recirculating eddies, the flow is still viscous enough that the bulk behaves more-or-less like Stokes flow. Disturbances in the flow are felt quickly far away via the rapid diffusion of momentum associated with the high viscosity. The bulk flows around the combined “cylinder + eddies” obstacle, which “deconstricts” more gently than the cylinder alone, thereby ameliorating the adverse pressure gradient for the bulk. (One obvious feature of this interpretation is that, although the cylinder has no-slip boundary conditions, the eddies most certainly do not. But perhaps that does not matter.)
Shedding
I think the exact cause of the vortex shedding is not currently well understood. At least, that’s what’s stated in the abstract of nearly every paper I looked at whilst trying to understand the mechanism(s) involved in vortex shedding. Many papers investigating vortex shedding are numerical, rather than theoretical, and examine simulations in fine detail. While useful, I was looking for something more mathematical.
I found this paper by Boghosian & Cassel quite nice; it is at the same time sufficiently mathematical and quite readable. It examines under which situations vortices in 2D incompressible flow split. The gist of the paper is that at points of zero momentum, having \(\nabla\cdot\frac{d\mathbf u}{dt} > 0\) will cause a vortex to split. While I like the paper, I don’t think it explains the vortex shedding mechanism observed here. They show that for a general vortex the pressure gradient is not enough to get the divergent net force, so they need to add a body force. That is clearly not the situation we have here.
A common suggestion for the cause of vortex shedding, which seems sensible to me, is a form of Kelvin-Helmholtz instability.
The fluid-frame velocity magnitude plots for Re = 50 show how we have two shear layers just downstream of the cylinder; one extending from the top, and one from the bottom. This is true in the early flow for higher Reynolds numbers too. Because of the Kelvin-Helmholtz instability, this shear layer is unstable and will naturally form growing vortices.
This makes sense, but it cannot be the whole story; the shear layer exists also at lower Reynolds numbers, and according to theory, shear layers are always unstable (if the densities of the fluid on either side are equal). This is the flip-side of the question of why the recirculating vortices are stable at low Re: why do they become unstable at higher values of Re?
von Kármán Vortex Street
But wait, what about the famous von Kármán Vortex Street? This is the effect whereby vortices shed behind an obstacle create two rows of opposite rotational sense, between which the vortices are offset from one another. This is beautifully illustrated in this satellite image from NASA:
Consider two rows of vortices separated by a distance denoted \(h\), with each row containing vortices rotating in opposite directions. Within each row let the distance between vortices be denoted by \(l\). According to von Kármán’s analysis, any arrangement apart from \(\frac{h}{l} = \frac{1}{\pi}\mathrm{arcosh}(\sqrt{2})\approx0.28\) is unstable.
His original paper from 1911 (in German) can be found here, and an English translation of his second paper, also from 1911, is available here. It is quite readable and I suggest that the interested reader give it a look.
However, my simulations did not show this predicted, and rather beautiful, pattern. Instead, the vortices form a single line of alternating sense along the midline. Why?
I think the answer, ultimately, comes down to boundary conditions. I ran the simulations with a larger domain and I do see the predicted offset vortices:

The vortices have been visualized here by letting two tracers (dyes) advect with the fluid; the top half of the obstacle releases red dye, and the bottom green. Where these mix it creates yellow, but you can clearly see the red & green dye trapped in the vortices being carried off downstream.
To make the tracers more visible downstream, I have boosted the contrast between zero and low tracer concentrations by updating values to be \(\sqrt C.\)
This simulation was run at Re = 100. In the simulations I ran, at lower Reynolds numbers the von Kármán pattern developed closer to the obstacle. At higher Reynolds numbers it did develop, just somewhat downstream of the obstacle. However, with the original shorter geometry, even at Re = 100 the vortices formed a line, rather than the von Kármán pattern.
Consider the velocity field formed by the vortex street arrangement, in the rest frame of the bulk fluid. I will describe it with words, but to aid the subsequent description, let’s just look at it:

Between neighbouring pairs of opposite sense vortices the fluid flows fast, and on the outer edges of the vortices, slowly. Thus fluid between the two rows is, on average, moved to the left (for our configuration). The vortices themselves “roll” between this moving central fluid, and the stationary bulk fluid. This rolling causes some slight rightward motion of the bulk fluid just outside the vortex street.
If we imagine averaging the fluid’s velocity over time we get the right-hand side of the above figure. This clearly shows how the fluid behind the cylinder is entrained and has non-zero velocity.
This is the reason, then, I believe the boundary conditions are affecting the vortex street formation. For the vortex street, the velocity profile along a vertical slice is anything but uniform and steady-state. By contrast, our boundary conditions set a uniform and constant velocity across the whole right-hand edge.
Because the vortex street has a leftward velocity relative to the velocity at our boundary, it is forced to accelerate as it exits the domain. This has the effect of pulling the vortices inwards. Additionally (and perhaps more importantly?) the constant velocity outflow condition kills any rotational motion as the flow approaches the boundary.
This suggests an interesting test would be to implement some more sophisticated boundary conditions, e.g. non-reflecting / characteristic boundary conditions, which will let the vortices flow out of the system unimpeded.
Population Dynamics
You’ve very likely seen a population pyramid before. I thought it could be interesting to write down a mathematical description of the time evolution of such diagrams, and simulate it.
This post starts by showing the PDE governing the evolution of the population, and how to discretize it & various probability distributions required. Then follows an interactive simulation of a population over time, using the preceding scheme. Lastly, I attempt to mathematically analyze the equation, to see under what conditions various behaviours arise.
Mathematical Framework
Before diving in, I will make a couple of simplifying assumptions:
- The male & female populations are always balanced; no need to track them separately.
- There is no immigration; the population only grows by birth.
- Fractional people exist; the populations are given by real numbers, not integers.
These assumptions could easily be relaxed, but I don’t think that would make the problem more interesting mathematically.
Population PDE
We would like to model the ‘demographic curve’ - \(n(a, t)\) - which tells us how many people of any given age - \(a\) - are alive at time \(t\).
The time evolution of \(n\) has two contributing terms. Firstly, an aging term: everybody gets older at the same rate, namely one year per year, meaning we have a right-moving advection term with velocity equal to 1. Secondly, a hazard term representing mortality.
$$\boxed{ \pfrac{n}{t} = -\pfrac{n}{a} - h(a)n \label{eqn_dndt}\tag{1} }$$The hazard rate - \(h(a)\) - is the probability density per unit time that a person of a given age dies (or emigrates, perhaps).
Lastly we need a boundary condition for \(a=0,\) because we cannot have a negative age to advect from. The boundary condition is imposed by the birth-rate
$$n(0, t) = B(t) \label{eqn_n_bc}\tag{2} $$We must now define \(B(t).\)
Birth & Death Rates
The endogenous birth-rate (density) - \(b(a)\) - is the probability density per unit time that a person of a given age reproduces. Therefore the total number of endogenous births per unit time is given by
$$B_n(t) = \int_0^\infty b(a) n(a, t) \,da \label{eqn_B}\tag{3} $$The exogenous birth-rate - \(B_x(t)\) - is a birth-rate which is independent of the population. The total number of births per unit time is their sum \(B = B_n + B_x.\)
Similarly, the total number of deaths per unit time is
$$D(t) = \int_0^\infty h(a) n(a, t) \,da \label{eqn_D}\tag{4} $$
Total Population
The total population is given by
$$P(t) = \int_0^\infty n(a, t) \,da$$What is the rate of change of the total population? Well …
$$ \begin{flalign} && \frac{dP}{dt} &= \frac{d}{dt}\int_0^\infty n \,da & \text{Definition of }P \\ && &= \int_0^\infty \pfrac{n}{t}\,da & \text{Linearity} \\ && &= -\int_0^\infty \Big(\pfrac{n}{a} + n h\Big) \,da & \text{Equation }\eqref{eqn_dndt} \\ && &= n(0, t) - n(\infty, t) - \int_0^\infty nh\,da & \text{Fundamental Thm.}\vphantom{\pfrac{}{}} \\ && &= B - D & \text{Equations \eqref{eqn_D} and \eqref{eqn_n_bc}} \vphantom{\pfrac{}{}} \end{flalign} $$To go from the penultimate to the final line we have also assumed that there are no infinitely old people, which feels like a reasonable assumption. Thus the rate of change of the total population is the total births minus the total deaths. Unsurprising, but a good check that things are thus far consistent.
Survival, Mortality & Hazard Curves
Equation \eqref{eqn_dndt} is written in terms of the hazard rate, which is a probability density. However, for our discretized simulation, we need actual probabilities and not densities. In this section we will derive the conversion from the hazard rate to actual probabilities, and relate them both to what fraction of people survive to each age.
The contents of this section are related to survival analysis.
Let’s start by defining the mortality function - \(M(a).\) This is the probability that an individual has died by age \(a.\) It is the inverse of the survival function \(S(a) = 1 - M(a),\) which is the probability that an individual has not died by age \(a.\)
Say a person lives to age \(A,\) then \(M(a) = \pp (A \leq a).\) The unconditional (a priori) probability density of dying at age \(a\) is
$$m(a) = \dfrac{M}{a}$$This has the property that it is small for large ages, which is perhaps confusing at first sight but makes perfect sense. What we care about is not the a priori probability density of dying at some age, but rather the probability density conditioned on having survived up to that point. This quantity is our hazard rate.
Intuitively, we can see that if we survived to age \(a_1,\) then the probability of dying between ages \(a_1\) and \(a_2 \geq a_1\) is a simple rescaling of our unconditional mortality curve:
$$H(a_2|a_1) = \frac{M(a_2) - M(a_1)}{1 \,\, - \,\, M(a_1)} = \frac{S(a_1) - S(a_2)}{S(a_1)} \label{eqn_H}\tag{5} $$Thus \(H(a_2|a_1)\) is the hazard probability.
Example
Consider a simple mortality function given by a Gaussian CDF with mean 80 and standard deviation 5,
$$M(a) = \Phi\big((a - 80) / 5\big) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{a - 80}{5\sqrt{2}}\right)\right]$$This would tell us, for example, that the a priori probability density of dying at age 100 is quite low; \(M(101) - M(100) \approx 1.8\times 10^{-5}.\) This does not mean 100 year olds are all hale and sprightly; it means we were simply very unlikely to survive to 100 in the first place. The hazard of being aged 100 is actually quite high!
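These numbers are straightforward to reproduce; a quick sketch using SciPy's normal CDF:

```python
from scipy.stats import norm

M = lambda a: norm.cdf(a, loc=80, scale=5)  # the example mortality function

print(M(101) - M(100))  # ~1.8e-5: a priori density of dying at age 100
print(1 - M(100))       # ~3.2e-5: probability of surviving to age 100
```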
$$H(101|100) = \frac{M(101) - M(100)}{1 - M(100)} \approx \frac{1.8\times 10^{-5}}{3.2 \times 10^{-5}} \approx 0.57$$The hazard rate is given by the derivative of the hazard probability wrt \(a_2,\) evaluated at \(a_1:\)
$$h(a_1) = \pfrac{H(a_2|a_1)}{a_2} \bigg|_{a_2 = a_1} = -\frac{1}{S(a_1)}\dfrac{S(a_2)}{a_2}\bigg|_{a_2 = a_1} \label{eqn_h_rate}\tag{6} $$i.e.
$$\dfrac{S}{a} = -h(a)S$$This is a simple ODE which has solution
$$S(a) = C\exp\left({-\!\int_0^a h(a^\prime) \,da^\prime}\right)$$We can pin down the value of the integration constant \(C\) by noting that \(S(0)\equiv 1\) thus \(C = 1,\) so
$$S(a) = \exp\left({-\!\int_0^a h(a^\prime) \,da^\prime}\right) \label{eqn_S_cont}\tag{7} $$Thus, we can either assume that the survival function is given and use equation \eqref{eqn_h_rate} to calculate the hazard rate. Or, vice-versa, we can assume we are given the hazard rate and use equation \eqref{eqn_S_cont} to calculate the survival function. It doesn’t really matter.
Life Expectancy
The life-expectancy is the mean age at which death occurs i.e.
$$\mu_A = \mathbb E[A] = \int_0^\infty a m(a) \,da$$
Solving the Equations Numerically
Discretization & Update Scheme
For our simulation we have both a discrete grid of ages and discrete time steps. First, let’s choose a grid of ages to represent, with step size \(\Delta a\). Thus we say we know the value of \(n\) at ages \(a_i = i\Delta a\) for \(i \geq 0.\) We require the population to age \(\Delta a\) each time step, so that it lands exactly on our grid and we suffer no numerical diffusion. Therefore the grid spacing \(\Delta a\) also sets the time step, and we have \(t_i = i\Delta a.\)
In the time-discretized setting we cannot work with the hazard rate; rather, we work with the previously defined hazard probability. Specifically, we need the probability of dying in any given time step, given we reached a certain age. So, rather than considering \(h(a)\) we use \(H(a+\Delta a|a).\) In this way we turn the 2D function \(H\) into the 1D function of \(a\) needed for our simulation. Further, define \(H_i = H(a_i+\Delta a | a_i)\) to be the value of this function at grid point \(i.\)
In our simulation it is entirely possible to have a probability of 1 of dying in the next \(\Delta a\) time. This is also possible in the continuous setting, but it implies an infinite hazard rate at the point the survival curve becomes zero. You can see this either by noting \(0 = e^{-\infty}\) in equation \eqref{eqn_S_cont}, or from the zero denominator in equation \eqref{eqn_H}.
The endogenous birth-rate \(b\) can also be discretized simply by integrating over timesteps:
$$b_i = \int_{a_i}^{a_i+\Delta a} b(a) \,da$$The endogenous birth-rate then becomes a sum
$$B_n(t_j) = \sum_{i=0}^\infty b_i n(a_i, t_j)$$and the total is \(B(t_j) = B_n(t_j) + B_x.\) Our final discretized update scheme is very simple
$$\begin{align} n(a_i, t_j) &= n(a_{i-1}, t_{j-1})\,(1 - H_{i-1}) & \qquad i > 0\\ n(a_0, t_j) &= B(t_{j}) & \end{align}$$but it exactly captures the aging (advection) term and the hazard term of the continuous setting: each cohort survives a step with probability \(1 - H_{i-1},\) and newborns enter through the boundary condition.
Discrete Survival Curve
As mentioned above, we have a choice; we can consider the hazard probabilities to be given, and use these to calculate the survival function, or we can calculate the hazard probabilities from the survival function.
If we consider the hazard probabilities \(H_i\) to be given, we can use them to calculate \(S_i\) thus:
$$S_i = \prod_{j=0}^{i-1} (1 - H_j)$$and we already know that if we consider \(S_i\) to be given then we calculate \(H_i\) as
$$H_i = \frac{S_i - S_{i+1}}{S_i}$$These equations actually hold for any partition of the interval \([0, a_i],\) not just our regular grid.
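Putting this together, a single update step might look like the following sketch (here `n` holds the discretized demographic curve on the age grid, `H` and `b` the hazard probabilities and discretized birth-rates, and `B_x` the exogenous birth-rate):

```python
import numpy as np

def step(n: np.ndarray, H: np.ndarray, b: np.ndarray, B_x: float) -> np.ndarray:
    """Advance the demographic curve n by one time step (of size da)."""
    B = b @ n + B_x                  # total birth-rate: endogenous + exogenous
    new = np.empty_like(n)
    new[1:] = n[:-1] * (1 - H[:-1])  # age everyone one step, applying the hazard
    new[0] = B                       # boundary condition: the newborns
    return new
```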
Interactive Simulation
We’re ready to assemble this together into a simulation of our population evolution. Use the slider inputs below the simulation to play with the birth- & hazard rates, and see what happens. The code for this simulation is available on GitHub.
Analysis of the Equations & Population Evolution
A very natural question to ask is: how does the population behave? Do there exist solutions where \(n(a, t)\) is not changing with time? What about the situation where \(P(t)\) is not changing with time? If such solutions exist, are they stable?
Reproduction Rate
One obvious criterion for the solution to be steady-state is that every person should produce (on average) exactly one offspring over their lifetime, thereby replacing themselves, their child in turn replacing themselves, and so on ad infinitum.
We know that the probability density of reproducing is \(b(a)\), so we may initially think the overall average number of children per individual is the integral of this quantity. It almost is. But a person may have already succumbed by age \(a,\) so the birth-rate density at this age “counts less”. How much less? It gets weighted by the probability of surviving (at least) to age \(a\) - \(S(a)\).
The integral of this weighted birth-rate gives us the (endogenous) reproduction rate.
$$R = \int_0^\infty b(a) S(a) \,da \label{eqn_R}\tag{8} $$Note, for this value to be finite we require that \(b(a)S(a) \rightarrow 0\) fast enough that the integral converges.
Parameter Cases
Remembering that we also have to consider the effect of the exogenous birth-rate, we get five cases to consider.
| Case | Parameters | Behaviour |
| --- | --- | --- |
| 1 | \(R > 1\) | \(P\) will grow without bound. |
| 2 | \(R = 1,\; B_x > 0\) | \(P\) will grow without bound. |
| 3 | \(R = 1,\; B_x = 0\) | \(P\) is stable\(^{[1]}\) (but not necessarily steady-state). |
| 4 | \(R < 1,\; B_x > 0\) | \(P\) will approach a steady state. |
| 5 | \(R < 1,\; B_x = 0\) | \(P\) will shrink to zero over time.\(^{[2]}\) |
\(^{[1]}\) It neither blows-up, nor shrinks to zero but can be “oscillating” around a long-run value.
\(^{[2]}\) Thus it approaches the trivial steady state solution of no population!
For any value of \(R\), if \(B_x=0\) then \(n(a, t) = 0\) is a valid solution. This trivial case is uninteresting, I merely note that it exists.
By playing with the interactive simulation above you can see that all five of these cases display the listed behaviour. The horizontal line marked ‘1’ on the right-hand y-axis corresponds to \(R=1\) for the birth-rate. Next we can try to prove whether we must see these behaviours, and under which conditions they arise.
Lagrangian Formulation
We can view equation \eqref{eqn_dndt} through the lens of the total derivative. For a cohort born in year \(\tau,\) their age changes as \(a(t) = t - \tau.\) Thus
$$\frac{d n}{d a} = \pfrac{n}{a} + \pfrac{t}{a}\pfrac{n}{t} = \pfrac{n}{a} + \pfrac{n}{t} = - h(a)\cdot n \label{eqn_dnda_tot}\tag{9} $$This tells us that, for a specified cohort, the population evolution is governed by nothing more than an exponential decay with the hazard rate. This looks familiar; it is the same differential equation we saw for the survival curve \(S.\) Thus the solution is
$$n = C \exp\left(-\int_{0}^a h(a^\prime)\,da^\prime\right) = C S(a)$$We know the initial population of this cohort was given by the birth-rate at time \(\tau\), which sets the integration constant \(C = B(\tau).\) Thus
$$n(a, t) = B(t - a) S(a) \label{eqn_n_soln_1}\tag{10} $$This is a useful equation because it lets us rewrite various quantities involving \(n\) in terms of \(B.\) For example
$$P(t) = \int_0^\infty n(a, t) \,da = \int_0^\infty B(t-a)S(a) \,da$$This equation makes intuitive sense: the total population now is given by contributions from the birth-rate at each previous time, weighted by how many people survived from those previous times until now.
Further, combining equation \eqref{eqn_n_soln_1} with equation \eqref{eqn_B} gives us an equation solely in terms of the birth-rate:
$$B_n(t) = \int_0^\infty b(a)S(a) B(t-a)\,da \label{eqn_B_soln_1}\tag{11} $$
Aside
I had somehow expected we would be able to write things as a delay differential equation for \(B,\) however that isn’t quite what we get. You can take the time derivative of equation \eqref{eqn_B_soln_1} and you get
$$\dfrac{B_n}{t} = \int_0^\infty b(a)S(a) \pfrac{B}{t}(t-a)\,da$$So the rate of change of \(B_n\) (and thus \(B\)) depends not on its values in the past, but rather on how it was changing in the past.
In any case, equation \eqref{eqn_B_soln_1} looks almost like the definition of \(R\) (see equation \eqref{eqn_R}), which makes it useful for characterizing the behaviour.
Existence of Steady-State Solutions
Equation \eqref{eqn_B_soln_1} actually lets us directly show for which cases a steady-state solution exists. At steady state we must have constant endogenous and exogenous birth-rates. Because the birth-rates are constant, we can pull them out of the integral in \eqref{eqn_B_soln_1}, giving
$$B_n = (B_n + B_x)R$$If \(B_x = 0,\) this means we have \(B_n = R B_n,\) which is true only when \(R = 1.\) Thus for \(R = 1\) and \(B_x = 0\) we have a steady-state solution, this is case 3 above.
Now consider \(B_x > 0.\) Rearranging slightly we get
$$B_n = B_x \frac{R}{1-R}$$This clearly only has a physically meaningful solution if \(R < 1,\) which corresponds to case 4 above. Each person does not quite replace themselves, but this is compensated for by a non-zero exogenous birth-rate. As \(R\) approaches one from below, \(B_n\) blows up towards infinity. This means the steady state total population gets very large as our reproduction rate approaches unity.
For \(R = 1\) the right-hand side is undefined and there is no steady-state solution. The reason is clear; at \(R = 1\) each person exactly replaces themselves via the endogenous birth-rate, but we are also constantly increasing the population via the exogenous birth-rate.
For \(R > 1\) the endogenous birth-rate becomes negative, which is physically meaningless. There is no constant \(B_n > 0\) which solves equation \eqref{eqn_B_soln_1} in this case.
Note, this says only that steady-state solutions exist in the two cases specified; it does not say that we will always approach these solutions from any set of initial conditions.
For example, in the case of \(B_x = 0, R=1,\) if \(b\) is given by a delta function, then we will get delta-pulse baby-booms, which persist forever. However we will never approach a true steady-state solution. I believe, on the other hand, that if there is some width to \(b\) then we will approach the steady-state solution but I have not proved that.
Expanding & Contracting Total Population
For the cases without a steady-state solution (cases 1, 2, and 5) it seems difficult to make any concrete statements with full generality.
A delta pulse of birth-rate (i.e. a single cohort) will be spread by convolution with the endogenous birth-rate \(b,\) which itself gets convolved with \(b,\) spreading it further etc. It seems intuitive that, for \(R > 1,\) eventually there will have been more people born than contained in the initial cohort. However, the convolution means they can be spread out so that at any one moment the birth-rate is lower than the initial rate.
In fact, it’s even possible for the population to be non-monotonic but still grow over time; consider a situation where 20% of people die in infancy, but then adults have on average 2 children (per parent). In this situation each person has \(0.8 \times 2 = 1.6\) offspring on average, and the population clearly grows with each generation, despite initially falling due to the infant mortality.
Probably because of such “corner cases”, I haven’t found a convincingly rigorous argument for the “facts” that, in the long run, the population must grow if \(R > 1\) and shrink if \(R < 1\). I suspect that any proof will rely on the fact that there is not infinite “space” for the generational convolutions to expand into. The survival curve allows for a certain range of ages, so eventually the trailing edge of one generation starts overlapping with the leading edge of the next. This means we probably must specify some properties that \(S(a)\) and \(S(a)b(a)\) obey, in order to obtain a proof.
Cohort Resultant Population
A last note.
Effectively, what we are asking involves counting the current (live) population arising from a single cohort at each moment in time. That is to say: how many of the cohort itself are alive, how many of the cohort’s direct offspring, their offspring, and so forth? Let’s write that down, at least.
Without loss of generality, consider a cohort size of 1; due to linearity of integrals any other size is just given by a factor on the whole expression. For a cohort of age \(a\) the total current population arising due to that cohort is given by
$$ p(a) = S(a) + \int_0^{a} b(a^\prime) S(a^\prime) p(a - a^\prime)\,da^\prime\qquad a \geq 0 \label{eqn_p_cohort}\tag{12} $$This expression can be easily understood. The first term is how many of the original cohort are surviving at age \(a.\) The second is the current resultant population arising from each direct child cohort, whose age is the parent cohort’s current age, less the parents’ age when the children were born. Again, \(p\) represents a “unit” cohort, and it is weighted by the birth-rate when they were born. The recursive nature of the expression captures subsequent generations arising from the original cohort.
What we are seeking to prove is that \(p(a)\) diverges for \(R > 1\) and converges to zero for \(R < 1.\)
In principle, equation \eqref{eqn_p_cohort} really tells us a lot about the dynamics of the total population; it captures everything about the behaviour that arises due to the endogenous birth-rate. If we could solve it, the whole dynamics of the total population - \(P\) - could be obtained by convolving \(p\) with the exogenous birth-rate \(B_x\). (This is not quite as much information as is contained in the demographic curve - \(n\) - of course, but still good.)