Update
As of this month, I am starting a new position, after having had some time between jobs.
Between my family, a new job, and life in general, I expect my posting frequency will drop
significantly.
I still have plenty of drafts and ideas for posts, so hopefully it will not drop all the way to zero!
(Though within \(\varepsilon\) of zero is not unlikely.)
- Dan
Latest Posts
Favourite Posts
Interactive Linear Shallow Water Simulation
This is an interactive simulation of the linearized shallow-water equations. For info on the equations themselves, see the previous post, Deriving the Shallow Water Equations. The simulation uses WebGPU to accelerate both the simulation and the rendering. I will discuss these below, but without further ado, here’s the simulation:
Click on the canvas below, then click "Start"...
The simulation has varying bathymetry with the water becoming shallower towards the right-hand edge of the domain, like a beach.
Numerical Method
The numerical method is a simple finite difference simulation. It uses central differences for the spatial derivatives, and the improved Euler method for time-stepping. The full update rules are verbose, but simple. Expand the box below to view them if you’re interested.
We implement the computation as WebGPU compute shaders. These shaders cannot see updated values at array indices other than their own until the end of the pipeline. However, computing the final value requires knowing the intermediate values at neighbouring points. Because of these two facts, we have to split the intermediate and final value calculations into two separate shaders, with a third handling the setting of the boundary condition values.
If we naively split our computation just as it is written above, we may think we need two sets of extra arrays: one for the intermediate values, and one for the corresponding “deltas”:
$$\begin{align} \Delta y_{t} &= f(y_t)\Delta t \\[0.2em] \tilde y_{t+1} &= y_t + \Delta y_t \label{impeul1}\tag{1} \\[0.2em] \Delta\tilde y_{t+1} &= f(\tilde y_{t+1})\Delta t \\[0.2em] y_{t+1} &= y_t + (\Delta y_t + \Delta \tilde y_{t+1}) / 2 \\[0.2em] \end{align}$$However, we can eliminate \(\Delta y_t\) by substituting in equation \eqref{impeul1}, so
$$\begin{align} y_{t+1} &= y_t + (\Delta y_t + f(\tilde y_{t+1})\Delta t) / 2 \\ &= y_t + (\tilde y_{t+1} - y_t + f(\tilde y_{t+1})\Delta t) / 2 \\ &= y_t + \tilde y_{t+1} /2 - y_t / 2 + f(\tilde y_{t+1})\Delta t / 2 \\ &= (1/2) (y_t + \tilde y_{t+1} + f(\tilde y_{t+1})\Delta t) \end{align}$$Thus we only need to keep the intermediate value around, not the original gradient that was used to calculate it.
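To illustrate, here is a minimal sketch of the two-pass update in Python/NumPy (the actual implementation is WGSL compute shaders; the function names and the example right-hand side are mine). Only the intermediate array is ever stored:

```python
import numpy as np

def improved_euler_step(y, f, dt):
    """One improved-Euler step, split like the two compute passes:
    pass 1 writes the intermediate array, pass 2 combines it with the
    old value -- no separate "delta" arrays are needed."""
    # Pass 1: intermediate value at every cell.
    y_tilde = y + f(y) * dt
    # Pass 2: final value; f(y_tilde) may read neighbouring entries of
    # y_tilde, which is why the passes cannot be fused in one shader.
    return 0.5 * (y + y_tilde + f(y_tilde) * dt)

# Example right-hand side: a periodic 1D diffusion stencil.
def f(y):
    return np.roll(y, 1) - 2.0 * y + np.roll(y, -1)
```

One can check this agrees term-by-term with the textbook form \(y_{t+1} = y_t + (\Delta y_t + \Delta\tilde y_{t+1})/2\).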
Because I am using central differences, I do not update the outer edges of the simulation domain; instead, I use them to set the boundary conditions.
Boundary Conditions
There are two types of boundary conditions used in this simulation: (non-inverting) reflecting and non-reflecting.
- Reflecting
- \(\partial_y \eta = 0\,\) and \(\,\partial_y \mathbf u = 0\,\) along the horizontal boundaries.
- \(\partial_x \eta = 0\,\) and \(\,\partial_x \mathbf u = 0\,\) along the right-hand vertical boundary.
- Non-reflecting:
- \(\partial_t \eta = c \partial_x \eta\,\), \(\,\partial_x u = -\frac{1}{bc} \partial_x \eta\,\) and \(v=0\) along the left-hand vertical boundary.
The reflecting boundaries are enforced by setting the outermost cell values such that the central differences are zero.
The non-reflecting boundary conditions correspond to the equations satisfied by a purely left-moving wave. They are enforced by adding values to the leftmost cell values corresponding to the discretization of this equation:
$$\Delta \eta_{0,j} = c\,(\eta_{1,j} - \eta_{0,j}) \frac{\Delta t}{\Delta x}$$and similarly for \(u\). This is the simplest, first-order discretization of the left-moving advection equation. I’m sure there are more accurate ways to calculate this.
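In array form, both boundary passes can be sketched as follows (a NumPy stand-in for the actual WGSL boundary shader; the array layout and names are my assumptions):

```python
import numpy as np

def apply_boundaries(eta, u, v, c, dt, dx):
    """Boundary-condition pass on [i, j]-indexed (x, y) arrays.
    Modifies the arrays in place."""
    # Reflecting edges: copy the first interior value into the edge cell,
    # so the central difference across the boundary is zero.  (The same
    # copies apply to u and v; omitted here for brevity.)
    eta[:, 0] = eta[:, 1]      # bottom edge
    eta[:, -1] = eta[:, -2]    # top edge
    eta[-1, :] = eta[-2, :]    # right-hand edge
    # Non-reflecting left edge: first-order upwind discretization of the
    # left-moving advection equation, applied as an additive correction.
    eta[0, :] += c * (eta[1, :] - eta[0, :]) * dt / dx
    u[0, :] += c * (u[1, :] - u[0, :]) * dt / dx
    v[0, :] = 0.0
```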
Rendering
The rendering is also handled by a WebGPU shader. There are two steps, both relatively simple, that get combined into the final image. First, I model the effect of the depth of the water on the colour we expect to see; for very shallow water we see the yellow of the beach, whereas for deeper water we see only the dark blue of the water itself. Second, I add specular reflections off the water’s surface using Phong lighting.
The rendering of the sand is pretty simple; I define a base colour for the sand (#beba00) and the water (#4eb2ff), then I blend them depending on the depth of the water. The blending is exponential, following the Beer-Lambert law. Because water absorbs red light much more strongly than green or blue, we apply a different exponential decay to each of the red, green, and blue channels. Thus the final colour is given by
$$\text{Sand}_c\cdot e^{-\lambda_c h} + \text{Water}_c\cdot \big(1-e^{-\lambda_c h}\big) $$for \(c\in \{R, G, B\}\) and \(\lambda_R=\lambda_G=2\lambda_B\). These particular values of the decay coefficients aren’t at all physically realistic, but they look okay.
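As a concrete sketch (in Python rather than the actual WGSL fragment shader; the specific decay values are made up, keeping only the ratio \(\lambda_R=\lambda_G=2\lambda_B\)):

```python
import numpy as np

SAND = np.array([0xBE, 0xBA, 0x00]) / 255.0   # #beba00
WATER = np.array([0x4E, 0xB2, 0xFF]) / 255.0  # #4eb2ff
LAM = np.array([2.0, 2.0, 1.0])               # lambda_R = lambda_G = 2*lambda_B

def surface_colour(h):
    """Per-channel Beer-Lambert blend of the sand and water base
    colours for water of depth h."""
    transmission = np.exp(-LAM * h)
    return SAND * transmission + WATER * (1.0 - transmission)
```

At zero depth this returns the pure sand colour, and for deep water it saturates to the water colour.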
Then, using the formula in the derivation post, we can calculate the normal vector to the surface. This lets us apply the formulas for Phong lighting, as detailed on Wikipedia, to get the specular & diffuse lighting for the water’s surface.
The rendering is implemented as a fragment shader, which renders onto a single triangle covering the whole viewport.
This is initialized with a very simple vertex shader.
The WebGPU shader code for this simulation can be found here.
(Warts, debug lines, and all.)
Deriving the Shallow Water Equations
This post shows the derivation of the shallow water equations. These equations describe the flow velocity and height of the water’s free surface when the horizontal extent is much greater than the depth. For example, tidal or oceanic flows can be well modelled by these equations. They are derived directly from the Euler equation for inviscid flows.
Contents
Flow Geometry
We consider the depth of the water at each point, with a spatially varying bathymetry. This geometry is shown here:
We can easily convert the fluid depth (\(h\)) and the bathymetry (\(b\)) into a height above (or below) a reference height (\(\eta\)). For geophysical flows, for example, this reference would be the geoid. Note that \(b\) is the seafloor depth, measured downwards from the reference, in the opposite sense to the height \(\eta\). Consequently we have \(h = \eta + b\).
Derivation
The shallow water equations are derived from the divergence free condition and the incompressible Euler equation, which describe mass and momentum balance respectively, for the full 3D fluid:
$$\nabla\cdot\mathbf u = 0 \label{div_u}\tag{1}$$$$\Dfrac{\mathbf u}{t} = -\frac{\nabla P}{\rho} + \mathbf f \label{euler}\tag{2}$$
We want to convert these into equations describing the net horizontal components of the velocity, and the fluid depth.
The net horizontal motion of the fluid is obtained by depth averaging the underlying fluid velocity. If we denote the 3D velocity components as \(\mathbf u = (u, v, w)\), then the net velocity’s x-component is given by
$$\bar u(x, y, t) = \frac{1}{h}\zint{u(x, y, z, t)} \label{uavg}\tag{3}$$And similarly for \(\bar v\), meaning we get the 2D velocity field \(\bar{\mathbf u} = (\bar u, \bar v)\).
Conservation of Mass & the Fluid Height
The derivation begins by integrating the divergence free condition for the fluid, \eqref{div_u}, in the vertical direction.
$$ \zint{\Big[{\color{#e2c576} \pfrac{u}{x}} + {\color{#92d8e6} \pfrac{v}{y}} + {\color{#be98df} \pfrac{w}{z}}\Big]}= 0 $$The z term can be handled directly by the Fundamental Theorem of Calculus. For the x and y terms, we want to pull the spatial derivatives outside the integral, but the integrals’ bounds (\(-b\) and \(\eta\)) depend on position. Therefore, in order to pull the spatial derivatives outside, we must take care to apply the Leibniz integration rule correctly. Applying this gives us the following:
$$\begin{align} 0 = &{\color{#e2c576} -u({\small z=\eta}) \pfrac{\eta}{x} - u({\small z=-b})\pfrac{b}{x} + \pfrac{}{x}\zint{u} } \\ &{\color{#92d8e6} -v({\small z=\eta}) \pfrac{\eta}{y} - v({\small z=-b})\pfrac{b}{y} + \pfrac{}{y}\zint{v} } \\ &{\color{#be98df} +w({\small z=\eta}) - w({\small z=-b}) \vphantom{\zint{}}} \end{align}$$Reordering the terms, and using equation \eqref{uavg}
$$\begin{flalign} && 0 = &\pfrac{(h\bar u)}{x} + \pfrac{(h\bar v)}{y} & \\ && &-\Big({\color{#7bd97e} u({\small z=-b})\pfrac{b}{x} + v({\small z=-b})\pfrac{b}{y} + w({\small z=-b}) }\Big) & \text{Seabed} \\ && &-\Big({\color{#d58080} u({\small z=\eta})\pfrac{\eta}{x} + v({\small z=\eta})\pfrac{\eta}{y} - w({\small z=\eta}) }\Big) & \text{Surface} \\ \end{flalign}$$The next step is to identify the boundary condition terms arising from the integrals.
Surface Normal Vectors
The boundary conditions at both the seabed and the free surface will involve normal vectors. For an arbitrary (differentiable) function \(f(x)\), a vector normal to this function can be found by taking the gradient of a particular scalar field:
$$\mathbf n = \nabla (z - f(x)) = \left[\begin{matrix}-f^\prime(x) \\ 1\end{matrix}\right]$$Note that the resulting vector \(\mathbf n\) does not depend on z. Neither is \(\mathbf n\) a unit vector in general (except at extrema of \(f\)), however this does not matter for our derivation.
This construction is illustrated here:
Sea-floor Boundary Condition
The first boundary condition is simple enough; the fluid’s velocity at the seabed must be tangential to the seabed itself. If it were not, the fluid would penetrate through the sea floor or leave a vacuum. Mathematically this condition states that the fluid velocity vector is orthogonal to the seabed’s normal vector, i.e.
$${ \mathbf u({\small z = -b}) \cdot \nabla(z + b(x, y))} = 0$$Expanding the gradient and performing the dot-product gives us exactly the seabed boundary terms above (highlighted green), so we see these terms all disappear.
Note that a no-slip boundary condition would give the same result, but is not as general.
Free Surface Boundary Condition
The second boundary condition is slightly subtler. The height of the free surface at any given position can change in time. However, relative to this vertical motion, the fluid’s velocity must still be tangential to the free surface. To see why this makes sense, imagine sitting in a reference frame where the free surface (at position \(x\)) is stationary; then we get effectively the same condition as at the seabed.
The vertical speed of the free surface is \(\pfrac{h}{t}\hat{\mathbf z}.\) Thus, using the same construction for a normal vector as above,
$$\Big(\mathbf u({\small z=\eta}) - \pfrac{h}{t}\hat{\mathbf z}\Big)\cdot\nabla(z - \eta(x, y)) = 0$$Expanding this and rearranging yields
$$-\Big({\color{#d58080} u({\small z=\eta})\pfrac{\eta}{x} + v({\small z=\eta})\pfrac{\eta}{y} - w({\small z=\eta}) }\Big) = \pfrac{h}{t}$$So, finally, we can plug this into the equation above for our free surface boundary condition and get
$$\boxed{ \pfrac{h}{t} + \nabla\cdot(h\bar{\mathbf u}) = 0 \label{dhdt}\tag{4} }$$This equation describes mass conservation for a fluid with a free surface. The resulting 2-dimensional flow is no longer divergence free: non-zero divergence is accommodated by either “piling up” fluid at a point, or draining it away.
This is the same equation as the usual mass conservation equation for compressible fluids, with the fluid’s height now taking the place of density!
Equation \eqref{dhdt} is in conservation form. By expanding the spatial derivative we can convert it to Lagrangian form:
$$\Dfrac{h}{t} = -h\nabla \cdot \bar{\mathbf u} \label{DhDt}\tag{5} $$
The Long-wavelength Limit & Velocity Components
Now we turn to the momentum conservation equations, i.e. Euler’s equation. To make progress here we have to make a long-wavelength assumption.
The long-wavelength limit is that in which things vary over horizontal length-scales much greater than the fluid’s depth. This means the changes in the vertical direction also have to be slow, and the vertical acceleration is approximately zero. Therefore the \(z\) component of the Euler equation becomes
$$-\frac{1}{\rho}\pfrac{P}{z} - g = \Dfrac{w}{t} \approx 0$$where we have used the fact that the vertical component of the body force is gravity alone. Integrating with respect to \(z\) gives
$$P = - \rho g z + C$$We can fix the integration constant by noticing that at the surface we must have atmospheric pressure, so
$$P_0 = - \rho g \eta + C$$thus
$$P = P_0 - \rho g (z - \eta)$$This is the pressure profile we would get in hydrostatic equilibrium. This shows the long-wavelength limit is equivalent to assuming we are approximately at hydrostatic equilibrium. Now we can plug this into the horizontal components of Euler’s equation. Focusing on just the x-component, this gives
$$\begin{align} \Dfrac{u}{t} &= -\frac{1}{\rho}\pfrac{P}{x} + f_x \\[0.2em] &= -g \pfrac{\eta}{x} + f_x \\[0.2em] \end{align}$$All that remains now is to depth-integrate this equation. However, here we run into a snag; the left-hand side is not linear, and so we can’t just pull the integral through the material derivative.
Thus we make our second assumption: that the horizontal components of the velocity do not vary with depth. By expanding the material derivative on the left-hand side, and identifying \(u\equiv\bar u\) and \(v\equiv\bar v\), we get
$$\Dfrac{\bar u}{t} = -g \pfrac{\eta}{x} + f_x \label{DuDt}\tag{6}$$where now the left-hand side is a 2D material derivative, for the 2D depth averaged velocity.
Equation \eqref{DuDt} is in Lagrangian form; let’s convert it to the conservation form.
To do this it is easier to consider the total x-momentum \(h\bar u\)1, rather than the velocity \(\bar u\). The material derivative obeys the usual product rule, therefore
$$\begin{align} \Dfrac{(h\bar u)}{t} = \pfrac{(h\bar u)}{t} + \bar{\mathbf u}\cdot \nabla (h\bar u) = h\Dfrac{\bar u}{t} + \bar u\Dfrac{h}{t} \\[0.2em] \end{align}$$Now plug in equations \eqref{DhDt} and \eqref{DuDt} for the material derivatives on the right-hand side, and use the product-rule in reverse for the spatial gradients (highlighted in yellow)
$$\begin{align} \pfrac{(h\bar u)}{t} + {\color{#e2c576} \bar{\mathbf u}\cdot \nabla (h\bar u)} &= h\Big(-g\pfrac{\eta}{x} + f_x\Big) + {\color{#e2c576} \bar u (-h\nabla\cdot \bar{\mathbf u})} \\[0.2em] \pfrac{(h\bar u)}{t} + {\color{#e2c576} \nabla\cdot (h\bar u \bar{\mathbf u})} &= -gh\pfrac{\eta}{x} + hf_x \\[0.2em] \end{align}$$The derivation is identical for the y component.
This completes the derivation. Henceforth, we will only ever consider the two dimensional velocity, so I’ll drop the overbar.
Shallow Water Equations in Conservation Form
The full shallow water equations in conservative form read
$$\begin{align} \pfrac{h}{t} + \nabla \cdot (h \mathbf u) &= 0 \\[0.2em] \pfrac{(h\mathbf u)}{t} + \nabla \cdot (h \mathbf u \otimes \mathbf u) &= -gh \nabla \eta + h \mathbf f \end{align}$$The first equation has no source terms (zero right-hand side), which tells us that mass is exactly conserved. The water column depth can (and does) change both in space and time, but throughout the system as a whole, fluid is neither created nor destroyed.
The second equation does have source terms, telling us that momentum is not conserved. This is not surprising: the change in momentum comes exactly from the body forces experienced by the fluid, these being gravity in the vertical direction, and any other forces in the horizontal directions (e.g. the Coriolis force).
It is a simple matter to write these in terms of the displacement height \(\eta\), we just substitute in \(h= \eta + b\), yielding
$$\begin{align} \pfrac{\eta}{t} + \nabla \cdot (\eta \mathbf u) &= {\color{#92d8e6} -\nabla\cdot(b\mathbf u)} \\[0.2em] \pfrac{(\eta\mathbf u)}{t} + \nabla \cdot (\eta\mathbf u \otimes \mathbf u) &= -gh \nabla \eta + h \mathbf f \,{\color{#92d8e6} - \nabla\cdot(b \mathbf u \otimes \mathbf u)} \end{align}$$We gain extra source terms for both equations (highlighted in blue), which represent the effect of the bathymetry on the fluid’s height.
Comments on the Derivation
In order not to break up the flow of the derivation with asides and notes, I have put these observations into this separate section.
Tong presents a very terse & understandable description of the derivation, but with the caveat that he treats only the case of uniform bathymetry. Because the seabed has a uniform depth, the bottom boundary condition becomes a condition purely on \(w,\) and the Leibniz terms arising from moving the depth integral inside the horizontal spatial derivatives vanish.
Free Surface Boundary Condition
The free surface boundary condition seems to be presented differently between sources. For example, in Segur and Tong it is given as
$$\Dfrac{\eta}{t} = w({\small z = \eta})$$rather than the “no relative normal flow” condition used in Dawson & Mirabito.
This seems to make sense, but it actually relies on identifying the x and y components of the 3D velocity at the surface with the 2D depth-averaged velocity. This is something you can validly do only if you’ve already made the second assumption (uniform horizontal velocity profile).
Only the “no relative normal flow” criterion correctly applies to the 3D velocity, avoiding the need to make the uniform horizontal velocity profile assumption at this early point in the derivation. (In Segur this assumption has not been made explicitly when the boundary condition is used. However, it is made later on, meaning it comes out in the wash.)
One doubt I had about the “no relative normal flow” constraint was: why do we not need to account for the fact that the free surface is changing shape? I suppose it’s because the divergence free condition must be obeyed at every instant of time and at every location separately.
Depth Integration of the Euler Equation
At the point where we integrate the Euler equation, we make the uniform horizontal velocity profile assumption. This lets us identify the horizontal components of the 3D velocity with the 2D depth-averaged velocity, and side-step the vertical integration.
Had we not made this assumption, we would have obtained an equation very similar to equation \eqref{DuDt}, but with extra terms arising from the interaction of the non-linear advection term and the integration. Dawson & Mirabito mention this, but do not give the form of these terms. They refer to them simply as “differential advection terms” (differential here meaning the velocity differs at different depths), and make the second assumption by taking these terms to be zero.
Analogous to the way the distribution of microscopic velocities in the Boltzmann equation leads to diffusion of momentum (see here), I expect these terms (which are associated with the distribution of horizontal velocities) would manifest as a kind of “viscosity”.
Linearization & Gravity Waves
In their full glory, the shallow water equations contain many non-linear terms. We can linearize them by assuming the velocity \(\mathbf u\) and height \(\eta\) are only small perturbations away from zero. In this way, any cross terms of \(\mathbf u\) and \(\eta\) are even smaller, and we can neglect them. So, simply dropping all non-linear terms in equations \eqref{dhdt} and \eqref{DuDt} gives
$$\begin{align} \pfrac{\eta}{t} &= - \nabla \cdot(b\mathbf u) \label{detadt_linear}\tag{7} \\[0.5em] \pfrac{\mathbf u}{t} &= -g\nabla \eta + \mathbf f \label{dudt_linear}\tag{8} \end{align}$$Note that the water column depth \(h=\eta+b\) may be large because \(b\) can be large. Moreover, remember that \(b\) is a fixed profile, so it does not count as making things non-linear.
It is simple to show that the linear shallow water equations give rise to gravity waves, as one might expect. For this, let’s assume a constant depth \(b=B\) and no horizontal body force (\(\mathbf f = 0\)). First, take the time derivative of equation \eqref{detadt_linear}, then substitute in equation \eqref{dudt_linear}:
$$\begin{flalign} && \partial_{tt}{\eta} &= -\partial_t \nabla\cdot(B\mathbf u) & \text{Differentiate} \\[0.5em] && &= -B\nabla\cdot (\, \partial_t\mathbf u) & \text{Commutativity} \\[0.5em] && &= -B\nabla\cdot (-g\nabla \eta) & \text{Substitute \eqref{dudt_linear}} \\[0.5em] && &= gB\nabla^2 \eta & \text{Distribute} \end{flalign}$$which is the wave equation with \(c = \sqrt{gB}\). This matches the result one obtains from dimensional analysis.
This derivation is similar to how we show that electromagnetic waves arise from Maxwell’s equations.
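We can also sanity-check the result numerically: a plane wave \(\eta = \cos(kx - \omega t)\) with \(\omega = \sqrt{gB}\,k\) should satisfy \(\partial_{tt}\eta = gB\nabla^2\eta\). A quick finite-difference check (the values of \(g\), \(B\), and \(k\) are arbitrary choices of mine):

```python
import numpy as np

g, B = 9.81, 100.0            # gravity and (constant) depth
c = np.sqrt(g * B)            # gravity-wave speed from above
k = 2.0 * np.pi / 500.0       # an arbitrary wavenumber
omega = c * k                 # dispersion relation of the wave equation

def eta(x, t):
    return np.cos(k * x - omega * t)

# Centred second differences approximate the second derivatives.
x0, t0, d = 123.0, 4.5, 1e-3
eta_tt = (eta(x0, t0 + d) - 2 * eta(x0, t0) + eta(x0, t0 - d)) / d**2
eta_xx = (eta(x0 + d, t0) - 2 * eta(x0, t0) + eta(x0 - d, t0)) / d**2
```

The two sides agree to within finite-difference error.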
The fact the wave speed depends on depth is often used as an explanation for why waves break on the beach. The argument goes like this: as a wave climbs the beach the depth, and thus \(c\), decreases. Thus the faster water from behind catches up to the slower water ahead. Eventually there is so much water piling up that it becomes unstable and overtops.
If we don’t assume constant depth and no body force, we merely gain a couple of extra terms, turning our wave equation into an inhomogeneous wave equation.
Alternative Derivation
Here is another derivation of the shallow water equations, which utilizes the divergence theorem. This derivation is based directly on the notion of conservation, and so it directly gives us the conservative form.
Consider a finite sized water column, of arbitrary cross-section. Denote the cross-section of the water column by \(A\), its boundary by \(\partial A\), and the unit normal to the boundary as \(\hat{\mathbf n}.\)
The derivation considers the flow of fluid & momentum through the sides, and into the water column.
Mass Conservation
The inflow of fluid through any side is given by
$$q = \int_{-b}^\eta \mathbf u\cdot(-\hat{\mathbf n})\,dz = -(h\bar{\mathbf u})\cdot\hat{\mathbf n}$$\(\hat{\mathbf n}\) is the outward unit normal, hence the minus sign. We can write the net inflow as
$$Q = - \oint_{\partial A} (h\bar{\mathbf u})\cdot\hat{\mathbf n} \,dl $$The divergence theorem tells us
$$Q = - \int_A \nabla\cdot (h\bar{\mathbf u})\,dA$$Now coming from the other side, because our fluid is incompressible, the change in volume of fluid inside our region \(A\) must manifest as a change in the water’s depth. Therefore
$$Q = \int_A \pfrac{h}{t} \,dA$$We did not specify anything about the size nor shape of region \(A\), therefore the integrands themselves must be equal giving
$$\pfrac{h}{t} = -\nabla\cdot (h\bar{\mathbf u})$$which is the same as our mass conservation equation \eqref{dhdt}.
I actually find this derivation a bit more natural, but I have not seen it presented anywhere. Perhaps it contains an error I’ve not spotted? Maybe using the incompressibility to say flow results in a height change, while physically correct, is jumping the gun?
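The conservation view also suggests a discretization: if the update is written as a difference of fluxes through cell faces, total mass is conserved exactly, whatever the flow. A minimal 1D periodic sketch (my own illustration, not the simulation from the previous post):

```python
import numpy as np

def mass_step(h, u, dt, dx):
    """Flux-form update of dh/dt = -d(h u)/dx on a periodic 1D grid.
    Every face flux appears twice with opposite signs in the update,
    so the total mass sum(h)*dx cannot change."""
    q = h * u                               # mass flux at cell centres
    face = 0.5 * (q + np.roll(q, -1))       # flux at right-hand faces
    return h - dt / dx * (face - np.roll(face, 1))
```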
Momentum Conservation
For the momentum, this conservation viewpoint is not simpler than the derivation presented above, because there the identification \(u\equiv\bar u\) avoids the depth integration. However, I will include the alternative derivation, partly for completeness, and partly because it gives a different perspective (Eulerian vs. Lagrangian).
As above, I’ll show the derivation for the x-component of velocity only. The y-component is derived in exactly the same manner.
The x-momentum flow into the water column through any side is given by
$$\begin{align} j = - \zint{u\mathbf u \cdot \hat{\mathbf n}} = -(h\bar u \bar{\mathbf u} + \Delta\bar{\mathbf u})\cdot\hat{\mathbf n} \end{align}$$where \(\Delta\bar{\mathbf u}\) represents the “differential advection terms” discussed above. From here on we make the uniform horizontal velocity profile assumption and set \(\Delta\bar{\mathbf u} = 0.\)
We then integrate this over the whole surface of the water column to get the total momentum flux into the volume, and apply the divergence theorem
$$\begin{align} J &= - \oint_{\partial A} h\bar u\bar{\mathbf u}\cdot\hat{\mathbf n} \,dl \\[0.2em] &= - \int_{A} \nabla\cdot( h\bar u \bar{\mathbf u}) \,dA \label{J_as_div}\tag{9} \end{align}$$Similarly to above, let’s consider the change of momentum throughout the volume. Here, however, we must be careful because we now have sources of momentum in the volume; namely the forces the fluid feels. The total rate of change of momentum at any point is the sum of the advection (momentum flow) and source (force) terms. Thus the change of x-momentum in our volume due only to advection is
$$J = \int_{A}\zint{\pfrac{u}{t} - \Big(\!-\!\frac{1}{\rho}\pfrac{P}{x} + f_x\Big)}\,dA$$Now we make the long-wavelength / hydrostatic assumption exactly as above, then vertically integrate.
$$\begin{align} J &= \int_{A}\zint{\pfrac{u}{t} - \Big(\!-\!g\pfrac{\eta}{x} + f_x\Big)}\,dA \\[0.2em] &= \int_{A} \pfrac{(h\bar u)}{t} - \Big(\!-\! hg \pfrac{\eta}{x} + hf_x\Big)\,dA \\[0.2em] \end{align}$$We have again used the identification \(u\equiv\bar u\) in this step.
Equating this with equation \eqref{J_as_div} and noting that \(A\) was completely arbitrary gives immediately
$$\pfrac{(h\bar u)}{t} + \nabla\cdot(h\bar u\bar{\mathbf u}) = -hg\pfrac{\eta}{x} + hf_x$$completing the derivation.
References
We can ignore density since it cancels everywhere at the end. ↩︎
Standard Errors of Volatility Estimates
Sample Variance
The sample variance is given by the formula
$$ S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2 \label{sample_var}\tag{1}$$where \(\bar X\) is the sample mean
$$ \bar X = \frac{1}{n} \sum_{i=1}^n X_i $$Equation \eqref{sample_var} is an unbiased estimator for the population variance \(\sigma^2\), meaning
$$\EE{S^2} = \sigma^2$$where we have used the fact that the \(X_i\) are iid, so \(\mathbb E[X_i X_j] = \mathbb E[X_i]\mathbb E[X_j] = \mathbb E[X_i]^2\) for \(i\neq j.\)
The variance of the sample variance is given by
$$\var{S^2} = \frac{\sigma^4}{n}\left(\kappa - 1 + \frac{2}{n-1}\right)$$where \(\kappa\) is the population kurtosis (raw, not excess!). Note that \(\kappa \geq 1\) always.
The standard error of the sample variance is
$$\varepsilon({S^2})=\frac{\std{S^2}}{\sigma^2} = \frac{\sqrt{\var{S^2}}}{\sigma^2} = \sqrt{\frac{\kappa-1}{n} + \frac{2}{n^2-n}}$$For large \(n\) this becomes
$$\varepsilon(S^2) \approx \sqrt{\frac{\kappa-1}{n}}$$Ultimately the standard error of the sample variance scales as \(\varepsilon \sim n^{-1/2},\) similarly to the standard error of the sample mean.
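A quick Monte Carlo check of this formula, for Gaussian data (\(\sigma = 1,\) \(\kappa = 3\)); the sample sizes here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 100_000

# Many independent sample variances, each from n Gaussian draws.
s2 = rng.standard_normal((trials, n)).var(axis=1, ddof=1)

kappa = 3.0  # raw kurtosis of the Gaussian
predicted = np.sqrt((kappa - 1) / n + 2 / (n**2 - n))
observed = s2.std()  # sigma^2 = 1, so this is already the relative error
```

The observed spread of the sample variances matches the exact formula closely.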
Sample Standard Deviation
The population standard deviation - \(\sigma\) - is given by the square root of the population variance. Similarly, the sample standard deviation - \(S\) - is given by the square root of the sample variance.
We know that \(S^2\) is an unbiased estimator for \(\sigma^2,\) so Jensen’s inequality tells us that \(S\) must be a biased estimator for \(\sigma;\) specifically it is an underestimate. Jensen’s inequality states that, for any concave function \(\phi,\) we have
$$\mathbb E[\phi(X)] \leq \phi(\mathbb E[X])$$The square-root function is concave and so
$$\mathbb E[S] = \mathbb E[\sqrt{S^2}] \,\,\leq\,\, \sqrt{\mathbb E[S^2]} = \sqrt{\sigma^2} = \sigma$$Although \(S\) is a biased estimator for \(\sigma,\) it is consistent. That is, as \(n \rightarrow \infty\) we do get \(S \rightarrow \sigma.\)
What is the standard error on the sample standard deviation? Jensen’s inequality makes it hard to derive any fully general results. However, for large \(n\) we can derive a result.
Note that, because \(\varepsilon \sim n ^{-1/2},\) as \(n\) gets large the distribution of \(S^2\) gets tighter around \(\sigma^2.\) Taylor expand \(\sqrt{S^2}\) around \(\sigma^2:\)
$$ \sqrt{S^2} = \sqrt{\sigma^2} + \frac{S^2 - \sigma^2}{2\sqrt{\sigma^2}} + O\big((S^2 - \sigma^2)^2\big) $$Keeping only the first order term, we get
$$S = \sigma + \frac{S^2 - \sigma^2}{2\sigma} \label{S_large_n}\tag{2}$$Now we can calculate the variance of this first order estimate of \(S:\)
$$\begin{align} \mathrm{Var}(S) &= \mathrm{Var}\Big(\sigma + \frac{S^2 - \sigma^2}{2\sigma}\Big) \\ &= \mathrm{Var}\Big(\frac{S^2}{2\sigma}\Big) \\ &= \frac{1}{4\sigma^2}\mathrm{Var}(S^2) \\ &= \sigma^2 \frac{\kappa - 1}{4n} \end{align}$$On the last line we have used our large \(n\) approximation for \(\mathrm{Var}(S^2)\) because we already put ourselves in that regime in order to use the Taylor expansion approximation.
Taking the square-root and dividing by the population standard deviation yields
$$\varepsilon(S) = \sqrt{\frac{\kappa - 1}{4n}}\qquad\qquad n\gg 1 \label{err_S}\tag{3}$$For a Gaussian, where \(\kappa = 3,\) this tells us
$$\varepsilon(S) = \frac{1}{\sqrt{2n}}$$The argument above is a bit sloppy: I have not said under what conditions the first-order approximation, equation \eqref{S_large_n}, is valid, nor proven that its variance is equal to the variance we seek (i.e. that of the unapproximated \(S\)). To make the argument rigorous, see the “Delta Method”.
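The Gaussian result is also easy to verify by Monte Carlo (the values of \(n\) and the trial count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 100, 50_000

# Many sample standard deviations of n Gaussian draws (sigma = 1).
s = rng.standard_normal((trials, n)).std(axis=1, ddof=1)

predicted = 1.0 / np.sqrt(2 * n)  # eps(S) for Gaussian data, large n
observed = s.std()                # relative already, since sigma = 1
```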
Sums of IID Random Variables
Let \(X_i\) be iid random variables with mean and standard deviation \(\mu_X\) and \(\sigma_X\) respectively. Define
$$Y = \sum_{i=1}^n X_i$$We know
$$\mathbb E[Y] = n\mu_X \equiv \mu_Y \qquad\text{and}\qquad \mathrm{Var}(Y) = n\sigma_X^2 \equiv \sigma_Y^2 $$Further, we get a similar relation for the kurtosis
$$ \kappa_Y - 3 = \frac{1}{n}(\kappa_X - 3) $$Sanity Check
As \(n\) grows, the central limit theorem tells us that \(Y\) approaches being normally distributed, so we expect \(\kappa_Y \rightarrow 3.\) This is indeed what we see from this relationship.
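The kurtosis relation is also easy to verify numerically, here with Laplace-distributed \(X_i\) (raw kurtosis \(\kappa_X = 6\)); the sizes are arbitrary:

```python
import numpy as np

def raw_kurtosis(a):
    """Raw (non-excess) sample kurtosis: E[(a - mu)^4] / sigma^4."""
    a = a - a.mean()
    return (a**4).mean() / (a**2).mean() ** 2

rng = np.random.default_rng(2)
n, m = 8, 500_000

y = rng.laplace(size=(m, n)).sum(axis=1)  # each Y is a sum of n draws

predicted = 3 + (6 - 3) / n  # kappa_Y from the relation above
observed = raw_kurtosis(y)
```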
We want to estimate the underlying standard deviation \(\sigma_X\) from observations of \(Y.\) We know \(\sigma_X = \sigma_Y / \sqrt{n},\) so we can use \(S_X = S_Y / \sqrt{n}\) as an estimate. Say we have \(m\) observations of \(Y,\) and \(m \gg 1,\) then from above we know
$$\std{S_Y} = \sigma_Y \sqrt{\frac{\kappa_Y - 1}{4m}}$$Substituting in our known relationships we get
$$\std{S_Y} = \sqrt{n}\,\sigma_X \sqrt{\frac{(\kappa_X-3)/n + 2}{4m}}$$we know
$$\std{S_X} = \std{\frac{S_Y}{\sqrt n}} = \frac{1}{\sqrt n}\std{S_Y}$$so
$$\std{S_X} = \sigma_X \sqrt{\frac{(\kappa_X-3)/n + 2}{4m}} \label{S_err_nm}\tag{4}$$The kurtosis of \(X_i\) can either reduce the standard error on our estimate (for \(\kappa_X < 3\)) or increase it (for \(\kappa_X > 3\).)
Sanity Check
For \(n = 1\) we recover the result we would have got by applying equation \eqref{err_S} directly to one \(X_i,\) as we should.
The dependence on \(n\) arises purely through the effect of the summation on the overall kurtosis of \(Y.\) Additionally, as \(n\) grows, the kurtosis of \(X\) affects the standard error less. We should expect this, because the central limit theorem tells us that \(Y\) approaches a normal distribution in the limit \(n \rightarrow \infty.\) Note that for zero excess kurtosis (\(\kappa_X = 3\)) the standard error of \(S_X\) does not depend on \(n\) anyway.
Volatility Estimates
Now imagine a process
$$Z_t = \sum_{i=1}^t X_i$$Let’s say we sample this process every \(n\) time units, i.e. \(Z_{jn}.\) The differences of this process give us back a series of iid realizations of \(Y:\)
$$Z_{(j+1)n} - Z_{jn} = \sum_{i=jn+1}^{jn+n} X_i \equiv Y_{j}$$If we have observed up to time \(t = mn\) then we have \(m\) observations of \(Y,\) each consisting of the sum of \(n\) draws of \(X\). Thus, we can use \(S_X = S_Y / \sqrt{n}\) to estimate the volatility, and we can directly apply our formula equation \eqref{S_err_nm} to calculate the error on this estimate.
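Putting the whole pipeline together in a short simulation (Gaussian increments; all parameter values are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma_x, t, n = 0.5, 100_000, 10  # true volatility, total time, sampling period

# The process Z_t is a cumulative sum of iid increments X_i.
z = np.cumsum(rng.normal(0.0, sigma_x, size=t))

# Sample every n steps and difference: each Y_j is a sum of n increments.
y = np.diff(z[::n])

# Estimate the per-step volatility via S_X = S_Y / sqrt(n).
s_x = y.std(ddof=1) / np.sqrt(n)
```

With \(m = t/n = 10{,}000\) samples the estimate lands within a fraction of a percent of the true \(\sigma_X,\) consistent with equation \eqref{S_err_nm}.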
Effect of the Sampling Period
A natural question to ask is: for a fixed observation time, how does the sampling frequency affect the accuracy of our estimate? The number of samples \(m\) is related to the total time \(t\) and sampling period \(n\) by \(m = t/n.\) Thus
$$\varepsilon = \frac{\std{S_X}}{\sigma_X} = \sqrt{\frac{(\kappa_X - 3) + 2n}{4t}} \label{S_err_tn}\tag{5}$$Firstly, let’s note that
$$\varepsilon \propto \frac{1}{\sqrt{t}}$$So, as one might hope, sampling for a longer period always gives a better estimate. How about for the sampling period \(n?\)
Define \(c = (\kappa_X - 3)/2\). Remember \(\kappa_X \geq 1\) so \(c \geq -1,\) also note \(n \geq 1.\) Then we can write
$$\varepsilon \propto \sqrt{c + n}$$Because the square-root function is monotonically increasing, this tells us it is always beneficial to reduce \(n,\) i.e. to increase the sampling frequency. In fact, because the square-root function is concave, the smaller \(n\) is, the greater the (relative) benefit in reducing it further.
However, as we increase the kurtosis, the overall benefit of increasing the sampling frequency decreases. This is because we move into a flatter part of the square-root curve. It should be clear visually, but you can also show it by calculating the gradient of \(\varepsilon\) wrt \(n\) and finding
$$\frac{\partial\varepsilon}{\partial n} \propto \frac{1}{\sqrt{c+n}}$$Not only does increasing kurtosis decrease the benefit of sampling faster, it increases the errors we will see at the very fastest time-scales, for the same reason. As \(n\) tends to zero we do not approach zero error for positive excess kurtosis.
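To get a feel for these effects, here is a small numeric sketch of equation \eqref{S_err_tn} (the values of \(\kappa_X\) and \(t\) are arbitrary):

```python
import numpy as np

def rel_err(n, kappa, t):
    """Relative standard error of S_X, equation (5)."""
    return np.sqrt(((kappa - 3) + 2 * n) / (4 * t))

t = 1000
for kappa in (3.0, 9.0):  # mesokurtic vs strongly leptokurtic increments
    ratio = rel_err(10, kappa, t) / rel_err(1, kappa, t)
    print(f"kappa={kappa}: eps(n=1)={rel_err(1, kappa, t):.4f}, "
          f"eps(n=10)/eps(n=1)={ratio:.2f}")
```

For \(\kappa_X = 3\) the ratio is \(\sqrt{10} \approx 3.16,\) while for \(\kappa_X = 9\) it drops to about \(1.8:\) higher kurtosis both raises the error floor and flattens the benefit of sampling faster.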
Effect of the Sample Count
We can similarly ask, still for fixed \(t,\) how the standard error scales with the number of samples. By substituting \(n = t/m\) into our formula above, and rearranging, we get
$$\varepsilon = \frac{1}{\sqrt{2}}\sqrt{\frac{c}{t} + \frac{1}{m}}$$As above, it’s clear from this that increasing the sample count decreases the error, but that for positive excess kurtosis we approach a non-zero error even with an infinite number of samples.
We can also calculate the gradient of this wrt \(m,\) but it’s not as clean as the gradient wrt \(n\) above. In this case, let’s do the following: say we have a number of samples \(m;\) what happens to the standard error if we increase (or decrease) this by a factor \(f,\) giving \(m^\prime = mf\)? We find
$$\frac{\varepsilon^\prime}{\varepsilon} = \sqrt{\frac{cm + t / f}{cm + t}}$$For \(\kappa_X=3\) this gives \(\varepsilon^\prime/\varepsilon = 1/\sqrt{f}.\) This makes sense: \(f\) is directly related to the number of samples. The relative benefit of one more sample, in this case, is
$$\frac{\varepsilon - \varepsilon^\prime}{\varepsilon} = 1 - \sqrt{\frac{m}{m + 1}}$$This follows because, for one extra sample, \(f = 1+1/m\). This function approaches zero as \(\sim m^{-1},\) i.e. as we get more samples the relative benefit of adding more drops away. (Though actually fairly slowly.)
If \(\kappa_X > 3\) then we find, for large \(m\), that the relative benefit instead goes as \(\sim m^{-2} / c\), meaning it drops off faster than in the mesokurtic case. Further, the higher the kurtosis, the smaller the benefit of each increase in sample count.
Despite this, the point above stands: it is always beneficial (at least mathematically) to sample more frequently. In practice there may be other constraints that make this a trade-off, e.g. the ability to fit your dataset in memory.
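To make the diminishing returns concrete, here is a short sketch evaluating the relative benefit \((\varepsilon - \varepsilon^\prime)/\varepsilon\) of one extra sample, using \(c = (\kappa_X - 3)/2\) and \(f = 1 + 1/m\) (the parameter values are arbitrary):

```python
import numpy as np

def rel_benefit(m, kappa, t):
    """(eps - eps') / eps for one extra sample, i.e. f = 1 + 1/m."""
    c = (kappa - 3) / 2
    f = 1 + 1 / m
    return 1 - np.sqrt((c * m + t / f) / (c * m + t))

# mesokurtic case: reduces to 1 - sqrt(m / (m + 1)) ~ 1/(2m)
print(rel_benefit(100, 3.0, 1000))
# leptokurtic case: the benefit is smaller and falls off faster
print(rel_benefit(100, 9.0, 1000))
```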
Empirical Verification
Let’s empirically test our formula to check it’s correct.
For a set time \(T\) we draw a realization of \(\{Z_t: t\leq T\}\), then for a given sampling period \(n\) we slice each realization into \(m = \lfloor T/n \rfloor\) samples and use \(S_X = S_Y/\sqrt{n}\) to estimate the volatility. We repeat this process until we have \(D\) estimates of our volatility, then calculate the empirical standard deviation of these estimates. We can compare this to our formula equation \eqref{S_err_nm}.
Repeating this for all sampling periods in the range \(1 \leq n < 150\) gives us a dataset we can plot:
We see that in the large \(m\) limit we get good agreement between equation \eqref{S_err_nm} and the observed values. (Remember, small \(n\) means higher frequency and so more samples.) As the sampling period increases, \(m\) decreases and we start to see that the theoretical value is not such a good estimate. Here \(T = 1000\), so at the upper end of the graph we have only \(\lfloor 1000 / 150 \rfloor = 6\) samples. Unsurprisingly, for such a small number our “large \(m\)” approximation breaks down!
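As a condensed sketch of this verification procedure for a single sampling period (not the post’s actual code; \(D\), \(n\), and the choice of normal increments, for which \(\kappa_X = 3\), are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

T, D = 1000, 2000            # realization length and number of repeated estimates
sigma_X, kappa_X = 1.0, 3.0  # normal increments have kurtosis 3
n = 5                        # sampling period

estimates = np.empty(D)
for d in range(D):
    Z = np.cumsum(rng.normal(0.0, sigma_X, size=T))
    Y = np.diff(Z[::n])                        # sampled differences of Z
    estimates[d] = Y.std(ddof=1) / np.sqrt(n)  # S_X = S_Y / sqrt(n)

empirical = estimates.std(ddof=1)  # observed spread of the estimates

# theoretical standard error from equation (4)
m = len(Y)  # number of differences per realization
theory = sigma_X * np.sqrt(((kappa_X - 3) / n + 2) / (4 * m))
print(empirical, theory)  # these should roughly agree
```

Sweeping `n` over the range of sampling periods and plotting `empirical` against `theory` reproduces the comparison described above.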
The code that produced these plots is available here.