Standard Errors of Volatility Estimates

$$ \DeclareMathOperator{\Std}{Std} \DeclareMathOperator{\Var}{Var} \newcommand{\std}[1]{\Std\!\left(#1\right)} \newcommand{\var}[1]{\Var\!\left(#1\right)} \newcommand{\EE}[1]{\mathbb{E}\left[#1\right]} $$

Sample Variance


The sample variance is given by the formula

$$ S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2 \label{sample_var}\tag{1} $$

where \(\bar X\) is the sample mean

$$ \bar X = \frac{1}{n} \sum_{i=1}^n X_i $$

Equation \eqref{sample_var} is an unbiased estimator for the population variance \(\sigma^2\), meaning

$$\EE{S^2} = \sigma^2$$

To see this, expand the square and use linearity of expectation:

$$\begin{align} \EE{S^2} &= \frac{1}{n-1}\sum_{i=1}^n\mathbb E[(X_i - \bar X)^2] \\ &= \frac{n}{n-1}\EE{X_i^2 - 2X_i\bar X + \bar X^2}\\ &= \frac{n}{n-1}\EE{X_i^2 - \frac{2}{n}\sum_{j=1}^n X_i X_j + \frac{1}{n^2}\sum_{j=1}^n\sum_{k=1}^n X_jX_k} \\ &= \frac{n}{n-1}\EE{X_i^2 - \frac{2}{n}\Big(X_i^2 + \sum_{j\neq i} X_i X_j\Big) + \frac{1}{n^2}\sum_{j=1}^n\Big(X_j^2 + \sum_{k\neq j} X_jX_k \Big)} \\ &= \frac{n}{n-1}\Big(\frac{n - 1}{n}\EE{X_i^2} - \frac{n-1}{n} \EE{X_i}^2\Big) \\ &= \EE{X_i^2} - \EE{X_i}^2 \\ &= \sigma^2 \end{align}$$

where we have used the fact that the \(X_i\) are iid, so \(\mathbb E[X_i X_j] = \mathbb E[X_i]\mathbb E[X_j] = \mathbb E[X_i]^2\) for \(i\neq j.\)

The variance of the sample variance is given by

$$\var{S^2} = \frac{\sigma^4}{n}\left(\kappa - 1 + \frac{2}{n-1}\right)$$

where \(\kappa\) is the population kurtosis (raw, not excess!). Note that \(\kappa \geq 1\) always.

The standard error of the sample variance, expressed as a fraction of \(\sigma^2,\) is

$$\varepsilon({S^2})=\frac{\std{S^2}}{\sigma^2} = \frac{\sqrt{\var{S^2}}}{\sigma^2} = \sqrt{\frac{\kappa-1}{n} + \frac{2}{n^2-n}}$$

For large \(n\) this becomes

$$\varepsilon(S^2) \approx \sqrt{\frac{\kappa-1}{n}}$$

Ultimately the standard error of the sample variance scales as \(\varepsilon \sim n^{-1/2},\) similarly to the standard error of the sample mean.
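As a quick sanity check, here is a minimal Monte Carlo sketch (assuming NumPy; the sample size, trial count, and seed are arbitrary choices) comparing the formula above to simulation for a Gaussian, where \(\kappa = 3:\)

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 200_000

# Many independent samples of size n from a standard normal (sigma^2 = 1, kappa = 3).
samples = rng.standard_normal((trials, n))
s2 = samples.var(axis=1, ddof=1)                       # unbiased sample variances

empirical = s2.std()                                   # Std(S^2) / sigma^2, with sigma^2 = 1
theoretical = np.sqrt((3 - 1) / n + 2 / (n**2 - n))    # formula above with kappa = 3
print(empirical, theoretical)                          # should agree closely
```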

Sample Standard Deviation


The population standard deviation, \(\sigma,\) is the square root of the population variance. Similarly, the sample standard deviation, \(S,\) is the square root of the sample variance.

We know that \(S^2\) is an unbiased estimator for \(\sigma^2,\) so Jensen’s inequality tells us that \(S\) must be a biased estimator for \(\sigma;\) specifically, it is an underestimate. Jensen’s inequality states that, for any concave function \(\phi,\) we have

$$\mathbb E[\phi(X)] \leq \phi(\mathbb E[X])$$

The square-root function is concave and so

$$\mathbb E[S] = \mathbb E[\sqrt{S^2}] \,\,\leq\,\, \sqrt{\mathbb E[S^2]} = \sqrt{\sigma^2} = \sigma$$

Although \(S\) is a biased estimator for \(\sigma,\) it is consistent. That is, as \(n \rightarrow \infty\) we do get \(S \rightarrow \sigma.\)

What is the standard error of the sample standard deviation? Jensen’s inequality makes it hard to derive any fully general results. However, for large \(n\) we can derive an approximate result.

Note that, because \(\varepsilon \sim n ^{-1/2},\) as \(n\) gets large the distribution of \(S^2\) gets tighter around \(\sigma^2.\) Taylor expand \(\sqrt{S^2}\) around \(\sigma^2:\)

$$ \sqrt{S^2} = \sqrt{\sigma^2} + \frac{S^2 - \sigma^2}{2\sqrt{\sigma^2}} + O\big((S^2 - \sigma^2)^2\big) $$

Keeping only the first order term, we get

$$S = \sigma + \frac{S^2 - \sigma^2}{2\sigma} \label{S_large_n}\tag{2} $$

Now we can calculate the variance of this first order estimate of \(S:\)

$$\begin{align} \var{S} &= \var{\sigma + \frac{S^2 - \sigma^2}{2\sigma}} \\ &= \var{\frac{S^2}{2\sigma}} \\ &= \frac{1}{4\sigma^2}\var{S^2} \\ &\approx \frac{1}{4\sigma^2}\cdot\frac{\sigma^4(\kappa - 1)}{n} \\ &= \sigma^2\,\frac{\kappa - 1}{4n} \end{align}$$

In the penultimate line we have used our large \(n\) approximation for \(\var{S^2},\) because we have already put ourselves in that regime in order to use the Taylor expansion.

Taking the square-root and dividing by the population standard deviation yields

$$\varepsilon(S) = \sqrt{\frac{\kappa - 1}{4n}}\qquad\qquad n\gg 1 \label{err_S}\tag{3} $$

For a Gaussian, where \(\kappa = 3,\) this tells us

\[\varepsilon(S) = \frac{1}{\sqrt{2n}}\]

The argument above is a bit sloppy: I have not said under what conditions the first order approximation, equation \eqref{S_large_n}, is valid, nor shown that its variance equals the variance we actually seek (that of the unapproximated \(S\)). To make the argument rigorous see the “Delta Method”.
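As with the variance, we can check equation \eqref{err_S} numerically. A small sketch (assuming NumPy; the parameter values are arbitrary) for the Gaussian case, where the prediction is \(1/\sqrt{2n}:\)

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials, sigma = 200, 50_000, 2.0

samples = rng.normal(scale=sigma, size=(trials, n))
s = samples.std(axis=1, ddof=1)        # sample standard deviations

empirical = s.std() / sigma            # Std(S) / sigma
theoretical = 1 / np.sqrt(2 * n)       # equation (3) with kappa = 3
print(empirical, theoretical)
```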

Sums of IID Random Variables


Let \(X_i\) be iid random variables with mean and standard deviation \(\mu_X\) and \(\sigma_X\) respectively. Define

$$Y = \sum_{i=1}^n X_i$$

We know

$$\mathbb E[Y] = n\mu_X \equiv \mu_Y \qquad\text{and}\qquad \mathrm{Var}(Y) = n\sigma_X^2 \equiv \sigma_Y^2 $$

Further, we get a similar relation for the kurtosis

$$ \kappa_Y - 3 = \frac{1}{n}(\kappa_X - 3) $$

Sanity Check

As \(n\) grows, the central limit theorem tells us that \(Y\) approaches being normally distributed, so we expect \(\kappa_Y \rightarrow 3.\) This is indeed what the relation above gives.
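The kurtosis relation is also easy to verify numerically. A short sketch (assuming NumPy; the distribution and counts are arbitrary choices), using a Laplace distribution for \(X,\) which has raw kurtosis \(\kappa_X = 6:\)

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials, kappa_x = 10, 500_000, 6.0

x = rng.laplace(size=(trials, n))         # Laplace increments: raw kurtosis kappa_X = 6
y = x.sum(axis=1)

# Empirical raw kurtosis of Y: E[(Y - mu)^4] / Var(Y)^2
yc = y - y.mean()
kappa_y = (yc**4).mean() / (yc**2).mean() ** 2

print(kappa_y - 3, (kappa_x - 3) / n)     # both should be close to 0.3
```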

We want to estimate the underlying standard deviation \(\sigma_X\) from observations of \(Y.\) We know \(\sigma_X = \sigma_Y / \sqrt{n},\) so we can use \(S_X = S_Y / \sqrt{n}\) as an estimate. Say we have \(m\) observations of \(Y,\) and \(m \gg 1,\) then from above we know

$$\std{S_Y} = \sigma_Y \sqrt{\frac{\kappa_Y - 1}{4m}}$$

Substituting in our known relationships we get

$$\std{S_Y} = \sqrt{n}\,\sigma_X \sqrt{\frac{(\kappa_X-3)/n + 2}{4m}}$$

We also know

$$\std{S_X} = \std{\frac{S_Y}{\sqrt n}} = \frac{1}{\sqrt n}\std{S_Y}$$

so

$$\std{S_X} = \sigma_X \sqrt{\frac{(\kappa_X-3)/n + 2}{4m}} \label{S_err_nm}\tag{4} $$

The kurtosis of \(X_i\) can either reduce the standard error on our estimate (for \(\kappa_X < 3\)) or increase it (for \(\kappa_X > 3\).)

Sanity Check

For \(n = 1\) we recover the result we would have got by applying equation \eqref{err_S} directly to one \(X_i,\) as we should.

The dependence on \(n\) arises purely because of the effect of the summation on the overall kurtosis of \(Y.\) Additionally, as \(n\) grows the kurtosis of \(X\) affects the standard error less and less. We should expect this, because the central limit theorem tells us that \(Y\) approaches a normal distribution in the limit \(n \rightarrow \infty.\) Note that for zero excess kurtosis (\(\kappa_X = 3\)) the standard error of \(S_X\) does not depend on \(n\) anyway.
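Here is a Monte Carlo sketch of equation \eqref{S_err_nm} (assuming NumPy; the Laplace increments and parameter values are just illustrative, and the agreement is only approximate since the formula assumes \(m \gg 1\)):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, trials = 10, 200, 5_000
sigma_x, kappa_x = np.sqrt(2.0), 6.0         # Laplace(0, 1): variance 2, raw kurtosis 6

x = rng.laplace(size=(trials, m, n))
y = x.sum(axis=2)                            # m observations of Y in each trial
s_x = y.std(axis=1, ddof=1) / np.sqrt(n)     # one volatility estimate per trial

empirical = s_x.std()
theoretical = sigma_x * np.sqrt(((kappa_x - 3) / n + 2) / (4 * m))
print(empirical, theoretical)                # should roughly agree for large m
```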

Volatility Estimates


Now imagine a process

$$Z_t = \sum_{i=1}^t X_i$$

Let’s say we sample this process every \(n\) time units, i.e. \(Z_{jn}.\) The differences of this process give us back a series of iid realizations of \(Y:\)

$$Z_{(j+1)n} - Z_{jn} = \sum_{i=jn+1}^{jn+n} X_i \equiv Y_{j}$$

If we have observed up to time \(t = mn\) then we have \(m\) observations of \(Y,\) each consisting of the sum of \(n\) draws of \(X\). Thus, we can use \(S_X = S_Y / \sqrt{n}\) to estimate the volatility, and we can directly apply our formula equation \eqref{S_err_nm} to calculate the error on this estimate.
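In code, the slicing-and-differencing step might look like the following minimal sketch (assuming NumPy; the Gaussian increments and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
t_total, n, sigma_x = 10_000, 25, 0.5

x = rng.normal(scale=sigma_x, size=t_total)
z = np.concatenate(([0.0], np.cumsum(x)))   # Z_0 = 0, Z_t = X_1 + ... + X_t

y = np.diff(z[::n])                         # Y_j = Z_{(j+1)n} - Z_{jn}
s_x = y.std(ddof=1) / np.sqrt(n)            # volatility estimate S_X = S_Y / sqrt(n)
print(s_x, sigma_x)                         # estimate vs. the true value
```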

Effect of the Sampling Period

A natural question to ask is: for a fixed observation time, how does the sampling frequency affect the accuracy of our estimate? The number of samples \(m\) is related to the total time \(t\) and sampling period \(n\) by \(m = t/n.\) Thus

$$\varepsilon = \frac{\std{S_X}}{\sigma_X} = \sqrt{\frac{(\kappa_X - 3) + 2n}{4t}} \label{S_err_tn}\tag{5} $$

Firstly, let’s note that

$$\varepsilon \propto \frac{1}{\sqrt{t}}$$

So, as one might hope, sampling for a longer period always gives a better estimate. How about for the sampling period \(n?\)

Define \(c = (\kappa_X - 3)/2\). Remember \(\kappa_X \geq 1\) so \(c \geq -1,\) also note \(n \geq 1.\) Then we can write

$$\varepsilon \propto \sqrt{c + n}$$

Because the square-root function is monotonically increasing, this tells us it is always beneficial to reduce \(n,\) i.e. to increase the sampling frequency. In fact, because the square-root function is concave, the smaller \(n\) is, the greater the (relative) benefit in reducing it further.

However, note that as we increase the kurtosis, the overall benefit of increasing the sampling frequency decreases. This is because we move into a flatter part of the square-root curve. It should be clear visually, but you can also show it by calculating the gradient of \(\varepsilon\) wrt \(n\) and finding

$$\frac{\partial\varepsilon}{\partial n} \propto \frac{1}{\sqrt{c+n}}$$

Not only does increasing kurtosis decrease the benefit of sampling faster, it increases the errors we will see at the very fastest time-scales, for the same reason.
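To make this concrete, here is a small numerical illustration of equation \eqref{S_err_tn} (assuming NumPy; the observation time, kurtosis values, and sampling periods are arbitrary):

```python
import numpy as np

def std_err(n, kappa_x, t=1000):
    """Relative standard error from equation (5)."""
    return np.sqrt((kappa_x - 3 + 2 * n) / (4 * t))

# Mesokurtic vs. leptokurtic increments: the error always falls as n shrinks,
# but the benefit of shrinking n is smaller when the kurtosis is higher.
for kappa_x in (3.0, 9.0):
    print(kappa_x, [round(std_err(n, kappa_x), 4) for n in (1, 5, 25, 125)])
```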

Effect of the Sample Count

We can similarly ask, still for fixed \(t,\) how the standard error scales with the number of samples. By substituting \(n = t/m\) into our formula above, and rearranging, we get

$$\varepsilon = \frac{1}{2}\sqrt{\frac{\kappa_X - 3}{t} + \frac{2}{m}}$$

We can also calculate the gradient of this wrt \(m,\) but it’s not as clean as the gradient wrt \(n\) above. In this case, let’s do the following: say we have a number of samples \(m;\) what happens to the standard error if we increase (or decrease) this by a factor \(f,\) giving \(m^\prime = mf\)? We find

$$\frac{\varepsilon^\prime}{\varepsilon} = \sqrt{\frac{cm + t / f}{cm + t}}$$

For \(\kappa_X=3\) this gives \(\varepsilon^\prime/\varepsilon = 1/\sqrt{f}.\) This makes sense: \(f\) directly scales the number of samples. The relative benefit of one more sample, in this case, is

$$\frac{\varepsilon - \varepsilon^\prime}{\varepsilon} = 1 - \sqrt{\frac{m}{m + 1}}$$

This follows because, for one extra sample, \(f = 1+1/m\). This function approaches zero as \(\sim m^{-1},\) i.e. as we get more samples the relative benefit of adding more drops away. (Though actually fairly slowly.)

If \(\kappa_X > 3\) then we find, for large \(m,\) that the relative benefit instead goes as \(\sim m^{-2} / c,\) meaning it drops off faster than in the mesokurtic case. Further, the higher the kurtosis, the smaller the benefit of each increase in the sample count.

Despite this, the point above stands: it is always beneficial (at least mathematically) to sample more frequently. You may have other reasons that make this a trade-off; e.g. the ability to fit your dataset in memory or something.
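Here is a short sketch (assuming NumPy; the numbers are arbitrary) of the relative benefit of one extra sample, computed directly from the expression for \(\varepsilon\) in terms of \(m\) above:

```python
import numpy as np

def std_err(m, kappa_x, t=1000):
    """Relative standard error at fixed t, written in terms of the sample count m."""
    return 0.5 * np.sqrt((kappa_x - 3) / t + 2 / m)

def relative_benefit(m, kappa_x, t=1000):
    """Fractional reduction in standard error from one additional sample."""
    return 1 - std_err(m + 1, kappa_x, t) / std_err(m, kappa_x, t)

# Higher kurtosis means each extra sample buys less, especially at large m.
for kappa_x in (3.0, 9.0):
    print(kappa_x, [relative_benefit(m, kappa_x) for m in (10, 100, 1000)])
```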

Empirical Verification


Let’s empirically test our formula to check it’s correct.

For a set time \(T\) we draw a realization of \(\{Z_t: t\leq T\}\), then for a given sampling period \(n\) we slice each realization into \(m = \lfloor T/n \rfloor\) samples and use \(S_X = S_Y/\sqrt{n}\) to estimate the volatility. We repeat this process until we have \(D\) estimates of our volatility, then calculate the empirical standard deviation of these estimates. We can compare this to our formula equation \eqref{S_err_nm}.
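A minimal sketch of this procedure (assuming NumPy, Gaussian increments only, and illustrative parameter values; this is not the author’s actual plotting script) might look like:

```python
import numpy as np

rng = np.random.default_rng(5)
T, D, sigma_x = 1000, 2000, 1.0

def empirical_std_err(n):
    """Relative spread of D volatility estimates at sampling period n."""
    estimates = []
    for _ in range(D):
        x = rng.normal(scale=sigma_x, size=T)
        z = np.concatenate(([0.0], np.cumsum(x)))
        y = np.diff(z[::n])                        # m = floor(T / n) observations of Y
        estimates.append(y.std(ddof=1) / np.sqrt(n))
    return np.std(estimates) / sigma_x

def theoretical_std_err(n, kappa_x=3.0):
    m = T // n
    return np.sqrt(((kappa_x - 3) / n + 2) / (4 * m))

for n in (1, 10, 50, 150):
    print(n, empirical_std_err(n), theoretical_std_err(n))
```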

Repeating this for all sampling periods in the range \(1 \leq n < 150\) gives us a dataset we can plot:

Figure: Effect of the sampling period on the accuracy of our volatility estimate, for both a Gaussian and a Student’s T distribution.

We see that in the large \(m\) limit we get good agreement between equation \eqref{S_err_nm} and the observed values. (Remember, small \(n\) means higher frequency and so more samples.) As the sampling period increases, \(m\) decreases and we start to see that the theoretical value is not such a good estimate. Here \(T = 1000,\) therefore at the upper end of the graph we have only \(\lfloor 1000 / 150 \rfloor = 6\) samples. Unsurprisingly, for such a small number our “large \(m\)” approximation breaks down!

The code that produced these plots is available here.