Order Statistics
October 2024

Several years ago I needed to calculate the marginal distribution of the $i$-th largest sample from $N$ i.i.d. samples of a random variable. I was not aware at the time that this is the so-called order statistic, so I ended up deriving it myself. Additionally, I derived the conditional & joint distributions of the order statistics. This is the derivation I came up with.
Marginal Distributions
Cumulative Distribution Function
Let $X_i$, $i=1,\ldots,N$ be $N$ i.i.d. random variables with cumulative distribution function $F_X$. Denote the $m$-th smallest of $\{X_i\}$ by $Y_m$; that is, the smallest will be $Y_1$ and the largest will be $Y_N$. We wish to know what the marginal cumulative distributions of each $Y_m$ are
$$F_i(y) = \mathbb P(Y_i \leq y)$$

The first thing to note is that for any random variable $X$ the value of the CDF at $X$ is uniformly distributed. That is, $F_X(X) \sim U[0, 1]$. The inverse is also true; the inverse of the CDF applied to a uniform RV is distributed like $X$, that is $F^{-1}_X(U) \sim F_X$. This means in the following we can work entirely with uniform random variables $U_i$, then transform as appropriate for our desired distribution, taking care to appropriately rescale probability densities.
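As a quick sanity check, here is a minimal sketch of the probability integral transform in action, using an exponential distribution as an arbitrary stand-in for $X$ (numpy and scipy assumed available; `ppf` is scipy's name for the inverse CDF):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = stats.expon.rvs(size=100_000, random_state=rng)

# F_X(X) should be uniform on [0, 1]...
u = stats.expon.cdf(x)
print(stats.kstest(u, "uniform").pvalue)  # not small: consistent with U[0, 1]

# ...and F_X^{-1}(U) should be distributed like X.
x2 = stats.expon.ppf(rng.uniform(size=100_000))
print(stats.kstest(x2, "expon").pvalue)   # not small: consistent with Expon
```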
To start, we define a new set of random variables
$$ Z_i = \begin{cases} 1 & \text{if } U_i \leq y \\ 0 & \text{otherwise}\end{cases} $$

Each $Z_i$ is a Bernoulli random variable with chance of success $y$. The sum of the $Z_i$s is thus Binomially distributed;
$$ \zeta = \sum_{i=1}^N Z_i \sim \mathrm{Binom}\big(N, y\big). $$

The event $Y_i \leq y$ occurs exactly when $\zeta \geq i$, because $Y_i \leq y$ requires at least $i$ of the $U_i$ to be $\leq y$. Thus we have
$$\begin{align} F_i(y) = \mathbb P(Y_i \leq y) &= \mathbb P(\zeta \geq i) \notag\\ &= 1 - \mathbb P(\zeta \leq i-1) \label{eq:marginal_cdf_1}\tag{1} \end{align}$$
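Equation (1) is easy to check by simulation; a minimal sketch, with $N$, $i$ and $y$ chosen arbitrarily:

```python
import numpy as np
from scipy import stats

N, i, y = 6, 3, 0.4
rng = np.random.default_rng(0)
samples = np.sort(rng.uniform(size=(100_000, N)), axis=1)

empirical = np.mean(samples[:, i - 1] <= y)     # P(Y_i <= y), i is 1-based
theoretical = 1 - stats.binom.cdf(i - 1, N, y)  # 1 - P(zeta <= i-1)
print(empirical, theoretical)                   # should agree to ~3 decimals
```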
The CDF of the binomial distribution can be written in terms of the regularized incomplete Beta function, giving

$$ F_i(y) = 1 - I_{1-y}(N-i+1,i) $$

It can be seen from the definition of $I$ that $I_y(a, b)\equiv 1-I_{1-y}(b, a)$, so finally
$$ F_i(y) = I_{y}(i, N-i+1) $$

The regularized incomplete Beta function is also the CDF of a Beta distributed random variable. This means that $Y_i$ is Beta distributed for uniform RVs, and for arbitrary $X$ we have
$$ Y_i \sim F_X^{-1}(\mathrm{Beta}(i, N-i+1)) $$

which has CDF
$$ \boxed{ F_i(y) = I_{F_X(y)}(i, N-i+1) } \label{eq:marginal_cdf_y}\tag{2} $$

Example - Uniform Distribution
As an example, consider $N$ draws from a uniform distribution on the unit interval, $X_i \sim U[0,1]$. Since the $X_i$s are uniform, $F_X$ is the identity function on $[0, 1]$, and so $F^{-1}_X(f) = f$, which means the $Y_i$s are directly Beta distributed.
The following figure shows the marginal distributions of $Y_i$ for $N=6$. The coloured lines show the results from drawing 1,000,000 samples and the dashed lines show the theoretical value.
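A sketch of how such a figure can be produced (matplotlib assumed; the seed is arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

N = 6
rng = np.random.default_rng(0)
samples = np.sort(rng.uniform(size=(1_000_000, N)), axis=1)

y = np.linspace(0, 1, 200)
for i in range(1, N + 1):
    # Empirical CDF of the i-th order statistic, via a sorted column.
    col = np.sort(samples[:, i - 1])
    empirical = np.searchsorted(col, y, side="right") / col.size
    plt.plot(y, empirical, label=f"$Y_{i}$")
    # Theoretical value: the Beta(i, N-i+1) CDF.
    plt.plot(y, stats.beta.cdf(y, i, N - i + 1), "k--", lw=1)
plt.legend()
plt.show()
```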
Probability Density Functions
We can calculate the marginal density functions by differentiating the marginal CDFs of the $Y_i$s. However, we must take care with the mapping from $U$ to $X$, since this causes a rescaling.
$$\begin{align} f_i(y) &= \frac{d I_p(i, N-i+1)}{dp}\frac{dp}{dy}\qquad\qquad p=F_X(y) \notag\\ &= \frac{F_X(y)^{i-1}\big(1 - F_X(y)\big)^{N-i}}{B(i, N-i+1)} f_X(y) \end{align}$$

The first term is the pdf of a Beta distributed random variable and the second is our rescaling, arising from $X$ not being uniform.
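To see the rescaling at work for a non-uniform distribution, here is a rough check of this formula against a histogram of simulated order statistics, using a standard normal as an arbitrary example (the choices $N=6$, $i=2$ are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.special import beta as beta_fn

N, i = 6, 2
rng = np.random.default_rng(0)
samples = np.sort(stats.norm.rvs(size=(200_000, N), random_state=rng), axis=1)

# Evaluate the formula: Beta pdf in F_X(y), rescaled by f_X(y).
y = np.linspace(-3, 3, 400)
F, f = stats.norm.cdf(y), stats.norm.pdf(y)
pdf = F ** (i - 1) * (1 - F) ** (N - i) / beta_fn(i, N - i + 1) * f

plt.hist(samples[:, i - 1], bins=100, density=True, alpha=0.5)
plt.plot(y, pdf)
plt.show()
```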
Example - Uniform Distribution
Looking again at the example of $N=6$ draws from $U[0,1]$ we find that, since $F_X(y) = y$ (and thus $f_X(y)=1$), we get
$$ f_i(y) = f_B(y;\,i, N-i+1) = (N-i+1)\binom{N}{i-1} y^{i-1}(1-y)^{N-i} $$

As above, the following figure shows the marginal pdfs of the $Y_i$s for $N=6$ compared against their theoretical values.
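As a tiny numerical sanity check that the prefactor above really matches the Beta normalization $1/B(i, N-i+1)$:

```python
from math import comb
from scipy.special import beta as beta_fn

N = 6
for i in range(1, N + 1):
    # (N-i+1) * C(N, i-1) should equal 1 / B(i, N-i+1).
    print((N - i + 1) * comb(N, i - 1), 1 / beta_fn(i, N - i + 1))
```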
Pairwise Conditional Distributions
Next we may ask ourselves how knowing one or more of the order statistics affects the distribution of the others. Let's say we are interested in finding the distribution of $Y_i$ given the value of $Y_j$, where $j < i$. The situation where $j > i$ is completely analogous by symmetry.
The first thing we can note is that if we know multiple $Y_j$s, only the highest has any effect on the distribution of $Y_i$. Thus it suffices to consider a single $Y_j$. Given a concrete value $Y_j = y_j$ we know that $Y_i \geq y_j$ by definition. Crucially this is all we know about $Y_i$, other than that it is still drawn from $U$, thus we can define
$$ u_j(y) = \frac{y - y_j}{1 - y_j}\qquad y\geq y_j $$

which is the CDF of the distribution that all the larger $U_i$s are drawn from. It is just the original uniform CDF but renormalized to the region $y\geq y_j$.
This means we have $N-j$ draws from the distribution $u_j$ and $N-j$ order statistics $Y_i|Y_j$. So our conditional $i$-th order statistic is in effect the $(i-j)$-th unconditional order statistic of $N-j$ samples drawn from $u_j$. We can directly apply our results above. Nice!
$$F_{Y_i|Y_j=y_j}(y) = I_{u_j(y)}(i-j, N-i+1)$$

The second parameter remains unchanged because both $N$ and $i$ decrease by $j$; $(N-j) - (i-j) = N-i$. This can be understood intuitively: $i$ is still as far from the end as it was before the conditioning.
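This reduction also gives a direct way to simulate the conditional distribution and check the formula; a sketch, with $N$, $j$, $i$ and $y_j$ chosen arbitrarily:

```python
import numpy as np
from scipy import stats

N, j, i, y_j = 6, 2, 4, 0.3
rng = np.random.default_rng(0)
# Given Y_j = y_j, the remaining N-j samples are uniform on [y_j, 1],
# and Y_i is the (i-j)-th smallest of them.
upper = np.sort(rng.uniform(y_j, 1, size=(200_000, N - j)), axis=1)
y_i = upper[:, (i - j) - 1]

y = 0.7
empirical = np.mean(y_i <= y)
theoretical = stats.beta.cdf((y - y_j) / (1 - y_j), i - j, N - i + 1)
print(empirical, theoretical)
```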
Thus the conditional distribution is a Beta distribution, now rescaled over the interval $[y_j, 1]$, with parameters $i-j$ and $N-i+1$. For draws from $U$ this gives us the pdf
$$f(y_i|y_j) = \frac{u_j(y_i)^{i-j-1}\big(1 - u_j(y_i)\big)^{N-i}}{B(i-j, N-i+1)} \frac{1}{1 - y_j}$$

For draws from $X$ we need to replace all occurrences of $y$ (even inside $u_j$) with $F_X(y)$. This gains us another term from the chain rule equal to $dF_X/dy = f_X$. Thus the conditional distribution for arbitrary $X$ is
$$\boxed{ f(y_i|y_j) = \frac{u_j(y_i)^{i-j-1}\big(1 - u_j(y_i)\big)^{N-i}}{B(i-j, N-i+1)} \frac{f_X(y_i)}{1 - F_X(y_j)} }$$

Joint Distribution
Pairwise Joint Distribution
We can simply transform the pairwise conditional distribution to a joint distribution using the definition of conditional probability
$$f(y_i, y_j) = f(y_i | y_j) f(y_j)$$

Plugging in the results for the pdfs gives
$$ f(y_i, y_j) = \frac{u_j(y_i)^{i-j-1}\big(1 - u_j(y_i)\big)^{N-i}}{B(i-j, N-i+1)} \frac{1}{1 - y_j} \frac{y_j^{j-1}(1-y_j)^{N-j}}{B(j, N-j+1)} $$

From the definition of $u_j$ we have
$$1 - u_j(y_i) = \frac{1 - y_i}{1 - y_j}$$

so (excluding constant factors for now for brevity)
$$ f(y_i, y_j) \propto \left(\frac{y_i - y_j}{1 - y_j}\right)^{i-j-1}\left(\frac{1-y_i}{1-y_j}\right)^{N-i} \frac{y_j^{j-1}(1-y_j)^{N-j}}{1 - y_j} $$

Collecting powers of like terms simplifies this to
$$ f(y_i, y_j) \propto (y_i - y_j)^{i-j-1} (1-y_i)^{N-i} y_j^{j-1} $$

Next, the constant factor can also be tackled by using the identity
$$B(a, b) = \frac{(a-1)!(b-1)!}{(a+b-1)!}\qquad a, b \in \mathbb{Z}^+$$

yielding
$$B(i-j, N-i+1)B(j, N-j+1) = \frac{(i-j-1)!\,(N-i)!\vphantom{\cancel{(N)!}}}{\cancel{(N-j)!}}\frac{(j-1)!\cancel{(N-j)!}}{N!\vphantom{\cancel{(N)!}}}$$

Thus our final result is
$$\boxed{ f(y_i, y_j) = N! \frac{(y_i - y_j)^{i-j-1}}{(i-j-1)!}\frac{(1-y_i)^{N-i}}{(N-i)!} \frac{y_j^{j-1}}{(j-1)!} }$$

Since our example of $N=6$ has a relatively small number of pairs we can plot the joint distribution for all of them.
Each column has the conditioning variable $Y_j$ fixed, and each row within that is a different $Y_i$. The distributions make intuitive sense: the lower $Y_{i}$s have their mass concentrated at lower values of $y_i$, and if you squint you can imagine how marginalising would give us the Beta marginal distributions above.
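As a check on the boxed result, we can numerically confirm that it integrates to one over the admissible region $0 \leq y_j \leq y_i \leq 1$; a sketch with $N$, $j$, $i$ chosen arbitrarily:

```python
from math import factorial
from scipy.integrate import dblquad

N, j, i = 6, 2, 4

def joint(y_i, y_j):
    # The boxed pairwise joint density (uniform case).
    return (factorial(N)
            * (y_i - y_j) ** (i - j - 1) / factorial(i - j - 1)
            * (1 - y_i) ** (N - i) / factorial(N - i)
            * y_j ** (j - 1) / factorial(j - 1))

# dblquad integrates f(y, x) over x in [a, b], y in [gfun(x), hfun(x)];
# here x = y_j in [0, 1] and y = y_i in [y_j, 1].
total, _ = dblquad(joint, 0, 1, lambda y_j: y_j, lambda y_j: 1)
print(total)  # ~1.0
```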
Full Joint Distribution
Okay, so what about the full joint distribution of the $Y_i$s? That’s not too bad to work out given we know the conditional distributions already.
We can make use of the definition of conditional probability & telescoping:
$$\begin{align} f(y_1, \dots y_N) &= f(y_2,\dots,y_N | y_1) f(y_1) \notag\\ &= f(y_3,\dots,y_N | y_1, y_2) f(y_2|y_1) f(y_1) \notag\\ &= f(y_N | y_1 \dots y_{N-1}) f(y_{N-1}| y_1 \dots y_{N-2}) \dots f(y_1)\notag\\ \end{align}$$

But remember, due to the definition of the order statistics, when conditioning on multiple lower values only the highest matters. So we can further collapse the conditioning
$$\begin{align} f(y_1, \dots y_N) &= f(y_N | y_{N-1}) f(y_{N-1}| y_{N-2}) \dots f(y_1) \notag\\ &= f(y_1)\prod_{i=2}^N f(y_i|y_{i-1}) \end{align}$$

To make things more compact we can define $f(y_1):=f(y_1|y_0)$ and $y_0 = 0$.
$$f(y_1, \dots y_N) = \prod_{i=1}^N f(y_i|y_{i-1})$$

Now we can plug in our results from above for the conditional pdfs
$$\begin{align} f(y_1, \dots y_N) &= \prod_{i=1}^N \frac{\big(1 - u_{i-1}(y_i)\big)^{N-i}}{B(1, N-i+1)}\frac{1}{1-y_{i-1}}\\ &= N! \prod_{i=1}^N \frac{\big(1 - u_{i-1}(y_i)\big)^{N-i}}{1-y_{i-1}} \end{align}$$

The factor of $N!$ follows because, from the definition of $B(a, b)$, we have $B(1, a) = \frac{1}{a}$, thus
$$\prod_{i=1}^N \frac{1}{B(1, N-i+1)} = \prod_{i=1}^N (N-i+1) = \prod_{j=1}^N j = N!$$

Writing out the $u_j$s in full, and simplifying, gives
$$\begin{align} f(y_1, \dots y_N) &= N!\prod_{i=1}^N \left(\frac{1 - y_i}{1 - y_{i-1}}\right)^{N-i}\frac{1}{1-y_{i-1}} \\ &= N! \,\, \frac{\cancel{(1-y_1)^{N-1}}}{(1-0)^{N}} \frac{\cancel{(1-y_2)^{N-2}}}{\cancel{(1-y_1)^{N-1}}}\dots\frac{(1-y_N)^0}{\cancel{(1-y_{N-1})^1}} \\ &= N! \end{align}$$

Huh, the joint distribution is a constant over the space of admissible values. That's not a mistake; Wikipedia confirms it. I suppose it makes sense since the distributions are all ultimately uniform and so any valid sequence of $\{Y_i\}$ is equally likely. Interesting!
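One way to see (and check) the $N!$ is that the admissible region $\{0 \leq y_1 \leq \dots \leq y_N \leq 1\}$ has volume $1/N!$, so a constant density of $N!$ integrates to one. A quick Monte Carlo sketch: the fraction of i.i.d. uniform $N$-tuples that happen to come out already sorted should be $1/N!$.

```python
from math import factorial
import numpy as np

N = 6
rng = np.random.default_rng(0)
samples = rng.uniform(size=(1_000_000, N))
# A tuple lies in the ordered region iff consecutive differences are >= 0.
sorted_fraction = np.mean(np.all(np.diff(samples, axis=1) >= 0, axis=1))
print(sorted_fraction, 1 / factorial(N))  # both ~0.00139
```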