Notes:Distribution of the sample median
Findings
I've found results for sample sizes n=3, n=5 and n=7, plus a predicted form for n=9. Writing $F$ for the CDF of the population, they are respectively:
- $F(r)^2[3-2F(r)]$ for n=3, and
- $F(r)^3[10-15F(r)+6F(r)^2]$ for n=5 (I've experimentally verified this one)
- $F(r)^4(-20F(r)^3+70F(r)^2-84F(r)+35)$ for n=7
Unfortunately it seems prior results are of no help
- $F(r)^5(70F(r)^4-315F(r)^3+540F(r)^2-420F(r)+126)$ PREDICTED for n=9
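As a cross-check on the list above, here is a minimal Python sketch comparing each listed polynomial with the standard order-statistic sum $\sum_{k=m+1}^{n}\binom{n}{k}p^k(1-p)^{n-k}$, where $p=F(r)$ and $n=2m+1$. This binomial form is not the route taken in these notes; it is swapped in only as an independent reference. The names `median_cdf_binomial`, `LISTED` and `check_listed` are made up for this illustration. Exact rational arithmetic is used, and since both sides are polynomials in $p$ of degree at most 9, agreement at 13 distinct points means they coincide.

```python
from fractions import Fraction
from math import comb


def median_cdf_binomial(n, p):
    """P[median <= r] for odd n, with p = F(r), via the order-statistic sum."""
    m = (n - 1) // 2
    # The median is <= r exactly when at least m+1 of the n values are <= r.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m + 1, n + 1))


# The polynomials listed in the findings, as functions of p = F(r).
LISTED = {
    3: lambda p: p**2 * (3 - 2 * p),
    5: lambda p: p**3 * (10 - 15 * p + 6 * p**2),
    7: lambda p: p**4 * (-20 * p**3 + 70 * p**2 - 84 * p + 35),
    9: lambda p: p**5 * (70 * p**4 - 315 * p**3 + 540 * p**2 - 420 * p + 126),
}


def check_listed():
    """Compare the listed forms with the binomial sum at 13 rational points."""
    points = [Fraction(i, 12) for i in range(13)]
    return all(LISTED[n](p) == median_cdf_binomial(n, p) for n in LISTED for p in points)


print(check_listed())
```

Under this check the n=9 entry agrees with the binomial sum as well, which is consistent with the prediction.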
Important results
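- $P[\text{Median}(X_1,\ldots,X_{2m+1})\le r]=P[X_1\le\cdots\le X_{m+1}\le r\mid X_1\le\cdots\le X_{2m+1}]$
- $=\dfrac{P[X_1\le\cdots\le X_{m+1}\le\operatorname{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}]}{\frac{1}{(2m+1)!}}$
- $=(2m+1)!\,P[X_1\le\cdots\le X_{m+1}\le\operatorname{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}]$
- $=\lim_{t\to+\infty}\left((2m+1)!\,P[X_1\le\cdots\le X_{m+1}\le\operatorname{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}\le t]\right)$
- $=\dfrac{(2m+1)!}{m!}\lim_{t\to+\infty}\left[\int_{-\infty}^{t}f(x_{2m+1})\left(\int_{-\infty}^{x_{2m+1}}f(x_{2m})\left(\cdots\int_{-\infty}^{x_{m+3}}f(x_{m+2})\left(\int_{-\infty}^{\operatorname{Min}(r,x_{m+2})}f(x_{m+1})F(x_{m+1})^m\,dx_{m+1}\right)dx_{m+2}\cdots\right)dx_{2m}\right)dx_{2m+1}\right]$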
Problem overview
Let $X_1,\ldots,X_{2m+1}$ be a sample from a population $X$, meaning that the $X_i$ are i.i.d. random variables, for some $m\in\mathbb{N}_0$. We wish to find:
- $P[\text{Median}(X_1,\ldots,X_{2m+1})\le r]$ - the cumulative distribution function (CDF) of the median.
Initial work
Since the variables are i.i.d. (and, as we work with a density $f$, ties have probability zero), any ordering is as likely as any other (which I proved the long way, rather than just jumping to $\frac{1}{(2m+1)!}$).
I believe that $P[\text{Median}(X_1,\ldots,X_{2m+1})\le r]=P[X_1\le\cdots\le X_{m+1}\le r\mid X_1\le\cdots\le X_{2m+1}]$. Let us make some definitions to make this shorter.
- $O:=X_1\le\cdots\le X_{2m+1}$ - representing the order part
- $M:=X_1\le\cdots\le X_{m+1}\le r$ - representing the median part
- $Q:=P[\text{Median}(X_1,\ldots,X_{2m+1})\le r]=P[M\mid O]$ - representing the question
We should also have some sort of converse, related to $r\le X_{m+2}\le\cdots\le X_{2m+1}$ or something.
We also have:
- An expression for $P[X_1\le\cdots\le X_n\le r]$ from Probability of i.i.d random variables being in an order and not greater than something
- It's $=\frac{1}{n!}F_X(r)^n$
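The expression $\frac{1}{n!}F_X(r)^n$ can be sanity-checked by simulation. The sketch below assumes a Uniform(0,1) population, so that $F_X(r)=r$ on $[0,1]$; the function name `ordered_and_below` and the choice of distribution are assumptions made for illustration only.

```python
import random
from math import factorial


def ordered_and_below(n, r, trials=200_000, seed=0):
    """Monte Carlo estimate of P[X1 <= ... <= Xn <= r] for Uniform(0,1) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        # The event needs the draws to be in increasing order AND the last one <= r.
        if xs[-1] <= r and all(a <= b for a, b in zip(xs, xs[1:])):
            hits += 1
    return hits / trials


n, r = 3, 0.6
estimate = ordered_and_below(n, r)
exact = r**n / factorial(n)  # F(r)^n / n! with F(r) = r
print(estimate, exact)
```

The two printed numbers should be close, with the gap shrinking as `trials` grows.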
Analysis
Let us look at $X\le r$ and $X\le Y$ to see what we can say if both are true (the "and")
- Claim: $(X\le r\wedge X\le Y)\iff(X\le\operatorname{Min}(r,Y))$
  - Proof:
    - $\implies$
      - Suppose $r\le Y$, so $\operatorname{Min}(r,Y)=r$, obviously $X\le r\implies X\le r=\operatorname{Min}(r,Y)$, so the implication holds in this case
      - Suppose $Y\le r$, so $\operatorname{Min}(r,Y)=Y$, obviously $X\le Y\implies X\le Y=\operatorname{Min}(r,Y)$, so the implication holds in this case too.
    - $\impliedby$
      - We notice either $\operatorname{Min}(r,Y)=r$ if $r\le Y$, or $\operatorname{Min}(r,Y)=Y$ if $Y\le r$ (slightly modify the language for the equality, it doesn't matter though really)
        - Thus if $r\le Y$ then $X\le r$ and as $r\le Y$ by assumption, we use the transitivity of $\le$ to see $X\le r\le Y$ thus $X\le Y$ too - as required
        - Thus if $Y\le r$ then $X\le Y$ and as $Y\le r$ by assumption, we use the transitivity of $\le$ to see $X\le Y\le r$ and thus $X\le r$ too - as required.
      - So in either case, we have $X\le Y$ and $X\le r$ - as required
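The claim is elementary, but a brute-force check over a small grid of values is a quick way to catch a slip in the case analysis. The helper name `claim_holds` and the grid itself are arbitrary choices for this sketch.

```python
from itertools import product


def claim_holds(x, r, y):
    """True when (x <= r and x <= y) has the same truth value as x <= min(r, y)."""
    lhs = (x <= r) and (x <= y)
    rhs = x <= min(r, y)
    return lhs == rhs


# Includes repeated values so the equality cases (r == y, x == r, ...) are exercised.
grid = [-2, -1, 0, 0.5, 1, 2]
all_ok = all(claim_holds(x, r, y) for x, r, y in product(grid, repeat=3))
print(all_ok)
```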
Problem statement
Thus we really want to find:
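- $P[\text{Median}(X_1,\ldots,X_{2m+1})\le r]=P[X_1\le\cdots\le X_{m+1}\le r\mid X_1\le\cdots\le X_{2m+1}]$
- $=\dfrac{P[M\text{ and }O]}{P[O]}$
- $=(2m+1)!\,P[X_1\le\cdots\le X_{m+1}\le\operatorname{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}]$
  - Caveat: We now need $(X\le r\wedge X\le Y\le Z)\implies(X\le\operatorname{Min}(r,Y)\le Y\le Z)$ to justify this format. Although that's arguably not that helpful for the integral.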
Initial integral
- This isn't about the median specifically, this is just looking at the specific integral.
Suppose we have a sample of length 3, X,Y,Z then we are looking at:
- $P[X\le\operatorname{Min}(r,Y)\le Y\le Z\le t]$ (where $t$ will be used for a limit towards $\infty$ to get $P[X\le\operatorname{Min}(r,Y)\le Y\le Z]$ in the end), or as an integral:
  - $\int_{-\infty}^{t}f(z)\left(\int_{-\infty}^{z}f(y)\left(\int_{-\infty}^{\operatorname{Min}(r,y)}f(x)\,dx\right)dy\right)dz$
    - if $t>r$ then the minimum will get involved (for some $z$s anyway) and limit it to $r$, otherwise it'll always stay under $r$ - of course in practice (as we'll take $t\to\infty$) this will certainly happen.
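To see the effect of the minimum concretely, the sketch below estimates $P[X\le\operatorname{Min}(r,Y)\le Y\le Z\le t]$ by simulation for a Uniform(0,1) population and compares it with a closed form. The closed form is not from these notes: it was worked out by hand for this illustration only, by carrying out the three integrals with $f=1$ on $[0,1]$, which gives $\frac{t^3}{6}$ when $t\le r$ and $\frac{r^3}{6}+\frac{rt^2}{2}-\frac{r^2t}{2}$ when $r\le t$. The function names are hypothetical.

```python
import random


def closed_form_uniform(r, t):
    """Hand-derived value of the triple integral for Uniform(0,1), 0 <= r, t <= 1."""
    if t <= r:
        # The minimum never bites: Min(r, y) = y on the whole domain.
        return t**3 / 6
    return r**3 / 6 + r * t**2 / 2 - r**2 * t / 2


def estimate_uniform(r, t, trials=200_000, seed=0):
    """Monte Carlo estimate of P[X <= Min(r, Y) <= Y <= Z <= t] for Uniform(0,1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y, z = rng.random(), rng.random(), rng.random()
        if x <= min(r, y) <= y <= z <= t:
            hits += 1
    return hits / trials


print(estimate_uniform(0.4, 0.9), closed_form_uniform(0.4, 0.9))
```

At $t=1$ the closed form times $3!$ reduces to $1-(1-r)^3$, the probability that the smallest of three values is at most $r$, which is a useful consistency check on the hand calculation.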
Progression: 1
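We are evaluating: $P[X_1\le\cdots\le X_{m+1}\le\operatorname{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}\le t]$
- $\int_{-\infty}^{t}f(x_{2m+1})\left(\int_{-\infty}^{x_{2m+1}}f(x_{2m})\left(\cdots\int_{-\infty}^{x_{m+3}}f(x_{m+2})\left(\int_{-\infty}^{\operatorname{Min}(r,x_{m+2})}f(x_{m+1})\left(\int_{-\infty}^{x_{m+1}}f(x_m)\left(\cdots\int_{-\infty}^{x_2}f(x_1)\,dx_1\cdots\right)dx_m\right)dx_{m+1}\right)dx_{m+2}\cdots\right)dx_{2m}\right)dx_{2m+1}$
We operate on the inner bit:
- $\int_{-\infty}^{x_{m+1}}f(x_m)\left(\cdots\int_{-\infty}^{x_2}f(x_1)\,dx_1\cdots\right)dx_m=\frac{1}{m!}F(x_{m+1})^m$
We substitute this back in to yield:
- $\frac{1}{m!}\int_{-\infty}^{t}f(x_{2m+1})\left(\int_{-\infty}^{x_{2m+1}}f(x_{2m})\left(\cdots\int_{-\infty}^{x_{m+3}}f(x_{m+2})\left(\int_{-\infty}^{\operatorname{Min}(r,x_{m+2})}f(x_{m+1})F(x_{m+1})^m\,dx_{m+1}\right)dx_{m+2}\cdots\right)dx_{2m}\right)dx_{2m+1}$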
Conclusion of progression 1
We see here that
Progression: 2
This will involve induction, and dealing with the $\operatorname{Min}()$ will be "tricky". Both for practice and to set up the induction, we will consider the special cases m=1 and m=2 by evaluating:
- m=1 yields $I_1:=\frac{1}{1!}\int_{-\infty}^{t}f(x_3)\left(\int_{-\infty}^{\operatorname{Min}(r,x_3)}f(x_2)F(x_2)\,dx_2\right)dx_3$, by case analysis:
  - if $t\le r$ then $x_3\le t\le r$ or $x_3\le r$ over the entire domain of interest, so $\operatorname{Min}(r,x_3)=x_3$ over the entire domain, giving:
    - $I_1=\frac{1}{1!}\int_{-\infty}^{t}f(x_3)\left(\int_{-\infty}^{x_3}f(x_2)F(x_2)\,dx_2\right)dx_3$
      - We now use the corollary below to see:
        - $I_1=\frac{1}{2!}\int_{-\infty}^{t}f(x_3)F(x_3)^2\,dx_3$
        - $=\frac{1}{3!}F(t)^3$
  - if $t\ge r$ then we split $(-\infty,t]$ into $(-\infty,r)$ and $[r,t]$, giving:
    - $I_1=\frac{1}{1!}\left[\int_{-\infty}^{r}f(x_3)\left(\int_{-\infty}^{\operatorname{Min}(r,x_3)}f(x_2)F(x_2)\,dx_2\right)dx_3+\int_{r}^{t}f(x_3)\left(\int_{-\infty}^{\operatorname{Min}(r,x_3)}f(x_2)F(x_2)\,dx_2\right)dx_3\right]$
    - $=\frac{1}{1!}\left[\int_{-\infty}^{r}f(x_3)\left(\int_{-\infty}^{x_3}f(x_2)F(x_2)\,dx_2\right)dx_3+\int_{r}^{t}f(x_3)\left(\int_{-\infty}^{r}f(x_2)F(x_2)\,dx_2\right)dx_3\right]$
      - We now use the required corollary immediately below to yield:
        - $I_1=\frac{1}{1!}\left[\int_{-\infty}^{r}f(x_3)\cdot\frac{1}{2}F(x_3)^2\,dx_3+\int_{r}^{t}f(x_3)\cdot\frac{1}{2}F(r)^2\,dx_3\right]$
        - $=\frac{1}{2!}\left[\frac{1}{3}F(r)^3+F(r)^2\int_{r}^{t}f(x_3)\,dx_3\right]$, note that: $\int_{r}^{t}f(x)\,dx=\int_{-\infty}^{t}f(x)\,dx-\int_{-\infty}^{r}f(x)\,dx=F(t)-F(r)$
        - $=\frac{1}{2!}F(r)^2\left[\frac{1}{3}F(r)+(F(t)-F(r))\right]$, note that: $F(t)-F(r)=\frac{3F(t)-3F(r)}{3}$ which we'll use next
        - $=\frac{1}{2!}F(r)^2\left[\frac{3F(t)-2F(r)}{3}\right]$
        - $=\frac{1}{3!}F(r)^2(3F(t)-2F(r))$
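As $t\to\infty$ we have $F(t)\to 1$, so we end up with $I_1=\frac{1}{3!}F(r)^2(3-2F(r))$
Thus: $P[X_1\le X_2\le\operatorname{Min}(r,X_3)\le X_3]=\frac{1}{3!}F(r)^2(3-2F(r))$
Finally, dividing by $P[X_1\le X_2\le X_3]=\frac{1}{3!}$:
- $P[X_1\le X_2\le r\mid X_1\le X_2\le X_3]=F(r)^2(3-2F(r))$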
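The n=3 result can be checked empirically. The sketch below assumes a standard normal population (any continuous distribution should do) and compares the simulated frequency of the event that the median is at most $r$ with $F(r)^2(3-2F(r))$. The function names are made up for this example; `statistics.NormalDist().cdf` supplies $F$.

```python
import random
from statistics import NormalDist, median


def simulate_median_cdf(r, n=3, trials=100_000, seed=1):
    """Fraction of simulated samples of size n whose median is <= r."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if median(sample) <= r:
            hits += 1
    return hits / trials


def formula_n3(r):
    """F(r)^2 (3 - 2 F(r)) with F the standard normal CDF."""
    p = NormalDist().cdf(r)
    return p**2 * (3 - 2 * p)


r = 0.5
print(simulate_median_cdf(r), formula_n3(r))
```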
Required corollary
Recall from Probability of i.i.d random variables being in an order and not greater than something that:
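- $\frac{1}{k!}\int_{-\infty}^{r}f(x)F(x)^k\,dx=\frac{1}{(k+1)!}F(r)^{k+1}$
So:
- $\int_{-\infty}^{r}f(x)F(x)^k\,dx=\frac{1}{k+1}F(r)^{k+1}$
By applying this to above (with the $x_2$ integrals):
- $\int_{-\infty}^{r}f(x)F(x)^1\,dx=\frac{1}{2}F(r)^2$, we then substitute this for the cases $r:=r$ and $r:=x_3$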
We'll then apply it to the x3 integrals.
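The corollary $\int_{-\infty}^{r}f(x)F(x)^k\,dx=\frac{1}{k+1}F(r)^{k+1}$ is easy to confirm numerically. The sketch below assumes an Exponential(1) population, so $f(x)=e^{-x}$ and $F(x)=1-e^{-x}$ for $x\ge 0$ (both zero below 0), and uses a plain midpoint rule; the distribution, the step count and the function names are all choices made for this example.

```python
from math import exp


def f(x):
    """Exponential(1) density on x >= 0."""
    return exp(-x)


def F(x):
    """Exponential(1) CDF on x >= 0."""
    return 1.0 - exp(-x)


def lhs_integral(r, k, steps=20_000):
    """Midpoint-rule approximation of the integral of f(x) F(x)^k over [0, r]."""
    h = r / steps
    return h * sum(f((i + 0.5) * h) * F((i + 0.5) * h) ** k for i in range(steps))


def rhs_closed_form(r, k):
    return F(r) ** (k + 1) / (k + 1)


for k in (1, 2, 3):
    print(k, lhs_integral(2.0, k), rhs_closed_form(2.0, k))
```

The case k=1 is the one used for the $x_2$ integrals, and k=2 is the one that then handles the $x_3$ integral.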
Conclusion of progression 2
- $P[X_1\le X_2\le r\mid X_1\le X_2\le X_3]=F(r)^2(3-2F(r))$
Progression: 3
I am now looking at m=3, which is 7 samples. To find this we evaluate:
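- $P[\text{Median}\le r]=\dfrac{7!}{3!}\lim_{t\to+\infty}\left(\int_{-\infty}^{t}f(x_7)\left(\int_{-\infty}^{x_7}f(x_6)\left(\int_{-\infty}^{x_6}f(x_5)\left(\int_{-\infty}^{\operatorname{Min}(r,x_5)}f(x_4)F(x_4)^3\,dx_4\right)dx_5\right)dx_6\right)dx_7\right)$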
Initial work:
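- $I_1(x_6):=\int_{-\infty}^{x_6}f(x_5)\left(\int_{-\infty}^{\operatorname{Min}(r,x_5)}f(x_4)F(x_4)^3\,dx_4\right)dx_5=\begin{cases}\frac{1}{5}\frac{1}{4}F(x_6)^5&\text{if }x_6\le r\\\frac{1}{5}\frac{1}{4}F(r)^4(5F(x_6)-4F(r))&\text{if }x_6\ge r\end{cases}$ - these agree if $x_6=r$
- $I_2(x_7):=\int_{-\infty}^{x_7}f(x_6)\left(\int_{-\infty}^{x_6}f(x_5)\left(\int_{-\infty}^{\operatorname{Min}(r,x_5)}f(x_4)F(x_4)^3\,dx_4\right)dx_5\right)dx_6=\int_{-\infty}^{x_7}f(x_6)I_1(x_6)\,dx_6=\frac{1}{6}\frac{1}{5}\frac{1}{4}\begin{cases}F(x_7)^6&\text{if }x_7\le r\\F(r)^4(10F(r)^2-24F(r)F(x_7)+15F(x_7)^2)&\text{if }x_7\ge r\end{cases}$ - note both parts agree if $r=x_7$ as $10+15-24=1$
- $I_3(t)=$ (everything in the limit) $=\int_{-\infty}^{t}f(x_7)I_2(x_7)\,dx_7=\frac{1}{7}\frac{1}{6}\frac{1}{5}\frac{1}{4}\begin{cases}F(t)^7&\text{if }t\le r\\F(r)^4(-20F(r)^3+70F(r)^2F(t)-84F(r)F(t)^2+35F(t)^3)&\text{if }t\ge r\end{cases}$ - note these agree if $t=r$
  - Clearly as $t\to+\infty$ we get $I_3(t)\to\frac{1}{7}\frac{1}{6}\frac{1}{5}\frac{1}{4}F(r)^4(-20F(r)^3+70F(r)^2-84F(r)+35)$ as $F(t)\to 1$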
From the top of this section:
- $P[\text{Median}\le r]=\dfrac{7!}{3!}I_3(+\infty)=F(r)^4(-20F(r)^3+70F(r)^2-84F(r)+35)$
Conclusion:
- $P[\text{Median}\le r]=F(r)^4(-20F(r)^3+70F(r)^2-84F(r)+35)$
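The piecewise integration above can be mechanised. The sketch below assumes $F$ is continuous and substitutes $u=F(x)$, so each $\int f(x)\,(\cdots)\,dx$ becomes an integral in $u$ from 0, and the split at $r$ becomes a split at $p=F(r)$. It tracks two polynomials in $u$, one valid for $u\le p$ and one for $u\ge p$, integrates $m$ times, evaluates at $u=1$ (the $t\to+\infty$ limit) and multiplies by $\frac{(2m+1)!}{m!}$. All function names are hypothetical, and exact rational arithmetic is used for a chosen value of $p$.

```python
from fractions import Fraction
from math import factorial


def poly_eval(coeffs, u):
    """Evaluate sum(coeffs[i] * u**i)."""
    return sum(c * u**i for i, c in enumerate(coeffs))


def poly_antiderivative(coeffs):
    """Antiderivative with zero constant term."""
    return [Fraction(0)] + [c / (i + 1) for i, c in enumerate(coeffs)]


def median_cdf_by_iterated_integration(m, p):
    """P[median <= r] for a sample of size 2m+1, with p = F(r), in the variable u = F(x)."""
    p = Fraction(p)
    # Innermost integral after substitution: Min(p, u)^(m+1) / (m+1).
    low = [Fraction(0)] * (m + 1) + [Fraction(1, m + 1)]  # valid for u <= p
    high = [p ** (m + 1) / (m + 1)]                       # valid for u >= p (constant)
    for _ in range(m):  # one pass per remaining variable x_{m+2}, ..., x_{2m+1}
        low_int = poly_antiderivative(low)
        high_int = poly_antiderivative(high)
        # For u >= p: integral over [0, p] of low, plus integral over [p, u] of high.
        high_int[0] += poly_eval(low_int, p) - poly_eval(high_int, p)
        low, high = low_int, high_int
    return Fraction(factorial(2 * m + 1), factorial(m)) * poly_eval(high, Fraction(1))


p = Fraction(1, 3)
print(median_cdf_by_iterated_integration(3, p))
print(p**4 * (-20 * p**3 + 70 * p**2 - 84 * p + 35))
```

For m=3 the two printed values should be identical, and running the same function with m=4 gives a way to test the predicted n=9 form against this integral.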