Median (sample)

[ilmath]\newcommand{\P}[2][]{\mathbb{P}#1{\left[{#2}\right]} } \newcommand{\Pcond}[3][]{\mathbb{P}#1{\left[{#2}\!\ \middle\vert\!\ {#3}\right]} } \newcommand{\Plcond}[3][]{\Pcond[#1]{#2}{#3} } \newcommand{\Prcond}[3][]{\Pcond[#1]{#2}{#3} }[/ilmath]
[ilmath]\newcommand{\E}[1]{ {\mathbb{E}{\left[{#1}\right]} } } [/ilmath][ilmath]\newcommand{\Mdm}[1]{\text{Mdm}{\left({#1}\right) } } [/ilmath][ilmath]\newcommand{\Var}[1]{\text{Var}{\left({#1}\right) } } [/ilmath][ilmath]\newcommand{\ncr}[2]{ \vphantom{C}^{#1}\!C_{#2} } [/ilmath]
Stub grade: B
This page is a stub, so it contains little or minimal information and is on a to-do list for being expanded. The message provided is:
Requires unifying with the taxonomy of units and such
Not to be confused with the median of a random variable

Caveat:

Median of an even number of samples

For a sample with an even number of values, say [ilmath]x_1,\ldots,x_{2m} [/ilmath] for some [ilmath]m\in\mathbb{N}_{\ge 1} [/ilmath], write [ilmath]x'_1,\ldots,x'_{2m} [/ilmath] for the sorted sample values, so that [ilmath]x'_1\le x'_2\le\cdots\le x'_{2m-1}\le x'_{2m} [/ilmath]. It is then conventional to define the median as:

  • [math]\text{Median}(x_1,\ldots,x_{2m}):\eq\frac{x'_m+x'_{m+1} }{2} [/math] - the average of the two middle points.

However, as per Alec's taxonomy of units, we can "do Median" on unit types that are merely "ordered". Here there is no natural or canonical concept of an "average between them" unless you map them onto some subset of the natural numbers. There are some options in this case; see Notes:Median of an ordered unit type sample.

The current options being considered are:

  1. Consider both middle items to be the median. The median then consists of exactly 1 or 2 items, or of zero items should the median of no samples be considered
  2. Introduce some new element, say [ilmath]a[/ilmath], which by definition satisfies: [ilmath]x'_m < a < x'_{m+1} [/ilmath]
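
Option 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `median_items` is hypothetical (it does not come from the notes page), and the sample values are assumed to support comparison with `<` and nothing else, so no addition or division is used.

```python
def median_items(sample):
    """Return the middle item(s) of the sorted sample as a tuple.

    Hypothetical helper illustrating option 1: the result holds
    0 items (empty sample), 1 item (odd size) or 2 items (even size).
    Only the ordering of the values is used, never arithmetic.
    """
    ordered = sorted(sample)              # x'_1 <= x'_2 <= ... <= x'_n
    n = len(ordered)
    if n == 0:
        return ()                         # median of no samples: zero items
    if n % 2 == 1:
        return (ordered[n // 2],)         # n = 2m+1: x'_{m+1}, 0-based index m
    m = n // 2
    return (ordered[m - 1], ordered[m])   # n = 2m: x'_m and x'_{m+1}


# Letters are ordered but have no natural "average".
print(median_items(["b", "a", "d", "c"]))
```

Returning a tuple keeps the three cases (zero, one or two items) uniform for the caller; whether this is the right representation is a design choice the page leaves open.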

Definition

There is complete consensus on the median of an odd number of sample values. For an even number of sample values things are a little less clear, as per the caveat.

In what follows we shall define:

  • [ilmath]x_1,\ldots,x_n[/ilmath] as the sample, for which we may have [ilmath]n:\eq 2m+1[/ilmath] or [ilmath]n:\eq 2m[/ilmath] depending on the case, and
  • [ilmath]x'_1,\ldots,x'_n[/ilmath] meaning "the ordered sample", a permutation of the [ilmath]x_i[/ilmath] where the values have been sorted, so:
    • [ilmath]x'_1\le x'_2\le\cdots\le x'_{n-1}\le x'_n[/ilmath]

Odd sample

Let [ilmath]x_1,\ldots,x_{2m+1} [/ilmath] be given, for some [ilmath]m\in\mathbb{N}_{\ge 0} [/ilmath], then we define:

  • [ilmath]\text{Median}(x_1,\ldots,x_{2m+1}):\eq x'_{m+1} [/ilmath]


Example:

  • Given the sample [ilmath]x_1,\ x_2,\ x_3,\ x_4,\ x_5[/ilmath], the median is [ilmath]x'_3[/ilmath]
  • [ilmath]\text{Median}(2,3,2,4,5)\eq\text{Median}(2,2,3,4,5)\eq 3[/ilmath]
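
The odd-sample definition can be checked with a minimal Python sketch. The name `median_odd` is a hypothetical helper introduced here for illustration; it assumes only that the sample values can be sorted.

```python
def median_odd(sample):
    """Median of a sample with an odd number of values, n = 2m + 1."""
    ordered = sorted(sample)     # the ordered sample x'_1 <= ... <= x'_n
    n = len(ordered)
    if n % 2 != 1:
        raise ValueError("expected an odd number of sample values")
    m = (n - 1) // 2
    return ordered[m]            # x'_{m+1} in 1-based terms is index m here


print(median_odd([2, 3, 2, 4, 5]))  # prints 3, matching the example above
```

Note the index shift: the page counts from 1, so [ilmath]x'_{m+1} [/ilmath] corresponds to `ordered[m]` in Python's 0-based indexing.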

Even sample

Warning: This is a "conventional definition" that requires [ilmath]\frac{1}{2}(a+b)[/ilmath] to be defined for sample values [ilmath]a,b[/ilmath]. There is work to be done, as per the caveat above.

Let [ilmath]x_1,\ldots,x_{2m} [/ilmath] be given, for some [ilmath]m\in\mathbb{N}_{\ge 1} [/ilmath], then we conventionally define:

  • [math]\text{Median}(x_1,\ldots,x_{2m}):\eq \frac{x'_m+x'_{m+1} }{2} [/math]

This requires a concept of "division by 2" and of addition. These may not always be available! See the caveat for more details.
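
A minimal Python sketch of the conventional even-sample definition follows. The name `median_even` is hypothetical, and the code assumes that the sample values support `+` and division by 2, which is exactly the requirement discussed above.

```python
def median_even(sample):
    """Conventional median of a sample with an even number of values, n = 2m."""
    ordered = sorted(sample)     # the ordered sample x'_1 <= ... <= x'_n
    n = len(ordered)
    if n == 0 or n % 2 != 0:
        raise ValueError("expected a non-empty sample of even size")
    m = n // 2
    # x'_m and x'_{m+1} in 1-based terms are indices m-1 and m here.
    return (ordered[m - 1] + ordered[m]) / 2


print(median_even([4, 1, 3, 2]))  # sorted: 1, 2, 3, 4, so (2 + 3) / 2 = 2.5
```

For values that are only ordered, such as strings, the division step raises a `TypeError` in Python, which mirrors the caveat: sorting succeeds, but "the average of the two middle points" is not defined.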

See also

Notes