Alec's c.d.f-sidestepping sampling method

From Maths
Jump to: navigation, search
See generating samples from a distribution for and overview of methods used


Let [ilmath]f(x)[/ilmath] be a p.d.f for which computing [ilmath]F^{-1}(x)[/ilmath] is either difficult, impractical, impossible or otherwise unavailable, but where [ilmath]f(x)[/ilmath] itself is easy to compute.

Suppose that either:

  • [ilmath]\exists a,b\in\mathbb{R}\big[\forall x\in[a,b]\subseteq\mathbb{R}[f(x)>0]\big][/ilmath], there exists a closed interval, [ilmath][a,b][/ilmath] for which [ilmath]f(x)>0[/ilmath] on over, or:
  • [ilmath]\forall x\in\mathbb{R}[f(x)>0][/ilmath] - [ilmath]f(x)>0[/ilmath] everywhere

then we will sample [ilmath]f(x)[/ilmath] over [ilmath][a,b][/ilmath] or [ilmath]\mathbb{R} [/ilmath]

  • herein we will use [ilmath]A:\eq[a,b][/ilmath] or [ilmath]A:\eq\mathbb{R} [/ilmath] for whichever case is chosen

Define the following function carefully:

  • Define [ilmath]h:A\rightarrow\mathbb{R} [/ilmath] such that [ilmath]\forall x\in A[h(x)>f(x)][/ilmath]

Then the algorithm proceeds as follows:


To generate [ilmath]X[/ilmath], a sample from the distribution [ilmath]f[/ilmath]:

  1. Define [ilmath]g:A\rightarrow\mathbb{R} [/ilmath] by [math]g(x):\eq\frac{h(x)}{\int_Ah(t)\mathrm{d}t} [/math] - note that this makes [ilmath]g[/ilmath] itself a p.d.f
  2. Generate [ilmath]Y[/ilmath] on [ilmath]A[/ilmath] from [ilmath]g[/ilmath]
    • if [ilmath]G:A\rightarrow [0,1]\subseteq\mathbb{R} [/ilmath], or [ilmath]G(x)[/ilmath], is the c.d.f of [ilmath]g[/ilmath] then [ilmath]G^{-1}:[0,1]\rightarrow A[/ilmath] is how we usually sample from the distribution, [ilmath]g[/ilmath], we generate a random number, say [ilmath]y[/ilmath] uniformly from [ilmath][0,1][/ilmath], then [ilmath]Y\eq G^{-1}(y)[/ilmath] is a random sample from [ilmath]g[/ilmath]
  3. Generate [ilmath]Z[/ilmath] as an (independent of [ilmath]Y[/ilmath]) uniformly on [ilmath][0,1][/ilmath]
  4. If [ilmath]Z h(Y) \le f(Y) [/ilmath] then:
    • Define [ilmath]X:\eq Y[/ilmath]
    • Goto step [ilmath]2[/ilmath]

Note that other methods may be used to generate either [ilmath]Y[/ilmath] or [ilmath]Z[/ilmath], it is only important that we generate them somehow.

It is possible (although extremely unlikely for high numbers) for this to loop a lot of times before generating a sample, that is the trade-off it makes compared with generating samples from the inverse of a cumulative distribution function. The better the [ilmath]h[/ilmath] approximation is to [ilmath]f[/ilmath] the higher the throughput of generated samples.

Also unlike generating samples from the inverse of a cumulative distribution function it requires at least 2 generated values unlike the inverse c.d.f generating method which requires only on generated uniform value on [ilmath][0,1][/ilmath] always.


Grade: B
This page requires one or more proofs to be filled in, it is on a to-do list for being expanded with them.
Please note that this does not mean the content is unreliable. Unless there are any caveats mentioned below the statement comes from a reliable source. As always, Warnings and limitations will be clearly shown and possibly highlighted if very important (see template:Caution et al).
The message provided is:
The gist is outlined
TODO: Take a picture of sheet 242
and upload



Template:Algorithm Of