User:MarkSweep/work
The Pearson distribution is a family of continuous probability distributions. It was first published by Karl Pearson in 1895 and subsequently extended by him in 1901 and 1916 in a series of articles on biostatistics.
The Pearson system was originally devised in an effort to model visibly skewed observations. It was well known at the time how to adjust a theoretical model to fit the first two cumulants or moments of observed data: Any probability distribution can be extended straightforwardly to form a location-scale family. Except in pathological cases, a location-scale family can be made to fit the observed mean (first cumulant) and variance (second cumulant) arbitrarily well. However, it was not known how to construct probability distributions in which the skewness (standardized third cumulant) and kurtosis (standardized fourth cumulant) could be adjusted equally freely. This need became apparent when trying to fit known theoretical models to observed data that exhibited skewness. Pearson's examples include survival data, which are usually asymmetric.
In his original paper, Pearson (1895, p. 360) identified four types of distributions (numbered I through IV) in addition to the normal distribution (which was originally known as type V). The classification depended on whether the distributions were supported on a bounded interval, on a half-line, or on the whole real line; and whether they were potentially skewed or necessarily symmetric. A second paper (Pearson 1901) fixed two omissions: it redefined the type V distribution (originally just the normal distribution, but now the inverse-gamma distribution) and introduced the type VI distribution. Together the first two papers cover the five main types of the Pearson system (I, III, VI, V, and IV). In a third paper, Pearson (1916) introduced further special cases and subtypes (VII through XII).
Rhind (1909, pp. 430–432) devised a simple way of visualizing the parameter space of the Pearson system, which was subsequently adopted by Pearson (1916, plate 1 and pp. 430ff., 448ff.). The Pearson types are characterized by two quantities, commonly referred to as $\beta_1$ and $\beta_2$. The first is the square of the skewness: $\beta_1 = \gamma_1^2$, where $\gamma_1$ is the skewness, or third standardized moment. The second is the traditional kurtosis, or fourth standardized moment: $\beta_2 = \gamma_2 + 3$. (Modern treatments define kurtosis $\gamma_2$ in terms of cumulants instead of moments, so that for a normal distribution we have $\gamma_2 = 0$ and $\beta_2 = 3$. Here we follow the historical precedent and use $\beta_2$.) The diagram on the right shows which Pearson type a given concrete distribution (identified by a point $(\beta_1, \beta_2)$) belongs to.
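As a rough illustration of the two quantities, the following sketch estimates them from a sample. It assumes population-style central moments (dividing by n), and the function name `beta1_beta2` is hypothetical, chosen only for this example.

```python
def beta1_beta2(data):
    """Return (beta1, beta2): squared skewness and traditional kurtosis."""
    n = len(data)
    mean = sum(data) / n
    # Central moments of order 2, 3 and 4 (dividing by n, not n - 1).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    beta1 = m3 ** 2 / m2 ** 3   # square of the skewness gamma_1
    beta2 = m4 / m2 ** 2        # fourth standardized moment, gamma_2 + 3
    return beta1, beta2


sample = [-1.0, 0.0, 1.0]
b1_hat, b2_hat = beta1_beta2(sample)
```

The resulting pair can then be read as a point in the diagram's coordinate system.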
Many of the skewed and/or non-mesokurtic distributions familiar to us today were still unknown in the early 1890s. What is now known as the beta distribution had been used by Thomas Bayes as a posterior distribution of the parameter of a Bernoulli distribution in his 1763 work on inverse probability. The beta distribution gained prominence due to its membership in Pearson's system and was known until the 1940s as the Pearson type I distribution. [1] (Pearson's type II distribution is a special case of type I, but is usually no longer singled out.) The gamma distribution originated from Pearson's work (Pearson 1893, p. 331; Pearson 1895, pp. 357, 360, 373–376) and was known as the Pearson type III distribution, before acquiring its modern name in the 1930s and 1940s. [2] Pearson's 1895 paper introduced the type IV distribution, which contains Student's t-distribution as a special case, predating William Gosset's subsequent use by several years. His 1901 paper introduced the inverse-gamma distribution (type V) and the beta prime distribution (type VI).
Definition
A Pearson density p is defined to be any valid solution to the differential equation (cf. Pearson 1895, p. 381)
$$\frac{p'(x)}{p(x)} + \frac{x - a_0}{b_2 x^2 + b_1 x + b_0} = 0. \qquad (1)$$
The Pearson family has four parameters (a0, b0, b1, b2), which can be used to freely adjust the first four moments of the distribution, subject to very few constraints. The parameter a0 determines a stationary point, and hence under some conditions a mode of the distribution, since
$$p'(a_0) = 0$$
follows directly from the differential equation.
Since we are confronted with a linear differential equation with variable coefficients, its solution is straightforward:
$$p(x) \propto \exp\left(-\int \frac{x - a_0}{b_2 x^2 + b_1 x + b_0}\,dx\right).$$
The integral in this solution simplifies considerably when certain special cases of the integrand are considered. Pearson (1895, p. 367) distinguished two main cases, determined by the sign of the discriminant (and hence the number of real roots) of the quadratic function
$$f(x) = b_2 x^2 + b_1 x + b_0. \qquad (2)$$
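The general solution can also be evaluated numerically. The sketch below assumes the differential equation is written as p'(x)/p(x) = −(x − a0)/(b2 x² + b1 x + b0) and approximates the integral with the midpoint rule. The function name, the reference point `x_ref` and the step count are illustrative choices, and the result is only meaningful when the quadratic has no root between `x_ref` and `x`.

```python
import math


def pearson_unnormalized(x, a0, b0, b1, b2, x_ref=None, steps=2000):
    """Unnormalized Pearson density at x, relative to its value at x_ref.

    Computes exp(-I), where I is the integral from x_ref to x of
    (t - a0) / (b2*t**2 + b1*t + b0), using the midpoint rule.
    """
    if x_ref is None:
        x_ref = a0  # start at the stationary point (an arbitrary choice)
    h = (x - x_ref) / steps
    total = 0.0
    for k in range(steps):
        t = x_ref + (k + 0.5) * h
        total += (t - a0) / (b2 * t * t + b1 * t + b0)
    return math.exp(-total * h)


# With b1 = b2 = 0 the equation reduces to the normal case:
# the result should match exp(-(x - a0)**2 / (2 * b0)).
value = pearson_unnormalized(3.0, a0=1.0, b0=4.0, b1=0.0, b2=0.0)
```

Because only ratios of density values are computed, a separate normalization step is still needed to obtain a proper density.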
Case 1, negative discriminant: The Pearson type IV distribution
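If the discriminant of the quadratic function (2) is negative ($b_1^2 - 4 b_2 b_0 < 0$), it has no real roots. Then define
- $y = x + \frac{b_1}{2 b_2}$ and
- $\alpha = \frac{\sqrt{4 b_2 b_0 - b_1^2}}{2 b_2}.$
Observe that α is a well-defined real number and α ≠ 0, because by assumption $4 b_2 b_0 - b_1^2 > 0$ and therefore $b_2 \neq 0$. Applying these substitutions, the quadratic function (2) is transformed into
$$f(x) = b_2 \left(y^2 + \alpha^2\right).$$
The absence of real roots is obvious from this formulation, because α² is necessarily positive.
We now express the solution to the differential equation (1) as a function of y:
$$p(y) \propto \exp\left(-\frac{1}{b_2} \int \frac{y - \frac{b_1}{2 b_2} - a_0}{y^2 + \alpha^2}\,dy\right).$$
Pearson (1895, p. 362) called this the "trigonometrical case", because the integral
$$\int \frac{y - \frac{2 b_2 a_0 + b_1}{2 b_2}}{y^2 + \alpha^2}\,dy = \frac{1}{2} \ln\left(y^2 + \alpha^2\right) - \frac{2 b_2 a_0 + b_1}{2 b_2 \alpha} \arctan\left(\frac{y}{\alpha}\right) + C_0$$
involves the inverse trigonometric arctan function. Then
$$p(y) \propto \exp\left[-\frac{1}{2 b_2} \ln\left(1 + \frac{y^2}{\alpha^2}\right) + \frac{2 b_2 a_0 + b_1}{2 b_2^2 \alpha} \arctan\left(\frac{y}{\alpha}\right) + C_1\right].$$
Finally, let
- $m = \frac{1}{2 b_2}$ and
- $\nu = -\frac{2 b_2 a_0 + b_1}{2 b_2^2 \alpha}.$
Applying these substitutions, we obtain the parametric function:
$$p(y) \propto \left[1 + \frac{y^2}{\alpha^2}\right]^{-m} \exp\left[-\nu \arctan\left(\frac{y}{\alpha}\right)\right].$$
This unnormalized density has support on the entire real line. It depends on a scale parameter α > 0 and shape parameters m > 1/2 and ν. One parameter was lost when we chose to find the solution to the differential equation (1) as a function of y rather than x. We therefore reintroduce a fourth parameter, namely the location parameter λ. We have thus derived the density of the Pearson type IV distribution:
$$p(x) = \frac{\left|\frac{\Gamma\left(m + \frac{\nu}{2} i\right)}{\Gamma(m)}\right|^2}{\alpha\,\mathrm{B}\left(m - \frac{1}{2}, \frac{1}{2}\right)} \left[1 + \left(\frac{x - \lambda}{\alpha}\right)^2\right]^{-m} \exp\left[-\nu \arctan\left(\frac{x - \lambda}{\alpha}\right)\right].$$
The normalizing constant involves the complex Gamma function (Γ) and the Beta function (B).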
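The type IV density can be sketched without a complex Gamma function by normalizing numerically. The code below assumes the kernel [1 + ((x − λ)/α)²]^(−m) exp[−ν arctan((x − λ)/α)] and uses the substitution x = λ + α tan θ, which maps the real line onto (−π/2, π/2). All function names are hypothetical, and the midpoint rule is only a convenience that behaves best for m ≥ 1.

```python
import math


def beta_fn(a, b):
    """Beta function B(a, b), computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def type4_kernel(x, m, nu, alpha, lam):
    """Unnormalized Pearson type IV density."""
    z = (x - lam) / alpha
    return (1.0 + z * z) ** (-m) * math.exp(-nu * math.atan(z))


def type4_norm_const(m, nu, alpha, steps=20000):
    """Approximate 1 / (integral of the kernel over the real line).

    After x = lam + alpha * tan(theta) the integrand becomes
    alpha * cos(theta)**(2m - 2) * exp(-nu * theta).
    """
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = -math.pi / 2.0 + (k + 0.5) * h
        total += math.cos(theta) ** (2.0 * m - 2.0) * math.exp(-nu * theta)
    return 1.0 / (alpha * total * h)


def type4_pdf(x, m, nu, alpha, lam):
    """Pearson type IV density with a numerically computed constant."""
    return type4_norm_const(m, nu, alpha) * type4_kernel(x, m, nu, alpha, lam)


# For nu = 0 the constant should reduce to 1 / (alpha * B(m - 1/2, 1/2)).
numeric = type4_norm_const(2.0, 0.0, 1.5)
closed_form = 1.0 / (1.5 * beta_fn(1.5, 0.5))
```

Comparing `numeric` with `closed_form` gives a quick sanity check of the symmetric special case.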
The Pearson type VII distribution
The shape parameter ν of the Pearson type IV distribution controls its skewness. If we fix its value at zero, we obtain a symmetric three-parameter family. This special case is known as the Pearson type VII distribution (cf. Pearson 1916, p. 450). Its density is
$$p(x) = \frac{1}{\alpha\,\mathrm{B}\left(m - \frac{1}{2}, \frac{1}{2}\right)} \left[1 + \left(\frac{x - \lambda}{\alpha}\right)^2\right]^{-m},$$
where B is the Beta function.
An alternative parameterization (and slight specialization) of the type VII distribution is obtained by letting
$$\alpha = \sigma \sqrt{2m - 3},$$
which requires m > 3/2. This entails a minor loss of generality but ensures that the variance of the distribution exists and is equal to σ². Now the parameter m only controls the kurtosis of the distribution. If m approaches infinity as λ and σ are held constant, the normal distribution arises as a special case:
$$\lim_{m \to \infty} \frac{1}{\sigma \sqrt{2m - 3}\,\mathrm{B}\left(m - \frac{1}{2}, \frac{1}{2}\right)} \left[1 + \frac{1}{2m - 3} \left(\frac{x - \lambda}{\sigma}\right)^2\right]^{-m} = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[-\frac{1}{2} \left(\frac{x - \lambda}{\sigma}\right)^2\right].$$
This is the density of a normal distribution with mean λ and standard deviation σ.
It is convenient to require that m > 5/2 and to let
$$m = \frac{5}{2} + \frac{3}{\gamma_2}.$$
This is another specialization, and it guarantees that the first four moments of the distribution exist. More specifically, the Pearson type VII distribution parameterized in terms of (λ, σ, γ2) has a mean of λ, standard deviation of σ, skewness of zero, and excess kurtosis of γ2.
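The normal limit and the kurtosis parameterization can be checked numerically. This sketch assumes the type VII density [1 + ((x − λ)/α)²]^(−m) / (α B(m − 1/2, 1/2)) with α = σ√(2m − 3), and the mapping m = 5/2 + 3/γ2. The function names are hypothetical.

```python
import math


def beta_fn(a, b):
    """Beta function B(a, b), computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def type7_pdf(x, lam, sigma, m):
    """Type VII density with alpha = sigma * sqrt(2m - 3); needs m > 3/2."""
    alpha = sigma * math.sqrt(2.0 * m - 3.0)
    z = (x - lam) / alpha
    return (1.0 + z * z) ** (-m) / (alpha * beta_fn(m - 0.5, 0.5))


def normal_pdf(x, lam, sigma):
    """Density of the normal distribution with mean lam and std sigma."""
    z = (x - lam) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def m_from_excess_kurtosis(gamma2):
    """Shape parameter m for a given positive excess kurtosis gamma2."""
    return 2.5 + 3.0 / gamma2


# For large m the two densities should nearly coincide.
gap = abs(type7_pdf(0.7, 1.0, 2.0, 1.0e5) - normal_pdf(0.7, 1.0, 2.0))
```

An excess kurtosis of 6, for instance, corresponds to m = 3 under this mapping.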
Student's t-distribution
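The Pearson type VII distribution subsumes Student's t-distribution, and hence also the Cauchy distribution. Student's t-distribution arises as the result of applying the following substitutions to its original parameterization:
- $\lambda = 0$,
- $\alpha = \sqrt{\nu}$, and
- $m = \frac{\nu + 1}{2},$
where ν > 0. Observe that the constraint m > 1/2 is satisfied. The density of this restricted one-parameter family is
$$p(x \mid \nu) = \frac{1}{\sqrt{\nu}\,\mathrm{B}\left(\frac{\nu}{2}, \frac{1}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu + 1}{2}},$$
which is easily recognized as the density of Student's t-distribution.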
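The substitution can be verified numerically. The sketch below assumes the type VII density in its original (λ, α, m) parameterization and compares it with the textbook form of the Student's t density. Function names are hypothetical.

```python
import math


def beta_fn(a, b):
    """Beta function B(a, b), computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def type7_pdf(x, m, alpha, lam=0.0):
    """Pearson type VII density in the (lam, alpha, m) parameterization."""
    z = (x - lam) / alpha
    return (1.0 + z * z) ** (-m) / (alpha * beta_fn(m - 0.5, 0.5))


def student_t_pdf(x, nu):
    """Textbook Student's t density with nu degrees of freedom."""
    c = math.exp(math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0))
    return c / math.sqrt(nu * math.pi) * (1.0 + x * x / nu) ** (-(nu + 1.0) / 2.0)


def t_via_type7(x, nu):
    """Student's t obtained from type VII with lam=0, alpha=sqrt(nu)."""
    return type7_pdf(x, m=(nu + 1.0) / 2.0, alpha=math.sqrt(nu), lam=0.0)


cauchy_at_zero = t_via_type7(0.0, 1.0)  # nu = 1 gives the Cauchy density
```

With ν = 1 the same function reproduces the standard Cauchy density, whose value at zero is 1/π.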
Case 2, non-negative discriminant
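If the quadratic function (2) has a non-negative discriminant ($b_1^2 - 4 b_2 b_0 \geq 0$), it has real roots r1 and r2 (not necessarily distinct):
$$r_{1,2} = \frac{-b_1 \pm \sqrt{b_1^2 - 4 b_2 b_0}}{2 b_2}.$$
We label the roots so that $r_1 \leq r_2$.
In the presence of real roots the quadratic function (2) can be written as
$$f(x) = b_2 (x - r_1)(x - r_2),$$
and the solution to the differential equation is therefore
$$p(x) \propto \exp\left(-\frac{1}{b_2} \int \frac{x - a_0}{(x - r_1)(x - r_2)}\,dx\right).$$
Pearson (1895, p. 362) called this the "logarithmic case", because the integral (for distinct roots)
$$\int \frac{x - a_0}{(x - r_1)(x - r_2)}\,dx = \frac{(r_1 - a_0) \ln|x - r_1| - (r_2 - a_0) \ln|x - r_2|}{r_1 - r_2} + C$$
involves only the logarithm function, and not the arctan function as in the previous case.
Using the substitution
$$\nu = \frac{1}{b_2 (r_1 - r_2)}$$
we obtain the following solution to the differential equation (1):
$$p(x) \propto (x - r_1)^{-\nu (r_1 - a_0)} (x - r_2)^{\nu (r_2 - a_0)}.$$
Since this density is only known up to a hidden constant of proportionality, that constant can be changed and the density written as follows:
$$p(x) \propto \left(1 - \frac{x}{r_1}\right)^{-\nu (r_1 - a_0)} \left(1 - \frac{x}{r_2}\right)^{\nu (r_2 - a_0)}.$$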
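One way to gain confidence in the logarithmic-case solution is to differentiate it numerically and compare against the differential equation. The sketch assumes the equation p'(x)/p(x) = −(x − a0)/(b2 (x − r1)(x − r2)), two distinct nonzero roots, and the closed form (1 − x/r1)^(−ν(r1 − a0)) (1 − x/r2)^(ν(r2 − a0)) with ν = 1/(b2 (r1 − r2)). The function names and the example coefficients are illustrative.

```python
import math


def real_roots(b0, b1, b2):
    """Real roots of b2*x**2 + b1*x + b0, returned with r1 <= r2."""
    disc = b1 * b1 - 4.0 * b2 * b0
    if b2 == 0 or disc < 0:
        raise ValueError("need b2 != 0 and a non-negative discriminant")
    s = math.sqrt(disc)
    lo, hi = sorted([(-b1 - s) / (2.0 * b2), (-b1 + s) / (2.0 * b2)])
    return lo, hi


def log_density(x, a0, b2, r1, r2):
    """Log of the unnormalized closed-form solution (valid for r1 < x < r2)."""
    nu = 1.0 / (b2 * (r1 - r2))
    return (-nu * (r1 - a0) * math.log(1.0 - x / r1)
            + nu * (r2 - a0) * math.log(1.0 - x / r2))


def ode_rhs(x, a0, b2, r1, r2):
    """Right-hand side of p'(x)/p(x) = -(x - a0) / (b2 (x - r1)(x - r2))."""
    return -(x - a0) / (b2 * (x - r1) * (x - r2))


r1, r2 = real_roots(b0=2.0, b1=1.0, b2=-1.0)   # roots of -x**2 + x + 2
x, a0, b2, h = 0.3, 0.5, -1.0, 1e-6
# Central difference approximation of d/dx log p(x).
slope = (log_density(x + h, a0, b2, r1, r2)
         - log_density(x - h, a0, b2, r1, r2)) / (2.0 * h)
```

If the closed form is right, `slope` agrees with `ode_rhs(x, a0, b2, r1, r2)` up to finite-difference error.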
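The Pearson type I distribution arises when the roots of the quadratic equation (2) are of opposite sign, that is, $r_1 < 0 < r_2$. Then the solution p is supported on the interval $(r_1, r_2)$. Apply the substitution
$$x = r_1 + y (r_2 - r_1), \qquad 0 < y < 1,$$
which yields a solution in terms of y that is supported on the interval (0, 1):
$$p(y) \propto \left(\frac{r_1 - r_2}{r_1}\,y\right)^{(a_0 - r_1)\nu} \left(\frac{r_2 - r_1}{r_2}\,(1 - y)\right)^{(r_2 - a_0)\nu}.$$
Regrouping constants and parameters, this simplifies to:
$$p(y) \propto y^{m} (1 - y)^{n},$$
where m and n are two arbitrary real parameters. It turns out that m, n > −1 is necessary and sufficient for p to be a proper probability density function.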
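The constraint on the exponents can be illustrated numerically. This sketch assumes the kernel y^m (1 − y)^n on (0, 1) and normalizes it with the Beta function B(m + 1, n + 1), which is finite exactly when m, n > −1. The function names and the midpoint integration are illustrative choices.

```python
import math


def beta_fn(a, b):
    """Beta function B(a, b), computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def type1_pdf(y, m, n):
    """Normalized type I density on (0, 1); requires m > -1 and n > -1."""
    if m <= -1.0 or n <= -1.0:
        raise ValueError("the kernel is not integrable unless m, n > -1")
    if not 0.0 < y < 1.0:
        return 0.0
    return y ** m * (1.0 - y) ** n / beta_fn(m + 1.0, n + 1.0)


def midpoint_integral(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


total_mass = midpoint_integral(lambda y: type1_pdf(y, 1.0, 2.0), 0.0, 1.0)
```

In this form the density is that of a beta distribution with parameters m + 1 and n + 1.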
The Pearson type II distribution is a special case of the Pearson type I family restricted to symmetric distributions.
- Pearson type III distribution: gamma distribution, chi-squared distribution
- Pearson type V distribution: inverse-gamma distribution
- Pearson type VI distribution: beta prime distribution, F-distribution
Relation to other distributions
The Pearson family subsumes the following distributions, among others:
- beta distribution (type I)
- beta prime distribution (type VI)
- Cauchy distribution (type IV)
- chi-squared distribution (type III)
- continuous uniform distribution (limit of type I)
- exponential distribution (type III)
- gamma distribution (type III)
- F-distribution (type VI)
- inverse-chi-squared distribution (type V)
- inverse-gamma distribution (type V)
- normal distribution (limit of type I, III, IV, V, or VI)
- Student's t-distribution (type IV)
OLD
The Pearson family is divided into several types whose shape depends on the values of its parameters. The following cases are distinguished:
- $b_2 = 0$ and $b_1 \neq 0$. This is type III.
- $b_1 = b_2 = 0$. This yields the normal distribution with mean a and variance b0.
- $b_2 \neq 0$. Then the type depends on the real roots of the quadratic function:
  - No real roots. This is type IV.
  - No real roots and furthermore ν = 0. This is type VII, a special case of type IV.
  - Exactly one real root. This is type V.
  - Two real roots.
    - Two real roots of the opposite sign. This is type I.
    - Real roots r and −r. This is type II, a special case of type I.
    - Two real roots of the same sign. This is type VI.
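The case list can be turned into a small decision procedure. The sketch below is an assumption-laden reading of the list: it takes the first two cases to be b2 = 0 with b1 ≠ 0 (type III) and b1 = b2 = 0 (normal), uses exact floating-point comparisons, and does not separate type VII from type IV because that requires the skewness parameter as well. The function name is hypothetical.

```python
import math


def classify_pearson(b0, b1, b2):
    """Return the Pearson type suggested by the coefficients of the quadratic."""
    if b2 == 0:
        return "normal" if b1 == 0 else "III"
    disc = b1 * b1 - 4.0 * b2 * b0
    if disc < 0:
        return "IV"   # type VII is the symmetric sub-case of type IV
    if disc == 0:
        return "V"    # exactly one (double) real root
    root = math.sqrt(disc)
    r1 = (-b1 - root) / (2.0 * b2)
    r2 = (-b1 + root) / (2.0 * b2)
    if r1 * r2 < 0:
        # Roots of opposite sign; r and -r is the symmetric special case.
        return "II" if r1 == -r2 else "I"
    # Same sign. A root exactly at zero is not covered by the list;
    # this sketch lumps it with the same-sign case.
    return "VI"


labels = [classify_pearson(*c) for c in [(1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 2, 1)]]
```

In practice a tolerance would be needed instead of exact comparisons when the coefficients are estimated from data.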
More old stuff
Pearson described five types of continuous distributions:
- I. Limited range in both directions, with skewness.
- II. Limited range in both directions, symmetric.
- III. Limited range in one direction only (must be skew).
- IV. Unlimited range in both directions, with skewness.
- V. Unlimited range in both directions, symmetric.
Generalizing the hypergeometric distribution, Pearson proposed a probability density proportional to:
for , and by taking various limits of this derived forms for his Types I, II, III, and V. For Type IV he derived a related form:
These can be shifted to the desired location. Some of his forms correspond to other named distributions.
Applications
These models are used in financial markets because they can be parametrised in a way that has intuitive meaning for market traders. A number of models in current use capture the stochastic nature of the volatility of rates, stocks, etc., and this family of distributions may prove to be one of the more important.
Pearson type III distribution
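Parameters | $\alpha$ location (real), $\beta > 0$ scale (real), $p > 0$ shape (real)
---|---
Support | $x \in [\alpha, \infty)$
CDF | $\gamma\left(p, \frac{x - \alpha}{\beta}\right) / \Gamma(p)$, where $\gamma$ is the lower incomplete gamma function
Mean | $\alpha + p\beta$
Variance | $p\beta^2$
Skewness | $2 / \sqrt{p}$
Excess kurtosis | $6 / p$
CF | $e^{i\alpha t} (1 - i\beta t)^{-p}$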
The Pearson Type III distribution is given by the probability density function
$$f(x) = \frac{1}{\beta\,\Gamma(p)} \left(\frac{x - \alpha}{\beta}\right)^{p - 1} \exp\left(-\frac{x - \alpha}{\beta}\right),$$
where x ∈ [α,∞) and α, β and p are parameters of the distribution with β > 0 and p > 0 (Abramowitz and Stegun 1964, p. 930). Here, Γ denotes the Gamma function.
- Mean: $\alpha + p\beta$
When α=0, β=2, and p is a positive multiple of 1/2, the Pearson Type III distribution becomes the chi-squared distribution with 2p degrees of freedom.
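The chi-squared special case can be checked directly. The sketch below assumes the type III density ((x − α)/β)^(p−1) exp(−(x − α)/β) / (β Γ(p)) and the textbook chi-squared density. Function names are hypothetical, and the boundary point x = α is treated as lying outside the support for simplicity.

```python
import math


def pearson3_pdf(x, alpha, beta, p):
    """Pearson type III density with location alpha, scale beta, shape p."""
    if beta <= 0 or p <= 0:
        raise ValueError("beta and p must be positive")
    if x <= alpha:
        return 0.0  # simplification: the boundary itself is excluded
    z = (x - alpha) / beta
    return math.exp((p - 1.0) * math.log(z) - z - math.lgamma(p)) / beta


def chi2_pdf(x, k):
    """Chi-squared density with k degrees of freedom, for x > 0."""
    half = k / 2.0
    return math.exp((half - 1.0) * math.log(x) - x / 2.0
                    - half * math.log(2.0) - math.lgamma(half))


# alpha = 0, beta = 2, p = 3/2 should match chi-squared with 3 degrees of freedom.
diff = abs(pearson3_pdf(2.0, 0.0, 2.0, 1.5) - chi2_pdf(2.0, 3))
```

With α = 0, β = 1 and p = 1 the same function reduces to the standard exponential density.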
Notes
[1] Miller, Jeff (2006-07-09). "Beta distribution". Earliest Known Uses of Some of the Words of Mathematics. Retrieved December 9, 2006.
[2] Miller, Jeff (2006-12-07). "Gamma distribution". Earliest Known Uses of Some of the Words of Mathematics. Retrieved December 9, 2006.
Sources
Primary sources
- Pearson, Karl (1893). "Contributions to the mathematical theory of evolution [abstract]". Proceedings of the Royal Society of London. 54: 329–333.
- Pearson, Karl (1895). "Contributions to the mathematical theory of evolution, II: Skew variation in homogeneous material". Philosophical Transactions of the Royal Society of London. A. 186: 343–414.
- Pearson, Karl (1901). "Mathematical contributions to the theory of evolution, X: Supplement to a memoir on skew variation". Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character. 197: 443–459.
- Pearson, Karl (1916). "Mathematical contributions to the theory of evolution, XIX: Second supplement to a memoir on skew variation". Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character. 216: 429–457.
- Rhind, A. (1909). "Tables to facilitate the computation of the probable errors of the chief constants of skew frequency distributions". Biometrika. 7 (1/2): 127–147. doi:10.1093/biomet/7.1-2.127.
Secondary sources
- Milton Abramowitz and Irene A. Stegun (1964). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards.
- Eric W. Weisstein et al. Pearson Type III Distribution. From MathWorld.