Grouped Dirichlet distribution

In statistics, the grouped Dirichlet distribution (GDD) is a multivariate generalization of the Dirichlet distribution. It was first described by Ng et al. (2008).^[1] The grouped Dirichlet distribution arises in the analysis of categorical data where some observations cannot be assigned to a single 'crisp' category and are known only to fall into one of a set of such categories. For example, one may have a data set consisting of cases and controls under two different conditions. With complete data, the cross-classification of disease status forms a 2 × 2 (case/control by treatment/no treatment) table with cell probabilities

	Treatment	No Treatment
Controls	θ₁	θ₂
Cases	θ₃	θ₄

If, however, the data also include, say, non-respondents who are known to be controls or cases but whose treatment status is missing, then the cross-classification of disease status forms a 2 × 3 table. In each row, the probability in the last column is the sum of the probabilities in the first two columns, e.g.

	Treatment	No Treatment	Missing
Controls	θ₁	θ₂	θ₁+θ₂
Cases	θ₃	θ₄	θ₃+θ₄

The GDD allows the full estimation of the cell probabilities under such aggregation conditions.^[1]
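To see why such a table leads to this distribution, it may help to write out the likelihood. The following is only a sketch under assumptions not stated above: $y_{1},\ldots ,y_{4}$ are hypothetical counts of fully classified subjects in the four cells, $z_{1}$ and $z_{2}$ are hypothetical counts of controls and cases with missing treatment status, sampling is multinomial, and missingness does not depend on the treatment status. Under these assumptions the likelihood is proportional to

$$L(\theta _{1},\ldots ,\theta _{4})\propto \left(\prod _{i=1}^{4}\theta _{i}^{y_{i}}\right)\cdot \left(\theta _{1}+\theta _{2}\right)^{z_{1}}\cdot \left(\theta _{3}+\theta _{4}\right)^{z_{2}}$$

This is a product of powers of the cell probabilities and of their row sums, which is the same form as the numerator of the two-partition grouped Dirichlet density defined in the next section (with $n=4$, $s=2$, $a_{i}=y_{i}+1$ and $b_{j}=z_{j}$).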

Probability distribution

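Consider the closed simplex ${\mathcal {T}}_{n}=\left\{\left(x_{1},\ldots ,x_{n}\right)\left|x_{i}\geq 0,i=1,\ldots ,n,\sum _{i=1}^{n}x_{i}=1\right.\right\}$ and $\mathbf {x} \in {\mathcal {T}}_{n}$. Writing $\mathbf {x} _{-n}=\left(x_{1},\ldots ,x_{n-1}\right)$ for the first $n-1$ elements of a member of ${\mathcal {T}}_{n}$, the grouped Dirichlet distribution of $\mathbf {x}$ with two partitions, the first consisting of the components $x_{1},\ldots ,x_{s}$ and the second of $x_{s+1},\ldots ,x_{n}$, and with parameter vectors $\mathbf {a} =\left(a_{1},\ldots ,a_{n}\right)$ and $\mathbf {b} =\left(b_{1},b_{2}\right)$, has a density function given by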

$$\operatorname {GD} _{n,2,s}\left(\left.\mathbf {x} _{-n}\right|\mathbf {a} ,\mathbf {b} \right)={\frac {\left(\prod _{i=1}^{n}x_{i}^{a_{i}-1}\right)\cdot \left(\sum _{i=1}^{s}x_{i}\right)^{b_{1}}\cdot \left(\sum _{i=s+1}^{n}x_{i}\right)^{b_{2}}}{\operatorname {\mathrm {B} } \left(a_{1},\ldots ,a_{s}\right)\cdot \operatorname {\mathrm {B} } \left(a_{s+1},\ldots ,a_{n}\right)\cdot \operatorname {\mathrm {B} } \left(b_{1}+\sum _{i=1}^{s}a_{i},b_{2}+\sum _{i=s+1}^{n}a_{i}\right)}}$$

where $\operatorname {\mathrm {B} } \left(\mathbf {a} \right)$ is the multivariate beta function.
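The two-partition density can be evaluated numerically on the log scale. The following is a minimal sketch rather than a reference implementation: the function names `log_multivariate_beta` and `gdd2_logpdf` are hypothetical, only the Python standard library is used, and for simplicity the point is assumed to lie in the interior of the simplex (every $x_{i}>0$).

```python
import math


def log_multivariate_beta(alpha):
    """Log of B(alpha) = prod_i Gamma(alpha_i) / Gamma(sum_i alpha_i)."""
    return sum(math.lgamma(v) for v in alpha) - math.lgamma(sum(alpha))


def gdd2_logpdf(x, a, b, s):
    """Log-density of the two-partition grouped Dirichlet distribution.

    x, a : sequences of length n (x is the full point on the simplex)
    b    : pair (b1, b2)
    s    : number of components in the first partition, 0 < s < n
    """
    x, a, b = list(x), list(a), list(b)
    n = len(x)
    if len(a) != n or len(b) != 2 or not 0 < s < n:
        raise ValueError("dimension mismatch")
    # Assumption of this sketch: interior points only, so every log is finite.
    if min(x) <= 0 or abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must be an interior point of the simplex")

    # Numerator: prod x_i^(a_i - 1) times the two partition sums raised to b1, b2.
    log_num = sum((ai - 1) * math.log(xi) for ai, xi in zip(a, x))
    log_num += b[0] * math.log(sum(x[:s])) + b[1] * math.log(sum(x[s:]))

    # Denominator: the three multivariate beta functions in the density.
    log_den = (
        log_multivariate_beta(a[:s])
        + log_multivariate_beta(a[s:])
        + log_multivariate_beta([b[0] + sum(a[:s]), b[1] + sum(a[s:])])
    )
    return log_num - log_den
```

As a sanity check, setting $b_{1}=b_{2}=0$ makes the three beta functions in the denominator multiply to $\operatorname {\mathrm {B} } \left(a_{1},\ldots ,a_{n}\right)$, so the function should then agree with the ordinary Dirichlet log-density.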

Ng et al.^[1] went on to define an $m$-partition grouped Dirichlet distribution, with the density of $\mathbf {x} _{-n}$ given by

$$\operatorname {GD} _{n,m,\mathbf {s} }\left(\left.\mathbf {x} _{-n}\right|\mathbf {a} ,\mathbf {b} \right)=c_{m}^{-1}\cdot \left(\prod _{i=1}^{n}x_{i}^{a_{i}-1}\right)\cdot \prod _{j=1}^{m}\left(\sum _{k=s_{j-1}+1}^{s_{j}}x_{k}\right)^{b_{j}}$$

where $\mathbf {s} =\left(s_{1},\ldots ,s_{m}\right)$ is a vector of integers with $0=s_{0}<s_{1}<\cdots <s_{m}=n$, so that every partition contains at least one component. The normalizing constant is given by

$$c_{m}=\left\{\prod _{j=1}^{m}\operatorname {\mathrm {B} } \left(a_{s_{j-1}+1},\ldots ,a_{s_{j}}\right)\right\}\cdot \operatorname {\mathrm {B} } \left(b_{1}+\sum _{k=1}^{s_{1}}a_{k},\ldots ,b_{m}+\sum _{k=s_{m-1}+1}^{s_{m}}a_{k}\right)$$
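The $m$-partition density and its normalizing constant $c_{m}$ can be computed in the same way, looping over the partitions. This is again only a sketch: the names `log_mv_beta` and `gdd_logpdf` are hypothetical, the cut points are assumed to be strictly increasing so that no partition is empty, and the point is assumed to lie in the interior of the simplex.

```python
import math


def log_mv_beta(alpha):
    """Log of the multivariate beta function B(alpha)."""
    return sum(math.lgamma(v) for v in alpha) - math.lgamma(sum(alpha))


def gdd_logpdf(x, a, b, s):
    """Log-density of the m-partition grouped Dirichlet distribution.

    x, a : sequences of length n (x is the full point on the simplex)
    b    : sequence of length m
    s    : cut points (s_1, ..., s_m), strictly increasing, with s_m = n
    """
    x, a, b = list(x), list(a), list(b)
    n, m = len(x), len(b)
    cuts = [0] + list(s)  # prepend s_0 = 0
    if len(a) != n or len(cuts) != m + 1 or cuts[-1] != n:
        raise ValueError("dimension mismatch")
    # Assumption of this sketch: every partition is non-empty.
    if any(lo >= hi for lo, hi in zip(cuts, cuts[1:])):
        raise ValueError("cut points must be strictly increasing")
    # Assumption of this sketch: interior points only, so every log is finite.
    if min(x) <= 0 or abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must be an interior point of the simplex")

    log_kernel = sum((ai - 1) * math.log(xi) for ai, xi in zip(a, x))
    log_c = 0.0
    group_totals = []
    for j, (lo, hi) in enumerate(zip(cuts, cuts[1:])):
        # Partition j covers components s_{j-1}+1, ..., s_j (0-based slice lo:hi).
        log_kernel += b[j] * math.log(sum(x[lo:hi]))
        log_c += log_mv_beta(a[lo:hi])
        group_totals.append(b[j] + sum(a[lo:hi]))
    log_c += log_mv_beta(group_totals)  # the final beta factor of c_m
    return log_kernel - log_c
```

For example, `math.exp(gdd_logpdf(x, a, b, s))` gives the density itself. With $m=2$ and `s = (s, n)` the function should reproduce the two-partition density given earlier.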

The authors went on to use these distributions in the context of three different applications in medical science.

References

[1] Ng, Kai Wang (2008). "Grouped Dirichlet distribution: A new tool for incomplete categorical data analysis". Journal of Multivariate Analysis. 99: 490–509.
