In probability theory, it is possible to approximate the moments of a function f of a random variable X using Taylor expansions, provided that f is sufficiently differentiable and that the moments of X are finite. A simulation-based alternative to this approximation is the application of Monte Carlo simulations.
Given $\mu_X$ and $\sigma_X^2$, the mean and the variance of $X$, respectively,[1] a Taylor expansion of the expected value of $f(X)$ can be found via

$$\begin{aligned}\operatorname{E}\left[f(X)\right]&=\operatorname{E}\left[f\left(\mu_X+\left(X-\mu_X\right)\right)\right]\\&\approx\operatorname{E}\left[f(\mu_X)+f'(\mu_X)\left(X-\mu_X\right)+\tfrac{1}{2}f''(\mu_X)\left(X-\mu_X\right)^2\right]\\&=f(\mu_X)+f'(\mu_X)\operatorname{E}\left[X-\mu_X\right]+\tfrac{1}{2}f''(\mu_X)\operatorname{E}\left[\left(X-\mu_X\right)^2\right].\end{aligned}$$
Since $\operatorname{E}[X-\mu_X]=0$, the second term vanishes. Also, $\operatorname{E}[(X-\mu_X)^2]$ is $\sigma_X^2$. Therefore,

$$\operatorname{E}\left[f(X)\right]\approx f(\mu_X)+\frac{f''(\mu_X)}{2}\sigma_X^2.$$
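This approximation can be checked numerically. The sketch below (an illustration, not part of the cited sources) uses $f(x)=e^x$ with $X$ normal, for which the exact mean $\operatorname{E}[e^X]=e^{\mu+\sigma^2/2}$ is known in closed form, and compares the second-order approximation against both the exact value and a Monte Carlo estimate; the parameter values are arbitrary choices.

```python
import math
import random

# Check E[f(X)] ≈ f(mu) + f''(mu)/2 * sigma^2 for f(x) = exp(x),
# X ~ Normal(mu, sigma^2). Exact value: E[e^X] = exp(mu + sigma^2/2).
mu, sigma = 0.5, 0.2

# For f(x) = e^x, every derivative at mu equals e^mu.
approx = math.exp(mu) + 0.5 * math.exp(mu) * sigma**2

# Exact lognormal mean for comparison.
exact = math.exp(mu + sigma**2 / 2)

# Independent Monte Carlo check of the same expectation.
rng = random.Random(0)
n = 200_000
mc = sum(math.exp(rng.gauss(mu, sigma)) for _ in range(n)) / n
```

With $\sigma=0.2$ the second-order approximation already agrees with the exact mean to about three decimal places, consistent with the error being $O(\sigma^4)$.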
It is possible to generalize this to functions of more than one variable using multivariate Taylor expansions. For example,[2]

$$\operatorname{E}\left[\frac{X}{Y}\right]\approx\frac{\operatorname{E}\left[X\right]}{\operatorname{E}\left[Y\right]}-\frac{\operatorname{cov}\left[X,Y\right]}{\operatorname{E}\left[Y\right]^2}+\frac{\operatorname{E}\left[X\right]}{\operatorname{E}\left[Y\right]^3}\operatorname{var}\left[Y\right]$$
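A quick numerical sketch of the ratio formula (an illustration with arbitrarily chosen parameters, not taken from the cited sources): correlated normals are built from independent standard normals so that the means, covariance, and variance entering the formula are known by construction, and the approximation is compared with a Monte Carlo estimate of $\operatorname{E}[X/Y]$.

```python
import random

# Check E[X/Y] ≈ E[X]/E[Y] - cov(X,Y)/E[Y]^2 + E[X]/E[Y]^3 * var(Y)
# for jointly normal X, Y with E[Y] well away from zero.
rng = random.Random(1)
n = 200_000

ex, ey = 2.0, 5.0           # E[X], E[Y]
cov_xy, var_y = 0.15, 0.25  # known by construction below

total = 0.0
for _ in range(n):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    y = ey + 0.5 * z1             # var(Y) = 0.25
    x = ex + 0.3 * z1 + 0.4 * z2  # cov(X, Y) = 0.3 * 0.5 = 0.15
    total += x / y
mc = total / n

approx = ex / ey - cov_xy / ey**2 + (ex / ey**3) * var_y
```

Keeping $\sigma_Y/\operatorname{E}[Y]$ small (here 0.1) matters: the expansion of $1/Y$ around $\operatorname{E}[Y]$ degrades quickly as $Y$ gets probability mass near zero.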
Similarly,[1]

$$\operatorname{var}\left[f(X)\right]\approx\left(f'(\mu_X)\right)^2\sigma_X^2-\frac{1}{4}\left(f''(\mu_X)\right)^2\sigma_X^4$$
The above is obtained using a second-order approximation, following the method used in estimating the first moment. It will be a poor approximation in cases where $f(X)$ is highly non-linear. This is a special case of the delta method.
Indeed, we take

$$\operatorname{E}\left[f(X)\right]\approx f(\mu_X)+\frac{f''(\mu_X)}{2}\sigma_X^2.$$

With $f(X)=g(X)^2$ and $Y=g(X)$, this yields an approximation of $\operatorname{E}\left[Y^2\right]$. The variance is then computed using the formula $\operatorname{var}\left[Y\right]=\operatorname{E}\left[Y^2\right]-\mu_Y^2$.
An example is,[2]

$$\operatorname{var}\left[\frac{X}{Y}\right]\approx\frac{\operatorname{var}\left[X\right]}{\operatorname{E}\left[Y\right]^2}-\frac{2\operatorname{E}\left[X\right]}{\operatorname{E}\left[Y\right]^3}\operatorname{cov}\left[X,Y\right]+\frac{\operatorname{E}\left[X\right]^2}{\operatorname{E}\left[Y\right]^4}\operatorname{var}\left[Y\right].$$
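The ratio-variance formula can be checked the same way (again an illustrative sketch with arbitrary parameters, not from the cited sources): the moments entering the formula are fixed by construction, and the result is compared with the sample variance of simulated ratios.

```python
import random

# Check var[X/Y] ≈ var(X)/E[Y]^2 - 2 E[X]/E[Y]^3 * cov(X,Y)
#                 + E[X]^2/E[Y]^4 * var(Y)
# for jointly normal X, Y built from independent standard normals.
rng = random.Random(2)
n = 200_000

ex, ey = 2.0, 5.0
var_x, var_y, cov_xy = 0.25, 0.25, 0.15  # known by construction below

samples = []
for _ in range(n):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    y = ey + 0.5 * z1             # var(Y) = 0.25
    x = ex + 0.3 * z1 + 0.4 * z2  # var(X) = 0.09 + 0.16 = 0.25
    samples.append(x / y)

mean_mc = sum(samples) / n
var_mc = sum((s - mean_mc) ** 2 for s in samples) / (n - 1)

approx = var_x / ey**2 - 2 * ex / ey**3 * cov_xy + ex**2 / ey**4 * var_y
```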
The second-order approximation, when X follows a normal distribution, is:[3]

$$\operatorname{var}\left[f(X)\right]\approx\left(f'(\mu_X)\right)^2\sigma_X^2+\frac{1}{2}\left(f''(\mu_X)\right)^2\sigma_X^4+f'(\mu_X)\,f'''(\mu_X)\,\sigma_X^4$$
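For the normal case the formula can be verified against a closed form. The sketch below (illustrative, with arbitrary parameter values) again uses $f(x)=e^x$, so that $f(X)$ is lognormal with exact variance $e^{2\mu+\sigma^2}(e^{\sigma^2}-1)$, and all derivatives of $f$ at $\mu$ equal $e^\mu$.

```python
import math

# Check var[f(X)] ≈ (f')^2 σ² + ½ (f'')² σ⁴ + f' f''' σ⁴ at mu,
# for f(x) = exp(x) and X ~ Normal(mu, sigma^2).
# Exact lognormal variance: exp(2 mu + sigma^2) * (exp(sigma^2) - 1).
mu, sigma = 0.5, 0.2
d = math.exp(mu)  # f'(mu) = f''(mu) = f'''(mu) = e^mu

approx = d**2 * sigma**2 + 0.5 * d**2 * sigma**4 + d * d * sigma**4
exact = math.exp(2 * mu + sigma**2) * (math.exp(sigma**2) - 1)
```

Expanding the exact variance in powers of $\sigma^2$ gives $e^{2\mu}(\sigma^2+\tfrac{3}{2}\sigma^4+\dots)$, which matches the three terms of the approximation exactly through order $\sigma^4$.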
First product moment
To find a second-order approximation for the covariance of functions of two random variables (with the same function applied to both), one can proceed as follows. First, note that
$$\operatorname{cov}\left[f(X),f(Y)\right]=\operatorname{E}\left[f(X)f(Y)\right]-\operatorname{E}\left[f(X)\right]\operatorname{E}\left[f(Y)\right].$$

Since a second-order expansion for $\operatorname{E}\left[f(X)\right]$ has already been derived above, it only remains to find $\operatorname{E}\left[f(X)f(Y)\right]$. Treating $f(X)f(Y)$ as a two-variable function, the second-order Taylor expansion is as follows:
$$\begin{aligned}f(X)f(Y)\approx{}&f(\mu_X)f(\mu_Y)+(X-\mu_X)f'(\mu_X)f(\mu_Y)+(Y-\mu_Y)f(\mu_X)f'(\mu_Y)\\&+\tfrac{1}{2}\left[(X-\mu_X)^2f''(\mu_X)f(\mu_Y)+2(X-\mu_X)(Y-\mu_Y)f'(\mu_X)f'(\mu_Y)+(Y-\mu_Y)^2f(\mu_X)f''(\mu_Y)\right]\end{aligned}$$
Taking expectation of the above and simplifying, making use of the identities $\operatorname{E}(X^2)=\operatorname{var}(X)+\left[\operatorname{E}(X)\right]^2$ and $\operatorname{E}(XY)=\operatorname{cov}(X,Y)+\operatorname{E}(X)\operatorname{E}(Y)$, leads to
$$\operatorname{E}\left[f(X)f(Y)\right]\approx f(\mu_X)f(\mu_Y)+f'(\mu_X)f'(\mu_Y)\operatorname{cov}(X,Y)+\tfrac{1}{2}f''(\mu_X)f(\mu_Y)\operatorname{var}(X)+\tfrac{1}{2}f(\mu_X)f''(\mu_Y)\operatorname{var}(Y).$$

Hence,
$$\begin{aligned}\operatorname{cov}\left[f(X),f(Y)\right]\approx{}&f(\mu_X)f(\mu_Y)+f'(\mu_X)f'(\mu_Y)\operatorname{cov}(X,Y)+\tfrac{1}{2}f''(\mu_X)f(\mu_Y)\operatorname{var}(X)+\tfrac{1}{2}f(\mu_X)f''(\mu_Y)\operatorname{var}(Y)\\&-\left[f(\mu_X)+\tfrac{1}{2}f''(\mu_X)\operatorname{var}(X)\right]\left[f(\mu_Y)+\tfrac{1}{2}f''(\mu_Y)\operatorname{var}(Y)\right]\\={}&f'(\mu_X)f'(\mu_Y)\operatorname{cov}(X,Y)-\tfrac{1}{4}f''(\mu_X)f''(\mu_Y)\operatorname{var}(X)\operatorname{var}(Y)\end{aligned}$$
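This covariance approximation can also be checked against a closed form. The sketch below (illustrative, with arbitrarily chosen parameters) uses $f(x)=e^x$ and bivariate normal $(X,Y)$, for which $\operatorname{cov}(e^X,e^Y)=e^{\mu_X+\mu_Y+(\sigma_X^2+\sigma_Y^2)/2}\left(e^{\operatorname{cov}(X,Y)}-1\right)$ exactly.

```python
import math

# Check cov[f(X), f(Y)] ≈ f'(mx) f'(my) cov(X,Y)
#                        - ¼ f''(mx) f''(my) var(X) var(Y)
# for f(x) = exp(x) and bivariate normal (X, Y).
mux, muy = 0.0, 0.0
sx, sy, cxy = 0.1, 0.1, 0.005  # std devs and covariance (correlation 0.5)

# All derivatives of exp at the means are exp(mean).
approx = (math.exp(mux) * math.exp(muy) * cxy
          - 0.25 * math.exp(mux) * math.exp(muy) * sx**2 * sy**2)

# Exact covariance of two correlated lognormals.
exact = math.exp(mux + muy + (sx**2 + sy**2) / 2) * (math.exp(cxy) - 1)
```

The leading term $f'(\mu_X)f'(\mu_Y)\operatorname{cov}(X,Y)$ dominates; for these small variances the approximation agrees with the exact covariance to within a few percent.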
If X is a random vector, the approximations for the mean and variance of $f(X)$ are given by[4]

$$\begin{aligned}\operatorname{E}(f(X))&=f(\mu_X)+\tfrac{1}{2}\operatorname{trace}(H_f(\mu_X)\Sigma_X)\\\operatorname{var}(f(X))&=\nabla f(\mu_X)^t\,\Sigma_X\,\nabla f(\mu_X)+\tfrac{1}{2}\operatorname{trace}\left(H_f(\mu_X)\Sigma_X H_f(\mu_X)\Sigma_X\right).\end{aligned}$$

Here $\nabla f$ and $H_f$ denote the gradient and the Hessian matrix respectively, and $\Sigma_X$ is the covariance matrix of X.
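The vector formulas can be sketched concretely. The example below (an illustration with arbitrary parameters, not from the cited sources) takes $f(x_1,x_2)=e^{x_1+x_2}$, so the gradient and Hessian at $\mu_X$ have every entry equal to $f(\mu_X)$, and $X_1+X_2$ is normal with mean $m=\mu_1+\mu_2$ and variance $v=\Sigma_{11}+\Sigma_{22}+2\Sigma_{12}$, giving the exact mean $e^{m+v/2}$ for comparison.

```python
import math

# E[f(X)] ≈ f(mu) + ½ trace(H Σ);  var[f(X)] ≈ ∇fᵀ Σ ∇f + ½ trace((H Σ)²)
# for f(x1, x2) = exp(x1 + x2) and X ~ Normal(mu, Σ) in two dimensions.
mu = [0.1, 0.2]
cov = [[0.04, 0.01],
       [0.01, 0.09]]  # Σ

fmu = math.exp(mu[0] + mu[1])
grad = [fmu, fmu]                # ∇f(mu): every partial equals f(mu)
hess = [[fmu, fmu], [fmu, fmu]]  # H_f(mu): every second partial equals f(mu)

# H Σ, then trace(H Σ) for the mean correction.
h_sigma = [[sum(hess[i][k] * cov[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
mean_approx = fmu + 0.5 * (h_sigma[0][0] + h_sigma[1][1])

# ∇fᵀ Σ ∇f + ½ trace((H Σ)²) for the variance.
quad = sum(grad[i] * cov[i][j] * grad[j] for i in range(2) for j in range(2))
tr_sq = sum(h_sigma[i][k] * h_sigma[k][i] for i in range(2) for k in range(2))
var_approx = quad + 0.5 * tr_sq

# Exact lognormal mean of exp(X1 + X2) for comparison.
m = mu[0] + mu[1]
v = cov[0][0] + cov[1][1] + 2 * cov[0][1]
mean_exact = math.exp(m + v / 2)
```

For this choice of $f$, the trace and quadratic-form expressions collapse to $f(\mu_X)\,v$ and $f(\mu_X)^2\,v$ respectively, so the vector formulas reproduce the scalar results applied to $X_1+X_2$.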
^ a b Benaroya, Haym; Han, Seon Mi; Nagurka, Mark. Probability Models in Engineering and Science. CRC Press, 2005, p. 166.
^ a b van Kempen, G. M. P.; van Vliet, L. J. (1 April 2000). "Mean and Variance of Ratio Estimators Used in Fluorescence Ratio Imaging". Cytometry. 39 (4): 300–305. doi:10.1002/(SICI)1097-0320(20000401)39:4<300::AID-CYTO8>3.0.CO;2-O. Retrieved 2024-08-14.
^ Hendeby, Gustaf; Gustafsson, Fredrik. "On Nonlinear Transformations of Gaussian Distributions" (PDF). Retrieved 5 October 2017.
^ Rego, Bruno V.; Weiss, Dar; Bersi, Matthew R.; Humphrey, Jay D. (14 December 2021). "Uncertainty quantification in subject-specific estimation of local vessel mechanical properties". International Journal for Numerical Methods in Biomedical Engineering. 37 (12): e3535. doi:10.1002/cnm.3535. ISSN 2040-7939. PMC 9019846. PMID 34605615.