User:Mct mht/Wide-sense stationary time series
Definition
Let $\{\xi_t\}$ be a family of complex-valued random variables of mean zero indexed by $t \in \mathbb{R}$ or $\mathbb{Z}$. Such a family is said to be a wide-sense stationary stochastic process (or wide-sense stationary time series in the case of discrete time) when the covariance between any two members $\xi_t$ and $\xi_s$, i.e.
$$\operatorname{Cov}(\xi_t, \xi_s) = E(\xi_t\,\overline{\xi_s}),$$
is finite and depends only on $t - s$. This implies that each $\xi_t$ lies in the Hilbert space $L^2$.
The function
$$R(t) = E(\xi_{s+t}\,\overline{\xi_s})$$
is called the autocovariance function of the process.
Spectral measure
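Existence
The autocovariance function is by construction a positive definite function on the group $\mathbb{R}$ (in the continuous time case) or $\mathbb{Z}$ (discrete time case). By Bochner's theorem, there exists a finite positive measure $\mu$ on $\mathbb{R}$ or on the unit circle $\mathbb{T}$ (identified with $[0,1]$) such that the Fourier transform of $\mu$ is $R(t)$. In the discrete time case,
$$R(t) = \int_0^1 e^{-2\pi i \lambda t}\, d\mu(\lambda),$$
and in the continuous time case the integral is taken over $\mathbb{R}$.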
Examples
Some examples in the discrete time case:
An orthonormal sequence $\{\varepsilon_t\}$ of random variables is called a white noise time series. Its autocovariance function is given by the Kronecker delta function on $\mathbb{Z}$: $R(t) = \delta_{0,t}$. The spectral measure is the Lebesgue measure $dm$ on $[0,1]$.
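Let $\{a_k\}$ be an $\ell^1$-sequence of complex numbers. A moving average time series $\{\xi_t\}$ is formed by formally convolving $\{a_k\}$ and the white noise $\{\varepsilon_t\}$:
$$\xi_t = \sum_{k \in \mathbb{Z}} a_k\, \varepsilon_{t-k}.$$
The autocovariance function is given by the convolution (denoted by $*$) of the sequence $\{a_k\}$ with $\{\tilde a_k\} = \{\overline{a_{-k}}\}$, the entry-wise conjugate of $\{a_{-k}\}$:
$$R(t) = (a * \tilde a)(t) = \sum_{k \in \mathbb{Z}} a_{k+t}\, \overline{a_k}.$$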
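If $a_k$ is non-zero only for $0 \le k \le p$, then the process is said to be a one-sided moving average of order $p$. The Fourier transform of $\{a_k\}$ in this case is a polynomial $P(z) = \sum_{k=0}^{p} a_k z^k$ evaluated on the unit circle. The Fourier transform of $R(t)$ is simply the squared modulus of $P$:
$$\sum_{t=-p}^{p} R(t)\, e^{2\pi i \lambda t} = |P(e^{2\pi i \lambda})|^2.$$
The spectral measure is then absolutely continuous with respect to the Lebesgue measure, with Radon-Nikodym derivative $|P(e^{2\pi i \lambda})|^2$ (for real coefficients $a_k$ this coincides with $|P(e^{-2\pi i \lambda})|^2$). This function is called the spectral density of the process.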
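As a quick numerical sanity check of the moving-average example, the following minimal Python sketch computes $R(t)$ by the convolution formula and compares its Fourier sum with the squared modulus of $P$ on the unit circle. It assumes the conventions $\xi_t = \sum_k a_k \varepsilon_{t-k}$ and $R(t) = E(\xi_{s+t}\overline{\xi_s})$; the function names and coefficient values are illustrative and not part of the article. Real coefficients are used, for which the sign of the exponent does not affect the modulus.

```python
import cmath

def ma_autocovariance(a, t):
    """R(t) = sum_k a[k+t] * conj(a[k]) for coefficients a[0..p] (zero elsewhere)."""
    p = len(a) - 1
    return sum(a[k + t] * complex(a[k]).conjugate()
               for k in range(p + 1) if 0 <= k + t <= p)

def squared_modulus_of_P(a, lam):
    """|P(z)|^2 at z = exp(2*pi*i*lam), with P(z) = sum_k a[k] * z**k."""
    z = cmath.exp(2j * cmath.pi * lam)
    return abs(sum(ak * z ** k for k, ak in enumerate(a))) ** 2

def fourier_sum_of_R(a, lam):
    """sum_{t=-p}^{p} R(t) * exp(2*pi*i*lam*t)."""
    p = len(a) - 1
    return sum(ma_autocovariance(a, t) * cmath.exp(2j * cmath.pi * lam * t)
               for t in range(-p, p + 1))

# Illustrative (assumed) coefficients of a one-sided moving average of order 2.
a = [1.0, 0.5, 0.25]
for lam in (0.0, 0.1, 0.37):
    print(lam, fourier_sum_of_R(a, lam).real, squared_modulus_of_P(a, lam))
```

The two printed columns should agree up to rounding, which is the statement that the spectral density is the squared modulus of $P$.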
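An autoregressive (AR) process of order $q$ is a process of the form
$$\xi_t + b_1 \xi_{t-1} + \cdots + b_q \xi_{t-q} = \varepsilon_t,$$
where $\{\varepsilon_t\}$ is a white noise process. When all zeros of the complex polynomial $Q(z) = 1 + b_1 z + \cdots + b_q z^q$ lie outside the closed unit disk, the stochastic difference equation defining an AR process has a wide-sense stationary solution. Consider here the Banach space consisting of bounded sequences of $L^2$ random variables, equipped with the supremum norm. Denote by $L$ the shift operator on this space, $(L\xi)_t = \xi_{t-1}$, and by $\mathrm{Id}$ the identity operator. Then the AR equation has the operator form
$$Q(L)\,\xi = (\mathrm{Id} + b_1 L + \cdots + b_q L^q)\,\xi = \varepsilon.$$
By the spectral mapping theorem, the bounded operator $Q(L)$ is invertible. Von Neumann's inequality then implies that its inverse is given by the series
$$Q(L)^{-1} = \sum_{k=0}^{\infty} c_k L^k, \qquad \text{where } \frac{1}{Q(z)} = \sum_{k=0}^{\infty} c_k z^k \text{ for } |z| \le 1.$$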
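The series inverse in the AR example can be illustrated with a small Python sketch. It assumes the AR equation is written as $\xi_t + b_1\xi_{t-1} + \cdots + b_q\xi_{t-q} = \varepsilon_t$ with $Q(z) = 1 + b_1 z + \cdots + b_q z^q$, and computes the power series coefficients $c_k$ of $1/Q(z)$, so that formally $\xi_t = \sum_{k \ge 0} c_k \varepsilon_{t-k}$. The symbols $b_j$, $Q$, $c_k$ and the function names are assumptions of this sketch.

```python
def inverse_power_series(b, n_terms):
    """Coefficients c[0..n_terms-1] of 1/Q(z), Q(z) = 1 + b[0]*z + ... + b[q-1]*z**q.

    Uses the recursion c[0] = 1, c[k] = -sum_{j=1}^{min(k,q)} b[j-1] * c[k-j],
    which comes from matching coefficients in Q(z) * (sum_k c[k] z**k) = 1.
    """
    c = [1.0]
    for k in range(1, n_terms):
        c.append(-sum(b[j - 1] * c[k - j] for j in range(1, min(k, len(b)) + 1)))
    return c

def convolve_with_Q(b, c):
    """First len(c) coefficients of Q(z) * (sum_k c[k] z**k); should be 1, 0, 0, ..."""
    q_coeffs = [1.0] + list(b)
    return [sum(q_coeffs[j] * c[k - j] for j in range(min(k, len(b)) + 1))
            for k in range(len(c))]

# AR(1) example xi_t - 0.5 * xi_{t-1} = eps_t: the zero of Q is z = 2, outside the disk.
print(inverse_power_series([-0.5], 4))
```

For the AR(1) example the coefficients are the geometric sequence $0.5^k$, which decays because the zero of $Q$ lies outside the unit disk.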
ARMA
Spectral analysis
First example: almost-periodic time series
The existence of a spectral measure is the starting point of Fourier analysis for stationary time series. The goal is to understand the series in terms of its frequency content.
It turns out that $\xi_t$ can be viewed as a superposition of pure harmonics $e^{-2\pi i \lambda t}$ with "random amplitudes" $Z(d\lambda)$. This is made precise by the notion of integration with respect to an orthogonal stochastic measure.
Consider the following special case. Let $\xi_t = \sum_{k=1}^{N} z_k e^{-2\pi i \lambda_k t}$, where $z_k$, $k = 1, \dots, N$, are orthogonal $L^2$-random variables with mean 0 and standard deviation $\|z_k\|_2 = \sigma_k$. Such a stationary time series is said to be almost periodic. By definition, each $\xi_t$ is a sum of "pure frequencies" $e^{-2\pi i \lambda_k t}$ with "random amplitude" $z_k$ of "intensity" $\sigma_k$. If one defines a discrete $L^2$-valued measure $Z$ on $[0,1]$ by $Z(\Delta) = z_k$ for any Borel set $\Delta$ containing $\lambda_k$ and no other $\lambda$'s, then each $\xi_t$ is the stochastic integral of the pure harmonic $e^{-2\pi i \lambda t}$ with respect to $Z$.
For an almost-periodic $\{\xi_t\}$, the autocovariance function is $R(t) = \sum_{k=1}^{N} \sigma_k^2 e^{-2\pi i \lambda_k t}$ and the spectral measure is the sum of Dirac measures $d\mu = \sum_{k=1}^{N} \sigma_k^2 \delta_{\lambda_k}$. The spectral measure gives a Hilbert space isomorphism from $L^2([0,1], \mu)$ to the Hilbert subspace generated by $\{\xi_t\}$. Under this isomorphism, the image of the indicator function $I_\Delta$, where $\Delta$ is a Borel set containing $\lambda_k$ and no other $\lambda$'s, is precisely $z_k$. This is the stochastic measure for an almost-periodic process.
This discussion can be extended to an arbitrary stationary time series, and thus allows one to view the $t$-th element as the integral of the $t$-th harmonic with respect to a suitable stochastic measure. This is an analog of Bochner's theorem for stationary time series: every stationary time series is the sequence of "Fourier coefficients" of a stochastic measure on the unit circle.
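The almost-periodic case is simple enough to check numerically. The Python sketch below evaluates $R(t) = \sum_k \sigma_k^2 e^{-2\pi i \lambda_k t}$ for illustrative (assumed) values of $\sigma_k$ and $\lambda_k$, and verifies that the quadratic form $\sum_{s,t} c_s \overline{c_t} R(s-t)$ equals $\sum_k \sigma_k^2 \,|\sum_s c_s e^{-2\pi i \lambda_k s}|^2$, which is nonnegative. This is the positive definiteness that Bochner's theorem requires. The function names are hypothetical.

```python
import cmath

def autocovariance(sigmas, lams, t):
    """R(t) = sum_k sigma_k^2 * exp(-2*pi*i*lambda_k*t)."""
    return sum(s ** 2 * cmath.exp(-2j * cmath.pi * lam * t)
               for s, lam in zip(sigmas, lams))

def quadratic_form(sigmas, lams, c):
    """sum_{s,t} c[s] * conj(c[t]) * R(s - t)."""
    n = len(c)
    return sum(c[s] * complex(c[t]).conjugate() * autocovariance(sigmas, lams, s - t)
               for s in range(n) for t in range(n))

def spectral_side(sigmas, lams, c):
    """sum_k sigma_k^2 * |sum_s c[s] * exp(-2*pi*i*lambda_k*s)|^2, i.e. the
    integral of |sum_s c[s] e^{-2 pi i lambda s}|^2 against the spectral measure."""
    return sum(sig ** 2 * abs(sum(cs * cmath.exp(-2j * cmath.pi * lam * s)
                                  for s, cs in enumerate(c))) ** 2
               for sig, lam in zip(sigmas, lams))

sigmas, lams = [1.0, 0.5], [0.1, 0.35]   # assumed intensities and frequencies
coeffs = [1.0, -2 + 1j, 0.5j, 3.0]       # arbitrary test vector
print(quadratic_form(sigmas, lams, coeffs), spectral_side(sigmas, lams, coeffs))
```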
Orthogonal stochastic measures
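Let $(E, \mathcal{E})$ be a measurable space and $\mathcal{E}_0 \subset \mathcal{E}$ an algebra of subsets. A map $Z: \mathcal{E}_0 \to L^2(\Omega, P)$ is an orthogonal stochastic measure if it satisfies:
- (Finite additivity) For any two disjoint $\Delta_1$ and $\Delta_2$ in $\mathcal{E}_0$, $Z(\Delta_1 \cup \Delta_2) = Z(\Delta_1) + Z(\Delta_2)$.
- (Orthogonality) For any two disjoint $\Delta_1$ and $\Delta_2$ in $\mathcal{E}_0$, $Z(\Delta_1) \perp Z(\Delta_2)$.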
Such a measure is a special case of vector-valued measure.
Given such a $Z$, the function $m(\Delta) = E(|Z(\Delta)|^2) = \|Z(\Delta)\|^2$ is a finitely additive positive measure on $\mathcal{E}_0$. When $m$ is moreover countably additive on $\mathcal{E}_0$, it can be extended by Carathéodory's theorem to a finite positive measure on $\mathcal{E}$. This measure, still denoted by $m$, is called the structure function of $Z$.
The stochastic integral with respect to a stochastic measure $Z$ is defined in a natural way as an isometry from $L^2(E, \mathcal{E}, m)$ into $L^2(\Omega, P)$. For any simple function $f = \sum_k a_k I_{\Delta_k}$ in $L^2(E, \mathcal{E}, m)$, with pairwise disjoint $\Delta_k \in \mathcal{E}_0$, define
$$\int f\, dZ = \sum_k a_k Z(\Delta_k).$$
This defines a linear operator on the dense subspace of simple functions, and it preserves the inner product:
$$E\left( \int f\, dZ \ \overline{\int g\, dZ} \right) = \int f\, \overline{g}\, dm.$$
Extending by continuity allows one to define the integral $\int f\, dZ$ for any $f$ in $L^2(E, \mathcal{E}, m)$.
Spectral resolution
As stated above, the spectral resolution is an analog of Bochner's theorem for stationary time series.
Theorem For every stationary time series $\{\xi_t\}$ with mean 0 and spectral measure $\mu$, there exists an orthogonal stochastic measure $Z = Z(\Delta)$ defined on Borel subsets $\Delta$ of $[0,1]$ such that
- The variance of $Z(\Delta)$ is $\|Z(\Delta)\|^2 = E|Z(\Delta)|^2 = \mu(\Delta)$.
- For all $t \in \mathbb{Z}$, $\xi_t = \int e^{-2\pi i \lambda t}\, dZ(\lambda)$ $P$-almost surely.
The proof of the theorem follows the same outline as in the almost periodic case. Let $L^2(\xi)$ denote the Hilbert subspace generated by $\{\xi_t\}$. By definition of $\mu$ (and the Stone-Weierstrass theorem), the map $\xi_t \mapsto e^{-2\pi i \lambda t}$ extends to a unitary operator $U : L^2(\xi) \to L^2([0,1], \mu)$. Form an orthogonal stochastic measure by $Z(\Delta) = U^{-1}(I_\Delta)$. Then by unitarity, $\|Z(\Delta)\|^2 = \|I_\Delta\|^2 = \mu(\Delta)$. Therefore, crucially, the structure function of $Z$ is the spectral measure $\mu$.
On the set of simple functions, the isomorphism $U^{-1}$ defined above agrees with integration with respect to $Z$. Therefore, for any $f \in L^2([0,1], \mu)$, $U^{-1}(f) = \int f\, dZ$ $P$-almost surely. In particular, this is true for $f(\lambda) = e^{-2\pi i \lambda t}$.
The distribution function associated to $\mu$ is sometimes called the spectral function of the time series $\{\xi_t\}$. Its stochastic analog is a stochastic process with orthogonal increments indexed by $\lambda$ and defined using $Z$: $Z_\lambda = Z([0, \lambda])$.
L2-ergodic theorem
0 should be replaced with one-half
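The dominated convergence theorem yields that
$$\frac{1}{n} \sum_{t=0}^{n-1} \int e^{-2\pi i \lambda t}\, d\mu(\lambda) \to \int I_{\{0\}}(\lambda)\, d\mu(\lambda) = \mu(\{0\}) \quad (n \to \infty),$$
since the averages $\frac{1}{n} \sum_{t=0}^{n-1} e^{-2\pi i \lambda t}$ are bounded by 1 and converge pointwise to the indicator function $I_{\{0\}}(\lambda)$ (with the endpoints of $[0,1]$ identified).
In terms of the autocovariance function $R(t)$,
$$\frac{1}{n} \sum_{t=0}^{n-1} R(t) \to \mu(\{0\}).$$
Similarly, in $L^2([0,1], \mu)$,
$$\frac{1}{n} \sum_{t=0}^{n-1} e^{-2\pi i \lambda t} \to I_{\{0\}}(\lambda).$$
Via the unitary operator $\int (\cdot)\, dZ$, we have the $L^2$-ergodic theorem for stationary time series:
Theorem For any stationary time series $\{\xi_t\}$ with mean $m$ and corresponding stochastic measure $Z$ (that of the centered series $\xi_t - m$),
$$\frac{1}{n} \sum_{t=0}^{n-1} \xi_t \to m + Z(\{0\}) \quad \text{in } L^2.$$
In particular, when $\mu(\{0\}) = 0$, the arithmetic mean (sample average) $\frac{1}{n} \sum_{t=0}^{n-1} \xi_t$ of the time series converges to its true mean $m$ in $L^2$. Conversely, when $\frac{1}{n} \sum_{t=0}^{n-1} \xi_t$ converges to $m$ in $L^2$, then $\mu(\{0\})$ must be 0 by the Cauchy-Schwarz inequality. In other words, an $L^2$-law of large numbers holds for a stationary time series if and only if $\mu(\{0\}) = 0$.
When $m = 0$ and $\mu(\{0\}) \neq 0$ (and consequently $Z(\{0\}) = \alpha \neq 0$ in $L^2$), one can apply the same calculation to the modified series $\eta_t = \xi_t - \alpha$ and obtain that $\frac{1}{n} \sum_{t=0}^{n-1} \xi_t$ converges to the "random constant" $\alpha$ in $L^2$.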
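The averaging argument behind the ergodic theorem can be observed numerically. The Python sketch below, under the assumption that harmonics are written $e^{-2\pi i \lambda t}$ and averages run over $t = 0, \dots, n-1$, computes the Cesàro means of a pure harmonic and of the autocovariance of a purely atomic spectral measure. The measure used (mass 0.4 at $\lambda = 0$ and mass 0.6 at $\lambda = 0.3$) and the function names are illustrative assumptions.

```python
import cmath

def cesaro_mean_of_harmonics(lam, n):
    """(1/n) * sum_{k=0}^{n-1} exp(-2*pi*i*lam*k)."""
    return sum(cmath.exp(-2j * cmath.pi * lam * k) for k in range(n)) / n

def cesaro_mean_of_autocovariance(atoms, n):
    """(1/n) * sum_{k=0}^{n-1} R(k) for a purely atomic measure {lambda: mass}.

    For such a measure R(k) = sum over atoms of mass * exp(-2*pi*i*lambda*k).
    """
    return sum(mass * cesaro_mean_of_harmonics(lam, n) for lam, mass in atoms.items())

atoms = {0.0: 0.4, 0.3: 0.6}  # assumed spectral measure with an atom at frequency 0
for n in (10, 100, 1000):
    print(n, abs(cesaro_mean_of_harmonics(0.3, n)),
          abs(cesaro_mean_of_autocovariance(atoms, n)))
```

The first printed column of magnitudes shrinks toward 0 while the second approaches the mass at frequency 0, matching the limit $\mu(\{0\})$.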
Filtering
The proof of the spectral resolution theorem explicitly constructs a unitary operator from $L^2([0,1], \mu)$ to $L^2(\xi)$, given by integration with respect to $Z$. Thus the theorem can be rephrased as follows:
Corollary For any $\eta$ in $L^2(\xi)$, there exists a unique $\varphi$ in $L^2([0,1], \mu)$ such that $\eta = \int \varphi\, dZ(\lambda)$. The image of $\eta$ under $U$ is $\varphi$.
In other words, any linear combination of $\{\xi_t\}$ (and any $L^2$-limit of such combinations) can be obtained by integrating some $\varphi$ in $L^2([0,1], \mu)$ with respect to $Z$.
Of particular interest among such linear transformations are linear filters. Formally, a filter is represented by convolution with an $\ell^1$- or $\ell^2$-sequence $\{h(s)\}_{s \in \mathbb{Z}}$. After receiving as input the time series $\{\xi_t\}$, the resulting output of the filter is
$$\eta_t = \sum_{s \in \mathbb{Z}} h(s)\, \xi_{t-s}.$$
The implementing sequence $\{h(s)\}$ is called the impulse response of the filter. A filter is said to be physically realizable if $h(s) = 0$ for all $s < 0$, i.e. the output of the system depends only on present and past values of the input. A moving-average process is obtained by filtering a white-noise process, and the filter is physically realizable if the moving average is one-sided.
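Assuming the series defining $\eta_t$ converges in $L^2$, each $\eta_t$ lies in $L^2(\xi)$ and therefore must be of the form $\eta_t = \int \varphi_t\, dZ(\lambda)$ for some $\varphi_t$. In fact, since $\xi_{t-s} = \int e^{-2\pi i \lambda (t-s)}\, dZ(\lambda)$ for a filter output of the form $\eta_t = \sum_s h(s)\, \xi_{t-s}$,
$$\eta_t = \int e^{-2\pi i \lambda t}\, \varphi(\lambda)\, dZ(\lambda),$$
where $\varphi(\lambda) = \sum_{s \in \mathbb{Z}} h(s)\, e^{2\pi i \lambda s}$ is the Fourier transform of $h$; it is also called the spectral characteristic of the filter. In other words, in the $\lambda$-domain the frequency content of the input $\{\xi_t\}$ is filtered by $\varphi(\lambda)$.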
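The role of the spectral characteristic can be illustrated on a single deterministic harmonic. The Python sketch below assumes the filter acts as $\eta_t = \sum_s h(s)\, x_{t-s}$ and feeds it the input $x_t = e^{-2\pi i \lambda t}$; the output is then the same harmonic multiplied by $\sum_s h(s)\, e^{2\pi i \lambda s}$. The sign in that exponent follows from the harmonic convention used here and would flip under the opposite convention. The impulse response values and function names are illustrative assumptions.

```python
import cmath

def apply_filter(h, x, t):
    """eta_t = sum_s h[s] * x(t - s) for a finite impulse response h = {s: h(s)}."""
    return sum(hs * x(t - s) for s, hs in h.items())

def spectral_characteristic(h, lam):
    """phi(lam) = sum_s h[s] * exp(2*pi*i*lam*s), for harmonics exp(-2*pi*i*lam*t)."""
    return sum(hs * cmath.exp(2j * cmath.pi * lam * s) for s, hs in h.items())

lam = 0.15  # assumed test frequency

def harmonic(t):
    return cmath.exp(-2j * cmath.pi * lam * t)

# A physically realizable filter: h(s) = 0 for s < 0 (illustrative values).
h = {0: 0.5, 1: 0.3, 2: 0.2}
for t in range(3):
    print(t, apply_filter(h, harmonic, t), spectral_characteristic(h, lam) * harmonic(t))
```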
By the above calculation, a moving average process necessarily has a spectral density. In fact, the converse holds also: any stationary sequence with spectral density can be represented as a moving-average process (on a possibly "larger" probability space).
Characterization of process with "squared" spectral density. (One-sided MA)
Characterization of process with rational spectral density. (ARMA)
Statistical estimation
Consider a stationary time series {ξt} of mean m, autocovariance function R(t), and spectral density f.
For mean
Given an observation $x = (x_0, \dots, x_{N-1})$ of size $N$ from $\xi_0, \dots, \xi_{N-1}$, the sample mean is
$$m_N(x) = \frac{1}{N} \sum_{k=0}^{N-1} x_k.$$
By linearity of expectation, $m_N$ is an unbiased estimator for the true mean $m$. By the ergodic theorem above, $m_N$ is also a consistent estimator in the $L^2$-sense (the existence of the spectral density implies that $\mu$ has no atoms, so in particular $\mu(\{0\}) = 0$).
For autocovariance function
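For the autocovariance function $R(n)$, it is natural to define the following estimator based on $N$ observations $x = (x_0, \dots, x_{N-1})$, where $0 \le n < N$ (assuming the mean $m = 0$ is known):
$$R_N(n; x) = \frac{1}{N-n} \sum_{k=0}^{N-n-1} x_{k+n}\, \overline{x_k}.$$
This is an unbiased estimator of $R(n)$:
$$E\, R_N(n; \xi) = R(n).$$
Next we consider $L^2$-consistency. Fix $n$ and consider the series $\{\eta_t\} = \{\xi_{t+n}\, \overline{\xi_t}\}$. Each $\eta_t$ has the same mean $R(n)$. If this is again a stationary time series, and the hypothesis of the $L^2$-law of large numbers is satisfied, then consistency holds:
$$\frac{1}{N} \sum_{t=0}^{N-1} \eta_t \to R(n) \quad \text{in } L^2,$$
i.e.
$$E\, |R_N(n; \xi) - R(n)|^2 \to 0 \quad (N \to \infty).$$
A special case under which these conditions can be easily characterized is when $\{\xi_t\}$ is a real-valued Gaussian stationary series with mean 0. For jointly normal random variables, the means and the variance-covariance matrix specify the joint distribution. So the Gaussian assumption implies that $\{\eta_t\}$ is wide-sense stationary. Its autocovariance function is given by
$$R_\eta(k) = E\big[ (\eta_{t+k} - R(n)) (\eta_t - R(n)) \big] = R(k)^2 + R(k+n)\, R(k-n).$$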
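The sample mean and the autocovariance estimator discussed in this section can be sketched in a few lines of Python. The normalization $1/(N-n)$ and the assumption of a known mean of zero in the autocovariance estimator are choices made for this sketch (they are what makes the estimator unbiased); the function names and the toy data are hypothetical.

```python
def sample_mean(x):
    """m_N(x) = (1/N) * sum_k x[k]."""
    return sum(x) / len(x)

def sample_autocovariance(x, n):
    """(1/(N-n)) * sum_{k=0}^{N-n-1} x[k+n] * conj(x[k]), assuming a known mean of zero."""
    N = len(x)
    if not 0 <= n < N:
        raise ValueError("lag n must satisfy 0 <= n < len(x)")
    return sum(x[k + n] * complex(x[k]).conjugate() for k in range(N - n)) / (N - n)

# Toy observation (assumed data): an alternating sequence with sample mean 0.
x = [1.0, -1.0, 1.0, -1.0]
print(sample_mean(x))
print([sample_autocovariance(x, n) for n in range(3)])
```

For larger lags $n$ fewer products enter the average, so the estimate becomes noisier as $n$ approaches $N$.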
For spectral density
Assume the spectral density $f(\lambda)$ exists. Then the autocovariance function $R(t)$ is the Fourier transform of $f$:
$$R(t) = \int_0^1 e^{-2\pi i \lambda t} f(\lambda)\, d\lambda.$$
Recovering the $L^1$ function $f$ on the circle from its Fourier coefficients $R(t)$ is a classical problem in Fourier analysis. The difficulty is due to the fact that the Fourier inversion theorem only applies to $f$ in $L^1(\mathbb{T})$ whose Fourier transform is an $\ell^1$-sequence. Even for a continuous $f$, the symmetric partial sum
$$S_m(f)(\lambda) = \sum_{t=-m}^{m} R(t)\, e^{2\pi i \lambda t}$$
can diverge. (In fact there is a residual set of continuous functions in $C(\mathbb{T})$ for which $S_m(f)$ diverges on a dense subset of $\mathbb{T}$. See the article Convergence of Fourier series.)
The classical remedy is to introduce a summability kernel $\Phi_s(t)$, which should have the following properties:
- $(\Phi_s(t))_{t \in \mathbb{Z}}$ forms an approximate unit, as $s \to 0$, in the Banach algebra $c_0$ of sequences vanishing at infinity.
- For each $s$, $(\Phi_s(t))$ lies in the domain of the Fourier inversion theorem.
Then by the inversion theorem,
$$\sum_{t \in \mathbb{Z}} \Phi_s(t)\, R(t)\, e^{2\pi i \lambda t} = (\hat{\Phi}_s * f)(\lambda)$$
converges to $f$ in $L^1$ and, if $f$ is continuous, uniformly as $s \to 0$. This works because the Fourier transforms $\hat{\Phi}_s(\lambda)$ of $\Phi_s(t)$ form an approximate unit in the convolution algebra $L^1(\mathbb{T})$.
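One example of a summability kernel is the Fejér kernel (let $s = 1/N$):
$$\Phi_{1/N}(t) = \begin{cases} 1 - |t|/N, & |t| < N, \\ 0, & |t| \ge N. \end{cases}$$
It has Fourier transform
$$\hat{\Phi}_{1/N}(\lambda) = \sum_{|t| < N} \left( 1 - \frac{|t|}{N} \right) e^{2\pi i \lambda t} = \frac{1}{N} \left( \frac{\sin(\pi N \lambda)}{\sin(\pi \lambda)} \right)^2.$$
In the context of estimating the spectral density of a stationary time series, the same techniques apply, but one needs to replace $R(t)$ by an appropriate estimator.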
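The Fejér weights and their transform are easy to check numerically. The Python sketch below assumes the normalization $\Phi_{1/N}(t) = 1 - |t|/N$ for $|t| < N$ and compares the trigonometric sum with the closed form $\frac{1}{N}(\sin(\pi N \lambda)/\sin(\pi \lambda))^2$. It also forms the Fejér-weighted sum $\sum_t \Phi_{1/N}(t) R(t) e^{2\pi i \lambda t}$ for a given autocovariance function. The function names are hypothetical, and white noise is used only as a trivial test input.

```python
import cmath
import math

def fejer_weights(N):
    """Weights 1 - |t|/N for |t| < N (assumed normalization)."""
    return {t: 1.0 - abs(t) / N for t in range(-N + 1, N)}

def fejer_kernel_sum(N, lam):
    """sum_{|t|<N} (1 - |t|/N) * exp(2*pi*i*lam*t); real because the weights are even."""
    return sum(w * cmath.exp(2j * cmath.pi * lam * t)
               for t, w in fejer_weights(N).items()).real

def fejer_kernel_closed_form(N, lam):
    """(1/N) * (sin(pi*N*lam) / sin(pi*lam))**2, with the limiting value N at integer lam."""
    s = math.sin(math.pi * lam)
    if abs(s) < 1e-15:
        return float(N)
    return (math.sin(math.pi * N * lam) / s) ** 2 / N

def fejer_mean(R, N, lam):
    """Fejer-weighted sum sum_t Phi(t) * R(t) * exp(2*pi*i*lam*t) for a callable R."""
    return sum(w * R(t) * cmath.exp(2j * cmath.pi * lam * t)
               for t, w in fejer_weights(N).items())

def white_noise_R(t):
    return 1.0 if t == 0 else 0.0

print(fejer_kernel_sum(8, 0.1), fejer_kernel_closed_form(8, 0.1))
print(fejer_mean(white_noise_R, 8, 0.1))
```

In an estimation setting the callable `R` would be replaced by an estimator of the autocovariance function, as the text indicates.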
Wold decomposition
The spectral representation gives an integral decomposition of a stationary time series in the frequency domain; it provides a Fourier-type analysis for stationary time series. In contrast, Wold's decomposition expresses a stationary time series as the sum of "deterministic" and "completely nondeterministic" parts in the time domain by using geometric features of Hilbert space.
For a stationary time series $\{\xi_t\}$, denote by $L^2(\xi)$ the Hilbert subspace generated by $\{\xi_t\}_{t \in \mathbb{Z}}$ and by $L^2_t(\xi)$ the Hilbert subspace generated by $\{\xi_t, \xi_{t-1}, \xi_{t-2}, \dots\}$. Define
$$S(\xi) = \bigcap_{t \in \mathbb{Z}} L^2_t(\xi), \qquad R(\xi) = L^2(\xi) \ominus S(\xi).$$
Then $L^2(\xi)$ can be written as an orthogonal sum
$$L^2(\xi) = S(\xi) \oplus R(\xi).$$
Each $\xi_t$ then is a corresponding orthogonal sum $\xi_t = \xi_t^r + \xi_t^s$ where $\xi_t^r \in R(\xi)$ and $\xi_t^s \in S(\xi)$. Informally, the sequence $\{\xi_t^s\}$ is the part of $\{\xi_t\}$ that lives in the infinite past ("at the beginning of time") and is the deterministic part of $\{\xi_t\}$.
More precisely, a time series $\{\eta_t\}$ is called deterministic if $S(\eta) = L^2(\eta)$ and completely nondeterministic if $R(\eta) = L^2(\eta)$. For $\{\xi_t^r\}$, $S(\xi^r) \perp S(\xi)$ because every $\xi_t^r$ is orthogonal to $S(\xi)$ by definition. But $S(\xi^r) \subset S(\xi)$ also, which implies $S(\xi^r) = \{0\}$. So $\{\xi_t^r\}$ is completely nondeterministic. For $\{\xi_t^s\}$, $S(\xi^s) \subset L^2(\xi^s) \subset S(\xi)$. But $S(\xi) \subset L^2_t(\xi^s)\ (\oplus\, L^2_t(\xi^r))$ for all $t$. So $S(\xi) \subset S(\xi^s)$. This shows $S(\xi^s) = L^2(\xi^s)$, i.e. $\{\xi_t^s\}$ is deterministic. One can also show this decomposition is unique. In summary, we have the following theorem.
Theorem For any stationary time series $\{\xi_t\}$, there exists a unique pair of time series $\{\xi_t^r\}$ and $\{\xi_t^s\}$ such that
- $\xi_t = \xi_t^r + \xi_t^s$ for all $t$.
- $\{\xi_t^r\}$ and $\{\xi_t^s\}$ are orthogonal.
- $\{\xi_t^r\}$ is completely nondeterministic and $\{\xi_t^s\}$ is deterministic.
Remark Wold's decomposition has a counterpart in operator theory, which bears the same name. The operator version says that any isometry on a Hilbert space can be decomposed into a unitary part and a completely nonunitary part (a unilateral shift). These correspond to the deterministic and completely nondeterministic parts of a time series respectively.
Characterization of completely nondeterministic time series as one-sided moving averages
Let $\{\varepsilon_t\}$ be a white-noise process. A one-sided moving average is an immediate example of a completely nondeterministic time series:
$$\xi_t = \sum_{k=0}^{\infty} a_k\, \varepsilon_{t-k}$$
for some $\ell^1$-sequence $\{a_k\}$. This in fact characterizes completely nondeterministic processes, i.e. they can all be viewed as the output signal of a physically realizable filter whose input is white noise.
A white-noise process $\{\varepsilon_t\}$ is said to be an innovation process for $\{\xi_t\}$ if $L^2_t(\varepsilon) = L^2_t(\xi)$ for all $t$. "Innovation" means that $\varepsilon_{t+1}$ provides the "new information" that is needed, together with the past, to form $\xi_{t+1}$.
Theorem A stationary time series $\{\xi_t\}$ is completely nondeterministic if and only if it is a one-sided moving average, i.e.
$$\xi_t = \sum_{k=0}^{\infty} a_k\, \varepsilon_{t-k}$$
for some $(a_k) \in \ell^2$ and some $\{\varepsilon_t\}$ that is an innovation process for $\{\xi_t\}$. The convergence of the series holds in the $L^2$-sense.
As stated above, sufficiency holds by definition. Necessity follows from the Gram-Schmidt procedure as follows. Fix $t$. Let $\varepsilon_t$ be a unit vector in
$$L^2_t(\xi) \ominus L^2_{t-1}(\xi)$$
and let $a_0 \varepsilon_t$ be the projection of $\xi_t$ onto $\varepsilon_t$. By stationarity and the assumption that $\{\xi_t\}$ is completely nondeterministic, for each $s$ the subspace
$$L^2_s(\xi) \ominus L^2_{s-1}(\xi)$$
is one-dimensional. (If any one of them is $\{0\}$, then it is $\{0\}$ for every $s$ by stationarity, in which case $\{\xi_t\}$ is trivially deterministic.) So this procedure must produce an orthonormal basis $\{\varepsilon_t, \varepsilon_{t-1}, \dots\}$ for $L^2_t(\xi)$ and we have
$$\xi_t = \sum_{k=0}^{\infty} a_k\, \varepsilon_{t-k},$$
where $\{\varepsilon_t\}$ is an innovation process for $\{\xi_t\}$ by construction. The coefficients $a_k$ produced are independent of $t$ by covariance-stationarity. This proves the theorem.
This gives a refinement of the Wold decomposition.
Corollary