In information theory, a result is known as a converse if it provides an upper bound on the achievable rate for a given channel. Thus a converse, combined with a matching achievability result (lower bound), determines the capacity of a channel. In general, two types of converse results have been studied. The first type, known as a weak converse, provides an upper bound on the achievable rate for a given probability of error ε. This implies that if a transmitter sends at a rate higher than the bound provided by the weak converse, the probability of error will be greater than ε. A strong converse, on the other hand, provides a much stronger result: if a transmitter sends at a rate higher than the given bound, the probability of error will not only be greater than ε, but will converge to one as codes with larger blocklengths are used.
In Shannon's noisy-channel coding theorem, Fano's inequality is used to obtain a weak converse. However, while Fano's inequality provides a non-vanishing lower bound for the probability of error when the transmitter's rate is above the channel capacity, it is not sufficient to prove that the probability of error converges to one when the blocklength goes to infinity.
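Concretely, for a code with $2^{nR}$ equally likely messages and blocklength $n$, Fano's inequality together with the bound $I(X^n;Y^n) \leq nC$ for a memoryless channel gives
$$nR \leq 1 + P_e^{(n)}\, nR + nC,$$
where $P_e^{(n)}$ denotes the average probability of error, and hence
$$P_e^{(n)} \geq 1 - \frac{C}{R} - \frac{1}{nR}.$$
For $R > C$ this keeps the probability of error bounded away from zero, but it only approaches $1 - C/R < 1$ as $n$ grows; it does not force the probability of error toward one.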
Let $\mathcal{X}$ and $\mathcal{Y}$ be sets. A channel, with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$, is a sequence of conditional probability distributions $P^{(n)}(y^n \mid x^n)$, $n = 1, 2, \ldots$, where $x^n = (x_1, \ldots, x_n) \in \mathcal{X}^n$ and $y^n = (y_1, \ldots, y_n) \in \mathcal{Y}^n$.
The channel is said to be discrete if both $\mathcal{X}$ and $\mathcal{Y}$ are finite sets. A channel is called memoryless if for every positive integer $n$ we have
$$P^{(n)}(y^n \mid x^n) = \prod_{k=1}^{n} P_k(y_k \mid x_k)$$
for some sequence of single-letter conditional distributions $P_1, P_2, \ldots$.
We say a channel is stationary if every stationary input results in a stationary output. A memoryless channel is stationary if the single-letter distributions $P_k(y \mid x)$ are equal for all $k$; in that case we denote the common distribution by $P(y \mid x)$.
Therefore a stationary memoryless channel can be simply represented as the triple $(\mathcal{X}, P(y \mid x), \mathcal{Y})$.
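For example, the binary symmetric channel with crossover probability $p$ is the stationary memoryless channel $(\{0,1\}, P(y \mid x), \{0,1\})$ with $P(y \mid x) = 1 - p$ when $y = x$ and $P(y \mid x) = p$ when $y \neq x$.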
A $(2^{nR}, n)$ code consists of a message set
$$\{1, 2, \ldots, 2^{nR}\},$$
an encoder
$$f_n : \{1, 2, \ldots, 2^{nR}\} \to \mathcal{X}^n,$$
a decoder
$$g_n : \mathcal{Y}^n \to \{1, 2, \ldots, 2^{nR}\}.$$
The average probability of error of the code is given by
$$P_e^{(n)} = \frac{1}{2^{nR}} \sum_{m=1}^{2^{nR}} \Pr\big\{ g_n(Y^n) \neq m \,\big|\, X^n = f_n(m) \big\}.$$
The value of n is known as the blocklength of the code.
A rate R (which is a nonnegative number) is said to be achievable if there exists a sequence of $(2^{nR}, n)$ codes with $P_e^{(n)}$ going to zero as n goes to infinity. The noisy-channel coding theorem states that every rate below the capacity C of the channel is achievable, and that no rate above C is achievable, where
$$C = \max_{p(x)} I(X;Y)$$
and the maximum is taken over all input distributions $p(x)$ on $\mathcal{X}$.
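As a numerical illustration (a minimal sketch, not part of the original exposition), the following Python snippet estimates the capacity of a binary symmetric channel by a brute-force maximization of $I(X;Y)$ over input distributions and compares the result with the closed form $1 - H_b(p)$. The helper name `mutual_information` and the channel matrix `W` are illustrative choices.

```python
import numpy as np

def mutual_information(p_x, W):
    """I(X;Y) in bits for input distribution p_x and channel matrix W[x, y] = P(y|x)."""
    p_xy = p_x[:, None] * W              # joint distribution of (X, Y)
    p_y = p_xy.sum(axis=0)               # output marginal
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y)[mask])).sum())

# Binary symmetric channel with crossover probability 0.1
p = 0.1
W = np.array([[1 - p, p],
              [p, 1 - p]])

# Brute-force search over input distributions (the uniform input turns out to be optimal)
alphas = np.linspace(1e-6, 1 - 1e-6, 10001)
capacity = max(mutual_information(np.array([a, 1 - a]), W) for a in alphas)

closed_form = 1 - (-p * np.log2(p) - (1 - p) * np.log2(1 - p))
print(capacity, closed_form)             # both approximately 0.531 bits per channel use
```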
Wolfowitz's theorem states that for any discrete memoryless channel with capacity C and any $(2^{nR}, n)$ code with rate $R > C$,
$$P_e^{(n)} \geq 1 - \frac{4A}{n(R-C)^2} - 2^{-n(R-C)/2}$$
for some positive constant A that depends on the channel but not on n or on the number of codewords $M = 2^{nR}$. In particular, the right-hand side converges to one as n goes to infinity. The proof which follows is based on Gallager's book.[1]
For the proof we first require a lemma. This lemma is essentially a special case of the method of Lagrange multipliers for a concave function defined on the standard simplex
$$\Delta = \Big\{ (p_1, \ldots, p_K) : p_k \geq 0, \ \sum_{k=1}^{K} p_k = 1 \Big\}.$$
It is then followed by a corollary which simply applies the lemma to the mutual information.
Let $f : \Delta \to \mathbb{R}$ be a concave function. Suppose $f$ has continuous partial derivatives on its domain. Then $\mathbf{p}^* = (p_1^*, \ldots, p_K^*) \in \Delta$ maximizes $f$ iff there exists some real $\lambda$ such that for every $k$ with $p_k^* > 0$,
$$\frac{\partial f}{\partial p_k}(\mathbf{p}^*) = \lambda,$$
and for every $k$ with $p_k^* = 0$,
$$\frac{\partial f}{\partial p_k}(\mathbf{p}^*) \leq \lambda.$$
Suppose $\mathbf{p}^*$ satisfies the above conditions. We'll show that $f$ achieves its maximum at $\mathbf{p}^*$. Let $\mathbf{q} = (q_1, \ldots, q_K)$ be any element of $\Delta$. By the concavity of $f$, for any $\theta \in (0, 1]$ we have
$$f\big(\mathbf{p}^* + \theta(\mathbf{q} - \mathbf{p}^*)\big) \geq (1 - \theta) f(\mathbf{p}^*) + \theta f(\mathbf{q}),$$
thus
$$\frac{f\big(\mathbf{p}^* + \theta(\mathbf{q} - \mathbf{p}^*)\big) - f(\mathbf{p}^*)}{\theta} \geq f(\mathbf{q}) - f(\mathbf{p}^*).$$
Allowing $\theta \to 0$ and making use of the continuity of partial derivatives results in
$$f(\mathbf{q}) - f(\mathbf{p}^*) \leq \sum_{k=1}^{K} \frac{\partial f}{\partial p_k}(\mathbf{p}^*)\,(q_k - p_k^*) \leq \lambda \sum_{k=1}^{K} (q_k - p_k^*) = 0,$$
where the second inequality holds because $\frac{\partial f}{\partial p_k}(\mathbf{p}^*) = \lambda$ whenever $p_k^* > 0$, while for $p_k^* = 0$ we have $\frac{\partial f}{\partial p_k}(\mathbf{p}^*) \leq \lambda$ and $q_k - p_k^* = q_k \geq 0$. Hence $f(\mathbf{q}) \leq f(\mathbf{p}^*)$.
For the other direction, suppose $\mathbf{p}^*$ maximizes $f$. Then for every $\mathbf{q} \in \Delta$ and every $\theta \in (0, 1]$,
$$f\big(\mathbf{p}^* + \theta(\mathbf{q} - \mathbf{p}^*)\big) \leq f(\mathbf{p}^*).$$
This implies
$$\frac{f\big(\mathbf{p}^* + \theta(\mathbf{q} - \mathbf{p}^*)\big) - f(\mathbf{p}^*)}{\theta} \leq 0,$$
and by the continuity of the partial derivatives,
$$\sum_{k=1}^{K} \frac{\partial f}{\partial p_k}(\mathbf{p}^*)\,(q_k - p_k^*) \leq 0. \qquad (1)$$
Since $\sum_{k} p_k^* = 1$, at least one of its components, say $p_j^*$, is strictly positive. Now let $k$ be an arbitrary element of $\{1, \ldots, K\}$. Furthermore, for every $i$, let $\mathbf{e}_i$ denote the element of $\mathbb{R}^K$ that consists of all zeros but a single one in the $i$-th position. Define
$$\mathbf{q} = \mathbf{p}^* + p_j^* (\mathbf{e}_k - \mathbf{e}_j).$$
Then inequality (1) simplifies to
$$\frac{\partial f}{\partial p_k}(\mathbf{p}^*) \leq \frac{\partial f}{\partial p_j}(\mathbf{p}^*).$$
In addition, if $p_k^* > 0$ and we define
$$\mathbf{q} = \mathbf{p}^* + p_k^* (\mathbf{e}_j - \mathbf{e}_k),$$
then (1) results in
$$\frac{\partial f}{\partial p_j}(\mathbf{p}^*) \leq \frac{\partial f}{\partial p_k}(\mathbf{p}^*).$$
Thus if we define $\lambda$ as
$$\lambda = \frac{\partial f}{\partial p_j}(\mathbf{p}^*),$$
the result follows.
For any discrete memoryless channel, the input distribution $p^*(x)$ achieves capacity iff there exists a real number $\lambda$ such that
$$\sum_{y} P(y \mid x) \log \frac{P(y \mid x)}{q^*(y)} = \lambda$$
for $x$ with $p^*(x) > 0$, and
$$\sum_{y} P(y \mid x) \log \frac{P(y \mid x)}{q^*(y)} \leq \lambda$$
for $x$ with $p^*(x) = 0$, where
$$q^*(y) = \sum_{x} p^*(x) P(y \mid x).$$
Furthermore, $\lambda$ is the capacity of the channel.
The proof of the corollary is straightforward and follows directly from the lemma.
To see this, note that for any input distribution $p(x)$, with $q(y) = \sum_{x} p(x) P(y \mid x)$, and for any $x' \in \mathcal{X}$,
$$\frac{\partial I(X;Y)}{\partial p(x')} = \sum_{y} P(y \mid x') \log \frac{P(y \mid x')}{q(y)} - \log e.$$
Since
$$I(X;Y) = \sum_{x} p(x) \sum_{y} P(y \mid x) \log \frac{P(y \mid x)}{q(y)},$$
this implies
$$\sum_{x} p(x)\, \frac{\partial I(X;Y)}{\partial p(x)} = I(X;Y) - \log e.$$
Now using the lemma (the mutual information is a concave function of $p(x)$ with continuous partial derivatives), the claim follows; in particular, multiplying the equality condition by $p^*(x)$ and summing over $x$ shows that $\lambda$ equals the maximized mutual information, i.e. the capacity.
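As a sanity check (an illustrative sketch, not part of the original text), the following Python snippet verifies the condition of the corollary for a binary symmetric channel with the uniform input distribution: the divergences $\sum_y P(y \mid x) \log \frac{P(y \mid x)}{q^*(y)}$ coincide for both inputs and equal the capacity $1 - H_b(p)$. The variable names are arbitrary.

```python
import numpy as np

# Binary symmetric channel with crossover probability 0.1
p = 0.1
W = np.array([[1 - p, p],
              [p, 1 - p]])        # W[x, y] = P(y | x)

p_star = np.array([0.5, 0.5])     # candidate capacity-achieving input distribution
q_star = p_star @ W               # induced output distribution

# KL divergence D(P(.|x) || q*) in bits, for each input symbol x
divergences = (W * np.log2(W / q_star)).sum(axis=1)

capacity = 1 + p * np.log2(p) + (1 - p) * np.log2(1 - p)   # 1 - H_b(p)
print(divergences)   # both entries approximately 0.531, equal to the capacity,
print(capacity)      # confirming the condition of the corollary
```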
Proof of the Strong Converse
For any two random variables X and Y define the information density as
$$i(x;y) = \log \frac{P(y \mid x)}{p_Y(y)},$$
where $p_Y$ denotes the distribution of $Y$. Note that
$$\mathbb{E}\big[\, i(X;Y) \,\big] = I(X;Y)$$
and
$$\mathbb{E}\big[\, i(X;Y) \mid X = x \,\big] = \sum_{y} P(y \mid x) \log \frac{P(y \mid x)}{p_Y(y)}.$$
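The following short Python check (an illustrative sketch with an arbitrarily chosen channel and input) confirms these two identities numerically for a binary symmetric channel with a uniform input.

```python
import numpy as np

p = 0.1
W = np.array([[1 - p, p],
              [p, 1 - p]])          # W[x, y] = P(y | x) for a binary symmetric channel
p_x = np.array([0.5, 0.5])
p_y = p_x @ W                        # distribution of Y

i_dens = np.log2(W / p_y)            # information density i(x; y) = log2(P(y|x) / p_Y(y))
p_xy = p_x[:, None] * W              # joint distribution of (X, Y)

print((p_xy * i_dens).sum())         # E[i(X;Y)] = I(X;Y), about 0.531 bits
print((W * i_dens).sum(axis=1))      # E[i(X;Y) | X = x] for each x
```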
Let $q^*$ be the capacity-achieving output distribution, i.e., the output distribution induced by a capacity-achieving input distribution. For any positive integer n, define
$$q^{*n}(y^n) = \prod_{k=1}^{n} q^*(y_k).$$
For any $x^n \in \mathcal{X}^n$ and $y^n \in \mathcal{Y}^n$, define
$$i_n(x^n; y^n) = \log \frac{P^n(y^n \mid x^n)}{q^{*n}(y^n)} = \sum_{k=1}^{n} \log \frac{P(y_k \mid x_k)}{q^*(y_k)},$$
where
$$P^n(y^n \mid x^n) = \prod_{k=1}^{n} P(y_k \mid x_k).$$
Consider a $(2^{nR}, n)$ code with codewords $x^n(1), \ldots, x^n(M)$, where $M = 2^{nR}$, and decoding regions $D_1, \ldots, D_M$, where $D_m = g_n^{-1}(m) \subseteq \mathcal{Y}^n$. Then the probability that a codeword is decoded correctly is given by
$$1 - P_e^{(n)} = \frac{1}{M} \sum_{m=1}^{M} P^n\big(D_m \mid x^n(m)\big).$$
Fix positive ε. For every m, define the set
$$B_m = \big\{ y^n \in \mathcal{Y}^n : i_n\big(x^n(m); y^n\big) \geq n(C + \epsilon) \big\}.$$
Then
$$1 - P_e^{(n)} = \frac{1}{M} \sum_{m=1}^{M} P^n\big(D_m \cap B_m \mid x^n(m)\big) + \frac{1}{M} \sum_{m=1}^{M} P^n\big(D_m \cap B_m^c \mid x^n(m)\big).$$
Based on the definition of $B_m$, every $y^n \in B_m^c$ satisfies $P^n(y^n \mid x^n(m)) < 2^{n(C+\epsilon)}\, q^{*n}(y^n)$, so the second sum can be upper bounded as
$$\frac{1}{M} \sum_{m=1}^{M} P^n\big(D_m \cap B_m^c \mid x^n(m)\big) \leq \frac{1}{M} \sum_{m=1}^{M} 2^{n(C+\epsilon)}\, q^{*n}(D_m) \leq \frac{2^{n(C+\epsilon)}}{M},$$
where the last step uses the fact that the decoding regions $D_1, \ldots, D_M$ are disjoint, so $\sum_{m=1}^{M} q^{*n}(D_m) \leq 1$.
Using Chebyshev's inequality we can find an upper bound on the first sum:
$$P^n\big(B_m \mid x^n(m)\big) \leq \frac{\operatorname{Var}\big[\, i_n(x^n(m); Y^n) \mid X^n = x^n(m) \,\big]}{(n\epsilon)^2},$$
where we have used the corollary, which guarantees that $\sum_{y} P(y \mid x) \log \frac{P(y \mid x)}{q^*(y)} \leq C$ for every $x \in \mathcal{X}$, so that
$$\mathbb{E}\big[\, i_n(x^n(m); Y^n) \mid X^n = x^n(m) \,\big] \leq nC.$$
If we define
$$A = \max_{x \in \mathcal{X}} \operatorname{Var}\Big[\, \log \frac{P(Y \mid x)}{q^*(Y)} \,\Big|\, X = x \Big]$$
(note that A only depends on the channel and is independent of n and M) then, since given $X^n = x^n(m)$ the quantity $i_n$ is a sum of n independent terms, its conditional variance is at most $nA$ and
$$\frac{1}{M} \sum_{m=1}^{M} P^n\big(D_m \cap B_m \mid x^n(m)\big) \leq \frac{1}{M} \sum_{m=1}^{M} P^n\big(B_m \mid x^n(m)\big) \leq \frac{nA}{(n\epsilon)^2} = \frac{A}{n\epsilon^2}.$$
Therefore
$$1 - P_e^{(n)} \leq \frac{A}{n\epsilon^2} + \frac{2^{n(C+\epsilon)}}{M}.$$
Since $M = 2^{nR}$, we get
$$P_e^{(n)} \geq 1 - \frac{A}{n\epsilon^2} - 2^{-n(R - C - \epsilon)}.$$
Should R>C, setting ε to $(R - C)/2$ results in
$$P_e^{(n)} \geq 1 - \frac{4A}{n(R - C)^2} - 2^{-n(R - C)/2},$$
which converges to one as n goes to infinity. This proves the strong converse.
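To see the bound in action, the following Python sketch (an illustration only; the rate R = 0.8 and crossover probability p = 0.1 are arbitrary choices) evaluates the right-hand side of the final inequality for a binary symmetric channel, with A computed as the maximal conditional variance of the single-letter information density as defined above; the resulting lower bound on $P_e^{(n)}$ approaches one as the blocklength grows.

```python
import numpy as np

p = 0.1
W = np.array([[1 - p, p],
              [p, 1 - p]])            # P(y | x) for a binary symmetric channel
C = 1 + p * np.log2(p) + (1 - p) * np.log2(1 - p)    # capacity, about 0.531 bits
q_star = np.array([0.5, 0.5])         # capacity-achieving output distribution

# A = max over inputs of Var[ log2( P(Y|x) / q*(Y) ) | X = x ]
i_dens = np.log2(W / q_star)
mean = (W * i_dens).sum(axis=1)
var = (W * (i_dens - mean[:, None]) ** 2).sum(axis=1)
A = var.max()

R = 0.8                                # any rate above capacity
for n in [10**2, 10**3, 10**4, 10**5]:
    bound = 1 - 4 * A / (n * (R - C) ** 2) - 2 ** (-n * (R - C) / 2)
    print(n, bound)                    # lower bound on P_e^(n), tending to 1
```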
- Gallager, Robert G. (1968). Information Theory and Reliable Communication. Wiley. pp. 87–89, 173–175.