Type is all you need


Introduction

There is a surprising result from Imre Csiszár's Coding Theorems for Discrete Memoryless Systems which states the following: for any discrete memoryless channel, as long as the codewords all belong to some type class, an upper bound on the rate of the code can be established that depends only on the type. This result is significant for a variety of research directions, and here I attempt to demonstrate it by simplifying the proofs in the book. Note that a basic understanding of information theory is required.

Definitions

For any discrete set $\mathcal{X}$ and for $n \in \mathbb{N}$, a sequence of length $n$ is defined as $\mathbf{x} \triangleq (x_1,\dots,x_n) \in \mathcal{X}^n$. The type of a sequence, which is its empirical distribution, is defined as $\hat{p}_{\mathbf{x}}(x) = \frac{N(x|\mathbf{x})}{n} = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{x_i = x\}$. For any type $P$, the type class $T_P^n$ is defined as the set of all sequences $\mathbf{x} \in \mathcal{X}^n$ such that $\hat{p}_{\mathbf{x}} = P$. $I(P_X,W_{Y|X})$ is the mutual information between $X$ and $Y$ when $X \sim P_X$ and $Y \sim \sum_x P_X(x)W_{Y|X}(\cdot|x)$, where $W_{Y|X}$ is a conditional distribution on $y \in \mathcal{Y}$ given $x \in \mathcal{X}$. A message is defined as $m \in \{1,\dots,|\mathcal{M}|\}$, with $|\mathcal{M}|$ the cardinality of the message set. Note that there is an inconsistency in the way I denote messages: unlike $\mathbf{x},\mathbf{y}$, a lowercase unbolded letter $m$ is exceptionally used for a message, and the set $\mathcal{M}$ is used to represent the message set. Note that we use the natural logarithm (base $e$) throughout this post.
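As a quick illustration of these definitions, here is a minimal Python sketch (the alphabet and sequences are made up for the example) that computes the empirical type $\hat{p}_{\mathbf{x}}$ of a sequence and checks membership in a type class $T_P^n$.

```python
from collections import Counter
from fractions import Fraction

def empirical_type(x, alphabet):
    """Return the type (empirical distribution) of sequence x as exact fractions."""
    n = len(x)
    counts = Counter(x)
    return {a: Fraction(counts.get(a, 0), n) for a in alphabet}

def in_type_class(x, P, alphabet):
    """Check whether sequence x belongs to the type class T_P^n."""
    return empirical_type(x, alphabet) == P

alphabet = [0, 1]
x = (0, 1, 1, 0, 1, 1)                           # n = 6
print(empirical_type(x, alphabet))               # {0: Fraction(1, 3), 1: Fraction(2, 3)}

P = {0: Fraction(1, 3), 1: Fraction(2, 3)}       # a valid type for n = 6
print(in_type_class(x, P, alphabet))             # True
print(in_type_class((0, 0, 0, 1, 1, 1), P, alphabet))  # False
```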

Rate of Constant Composition Code

We attempt to show the following:

For arbitrary $\epsilon,\tau \in (0,1)$, if $(f,\psi)$ is an $(n,\epsilon)$-code for some discrete memoryless channel $W$ such that all codewords belong to $T_P^n$, then the rate of the code is bounded as:

$$R < I(P,W) + 2\tau. \tag{1}$$
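To make the bound concrete, here is a minimal sketch (the channel, type, and $\tau$ are arbitrary choices for illustration) that evaluates $I(P,W)$ and the right-hand side of (1) in nats, consistent with the natural-log convention of this post.

```python
import numpy as np

def mutual_information(P, W):
    """I(P, W) in nats: P is the input type (vector), W[x, y] = W(y|x)."""
    P, W = np.asarray(P, dtype=float), np.asarray(W, dtype=float)
    PY = P @ W                       # output distribution: sum_x P(x) W(y|x)
    return sum(P[x] * W[x, y] * np.log(W[x, y] / PY[y])
               for x in range(W.shape[0]) for y in range(W.shape[1])
               if P[x] > 0 and W[x, y] > 0)

P = [0.5, 0.5]                       # codeword type
W = [[0.9, 0.1],                     # binary symmetric channel, crossover 0.1
     [0.1, 0.9]]
tau = 0.01
I = mutual_information(P, W)
print(f"I(P, W)    = {I:.4f} nats/use")
print(f"rate bound < {I + 2 * tau:.4f} nats/use")
```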

The outline of the proof is the following: by fixing some type $P$ such that all codewords belong to the type class $T_P^n$, we construct a maximal code and find a lower bound on the cardinality of the message set by considering the conditional typicality of the received sequences. An upper bound on the rate then follows as a function of the mutual information evaluated at the type $P$.

Proof:

Let $\mathcal{X}$ and $\mathcal{Y}$ be the finite input and output alphabets, so that $\mathcal{X}^n$ and $\mathcal{Y}^n$ are the sets of input and output sequences. Define a stochastic matrix $W: \mathcal{X} \to \mathcal{Y}$, whose entries $W(y|x)$ should be interpreted as the conditional probability of output $y$ given input $x$. Thus, $n$ channel uses can be written as $W^n: \mathcal{X}^n \to \mathcal{Y}^n$. Given a channel $W$, a code for the channel is uniquely specified by an encoder/decoder pair $(f: \mathcal{M} \to \mathcal{X}^n,\ \psi: \mathcal{Y}^n \to \mathcal{M})$, so that we can define the equivalent channel as:

$$T(m'|m) \triangleq W^n\!\left(\mathbf{y} \in \psi^{-1}(m')\,\middle|\,\mathbf{x} = f(m)\right).$$

The probability of erroneous decoding of message $m$ can now be defined as:

$$e_m = e_m(W,f,\psi) \triangleq \sum_{m' \neq m} T(m'|m) = 1 - T(m|m).$$

The channel coding problem is essentially the task of making $|\mathcal{M}|$ as large as possible while keeping the maximum probability of error $e = \max_{m \in \mathcal{M}} e_m$ as low as possible. We will focus solely on the discrete memoryless channel, defined by $W^n(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^n W(y_i|x_i)$. An $n$-length block code with maximum probability of error bounded as $e(W^n,f,\psi) \le \epsilon$ will be called an $(n,\epsilon)$-code.
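As a sanity check on these definitions, the following sketch (a toy example of my own, not from the book) builds the equivalent channel $T(m'|m)$ for a length-3 repetition code over a binary symmetric channel with majority-vote decoding and reports the maximum error probability $e$.

```python
import itertools
import numpy as np

W = np.array([[0.9, 0.1],            # W[x, y] = W(y|x), a BSC with crossover 0.1
              [0.1, 0.9]])
n = 3

f = {0: (0, 0, 0), 1: (1, 1, 1)}     # encoder: repetition code
psi = lambda y: int(sum(y) > n / 2)  # decoder: majority vote

def Wn(y, x):
    """Memoryless extension W^n(y|x) = prod_i W(y_i|x_i)."""
    return np.prod([W[xi, yi] for xi, yi in zip(x, y)])

# Equivalent channel T(m'|m): probability that message m is decoded as m'
M = list(f.keys())
T = np.zeros((len(M), len(M)))
for y in itertools.product([0, 1], repeat=n):
    for m in M:
        T[m, psi(y)] += Wn(y, f[m])

e = max(1.0 - T[m, m] for m in M)    # e = max_m e_m, with e_m = 1 - T(m|m)
print("T =\n", T)
print("max error e =", e)            # 3 * 0.1^2 * 0.9 + 0.1^3 = 0.028
```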

A set $B \subset \mathcal{Y}^n$ is an $\eta$-image ($\eta \in (0,1]$) of a set $A \subset \mathcal{X}^n$ over $W^n$ if $W^n(B\,|\,\mathbf{x}) \ge \eta$ for all $\mathbf{x} \in A$, which is to say that, conditioned on any input $\mathbf{x} \in A$, the output falls in $B$ with probability at least $\eta$. Define the minimum cardinality of such an $\eta$-image as:

$$g_{W^n}(A,\eta) \triangleq \min_{B:\,B\text{ is an }\eta\text{-image of }A} |B|.$$
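The minimum-cardinality $\eta$-image has no closed form in general, but for tiny alphabets and block lengths it can be found by brute force; the sketch below (all values made up) enumerates subsets of $\mathcal{Y}^n$ to compute $g_{W^n}(A,\eta)$ exactly.

```python
import itertools
import numpy as np

W = np.array([[0.8, 0.2],            # W(y|x) for a binary channel
              [0.2, 0.8]])
n, eta = 2, 0.7

def Wn(y, x):
    return np.prod([W[xi, yi] for xi, yi in zip(x, y)])

A = [(0, 0), (1, 1)]                 # a small set of input sequences
Yn = list(itertools.product([0, 1], repeat=n))

def is_eta_image(B):
    """B is an eta-image of A if W^n(B|x) >= eta for every x in A."""
    return all(sum(Wn(y, x) for y in B) >= eta for x in A)

# g_{W^n}(A, eta): smallest cardinality over all eta-images of A
g = min(len(B)
        for r in range(len(Yn) + 1)
        for B in itertools.combinations(Yn, r)
        if is_eta_image(B))
print("g_{W^n}(A, eta) =", g)        # 3 for this toy example
```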

Let $(f,\psi)$ be an $(n,\epsilon)$-code satisfying $f(m) \in A \subset T_P^n$ (all codewords produced by the encoder $f$ belong to the same type class of type $P$) and $\psi^{-1}(m) \subset T_W^n(f(m))$ (the set of outputs decoded to message $m$ lies inside the conditional type class of $W$ given the codeword $f(m)$).

Thus, we can define $B \triangleq \bigcup_{m \in \mathcal{M}} \psi^{-1}(m)$, which is to say that $B$ is the set of all received sequences whose decoding outcome is some $m \in \mathcal{M}$. In other words, using a graphical picture, $B$ is the union of the Hamming spheres of all codewords. Now we further restrict $(f,\psi)$ to be an $(n,\epsilon)$ maximal code, meaning that no extension of the code is possible. With this definition, note that if $W^n(T_W^n(\mathbf{x}) \setminus B\,|\,\mathbf{x}) \ge 1-\epsilon$ for some $\mathbf{x} \in T_P^n$, then the code is not maximal, as we could add a new message $m'$ with codeword $f(m') = \mathbf{x}$ whose Hamming sphere lies outside of $B$. Thus, it is necessary for a maximal code that

$$W^n(T_W^n(\mathbf{x}) \setminus B\,|\,\mathbf{x}) < 1-\epsilon \qquad \forall\,\mathbf{x} \in T_P^n.$$

Provided that $n \ge n_0(|\mathcal{X}|,|\mathcal{Y}|,\tau)$ (which is just a fancy way of saying that $n$ is sufficiently large depending on $|\mathcal{X}|,|\mathcal{Y}|,\tau$), the definition of typicality gives $W^n(T_W^n(\mathbf{x})\,|\,\mathbf{x}) \ge 1-\tau$ for all $\mathbf{x} \in A$. It follows that $W^n(B\,|\,\mathbf{x}) > \epsilon-\tau$ for all $\mathbf{x} \in A$ (since $W^n(B\,|\,\mathbf{x}) \ge W^n(T_W^n(\mathbf{x})\,|\,\mathbf{x}) - W^n(T_W^n(\mathbf{x}) \setminus B\,|\,\mathbf{x}) > (1-\tau)-(1-\epsilon)$), and therefore, by definition, $|B| \ge g_{W^n}(A,\epsilon-\tau)$. On the other hand, since each decoding set lies inside a conditional type class, the asymptotic equipartition property for conditional types gives $|B| \le |\mathcal{M}|\,e^{n(H(W|P)+\tau)}$. The following two results are thus immediate:

$$|\mathcal{M}| \ge g_{W^n}(A,\epsilon-\tau)\,e^{-n(H(W|P)+\tau)}, \qquad \frac{1}{n}\log g_{W^n}(T_P^n,\epsilon-\tau) \ge H(Y)-\tau,$$

where $H(Y)$ denotes the entropy of the output distribution $\sum_x P(x)W(\cdot\,|\,x)$.

Now, let $B' \subset \mathcal{Y}^n$ be an $(\epsilon+\tau)$-image of $A$ for which $|B'| = g_{W^n}(A,\epsilon+\tau)$. Since $e_m = 1 - W^n(\psi^{-1}(m)\,|\,f(m)) \le \epsilon$ and $W^n(B'\,|\,f(m)) \ge \epsilon+\tau$, we have $W^n(B' \cap \psi^{-1}(m)\,|\,f(m)) \ge \tau$ for every $m$ with $f(m) \in A$. We also know that if $\mathcal{A} \subset \mathcal{X}^n$ satisfies $P^n(\mathcal{A}) \ge \eta$, then $\frac{1}{n}\log|\mathcal{A}| \ge H(P)-\epsilon_n$; applying the conditional version of this result to $B' \cap \psi^{-1}(m)$, we can conclude $|B' \cap \psi^{-1}(m)| \ge e^{n(H(W|P)-\tau)}$, and since the decoding sets are disjoint it follows that

$$g_{W^n}(A,\epsilon+\tau) = |B'| \ge \sum_{m:\,f(m) \in A} |B' \cap \psi^{-1}(m)| \ge |\mathcal{M}|\,e^{n(H(W|P)-\tau)},$$

where we used the fact that $\mathcal{M} = \{m : f(m) \in A\}$, i.e., every codeword belongs to $A$. Taking logarithms, we have

$$\frac{1}{n}\log|\mathcal{M}| \le \frac{1}{n}\log g_{W^n}(A,\epsilon+\tau) - H(W|P) + \tau.$$

Finally, note that since $A \subset T_P^n$,

$$|B'| = g_{W^n}(A,\epsilon+\tau) \le g_{W^n}(T_P^n,\epsilon+\tau) \le e^{n(H(Y)+\tau)},$$

we finally arrive at the result below, where the left-hand side is the rate of the code.

$$\frac{1}{n}\log|\mathcal{M}| < I(P,W) + 2\tau$$

Error Exponent of Constant Composition Code

We know that for the rate region described by the above result, the error tends to zero for sufficiently large $n$; however, it can be shown that even the rate of this convergence depends only on the type. We provide both lower and upper bounds on the error and the error exponent. Note that an upper bound on the error is a lower bound on the error exponent, since the exponent carries a negative sign.

Lower Bound

The following statement is made in Csiszár's Coding Theorems for Discrete Memoryless Systems. For every $R > \delta > 0$ and every type $P$ of sequences in $\mathcal{X}^n$, there exists an $n$-length block code $(f,\psi)$ of rate

$$\frac{1}{n}\log|\mathcal{M}| \ge R - \delta,$$

such that all codewords $f(m),\ m \in \mathcal{M}$, are of type $P$ and the error can be bounded as:

$$e(W^n,f,\psi) \le \exp\Big\{-n\Big(\min_V \big[D(V\|W|P) + |I(P,V)-R|^+\big] - \delta\Big)\Big\}. \tag{2}$$

Here, $E_r(R,P,W) = \min_V \big[D(V\|W|P) + |I(P,V)-R|^+\big]$ is known as the random coding error exponent of channel $W$ with input distribution of type $P$, where $V: \mathcal{X} \to \mathcal{Y}$ ranges over all channels.

In my opinion, Csiszár's proof is rather obscure, so I will restate the main results following Gallager's proof.

Assume we have a fixed composition code of composition $Q$, and define $T_{Q_X}^n \subset \mathcal{X}^n$ as the set of all sequences of composition $Q$. In other words,

$$|T_{Q_X}^n| = \exp\{n[H(Q)-\delta_n]\}$$

for $\delta_n \to 0$ as $n \to \infty$. The probability of decoding error for the $m$-th codeword in a code $\mathcal{C}$ under maximum likelihood decoding is denoted $P_{e,m}$, and the maximum error probability is

$$P_e = \max_m P_{e,m}.$$

We know that a code of size $|\mathcal{C}| = M$ can be expurgated into a code of size $M/2$ by removing the worst-performing half of the codewords, with a resulting maximum probability of error at most twice the average error probability of the original code. Hence, the distinction between maximum and average error probability is minimal. If we define $T_{Q_X\hat P_{Y|X}}^n$ to be the set of sequence pairs with joint type $Q_X\hat P_{Y|X}$, then we have

$$|T_{Q_X\hat P_{Y|X}}^n| = \exp\{n[H(Q,\hat P)-\delta_n]\}.$$

For a given input $\mathbf{x}$ of type $Q$, denote by $T_{\hat P_{Y|Q}}^n(\mathbf{x})$ the conditional type class, i.e., the set of all $\mathbf{y}$ such that $(\mathbf{x},\mathbf{y}) \in T_{Q_X\hat P_{Y|X}}^n$. Now take any $\mathbf{x},\mathbf{x}' \in T_{Q_X}^n$: the same permutation that maps $\mathbf{x}$ into $\mathbf{x}'$ also permutes every $\mathbf{y} \in T_{\hat P_{Y|Q}}^n(\mathbf{x})$ into some $\mathbf{y}' \in T_{\hat P_{Y|Q}}^n(\mathbf{x}')$, so it can be seen that $|T_{\hat P_{Y|Q}}^n(\mathbf{x})| = |T_{\hat P_{Y|Q}}^n(\mathbf{x}')|$ for all $\mathbf{x},\mathbf{x}' \in T_{Q_X}^n$. Hence we can conclude

$$|T_{\hat P_{Y|Q}}^n(\mathbf{x})| = \frac{|T_{Q_X\hat P_{Y|X}}^n|}{|T_{Q_X}^n|}.$$

Note that the left-hand side is a conditional type class; the analogous identity for the marginal type class, $|T_{\hat P_Y}^n| = \frac{|T_{Q_X\hat P_{Y|X}}^n|}{|T_{Q_X}^n|}$, is not true when $X$ and $Y$ are correlated. The immediate result is

$$|T_{\hat P_{Y|Q}}^n(\mathbf{x})| = \exp\{n[H(Q,\hat P)-H(Q)-\delta_n]\},$$

which isn't anything new, since $H(Q,\hat P)-H(Q) = H(\hat P|Q)$. But this gives us an operational interpretation of $T_{\hat P_{Y|Q}}^n(\mathbf{x})$ as the set of output sequences arising from $\mathbf{x}$ under a given noise composition $\hat P$.
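These exponential estimates are easy to check numerically. Here is a minimal sketch (a binary type chosen for illustration) that compares the exact cardinality of a type class, a multinomial coefficient, against $e^{nH(Q)}$ and reports the resulting $\delta_n$.

```python
from math import comb, exp, log

def type_class_size(counts):
    """Exact |T_Q^n| for the type with the given letter counts (multinomial coefficient)."""
    size, remaining = 1, sum(counts)
    for c in counts:
        size *= comb(remaining, c)
        remaining -= c
    return size

def entropy(counts):
    """H(Q) in nats for the empirical distribution defined by the counts."""
    n = sum(counts)
    return -sum((c / n) * log(c / n) for c in counts if c > 0)

counts = (10, 20)                        # Q = (1/3, 2/3), n = 30
n = sum(counts)
exact = type_class_size(counts)
print(f"exact |T_Q^n| = {exact}")
print(f"e^(n H(Q))    = {exp(n * entropy(counts)):.3e}")
print(f"delta_n       = {entropy(counts) - log(exact) / n:.4f}")
```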

Now let's introduce the discrete memoryless channel defined by some transition matrix $P$; the following relation can be established:

$$P(\mathbf{y}|\mathbf{x}) = P(y_1|x_1)\cdots P(y_n|x_n) = \exp\Big\{n\Big[\sum_{k,j} Q(x_k)\hat P(y_j|x_k)\ln P(y_j|x_k)\Big]\Big\}.$$

Note that we write $\Pr[\cdot]$ for a generic probability. The left-hand side of the above equation is the probability of receiving a particular $\mathbf{y} \in T_{\hat P_{Y|Q}}^n(\mathbf{x})$ given that $\mathbf{x} \in T_{Q_X}^n$ is transmitted over a channel with transition matrix $P$. Thus,

$$\Pr[\mathbf{y} \in T_{\hat P_{Y|Q}}^n(\mathbf{x})\,|\,\mathbf{x}] = |T_{\hat P_{Y|Q}}^n(\mathbf{x})|\,P(\mathbf{y}|\mathbf{x}) = \exp\Big\{n\Big[\sum_{k,j} Q(x_k)\hat P(y_j|x_k)\ln\frac{P(y_j|x_k)}{\hat P(y_j|x_k)} - \delta_n\Big]\Big\}.$$

This can be equivalently written as:

$$\Pr[\mathbf{y} \in T_{\hat P_{Y|Q}}^n(\mathbf{x})\,|\,\mathbf{x}] = \exp\{-n[D(\hat P\|P|Q)+\delta_n]\}, \tag{3}$$

where $D(\hat P\|P|Q) = \sum_{k,j} Q(x_k)\hat P(y_j|x_k)\ln\frac{\hat P(y_j|x_k)}{P(y_j|x_k)}$ and we have dropped the subscripts on the probabilities for clarity.
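Equation (3) can be verified by brute force at a short block length. The sketch below (binary alphabets, channel and compositions all made up) enumerates every output sequence of a given conditional type and compares the exact exponent with $D(\hat P\|P|Q)$; at $n=9$ the remaining gap is the polynomial $\delta_n$ term, which is still sizeable.

```python
import itertools
import numpy as np

P = np.array([[0.9, 0.1],                 # channel transition matrix P(y|x)
              [0.2, 0.8]])
x = (0, 0, 0, 0, 0, 0, 1, 1, 1)           # input of type Q = (2/3, 1/3), n = 9
n = len(x)

P_hat = np.array([[2/3, 1/3],             # noise composition: a conditional type
                  [1/3, 2/3]])            # achievable given the counts (6, 3) of x

def cond_type(x, y):
    """Empirical conditional distribution of y given x (rows indexed by x)."""
    counts = np.zeros((2, 2))
    for xi, yi in zip(x, y):
        counts[xi, yi] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# Exact probability that y falls in the conditional type class of P_hat given x
prob = sum(np.prod([P[xi, yi] for xi, yi in zip(x, y)])
           for y in itertools.product([0, 1], repeat=n)
           if np.allclose(cond_type(x, y), P_hat))

# Conditional divergence D(P_hat || P | Q), with Q the type of x
Q = np.array([x.count(0), x.count(1)]) / n
D = sum(Q[a] * P_hat[a, b] * np.log(P_hat[a, b] / P[a, b])
        for a in range(2) for b in range(2))

print(f"exact -1/n log Pr = {-np.log(prob) / n:.4f}")
print(f"D(P_hat||P|Q)     = {D:.4f}   (they agree up to delta_n)")
```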
The next step of the argument requires a new definition: define the output composition $\hat P_Y(y) \triangleq \sum_x Q_X(x)\hat P_{Y|X}(y|x)$; note that this is different from the joint type, as we additionally sum over $X$. Its type class $T_{\hat P_Y}^n$ is the output composition set. Assume we have some noise composition $\hat P$ for which $|T_{\hat P_{Y|Q}}^n(\mathbf{x})| \ge \frac{2}{|\mathcal{C}|}|T_{\hat P_Y}^n|$ (it will become clear shortly why this assumption is made). Since the output composition set has cardinality $|T_{\hat P_Y}^n|$ and there are a total of $|\mathcal{C}|$ codewords, the average message $m$ has $\frac{|T_{\hat P_Y}^n|}{|\mathcal{C}|}$ corresponding outputs; equivalently, there exists an $m$ for which at most $\frac{|T_{\hat P_Y}^n|}{|\mathcal{C}|}$ outputs are decoded into $m$. Now it becomes clear that the assumption $|T_{\hat P_{Y|Q}}^n(\mathbf{x}_m)| \ge \frac{2}{|\mathcal{C}|}|T_{\hat P_Y}^n|$ means there exists some $m$ such that at least half of $T_{\hat P_{Y|Q}}^n(\mathbf{x}_m)$ is not decoded into $m$. Recall that the error is defined as the maximum over all codewords; since for this $m$ at least half of its conditional type class results in a decoding error, we have:

$$P_e \ge \frac{1}{2}\Pr[\mathbf{y} \in T_{\hat P_{Y|Q}}^n(\mathbf{x}_m)\,|\,\mathbf{x}_m]. \tag{4}$$

The graphical understanding of this result is that if the output spheres $T_{\hat P_{Y|Q}}^n(\mathbf{x})$ are too large, they cannot be packed disjointly into the output composition set $T_{\hat P_Y}^n$, which forces decoding errors.

This is the reason why this lower bound on the error is called the sphere packing bound, and here we take a brief digression to explore this interpretation. The information-theoretic meaning of the packing can be understood using the following formula:

$$\frac{|T_{\hat P_Y}^n|}{|T_{\hat P_{Y|Q}}^n(\mathbf{x})|} = \exp\Big\{n\Big[\sum_{k,j} Q(x_k)\hat P(y_j|x_k)\ln\frac{\hat P(y_j|x_k)}{\sum_k Q(x_k)\hat P(y_j|x_k)} - \delta_n\Big]\Big\}.$$

Now, the term in the exponent should look familiar once we expand the mutual information between $X$ and $Y$, given that they are of composition $Q$ and $\hat P$:

$$I(Q_X,\hat P_{Y|X}) = \sum_k Q(x_k)\sum_j \hat P(y_j|x_k)\ln\frac{\hat P(y_j|x_k)}{\hat P_Y(y_j)},$$

from which it should be obvious that

$$\frac{|T_{\hat P_Y}^n|}{|T_{\hat P_{Y|Q}}^n(\mathbf{x})|} = \exp\{n[I(Q_X,\hat P_{Y|X})-\delta_n]\}.$$

Thus the assumption we made earlier, $|T_{\hat P_{Y|Q}}^n(\mathbf{x}_m)| \ge \frac{2}{|\mathcal{C}|}|T_{\hat P_Y}^n|$, is guaranteed whenever $R \ge I(Q_X,\hat P_{Y|X}) + \delta_n + \frac{\ln 2}{n}$.

Now back to the main argument: combining equations (3) and (4), we see that the error is bounded as:

$$P_e \ge \exp\{-n[E_{sp}(R-\delta_n,Q_X,P_{Y|X})+\delta_n]\},$$

where the sphere packing error exponent is

$$E_{sp}(R,Q_X,P_{Y|X}) = \min_{\hat P:\,I(Q_X,\hat P_{Y|X}) \le R} D(\hat P\|P|Q).$$

The condition $I(Q_X,\hat P_{Y|X}) \le R$ may seem counterintuitive, but note that $\hat P$ is not the channel transition matrix. Indeed, if $I(Q_X,P_{Y|X}) \le R$, then $E_{sp}(R,Q_X,P_{Y|X})$ is zero, because we can choose $\hat P = P$; since $R \le I(Q_X,P_{Y|X})$ is required for reliable communication at rate $R$, violating it means the error never decreases, i.e., the exponent is zero.
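Here is a minimal numerical sketch of the sphere packing exponent (a BSC with crossover 0.1 and uniform composition, both arbitrary choices): it grid-searches over binary noise compositions $\hat P$ satisfying $I(Q_X,\hat P_{Y|X}) \le R$ and minimizes $D(\hat P\|P|Q)$.

```python
import numpy as np

def mutual_info(Q, V):
    """I(Q, V) in nats for input distribution Q and channel matrix V[x, y]."""
    PY = Q @ V
    return sum(Q[x] * V[x, y] * np.log(V[x, y] / PY[y])
               for x in range(2) for y in range(2) if V[x, y] > 0)

def cond_div(V, P, Q):
    """Conditional divergence D(V || P | Q) in nats."""
    return sum(Q[x] * V[x, y] * np.log(V[x, y] / P[x, y])
               for x in range(2) for y in range(2) if V[x, y] > 0)

def E_sp(R, Q, P, grid=101):
    """Sphere packing exponent: min of D(V||P|Q) over channels V with I(Q, V) <= R."""
    best = np.inf
    for a in np.linspace(1e-6, 1 - 1e-6, grid):       # V(1|0) = a
        for b in np.linspace(1e-6, 1 - 1e-6, grid):   # V(0|1) = b
            V = np.array([[1 - a, a], [b, 1 - b]])
            if mutual_info(Q, V) <= R:
                best = min(best, cond_div(V, P, Q))
    return best

Q = np.array([0.5, 0.5])                  # input composition
P = np.array([[0.9, 0.1], [0.1, 0.9]])    # BSC(0.1) transition matrix
for R in (0.1, 0.2, 0.3):                 # rates in nats (capacity is about 0.368)
    print(f"R = {R:.2f}   E_sp = {E_sp(R, Q, P):.4f}")
```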

Upper Bound

Next, we try to establish the upper bound on the error for constant composition codes. We define a new set $T_{Q_X|P'}^n(\mathbf{y})$, which is the set of all input sequences of composition $Q$ that could have produced $\mathbf{y}$ under a noise composition $P'$ (i.e., $\mathbf{y}$ has conditional type $P'$ given the input). For some message $m$, codeword $\mathbf{x}_m$, and received output $\mathbf{y}$ whose noise composition with respect to $\mathbf{x}_m$ is $\hat P$, there is an error if there is some other codeword $\mathbf{x}_{m'}$ in a set $T_{Q_X|P'}^n(\mathbf{y})$ such that $P(\mathbf{y}|\mathbf{x}_{m'}) \ge P(\mathbf{y}|\mathbf{x}_m)$, which is to say

$$\exp\Big\{n\Big[\sum_{k,j} Q(x_k)P'(y_j|x_k)\ln P(y_j|x_k)\Big]\Big\} \ge \exp\Big\{n\Big[\sum_{k,j} Q(x_k)\hat P(y_j|x_k)\ln P(y_j|x_k)\Big]\Big\}. \tag{5}$$

Define $\mathcal{P}(\hat P)$ as the set of noise compositions $P'$ for which (5) is satisfied; the error probability is then

$$\Pr(\text{error}\,|\,m,\mathbf{x}_m,\mathbf{y}) \le \sum_{m' \ne m}\sum_{P' \in \mathcal{P}(\hat P)}\Pr[\mathbf{x}_{m'} \in T_{Q_X|P'}^n(\mathbf{y})].$$

Given that the codeword $\mathbf{x}_{m'}$ is selected uniformly at random from $T_{Q_X}^n$, we have

$$\Pr[\mathbf{x}_{m'} \in T_{Q_X|P'}^n(\mathbf{y})] = \frac{|T_{Q_X|P'}^n(\mathbf{y})|}{|T_{Q_X}^n|} = \frac{|T_{Q_X P'_{Y|X}}^n|}{|T_{Q_X}^n|\,|T_{P'_Y}^n|} \le \exp\{-nI(Q_X,P'_{Y|X})\}.$$

By the type counting lemma, there are fewer than $(n+1)^{|\mathcal{X}||\mathcal{Y}|}$ joint compositions $Q P'$, and there are $|\mathcal{C}|-1 = e^{nR}-1$ choices of $m' \ne m$, so we bound the error as:

$$\Pr(\text{error}\,|\,m,\mathbf{x}_m,\mathbf{y}) \le (e^{nR}-1)\,(n+1)^{|\mathcal{X}||\mathcal{Y}|}\max_{P' \in \mathcal{P}(\hat P)}\Pr[\mathbf{x}_{m'} \in T_{Q_X|P'}^n(\mathbf{y})] \le (n+1)^{|\mathcal{X}||\mathcal{Y}|}\,e^{-n[I(Q_X,P'_{Y|X})-R]}.$$

Define $\delta_n = \frac{|\mathcal{X}||\mathcal{Y}|}{n}\ln(n+1)$, which goes to zero as $n \to \infty$; we then have

$$\Pr(\text{error}\,|\,m,\mathbf{x}_m,\mathbf{y}) \le e^{-n[I(Q_X,P'_{Y|X})-R-\delta_n]}.$$

Taking the maximum inside and recalling that $\Pr[\mathbf{y} \in T_{\hat P_{Y|Q}}^n(\mathbf{x}_m)\,|\,\mathbf{x}_m] \le \exp\{-nD(\hat P\|P|Q)\}$, we have $P_{e,m} \le \sum_{\hat P}\Pr(\text{error}\,|\,m,\mathbf{x}_m,\mathbf{y} \in T_{\hat P_{Y|Q}}^n(\mathbf{x}_m))\exp\{-nD(\hat P\|P|Q)\}$, and finally we have

$$P_e \le \exp\{-n[E_r(R,Q_X,P_{Y|X})-\delta_n]\},$$

where

$$E_r(R,Q_X,P_{Y|X}) = \min_{\hat P}\Big(D(\hat P\|P|Q)+\big[I(Q_X,\hat P_{Y|X})-R\big]^+\Big),$$

is the random coding error exponent, which provides an upper bound on the error of constant composition codes. Once again, an upper bound on the error is a lower bound on the error exponent, since the exponent carries a negative sign. Hence we have the relation

$$E_r(R,Q_X,P_{Y|X}) \le E \le E_{sp}(R,Q_X,P_{Y|X}),$$

where $E$ denotes the true error exponent of the best constant composition code of rate $R$ and composition $Q$.
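To see the two exponents side by side, here is a self-contained sketch (same made-up BSC as above) that evaluates $E_r$ and $E_{sp}$ by grid search over noise compositions and confirms $E_r \le E_{sp}$ at a few rates.

```python
import numpy as np

Q = np.array([0.5, 0.5])                  # input composition
P = np.array([[0.9, 0.1], [0.1, 0.9]])    # BSC(0.1) transition matrix

def mutual_info(Q, V):
    PY = Q @ V
    return sum(Q[x] * V[x, y] * np.log(V[x, y] / PY[y])
               for x in range(2) for y in range(2) if V[x, y] > 0)

def cond_div(V, P, Q):
    return sum(Q[x] * V[x, y] * np.log(V[x, y] / P[x, y])
               for x in range(2) for y in range(2) if V[x, y] > 0)

def channels(grid=101):
    """All binary channels on a grid, parameterized by the two crossover probabilities."""
    for a in np.linspace(1e-6, 1 - 1e-6, grid):
        for b in np.linspace(1e-6, 1 - 1e-6, grid):
            yield np.array([[1 - a, a], [b, 1 - b]])

def E_r(R):
    """Random coding exponent: min over V of D(V||P|Q) + [I(Q,V) - R]^+."""
    return min(cond_div(V, P, Q) + max(mutual_info(Q, V) - R, 0.0) for V in channels())

def E_sp(R):
    """Sphere packing exponent: min over V with I(Q,V) <= R of D(V||P|Q)."""
    return min((cond_div(V, P, Q) for V in channels() if mutual_info(Q, V) <= R),
               default=np.inf)

for R in (0.05, 0.15, 0.25):
    er, esp = E_r(R), E_sp(R)
    print(f"R = {R:.2f}   E_r = {er:.4f}   E_sp = {esp:.4f}   E_r <= E_sp: {er <= esp + 1e-9}")
```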

Conclusion

This result is useful in many information-theoretic proofs, and the error exponent of constant composition codes is extensively discussed in the literature. More recently, it has proven useful in joint sensing and communication, as the sensing capacity is also heavily influenced by the type of the codeword.



