Information inequalities and a dependent 
Central Limit Theorem 

Oliver Johnson 
September 5th 2001 

Abstract 

We adapt arguments concerning information-theoretic convergence 
in the Central Limit Theorem to the case of dependent random vari- 
ables under Rosenblatt mixing conditions. The key is to work with 
random variables perturbed by the addition of a normal random vari- 
able, giving us good control of the joint density and the mixing coef- 
ficient. We strengthen results of Takano and of Carlen and Soffer to 
provide entropy-theoretic, not weak convergence.

1 Introduction and notation 

Under a variance constraint, entropy is maximised by the Gaussian. It is
natural to consider whether entropy converges to this maximum in the
Central Limit Theorem regime. This is a strong sense of convergence, and is
discussed by Brown [3], Barron [1] and Johnson [7]. These papers only deal
with the case of independent random variables: [3] and [1] treat
identically distributed variables, and [7] treats non-identical variables satisfying
a Lindeberg-like condition. This paper extends these techniques to weakly
dependent random variables.



Key words: Normal Convergence, Entropy, Fisher Information, Mixing Conditions
AMS 1991 subject classification: 60F05, 94A17, 62B10

Address: O.T. Johnson, Statistical Laboratory, Centre for Mathematical Sciences,
Wilberforce Road, Cambridge, CB3 0WB, UK. Contact Email: otj1000@cam.ac.uk.






Takano [12], [13] considers the entropy of convolutions of dependent random
variables, though he imposes a strong $\delta_4$-mixing condition (see Definition 2.3).
Carlen and Soffer [4] also use entropy-theoretic methods in the dependent
case, though the conditions which they impose are not transparent. Takano,
in common with Carlen and Soffer, does not prove convergence in relative
entropy of the full sequence of random variables, but rather convergence of
the 'rooms' (in Bernstein's terminology), equivalent to weak convergence of
the original variables. Our conclusion is stronger. In a previous paper [8], we
used similar techniques to establish entropy-theoretic convergence for FKG
systems, which, whilst providing a natural physical model, restrict us to the
case of positive correlation.

We will consider a doubly infinite stationary collection of random variables
$\ldots, X_{-1}, X_0, X_1, X_2, \ldots$, with mean zero and finite variance. We write $v_n$ for
$\mathrm{Var}(\sum_{i=1}^n X_i)$ and $U_n = (\sum_{i=1}^n X_i)/\sqrt{n}$. We will consider perturbed random
variables $V_n^{(\tau)} = (\sum_{i=1}^n X_i + Z_i^{(\tau)})/\sqrt{n} \sim U_n + Z^{(\tau)}$, for $Z_i^{(\tau)}$ a sequence of
$N(0,\tau)$ independent of $X_i$ and each other. In general, $Z^{(s)}$ will be a $N(0,s)$.
If the limit $\sum_{j=-\infty}^{\infty} \mathrm{Cov}(X_0, X_j)$ exists then we denote it by $v$.

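Definition 1.1 Given two random variables $S, T$, the $\alpha$-mixing coefficient
is defined to be:

$$\alpha(S,T) = \sup_{A,B} \left| \mathbb{P}((S \in A) \cap (T \in B)) - \mathbb{P}(S \in A) \mathbb{P}(T \in B) \right|.$$

If $\Sigma_a^b$ is the $\sigma$-field generated by $X_a, X_{a+1}, \ldots, X_b$ (where $a$ or $b$ can be
infinite), then for each $t$, define:

$$\alpha(t) = \sup \left\{ \alpha(S,T) : S \in \Sigma_{-\infty}^{0}, T \in \Sigma_t^{\infty} \right\},$$

and define the process to be $\alpha$-mixing if $\alpha(t) \to 0$ as $t \to \infty$.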
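
For a pair of random variables taking finitely many values, the supremum in Definition 1.1 can be evaluated by brute force, which may help fix ideas. The sketch below is only an illustration: the dictionary format for the joint law and the function names are assumptions made here, not notation from the paper.

```python
from itertools import chain, combinations

def all_subsets(values):
    """Every subset of a finite collection of values, as tuples."""
    values = list(values)
    return chain.from_iterable(combinations(values, r) for r in range(len(values) + 1))

def alpha_coefficient(joint):
    """Brute-force sup over events A, B of |P(S in A, T in B) - P(S in A) P(T in B)|.

    `joint` maps (s, t) pairs to probabilities; this format is a choice
    made for the illustration only.
    """
    s_values = sorted({s for s, _ in joint})
    t_values = sorted({t for _, t in joint})
    p_s = {s: sum(p for (a, _), p in joint.items() if a == s) for s in s_values}
    p_t = {t: sum(p for (_, b), p in joint.items() if b == t) for t in t_values}
    best = 0.0
    for A in all_subsets(s_values):
        for B in all_subsets(t_values):
            p_joint = sum(joint.get((s, t), 0.0) for s in A for t in B)
            p_prod = sum(p_s[s] for s in A) * sum(p_t[t] for t in B)
            best = max(best, abs(p_joint - p_prod))
    return best

independent = {(s, t): 0.25 for s in (0, 1) for t in (0, 1)}
identical = {(0, 0): 0.5, (1, 1): 0.5}
print(alpha_coefficient(independent))  # two independent fair bits
print(alpha_coefficient(identical))    # S = T, a single fair bit
```

The second case attains the largest value the coefficient can take, since $|\mathbb{P}(A \cap B) - \mathbb{P}(A)\mathbb{P}(B)| \le 1/4$ for any two events.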

See Bradley [2] for a discussion of the properties and alternative definitions
of mixing coefficients. Note that $\alpha$-mixing is sometimes referred to as strong
mixing, and is implied by uniform mixing (control of $|\mathbb{P}(A|B) - \mathbb{P}(A)|$, equivalent
to the Doeblin condition for Markov chains). All $m$-dependent processes
are $\alpha$-mixing, as well as any stationary, real aperiodic Harris chain (which
includes every finite state irreducible aperiodic Markov chain).

Definition 1.2 For a random variable $U$ with smooth density $p$, we consider
the score function $\rho(u) = p'(u)/p(u)$, the Fisher information $J(U) = \mathbb{E}\rho^2(U)$,
and the standardised Fisher information $J_{st}(U) = \sigma_U^2 J(U) - 1$.






We continue to use the technique for proving convergence in relative entropy
first developed by Barron [1], and later adapted to the non-identical case by
Johnson [7]. That is, we use de Bruijn's identity:



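Lemma 1.3 If $U$ is a random variable with density $f$ and variance 1, and
$Z^{(\tau)}$ is a family of $N(0,\tau)$ normals independent of $U$, then the relative entropy
distance $D$ between $f$ and the standard Gaussian density $\phi$ is given by:

$$D(f \| \phi) = \int_0^{\infty} \frac{1}{2} \left( J(U + Z^{(\tau)}) - \frac{1}{1+\tau} \right) d\tau.$$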

Our main theorems concerning strong mixing variables are as follows: 

Theorem 1.4 Consider a stationary collection of random variables $X_i$, with
finite $(2+\delta)$th moment. If $\sum_{j=1}^{\infty} \alpha(j)^{\delta/(2+\delta)} < \infty$, then for any $\tau > 0$:

$$\lim_{n \to \infty} J_{st}(V_n^{(\tau)}) = 0.$$

Note that the condition on the $\alpha(j)$ implies that $v_n/n \to v < \infty$ (see Lemma
2.7). In the next theorem, we have to distinguish two cases, where $v = 0$
and where $v > 0$. For example, if $Y_j$ are IID, and $X_j = Y_j - Y_{j+1}$, then
$U_n = (Y_1 - Y_{n+1})/\sqrt{n} \to \delta_0$. However, since we make a normal perturbation,
we know that $J_{st}(V_n^{(\tau)}) = (v_n/n + \tau) J(V_n^{(\tau)}) - 1 \le (v_n/n + \tau) J(Z^{(\tau)}) - 1 =
v_n/(n\tau)$, so the case $v = 0$ automatically works in Theorem 1.4.
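
As a small numerical illustration of the degenerate case, the sketch below evaluates the bound $v_n/(n\tau)$ for the telescoping example $X_j = Y_j - Y_{j+1}$. The function name and the default choice $\mathrm{Var}(Y_j) = 1$ are assumptions made for the illustration.

```python
def telescoping_bound(n, tau, var_y=1.0):
    """Upper bound v_n / (n * tau) on J_st(V_n^(tau)) when X_j = Y_j - Y_{j+1}.

    The partial sum telescopes to Y_1 - Y_{n+1}, so v_n = 2 * Var(Y) for every n.
    """
    v_n = 2.0 * var_y
    return v_n / (n * tau)

for n in (10, 100, 1000):
    print(n, telescoping_bound(n, tau=0.5))
```

The bound decays like $1/n$ for any fixed $\tau > 0$, which is why no separate argument is needed when $v = 0$.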

We can provide a corresponding result for convergence in relative entropy, 
with some extra conditions: 

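Theorem 1.5 Consider a stationary collection of random variables $X_i$, with
finite $(2+\delta)$th moment. If

1. $\sum_{j=1}^{\infty} \alpha(j)^{\delta/(2+\delta)} < \infty$,

2. $v = \sum_{j=-\infty}^{\infty} \mathrm{Cov}(X_0, X_j) > 0$,

3. writing $f_N(\tau) = \sup_{n > N} \left( J(V_n^{(\tau)}) - \frac{n}{v_n + n\tau} \right)$, the integral $\int f_N(\tau) \, d\tau$ is finite for some $N$,

then writing $g_n$ for the density of $(\sum_{i=1}^n X_i)/\sqrt{v_n}$:

$$\lim_{n \to \infty} D(g_n \| \phi) = 0.$$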

Proof Follows from Theorem 1.4 by a dominated convergence argument
using de Bruijn's identity, Lemma 1.3. □

Note that convergence in relative entropy is a strong result and implies
convergence in $L^1$ and hence weak convergence of the original variables.

Convergence of Fisher information, Theorem 1.4, is actually implied by
Ibragimov's [6] classical weak convergence result. This follows since the density
of $V_n^{(\tau)}$ (and its derivative) can be expressed as expectations of a continuous
bounded function of $U_n$. Shimizu [11] discusses this technique, which can
only work for random variables perturbed by a normal. We hope our method
may be extended to the general case, since results such as Proposition 3.2 do
not need the random variables to be in this smoothed form. For example in
the independent case, we show in a forthcoming paper that $J_{st}(U_n) \to 0$, if
$J(U_m)$ is finite for some $m$, and if $U_k$ is unimodal for infinitely many $k$ (no
normal perturbation is necessary). In any case, we feel there is independent
interest in seeing why the normal distribution is the limit of convolutions, as
the score function becomes closer to the linear case which characterises the
Gaussian.



2 Fisher Information and convolution 

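Definition 2.1 For random variables $X, Y$ with score functions $\rho_X, \rho_Y$, for
any $\beta$, we define $\tilde\rho$ for the score function of $\sqrt{\beta} X + \sqrt{1-\beta} Y$ and then:

$$\Delta(X,Y,\beta) = \mathbb{E} \left( \sqrt{\beta} \rho_X(X) + \sqrt{1-\beta} \rho_Y(Y) - \tilde\rho(\sqrt{\beta} X + \sqrt{1-\beta} Y) \right)^2.$$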

Firstly, we provide a theorem which tells us how Fisher information changes 
on the addition of two random variables which are nearly independent. 

Theorem 2.2 Let $S$ and $T$ be random variables, with $\max(\mathrm{Var}\, S, \mathrm{Var}\, T) \le K\tau$.
Define $X = S + Z_S^{(\tau)}$ and $Y = T + Z_T^{(\tau)}$ (for $Z_S^{(\tau)}$ and $Z_T^{(\tau)}$ normal $N(0,\tau)$
independent of $S$, $T$ and each other), with score functions $\rho_X$ and $\rho_Y$. There
exists a constant $C = C(K,\tau,\epsilon)$ such that:

$$\beta J(X) + (1-\beta) J(Y) - J(\sqrt{\beta} X + \sqrt{1-\beta} Y) + C \alpha(S,T)^{1/3-\epsilon} \ge \Delta(X,Y,\beta).$$

If $S, T$ have bounded $k$th moment, we can replace $1/3$ by $k/(k+4)$. The
proof requires some involved analysis, and is deferred to Section 3.

In comparison, Takano [12], [13] produces bounds which depend on $\delta_4(S,T)$,
where:

Definition 2.3 For random variables $S, T$ with joint density $p_{S,T}(s,t)$ and
marginal densities $p_S(s)$ and $p_T(t)$, define the $\delta_n$ coefficient to be:

$$\delta_n(S,T) = \left( \int p_S(s) p_T(t) \left| \frac{p_{S,T}(s,t)}{p_S(s) p_T(t)} - 1 \right|^n ds \, dt \right)^{1/n}.$$

In the case where $S, T$ have a continuous joint density, it is clear that Takano's
condition is more restrictive, and lies between two more standard measures
of dependence:

$$4\alpha(S,T) \le \delta_4(S,T) \le \delta_\infty(S,T) = \psi(S,T) = \sup_{A,B} \left| \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(A)\mathbb{P}(B)} - 1 \right|$$

(as before see Bradley [2] for a discussion of different mixing conditions).

Another use of the smoothing of the variables allows us to control the mixing 
coefficients themselves: 

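Theorem 2.4 For $S$ and $T$, define $X = S + Z_S^{(\tau)}$ and $Y = T + Z_T^{(\tau)}$, where
$\max(\mathrm{Var}\, S, \mathrm{Var}\, T) \le K\tau$. If $Z$ has variance $\epsilon$, then there exists a function
$f_K$ such that

$$\alpha(X + Z, Y) \le \alpha(X,Y) + f_K(\epsilon),$$

where $f_K(\epsilon) \to 0$ as $\epsilon \to 0$.

Proof See Section 4. □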



To complete our analysis, we need lower bounds on the term $\Delta(X,Y,\beta)$. For
independent $X, Y$ it equals zero exactly when $\rho_X$ and $\rho_Y$ are linear, and if
it is small then $\rho_X$ and $\rho_Y$ are close to linear. Indeed, in [7] we make two
definitions:

Definition 2.5 For a function $\psi$, define the class of random variables $X$
with variance $\sigma_X^2$ such that:

$$C_\psi = \{ X : \mathbb{E} X^2 \mathbb{I}(|X| \ge R\sigma_X) \le \sigma_X^2 \psi(R) \}.$$

Further, define a semi-norm $\| \cdot \|_\Theta$ on functions via:

$$\| f \|_\Theta^2 = \inf_{a,b} \mathbb{E} \left( f(Z^{(\tau/2)}) - a Z^{(\tau/2)} - b \right)^2.$$



Combining results from previous papers we obtain: 

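Proposition 2.6 For $S$ and $T$ with $\max(\mathrm{Var}\, S, \mathrm{Var}\, T) \le K\tau$, define $X =
S + Z_S^{(\tau)}$, $Y = T + Z_T^{(\tau)}$. For any $\psi$, $\delta > 0$ there exists a function $\nu = \nu_{\psi,\delta,K,\tau}$,
with $\nu(\epsilon) \to 0$ as $\epsilon \to 0$, such that if $X, Y \in C_\psi$, and $\beta \in (\delta, 1-\delta)$ then

$$J_{st}(X) \le \nu(\Delta(X,Y,\beta)).$$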

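Proof We reproduce the proof of Lemma 3.1 of Johnson and Suhov [9],
which implies $p(x,y) \ge (\exp(-4K)/4) \phi_{\tau/2}(x) \phi_{\tau/2}(y)$. This follows since by
Chebyshev $\int \mathbb{I}(s^2 + t^2 \le 4K\tau) \, dF_{S,T}(s,t) \ge 1/2$, and since $(x-s)^2 \le 2x^2 + 2s^2$:

$$\begin{aligned}
p(x,y) &= \int \phi_\tau(x-s) \phi_\tau(y-t) \, dF_{S,T}(s,t) \\
&\ge \frac{1}{2} \min \{ \phi_\tau(x-s) \phi_\tau(y-t) : s^2 + t^2 \le 4K\tau \} \\
&\ge \frac{\phi_{\tau/2}(x) \phi_{\tau/2}(y)}{4} \exp \left( \min_{s^2+t^2 \le 4K\tau} \left\{ -\frac{s^2 + t^2}{\tau} \right\} \right).
\end{aligned}$$

Hence writing $h(x,y) = \sqrt{\beta} \rho_X(x) + \sqrt{1-\beta} \rho_Y(y) - \tilde\rho(\sqrt{\beta} x + \sqrt{1-\beta} y)$, then:

$$\Delta(X,Y,\beta) = \int p(x,y) h(x,y)^2 \, dx \, dy \ge \frac{\exp(-4K)}{4} \int \phi_{\tau/2}(x) \phi_{\tau/2}(y) h(x,y)^2 \, dx \, dy,$$

which is bounded below in terms of $\|\rho_X\|_\Theta$
by Proposition 3.2 of Johnson [7]. The crucial result of [7] implies that for
fixed $\psi$, if the sequence $X_n \in C_\psi$ have score functions $\rho_n$, then $\|\rho_n\|_\Theta \to 0$
implies that $J_{st}(X_n) \to 0$. □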

We therefore concentrate on random processes such that the sums $(X_1 + X_2 +
\ldots + X_m)$ have uniformly decaying tails:

Lemma 2.7 (Ibragimov, [6]) If $\{X_j\}$ are stationary with $\mathbb{E}|X|^{2+\delta} < \infty$
for some $\delta > 0$ and $\sum_{j=1}^{\infty} \alpha(j)^{\delta/(2+\delta)} < \infty$, then

1. $(X_1 + \ldots + X_m)$ belong to some class $C_\psi$, uniformly in $m$.

2. $v_n/n \to v = \sum_{j=-\infty}^{\infty} \mathrm{Cov}(X_0, X_j) < \infty$.
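
The second conclusion is easy to check by hand for an $m$-dependent process, where only finitely many covariances are non-zero. The sketch below does this for a moving average $X_j = Y_j + \theta Y_{j+1}$ with IID unit-variance $Y_j$; the example, the value of $\theta$ and the function name are assumptions chosen for illustration.

```python
def variance_of_sum(n, autocov):
    """v_n = Var(X_1 + ... + X_n) for a stationary sequence.

    `autocov` lists Cov(X_0, X_j) for j = 0, 1, ..., m; covariances at larger
    lags are taken to be zero (an m-dependent process).
    """
    total = n * autocov[0]
    for j in range(1, min(n, len(autocov))):
        total += 2.0 * (n - j) * autocov[j]
    return total

theta = 0.5
autocov = [1.0 + theta ** 2, theta]   # X_j = Y_j + theta * Y_{j+1}, Var(Y) = 1
v = autocov[0] + 2.0 * autocov[1]     # sum of Cov(X_0, X_j) over all integer j
for n in (10, 100, 10000):
    print(n, variance_of_sum(n, autocov) / n, v)
```

Here $v_n/n = v - 2\theta/n$, so the convergence to $v = (1+\theta)^2$ is at rate $1/n$.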

We are able to complete the proof of the CLT, under strong mixing condi- 
tions. 

Proof of Theorem 1.4 Combining Theorems 2.2 and 2.4, and defining
$V_n'^{(\tau)} = (\sum_{i=m+1}^{m+n} X_i + Z_i^{(\tau)})/\sqrt{n}$, we obtain that for $m \ge n$,

$$J_{st}(V_{m+n}^{(\tau)}) \le \frac{m}{m+n} J_{st}(V_m^{(\tau)}) + \frac{n}{m+n} J_{st}(V_n'^{(\tau)}) + c(m) - \Delta \left( V_m^{(\tau)}, V_n'^{(\tau)}, \frac{m}{m+n} \right),$$

where $c(m) \to 0$ as $m \to \infty$. We show this using the idea of 'rooms and
corridors': the sum can be decomposed into sums over blocks which
are large, but separated, and so close to independence. For example, writing
^ {J2T=%lX^)/V^+ Z(-/^\ Theorem Ea shows that 

w!c^'^) < aivi^!%, w^i^/^)) + fM^M) = «(v^) + /.(1/v^). 

In the notation of Theorem 2.2, $c(m) = C(K, \tau/2, \epsilon) (\alpha(\sqrt{m}) + f_K(1/\sqrt{m}))^{1/3-\epsilon}$.
We first establish convergence along the 'powers of 2 subsequence' $S_k = V_{2^k}^{(\tau)}$,
writing $S_k'$ for $(\sum_{i=2^k+1}^{2^{k+1}} X_i + Z_i^{(\tau)})/\sqrt{2^k}$, since

$$J_{st}(S_{k+1}) \le J_{st}(S_k) + c(k) - \Delta(S_k, S_k', 1/2),$$

where $c(k) \to 0$. Then use an argument structured like Linnik's proof [10].
Given $\epsilon$, we can find $K$ such that $c(k) \le \epsilon/2$, for all $k \ge K$. Now

1. either for all $k \ge K$, $2c(k) \le \Delta(S_k, S_k', 1/2)$, and so

$$J_{st}(S_k) - J_{st}(S_{k+1}) \ge \Delta(S_k, S_k', 1/2)/2,$$

so summing the telescoping sum, we deduce that $\sum_k \Delta(S_k, S_k', 1/2)$ is
finite, and hence there exists $L$ such that $\Delta(S_L, S_L', 1/2) \le \epsilon$.

2. or for some $L \ge K$, $2c(L) \ge \Delta(S_L, S_L', 1/2)$, then $\Delta(S_L, S_L', 1/2) \le \epsilon$.

Thus, in either case, there exists $L$ such that $\Delta(S_L, S_L', 1/2) \le \epsilon$, and hence
by Proposition 2.6, $J_{st}(S_L) \le \nu(\epsilon)$.

Now, for any $k \ge L$, either $J_{st}(S_{k+1}) \le J_{st}(S_k)$, or $\Delta(S_k, S_k', 1/2) \le c(k) \le \epsilon$.
In the second case, $J_{st}(S_k) \le \nu(\epsilon)$, so that $J_{st}(S_{k+1}) \le \nu(\epsilon) + \epsilon$. In either
case, we prove by induction that for all $k \ge L$, $J_{st}(S_{k+1}) \le \nu(\epsilon) + \epsilon$.

We can fill in the gaps to gain control of the whole sequence, adapting the
proof of the standard sub-additive inequality, using the methods described
in Appendix 2 of Grimmett [5]. □
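
To see the mechanism of this argument in isolation, the toy recursion below iterates $J_{k+1} = J_k + c_k - \Delta_k$ in the least favourable case allowed by a bound of the form $J \le \nu(\Delta)$. The choices $\nu(x) = \sqrt{x}$ (so that $\Delta_k = J_k^2$) and $c_k = 2^{-k}$ are assumptions made purely for illustration and are not taken from the proof.

```python
def iterate_bound(j0, steps):
    """Iterate J_{k+1} = J_k + c_k - Delta_k with Delta_k = J_k**2 and c_k = 2**-k.

    Delta_k = J_k**2 is the smallest decrement compatible with J <= nu(Delta)
    for the illustrative choice nu(x) = sqrt(x).
    """
    j = j0
    history = [j]
    for k in range(steps):
        j = j + 0.5 ** k - j * j
        history.append(j)
    return history

history = iterate_bound(j0=1.0, steps=2000)
print(history[1], history[10], history[100], history[-1])
```

Once the error terms $c_k$ are negligible, the sequence can only stall where $\Delta_k$, and hence $J_k$, is already small, which is the dichotomy used in the proof.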



3 Proof of sub-additive relations 

This is the key part of the argument, proving the bounds at the heart of the 
limit theorems. However, although the analysis is somewhat involved, it is 
not technically difficult. 

We introduce notation where it will be clear whether densities and score
functions are associated with joint or marginal distributions, by their number
of arguments: $\rho_X(x)$ will be the score function of $X$, and $p_X'(x)$ the
derivative of its density. For joint densities $p_{X,Y}(x,y)$, $p_{X,Y}^{(1)}(x,y)$ will be the
derivative of the density with respect to the first argument and $\rho_{X,Y}^{(1)}(x,y) =
p_{X,Y}^{(1)}(x,y)/p_{X,Y}(x,y)$, and so on.

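Note that a similar equation to the independent case tells us about the
behaviour of Fisher Information of sums:

Lemma 3.1 If $X, Y$ are random variables, with joint density $p(x,y)$, and
score functions $\rho_{X,Y}^{(1)}$ and $\rho_{X,Y}^{(2)}$, then $X + Y$ has score function $\tilde\rho$ given by

$$\tilde\rho(z) = \mathbb{E} \left[ \rho_{X,Y}^{(1)}(X,Y) \mid X + Y = z \right] = \mathbb{E} \left[ \rho_{X,Y}^{(2)}(X,Y) \mid X + Y = z \right].$$

Proof Since $X + Y$ has density $r(z) = \int p_{X,Y}(z - y, y) \, dy$, then:

$$r'(z) = \int p_{X,Y}^{(1)}(z - y, y) \, dy.$$

Hence dividing, we obtain that:

$$\tilde\rho(z) = \frac{r'(z)}{r(z)} = \int \rho_{X,Y}^{(1)}(z - y, y) \frac{p_{X,Y}(z - y, y)}{r(z)} \, dy,$$

as claimed. □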
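
The conditional-expectation formula for the score of a dependent sum can be checked numerically in a case where everything is explicit. The sketch below takes a bivariate normal pair with unit variances and correlation $r$ (an assumption chosen for illustration), for which $X + Y \sim N(0, 2(1+r))$ and the score of the sum is $-z/(2(1+r))$; the function names and the Riemann-sum parameters are likewise illustrative.

```python
import math

def joint_density(x, y, r):
    """Bivariate normal density with unit variances and correlation r."""
    det = 1.0 - r * r
    q = (x * x - 2.0 * r * x * y + y * y) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def joint_score_first(x, y, r):
    """Derivative of the log joint density with respect to its first argument."""
    return -(x - r * y) / (1.0 - r * r)

def score_of_sum(z, r, half_width=12.0, steps=4000):
    """E[rho^(1)(X, Y) | X + Y = z], approximated by a Riemann sum over y."""
    h = 2.0 * half_width / steps
    num = 0.0
    den = 0.0
    for i in range(steps + 1):
        y = -half_width + i * h
        w = joint_density(z - y, y, r)
        num += joint_score_first(z - y, y, r) * w
        den += w
    return num / den

z, r = 1.3, 0.4
print(score_of_sum(z, r), -z / (2.0 * (1.0 + r)))
```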

For given $a, b$, define the function $M(x,y) = M_{a,b}(x,y)$ by:

$$M_{a,b}(x,y) = a \left( \rho_{X,Y}^{(1)}(x,y) - \rho_X(x) \right) + b \left( \rho_{X,Y}^{(2)}(x,y) - \rho_Y(y) \right),$$

which is zero if $X$ and $Y$ are independent. Using properties of the perturbed
density, we will show that if $\alpha(S,T)$ is small, then $M$ is close to zero.

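Proposition 3.2 If $X, Y$ are random variables, with marginal score functions
$\rho_X, \rho_Y$, and if the sum $\sqrt{\beta} X + \sqrt{1-\beta} Y$ has score function $\tilde\rho$ then

$$\begin{aligned}
& \beta J(X) + (1-\beta) J(Y) - J(\sqrt{\beta} X + \sqrt{1-\beta} Y) \\
& \quad + 2\sqrt{\beta(1-\beta)} \, \mathbb{E} \rho_X(X) \rho_Y(Y) + 2 \mathbb{E} M_{\sqrt{\beta},\sqrt{1-\beta}}(X,Y) \tilde\rho(\sqrt{\beta} X + \sqrt{1-\beta} Y) \\
& = \mathbb{E} \left( \sqrt{\beta} \rho_X(X) + \sqrt{1-\beta} \rho_Y(Y) - \tilde\rho(\sqrt{\beta} X + \sqrt{1-\beta} Y) \right)^2.
\end{aligned}$$

Proof By the two-dimensional version of Stein's equation, for any function
$f(x,y)$ and for $i = 1, 2$:

$$\mathbb{E} \rho_{X,Y}^{(i)}(X,Y) f(X,Y) = -\mathbb{E} f^{(i)}(X,Y).$$

Hence, writing $\tilde\rho$ for the score function of $X + Y$ and taking $f(x,y) = \tilde\rho(x+y)$, we know that for any $a, b$:

$$\mathbb{E} (a \rho_X(X) + b \rho_Y(Y)) \tilde\rho(X+Y) = (a+b) J(X+Y) - \mathbb{E} M_{a,b}(X,Y) \tilde\rho(X+Y).$$

By considering $\int p(x,y) (a\rho_X(x) + b\rho_Y(y) - (a+b)\tilde\rho(x+y))^2 \, dx \, dy$, dealing
with the cross term with the expression above, we deduce that:

$$\begin{aligned}
& a^2 J(X) + b^2 J(Y) - (a+b)^2 J(X+Y) \\
& \quad + 2ab \, \mathbb{E} \rho_X(X) \rho_Y(Y) + 2(a+b) \mathbb{E} M_{a,b}(X,Y) \tilde\rho(X+Y) \\
& = \mathbb{E} \left( a\rho_X(X) + b\rho_Y(Y) - (a+b)\tilde\rho(X+Y) \right)^2 \ge 0.
\end{aligned}$$

As in the independent case, we can rescale, and consider $X' = \sqrt{\beta} X$, $Y' =
\sqrt{1-\beta} Y$, and take $a = \beta$, $b = 1 - \beta$. Note that $\sqrt{\beta} \rho_{X'}(u) = \rho_X(u/\sqrt{\beta})$,
and similarly $\sqrt{1-\beta} \rho_{Y'}(v) = \rho_Y(v/\sqrt{1-\beta})$. □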
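
For readers who want the intermediate algebra, the expansion of the square can be written out as follows. This is a sketch using only the Stein-type identity for the cross term, with $\tilde\rho$ denoting the score function of $X + Y$; the grouping of terms is ours.

```latex
\begin{align*}
\mathbb{E}\bigl(a\rho_X(X) + b\rho_Y(Y) - (a+b)\tilde\rho(X+Y)\bigr)^2
 &= a^2 J(X) + b^2 J(Y) + 2ab\,\mathbb{E}\rho_X(X)\rho_Y(Y)
    + (a+b)^2 J(X+Y) \\
 &\quad - 2(a+b)\,\mathbb{E}\bigl(a\rho_X(X) + b\rho_Y(Y)\bigr)\tilde\rho(X+Y) \\
 &= a^2 J(X) + b^2 J(Y) + 2ab\,\mathbb{E}\rho_X(X)\rho_Y(Y)
    + (a+b)^2 J(X+Y) \\
 &\quad - 2(a+b)\bigl[(a+b) J(X+Y) - \mathbb{E} M_{a,b}(X,Y)\tilde\rho(X+Y)\bigr] \\
 &= a^2 J(X) + b^2 J(Y) - (a+b)^2 J(X+Y)
    + 2ab\,\mathbb{E}\rho_X(X)\rho_Y(Y) \\
 &\quad + 2(a+b)\,\mathbb{E} M_{a,b}(X,Y)\tilde\rho(X+Y).
\end{align*}
```

When $X$ and $Y$ are independent, both $M_{a,b}$ and $\mathbb{E}\rho_X(X)\rho_Y(Y)$ vanish and the familiar independent-case identity is recovered.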

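Next, we require an extension of Lemma 3 of Barron [1] applied to single and
bivariate random variables:

Lemma 3.3 For any $S, T$, define $(X,Y) = (S + Z_S^{(\tau)}, T + Z_T^{(\tau)})$ and define
$p^{(2\tau)}$ for the density of $(S + Z_S^{(2\tau)}, T + Z_T^{(2\tau)})$. There exists a constant $c_{\tau,k} =
\sqrt{2} (2k/\tau e)^{k/2}$ such that for all $x, y$:

$$\begin{aligned}
p_X(x) |\rho_X(x)|^k &\le c_{\tau,k} \, p_X^{(2\tau)}(x) \\
p_{X,Y}(x,y) |\rho_{X,Y}^{(1)}(x,y)|^k &\le c_{\tau,k} \, p_{X,Y}^{(2\tau)}(x,y) \\
p_{X,Y}(x,y) |\rho_{X,Y}^{(2)}(x,y)|^k &\le c_{\tau,k} \, p_{X,Y}^{(2\tau)}(x,y)
\end{aligned}$$

and hence

$$\left( \mathbb{E} |\rho_X(X)|^k \right)^{1/k} \le c_{\tau,k}^{1/k} = 2^{1/(2k)} \sqrt{\frac{2k}{\tau e}}.$$

Proof We adapt Barron's proof, using Hölder's inequality and the bound
$(|u|/\tau)^k \phi_\tau(u) \le c_{\tau,k} \phi_{2\tau}(u)$ for all $u$:

$$\begin{aligned}
|p_X'(x)|^k &= \left| \mathbb{E} \left( \frac{x - S}{\tau} \right) \phi_\tau(x - S) \right|^k \\
&\le \mathbb{E} \left( \left| \frac{x - S}{\tau} \right|^k \phi_\tau(x - S) \right) \left( \mathbb{E} \phi_\tau(x - S) \right)^{k-1} \\
&\le c_{\tau,k} \left( \mathbb{E} \phi_{2\tau}(x - S) \right) p_X(x)^{k-1}.
\end{aligned}$$

A similar argument gives the other bounds. □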
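
The pointwise Gaussian bound used in this proof is elementary, and the constant is attained at $u^2 = 2k\tau$. The sketch below compares a grid maximum of the ratio $(|u|/\tau)^k \phi_\tau(u)/\phi_{2\tau}(u)$ with $c_{\tau,k}$; the function names, the grid and the sample values of $(\tau, k)$ are assumptions made for the illustration.

```python
import math

def normal_pdf(u, variance):
    """Density of N(0, variance) at u."""
    return math.exp(-u * u / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def c_tau_k(tau, k):
    """The constant sqrt(2) * (2k / (tau * e))**(k/2) from Lemma 3.3."""
    return math.sqrt(2.0) * (2.0 * k / (tau * math.e)) ** (k / 2.0)

def worst_ratio(tau, k, half_width=20.0, steps=20000):
    """Grid maximum of (|u|/tau)**k * phi_tau(u) / phi_{2 tau}(u)."""
    best = 0.0
    for i in range(steps + 1):
        u = -half_width + 2.0 * half_width * i / steps
        ratio = (abs(u) / tau) ** k * normal_pdf(u, tau) / normal_pdf(u, 2.0 * tau)
        best = max(best, ratio)
    return best

for tau, k in [(1.0, 2), (0.5, 3), (2.0, 4)]:
    print(tau, k, worst_ratio(tau, k), c_tau_k(tau, k))
```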

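Now, the normal perturbation ensures that the density doesn't decrease too
quickly, and so the modulus of the score function can't grow too fast.

Lemma 3.4 Consider $X$ of the form $X = S + Z_S^{(\tau)}$, where $\mathrm{Var}\, S \le K\tau$. If
$X$ has score function $\rho$, then for $B \ge 1$:

$$\int_{-B\sqrt{\tau}}^{B\sqrt{\tau}} \rho(u)^2 \, du \le \frac{8B^3}{\sqrt{\tau}} (3 + 2K).$$

Proof As in Proposition 2.6, $p(u) \ge (2\sqrt{2} \exp 2K)^{-1} \phi_{\tau/2}(u)$, so that for $u \in
(-B\sqrt{\tau}, B\sqrt{\tau})$, $(B\sqrt{\tau} p(u))^{-1} \le 2\sqrt{2\pi} \exp(B^2 + 2K)/B \le 2\sqrt{2\pi} \exp(B^2 + 2K)$.
Hence for any $k > 1$, by Hölder's inequality and Lemma 3.3:

$$\begin{aligned}
\int_{-B\sqrt{\tau}}^{B\sqrt{\tau}} \rho(u)^2 \, du
&\le \left( \int p(u) |\rho(u)|^{2k} \, du \right)^{1/k} \left( \int_{-B\sqrt{\tau}}^{B\sqrt{\tau}} p(u)^{-1/(k-1)} \, du \right)^{(k-1)/k} \\
&\le \frac{8B}{\sqrt{\tau}} \, k \left( 2\sqrt{2\pi} \exp(B^2 + 2K) \right)^{1/k} \exp(-1).
\end{aligned}$$

Since we have a free choice of $k > 1$ to minimise $k \exp(v/k)$, choosing $k =
v > 1$ means that $k \exp(v/k) \exp(-1) = v$. Hence, taking $v = B^2 + 2K + \log(2\sqrt{2\pi})$, we obtain a bound of

$$\int_{-B\sqrt{\tau}}^{B\sqrt{\tau}} \rho(u)^2 \, du \le \frac{8B}{\sqrt{\tau}} \left( B^2 + 2K + \log(2\sqrt{2\pi}) \right) \le \frac{8B^3}{\sqrt{\tau}} (3 + 2K).$$

□

By considering $S$ normal, so that $\rho$ grows linearly with $u$, we know that the
$B^3$ rate of growth is a sharp bound.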
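
Two ingredients of this lemma are easy to check numerically: the factor $k \exp(v/k - 1)$ is smallest at $k = v$, and for Gaussian $S$ the integral grows like $B^3$, matching the rate of the bound. In the sketch below, the choice $S \sim N(0, K\tau)$, the function names and the sample parameter values are assumptions made for illustration.

```python
import math

def k_term(k, v):
    """The factor k * exp(v / k - 1) left over after Holder's inequality."""
    return k * math.exp(v / k - 1.0)

def gaussian_score_integral(B, tau, K):
    """Integral of rho(u)**2 over |u| <= B*sqrt(tau) when S ~ N(0, K*tau).

    Then X ~ N(0, (K + 1)*tau) and rho(u) = -u / ((K + 1)*tau), so the
    integral is available in closed form.
    """
    half_width = B * math.sqrt(tau)
    return 2.0 * half_width ** 3 / (3.0 * ((K + 1.0) * tau) ** 2)

def lemma_bound(B, tau, K):
    """Right-hand side 8 * B**3 * (3 + 2K) / sqrt(tau) of Lemma 3.4."""
    return 8.0 * B ** 3 * (3.0 + 2.0 * K) / math.sqrt(tau)

v = 4.0
print([round(k_term(k, v), 4) for k in (2.0, 3.0, 4.0, 5.0, 6.0)])
for B in (1.0, 2.0, 4.0):
    print(B, gaussian_score_integral(B, tau=1.0, K=1.0), lemma_bound(B, tau=1.0, K=1.0))
```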

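Lemma 3.5 For random variables $S, T$, let $X = S + Z_S^{(\tau)}$ and $Y = T + Z_T^{(\tau)}$, and
define $L_B = \{ |x| \le B\sqrt{\tau}, |y| \le B\sqrt{\tau} \}$. If $\max(\mathrm{Var}\, S, \mathrm{Var}\, T) \le K\tau$ then
there exists a function $f_1(K,\tau)$ such that for $B \ge 1$:

$$\mathbb{E} M_{a,b}(X,Y) \tilde\rho(X+Y) \mathbb{I}((X,Y) \in L_B) \le \alpha(S,T) B^4 (a+b) f_1(K,\tau).$$

Proof Lemma 1.2 of Ibragimov [6] states that if $\xi, \nu$ are random variables
measurable with respect to $\mathcal{A}, \mathcal{B}$ respectively, with $|\xi| \le C_1$ and $|\nu| \le C_2$,
then:

$$|\mathrm{Cov}(\xi, \nu)| \le 4 C_1 C_2 \alpha(\mathcal{A}, \mathcal{B}).$$

Now since $|\phi_\tau(u)| \le 1/\sqrt{2\pi\tau}$, and $|u \phi_\tau(u)/\tau| \le \exp(-1/2)/\sqrt{2\pi\tau^2}$, we
deduce that:

$$|p_{X,Y}(x,y) - p_X(x) p_Y(y)| = |\mathrm{Cov}(\phi_\tau(x-S), \phi_\tau(y-T))| \le \frac{2}{\pi\tau} \alpha(S,T).$$

Similarly:

$$|p_{X,Y}^{(1)}(x,y) - p_X'(x) p_Y(y)| = \left| \mathrm{Cov} \left( \left( \frac{x-S}{\tau} \right) \phi_\tau(x-S), \phi_\tau(y-T) \right) \right| \le \frac{2}{\pi\tau\sqrt{\tau e}} \alpha(S,T).$$

By rearranging $M_{a,b}$, we obtain:

$$p_{X,Y}(x,y) |M_{a,b}(x,y)| \le \frac{2\alpha(S,T)}{\pi\tau} \left( \frac{a+b}{\sqrt{\tau e}} + |a\rho_X(x) + b\rho_Y(y)| \right).$$

By Cauchy-Schwarz:

$$\begin{aligned}
\mathbb{E} M_{a,b}(X,Y) \tilde\rho(X+Y) \mathbb{I}((X,Y) \in L_B)
&\le \frac{2\alpha(S,T)}{\pi\tau} \sqrt{32B^4(3+2K)} \, (a+b) \left( \frac{2B}{\sqrt{e}} + \sqrt{16B^4(3+2K)} \right) \\
&\le \alpha(S,T) B^4 (a+b) \frac{40\sqrt{2}(3+2K)}{\tau}.
\end{aligned}$$

This follows firstly since by Lemma 3.4

$$\int \rho_X(x)^2 \mathbb{I}((x,y) \in L_B) \, dx \, dy \le (2B\sqrt{\tau}) \int_{-B\sqrt{\tau}}^{B\sqrt{\tau}} \rho_X(x)^2 \, dx \le 16B^4(3+2K)$$

and by Lemma 3.4

$$\begin{aligned}
\int \tilde\rho(x+y)^2 \mathbb{I}((x,y) \in L_B) \, dx \, dy
&\le \int \tilde\rho(x+y)^2 \mathbb{I}(|x+y| \le 2B\sqrt{\tau}) \mathbb{I}(|y| \le B\sqrt{\tau}) \, dx \, dy \\
&\le 2B\sqrt{\tau} \int_{-2B\sqrt{\tau}}^{2B\sqrt{\tau}} \tilde\rho(z)^2 \, dz \le 32B^4(3+2K).
\end{aligned}$$

□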



Now uniform decay of the tails gives us control everywhere else: 




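Lemma 3.6 For $S, T$ with mean zero and variance $\le K\tau$, let $X = S + Z_S^{(\tau)}$
and $Y = T + Z_T^{(\tau)}$. There exists a function $f_2(\tau, K, \epsilon)$ such that:

$$\mathbb{E} M_{a,b}(X,Y) \tilde\rho(X+Y) \mathbb{I}((X,Y) \notin L_B) \le (a+b) \frac{f_2(\tau,K,\epsilon)}{B^{2-\epsilon}}.$$

For $S, T$ with $k$th moment ($k \ge 2$) bounded above, we can achieve a rate of
decay of $1/B^{k-\epsilon}$.

Proof By Chebyshev $\mathbb{P}((S + Z_S^{(2\tau)}, T + Z_T^{(2\tau)}) \notin L_B) \le \int p^{(2\tau)}(x,y)(x^2 +
y^2)/(2B^2\tau) \, dx \, dy \le (K+2)/B^2$, so by Hölder-Minkowski for $1/p + 1/q = 1$:

$$\begin{aligned}
& \mathbb{E} \rho_{X,Y}^{(1)}(X,Y) \tilde\rho(X+Y) \mathbb{I}((X,Y) \notin L_B) \\
& \le \left( \mathbb{E} |\rho_{X,Y}^{(1)}(X,Y)|^p \mathbb{I}((X,Y) \notin L_B) \right)^{1/p} \left( \mathbb{E} |\tilde\rho(X+Y)|^q \right)^{1/q} \\
& \le c_{\tau,p}^{1/p} c_{\tau,q}^{1/q} \, \mathbb{P}((S + Z_S^{(2\tau)}, T + Z_T^{(2\tau)}) \notin L_B)^{1/p} \\
& \le \frac{2\sqrt{2pq} \exp(-1)}{\tau} \left( \frac{K+2}{B^2} \right)^{1/p}.
\end{aligned}$$

By choosing $p$ arbitrarily close to 1, we can obtain the required expression.
The other terms work in a similar way. □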



Similarly we bound the remaining product term: 



Lemma 3.7 For random variables $S, T$ with mean zero and variances satisfying
$\max(\mathrm{Var}\, S, \mathrm{Var}\, T) \le K\tau$, let $X = S + Z_S^{(\tau)}$ and $Y = T + Z_T^{(\tau)}$. There
exist functions $f_3(\tau,K)$ and $f_4(\tau,K)$ such that

$$\mathbb{E} \rho_X(X) \rho_Y(Y) \le f_3(\tau,K) B^4 \alpha(S,T) + f_4(\tau,K)/B^2.$$



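Proof Using part of Lemma 3.5, we know that $|p_{X,Y}(x,y) - p_X(x) p_Y(y)| \le
2\alpha(S,T)/(\pi\tau)$. Hence by an argument similar to that of Lemma 3.6, we
obtain that:

$$\begin{aligned}
\mathbb{E} \rho_X(X) \rho_Y(Y) &= \int \left( p_{X,Y}(x,y) - p_X(x) p_Y(y) \right) \rho_X(x) \rho_Y(y) \, dx \, dy \\
&\le \frac{2\alpha(S,T)}{\pi\tau} \int |\rho_X(x)| |\rho_Y(y)| \mathbb{I}((x,y) \in L_B) \, dx \, dy \\
&\quad + \int p_{X,Y}(x,y) |\rho_X(x)| |\rho_Y(y)| \mathbb{I}((x,y) \notin L_B) \, dx \, dy \\
&\quad + \int p_X(x) p_Y(y) |\rho_X(x)| |\rho_Y(y)| \mathbb{I}((x,y) \notin L_B) \, dx \, dy.
\end{aligned}$$

The first term is bounded using Cauchy-Schwarz and Lemma 3.4, and the
remaining two using Cauchy-Schwarz, Lemma 3.3 and Chebyshev,
as required. □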

Proof of Theorem 2.2 Combining Lemmas 3.5, 3.6 and 3.7, we obtain for
given $\tau, \epsilon$ that there exist constants $C_1, C_2$ such that

$$\mathbb{E} M_{\sqrt{\beta},\sqrt{1-\beta}} \tilde\rho + \sqrt{\beta(1-\beta)} \, \mathbb{E} \rho_X \rho_Y \le C_1 \alpha(S,T) B^4 + C_2/B^{2-\epsilon},$$

so choosing $B = (1/4\alpha(S,T))^{1/6} \ge 1$, we obtain a bound of $C\alpha(S,T)^{1/3-\epsilon}$.

By Lemma 3.6, note that if $X, Y$ have bounded $k$th moment, then we obtain
decay at the rate $C_1 \alpha(S,T) B^4 + C_2/B^{k'}$, for any $k' < k$. Choosing $B =
\alpha(S,T)^{-1/(k'+4)}$, we obtain a rate of $\alpha(S,T)^{k'/(k'+4)}$. □
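
The choice of truncation level simply balances the two error terms. The sketch below carries out this balancing with both constants set to 1, which is an assumption made for illustration, as are the function name and the sample values of $\alpha$.

```python
import math

def balanced_rate(alpha, k_prime):
    """Balance alpha * B**4 against 1 / B**k_prime (both constants set to 1).

    Returns the truncation level B = alpha**(-1/(k_prime + 4)) together with
    the resulting values of the two terms, which coincide.
    """
    B = alpha ** (-1.0 / (k_prime + 4.0))
    return B, alpha * B ** 4, B ** (-k_prime)

for alpha in (1e-2, 1e-4, 1e-6):
    B, first, second = balanced_rate(alpha, k_prime=2.0)
    print(alpha, B, first, second, alpha ** (1.0 / 3.0))
```

With $k' = 2$ both terms equal $\alpha^{1/3}$, and as $k'$ grows the exponent $k'/(k'+4)$ approaches 1.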



4 Control of the mixing coefficients 

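To control $\alpha(X+Z, Y)$ and to prove Theorem 2.4 we use truncation, smoothing
and triangle inequality arguments similar to those of the previous section.
Write $W$ for $X + Z$, $L_B = \{(x,y) : |x| \le B\sqrt{\tau}, |y| \le B\sqrt{\tau}\}$, and $\bar{R}$ for
$R \cap (-B\sqrt{\tau}, B\sqrt{\tau})$. Note that by Chebyshev, $\mathbb{P}((W,Y) \in L_B^c) \le \mathbb{P}(|W| \ge
B\sqrt{\tau}) + \mathbb{P}(|Y| \ge B\sqrt{\tau}) \le 2(K+1)/B^2$. Hence by the triangle inequality, for
any sets $S, T$:

$$\begin{aligned}
& |\mathbb{P}((W,Y) \in (S,T)) - \mathbb{P}(W \in S) \mathbb{P}(Y \in T)| \\
& \le |\mathbb{P}((W,Y) \in (S,T) \cap L_B) - \mathbb{P}(W \in \bar{S}) \mathbb{P}(Y \in \bar{T})| \\
& \quad + \mathbb{P}((W,Y) \in L_B^c) + \mathbb{P}(|W| \ge B\sqrt{\tau}) \mathbb{P}(|Y| \ge B\sqrt{\tau}) \\
& \le |\mathbb{P}((W,Y) \in (\bar{S},\bar{T})) - \mathbb{P}((X,Y) \in (\bar{S},\bar{T}))| \\
& \quad + |\mathbb{P}((X,Y) \in (\bar{S},\bar{T})) - \mathbb{P}(X \in \bar{S}) \mathbb{P}(Y \in \bar{T})| \\
& \quad + |\mathbb{P}(X \in \bar{S}) - \mathbb{P}(W \in \bar{S})| \, \mathbb{P}(Y \in \bar{T}) + \frac{3(K+1)}{B^2} \\
& \le \int |p_{W,Y}(w,y) - p_{X,Y}(w,y)| \mathbb{I}((w,y) \in L_B) \, dw \, dy + \alpha(X,Y) \\
& \quad + \int |p_X(w) - p_W(w)| \mathbb{I}(|w| \le B\sqrt{\tau}) \, dw + \frac{3(K+1)}{B^2}.
\end{aligned}$$

Here, the first inequality follows on splitting $\mathbb{R}^2$ into $L_B$ and $L_B^c$, the second
by repeated application of the triangle inequality, and the third by expanding
out probabilities using the densities. Now the key result is that: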

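Proposition 4.1 For $S$ and $T$, define $X = S + Z_S^{(\tau)}$ and $Y = T + Z_T^{(\tau)}$,
where $\max(\mathrm{Var}\, S, \mathrm{Var}\, T) \le K\tau$. If $Z$ has variance $\epsilon$, then there exists a
constant $C = C(B, K, \tau)$ such that:

$$\int |p_W(w) - p_X(w)| \mathbb{I}(|w| \le B\sqrt{\tau}) \, dw \le (\exp(C\epsilon^{1/5}) - 1) + 2\epsilon^{1/5}.$$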

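Proof We can show that for $|z| \le \delta^2$ and $|x| \le B\sqrt{\tau}$:

$$\frac{p_{X,Z}(x - z, z)}{p_{X,Z}(x, z)} \le \exp \left( \delta \left( \int_{-2B\sqrt{\tau}}^{2B\sqrt{\tau}} \rho_{X,Z}^{(1)}(u, z)^2 \, du \right)^{1/2} \right) \le \exp(C\delta),$$

by adapting Lemma 3.4 to cover bivariate random variables. Hence we know
that:

$$\begin{aligned}
& \int |p_W(w) - p_X(w)| \mathbb{I}(|w| \le B\sqrt{\tau}) \, dw \\
& \le \int |p_{X,Z}(w - z, z) - p_{X,Z}(w, z)| \mathbb{I}(|z| \le \delta^2, |w| \le B\sqrt{\tau}) \, dz \, dw \\
& \quad + \int |p_{X,Z}(w - z, z) - p_{X,Z}(w, z)| \mathbb{I}(|z| > \delta^2) \, dw \, dz \\
& \le \int p_{X,Z}(w, z) (\exp(C\delta) - 1) \, dw \, dz + 2\mathbb{P}(|Z| > \delta^2) \\
& \le (\exp(C\delta) - 1) + 2\mathbb{P}(|Z| > \delta^2).
\end{aligned}$$

Thus choosing $\delta = \epsilon^{1/5}$, the result follows by Chebyshev. □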

Similar analysis allows us to control

$$\int |p_{W,Y}(w,y) - p_{X,Y}(w,y)| \mathbb{I}((w,y) \in L_B) \, dw \, dy.$$



Acknowledgements 

The author is a Fellow of Christ's College Cambridge, who funded travel and 
research expenses. I also thank Yuri Suhov of the Statistical Laboratory for 
useful discussions, and the anonymous referee for helpful comments. 



References 

[1] A.R. Barron. Entropy and the Central Limit Theorem. Annals of Probability,
14:336-342, 1986.

[2] R.C. Bradley. Basic properties of strong mixing conditions. In E. Eberlein
and M. Taqqu, editors, Dependence in Probability and Statistics,
pages 165-192. Birkhäuser, Boston, 1986.

[3] L.D. Brown. A proof of the Central Limit Theorem motivated by the
Cramér-Rao inequality. In G. Kallianpur, P.R. Krishnaiah, and J.K.
Ghosh, editors, Statistics and Probability: Essays in Honour of C.R.
Rao, pages 141-148. North-Holland, New York, 1982.

[4] E.A. Carlen and A. Soffer. Entropy production by block variable summation
and Central Limit Theorems. Communications in Mathematical
Physics, 140:339-371, 1991.

[5] G.R. Grimmett. Percolation (Second Edition). Springer-Verlag, Berlin,
1999.

[6] I.A. Ibragimov. Some limit theorems for stationary processes. Theory
of Probability and its Applications, 7:349-381, 1962.

[7] O.T. Johnson. Entropy inequalities and the Central Limit Theorem.
Stochastic Processes and Their Applications, 88:291-304, 2000.

[8] O.T. Johnson. Entropy and FKG random variables. Submitted to
Communications in Mathematical Physics, 2001.

[9] O.T. Johnson and Yu.M. Suhov. Entropy and random vectors. To appear
in Journal of Statistical Physics, 104, 2001.

[10] Yu.V. Linnik. An information-theoretic proof of the Central Limit Theorem
with the Lindeberg Condition. Theory of Probability and its Applications,
4:288-299, 1959.

[11] R. Shimizu. On Fisher's amount of information for location family.
In G.P. Patil et al., editors, Statistical Distributions in Scientific Work,
Volume 3, pages 305-312. Reidel, 1975.

[12] S. Takano. The inequalities of Fisher Information and Entropy Power for
dependent variables. In S. Watanabe, M. Fukushima, Yu.V. Prohorov,
and A.N. Shiryaev, editors, Proceedings of the 7th Japan-Russia Symposium
on Probability Theory and Mathematical Statistics, Tokyo 26-30
July 1995, pages 460-470, Singapore, 1996. World Scientific.

[13] S. Takano. Entropy and a limit theorem for some dependent variables.
In Prague Stochastics '98, volume 2, pages 549-552. Union of Czech
Mathematicians and Physicists, 1998.