# Full text of "Convergence of the Poincare Constant"

## See other formats

```Convergence of the Poincare constant

Oliver Johnson
February 1, 2008

Abstract

The Poincare constant Ry of a random variable Y relates the
L 2 (Y)-norm of a function g and its derivative g' . Since Ry — Var (Y)
is positive, with equality if and only if Y is normal, it can be seen as a
distance from the normal distribution. In this paper we establish the
best possible rate of convergence of this distance in the Central Limit
Theorem. Furthermore, we show that Ry is finite for discrete mix-
tures of normals, allowing us to add rates to the proof of the Central
Limit Theorem in the sense of relative entropy.

1 Introduction and results

Poincare (or spectral gap) inequalities provide a relationship between L 2
norms on functions and their derivatives.

Definition 1.1 (Borovkov and Utev) Given a random variable Y , define
the Poincare constant Ry:

Varfl(r)

where Hi (Y) is the space of absolutely continuous functions on the real line
such that Var g(Y) > and Eg'(y) 2 < oo.

Key words: Poincare constant, spectral gap, Central Limit Theorem, Fisher Infor-
mation

AMS 2000 subject classification: 60E15, 60F99

Address: Statistical Laboratory, Centre for Mathematical Sciences, Wilberforce
Road, Cambridge, CB3 OWB, United Kingdom. Email: otjl000@cam.ac.uk.

1

1 INTRODUCTION AND RESULTS

2

Ry will not in general be finite, however it will be finite for the normal
and other strongly unimodal distributions (see for example Klaasen (1985),
Chernoff (1981), Chen (1982), Cacoullos (1982), Nash (1958), Borovkov and
Utev (1984)).

We will exploit various relationships between the Poincare constant and
Fisher information:

Definition 1.2 For a random variable Y with smooth density p, define the
score function py(y) = p'(y)/p(y), o,nd Fisher information I(Y) = Ep{Y) 2 .

Notice that for a given Y, if g is a local maximum of Var g(Y)/(Eg'(Y) 2 )
then for all functions h and small t:

so multiplying out, < t 2 (R g Eh' 2 - Eh 2 ) + 2t(R g Eg'h' - Egh), which can
only hold in an interval around zero if:

Integration by parts implies therefore that g = —Rg{pyg' + g"), so local
maxima correspond to eigenfunctions of the Laplacian D Y g = (pyQ 1 + g"),
and the global maximum to the least strictly negative eigenvalue (hence the
alternative name of spectral gap inequality).

Example 1.3 The Poincare constant can be infinite. For example, consider
the discrete random variable, where P(X = 1) = P(X = —1) = 1/2. Then,
we can choose g such that g'(—l) = g'iX) = e, but g{— 1) = —1, g(l) = 1,
so that Var g(X) = 1, but Eg'(X) 2 = e 2 . This argument will work for any
discrete random variable, indeed any random variable whose support is not
an interval.

However, our first main result shows that discrete random variables per-
turbed by small normals have a finite Poincare constant:

Theorem 1.4 Consider X , a random variable with variance o 2 taking a
finite number of values with probabilities pi,P2, ■ ■ - Pn respectively, and Z T an

Var (g + th) Var (g)
E{g' + th') 2 ~ Eg' 2

= R,

RgEg'h' = Egh.

(1)

1 INTRODUCTION AND RESULTS

3

indepedent normal with mean zero variance r. Then Y T = X + Z T satisfies a
Poincare inequality with constant

Note that part 8 of Theorem 1.1 of Utev (1992) also shows that Ry T is
finite. However, our bound has an explicit dependence on a 2 , and so has
independent interest.

In a paper by Johnson and Barron (2002), we show that finiteness of the
Poincare constant gives an explicit rate of convergence of relative entropy
distance in the Central Limit Theorem. This is a strong result, and implies
convergence in L 1 .

Theorems 2, 3 and 4 of Borovkov and Utev provide the following results:
Lemma 1.5 For the constant Rx defined above:
1- Rax+b = a 2 R x

2. If X, Y are independent, then Rx+y < Rx + Ry

3. Rx > Var (X), with equality if and only if X is normal

4- If Rx is finite then Eexp(|X — KX\/12y/R x ) < 2, so X has moments

of all orders.

5. If i?x„/Var (X n ) — > 1, then Kw(X n ) — ► Kw(Z) ,where Z is normal, for
any continuous w with \w(t)\ < exp(c|t|) ; for sufficiently small c.

The first three properties are reminiscent of those of Fisher Information - a
subadditive relation holds and the minimising case characterises the normal
distribution. In analogy with the approach to the Central Limit Theorem
developed by Brown (1982) and Barron (1986), we add an extra term into
the subadditive relation Rx+y < Rx + Ry, which is sandwiched as conver-
gence occurs. This gives us an answer to the question posed by Chen and
Lou (1990), of identifying the limit of the Poincare constant in the Central
Limit Theorem. This was also answered by Utev (1992), though without the
explicit rate of convergence that we provide.

Proof See Section

□

1 INTRODUCTION AND RESULTS

4

Theorem 1.6 Consider X\, X2, ■ ■ ■ IID, with R = Rx t and I = I{X) finite.
Defining U n = (Xi+. . . X n ) / V no 2 , then there exists a constant C, depending
only on I and R, such that Ru n — 1 < C/n.

Proof See Section |. □

We can argue that this is the best possible rate, up to the choice of the
constant. Considering g(x) = x 2 — 1, we know that Rx > (EX 4 — l)/4.
Since EU% n - 1 = (Ef/ 4 - l)/2 > 0, if R Un - 1 = f(n)/n then f{2 k )/2 k =
R Uk -l> (EX 4 - l)/2 k , so f(2 k ) > EX 4 - 1 and hence does not tend to

y 2 k

zero.

Since we can perturb random variables by adding small normals to ensure
that the Fisher information is finite, we use this to prove a strong form of
the Central Limit Theorem.

Theorem 1.7 IfX-y, X2, ■ ■ ■ are IID random variables with mean 0, variance
a 2 and finite Rx = R, then U n = (Xi + . . . X n ) / v 'no 2 has the property that:

Ew{U n ) -> Ew(Z),

where Z is standard normal, for any continuous w such that w(t) < expc\t\,
where c < c = l/(12-/ff).

Proof Given a random variable U, define U T ~ U + Z T . Now, since Rjjt ^
Ru n + r < i? + r, by Lemma |T~5|.4, taking r such that c = l/(12\/i? + To):

Eexp(c|f4 r |)<2,

for all n, if r < r , c < c .

Now, since have uniformly bounded Fisher information I(U^) < I{Z T ) =
1/r, and Rut < R + r, Theorem [TB] implies that Kw(U^) — > Ew(Z T ).

Hence, since:

\Ew(U n ) -Ew(Z)\
< \Ew(U^) -Ew(U n ) \ + \Ew(U^) -Ew(Z T )\ + |Ew(Z) - Ew(Z T ) |,

we need only show that given e, |Ew(f/ T ) — Ew(U)\ < e for r small enough.
This follows by uniform integrability arguments (see Theorem 25.12 of Billings-
ley), since E|w(?7 r )| p < Eexpcp|C/ T | < 2, for some small p, and since w(U T )
converges weakly to w(U). □

2 FINITENESS OF R FOR MIXTURES OF NORMALS

5

2 Finiteness of R for mixtures of normals

Proof of Theorem |1.4| Without loss of generality, consider X taking a

finite number of values a\ > a 2 > ■ ■ ■ > a n with probabilities pi,P2, ■ ■ - Pn
respectively, where EX = J2iPi a i = 0-

We introduce the 'squared span' M = max(|a 2 — af|, |a| — a||, . . . |a^_ 1 —
a 2 |, (ai - a 2 ) 2 , (a 2 - a 3 ) 2 , . . . (a n -i - a n ) 2 ), and write p for min s p s .

By Theorem 1 of Borovkov and Utev, we need to check that for some R and
all x, the density f T of Y satisfies:

POO

/ yf T (y)dy<Rf T (x). (2)

J X

Since f T (y) = Y^iPi&Av ~ a i)i the LHS of Equation (Q) becomes (defining

U j = Yll=lPi a i and a n+l = -OO):

poo n poo n POO

/ Vfr(y)dy = ^Pi / (y - di)(f) T (y - a { )dy + ^2piCi t / 4> T (y
Jx - =1 i=1 A

n n poo

= ^2piT<f) T {x - di)dy + J^pa / <j) T {y)dy

i=l i=l ./ai-Oi

n n PX—C

= rf T {x) + ^2piai^2 I

i=i j=i Jx ~ a i

di)dy

(f) T (y)dy

j

n-l

" 1 I rx-a j+1 \

/ T (ac) / <Pr(y)dy ,

3=1

since u n = 0, so for each interval Ij = (x — aj , x — Oj+i) we need to consider
bounds on min^^ y 2 .

We write r for the index such that a r < x < a r _i.

First, we consider x > 0, where we can distinguish 3 cases: for j < r;
x — Oj+i < 0, so for y G If

y 2 > (x — cij + i) 2 = (x — dj) 2 + 2(cij — a,j + i)x + (a| +1 — a 2 ) = (x — a,) 2 — M.

For j = r; y G /j means that: y 2 > > (x — dj) 2 — (oj_i— a,,) 2 > [x—a^) 2 — M.
For j > r; x — > 0, so for y G J,: y 2 > (x — a,j) 2 .

2 FINITENESS OF R FOR MIXTURES OF NORMALS

6

Hence for all j, min ye j. y 2 > (x — aj) 2 — M, so:

4> T (y)dy < (aj — a,- + i) max <p T (y) < (aj — aj + i)(f) T (x — aj) exp(M/2r).

In Lemma 2A, we prove two technical results, that Uj(aj — aj + i) < a , and

that Mp < 2cr . This allows us to deduce that for x > 0:

n-l / -x— a.-,i \ n— 1

/ px-a J+1 \ »-x

^2 u j [ ^r(y)dy\ < exp(M/2r)^2uj(aj - a j+1 )(j) T (x - aj)

. I V J X — ai J

n-l

< exp(M/2r)^ a 2 (p T (x — aj)

3=1
n-l

< exp(a 2 /Tp)^2a 2 (pj/p)(f) T (x - aj),

3=1

as required.

Similarly, for x < 0, we deduce that for y e /j, min y6 j. y 2 > (x — %+i) 2 — M,
and thus:

1-1 / pX-Uj+l \ n ~i

J / <Pr(y)dy) < exp(M/2r)J2

Uj(aj — a j+1 )<j) T (x — a j+1 )

j= i \Jx- aj J j=1

< exp(a 2 /rp)^2a 2 (p j+1 /p)(j) T (x - a j+1 ).

3=1

□

Lemma 2.1 Using the notation above: Uj(aj — aj + i) < a 2 , and Mp < 2a 2 .

Proof Note that Uj = Y^i=\Pi a i = ~ Y!i=j+iPi a i-

For a j+1 > 0: Uj(aj - a j+1 ) < UjOj < Yli=iPi a i a j - Y^i=\Vi a \-

For aj < 0: Uj(aj - a j+1 ) < -Uja j+1 < J27=j+iPi a i a 3+i < J27=j+iPi a l

For a j+ i < < ay. Uj(aj - a j+1 ) < Y.'' i + Y!i=j+iPi a i a j+i -

En o
i= iPi a i-

3 CONVERGENCE OF THE POINCARE CONSTANT

7

For the second part, we consider two cases, firstly where M = a 2 s — a 2 s±1 . In
this case:

a 2 = ^Pta 2 > p s a 2 s = p s (a 2 s - a 2 s±1 ) + p s a 2 s±1 > pM.
t

Alternatively, if M — (a s — a s+ i) 2 then:

° 2 = \$>*a t 2 > p{a 2 s + a 2 s+1 ) > (p/2)(a s - a s+1 ) 2 > pM/2.
t

□

3 Convergence of the Poincare constant

We establish an explicit rate of convergence of the Poincare constant, using
projection inequalities similar to those in Johnson and Barron (2002)

Lemma 3.1 Given independent random variables X,Y with Poincare con-
stants Rx, Ry , for any function g:

Var g(X + Y) < (R x + R Y )Eg'(X + Y) 2 - ^-^-^ Var g\X + Y),
and hence Rx+y < Rx + Ry-

Proof Without loss of generality, we can consider g such that Kg(X+Y) = 0,
and define h(u) = W* Y g{u + Y), which thus also has mean zero. Now:

Var#(X + r) = Eg 2 (X + Y)

= E x {Eg 2 (X + Y)\X)

= E x V^(g(X + Y)\X) + E x (Eg(X + Y)\X) 2

< R Y Ex(E Y g'(X + Y) 2 \X)+Eh(X) 2

< R Y Eg' 2 (X + Y) + R x Eh'(X) 2 .

To consider the second term, we use the score function p Y and define:
f(x)=E Y [(g'(x + Y)-h'(x))p Y (Y)\,

3 CONVERGENCE OF THE POINCARE CONSTANT

8

where by the Stein equation, f(x) = — Ey(/'(x + Y) — —h"(x). Further, by
Cauchy- S chwarz :

Eh"(X) 2 = Ef(X) 2 < I(Y)E(g'(X + Y) - h'(X)) 2 ,

so that:

Var h'(X) < R x Eh"(X) 2 < R X I(Y) (Eg'(X + Y) 2 - Eh'(X) 2 ) ,
and writing fi = Eh'(X) = Eg'{X + Y), we obtain:

Eti(X) 2 {l + R X I(Y)) < R x I(Y)Eg'(X + Y) 2 + /i 2 ,
which, rearranging, leads to:

E/i'(V) 2 < E#'(X + Y) 2 -

RxKY) + 1

□

Next we need a Lemma which again uses the idea that if g' is nearly con-
stant, then g is close to linear. We'd like to apply it to the optimal g, which

achieves the maximum in Definition 1.1. However, rather than use compact

ness arguments to show such a function exists, we can instead use a 'good' g

Lemma 3.2 For any random variable W with mean zero, and any function
g such that R(t) = Var (g(W) + tW)/E(g'(W) +t) 2 has a local maximum at

Eg'(W) 2 J ~ V Eg'(W) 2 '

Proof Without loss assume that Eg(W) = 0, and write \i = Eg'(W) and
5 g = Var g'(W)/Eg'(W) 2 = 1 - ^/Eg'{W) 2 implies:

Eg 2 (W)

= E(g(W) - nW) 2 + 2fjEW{g(W) - /jW) + fi 2 EW 2

< E(g(W) - fiW) 2 + 2\fi\ VVar (W)^E(g(W) - fiW) 2 + /i 2 Var (W)

< (Eg'(W) 2 ) [R w \6 g + 2^5 g (l - 5 g ) ) + (1 - <y Var (W)

< (Eg'(W) 2 ) (m w ^/5~ g + Var (W)

3 CONVERGENCE OF THE POINCARE CONSTANT

9

since E{g{W)-^W) 2 < R w E(g'(W)-fi) 2 = R w Var (g'(W)) < R w 5 g Eg\W) 2 ,
and since by Lemma [I7|.3, Var (W) < Rw- D

Note, we can come up with tighter bounds: for example taking h(W) = W
in Equation ( TO, EWg ( W) = R(0)fi. Hen ce fi( R(0) - Var (W)) = E(g (W) -
yW)W < VVar (W)y/E(g(W) - fiW) 2 < ^/R w Var (W)Var g'(W). This
implies that:

R(0) - Var (W) < R

w t

s 9

1-S g

However, Lemma |372| is sufficient for our purposes.

Proof of Theorem O) We consider convergence along the 'powers of 2'
subsequence S k = U 2 k, which implies convergence for the whole sequence by

For all k, 1 < R Sk < R, and I Sk < I. Taking X = S k /^2 and Y = S' k /V2 (an
identical copy) in Lemma |5TT] implies (since R Sk /^/2 = Rs k /2 an d I(S k /\^2) =
21 (Sk)) that for any g:

Varg'(S k+1 ) rfv^fn ^ar g(S k+1 ) \

Now, given W = S k+ \, we can find h such that Var h/Eh' 2 > max(R —
e, Var (W)). Since Var (h(W) + tW)/E(h' + 1) 2 tends to Var (W) at ±oo,
and has one maximum to and one minimum, we can find g(W) = h(W)+toW,

which satisfies the conditions of Lemma 3.2

((Rs k+1 - e - 1) + ) 2 < ( ^y^J - I)' < 9Rl k J g < C(R Sk -R Sk+1+ e),

where x+ = max(x, 0) and C = 18R 2 Sk+i (l/ R Sk + I(S k )) < 18R(IR+ 1).
That is, since e is arbitrary,

(R Sk+1 -l) 2 <C(R Sk+1 -R Sk ). (3)

Note that since Rs k is decreasing and bounded below, successive differences
tend to zero, and thus Rs k — > 1.

To obtain a rate, write u k = (Rs k — Equation (|3|) gives u k (l + u k ) <

itfc-i. Since u k are decreasing: . . . m^ +2 < w^+i — u n — u n-i ~u n , and hence:

REFERENCES

10

u\ < - u n , u\ < m„_ 2 - itti-1, < m„_ 3 - M n _ 2 and so on. Summing,
we obtain that for m < n: (n — m)v? n < u m — u n < u m . Taking n = 2 r ,
m = 2 r_1 implies that:

/U 2 r-1

Repeating this times, we deduce that (since u k < u± < 1):

U2r < 2 -Er=i(r-i)/^ = 2 -r+2 2 (r-2-iV)/2^ j

so if = r — 2, then < 4/2 r , and we can 'fill in the gaps' by subadditivity,
to show that u k < 16/k for all k. □

Acknowledgements

The author is a Fellow of Christ's College, Cambridge, who helped support
a trip to Yale University during which many useful discussions with Andrew
Barron took place. Yurii Suhov of Cambridge University and Alexander Hol-
royd of UCLA provided useful advice, and the anonymous referee provided
several extremely useful improvements to the proofs.

References

[1] A.R. Barron. Entropy and the Central Limit Theorem. Annals of Prob-
ability, 14:336-342, 1986.

[2] P. Billingsley. Convergence of Probability Measures. John Wiley, New
York, 1968.

[3] A. A. Borovkov and S.A. Utev. On an inequality and a related char-
acterisation of the normal distribution. Theory of Probability and Its
Applications, 28:219-228, 1984.

[4] L.D. Brown. A proof of the Central Limit Theorem motivated by the
Cramer-Rao inequality. In G. Kallianpur, P.R. Krishnaiah, and J.K.
Ghosh, editors, Statistics and Probability: Essays in Honour of CR.
Rao, pages 141-148. North- Holland, New York, 1982.

REFERENCES

11

[5] Th. Cacoullos. On upper and lower bounds for the variance of a function
of a random variable. Annals of Probability, 10:799-809, 1982.

[6] L.H.Y. Chen. An inequality for the normal distribution. Journal of
Multivariate Analysis, 12:306-315, 1982.

[7] L.H.Y. Chen and J.H. Lou. Asymptotic normality and convergence of
eigenvalues. Stochastic Processes and their Applications, 34:197, 1993.

[8] H. Chernoff. A note on an inequality involving the normal distribution.

Annals of Probability, 9:533-535, 1981.

[9] O.T. Johnson and A.R. Barron. Fisher Information Inequalities and the
Central Limit Theorem. In submission, 2002.

[10] C.A.J. Klaasen. On an inequality of Chernoff. Annals of Probability,
13:966-974, 1985.

[11] J. Nash. Continuity of solutions of parabolic and elliptic equations.
American Journal of Mathematics, 80:931-954, 1958.

[12] S.A. Utev. An application of integrodifferential inequalities in probabil-
ity theory. Siberian Advances in Mathematics, 2:164-199, 1992.

```