Consider a random vector $X = (X_1, \ldots, X_n)$ with i.i.d. entries $X_i \sim \mathcal{N}(0, 1/n)$. It is often claimed that this distribution is "very close", in the limit of large $n$, to the uniform measure on the sphere $S^{n-1} \subset \mathbb{R}^n$ of unit radius. This is, in fact, not hard to see (informally) using a Chernoff bound argument. Firstly, note that $\mathbb{E}\left[\|X\|_2^2\right] = \sum_{i=1}^n \mathbb{E}\left[X_i^2\right] = 1$. More precisely, the Chernoff argument yields the following inequalities for $0 < \epsilon < 1$:

$$\Pr\left[\|X\|_2^2 \geq 1 + \epsilon\right] \leq e^{-n\epsilon^2/8}, \qquad \Pr\left[\|X\|_2^2 \leq 1 - \epsilon\right] \leq e^{-n\epsilon^2/4}.$$
The way to obtain this is pretty standard. For instance, the first inequality can be obtained as:

$$\Pr\left[\|X\|_2^2 \geq 1+\epsilon\right] = \Pr\left[e^{\lambda \|X\|_2^2} \geq e^{\lambda(1+\epsilon)}\right] \leq e^{-\lambda(1+\epsilon)}\, \mathbb{E}\left[e^{\lambda \|X\|_2^2}\right] = e^{-\lambda(1+\epsilon)} \left(1 - \frac{2\lambda}{n}\right)^{-n/2},$$
where $0 < \lambda < n/2$ and the inequality is by Markov. Optimizing over $\lambda$ yields the result. This tells us that $\|X\|_2^2$ concentrates around its expected value of 1, at least to the extent of deviations $\epsilon \approx c/\sqrt{n}$ for a small constant $c$. In fact, one can do better than this using an exact large deviations calculation, which I will leave to another post.
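This concentration is easy to probe numerically. The following sketch (my own, not part of the original argument; the choices of $n$ and the sample count are arbitrary) draws vectors with i.i.d. $\mathcal{N}(0, 1/n)$ entries and checks that $\|X\|_2^2$ has mean 1 with fluctuations of size $\sqrt{2/n}$:

```python
import numpy as np

# Sample X with i.i.d. N(0, 1/n) entries and measure how tightly
# ||X||_2^2 concentrates around its expectation 1. The exact standard
# deviation is sqrt(2/n), so std * sqrt(n) should hover near sqrt(2).
rng = np.random.default_rng(0)
m = 2000  # number of independent draws of X

for n in (100, 2000):
    samples = rng.normal(scale=1.0 / np.sqrt(n), size=(m, n))
    sq_norms = (samples ** 2).sum(axis=1)   # ||X||_2^2 for each draw
    dev = sq_norms.std()
    print(f"n={n:5d}  mean={sq_norms.mean():.4f}  "
          f"std={dev:.4f}  std*sqrt(n)={dev * np.sqrt(n):.3f}")
```

The printed `std*sqrt(n)` staying flat across $n$ is exactly the $O(1/\sqrt{n})$-thick shell described above.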
Thus for large dimension $n$, the probability mass of the distribution is concentrated in a shell of thickness $O(1/\sqrt{n})$ around mean radius 1, which, intuitively, looks very much like the sphere $S^{n-1}$. The question that interested me is how we can prove the converse: assuming $X \sim \sigma_{n-1}$, where $\sigma_{n-1}$ denotes the uniform spherical measure on $S^{n-1}$, do the marginals of each entry, say $\sqrt{n}\,X_1$, look Gaussian? More precisely, let $\mu_n$ denote the law of $\sqrt{n}\,X_1$ on $\mathbb{R}$. Then, does $\mu_n$ converge weakly to $\mathcal{N}(0,1)$? This is, in fact, true, and is an old result due to Borel, from his treatise "Introduction géométrique à quelques théories physiques". I will sketch a simple argument via characteristic functions.
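Before the proof, a quick simulation makes the claim plausible. This sketch (mine; the dimension and sample count are arbitrary) samples points uniformly from $S^{n-1}$ by normalizing Gaussian vectors and compares the scaled first coordinate $\sqrt{n}\,X_1$ to a standard normal:

```python
import numpy as np

# Draw m points uniformly on S^{n-1} (normalize i.i.d. Gaussian
# vectors), then look at the marginal sqrt(n) * X_1. If Borel's
# theorem holds, it should resemble N(0, 1) for large n.
rng = np.random.default_rng(1)
n, m = 200, 50_000

g = rng.standard_normal((m, n))
x = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform on S^{n-1}
marginal = np.sqrt(n) * x[:, 0]                    # scaled first entry

print("mean     :", marginal.mean())               # should be near 0
print("variance :", marginal.var())                # exactly 1 in expectation
print("P(|Z|<=1):", np.mean(np.abs(marginal) <= 1.0))  # ~0.6827 for N(0,1)
```

Note that the variance of $\sqrt{n}\,X_1$ is exactly 1 for every $n$ (by symmetry, $\mathbb{E}[X_1^2] = 1/n$); it is the shape of the distribution that becomes Gaussian only in the limit.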
The function $\phi_n(t) = \mathbb{E}\left[e^{it\sqrt{n}X_1}\right]$ is the characteristic function of $\mu_n$, where $X \sim \sigma_{n-1}$. We can compute this integral explicitly. The formula for the "area" of an $n$-dimensional spherical cap of depth $h$ on the unit sphere, as given (here), is:

$$A_n^{\mathrm{cap}}(h) = \frac{1}{2}\, A_n\, I_{2h - h^2}\!\left(\frac{n-1}{2}, \frac{1}{2}\right),$$
where $A_n$ is the surface area of $S^{n-1}$, $h$ is the depth of the cap, and $I_x(a,b)$ is the normalized incomplete beta function. Differentiating this with respect to $h$, dividing by $A_n$, and noting that a point lies in the cap of depth $h$ precisely when its first coordinate satisfies $X_1 \geq 1 - h$, we can compute the characteristic function as the following integral:

$$\phi_n(t) = \frac{1}{B\!\left(\frac{n-1}{2}, \frac{1}{2}\right)} \int_{-1}^{1} e^{it\sqrt{n}x} \left(1 - x^2\right)^{\frac{n-3}{2}} dx,$$
where $B(a,b)$ is the beta function. To do the above, we have to take care of a few sign conventions, which I have omitted. Now, for fixed $t$ and large $n$, Stirling's approximation gives $B\!\left(\frac{n-1}{2}, \frac{1}{2}\right) \approx \sqrt{2\pi/n}$. With this and the change of variables $x = s/\sqrt{n}$, we obtain:

$$\lim_{n \to \infty} \phi_n(t) = \lim_{n \to \infty} \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{n}}^{\sqrt{n}} e^{its} \left(1 - \frac{s^2}{n}\right)^{\frac{n-3}{2}} ds = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{its}\, e^{-s^2/2}\, ds = e^{-t^2/2}$$
by dominated convergence. The last limit is precisely what we require, namely the characteristic function $\mathbb{E}\left[e^{itZ}\right] = e^{-t^2/2}$ where $Z \sim \mathcal{N}(0,1)$. The rest follows from Lévy's continuity theorem.
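The finite-$n$ characteristic function can also be evaluated numerically from the beta-integral form of $\phi_n$ and compared against $e^{-t^2/2}$. This sketch is my own: it uses a plain Riemann sum on a fine grid (the density vanishes at the endpoints, so this is accurate) and log-gammas for the beta function to avoid overflow at large $n$:

```python
import math
import numpy as np

def phi_n(t: float, n: int) -> float:
    """Characteristic function of sqrt(n) * X_1 for X uniform on S^{n-1},
    via the integral of cos(t sqrt(n) x) (1 - x^2)^{(n-3)/2} over [-1, 1],
    normalized by B((n-1)/2, 1/2). Symmetry makes phi_n real, so the
    cosine part suffices."""
    log_beta = (math.lgamma((n - 1) / 2) + math.lgamma(0.5)
                - math.lgamma(n / 2))
    x = np.linspace(-1.0, 1.0, 200_001)
    integrand = np.cos(t * math.sqrt(n) * x) * (1 - x**2) ** ((n - 3) / 2)
    dx = x[1] - x[0]
    return float(integrand.sum() * dx) / math.exp(log_beta)

for t in (0.5, 1.0, 2.0):
    print(f"t={t}:  phi_n={phi_n(t, n=500):.5f}  "
          f"exp(-t^2/2)={math.exp(-t**2 / 2):.5f}")
```

Already at $n = 500$ the two columns agree to a couple of decimal places, consistent with the $O(1/n)$ error hidden in the limit above.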
There are significant improvements to this result. Diaconis and Freedman prove that this holds for all low-dimensional marginals, even in the sense of variation distance. More precisely, let $X^{(k)} = (X_1, \ldots, X_k)$ denote the restriction of $X$ to its first $k$ entries. They give a sharp bound on the variation distance between the law of $\sqrt{n}\,X^{(k)}$ and $\mathcal{N}(0, I_k)$, one that vanishes as long as $k = o(n)$. Interesting generalizations include this result by D'Aristotile, Diaconis and Newman, which says that when $\Gamma_n$ is an orthogonal matrix of size $n \times n$ chosen uniformly at random (i.e., from the Haar measure on the orthogonal group) and $A_n$ is a sequence of matrices with $\mathrm{Tr}\left(A_n A_n^T\right) = n$, then $\mathrm{Tr}\left(A_n \Gamma_n\right)$ converges in distribution to $\mathcal{N}(0,1)$; and others which state that even a "small" submatrix of $\sqrt{n}\,\Gamma_n$ behaves as though its entries were i.i.d. standard normals.
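The random-orthogonal-matrix picture is also easy to probe empirically. The sketch below is mine (it uses the standard QR-with-sign-fix recipe for sampling from the Haar measure, and arbitrary sizes); it checks that a single scaled entry $\sqrt{n}\,\Gamma_{11}$ looks standard normal across draws:

```python
import numpy as np

# Sample Haar-distributed orthogonal matrices by taking the QR
# decomposition of a Gaussian matrix and fixing the signs of R's
# diagonal (plain QR alone is not exactly Haar). Then collect the
# scaled (0, 0) entry over many draws.
rng = np.random.default_rng(2)
n, trials = 100, 2000

entries = np.empty(trials)
for i in range(trials):
    z = rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    q = q * np.sign(np.diag(r))      # sign fix: Q' = Q diag(sign(r_ii))
    entries[i] = np.sqrt(n) * q[0, 0]

print("mean     :", entries.mean())   # should be near 0
print("variance :", entries.var())    # should be near 1
```

Each column of a Haar orthogonal matrix is itself uniform on $S^{n-1}$, so this is really Borel's theorem again; the content of the stronger results is that many entries are jointly close to i.i.d. normals.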
I am sure I have barely scratched the surface here, but it is clear that many approximations of this kind, even ones that seem merely intuitively plausible, turn out to be accurate in a fully rigorous sense.
UPDATE (04/05/13): Thanks to Jiantao Jiao for some corrections above.