Coupon collecting

Coupon collecting is a classical homework problem in undergraduate probability. Suppose there exist {n} different types of coupons. At each time one selects a coupon of a uniformly random type. In expectation, how many coupons does one need to collect before one has at least one coupon of every type? The analogous problem of throwing balls in bins is similar: suppose there were {n} bins and one threw balls uniformly randomly into the bins. How many balls would one throw, in expectation, before all the bins were filled?

By the pigeonhole principle one would have to collect at least {n} coupons, but this is not enough. With probability {1 - n!/n^n > 0} one would get at least one overlap and thus, at least one bin would be empty. If {\tau} is the (random) number of coupons required to cover all the types, then the task is to find {\mathop{\mathbb E}\tau}.

It is useful to approach this in an inductive fashion: assume that {k} types have already been collected. Since the number of selections required to get a new type is a geometric random variable of success probability {p_k = 1-k/n}. Thus, in expectation, one needs {n/(n-k)} tries to get a new type after collecting {k} types. This yields that, starting from nothing, one needs {\sum_{i=1}^{n-1} n/(n-i) = nH_{n-1}} tries in expectation, where {H_{n}} denotes the harmonic sum upto {n}. This number is roughly {\log n} yielding that {\mathop{\mathbb E}\tau \approx n\log n}.

More interesting forms of “coupon-collecting” type questions are, say, the number of tries are needed to obtain a coupon of each type with high probability (and not just in expectation). Or, assuming one collected {n} coupons u.a.r as before, what would be the rough size of the coupon type with maximum collection. It is interesting how relatively simple problems arise in unexpected places. I started this post after noting that the same “coupon collector” effect yields a lower bound of {O(nr\log(n))} randomly sampled entries to recover a rank {r} matrix of size {n} (as proved Tao and Candes’ paper on low rank matrix recovery).

 

Leave a comment