3blue1brown

patreon


A draft video on proving the central limit theorem (which I don't particularly love)

Hey folks,

Last month I thought I'd put together a "quick" video offering a sketch for a proof of the central limit theorem. I edited it together most of the way, with several to-do stubs for animations I could throw in to help clarify. When I stepped back, though, I felt like the underlying structure could use some improvement beyond a few clarifying animations.

I'm sharing this partly in the spirit of sharing more partial work before putting projects on a shelf, but also to get your reaction and see if you agree with me on aspects that need improving.

One aim here was to try something stylistically different, beginning with a simple overhead shot of a pen-to-paper line of reasoning. I enjoyed playing around with the style, and it's worth doing more to have simple pen-and-paper segments in videos to convey better what doing math actually feels like. In the end, though, it's not actually that much faster than animating, if at all.

My current beef with it is less about style than substance, though. Here are a few notes I jotted down when I stepped back and tried to view it as if I were a new learner.

* Too many points feel unmotivated, which makes it hard to hold all the different objects in one's head. Most notably, where does the Moment Generating Function come from? In the sea of new terms, it's easy to get a bit lost without a sense of what each one is trying to accomplish.

* A key question is whether knowing that the MGF converges to a single universal shape is enough to justify that the original distribution also converges to a single universal shape. I mention this, but it feels glossed over. I don't even address the sense of convergence that the CLT claims. Really, the argument focussing on MGFs and cumulants is more intuition than proof, given that it's only a subset of distributions that have well-defined MGFs. It could be clearer that most of the video is providing scaffolding that characteristic functions can rigorously fill.

* Arguably the most important object referenced here is the characteristic function, but it's brought up almost as an afterthought. Perhaps a better structure would be to put that function front-and-center and make it as much a video about how probability theorists use Fourier Transforms as it is about anything else.
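
For reference, here is a sketch of the objects named in these notes, using the standard textbook definitions. The symbols ($\kappa_m$, $S_n$, $\varphi_X$) are conventional choices and not necessarily the notation used in the video. The $X_i$ are assumed i.i.d. with mean 0 and variance 1, $S_n = (X_1 + \dots + X_n)/\sqrt{n}$, and the MGF is assumed to exist near $t = 0$.

```latex
\begin{align*}
M_X(t) &= \mathbb{E}\left[e^{tX}\right]
  && \text{(moment generating function)} \\
K_X(t) &= \log M_X(t) = \sum_{m \ge 1} \kappa_m[X]\,\frac{t^m}{m!}
  && \text{(cumulant generating function)} \\
\varphi_X(t) &= \mathbb{E}\left[e^{itX}\right]
  && \text{(characteristic function)} \\
K_{S_n}(t) &= n\,K_X\!\left(\frac{t}{\sqrt{n}}\right)
  = \frac{t^2}{2} + \sum_{m \ge 3} n^{1 - m/2}\,\kappa_m[X]\,\frac{t^m}{m!}
  \;\longrightarrow\; \frac{t^2}{2}
  && \text{as } n \to \infty
\end{align*}
```

The last line is the heuristic at the heart of the MGF argument: $t^2/2$ is the cumulant generating function of a standard normal. It is only formal as written, since not every distribution has an MGF; replacing $M_X$ with $\varphi_X$, which always exists, is what turns the same outline into a rigorous argument.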

---

I toyed with a second version that took more of a discovery fiction approach. That is, trying to walk down a path for how you might have invented the characteristic function yourself in an attempt to prove the CLT. There's something to that approach, but by this point, I felt I'd overthought the whole thing, and in the grander scheme of things have probably spent way more time on central limit theorem-related videos than is justified.

So, I put this on a shelf, and for August I've turned to a completely separate topic about physics. Stay tuned for updates there; it's shaping up to be a fun one.

In the meantime, if you have thoughts about what you'd like to see whenever I _do_ turn back to add a video to the probability sequence about characteristic functions and proving the CLT, I'm all ears!

-Grant


Comments

This video as a whole went a bit over my head; I don't think I'm familiar enough with all the pieces that go into it. But at 0:12 in the animation you have "Culumants", and I assume this should be "Cumulants".

C.J. Smith

I really appreciate this video and the whole Central Limit Theorem series. I have wanted a deeper understanding of the theorem for years and this helps. I'd seen the moment generating function proof before, but your explanation was better. I'd love to see this visually someday, like showing a function built up by its moments (how does its graph change as you add terms?). I also still find the fact that the extra terms fall away to be unsatisfying. I see the algebra but do not feel like it's an intuitive concept yet.

Trevor Strohman

Hey Grant, I liked the video. Given that the concept of moment generating functions pretty much came out of nowhere, I think you'd be better off to just introduce the characteristic function to begin with. It's the deeper and more important concept and is just as easy to define as the moment generating function.

Nico Zimmer

Broadly: This is extremely interesting! I love this idea of a distribution being defined uniquely by mean, variance, and these higher-level extensions. And I love that somehow the limit of n / n^(m/2) leads to all the higher levels approaching 0, so the final distribution is one defined only by mean and variance, which is exactly the Gaussians!

However, I am having trouble understanding the level at which this video presents this. I absorbed this at a very broad level, but the video goes into a lot of algebraic detail and the use of generating functions does feel unmotivated (doesn't it always though ...).

I could imagine this being a whole series of videos. Maybe the first digs into the cumulants for a distribution. Define them without the whole generating function stuff and discuss how they generalize the idea of mean and variance. Then show how they uniquely define a probability distribution (and maybe note that Gaussians are exactly the ones with 0 for all higher-order cumulants). Maybe discuss how they could be seen like a Fourier transform of the distribution a bit more? Finally, that first video could end by noting how these cumulants turn out to be the coefficients of this weird generating function (and then link to a generating function video, since I feel like generating functions never make sense at first but seem to be so useful in so many situations). Then the next video could show what happens as that generating function is applied to this normalized sum of iid variables, culminating in the revelation that the Gaussian is exactly the distribution with all higher-level cumulants 0.

What I like about this direction is that by the time you define and work with the cumulant generating function, it has been clearly motivated by learning about the cumulants and how they fully describe the distribution.

I don't quite understand the characteristic function connection here. It sounds like it is needed to rigorously prove convergence of these functions, but that you felt it complicated the intuition in a way that made you not want to introduce it originally (only at the end, to note to anyone worried about convergence that it does work with a small variation).

Detail: I found the two different meanings of the symbol K to be confusing. AFAICT, you are using K_X(t) = sum_m K_m[X] t^m / m!, but I found myself mixing up K(t) and K[X], especially around 11:30 in the video where you jump between them. I'm guessing that this is standard notation, but if I were presenting this I would probably try to use different symbols to reduce confusion.
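
The way the higher cumulants shrink for a normalized sum, as described above, can be checked with a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, Exponential(1) is an arbitrary example distribution (raw moments m!, cumulants (m-1)!), and the numbers K_m[X] are stored as `kappa[m]` to keep them visually distinct from the function K_X(t).

```python
from math import comb, factorial


def cumulants_from_raw_moments(raw):
    """Convert raw moments raw[k] = E[X**k] (with raw[0] == 1) into cumulants.

    Uses the standard recursion
        kappa_n = raw_n - sum_{k=1}^{n-1} C(n-1, k-1) * kappa_k * raw_{n-k}.
    Index 0 of the returned list is an unused placeholder.
    """
    kappa = [0.0] * len(raw)
    for n in range(1, len(raw)):
        correction = sum(
            comb(n - 1, k - 1) * kappa[k] * raw[n - k] for k in range(1, n)
        )
        kappa[n] = raw[n] - correction
    return kappa


def standardized_sum_cumulants(kappa, n):
    """Cumulants (orders >= 2) of (X_1 + ... + X_n - n*mean) / (sigma * sqrt(n)).

    Relies on two facts: cumulants add over independent summands, and the
    m-th cumulant of c*X is c**m times the m-th cumulant of X.
    """
    sigma = kappa[2] ** 0.5
    return {
        m: n * kappa[m] / (sigma * n ** 0.5) ** m for m in range(2, len(kappa))
    }


if __name__ == "__main__":
    # Raw moments of Exponential(1): E[X**k] = k!
    raw = [factorial(k) for k in range(7)]
    kappa = cumulants_from_raw_moments(raw)
    for n in (1, 100, 10_000):
        print(n, standardized_sum_cumulants(kappa, n))
```

The order-2 entry stays at 1 for every n, while each entry of order m >= 3 is multiplied by n^(1 - m/2) and so shrinks toward 0 as n grows. A distribution whose cumulants beyond the second are all 0 is a Gaussian.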

