Let's begin with a terse statement of the law of large numbers (sometimes called Bernoulli's law), then pick it apart to see what it means:

**p**, of some outcome in each test, the chance that the fraction of times that the outcome occurs differs from **p** converges to zero as the number of trials approaches infinity.

The law of large numbers (LLN) says that if we're interested in some probability experiment, say the rolling of a single six-sided die, the probability of rolling, say, a 3, is known (P(3) = 1/6), and is the same for each roll of the die. In other words, the trials are independent. While it's entirely possible to roll five 3's in a row, which might make us think that P(3) is large, it's much less likely that if we roll the die 1000 times, we'll get all 3's. We'd in fact expect 3 to come up about 1/6 of the time.

Here's an example of how it works: If we calculate the average of all six sides of a die (half a pair of dice), we get (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5.

Now of course we won't ever roll a 3.5, so getting that average on a single roll is impossible. It's also unlikely after just a handful of rolls.

For example, if we rolled two 6's and a 5, our early average would be about 5.67. But over a large number of rolls, as the graph on the right shows, the average of all rolls will get closer and closer to our "expected average" of 3.5.

Given enough trials, the average value of any random event will approach its calculated average. We can think of it this way: *Things that are essentially random by nature can become quite predictable if we observe them for long enough*.

Imagine putting a drop of ink in a beaker of clear water. You've probably seen this before. The ink slowly spreads out until it uniformly colors the liquid.

The ink spreads because of random collisions between and among ink and water molecules.The cartoon below might help explain the idea.

On the top-left are two boxes separated by a barrier (thick black line). The barrier separates small red atoms (maybe those can be ink) from large blue ones.

When the barrier is removed, the random motions of the particles (atoms are *always* in motion) cause them to mix, eventually coming to a state (panel 4) of being completely mixed. But with such small numbers, it's possible for fluctuations to occur that would cause small imbalances (more red on one side than another, for example) to occur. For example, for small numbers, panel 4 could conceivably– and momentarily–turn back into panel 3.

On the other hand, if we have a very **large** number of atoms – moles, let's say, then we observe a *smooth* diffusion of each side into

the other, and we never see any fluctuations that would "unmix" the atoms; the sample is just too big for that to happen.

Insurance companies rely in having large numbers of people in their "pools" to even out risk and make their business predictable—something that's good for everybody.

Imagine you have an insurance company. You charge money to provide a benefit to your clients if they are involved in a certain kind of accident. Let's say that it's well known that the probability of having this kind of accident is one in 10,000 people per year.

If you have only a few clients, it's entirely conceivable that two might have an accident in same year, and that would ruin your company. But if you have 10,000 clients, then on average you'll be paying one benefit per year (some years more, some less, of course), which could conceivably be covered by the money you take in.

Large numbers of clients ensure that the risk is as close to the known odds as possible. It's a very important concept to understand for both insurers and people who make public policy about insurance.

We see political polls and take surveys of all kinds all of the time. While most surveys have big reliability issues, polling by random sampling can be pretty reliable because of the law of large numbers.

It turns out that a sample of only 1000 people, as long as it's chosen randomly, is sufficient to predict the outcome of a presidential race to within about ±3%.

And these things have a diminishing return. By doubling the number of samples from 1000 to 2000, we don't double our accuracy to ±1.5%.

The gain scales as the square root of the number of extra samples, so we'd only gain a factor of about 1.4 (square root of 2), not 2. And adding more samples would cost more, as well.

The law of large numbers says that if we have enough samples, the average we get will converge to the "actual" underlying value.

Of course, we see polls change and have different outcomes all the time, but remember that those usually change over time as the attitudes of the voters change, and not all polls ask the same questions.

This graph shows polling data taken between Oct. 2011 and Sept. 2012, just before the presidential election between President Barack Obama and Mitt Romney. Notice that while Obama led most of the way, the difference between the two was mostly within the margin of polling error. The race turned out roughly as the polls predicted it would. Mr. Obama won 51.1% of the popular vote to Mr. Romney's 47.2%. (Source: Real Clear Politics).

The law of large numbers suggests that as the number of observations grow, any random event will behave more-or-less predictably. But that does *not* mean that any one random event becomes more predictable. It absolutely does not. A common fallacy of gamblers is that, for example, if a six hasn't been rolled for some time on a single die, then one is "due," and we ought to bet on that. In fact, the probability of rolling a six on a fair die is 1 in 6, just as it always was.

One of the most famous examples of the gambler's fallacy occurred in a game of roulette at the Casino de Monte-Carlo in 1913. The roulette ball landed on black (about half of the slots are black, half red) 26 times in a row, an extremely unlikely occurrence. Gamblers in the casino heard about the streak and lost millions of francs betting against black. They reasoned (incorrectly) that because of the streak, red was "due" to be next. In fact, the probability of red or black for each spin of the wheel is about ½.

X
### Terse

To be **terse**, or to speak or write **tersely** is to use few words, to be sparing with words, even to the point of abruptness. Mathematical definitions and other descriptions are often viewed as terse, but the point is to be *precise* and economical with words.

**xaktly.com** by Dr. Jeff Cruzan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. © 2016, Jeff Cruzan. All text and images on this website not specifically attributed to another source were created by me and I reserve all rights as to their use. Any opinions expressed on this website are entirely mine, and do not necessarily reflect the views of any of my employers. Please feel free to send any questions or comments to jeff.cruzan@verizon.net.