By now, unless you're hiding under a rock, you've seen about a bazillion (10 ^ huge) polls about the Presidential election. They always have a "margin of error" listed, and sometimes even remember to say that if the difference in scores is less than this margin, it's a "statistical dead heat". But they generally don't explain what that all means.

So I will.



Imagine that you have a box of M&Ms, ten thousand or so of them. Most are either blue or red, but there's a handful of other colors as well. On November 2, you'll dump them into a machine that will count them all for you, but you don't have access to that machine before Nov 2.

Until then, unless you want to spend all day counting, your best bet is to try to take some smaller number of them to count. You do this by reaching in without looking, then seeing what you pull out.

Say you pull out just one, and it's blue. This tells you almost nothing, except that there's at least one blue M&M.

You pull out another one, and it's also blue. Again, out of 10,000 M&Ms, this doesn't tell you much. You'd have to pull out a LOT of blue ones in a row before it started to suggest blue actually had a majority.

There's some unknown but defined chance that any given M&M you pull out will be blue. Or red, or green, etc. Just like there's a defined chance of getting heads when flipping a coin. It's possible that a scoop of 100 M&Ms would get all blue even if red is the majority, because you're pulling from a bin of 10000.

However, the more M&Ms you get in one scoop, the less likely it is that the numbers will depart from the percentages found in the entire bin. And if you're sure to stir up the bin before scooping, you reduce the chances of just taking a scoop from an area that's all one color.
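You can see this scoop-size effect with a quick simulation (a sketch with made-up numbers, not from the post): a well-stirred bin of 10,000 M&Ms that's 45% blue, scooped at three different sizes.

```python
# Hypothetical bin: 4,500 blue and 5,500 red, thoroughly "stirred"
# by sampling at random. Bigger scoops stray less from the true 45%.
import random

random.seed(1)  # fixed seed so the sketch is repeatable
bin_colors = ["blue"] * 4500 + ["red"] * 5500

typical_miss = {}
for scoop_size in (10, 100, 1000):
    errors = []
    for _ in range(500):  # repeat the scoop many times
        scoop = random.sample(bin_colors, scoop_size)
        pct_blue = scoop.count("blue") / scoop_size
        errors.append(abs(pct_blue - 0.45))
    typical_miss[scoop_size] = sum(errors) / len(errors)
    print(f"scoop of {scoop_size:4d}: typical miss ~{typical_miss[scoop_size]:.1%}")
```

The typical miss shrinks as the scoop grows, but notice it doesn't shrink in proportion to the scoop size.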

The margin of error is simply a way of saying how unlikely it is to get percentages different from the "correct" ones, assuming that you stir carefully. There are formulas for figuring it out in any given situation, based on your results and the size of your scoop. It's not linear, however...doubling the size of your scoop doesn't cut your margin of error in half, although it does reduce the margin. Once you reach a certain point, getting any noticeable reduction of the margin of error requires an impractical increase in the size of the scoop.
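The textbook version of that formula (a sketch, not the post's own numbers) is the half-width of a 95% confidence interval for a simple random sample, using the worst case of a 50/50 split:

```python
# Standard margin-of-error formula for a simple random sample.
# z = 1.96 is the usual 95%-confidence multiplier; p = 0.5 is the
# worst case (the margin is widest at an even split).
import math

def margin_of_error(sample_size, p=0.5, z=1.96):
    """Half-width of the confidence interval, as a fraction."""
    return z * math.sqrt(p * (1 - p) / sample_size)

for n in (250, 500, 1000, 2000, 4000):
    # n=1000 gives about 3.1%; doubling to 2000 only gets ~2.2%, not 1.55%
    print(f"n = {n:4d}: margin of error = {margin_of_error(n):.1%}")
```

Because the sample size sits under a square root, doubling the scoop only shrinks the margin by a factor of about 1.4, not 2.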

In polls, the scoop is called a sample. It's the people who were polled. 1000-1500 subjects is generally good for a 3 to 4 percent margin of error, and getting it down to 2 percent requires a much larger sample, so they rarely bother.
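Turning the formula around shows why pollsters stop where they do (again a sketch, assuming simple random sampling at 95% confidence):

```python
# How big a sample do you need for a target margin of error?
# Inverts the standard formula: n = z^2 * p * (1 - p) / margin^2,
# with the worst-case p = 0.5 and the 95%-confidence z = 1.96.
import math

def required_sample(margin, p=0.5, z=1.96):
    """Smallest sample size that achieves the target margin."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

for target in (0.04, 0.03, 0.02):
    # 2% needs about 2401 people -- more than double the ~1068 for 3%
    print(f"for a {target:.0%} margin: about {required_sample(target)} people")
```

Going from a 3% margin to a 2% margin more than doubles the number of people you have to reach, which is why pollsters rarely bother.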

Thus, if a poll says Candidate A has 52% of the vote and Candidate B has 43%, with a margin of error of 4%, you can be pretty sure that if the election had been held at the time of the poll, Candidate A would have won. However, if the split had been 52/48 (no third party or undecided responses), it would be too close to call. You can easily pull 52 blue M&Ms and 48 red M&Ms out of a bin with 4800 blue and 5200 red M&Ms. It's not the most likely outcome, but it's not too unlikely either.
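To put a rough number on "not too unlikely", here's a quick simulation (a sketch, not from the original post): suppose the bin is actually 48% blue, so red is narrowly ahead, and we take scoops of 100.

```python
# Hypothetical bin: 4,800 blue and 5,200 red, so red really leads.
# How often does a scoop of 100 make blue look ahead at 52% or more?
import random

random.seed(2)  # fixed seed so the sketch is repeatable
bin_colors = ["blue"] * 4800 + ["red"] * 5200

trials = 2000
hits = sum(
    1
    for _ in range(trials)
    if random.sample(bin_colors, 100).count("blue") >= 52
)
print(f"{hits / trials:.0%} of scoops showed blue at 52% or better")
```

Somewhere around a fifth to a quarter of scoops flatter the trailing color, which is exactly why a 52/48 poll with a 4% margin is a statistical dead heat.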

Why do different polls vary by more than their margins of error, then?

Well, the margin of error calculations presume that the sample is representative of the whole population. In other words, that the bin has been thoroughly mixed, and there aren't big pockets of mostly red or mostly blue that you might plunge the scoop into. This is sometimes really hard to ensure, and different polling organizations have different methods of trying to pull it off.

Also, margin of error calculations presume that there is no bias in the questioning. While obvious bias is easily eliminated (and all reputable polling organizations work very hard at this), there are often subtle forms of bias in the wording of questions that sneak through and can push those who are on the fence to one side or the other.

Finally, a poll is a snapshot, and while your M&M bin doesn't change over time, people's opinions do. Let's say there's an event on Monday that is likely to sway a particular type of voter. If one polling group covered most of that type of voter on Friday, but another didn't get to them until Tuesday, two polls released on Wednesday might disagree simply because the effects of Monday's event weighed more heavily on one poll.

The "changes over time" thing is another reason why polling organizations rarely worry about getting margins of error tighter than 3%. Given that the real numbers can change by 3% in a few days because of what happens in the race, a smaller margin of error just means the results go "stale" faster.



And there you have it, the reasons behind the margin of error.

Take a poll number, add and subtract the margin of error, and you have a range that's 95% likely to contain the true value...assuming everything about the polling process was done right.