(or, in which I finally do an analysis of some 2012 election data)
Many are celebrating the success of the poll aggregators who forecast the states won by each candidate -- many called all 50 right, including FiveThirtyEight. No doubt Nate Silver will continue to be the world's most famous meta-analyst given this accomplishment -- even though several of his peers, such as the Princeton Election Consortium and Simon Jackman's projections for the Huffington Post, seemed to do equally well. The sheer number and depth of polls in swing states no doubt had a lot to do with all their successes.
How much of an accomplishment this is, of course, depends on context; the winner in most states was easily predicted ahead of time with the barest minimum of polling. Consider instead a related question: how close were the predicted vote shares in each state to the observed ones, relative to the stated margins of error?
The simplest way to check this is to calculate a p-value for each prediction: given a predicted vote share and its associated uncertainty, compute the probability that a draw from that distribution falls below the observed vote share. The key is that for a large number of independent prediction-uncertainty pairs, if the stated uncertainties are well calibrated, we should see a uniform distribution of p-values between 0 and 1 (the probability integral transform).
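As a minimal sketch of that calculation -- assuming each state forecast is summarized by a normal distribution, with `observed`, `predicted`, and `sigma` as hypothetical arrays holding one entry per state:

```python
import numpy as np
from scipy.stats import norm

def prediction_pvalues(observed, predicted, sigma):
    """P(draw < observed) under N(predicted, sigma^2), elementwise.

    If the stated uncertainties are well calibrated, these p-values
    should be uniform on [0, 1] (the probability integral transform).
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return norm.cdf((observed - predicted) / sigma)
```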
I grabbed the estimates from FiveThirtyEight and Votamatic (at this time, I have only estimates, not uncertainties, for PEC or HuffPost) and calculated the respective p-values assuming a normal distribution in each case. Media coverage suggested that Nate Silver's intervals were too conservative; if this were the case, we would expect a higher concentration of p-values around 50%. (If too anti-conservative, the p-values would be more extreme, towards 0 or 1.)
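To make that expectation concrete, here is a small simulation with made-up numbers (not election data) showing how overstated or understated uncertainties shift the p-value distribution:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
true_sigma = 2.0                              # actual spread of outcomes, in points
outcomes = rng.normal(0.0, true_sigma, 5000)  # simulated prediction errors

for label, stated_sigma in [("conservative (intervals too wide)", 2.0 * true_sigma),
                            ("calibrated", true_sigma),
                            ("anti-conservative (too tight)", 0.5 * true_sigma)]:
    p = norm.cdf(outcomes / stated_sigma)
    # With calibrated intervals, exactly half the p-values fall in (0.25, 0.75).
    frac_middle = np.mean((p > 0.25) & (p < 0.75))
    print(f"{label}: {frac_middle:.2f} of p-values in (0.25, 0.75)")
```

With calibrated intervals, half the p-values land in the middle half of the unit interval; conservative intervals push that fraction well above one half, anti-conservative ones well below.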
On the contrary, the 538 distribution is nearly uniform: the closer the points sit to the diagonal of the plot, the better the fit to the uniform.
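The plot in question is a uniform QQ plot, and a sketch along these lines would produce it -- `p538` standing in for the array of p-values computed above:

```python
import numpy as np
import matplotlib.pyplot as plt

def uniform_qq(pvalues, label):
    """Sorted p-values vs. uniform quantiles; calibrated forecasts hug the diagonal."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    expected = (np.arange(1, len(p) + 1) - 0.5) / len(p)  # uniform plotting positions
    plt.plot(expected, p, "o", label=label)

plt.plot([0, 1], [0, 1], "k--", label="uniform")  # reference diagonal
uniform_qq(p538, "FiveThirtyEight")
plt.xlabel("Expected quantile (uniform)")
plt.ylabel("Observed p-value")
plt.legend()
plt.show()
```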
Repeating the process for Votamatic, the values are pushed towards zero and one, so the confidence intervals are far too tight: the Votamatic predictions turned out to be overly precise.
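One could also put a number on this with a one-sample Kolmogorov-Smirnov test against the uniform -- a quantitative complement to the visual check, not part of the analysis above; `p538` and `p_votamatic` are the hypothetical p-value arrays from the earlier step:

```python
from scipy.stats import kstest

# Test each aggregator's p-values against Uniform(0, 1); a small
# KS p-value is evidence that the stated intervals are miscalibrated.
for label, p in [("FiveThirtyEight", p538), ("Votamatic", p_votamatic)]:
    stat, ks_p = kstest(p, "uniform")
    print(f"{label}: KS statistic = {stat:.3f}, p-value = {ks_p:.3f}")
```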
The data I used are here. (I read the Votamatic intervals directly off the graphs; if I can get more precise values, I'll repeat the analysis.) I'm very curious to know how the other meta-pollsters did, so if anyone has put together that data, please send it my way.