(or, in which I finally do an analysis of some 2012 election data)

Many are celebrating the success of the poll aggregators who forecast the states won by each candidate -- many called all 50 right, including FiveThirtyEight. No doubt Nate Silver will continue to be the world's most famous meta-analyst given this accomplishment -- even though several of his peers, such as the Princeton Election Consortium, Votamatic, and Simon Jackman's projections for the Huffington Post, seemed to do equally well. The sheer number and depth of polls in swing states no doubt had a lot to do with all their successes.

How much of an accomplishment this is, of course, depends on context; the winner in most states was easily predicted ahead of time with the barest minimum of polling. Consider instead a related question: how close were the vote shares in each state to the prediction, as a function of the margin of error?

The simplest way to check this is to calculate a p-value for each prediction: for each prediction and its associated uncertainty, calculate the probability that a draw from the predictive distribution falls below the observed value (the vote share). The key is that if the stated uncertainties are accurate, then across a large number of independent prediction-uncertainty pairs we should see a uniform distribution of p-values between 0 and 1.
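As a minimal sketch of this check, assuming normal predictive distributions (the numbers below are made up for illustration, not real forecast data):

```python
import numpy as np
from scipy.stats import norm, kstest

# Hypothetical predicted vote shares, their standard errors, and the
# observed outcomes for a handful of states (illustrative values only).
predicted = np.array([0.52, 0.48, 0.51, 0.55])
stderr    = np.array([0.02, 0.03, 0.02, 0.025])
observed  = np.array([0.53, 0.47, 0.52, 0.54])

# p-value for each state: the probability that a draw from the
# predictive normal falls below the observed share.
p_values = norm.cdf(observed, loc=predicted, scale=stderr)

# If the forecasts are well calibrated, these p-values should look like
# draws from Uniform(0, 1); a Kolmogorov-Smirnov test quantifies this.
stat, ks_p = kstest(p_values, "uniform")
print(p_values, ks_p)
```

With 50-odd states, the KS test has modest power, but the visual check against the uniform described below tells much the same story.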

I grabbed the estimates from FiveThirtyEight and Votamatic (at this time, I have only estimates, not uncertainties, for PEC or HuffPost) and calculated the respective p-values assuming a normal distribution in each case. Media coverage suggested that Nate Silver's intervals were too conservative; if this were the case, we would expect a higher concentration of p-values around 50%. (If too anti-conservative, the p-values would be more extreme, towards 0 or 1.)
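The expected behavior of the p-values under miscalibrated intervals can be demonstrated with a small simulation (a sketch with arbitrary standardized scales, not election data): if the reported uncertainty is wider than the true spread, p-values pile up near 0.5; if narrower, they pile up near 0 and 1.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
truth = rng.normal(0, 1, 10000)  # true outcomes on a standardized scale

# Conservative forecaster: reports twice the real uncertainty.
p_conservative = norm.cdf(truth, loc=0, scale=2)
# Anti-conservative forecaster: reports half the real uncertainty.
p_tight = norm.cdf(truth, loc=0, scale=0.5)

# Fraction of p-values landing within 0.1 of 50%.
frac_conservative = np.mean(np.abs(p_conservative - 0.5) < 0.1)
frac_tight = np.mean(np.abs(p_tight - 0.5) < 0.1)
print(frac_conservative, frac_tight)
```

The conservative forecaster's p-values cluster heavily around 0.5, while the anti-conservative one's are pushed towards the extremes; a calibrated forecaster would put about 20% of p-values in that central window.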

On the contrary, the 538 distribution is nearly uniform. The closer the points are to the diagonal, the better the fit to the uniform:
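The diagonal comparison in the plot can be sketched numerically: sort the p-values and set them against evenly spaced uniform quantiles (the p-values below are simulated stand-ins, not the actual 538 numbers):

```python
import numpy as np

# Stand-in p-values, one per state plus DC (simulated, not real data).
p_values = np.sort(np.random.default_rng(1).uniform(0, 1, 51))

# Expected quantiles of Uniform(0, 1) for 51 ordered draws; plotting
# sorted p-values against these gives the uniform Q-Q plot, where
# points near the diagonal y = x indicate a good fit to the uniform.
expected = (np.arange(1, 52) - 0.5) / 51
max_dev = np.max(np.abs(p_values - expected))
print(max_dev)
```

The maximum deviation from the diagonal is essentially the Kolmogorov-Smirnov statistic, so the visual and formal checks agree.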

Repeating the process for Votamatic:

The values are pushed towards zero and one, so the confidence intervals are far too tight: the Votamatic predictions turned out to be overly precise.

The data I used are here. (I read the Votamatic intervals directly off the graphs; if I can get a more precise value, I'll repeat the analysis.) I'm very curious to know how the other meta-pollsters did, so if anyone has put together that data, please send it my way.