March 2010 Archives

The Statistics of the Putter's Game

When I first saw this piece on improved putting statistics in the Wall Street Journal, I put on the skeptic's face I usually wear for science in the news, especially given the overplay of data analysis and "new statistics" that creeps into this kind of reporting.

On reading the actual paper by operations researchers Douglas Fearing, Jason Acimovic and Stephen Graves, I was pleasantly surprised at the care and attention they've put into the problem. Having a rich data set is essential, and they've got one in the PGA putting database; I would have cringed if the authors had been required to gather their own data and had drawn overly broad conclusions on that basis.

The authors have deftly avoided the kinds of oversimplifications that make students of sports analysis wince. Their model is simply stated -- figure out the factors that lead to making putts, and for those that weren't made, model how bad the misses are -- but the tricks to getting the computation right are subtle. Most importantly, they validate their models against data and resist the temptation to overfit, and they produce a relevant quantity for each player (shots gained through putting, compared to a baseline) that can be predicted and therefore validated on a regular basis.
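To make the "shots gained, compared to a baseline" idea concrete, here is a rough sketch in Python. It is my own simplification, not the authors' model: it skips the make-probability and miss-distance pieces entirely and just compares putts taken against a baseline expectation at each starting distance. The `BASELINE` table, `expected_putts` and `putts_gained` are hypothetical names, and the baseline numbers are placeholders rather than values from the paper.

```python
from bisect import bisect_left

# Hypothetical baseline: (distance in feet, expected putts to hole out).
# Placeholder values for illustration only; not taken from the paper.
BASELINE = [(3, 1.05), (5, 1.25), (10, 1.60), (20, 1.90), (40, 2.10)]


def expected_putts(distance_ft):
    """Linearly interpolate the baseline expected putts at a distance."""
    dists = [d for d, _ in BASELINE]
    # Clamp to the ends of the table rather than extrapolating.
    if distance_ft <= dists[0]:
        return BASELINE[0][1]
    if distance_ft >= dists[-1]:
        return BASELINE[-1][1]
    i = bisect_left(dists, distance_ft)
    (d0, e0), (d1, e1) = BASELINE[i - 1], BASELINE[i]
    return e0 + (e1 - e0) * (distance_ft - d0) / (d1 - d0)


def putts_gained(holes):
    """Sum of (baseline expected putts - putts actually taken).

    holes: iterable of (first-putt distance in feet, putts taken).
    A positive total means the player beat the baseline.
    """
    return sum(expected_putts(d) - n for d, n in holes)
```

With these placeholder numbers, a player who one-putts from 10 feet and two-putts from 20 feet gains about half a shot on the baseline. A real baseline would also have to account for how hard each green is, which is one of the places where the computation gets subtle.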

I have the usual gripes about graphics and other statements; I want to see error bars on graphs, for example, and I really want to know about the predictive error -- that is, how well the putts-gained-per-round statistic will predict future putting performance (within one tournament, one month, one year, etc.) -- but all in all, I'm glad I went to read the paper myself and was reassured that the PGA is advertising a worthwhile product.
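For what I mean by predictive error, a minimal sketch along these lines would do: take each player's average putts gained per round over an earlier window, use it as the prediction for a later window, and report the root-mean-square error across players. The function name `predictive_rmse` and the dict-of-lists input are assumptions of mine, and this is only one of several reasonable ways to score the prediction.

```python
from math import sqrt


def predictive_rmse(past, future):
    """RMSE of predicting each player's future average from the past average.

    past, future: dicts mapping player -> list of putts gained per round.
    Players missing from either window, or with no rounds, are skipped.
    """
    squared_errors = []
    for player, past_rounds in past.items():
        future_rounds = future.get(player)
        if not past_rounds or not future_rounds:
            continue
        predicted = sum(past_rounds) / len(past_rounds)
        observed = sum(future_rounds) / len(future_rounds)
        squared_errors.append((predicted - observed) ** 2)
    if not squared_errors:
        raise ValueError("no players with rounds in both windows")
    return sqrt(sum(squared_errors) / len(squared_errors))
```

Running this with windows of one tournament, one month and one year would show how quickly the statistic loses its predictive value, which is the number I'd like to see reported next to the rankings.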