April 2012 Archives

Prediction, The Big Discovery and Heartbreak

It's a year old, but I only just heard the story of what happened to a baseball researcher I first read about 10 years ago:


"Voros" McCracken had a particular insight about pitcher ability in the late 1990s, namely that a subset of the data -- "defense-independent" pitching statistics -- was not only an excellent predictor of the runs allowed by a pitcher, but was also highly persistent from year to year. Others discovered the same principle around the same time, but it was the publication of his work that got the attention, thanks to its strong claim that the differences between pitchers on batted balls in play were so small that they could be ignored. While that wasn't quite correct -- there is some predictive power in the remaining information -- it was enough to change people's ideas about how the game works.

And now McCracken is out of baseball, applying analytical methods for undisclosed professional soccer clubs, having made only a meager living while working for the Boston Red Sox in the early 2000s. He was one of the figures I read about before going to grad school in statistics, so this news definitely got my attention.

There are two things I take away from this whole story:

1) After 12 years of exposure to real data and methods, I feel like finding the best set of predictors of success should have been *screamingly obvious*, the kind of assignment I could give to an undergrad with existing databases. But would it have been obvious enough to construct a narrative around it and convince the public? I can think of a good explanation after the fact: there's a lot of general uncertainty once the bat hits the ball, like the angle of attack. But I've had years to think about it.

2) Any research I do on sports is for the sake of teaching, or just as a hobby, and the value it brings me to share something with the world is a nice bonus. I'm not under pressure to find a brilliant discovery to keep my job -- at least, not when it comes to sports.
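For what it's worth, here is a rough sketch of the persistence check I have in mind in point 1. It uses no real baseball data: both statistics are simulated, and the talent and noise spreads are numbers I made up purely for illustration. The idea is just that a stat driven mostly by pitcher skill should correlate strongly from one season to the next, while a stat dominated by luck should not.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_two_seasons(n_pitchers, talent_sd, noise_sd, rng):
    """Each pitcher has a fixed talent; each season adds independent noise."""
    talent = [rng.gauss(0, talent_sd) for _ in range(n_pitchers)]
    year1 = [t + rng.gauss(0, noise_sd) for t in talent]
    year2 = [t + rng.gauss(0, noise_sd) for t in talent]
    return year1, year2


rng = random.Random(2012)  # fixed seed so the run is repeatable

# ASSUMPTION: a "defense-independent"-style stat where talent dominates noise.
skill_y1, skill_y2 = simulate_two_seasons(500, talent_sd=1.0, noise_sd=0.5, rng=rng)

# ASSUMPTION: a "balls in play"-style stat where noise swamps talent.
luck_y1, luck_y2 = simulate_two_seasons(500, talent_sd=0.2, noise_sd=1.0, rng=rng)

r_skill = pearson(skill_y1, skill_y2)
r_luck = pearson(luck_y1, luck_y2)

print(f"year-to-year correlation, skill-driven stat: {r_skill:.2f}")
print(f"year-to-year correlation, luck-driven stat:  {r_luck:.2f}")
```

With real data, the two simulated lists would be replaced by each pitcher's actual numbers in consecutive seasons. The comparison of the two correlations is the whole exercise.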

Gelman has said that the most noteworthy discoveries these days aren't the small effects that come out with more data, but the big ones that everyone else just missed. This definitely qualified as one of those. Whatever the next big discovery is, I'm sure we'll all think it was obvious years later, even if it wasn't.