Model Checking and Baseball

One of the reasons I'm in the business of stochastic modelling in sports is that, with such an abundance of data, it's easy to check a model against real-world scenarios. There's a really nice discussion going on at Sabermetric Research that addresses why standard linear regression alone won't give a good enough picture of the causal processes in baseball, let alone in the world outside of games.

This is related to what AG has discussed about model scaffolding: by slightly changing the specification of a model, one can gauge how believable the model is as a description of the situation at hand. It's also a good warning that a regression coefficient doesn't necessarily mean what you think it does; this is illustrated clearly in the above-linked article, where a poorly chosen model suggests that hitting triples leads to a decrease in run support.

What really impresses me about this one is that the proposed solution to the analysis problem is to run a matching-like experiment: take all baserunning situations, match those in which a triple is hit to those in which one isn't (on baserunners, pitching scenarios, hitters, etc.), then compare the runs scored in the rest of the inning to get a plausible effect size.
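
To make the matching idea concrete, here is a minimal sketch in Python. It is my own illustration, not the code or data from the Sabermetric Research discussion: the function name `matched_effect`, the record layout, and the toy numbers are all made up for this example. It matches exactly on a game-state label, which is a simplification of matching on baserunners, pitching scenarios and hitters. Within each state it takes the difference in mean runs scored afterwards between triples and non-triples, then averages those differences, weighting each state by its number of triples.

```python
from collections import defaultdict
from statistics import mean


def matched_effect(plate_appearances):
    """Exact-match on game state; return the triple-weighted mean difference in runs.

    Each record is (state, hit_triple, runs_after), where `state` is any
    hashable label for the situation being matched on (hypothetical layout).
    """
    # Group runs scored afterwards by state, split into triples / non-triples.
    strata = defaultdict(lambda: {True: [], False: []})
    for state, hit_triple, runs_after in plate_appearances:
        strata[state][bool(hit_triple)].append(runs_after)

    weighted_sum = 0.0
    n_triples = 0
    for groups in strata.values():
        triples, others = groups[True], groups[False]
        if not triples or not others:
            continue  # no match available in this state, so drop it
        weighted_sum += len(triples) * (mean(triples) - mean(others))
        n_triples += len(triples)

    if n_triples == 0:
        raise ValueError("no state contains both triples and non-triples")
    return weighted_sum / n_triples


# Made-up toy records, for illustration only (not real baseball data).
toy = [
    (("0 out", "bases empty"), True, 1),
    (("0 out", "bases empty"), True, 2),
    (("0 out", "bases empty"), False, 0),
    (("0 out", "bases empty"), False, 1),
    (("2 out", "runner on 1st"), True, 1),
    (("2 out", "runner on 1st"), False, 0),
    (("1 out", "bases loaded"), False, 3),  # has no matching triple, so it is dropped
]
effect = matched_effect(toy)
print(effect)
```

Exact matching only works when the states are coarse enough that both kinds of outcome show up in each one; with finer covariates, a nearest-neighbour or propensity-score match would presumably be the next thing to try.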
