More on Correlations and Language

I was delighted to see the response on AG's blog to some of my thoughts on causative language, part of the reason being that word order alone can imply a causative notion. As most researchers know, that is barely the beginning of the story. Translating mathematical concepts into plain language is difficult, even when the scientist is playing honestly with the facts.

Even in a simple case like baseball, we drop terms all the time from the explanation, as Phil Birnbaum demonstrates:

We can run a simple regression, runs scored vs. triples hit. I used a dataset consisting of all full team-seasons from 1961 to 2008 (only for teams that played at least 159 games, to omit strike seasons). That was 1,121 teams. The result of the regression:

Runs = 731 - (0.44 * triples)

That's not a misprint: the regression tells us that every triple actually *costs* its team almost half a run!

Birnbaum goes on to demonstrate, through a very nice matching argument, that a triple really does have positive value in runs, but I still think he undersells the problem. Taking the regression on its own, it's worth spelling out in English what the mathematical statement actually says:

"The expected number of runs that a Major League Baseball team (one of thirty) scores in a year is negatively correlated with the number of triples hit by said team, given the population and the underlying distribution of covariates"

and not

"one additional triple results in a loss of 0.44 runs".

In short, simplifying the language can strip out the details that matter most.
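
To see how a negative coefficient can coexist with a positive per-triple value, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the `power` confounder, the coefficients, and the noise levels are invented, and none of it is Birnbaum's data or his matching argument. Only the sample size of 1,121 is borrowed from the quoted regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1121  # matches the number of team-seasons quoted above; the data are synthetic

# Hypothetical confounder: how heavily a team relies on power hitting.
power = rng.normal(0.0, 1.0, n)

# Assumed for illustration: power-hitting teams hit fewer triples.
triples = 35 - 6 * power + rng.normal(0.0, 5.0, n)

# By construction, each triple ADDS one run, while power adds many more.
TRUE_EFFECT = 1.0
runs = 650 + TRUE_EFFECT * triples + 60 * power + rng.normal(0.0, 30.0, n)

# Simple regression of runs on triples alone (the analogue of the quoted fit).
slope_naive, intercept_naive = np.polyfit(triples, runs, 1)

# Regression of runs on triples AND the confounder.
X = np.column_stack([np.ones(n), triples, power])
coef, *_ = np.linalg.lstsq(X, runs, rcond=None)
slope_adjusted = coef[1]

print(f"runs ~ triples:         slope = {slope_naive:.2f}")
print(f"runs ~ triples + power: slope = {slope_adjusted:.2f}")
```

In this toy setup the first slope comes out negative and the second comes out close to the built-in value of one run per triple. The simple regression describes how runs and triples move together across teams, which is the first English sentence above; it says nothing about what one more triple would do for a given team.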
