Statistics and Scrabble, Together At Last

Sitting on my to-do list for a while now has been an exploration of Scrabble from an experimental design point of view: how to design a tournament that makes the variance in results as small as possible while still preserving the appearance of the home game for its players. One goal was to figure out a way to carry out a true "duplicate" version of Scrabble, so that multiple pairs could play with the same tiles, unlike the version currently popular in Europe, which has no defensive element.

I'm proud (relieved?) to say that I've finally finished the first draft of this work for two-player head-to-head games, with a duplication method that ensures that if the game were repeated, each player would receive tiles from the reserve in the same sequence: think of the tiles as laid out in a fixed order (unseen by the players), with one player drawing from the front and the other from the back. Like Lady and the Tramp with spaghetti:

[Image: Lady and the Tramp with spaghetti]
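The draw rule is simple enough to sketch in a few lines. What follows is a minimal Python illustration, not the modification actually made to Quackle; the class `FixedOrderBag` and the function `replay` are hypothetical names. It shows that two games with different draw timings still hand each player the same tiles in the same order, as long as the front and back of the sequence have not yet met.

```python
from collections import deque


class FixedOrderBag:
    """A bag whose tile order is fixed in advance and hidden from the players.

    Player 0 draws from the front of the sequence and player 1 from the back,
    so each player's k-th tile is the same in every replay of the game until
    the two ends of the sequence meet.
    """

    def __init__(self, tile_order):
        self._tiles = deque(tile_order)

    def draw(self, player, n):
        """Draw up to n tiles for player 0 (front) or player 1 (back)."""
        drawn = []
        for _ in range(n):
            if not self._tiles:
                break  # bag is empty: fewer than n tiles are returned
            drawn.append(self._tiles.popleft() if player == 0 else self._tiles.pop())
        return drawn


def replay(schedule, tile_order):
    """Apply a list of (player, n) draws; return each player's tiles in order."""
    bag = FixedOrderBag(tile_order)
    received = {0: [], 1: []}
    for player, n in schedule:
        received[player].extend(bag.draw(player, n))
    return received


# A toy 10-tile sequence; a real bag has 100 tiles.
order = list("AEIOUSTRNQ")
game_a = replay([(0, 3), (1, 3), (0, 1), (1, 2)], order)
game_b = replay([(1, 3), (0, 2), (1, 2), (0, 2)], order)
print(game_a[0], game_a[1])
print(game_a == game_b)
```

Both toy games give player 0 the tiles A, E, I, O and player 1 the tiles Q, N, R, T, S, despite the different interleaving. Once the two ends meet near the end of a game, who gets the last few tiles does depend on timing.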

I modified the Scrabble simulator Quackle to accept a predetermined tile order, then simulated over one million matches between copies of Quackle's "Speedy Player", using each of 10,600 tile orders 100 times. One goal was to figure out how much of the variance in score comes from the tile order and how much comes from the board once the tile order is fixed. It turns out to be about half-bag, half-board, so if this scheme could be used in tournaments, it would noticeably cut down the number of matches needed to identify the best player (though it would take a Rube Goldberg apparatus to implement in live games).
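The half-bag, half-board split is a variance-components question. One standard way to estimate it, though not necessarily the one used in this study, is a one-way random-effects ANOVA: treat each tile order as a group, compare the spread between order means with the spread among replicate games within an order, and solve for the two components. The data below are synthetic, with assumed sizes and standard deviations; they are not the simulation results described above.

```python
import random
import statistics


def variance_components(scores_by_order):
    """Method-of-moments estimates for a balanced one-way random-effects model.

    scores_by_order[i] holds the r replicate scores played with tile order i
    (every inner list must have the same length r >= 2).
    Returns (order_component, board_component).
    """
    k = len(scores_by_order)
    r = len(scores_by_order[0])
    group_means = [statistics.fmean(g) for g in scores_by_order]
    grand_mean = statistics.fmean(group_means)
    # Between-order and within-order mean squares.
    msb = r * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    msw = sum(
        (y - m) ** 2 for g, m in zip(scores_by_order, group_means) for y in g
    ) / (k * (r - 1))
    order_component = max((msb - msw) / r, 0.0)  # truncate negative estimates at 0
    return order_component, msw


# Synthetic data: every number here is an assumption for illustration.
rng = random.Random(2011)
K_ORDERS, REPS = 500, 20          # far smaller than 10,600 orders x 100 games
ORDER_SD, BOARD_SD = 20.0, 20.0   # assumed "true" standard deviations, in points
data = []
for _ in range(K_ORDERS):
    order_effect = rng.gauss(0.0, ORDER_SD)
    data.append([order_effect + rng.gauss(0.0, BOARD_SD) for _ in range(REPS)])

order_var, board_var = variance_components(data)
share = order_var / (order_var + board_var)
print(f"share of score variance from tile order: {share:.2f}")
```

With equal assumed standard deviations, the estimated share should land near one half; with real simulation output, the same two mean squares give the split directly.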

Some other findings from the simulations:

  • The blank is worth about 30 points to a good player, each S about 10.
  • The Q is a burden to whichever player receives it, effectively a 5-point penalty, because it reduces bingo opportunities: a rack holding the Q needs either a U or a blank for a chance at a bingo and its 50-point bonus.
  • The J is essentially neutral pointwise.
  • The X and the Z are each worth about 3-5 extra points to the player who receives them. Their difficulty in playing in bingoes is mitigated by their usefulness in other short words.

I have yet to draw any conclusions about how the game should be modified, mainly because it would be premature without testing these ideas on human players. Any volunteers?

Geek talk: Running R processes remotely through ESS

I've been wondering for a while how to combine the convenience of Emacs on a local machine with running R processes on a remote server. According to http://www.xemacs.org/Documentation/packages/html/ess_3.html, this can be done with the ess-remote command.

1) Use M-x shell to open a shell environment.

2) Use ssh to connect to the server of interest. If desired, use "screen" to make the process resilient to disconnection.

3) Run R.

4) Use M-x ess-remote to enable the shell as an ESS process.

The advantage of this setup is the convenience of an environment that allows instant execution of lines of R code. Files must still be loaded on the remote server rather than referenced on the local machine, but it's still an improvement.

Big chances to change redistricting this year...

...assuming we can take advantage of them.

State legislatures are preparing to conduct their decennial redistricting, now that the data from the last census has been processed. In states where the legislature draws the maps directly, the result is known to be far less than fair to constituents: either one party tries to take control of an "unfair" number of seats (remember the Oklahoma exodus!), or the incumbents on both sides work to protect their own re-election prospects (some people have all the luck).

There is hope that a bipartisan commission, whose members cannot run for office in the districts they draw, would be able to break the easily recognizable incumbency advantage and, from there, create a map that is fairer to all parties. But how likely is this? So far, there seems to have been very little difference between maps produced by commissions and those produced by legislatures, either in the absolute performance of the resulting electoral systems or in the change in their performance after redistricting.

Read all about it in my "editorial" paper, which poses questions that commissions should be asking about the redistricting process to guide their work this year. It's not easy to publish a paper of questions unless you're near or past retirement age, let alone when those questions come from null results!

Catcher Spotting Data Now Available

Thanks to all those who took part in my trial of Catcher Spotting utilities. As promised, I've posted the data for all those who want to work with it.

The paper is posted here and has been submitted to the JQAS special issue on the NCSSORS conference.

Here's a summary of my latest sports piece, and my first peer-reviewed publication on baseball. Get it for free here.

Even after nearly a century and a half of major league play, baseball still has no shortage of great questions and puzzles, one of them being the phenomenon of the streak. Just how remarkable was Joe DiMaggio's 56-game hitting streak, and Ted Williams's less celebrated 84 consecutive games reaching base? If we could rerun the past 139 seasons of baseball, how likely would we be to see a streak of that length (or longer) again? And how can we trust that the answer we get back is in any way a reliable one, without the use of a time machine?
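The "rerun the seasons" question can be made concrete in a toy setting. The sketch below assumes every game is an independent trial with the same hit probability `p_hit`, which is far simpler than any serious model of real players, seasons and at-bats; the function names and the parameter values in the usage lines are hypothetical. It computes the chance of at least one streak of a given length exactly, with a small dynamic program over the current run length, and cross-checks it by Monte Carlo "reruns".

```python
import random


def prob_streak_exact(p_hit, n_games, length):
    """P(at least one run of `length` consecutive hit games in n_games games).

    state[j] = probability that no qualifying streak has happened yet and the
    current run of consecutive hit games is exactly j (0 <= j < length).
    """
    state = [0.0] * length
    state[0] = 1.0
    for _ in range(n_games):
        new = [0.0] * length
        new[0] = sum(state) * (1.0 - p_hit)  # a hitless game resets the run
        for j in range(length - 1):
            new[j + 1] = state[j] * p_hit    # a hit extends the run
        # state[length - 1] * p_hit completes a streak; that mass leaves `new`.
        state = new
    return 1.0 - sum(state)


def prob_streak_simulated(p_hit, n_games, length, n_sims, seed=0):
    """Monte Carlo estimate of the same probability by replaying the games."""
    rng = random.Random(seed)
    runs_with_streak = 0
    for _ in range(n_sims):
        run = 0
        for _ in range(n_games):
            run = run + 1 if rng.random() < p_hit else 0
            if run >= length:
                runs_with_streak += 1
                break
    return runs_with_streak / n_sims


# Hypothetical inputs, not estimates for DiMaggio or anyone else.
print(prob_streak_exact(0.75, 154, 30))
print(prob_streak_simulated(0.75, 154, 30, n_sims=5_000))
```

The hard part, which this toy skips entirely, is deciding what a believable "rerun" of real history looks like; the exact calculation only works because the model is so simple.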

A Catcher Spotting Tool: "Hot Or Not?" For Baseball Pitches

Catcher Spotting is a project I've been working on casually for about 4 years, starting when I got curious about the last uncaptured bit of data from a baseball game: the catcher's setup, which indicates the intended target of each pitch before the pitcher throws it. Every commentator knows that a pitcher "missing his spots" is evidence of a loss of control, but attempts to measure the impact of this loss are utterly stymied by the lack of quality data out there.

That's where the Catcher Spotting project comes in: I am very interested in figuring out exactly how this data can be collected on a wide scale. While commentators can agree on whether a pitcher has missed his spot, no one has yet established a way to codify it. And I'm far from convinced that technology alone can do it through video analysis, especially in cases of "intentional deception", like when a runner is on second. Crowdsourcing seems to be the obvious solution, which raises the question: how well can you distribute the task among many different coders?

To that end, I've built an applet that tries to answer that question: users simply click to indicate where the catcher has set up and where the ball actually goes. By collecting data from (hopefully) many users, we should learn how many human coders would be needed to get a reliable sense of a pitcher's intent, as signalled through the catcher. The idea is similar to sites like Hot Or Not, except that I'm explicitly concerned with how the same rater judges different pitches, so that we can know how trustworthy a single rater could be.
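For the "how many coders" question, one classical tool (offered here as an illustration, not something the applet itself computes) is the Spearman-Brown prophecy formula: if a single rater's reliability is r, the average of k raters has reliability kr / (1 + (k - 1)r). It assumes raters are interchangeable with independent errors, and r itself would first have to be estimated from the collected clicks. The function names below are hypothetical.

```python
import math


def spearman_brown(single_rater_reliability, k):
    """Predicted reliability of the average of k interchangeable raters."""
    r = single_rater_reliability
    return k * r / (1 + (k - 1) * r)


def raters_needed(single_rater_reliability, target):
    """Smallest k whose averaged reliability reaches `target` (0 < r, target < 1)."""
    r = single_rater_reliability
    k = target * (1 - r) / (r * (1 - target))
    # Subtract a tiny tolerance so floating-point error (e.g. 4.000000000000001)
    # does not push an exact integer answer up by one.
    return max(1, math.ceil(k - 1e-9))


# Hypothetical single-rater reliability; real data would be needed to estimate it.
print(raters_needed(0.5, 0.8))           # 4
print(round(spearman_brown(0.5, 4), 3))  # 0.8
```

The formula inverts neatly, which is why `raters_needed` is a closed form rather than a search over k.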

Please give the applet a try!

From an official response to the CRU email incident, largely clearing the actions of climate scientists:

The Report points out where things might have been done better. One is to engage more with professional statisticians in the analysis of data. Another, related, point is that more efficacious statistical techniques might have been employed in some instances (although it was pointed out that different methods may not have produced different results). Specialists in many areas of research acquire and develop the statistical skills pertinent to their own particular data analysis requirements. However, we do see the sense in engaging more fully with the wider statistics community to ensure that the most effective and up-to-date statistical techniques are adopted and will now consider further how best to achieve this.

I'm not sure anything in the report will placate the denier crowd, but this part gives me hope. There are plenty of people out there with deep experience in cross-disciplinary statistical analysis who would love to sit down with real data and serve as a sounding board. Indeed, that's one of the founding purposes of my department.

Because Cosma can say it better, and longer, than I can

Our paper is posted: it shows how homophily and contagion are confounded in social network analysis, given that latent variables can affect past and present outcomes as well as the network ties themselves.

A Worthy Repost of Someone Else's Material

How To Publish A Comment has come up twice in the last week: once at a conference, once in a blog post. Boy, I can't wait for this to happen to me.

The Statistics of the Putter's Game

When I first saw this piece on improved putting statistics in the Wall Street Journal, I put on my typical skeptic's face for science in the news, especially given the overplay of data analysis and "new statistics" that creeps into this kind of reporting.

On reading the actual paper by operations researchers Douglas Fearing, Jason Acimovic and Stephen Graves, I was pleasantly surprised at the care and attention they've put into the problem. Having a rich data set is essential, and they've got one in the PGA putting database; I would cringe if the authors had been required to gather their own data and draw overly broad conclusions on that basis.

The writers have deftly avoided the kinds of oversimplifications that make students of sports analysis wince. Their model is simply stated (figure out the factors that lead to making putts and, for putts that weren't made, model how bad the misses are), but the tricks to getting the computation right are subtle. Most importantly, they validate their models against data and resist the temptation to overfit, and they produce a relevant quantity for each player (shots gained through putting, compared to a baseline) that can be predicted, and therefore validated, on a regular basis.
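The bookkeeping behind "shots gained through putting, compared to a baseline" can be sketched as follows: for each hole, take the baseline's expected number of putts from the first-putt distance and subtract the putts actually taken. The baseline table here is invented for illustration (these are not PGA Tour values), and it depends on distance alone, whereas the paper models several factors; the function names are hypothetical.

```python
import bisect

# Hypothetical baseline: average putts to hole out from a given distance (feet).
# Placeholder numbers for illustration only, NOT real tour averages.
BASELINE_FEET = [2, 5, 10, 20, 40, 60]
BASELINE_PUTTS = [1.0, 1.2, 1.5, 1.9, 2.1, 2.3]


def expected_putts(distance_ft):
    """Linearly interpolate the baseline, clamping outside the table's range."""
    if distance_ft <= BASELINE_FEET[0]:
        return BASELINE_PUTTS[0]
    if distance_ft >= BASELINE_FEET[-1]:
        return BASELINE_PUTTS[-1]
    i = bisect.bisect_right(BASELINE_FEET, distance_ft)
    x0, x1 = BASELINE_FEET[i - 1], BASELINE_FEET[i]
    y0, y1 = BASELINE_PUTTS[i - 1], BASELINE_PUTTS[i]
    return y0 + (y1 - y0) * (distance_ft - x0) / (x1 - x0)


def putts_gained(holes):
    """holes: (first-putt distance in feet, putts taken) pairs.

    Positive totals mean the player putted better than the baseline.
    """
    return sum(expected_putts(d) - putts for d, putts in holes)


# Three hypothetical holes in one round.
round_holes = [(10, 1), (20, 2), (2, 1)]
print(round(putts_gained(round_holes), 2))  # 0.4
```

Because each round yields a single number per player, the statistic can be tracked over time and compared against later rounds, which is what makes ongoing validation possible.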

I have the usual gripes about the graphics and a few other statements; I want to see error bars on graphs, for example, and I really want to know about the predictive error, that is, how well the putts-gained-per-round statistic predicts future putting performance (within one tournament, one month, one year, etc.). But all in all, I'm glad I read the paper myself and was reassured that the PGA is advertising a worthwhile product.