August 2012 Archives

Scrabble Cheating

News of a cheating scandal in Scrabble has rippled through the community, after a competitor (proverbially) hid the blanks up his sleeve during matches, leading to his subsequent disqualification. As he is a minor, his name is not being shared, thereby preventing us from asking why, if he was going to cheat, he couldn't have done a better job of it.

Let me take the opportunity to remind tournament organizers everywhere that the latent tile order design mechanism could have prevented this travesty from happening. And all they would have had to do was spend tens of thousands of dollars to design and build the physical apparatus to make it happen, and tens of thousands more to outfit the entire tournament with them. But in the long run, shouldn't we do everything we can for the children?

Following Sam's presentation at this year's JSM, we are proud to release our preprint for consumption:

A.C. Thomas, Samuel L. Ventura, Shane Jensen, Stephen Ma, "Competing Process Hazard Function Models for Player Ratings in Ice Hockey", available from arXiv.
Abstract: Evaluating the overall ability of players in the National Hockey League (NHL) is a difficult task. Existing methods such as the famous "plus/minus" statistic have many shortcomings. Standard linear regression methods work well when player substitutions are relatively uncommon and scoring events are relatively common, such as in basketball, but as neither of these conditions exists for hockey, we use an approach that embraces these characteristics. We model the scoring rate for each team as its own semi-Markov process, with hazard functions for each process that depend on the players on the ice. This method yields offensive and defensive player ability ratings which take into account quality of teammates and opponents, the game situation, and other desired factors, that themselves have a meaningful interpretation in terms of game outcomes. Additionally, since the number of parameters in this model can be quite large, we make use of two different shrinkage methods depending on the question of interest: full Bayesian hierarchical models that partially pool parameters according to player position, and penalized maximum likelihood estimation to select a smaller number of parameters that stand out as being substantially different from average. We demonstrate this on games through five NHL seasons.
Our ultimate goal for this project was to come up with a mathematically rigorous method for determining how players affect the outcomes of hockey games. As we are stochastic modellers by training, this to us meant finding a generative probability model for how these games may come to be. (I am a fan of this approach in hockey for several reasons.) We took the Rosenbaum/Macdonald approach of dividing the game into shifts, so that no players substitute for each other during each observational unit. We then took the outcome of each shift to be whether one team scored a goal or changed off some of its players, and automatically factored in how much time had elapsed. We also adjusted for the fact that some players play much of their time together, and that some players play very little.
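For readers who think better in code, here is a minimal sketch of the shift-based likelihood idea. It is not the code from the paper: it assumes a constant scoring hazard within each shift (the paper works with more general semi-Markov hazard functions), treats a line change as simply ending the observation for both teams' scoring processes, and every name in it (`Shift`, `log_rate`, `shift_log_likelihood`, the `offence` and `defence` dictionaries) is made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Shift:
    """One observational unit: a stretch of play with no substitutions."""
    duration: float  # seconds elapsed in the shift
    home: tuple      # hypothetical player ids on the ice for the home team
    away: tuple      # hypothetical player ids on the ice for the away team
    event: str       # "home_goal", "away_goal", or "change"


def log_rate(baseline, offence, defence, attackers, defenders):
    """Log scoring hazard for the attacking team (assumed log-linear form)."""
    return (baseline
            + sum(offence.get(p, 0.0) for p in attackers)
            - sum(defence.get(p, 0.0) for p in defenders))


def shift_log_likelihood(shift, baseline, offence, defence):
    """Log-likelihood of one shift under two competing scoring processes.

    Each team's process contributes a survival term (no goal for
    `duration` seconds); the process that actually produced a goal also
    contributes its log hazard. A line change adds survival terms only.
    """
    total = 0.0
    sides = (
        (shift.home, shift.away, "home_goal"),
        (shift.away, shift.home, "away_goal"),
    )
    for attackers, defenders, goal_event in sides:
        eta = log_rate(baseline, offence, defence, attackers, defenders)
        total -= math.exp(eta) * shift.duration  # survival term
        if shift.event == goal_event:
            total += eta                         # observed-goal term
    return total


def log_likelihood(shifts, baseline, offence, defence):
    """Sum the per-shift contributions over all observed shifts."""
    return sum(shift_log_likelihood(s, baseline, offence, defence)
               for s in shifts)
```

In this sketch, elapsed time enters only through the survival term, and players who always appear together show up in the same sums, which is why their individual ratings are hard to separate without some form of shrinkage.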

There are a lot of things we can actually put in this model beyond player identifiers, like teams or pairs of players together -- so long as we're willing to wait for the solution to compute, which, for all players over 5 seasons, can be on the order of a day using our current code. We discovered a few things that are of interest to hockey fans as well as statisticians and probabilists, but two jump out especially to me:

  • Defencemen as a group are far more interchangeable than goalies or forwards are, at even strength. This is likely because they share most of their prime duty -- defence -- with the goaltenders, who show much more variety in ability, whereas most of the burden of scoring belongs to the forwards.
  • There are a few player pairs that are just plain awful together compared to when they play apart, such as when Sidney Crosby and Evgeni Malkin played on the same line. The additional deficit to team defence was so big compared to any extra gain in offensive ability that the team would be much better off playing them separately.
Further results -- and plenty of tables! -- are available in the paper.