Following Sam's presentation at this year's JSM, we are proud to release our preprint for consumption:
A.C. Thomas, Samuel L. Ventura, Shane Jensen, Stephen Ma, "Competing Process Hazard Function Models for Player Ratings in Ice Hockey",
available from arXiv.
Abstract: Evaluating the overall ability of players in the National Hockey League (NHL)
is a difficult task. Existing methods such as the famous "plus/minus" statistic
have many shortcomings. Standard linear regression methods work well when
player substitutions are relatively uncommon and scoring events are relatively
common, such as in basketball, but as neither of these conditions exists for
hockey, we use an approach that embraces these characteristics. We model the
scoring rate for each team as its own semi-Markov process, with hazard
functions for each process that depend on the players on the ice. This method
yields offensive and defensive player ability ratings which take into account
quality of teammates and opponents, the game situation, and other desired
factors, that themselves have a meaningful interpretation in terms of game
outcomes. Additionally, since the number of parameters in this model can be
quite large, we make use of two different shrinkage methods depending on the
question of interest: full Bayesian hierarchical models that partially pool
parameters according to player position, and penalized maximum likelihood
estimation to select a smaller number of parameters that stand out as being
substantially different from average. We demonstrate this on games through five
NHL seasons.
Our ultimate goal for this project was to come up with a mathematically rigorous method for determining how players affect the outcomes of hockey games. As we are stochastic modellers by training, this meant finding a generative probability model for how these games come to be. (I am a fan of this approach in hockey for several reasons.) We took the Rosenbaum/Macdonald approach of dividing the game into shifts, so that no players substitute for each other within an observational unit. We then took the outcome of each shift to be whether one team scored a goal or some players changed off, and automatically factored in how much time had elapsed. We also adjusted for the fact that some players play much of their time together, and that some players play very little.
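To make the shift-based setup concrete, here is a minimal sketch of a competing-hazards log-likelihood over shifts. It is not the code from the paper: it assumes a constant scoring rate within each shift (an exponential simplification of the semi-Markov hazards), treats a line change as censoring, and uses hypothetical names throughout (`Shift`, `scoring_rate`, `baseline`, `offence`, `defence`, and the player labels).

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class Shift:
    """One observational unit: no substitutions happen inside it."""
    duration: float           # seconds elapsed during the shift
    home: Tuple[str, ...]     # hypothetical player ids on the ice
    away: Tuple[str, ...]
    outcome: str              # "home_goal", "away_goal", or "change"


def scoring_rate(baseline: float,
                 offence: Dict[str, float],
                 defence: Dict[str, float],
                 attackers: Sequence[str],
                 defenders: Sequence[str]) -> float:
    """Goals per second for the attacking side (assumed log-linear form)."""
    log_rate = baseline
    log_rate += sum(offence.get(p, 0.0) for p in attackers)
    log_rate -= sum(defence.get(p, 0.0) for p in defenders)
    return math.exp(log_rate)


def shift_log_likelihood(shift: Shift, baseline: float,
                         offence: Dict[str, float],
                         defence: Dict[str, float]) -> float:
    """Two scoring processes compete; a line change censors both."""
    home_rate = scoring_rate(baseline, offence, defence, shift.home, shift.away)
    away_rate = scoring_rate(baseline, offence, defence, shift.away, shift.home)
    # Neither team scored before the shift ended: survival term for both.
    ll = -(home_rate + away_rate) * shift.duration
    # If the shift ended with a goal, add the log hazard of the scoring team.
    if shift.outcome == "home_goal":
        ll += math.log(home_rate)
    elif shift.outcome == "away_goal":
        ll += math.log(away_rate)
    return ll


def total_log_likelihood(shifts: Sequence[Shift], baseline: float,
                         offence: Dict[str, float],
                         defence: Dict[str, float]) -> float:
    return sum(shift_log_likelihood(s, baseline, offence, defence)
               for s in shifts)
```

In this sketch, players who never appear in `offence` or `defence` default to a rating of zero, i.e. league average. A shrinkage prior or penalty on the ratings would be added on top of this likelihood before fitting.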
There are a lot of things we can put in this model beyond player identifiers, like teams or pairs of players together -- so long as we're willing to wait for the solution to compute, which for all players over five seasons can take on the order of a day with our current code. We discovered a few things that are of interest to hockey fans as well as statisticians and probabilists, but two jump out especially to me:
- Defencemen as a group are far more interchangeable than goalies or forwards are, at even strength. This is likely because they share most of their prime duty -- defence -- with the goaltenders, who show much more variety in ability, whereas most of the burden of scoring belongs to the forwards.
- There are a few player pairs that are just plain awful together, compared with when they play apart, such as when Sidney Crosby and Evgeni Malkin played on the same line. The additional deficit to team defence was so large compared to any extra gain in offensive ability that it would be much more worthwhile to play them separately.
Further results -- and plenty of tables! -- are available in the paper.