Pucksberry: Adapting Hexagonal Bin Plots for NHL Display

One of my favorite classes of statistical graphics in sports media is the hexagonal bin plot, used by Grantland's Kirk Goldsberry to illuminate the shooting patterns and successes of shooters in the NBA. With access to the luxuriously rich SportVU data, Goldsberry has made a second career of using a single graphic to tell stories (he's also a geography professor).

As of this NBA season, SportVU gives the x-y locations of all shots taken along with their success or failure in scoring, so Goldsberry has two variables to plot: the relative location of shots, and the proportion of shots that go in. These make for glorious comparisons to make a point, like how the Spurs dominated during their winning streak:


So of course, as a statistics professor who teaches graphics and does research on hockey, my first instinct is to steal it for a massive profit, er, see how I can adopt, adapt and improve this method for the hockey community at large, particularly since x-y data on shot attempts has been available for the NHL since 2008.

So what are the big differences between NBA and NHL data that we have to bear in mind? And when do we get to see some pretty pictures? (The answer to both after the jump.)

1) Star players in the NBA take far more shots per game than star players in the NHL. Proportionally, we may not have enough information to make a decision on a player's shot locations. Beyond that, the locations a player shoots from are often far more rigidly defined by their position -- defensemen will shoot from the extremities and typically from their standard side. 

We're going to get the most leverage, then, out of applying these plots not to individual shooters but to full teams.

2) Among other things, the presence of a goaltender and the relative difference in puck travel time means that there's way more relative variability in shot success by location -- between 2 and 20 percent on average, a tenfold range, rather than a spread of 30 to 70 percent in the NBA. Combined with the low success rate, it will be more difficult to establish any meaningful differences.

3) The NBA has two resolutions of its shot attempts: a miss and a basket. In contrast, the NHL has four -- shots that are blocked by players on the way to the net (BLOCK), shots that make it the distance but miss the net (MISS), shots that make contact with the goaltender (SHOT -- though a "save" is credited even if the shot would have missed the net), and goals. 

Here's where the NHL data starts to show its seams:

  • GOAL and SHOT have both distance to the net and x-y coordinates.
  • MISS has distance, but no coordinates.
  • BLOCK has neither distance nor coordinates.

For the sake of these plots, we can impute x-y coordinates for MISS based on the SHOT distribution, but blocked shots have to be omitted from this plot.
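
The post doesn't spell out how the imputation is done, so here is one plausible way to sketch it in Python, not necessarily what the applet does. The assumption is that a miss from a given distance comes from the same spread of angles as on-net shots from a similar distance. The function name, the 3-foot tolerance, and the convention of putting the net at (0, 0) are all made up for illustration.

```python
import math
import random

def impute_miss_xy(miss_distance, shots, tolerance=3.0, rng=random):
    """Guess an (x, y) location for a MISS recorded only by its distance.

    shots: list of (x, y) for SHOT events, measured from the net at (0, 0).
    Assumption (not from the original post): a miss from a given distance
    has the same angular distribution as on-net shots from nearby distances.
    """
    # Donor pool: on-net shots whose distance is close to the miss distance.
    donors = [(x, y) for x, y in shots
              if abs(math.hypot(x, y) - miss_distance) <= tolerance]
    if not donors:
        # Empty window: fall back to the single shot nearest in distance.
        donors = [min(shots, key=lambda s: abs(math.hypot(*s) - miss_distance))]
    dx, dy = rng.choice(donors)
    angle = math.atan2(dy, dx)
    # Keep the donor's bearing, but place the miss at its own recorded distance.
    return (miss_distance * math.cos(angle), miss_distance * math.sin(angle))
```

Drawing a random donor rather than averaging keeps the imputed misses as spread out as the real shots, so the bin counts aren't artificially concentrated.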

4) Smoothing. We can color our bins by relative success, but the small number of successes will mean that strong colors will appear sporadically. We're better off either smoothing over the surface continuously, or assembling a secondary binning system. Goldsberry does the latter, which makes the most sense here because...

5) We'd like to know if any deviations in either count or success rate have any statistical significance, and it's way easier to establish this for a discrete set of bins -- the number of shots in a bin is best modelled as a Poisson distribution, and conditional on the total number of shots, the number of successes is Binomial. So we can then pick a series of regions on the ice that correspond both to known roles and to big changes in success.
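
As a rough sketch of what those two checks could look like for a single region, here is a Python version using normal approximations to the Poisson and Binomial. The approximation, the function names, and the idea of measuring exposure in minutes are my assumptions for illustration; the exact test behind the plots isn't specified here.

```python
import math

def poisson_rate_z(count, minutes, league_rate_per_minute):
    """z-score of a region's shot count against the league rate.

    Under a Poisson model the expected count and its variance are both
    league_rate_per_minute * minutes.
    """
    expected = league_rate_per_minute * minutes
    return (count - expected) / math.sqrt(expected)

def binomial_success_z(goals, shots, league_success_prob):
    """z-score of goals in a region, conditional on the number of shots.

    Under a Binomial model the expected goals are n * p and the
    variance is n * p * (1 - p).
    """
    expected = shots * league_success_prob
    variance = shots * league_success_prob * (1 - league_success_prob)
    return (goals - expected) / math.sqrt(variance)
```

With invented numbers, a region that saw 120 shots in 1000 minutes against a league rate of 0.1 per minute sits two standard deviations above average in volume. For regions with very few shots the normal approximation gets shaky, and an exact tail probability would be the safer choice.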

6) Finally, NBA shooting assumes that possession between scoring attempts alternates, which means that overall shooting rates are roughly equal. Hockey has no such guarantee, and a lot of emphasis has been placed on statistics that measure relative shooting rates, which means we'd be remiss in ignoring how these rates differ by zone. We can still compare relative successes, but comparing both of these will indicate which part of a team's success is true signal and which is noise.

With these priorities in mind, let's test out some examples on this past season at even strength. The unit of interest is the z-score of the rate or success probability, so that brighter colors mean stronger signals -- red is high, blue is low and grey is average.
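
For anyone wanting to reproduce the color scheme, a minimal Python sketch of a z-score-to-color mapping might look like the following. The specific RGB values and the cap at 3 standard deviations are assumptions chosen for illustration, not the palette used in the plots.

```python
def z_to_rgb(z, z_max=3.0):
    """Map a z-score to an (r, g, b) tuple: grey at 0, red high, blue low."""
    grey, red, blue = (128, 128, 128), (255, 0, 0), (0, 0, 255)
    # Strength in [0, 1]: distance from average, capped at z_max.
    strength = min(abs(z), z_max) / z_max
    target = red if z > 0 else blue
    # Blend linearly from grey toward the target color.
    return tuple(round(g + strength * (t - g)) for g, t in zip(grey, target))
```

Capping the scale keeps one extreme region from washing out every other region on the rink.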

Toronto Maple Leafs -- Shot Rates By Region


The Leafs were the second-most shot-upon team in the NHL this year, and it's clear that much of that damage was on the perimeter: low-probability shots against were well above average, and in no region did the Leafs allow substantially fewer shots per minute than the league average. They also attempted very few shots compared to league average in the low slot, where success probabilities are the highest.

Toronto Maple Leafs -- Success Probabilities By Region

In no individual region do we have a shooting percentage that's significantly different from the league average. But the right plot is a rough proxy for goaltender ability, and Bernier/Reimer did mildly better than league average at stopping close-up shots.

Ottawa Senators -- Shot Rates and Success Probabilities By Region


Ottawa both attempted and allowed a large number of shots, though Ottawa's own shots came from the perimeter. The clear difference is in the scoring chances they allowed, an area in which they were dominated. The relative flatness of the shooting percentages suggests that this isn't a scorer's bias in awarding extra shot attempts.

In purely trivial matters, Ottawa allowed very few goals from their left point, but the threshold for "significant" would be crossed with roughly 3 more in total.

Los Angeles Kings -- Shot Rates and Success Probabilities By Region

LA's strength this season has been a uniform and relative decrease in shots allowed in all regions. If there is a slight overcount of shot attempts at the Staples Center (as suggested by the low success rate for either team) then the bias isn't a major factor. I now kind of wish I had picked them in my pool to beat the Sharks in Round 1.

New Jersey Devils -- Shot Rates and Success Probabilities By Region


The Devils were also excellent at preventing shot attempts, especially in the vulnerable slot area, and slightly above average at getting shot attempts in the slot themselves. They showed a surprisingly poor shooting percentage in the mid-slot region, as they also did in the previous season. Goaltending was at just about the league average for slot shooting.

Further Examples

The applet used to create these plots was built in R Shiny, and is currently running here for anyone who would like to try the demo. The data extends back to the 2008-2009 season and covers even strength as well as 5-on-4 power-play and short-handed situations.

Update 4-30-14 10:05 AM: There have (of course) been other plots out there for this data. Brian Macdonald points out a few of these:

Update 4-30-14 1:42 PM: Schuckers has links to his spatially smoothed versions of save percentages, which he presented at Sloan in a previous year. 

