What's Pulling The Goalie Actually Worth To A Team?

Summary: It's an innovation with no monetary cost, so let's figure out the actual gain to pulling the goalie earlier, and show that there's little harm to changing.

So apparently I can complain about progress a little bit. Pulling the goalie is one of my favorite topics in analytics for many reasons, but the biggest is that it feels like the easiest sell to make to teams as to why they should trust data-driven analysis: a change in strategy that costs no money to implement, no new assets to acquire and no new technology to trust.

When I deconstructed the pulled-goalie timing data even further, it became clear that the driving force was not earlier pulls in one-goal games but in two-goal games. Here are all the teams' average times divided by season and score differential when the goalie was pulled at even strength:

"Data Science" Is A Useful Label, Even If It's Usually 5% Science

(Summary: I embrace the term Data Science because it lets us nurture a number of underappreciated talents in our students.)

One of the least developed skills you'll find in the profession of statistics is how to name something appealingly. The discipline itself is a victim of that: not only is it less sexy than most of its competitors, but the word has both plural and singular meanings. The plural is how the discipline is seen from the outside: a dry collection of summaries and figures boiled down to fit on the back of a baseball card. The "singular" meaning is how to deal with uncertainty in data collection in a principled manner -- a skill that, frankly, everyone we know can use.

The meaning comes out a bit better when you call it "the discipline of statistics", or "probability and statistics", which is connected but not identical, or the sleep-inducing "theoretical statistics" or seemingly redundant (but far cooler) "applied statistics". The buzz 10 years ago was to call it "statistical science", as if our whole process was governed by the scientific method, when math is developed by proof and construction and rarely by experiment or clinical observation.

We're seeing the whole thing cook up now with the emergence of the term Data Science, which again seems to have multiple meanings, depending on who you ask:

1) "Data Science" is a catch-all term for probabilistic inference and prediction, emerging as a kind of compromise to the statistics and machine learning communities. An expert in this kind of data science should be familiar with both inference and prediction as the end goal. This seems to be the term favored by academics, particularly in how they market these tools as the curriculum for Master's programs.

2) A "data scientist" is a professional who can manage the flow of data from its collection and initial processing into a form usable for standard inference and prediction routines, then report the results of these routines to a decision maker. This definition of "data science" as the process by which this happens is favored by people in industry. The idea that the source of this data should be "Big" is often assumed but not necessary.

It also doesn't help that the term has been coined at least 3 times in the past 10 years by 4 different people, each with a stake in making their definition stick; and, as I will hammer in, it isn't really science, but is so essential *to* good science that I'm willing to give it a mulligan.

So why would I step into what looks like a silly semantic debate? Partly because I'm paid to. I'm teaching these skills to multiple audiences, and over the course of the past year, two books by colleagues of mine have been published by O'Reilly: "Data Science for Business" by NYU professor Foster Provost and quasi-academic Tom Fawcett, and "Doing Data Science" by industry authorities Rachel Schutt and Cathy O'Neil. Both came about because of courses with the words "Data Science" in the title, at NYU and Columbia respectively; both make excellent reading for people who want to work with data in any meaningful capacity but like me prefer an informal style; and both will be on the recommended list when I teach R for Data Science again in the spring of 2014. It is also no accident that the content of Data Science for Business hews closer to the academic definition, and Doing Data Science, with its multiple contributions from industry specialists, lines right up with the industry definition.

The fact that I teach such a broad range of students, many of whom are very smart but technically inexperienced, is what's motivated me to think more deeply about process and less about particular skills. I'd guess that, at best, the work I do that I would call "science" is no more than a quarter of my total output. Yes, I build models, make inferences and predictions and design experiments, but the actual engineering is the clear dominating factor; I write code according to design principles as much as scientific thinking -- if I know a quick routine will take one-tenth the time but be 95% as accurate as a slower but more correct routine, I'll weigh which method to use in the long run by some other function.

For all these reasons, we should probably call it Data Engineering (or Data Flow Management), but we're stuck with Data Science as a popular, job-defining label. Far from an embarrassment of language (says the man who has effectively admitted that his blog's name is exaggerated by a factor of four), my preferred interpretation of a Data Scientist takes the best parts of the previous two:

3) Someone who is *trained* to examine unprocessed data, learn something about its underlying structural properties, construct the appropriate structured data set(s), use those to fit inferential or predictive models (possibly of their own design), and effectively report on the consequences has earned the title of Data Scientist.

What I've seen in all my time in academia is the assumption that these ancillary skills are necessary but can -- if not should -- be self-taught, particularly for PhD students but even for MS students and undergraduates. Cosma's got it exactly right that any self-respecting graduate of our department should have those skills, but we never explicitly test them on it or venerate those students who prove it. And if the problem is getting rid of the posers, we need to do a lot better when it comes to emphasizing this in our culture. To add another term to the stew, do we need to emphasize Data Literacy as an explicit skill? Or would it not be easier to appropriate Data Science as a term that gets down to brass tacks?

Skating Toward Progress, 2.5 Seconds Per Year

I tuned in during third period action to watch the Avalanche play the Devils last night, while the Avs trailed 1-0 and realized I might see something special: Avs coach Patrick Roy pulling his goaltender earlier than other coaches would do. And of course, I looked away too early to see it actually happen, but there it was: Roy pulled J.S. Giguere with two and a half minutes to go in regulation, the Avs tied the game and won it in overtime. As someone convinced that NHL teams are far too conservative when it comes to pulling the goalie, that's one data point of vindication for pulling the goaltender earlier in the game! Right?

Well, sort of. While Roy's been known to pull the trigger far earlier than most, in his postgame comments he credited it to his instincts rather than his calculations: "sometimes you go with your feeling when to pull the goalie and fortunately it worked for us."

Still, Roy's Avalanche easily have the earliest empty-net trigger of any team in the last decade when trailing by a single goal in any end game situation:


The mean pull time has also increased over the decade, from 61 seconds in 2002-2003 to 86 seconds through this season (not including last night's game), but no team has yet approached the 3-minute mark in its average empty-net time -- the amount of time that most simple Poisson-type models suggest is the minimum for this situation -- and only two are over the 2-minute mark at all. Still, I can't complain about progress!
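To give a rough sense of where that 3-minute figure comes from, here is a minimal first-goal-decides Poisson sketch. The per-minute scoring rates below are illustrative guesses of my own, not fitted NHL numbers: when down one goal, whichever team scores next effectively decides the outcome, so we compare the chance of tying for different pull times.

```python
import math

# Hypothetical per-minute scoring rates (assumptions, not fitted values):
A = 0.05   # trailing team, even strength
B = 0.05   # leading team, even strength
C = 0.10   # trailing team, goalie pulled (extra attacker)
D = 0.30   # leading team, into the empty net

def p_tie(pull_at, total_left=5.0):
    """P(trailing team scores the next goal) when down one with
    `total_left` minutes remaining, pulling the goalie with `pull_at`
    minutes left.  Treats the first goal as deciding the game."""
    even = total_left - pull_at                      # minutes at even strength
    p_even = A / (A + B) * (1 - math.exp(-(A + B) * even))
    p_none = math.exp(-(A + B) * even)               # no goal before the pull
    p_pull = C / (C + D) * (1 - math.exp(-(C + D) * pull_at))
    return p_even + p_none * p_pull

# With these rates, the tying probability peaks near a 3-minute pull:
for t in (1.0, 2.0, 3.0, 4.0, 5.0):
    print(f"pull with {t:.0f} min left: P(tie) = {p_tie(t):.3f}")
```

The tension is visible in the rates: pulling the goalie doubles your scoring rate but gives the opponent an even bigger empty-net rate, so there's an interior optimum -- and under these made-up numbers it sits near three minutes, well beyond any team's observed average.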

Reflections on Teaching: Fall 2013

I last wrote a teaching statement 3 years ago, and the number of things that have changed in the meantime is considerable. I've now taught lecture classes for undergrads, master's students and doctoral candidates, supervised individual projects, served on dissertation committees in several departments and co-authored multiple papers with students. As I think across all those experiences, there are things I've taken to heart and others I've considered and discarded; times I've taken chances and times I've played it safe.

Beyond that, technology has come a long way since then in terms of its immediate applicability in the classroom, and when to take advantage of that has also become a key question. What follows are my experiences in that time and how they've affected my perspectives, with examples from the classes I've taught -- particularly the two courses I recently concluded teaching, in Statistical Graphics and Programming in R.

Elsevier Bought Mendeley; Internet Freaks Out; I'm Barely Surprised

I love it when my nerdiest pastime and professional interest -- bibliometrics and academic paper management -- makes the news in a big way. I like it more when it's direct evidence of all the issues that academia faces as a public good.

Mendeley is a "freemium" service for managing collections of academic papers, offering a cloud-based storage service for personal libraries. Its users have considerable affection for the service, whose management team has proclaimed their dedication to the Open Access movement. In the process, and in contrast, the company has built an impressively large database on user activity, one that was kept to itself rather than being available to its users.

Which is why the backlash to its purchase by Elsevier, a company that takes advantage of our public good for its private enrichment, strikes me as extremely naive. Mendeley's supposed commitment to the open access movement was already betrayed by its Facebook-like business model.

I'm less shocked since this is only the latest in a series of "betrayals" by companies supposedly behind principles of openness:

Combine this with the recent rise of "predatory" journals, and you can see why my worry has less to do with any individual companies and much more about the need to solidify the process of scientific communication as a public good.

Resigned To Change

What follows: I resign from two editorial boards on principle. I don't feel heroic about it, but it had to be done.

Last year, I signed the Elsevier boycott as soon as it was announced. I firmly believed at the time that the principles of the boycott were sound: this was a company that had historically charged obscene prices, and made extreme profits, by selling other people's work with cartel-like levels of market control. I knew how this made sense in the past -- as both a filter and a distribution source, academics had little choice but to work with for-profit publishing companies. But now, the situation borders on the absurd. To make an example out of one of the biggest publishers seemed almost automatic, and I joined the official boycott without hesitation, in addition to years of avoiding Elsevier journals to publish my own work.

All that's needed for the system to work without big publishing companies is an environment of open publication, and so I've enthusiastically submitted my work to society journals and others with principles of openness. One of these was the Berkeley Electronic Press (bepress), which as a non-profit electronic publisher, committed to open access, promised a way forward: with the Internet as the ultimate distribution venue, all that would be needed is an editorial structure, handled as it has been by academics, the vast majority of whom work pro bono.

And so I joined two such efforts; first, the nascent journal Statistics, Politics and Policy, still in its infancy, in 2010; and second, the slightly more venerable Journal of Quantitative Analysis in Sports, which (to my delight, as a long time author and reader) I was asked to join roughly a year ago. Both have sterling editorial boards (aside from me) and I've enjoyed my time and efforts with both groups. But things got complicated in September 2011, when for-profit publisher De Gruyter announced that it was buying many bepress journals, including both SPP and JQAS. Originally it seemed as though little would change; my back-channel inquiries suggested that the new bosses wanted to change very little from the original bepress setup, which is why I was comfortable joining JQAS after the transition.

The Statistical Properties of the Electoral College Are Perfectly Bearable

What follows: I give a not-so-ringing endorsement of the Electoral College, by showing that the current mode has reasonable partisan symmetry. I'd still prefer a scheme with the national popular vote, but what we've got ain't so broke.

Andrew Gelman, Gary King, Jonathan Katz and I published an article on the Electoral College just in time to miss the 2012 US Presidential election (here from SSRN and here from the journal website) but apparently just in time to catch the reactions of people complaining about how the election went. Last week, news broke that a group of Virginia politicians wanted to reapportion their state's electoral votes by congressional district, echoing similar attempts in Pennsylvania in 2012 and California in 2008, making it clear that the issue isn't going away any time soon.

In brief, we quantified how much partisan bias there has been in the Electoral College system as it stands today (essentially none), if certain states reapportioned in this manner (it depends on the state), and if all states did so (it would have been substantially biased towards the Republicans). In extending the analysis for this post, we find that the Electoral College had no meaningful partisan bias in the 2012 election either.

Digital Publishing Isn't Harming Science, It's Liberating It

It's somewhat appropriate that a complaint from a scientific authority on the decay of scientific publishing should be circulated on the Huffington Post, whose legions of unpaid bloggers gain only exposure for their efforts; how closely it parallels the history of scientists, working without pay, as both content producers and vetters, and what it means for the future. Douglas Fields' comment on scientific publishing (thanks, Simply Statistics!) has the facts right, but the conclusions he draws are contradicted by the very nature of the system he's trying to assault.

The key to it all is the nature of peer review:

538's Uncertainty Estimates Are As Good As They Get

(or, in which I finally do an analysis of some 2012 election data)

Many are celebrating the success of the poll aggregators who forecasted the states won by each candidate -- many called all 50 right, including FiveThirtyEight. No doubt Nate Silver will continue to be the world's most famous meta-analyst given this accomplishment -- even though several of his peers, such as the Princeton Election Consortium, Votamatic and Simon Jackman's projections for the Huffington Post, seemed to do equally well. The strength and depth of the number of polls in swing states no doubt had a lot to do with all their successes.

How much of an accomplishment this is, of course, depends on context; the winner in most states was easily predicted ahead of time with the barest minimum of polling. Consider instead a related question: how close were the vote shares in each state to the prediction, as a function of the margin of error?

The simplest way to check this is to calculate a p-value for each prediction: for each prediction and its associated uncertainty, calculate the probability that the observed value (vote share) is greater than a simulated draw from this distribution. The key is that for a large number of independent prediction-uncertainty pairs, we should see a uniform distribution of p-values between 0 and 1.
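That check can be sketched in a few lines. Since the actual 538 and Votamatic numbers aren't reproduced here, the snippet below simulates a well-calibrated forecaster's (prediction, standard error, observed share) triples -- those values are stand-ins, not real data -- and then computes one normal-distribution p-value per state and measures how far the sorted p-values stray from the uniform quantiles.

```python
import random
from statistics import NormalDist

# Simulated stand-ins for the real state-level forecast data: each entry
# is (predicted vote share, standard error, observed vote share).
rng = random.Random(7)
states = []
for _ in range(50):
    pred = rng.uniform(0.35, 0.65)
    obs = rng.gauss(pred, 0.02)       # a well-calibrated forecaster
    states.append((pred, 0.02, obs))

# One p-value per state: P(draw from Normal(pred, se) <= observed).
pvals = sorted(NormalDist(p, s).cdf(o) for p, s, o in states)

# If the stated uncertainties are right, these should look uniform:
# compare the sorted p-values against the uniform quantiles (i + 0.5)/n.
n = len(pvals)
max_gap = max(abs(p - (i + 0.5) / n) for i, p in enumerate(pvals))
print(f"max deviation from uniform quantiles: {max_gap:.3f}")
```

A forecaster with intervals that are too wide would pile p-values up near 0.5; one with intervals that are too tight would push them toward 0 and 1. Either failure shows up as a large deviation from the diagonal in the plots below.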

I grabbed the estimates from FiveThirtyEight and Votamatic (at this time, I have only estimates, not uncertainties, for PEC or HuffPost) and calculated the respective p-values assuming a normal distribution in each case. Media coverage suggested that Nate Silver's intervals were too conservative; if this were the case, we would expect a higher concentration of p-values around 50%. (If too anti-conservative, the p-values would be more extreme, towards 0 or 1.)

On the contrary, the 538 distribution is nearly uniform. The closer the points are to the diagonal, the better the fit to the uniform:

Repeating the process for Votamatic:

The values are pushed towards zero and one, so the confidence intervals are far too tight: the Votamatic predictions turned out to be overly precise.

The data I used are here. (I read the Votamatic intervals directly off the graphs; if I can get a more precise value, I'll repeat the analysis.) I'm very curious to know how the other meta-pollsters did, so if anyone has put together that data, please send it my way.

The Journal System and Statistical Publishing

David Banks has some notions about how to evolve the peer review system, specifically for publishing in statistics. Not surprisingly, I agree with him about most things, namely that the Internet opens up many more creative options for publication outlets.

One of the trickier things to figure out is whether or not article quality would be upheld under a new system. Quoth Banks:

    Article quality can be signaled in multiple ways, either by conventional review or by ungameable rating systems, similar to page-ranking algorithms.

Conventional review has its benefits, but I'm not sure we have a good way of instituting this yet. And no system is ungameable, even PageRank (think "miserable failure"), but as long as there's effort put into it by the community, there's hope.

