<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>A.C. Thomas, Scientist</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/" />
    <link rel="self" type="application/atom+xml" href="http://www.acthomas.ca/comment/atom.xml" />
    <id>tag:www.acthomas.ca,2009-10-17:/comment//1</id>
    <updated>2013-04-10T22:41:08Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.32-en</generator>

<entry>
    <title>Elsevier Bought Mendeley; Internet Freaks Out; I&apos;m Barely Surprised</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2013/04/elsevier-bought-mendeley-internet-freaks-out-im-barely-surprised.html" />
    <id>tag:www.acthomas.ca,2013:/comment//1.57</id>

    <published>2013-04-09T23:07:42Z</published>
    <updated>2013-04-10T22:41:08Z</updated>

    <summary>I love it when my nerdiest pastime and professional interest -- bibliometrics and academic paper management -- makes the news in a big way. I like it more when it&apos;s direct evidence of all the issues that academia faces as...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[I love it when my nerdiest pastime and professional interest -- bibliometrics and academic paper management -- makes the news in a big way. I like it more when it's direct evidence of all the issues that academia faces as a public good. <br /><br /><a href="http://www.mendeley.com/">Mendeley</a> is a "freemium" service for managing collections of academic papers, offering a cloud-based storage service for personal libraries. Its users have considerable affection for the service, whose management team has <a href="http://blog.mendeley.com/tag/open-access/">proclaimed their dedication to the Open Access</a> movement. In the process, and in contrast, the company has built an impressively large database on user activity, one that it kept to itself rather than making available to its users. <br /><br />Which is why <a href="http://blog.mendeley.com/start-up-life/team-mendeley-is-joining-elsevier/">the backlash to its purchase by Elsevier</a>, a company that takes advantage of our public good for its private enrichment, strikes me as extremely naive. 
Mendeley's supposed commitment to an open access movement was already betrayed by their Facebook-like business model.<br /><br />I'm less shocked since this is only the latest in a series of "betrayals" by companies supposedly behind principles of openness:<br /><br /><ul><li><a href="http://news.papersapp.com/2012/11/papers-springing-into-the-future/">Papers, a key Mendeley competitor, was bought by Springer, a key Elsevier rival.</a></li><li><a href="http://blog.uta.edu/~bradley/2011/09/19/bepress-sells-out/">Berkeley Electronic Press, a non-profit publisher, sold their journal stable to Elsevier wannabe De Gruyter.</a></li></ul><p>Combine this with the recent rise of <a href="http://www.nytimes.com/2013/04/08/health/for-scientists-an-exploding-world-of-pseudo-academia.html">"predatory" journals</a>, and you can see why my worry has less to do with any individual companies and much more about the need to solidify the process of scientific communication as a public good. <br /></p>]]>
        
    </content>
</entry>

<entry>
    <title>Resigned To Change</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2013/02/resigned-to-change.html" />
    <id>tag:www.acthomas.ca,2013:/comment//1.56</id>

    <published>2013-02-04T19:41:42Z</published>
    <updated>2013-02-10T19:43:07Z</updated>

    <summary>What follows: I resign from two editorial boards on principle. I don&apos;t feel heroic about it, but it had to be done. Last year, I signed the Elsevier boycott as soon as it was announced. I firmly believed at the...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[<i>What follows: I resign from two editorial boards on principle. I don't feel heroic about it, but it had to be done.</i> <br /><br />Last year, I signed <a href="http://thecostofknowledge.com/">the Elsevier boycott</a> as soon as it was announced. I firmly believed at the time that the principles of the boycott were sound: this was a company that had historically charged obscene prices, and made extreme profits, by selling other people's work with cartel-like levels of market control. I knew how this made sense in the past -- as both a filter and a distribution source, academics had little choice but to work with for-profit publishing companies. But now, the situation borders on the absurd. To make an example out of one of the biggest publishers seemed almost automatic, and I joined the official boycott without hesitation, in addition to years of avoiding Elsevier journals to publish my own work.<br /><br />All that's needed for the system to work without big publishing companies is an environment of open publication, and so I've enthusiastically submitted my work to society journals and others with principles of openness. One of these was the Berkeley Electronic Press (bepress), which as a non-profit electronic publisher, committed to open access, promised a way forward: with the Internet as the ultimate distribution venue, all that would be needed is an editorial structure, handled as it has been by academics, the vast majority of whom work <i>pro bono</i>.<br /><br />And so I joined two such efforts; first, the nascent journal Statistics, Politics and Policy, still in its infancy, in 2010; and second, the slightly more venerable Journal of Quantitative Analysis in Sports, which (to my delight, as a long time author and reader) I was asked to join roughly a year ago. Both have sterling editorial boards (aside from me) and I've enjoyed my time and efforts with both groups. 
But things got complicated in September 2011, when for-profit publisher De Gruyter announced that it was buying many bepress journals, including both SPP and JQAS. Originally it seemed as though little would change; my back-channel inquiries suggested that the new bosses wanted to change very little from the original bepress setup, which is why I was comfortable joining JQAS after the transition.<br />]]>
        <![CDATA[Now that I've observed the whole process for the past year, I've come to believe that de Gruyter's ownership of these journals does far more harm than good to the academic world. The entire pre-acceptance editorial operation of these journals is done by academic volunteers, and distribution no longer requires hefty postage -- even the most basic online storage is inexpensive. I have one article that went through the publication process at SPP, and to my eye, the folks who typeset it made it worse than my original LaTeX PDF proof.<br /><br />The sticking point for me came in July when we discussed the company's policy on preprints at the editorial board meetings. If the main contribution of a journal today is to improve the appearance and readability of an article, then it makes sense that a journal should allow preprints of work to be held by sites like arxiv.org, since that increases the incentive to get a fresh, improved copy from a journal website. (In fact, I rather like the idea that copyright in the content should be separate from copyright of the presentation -- it makes it clear who did what work.) <br /><br />I was surprised to learn the de Gruyter policy on this: that they would prefer no traces of the article as preprint to be in the wild, and that they would permit a personal copy of the final proof to be on an individual's website -- but not on any public archive sites. In today's academic climate, restricting the preprint market like this is detrimental to new science. Individual websites are rough and poorly indexed; public archive sites, claiming only the barest of rights, can handle much of this burden at a low cost. 
The rise in importance of arxiv.org as a home for department tech reports (including those of <a href="http://www.stat.cmu.edu/">my current employer</a>) accentuates this.<br /><br />Helping de Gruyter make money off my volunteer work might have advantages for my career, but it's bad for science to keep this kind of power in the hands of for-profit companies when the alternatives are so compelling, and when so much of the funding for our work comes from public sources. I know that an open alternative has to be a lot cheaper than the current system; we just need to figure out how to make it work on a grander scale.<br /><br />In the end, as an early-career scientist, there's little I can do to change the course of the journals from within. I'd rather focus my efforts on matters that can push academic publishing towards more open publication -- keeping more rights for authors and the people who pay for the research. And so I decided to resign from the boards of both SPP and JQAS as of the end of last year, not because of any of the people involved -- I have warm feelings for both editorial boards, and I personally like the de Gruyter reps we've worked with -- but because it's counterproductive for me to continue in that capacity.<br /><br /><b>Update:</b> To my astonishment, I received notice this week that the senior editors of SPP are all resigning from the journal and are trying to found a new, similar journal under the auspices of the American Statistical Association. I'm fairly certain that my resignation didn't push anyone on this, but it was very comforting to know that they had misgivings about the arrangement similar to mine.<br />]]>
    </content>
</entry>

<entry>
    <title>The Statistical Properties of the Electoral College Are Perfectly Bearable</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2013/01/the-statistical-properties-of-the-electoral-college-are-perfectly-bearable.html" />
    <id>tag:www.acthomas.ca,2013:/comment//1.55</id>

    <published>2013-01-28T17:17:08Z</published>
    <updated>2013-01-28T23:12:35Z</updated>

    <summary>What follows: I give a not-so-ringing endorsement of the Electoral College, by showing that the current mode has reasonable partisan symmetry. I&apos;d still prefer a scheme with the national popular vote, but what we&apos;ve got ain&apos;t so broke. Andrew Gelman, Gary...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[<i>What follows: I give a not-so-ringing endorsement of the Electoral College, by showing that the current mode has reasonable partisan symmetry. I'd still prefer a scheme with the national popular vote, but what we've got ain't so broke.</i><br /><br /><a href="http://www.andrewgelman.com/">Andrew Gelman</a>, <a href="http://gking.harvard.edu/">Gary King</a>, <a href="http://jkatz.caltech.edu/">Jonathan Katz</a> and I published an article on the Electoral College just in time to miss the 2012 US Presidential election (<a href="http://www.washingtonpost.com/politics/2013/01/24/430096e6-6654-11e2-85f5-a8a9228e55e7_story.html">here from SSRN</a> and <a href="http://www.degruyter.com/view/j/spp.2013.3.issue-3/issue-files/spp.2013.3.issue-3.xml">here from the journal website</a>) but apparently just in time to catch the reactions of people complaining about how the election went. Last week, news broke that a group of Virginia politicians wanted to reapportion their state's <a href="http://www.washingtonpost.com/politics/2013/01/24/430096e6-6654-11e2-85f5-a8a9228e55e7_story.html">electoral votes by congressional district</a>, echoing similar attempts in Pennsylvania in 2012 and California in 2008, making it clear that the issue isn't going away any time soon.<br /><br />In brief, we quantified how much partisan bias there has been in the electoral college system as it stands today (essentially none), how much there would be if certain states reapportioned in this manner (it depends on the state), and if <i>all states</i> did so (it would have been substantially biased towards the Republicans). In extending the analysis for this post, we find that the Electoral College had no meaningful partisan bias in the 2012 election either.<br />]]>
        <![CDATA[These recent reapportionment efforts have been undertaken by Republican operatives in states 
where the Democratic candidate has frequently won the state, effectively
 unbalancing the whole system under the guise of balancing a single 
state. Our question wasn't so much whether this is a 
blatant but short-sighted partisan power grab (<a href="http://www.washingtonmonthly.com/ten-miles-square/2013/01/rigging_the_electoral_college042666.php">of course it is</a>)
 but what its effect would be on the entire system if more states did 
this. More to the point, we wanted to check the state of the 
system as it was at each election, according to the simple question of partisan 
symmetry:<br /><br /><blockquote>If one party in a two-party system receives X% of the vote and Y% of the seats, then in the hypothetical situation that the other candidate receives X% of the vote, they should also receive Y% of the seats.</blockquote>Replace
 "party" with "presidential candidate" and "seats" with "electoral 
votes" and you get to the heart of it; see the paper for details on how we estimate partisan bias. The simplest application: 
if the overall popular vote is tied, the two candidates should expect to receive 
an equal number of electoral votes. Is this condition present now? Would
 it be if California or other states split their votes by district? What
 if <i>every state</i> did it that way?<br /><br />The paper contains an analysis for each election between 1956 and 2008; for this post, I re-ran the analysis adding <a href="http://www.dailykos.com/story/2012/11/19/1163009/-Daily-Kos-Elections-presidential-results-by-congressional-district-for-the-2012-2008-elections">preliminary data from 2012</a>
 (with a little imputation for as-yet unreported districts) and 
calculated the effective partisan bias for the election. Zero indicates a
 symmetric system; a bias of 1 (or -1) would indicate that if the vote 
were split evenly, the Democratic (or Republican) candidate would win 
all of the electoral votes available. <br /><br />As the system stands right now, things are basically fair, and have been from 1980 onwards -- closest to zero is fairest:<br /><br /><a href="http://www.acthomas.ca/comment/assets_c/2013/01/1-pbplot-11.html" onclick="window.open('http://www.acthomas.ca/comment/assets_c/2013/01/1-pbplot-11.html','popup','width=1000,height=400,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://www.acthomas.ca/comment/assets_c/2013/01/1-pbplot-thumb-550x220-11.png" alt="1-pbplot.png" class="mt-image-none" style="" height="220" width="550" /></a><br /><br /><div>If California's electoral votes had been split by congressional district, there would have been some interesting consequences -- not nearly enough of a bump in 1980 to re-elect Jimmy Carter, but a consistent Republican edge ever since.<br /><br /><a href="http://www.acthomas.ca/comment/assets_c/2013/01/2-pbplot-14.html" onclick="window.open('http://www.acthomas.ca/comment/assets_c/2013/01/2-pbplot-14.html','popup','width=1000,height=400,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://www.acthomas.ca/comment/assets_c/2013/01/2-pbplot-thumb-550x220-14.png" alt="2-pbplot.png" class="mt-image-none" style="" height="220" width="550" /></a><br /><br />Suppose we counterweighed this by changing historically Republican Texas to Congressional district apportionment. 
It would have helped, but not nearly enough to balance the scale:<br /><br /><a href="http://www.acthomas.ca/comment/assets_c/2013/01/3-pbplot-18.html" onclick="window.open('http://www.acthomas.ca/comment/assets_c/2013/01/3-pbplot-18.html','popup','width=1000,height=400,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://www.acthomas.ca/comment/assets_c/2013/01/3-pbplot-thumb-550x220-18.png" alt="3-pbplot.png" class="mt-image-none" style="" height="220" width="550" /></a><br /></div><div>Now, suppose every state split their electoral votes by Congressional district. The edge is consistently Republican, even today:<br /><br /><a href="http://www.acthomas.ca/comment/assets_c/2013/01/4-pbplot-21.html" onclick="window.open('http://www.acthomas.ca/comment/assets_c/2013/01/4-pbplot-21.html','popup','width=1000,height=400,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://www.acthomas.ca/comment/assets_c/2013/01/4-pbplot-thumb-550x220-21.png" alt="4-pbplot.png" class="mt-image-none" style="" height="220" width="550" /></a><br />In the end, even if other states counter-balanced each other to try and even things out, it would probably make things worse. As things stand, the status quo of the Electoral College is adequate without any kind of large-scale modification, so far as we can predict.<br /></div>]]>
    </content>
</entry>

<entry>
    <title>Digital Publishing Isn&apos;t Harming Science, It&apos;s Liberating It</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2012/11/digital-publishing-isnt-harming-science.html" />
    <id>tag:www.acthomas.ca,2012:/comment//1.54</id>

    <published>2012-11-27T04:20:48Z</published>
    <updated>2012-11-27T08:25:30Z</updated>

    <summary>It&apos;s somewhat appropriate that a complaint from a scientific authority on the decay of scientific publishing should be circulated on the Huffington Post, whose legions of unpaid bloggers gain only exposure for their efforts; how closely it parallels the history...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[It's somewhat appropriate that a complaint from a scientific authority on the decay of scientific publishing should be circulated on the Huffington Post, whose legions of unpaid bloggers gain only exposure for their efforts; how closely it parallels the history of scientists, working without pay, as both content producers and vetters, and what it means for the future. <a href="http://www.huffingtonpost.com/dr-douglas-fields/50-shades-of-grey-in-scientific-publication-how-digital-publishing-is-harming-science_b_2155760.html">Douglas Fields' comment on scientific publishing</a> (thanks, <a href="http://simplystatistics.org/2012/11/25/sunday-datastatistics-link-roundup-11252012">Simply Statistics</a>!) has the facts right, but the conclusions he draws are contradicted by the very nature of the system he's trying to assault.<br /><br />The key to it all is the nature of peer review:<br />]]>
        <![CDATA[<br /><blockquote><p>A scientific discovery is useless if it is not communicated with 
authority to the scientific community.  For centuries scientists 
submitted their research findings for publication in scientific journals
 that were run by the leading scientists with expertise in a specialized
 field who served as journal editors.  The editors evaluated the 
submission, and if the findings appeared to be important and technically
 sound, they sought out other scientists around the world with 
recognized expertise in the area to read the manuscript critically and 
advise the editor and authors (anonymously) on its suitability for 
publication.  </p>

This process is essential to root out poor science and pseudoscience,
 and to prevent bogging down the advancement of science by cluttering 
the literature with contradictory and erroneous findings.  The expert 
peer reviewers evaluated the potential strengths, weaknesses, technical 
flaws, significance and novelty of the finding, and they suggested the 
need for further experiments.  If the study failed to be accepted for 
publication by the editor, the authors benefited from the editorial 
review process, and they revised their work for submission to another 
journal.<br /></blockquote>I'm with you, sir! This is the beauty of the 
peer review system, and the source of it isn't the paper it's printed 
on; it's the stamp of approval of the editorial board that matters. A 
quality board is a collection of distinguished members with noteworthy 
professional experience, combined with their past record of approving 
meaningful publications.<br /><br /><blockquote>Recent government-mandated changes in scientific publishing are 
undermining this critical process of validation in scientific 
publication.<br /></blockquote>And now he's lost me. Scientific validation
 is carried out by the editorial board and its referees -- the vast 
majority of whom are unpaid volunteers -- and abetted by a publisher, 
not controlled by one. The sharp drop in publishing costs from online 
publishing will only put more control in the hands of the academics who decide what's truly important.<br /><br />The
 first change of which he speaks -- a mandate that all papers should be 
openly accessible for all readers, if their research was funded by 
federal grants -- affects publishers, not editorial boards. While Fields
 defends the necessity of the publisher as the producer, editor and 
disseminator of research, he seems to underplay his own role as the <a href="http://journals.cambridge.org/action/displayJournal?jid=ngb">editor-in-chief of a journal</a>,
 one with the responsibility of seeking out the editorial board, 
ensuring the quality of the process, and so forth. It's true he doesn't 
copy-edit or type-set, but these tasks are getting cheaper all the time,
 and arguably, current publishers don't do that great a job of it.<br /><br />Fields
 is also conflating the two major models of Open Access publishing: 
"Green OA", in which authors archive their preprints on 
public sites (at little cost), is the PubMed approach, and doesn't take away whatever value copy-editing, type-setting and large-scale printing add. "Gold OA", in which authors pay the 
publishers for the dissemination of their work, is the model pursued both by top-quality outfits (including CUP) and spamming bottom-feeders. That's why his
 second point -- that electronic publishing decreases the cost barriers 
to entry -- is on the mark. But I'm baffled by what follows in his personal testimonial: <br /><blockquote><br /> <em>Neuron Glia Biology</em> was a scientific journal that was launched
 in 2004 by me and like-minded scientists to advance scientific research
 on neuron-glia interactions, and it was published by Cambridge 
University Press until this year.    <em>Neuron Glia Biology</em> 
provided the opportunity for 1,400 authors to introduce their new 
research on neuron-glia interactions into the scientific literature, and
 it helped advance a new field of science, but no longer.<br /></blockquote>Again,
 I say: this commentary was published on the Huffington Post. For free. Whether or not it was more visible because of this service, the real stamp of 
approval comes not from being on this website, but from 
your peers in the community who judge your work. And those 1,400 authors 
will not stop writing, the editorial board of <i>Neuron Glia Biology</i>
 will still believe in their mission, and if it comes to it, finding an 
online-only home for a format won't change that -- I know it's easier in the mathematical sciences, but <a href="https://peerj.com/">biology isn't far behind.</a> The success of the enterprise comes down to the acceptance of the community first.<br /><br />Vanity
 journals might be going for a money grab, but so are Elsevier and 
Nature, both of which are hideously profitable thanks to their 
monopolistic tactics and reliance on free labour -- not to mention that 
CUP <a href="http://www.businessweekly.co.uk/printing-and-publishing/14815-cup-notches-10th-successive-year-of-growth">seems to be doing all right for itself</a>.
 The pressure from the community is exactly why I doubt that most 
 scholars will fall for bad articles in true vanity journals -- and thanks to the very peer forces that propel academia, if they do get any attention, the end result will be a humiliated, slightly poorer academic, not the end of the discipline as we know it.<br /><br />I sympathize with Dr. Fields' anxieties about the state of academic publication today, but I'm far more excited by the promise of technology to keep things fresh than I am about a corporate/government takeover of science. We just have to remember that we're still the ones in charge.<br />]]>
    </content>
</entry>

<entry>
    <title>538&apos;s Uncertainty Estimates Are As Good As They Get</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2012/11/538s-uncertainty-estimates-are-as-good-as-they-get.html" />
    <id>tag:www.acthomas.ca,2012:/comment//1.53</id>

    <published>2012-11-07T13:01:35Z</published>
    <updated>2012-11-07T13:57:50Z</updated>

    <summary>(or, in which I finally do an analysis of some 2012 election data) Many are celebrating the success of the poll aggregators who forecasted the states won by each candidate -- many called all 50 right, including FiveThirtyEight. No doubt Nate...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[(or, in which I finally do an analysis of some 2012 election data)<br /><br />Many are celebrating the success of the poll aggregators who forecasted the states won by each candidate -- many called all 50 right, including <a href="http://fivethirtyeight.blogs.nytimes.com/">FiveThirtyEight</a>. No doubt Nate Silver will continue to be the world's most famous meta-analyst given this accomplishment -- even though several of his peers, such as the <a href="http://election.princeton.edu/">Princeton Election Consortium</a>, <a href="http://votamatic.org/">Votamatic</a> and Simon Jackman's projections for the <a href="http://www.huffingtonpost.com/simon-jackman">Huffington Post</a>, seemed to do equally well. The strength and depth of the number of polls in swing states no doubt had a lot to do with all their successes.<br /><br />How much of an accomplishment this is, of course, depends on context; the winner in most states was easily predicted ahead of time with the barest minimum of polling. Consider instead a related question: how close were the vote shares in each state to the prediction, as a function of the margin of error?<br /><br />The simplest way to check this is to calculate a p-value for each prediction: for each prediction and its associated uncertainty, calculate the probability that the observed value (vote share) is greater than a simulated draw from this distribution. The key is that for a large number of independent prediction-uncertainty pairs, we should see a uniform distribution of p-values between 0 and 1.<br /><br />I grabbed the estimates from FiveThirtyEight and Votamatic (at this time, I have only estimates, not uncertainties, for PEC or HuffPost) and calculated the respective p-values assuming a normal distribution in each case. Media coverage suggested that Nate Silver's intervals were too conservative; if this were the case, we would expect a higher concentration of p-values around 50%. 
(If too anti-conservative, the p-values would be more extreme, towards 0 or 1.)<br /><br />On the contrary, the 538 distribution is nearly uniform. The closer the points are to the diagonal, the better the fit to the uniform:<br /><br /><img alt="538-uniform-plots.png" src="http://www.acthomas.ca/comment/538-uniform-plots.png" class="mt-image-none" style="" height="400" width="400" /><br />Repeating the process for Votamatic:<br /><br /><img alt="votamatic-uniform-plots.png" src="http://www.acthomas.ca/comment/votamatic-uniform-plots.png" class="mt-image-none" style="" height="400" width="400" /><br /><div>The values are pushed towards zero and one, so the confidence intervals are far too tight: the Votamatic predictions turned out to be overly precise.<br /><br /><a href="http://www.acthomas.ca/comment/results-with-error.csv">The data I used are here.</a> (I read the Votamatic intervals directly off the graphs; if I can get more precise values, I'll repeat the analysis.) I'm very curious to know how the other meta-pollsters did, so if anyone has put together that data, please send it my way.<br /><br /></div>]]>
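        <![CDATA[<p>For anyone who wants to try the check described above, here is a minimal sketch in Python. It is not the code behind the plots: the function names and the example numbers are made up for illustration, and it assumes each forecast is a normal distribution summarized by a predicted vote share and a standard error. Under that assumption, the probability that the observed value exceeds a simulated draw is just the normal CDF evaluated at the observed value.</p>
<pre>
```python
from statistics import NormalDist


def forecast_p_values(forecasts):
    # Each forecast is (predicted_share, standard_error, observed_share).
    # The p-value is the normal CDF at the observed value: the chance that
    # a draw from the forecast distribution falls below what happened.
    return [NormalDist(mu, se).cdf(obs) for mu, se, obs in forecasts]


def uniform_qq(p_values):
    # Pair each sorted p-value with the uniform quantile it should sit near.
    # Points close to the diagonal suggest well-calibrated uncertainties.
    n = len(p_values)
    return [((i + 0.5) / n, p) for i, p in enumerate(sorted(p_values))]


def max_diagonal_gap(p_values):
    # Largest vertical distance between the points and the diagonal.
    return max(abs(q - p) for q, p in uniform_qq(p_values))


# Hypothetical numbers for illustration only (not the linked data set).
example = [(52.0, 2.0, 52.0), (48.0, 2.0, 50.0), (55.0, 1.5, 53.5)]
p = forecast_p_values(example)
print([round(v, 3) for v in p])
print(round(max_diagonal_gap(p), 3))
```
</pre>
<p>Intervals that are too wide would pile the p-values up near one half, and intervals that are too tight would push them towards zero and one; either way the points drift off the diagonal and the gap grows. With only fifty or so states the comparison is rough, so I would treat the gap as a descriptive summary rather than a formal test.</p>]]>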
        
    </content>
</entry>

<entry>
    <title>The Journal System and Statistical Publishing</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2012/10/the-journal-system-and-statistical-publishing.html" />
    <id>tag:www.acthomas.ca,2012:/comment//1.52</id>

    <published>2012-10-11T05:35:55Z</published>
    <updated>2012-10-11T14:23:47Z</updated>

    <summary>David Banks has some notions about how to evolve the peer review system, specifically for publishing in statistics. Not surprisingly, I agree with him about most things, namely the rise of the Internet as giving rise to many more creative...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[David Banks <a href="http://magazine.amstat.org/blog/2012/10/01/175-oct12/">has some notions about how to evolve the peer review system</a>, specifically for publishing in statistics. Not surprisingly, I agree with him about most things, namely that the rise of the Internet has opened up many more creative outlets for our work.<br /><br />One of the trickier things to figure out is whether or not article quality would be upheld under a new system. Quoth Banks:<br /><br /><blockquote>Article quality can be signaled in multiple ways, either by conventional
 review or by ungameable rating systems, similar to page-ranking 
algorithms.<br /></blockquote>Conventional review has its benefits, but I'm not sure we have a good way of instituting this yet. And no system is ungameable, even PageRank (think "<a href="http://en.wikipedia.org/wiki/Google_bomb">miserable failure</a>"), but as long as there's effort put into it by the community, there's hope.<br /><br /><br /><br />]]>
        
    </content>
</entry>

<entry>
    <title>Scrabble Cheating</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2012/08/scrabble-cheating.html" />
    <id>tag:www.acthomas.ca,2012:/comment//1.51</id>

    <published>2012-08-16T05:53:51Z</published>
    <updated>2012-08-16T06:05:43Z</updated>

    <summary>News of a cheating scandal in Scrabble has rippled through the community, after a competitor (proverbially) hid the blanks up his sleeve during matches, leading to his subsequent disqualification. As he is a minor, his name is not being shared,...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="scrabble" label="Scrabble" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[News of a <a href="http://www.nytimes.com/2012/08/16/sports/scrabble-community-rocked-by-cheating-at-tournament.html">cheating scandal in Scrabble</a> has rippled through the community, after a competitor (proverbially) hid the blanks up his sleeve during matches, leading to his subsequent disqualification. As he is a minor, his name is not being shared, thereby preventing us from asking why, if he was going to cheat, he couldn't have done a better job of it.<br /><br />Let me take the opportunity to remind tournament organizers everywhere that <a href="http://www.acthomas.ca/comment/2011/07/statistics-and-scrabble-together-at-last.html">the latent tile order design mechanism</a> could have prevented this travesty from happening. And all they would have had to do was spend tens of thousands of dollars to design and build the physical apparatus to make it happen, and tens of thousands more to outfit the entire tournament with them. But in the long run, shouldn't we do everything we can for the children?<br /><br />]]>
        
    </content>
</entry>

<entry>
    <title>Our New Hockey Modelling Paper: How Much Better Are Some Players Than Others?</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2012/08/our-new-hockey-modelling-paper-how-much-better-are-some-players-than-others.html" />
    <id>tag:www.acthomas.ca,2012:/comment//1.50</id>

    <published>2012-08-09T04:40:14Z</published>
    <updated>2012-08-09T06:45:11Z</updated>

    <summary>Following Sam&apos;s presentation at this year&apos;s JSM, we are proud to release our preprint for consumption: A.C. Thomas, Samuel L. Ventura, Shane Jensen, Stephen Ma, &quot;Competing Process Hazard Function Models for Player Ratings in Ice Hockey&quot;, available from arXiv. Abstract: Evaluating the...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="appliedstatistics" label="applied statistics" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="hockey" label="hockey" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="nhl" label="NHL" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="playerratings" label="player ratings" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="stochasticprocess" label="stochastic process" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[Following <a href="http://www.stat.cmu.edu/%7Esventura/">Sam's</a> presentation at this year's JSM, we are proud to release our preprint for consumption:<br /><br />A.C. Thomas, Samuel L. Ventura, Shane Jensen, Stephen Ma, "Competing Process Hazard Function Models for Player Ratings in Ice Hockey", <a href="http://arxiv.org/abs/1208.0799">available from arXiv</a>.<br /><blockquote><i>Abstract:</i> Evaluating the overall ability of players in the National Hockey League (NHL)
is a difficult task. Existing methods such as the famous "plus/minus" statistic
have many shortcomings. Standard linear regression methods work well when
player substitutions are relatively uncommon and scoring events are relatively
common, such as in basketball, but as neither of these conditions exists for
hockey, we use an approach that embraces these characteristics. We model the
scoring rate for each team as its own semi-Markov process, with hazard
functions for each process that depend on the players on the ice. This method
yields offensive and defensive player ability ratings which take into account
quality of teammates and opponents, the game situation, and other desired
factors, that themselves have a meaningful interpretation in terms of game
outcomes. Additionally, since the number of parameters in this model can be
quite large, we make use of two different shrinkage methods depending on the
question of interest: full Bayesian hierarchical models that partially pool
parameters according to player position, and penalized maximum likelihood
estimation to select a smaller number of parameters that stand out as being
substantially different from average. We demonstrate this on games through five
NHL seasons.
</blockquote>Our ultimate goal for this project was to first come up with a mathematically rigorous method for determining how players affected the outcomes of hockey games. As we are stochastic modellers by training, this to us meant finding a generative probability model for how these games may come to be. (I am a <a href="http://www.acthomas.ca/papers/act-jqas-2-1.pdf">fan of this approach</a> in hockey <a href="http://www.acthomas.ca/papers/act-jqas-3-3.pdf">for several reasons</a>.) We took the <a href="http://www.82games.com/comm30.htm">Rosenbaum</a>/<a href="http://www.math.usma.edu/people/Macdonald/hockey.php">Macdonald</a> approach of dividing the game into shifts, so that no players substitute for each other during each observational unit. We then took the outcome of each event to be whether or not one team scored a goal (or changed off some of their players) and automatically factored in how much time had elapsed. We also adjusted for the fact that some players play much of their time together, and that some players play very little.<br /><br />There are a lot of things we can actually put in this model beyond player identifiers, like teams or pairs of players together -- so long as we're willing to wait for the solution to compute, which, for all players over 5 seasons, can be on the order of a day using our current code. We discovered a few things that are of interest to hockey fans as well as statisticians and probabilists, but two jump out especially to me:<br /><br /><ul><li>Defencemen as a group are far more interchangeable than goalies or forwards are, at even strength. This is likely because they share most of their prime duty -- defence -- with the goaltenders, who show much more variety in ability, whereas most of the burden of scoring belongs to the forwards. 
<br /></li><li>There are a few player pairs that are just plain awful together (compared to when they play apart), such as when Sidney Crosby and Evgeni Malkin played on the same line. The additional deficit to team defence was so big compared to any extra gain in offensive ability that it would be far more worthwhile to play them separately.<br /></li></ul>Further results -- and plenty of tables! -- are available in the paper.<br /><br /><br /> 
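For readers who want a feel for the "competing process" idea, here is a minimal sketch, not the code or the exact model from the paper. It assumes constant hazards within a shift and a log-linear form for each team's scoring rate; the function names, the baseline and every player value below are made up for illustration.<br />

```python
import math
import random

def scoring_hazard(baseline, offence, defence):
    """Goals per minute for one team (assumed log-linear form, illustration only).

    offence: ratings of the team's own skaters on the ice.
    defence: ratings of the opposing skaters on the ice.
    """
    return math.exp(baseline + sum(offence) - sum(defence))

def simulate_shift(rates, rng):
    """Draw the first event among competing constant-hazard processes.

    rates maps an event name to its hazard (events per minute).
    Returns (event_name, elapsed_minutes).
    """
    total = sum(rates.values())
    elapsed = rng.expovariate(total)   # waiting time until any event happens
    pick = rng.random() * total        # choose the event in proportion to its hazard
    running = 0.0
    for name, rate in rates.items():
        running += rate
        if running >= pick:
            return name, elapsed
    return name, elapsed               # guard against floating-point round-off

# Hypothetical numbers: two scoring processes competing with a line change.
rng = random.Random(2012)
rates = {
    "home_goal": scoring_hazard(-3.0, offence=[0.10, 0.05], defence=[0.02]),
    "away_goal": scoring_hazard(-3.0, offence=[0.00, 0.03], defence=[0.08]),
    "line_change": 1.2,
}
events = [simulate_shift(rates, rng)[0] for _ in range(10000)]
line_change_share = events.count("line_change") / len(events)
```

Most simulated shifts end in a line change rather than a goal, which is the feature of hockey that makes ordinary regression on shift outcomes awkward.<br />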

]]>
        
    </content>
</entry>

<entry>
    <title>Well, This Is Embarrassing...</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2012/07/well-this-is-embarrassing.html" />
    <id>tag:www.acthomas.ca,2012:/comment//1.49</id>

    <published>2012-07-11T02:04:21Z</published>
    <updated>2012-07-11T02:10:05Z</updated>

    <summary>I posted a preprint earlier today for a comment on a discussion paper on social network outcomes, but found out shortly afterward that the comments for this journal are to be embargoed until publication. So I&apos;ve complied, removed the paper...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[I posted a preprint earlier today for a comment on a discussion paper on social network outcomes, but found out shortly afterward that the comments for this journal are to be embargoed until publication. So I've complied, removed the paper from the preprint site and taken my blog post down for now. Once the embargo is lifted, I will re-post both along with other reactions. <br /><br />My sincerest apologies to those who linked to the post to now find it dead -- particularly those people I sent it to for their reaction.<br />]]>
        
    </content>
</entry>

<entry>
    <title>Prediction, The Big Discovery and Heartbreak</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2012/04/prediction-the-big-discovery-and-heartbreak.html" />
    <id>tag:www.acthomas.ca,2012:/comment//1.46</id>

    <published>2012-04-08T19:23:19Z</published>
    <updated>2012-04-09T02:56:09Z</updated>

    <summary>It&apos;s a year old, but I only just heard the story of what happened to a baseball researcher I first read about 10 years ago: http://espn.go.com/blog/sweetspot/post/_/id/6835/voros-mccracken-changed-the-game http://www.thepostgame.com/features/201101/sabermetrician-exile &quot;Voros&quot; McCracken had a particular insight about pitcher ability in the late 1990s, namely that a...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[It's a year old, but I only just heard the story of what happened to a baseball researcher I first read about 10 years ago:<br /><br /><a href="http://espn.go.com/blog/sweetspot/post/_/id/6835/voros-mccracken-changed-the-game">http://espn.go.com/blog/sweetspot/post/_/id/6835/voros-mccracken-changed-the-game</a><br /><a href="http://www.thepostgame.com/features/201101/sabermetrician-exile">http://www.thepostgame.com/features/201101/sabermetrician-exile</a><br /><br />"Voros" McCracken had a particular insight about pitcher ability in the late 1990s, namely that a subset of the data -- <a href="http://en.wikipedia.org/wiki/Defense_independent_pitching_statistics">"defense-independent" pitching statistics</a> -- was not only an excellent predictor of the runs allowed by a pitcher, but was also highly persistent from year to year. Others discovered the same principle around the same time, but it was <a href="http://www.baseballprospectus.com/article.php?articleid=878">the publication of this work</a> that got the attention: the strong claim that the differences between pitchers, on batted balls in play, were so small as to be ignored. While it wasn't quite correct -- there is some predictive power in the remaining information -- it was enough to change people's ideas about how the game works.<br /><br />And now, McCracken is out of baseball, applying analytical methods to undisclosed professional soccer clubs, having made a meager living while working for the Boston Red Sox in the early 2000s. As he was one of the figures I read about before going to grad school in statistics, this news definitely got my attention.<br /><br />There are two things I take away from this whole story:<br /><br />1) After 12 years of exposure to real data and methods, I feel like finding the best set of predictors of success should have been *screamingly obvious*, and an assignment I could give to an undergrad with existing databases. 
But would it be obvious enough to construct a narrative around it, and convince the public? I can think of a good explanation after the fact, that there's a lot of general uncertainty once the bat hits the ball, like the angle of attack, but I've had years to think about it.<br /><br /> <div>2) Any research I do on sports is for the sake of teaching, or just as a hobby, and the value it brings me to share something with the world is a nice bonus. I'm not under pressure to find a brilliant discovery to keep my job -- at least, not when it comes to sports. <br /><br /><a href="http://www.andrewgelman.com/">Gelman</a> has said that the most noteworthy discoveries these days aren't the small effects that come out with more data, but the big ones that everyone else just missed. This definitely qualified as one of those. Whatever the next big discovery, I'm sure we'll all think it was obvious years later, even if it wasn't.<br /></div>]]>
        
    </content>
</entry>

<entry>
    <title>Statistics and Scrabble, Together At Last</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2011/07/statistics-and-scrabble-together-at-last.html" />
    <id>tag:www.acthomas.ca,2011:/comment//1.45</id>

    <published>2011-07-14T14:48:43Z</published>
    <updated>2011-07-14T15:15:35Z</updated>

    <summary>Sitting on my to-do list for a while now has been an exploration of Scrabble from an experimental design point of view; how to better design a tournament to make the variance as small as possible while still preserving the...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="experimentaldesign" label="experimental design" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="scrabble" label="scrabble" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[Sitting on my to-do list for a while now has been an exploration of Scrabble from an experimental design point of view: how to better design a tournament to make the variance as small as possible while still preserving the appearance of the home game to its players. One goal was to figure out a way to carry out a true "duplicate" version of Scrabble so that multiple pairs could have access to the same tiles, rather than the <a href="http://en.wikipedia.org/wiki/Duplicate_Scrabble">currently popular version</a> in Europe that has no defensive element to it. <br /><br />I'm proud (relieved?) to say that I've finally <a href="http://arxiv.org/abs/1107.2456">finished the first draft</a> of this work for two-player head-to-head games, with a duplication method that ensures that if the game were repeated, each player would receive tiles from the reserve in the same sequence: think of the tiles being laid out in order (but unseen by the players), so that one player draws from the front and the other draws from the back. Like Lady and the Tramp with spaghetti:<br /><br /><img alt="tramp.jpg" src="http://www.acthomas.ca/comment/tramp.jpg" class="mt-image-none" style="" height="320" width="492" /><br />
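The front-and-back draw is easy to sketch in code. What follows is a toy illustration rather than the modified Quackle code: the class name, the method names and the ten-letter "bag" are all hypothetical, chosen only to show that each player's sequence of tiles does not depend on how many tiles the other player has drawn.<br />

```python
from collections import deque

class LatentTileOrder:
    """A fixed, hidden ordering of the reserve tiles (hypothetical helper)."""

    def __init__(self, order):
        self._tiles = deque(order)

    def draw(self, player, count):
        """Player 0 draws from the front of the line, player 1 from the back."""
        drawn = []
        for _ in range(min(count, len(self._tiles))):
            if player == 0:
                drawn.append(self._tiles.popleft())
            else:
                drawn.append(self._tiles.pop())
        return drawn

    def remaining(self):
        return len(self._tiles)

# Toy reserve of ten tiles, laid out in a predetermined order.
pile = LatentTileOrder("ABCDEFGHIJ")
first_rack = pile.draw(0, 3)    # ['A', 'B', 'C']
second_rack = pile.draw(1, 3)   # ['J', 'I', 'H']
refill = pile.draw(0, 2)        # ['D', 'E'], whatever player 1 did in between
```

The two sequences only interact when the ends of the line meet, which is when the reserve runs out.<br />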
     <div><br />I modified the Scrabble simulator <a href="http://www.quackle.com/">Quackle</a> to accept a predetermined tile order, then simulated over one million matches between copies of Quackle's "Speedy Player", using each of 10,600 tile orders 100 times. One goal of this was to figure out how much of the variance in score comes from the tile order and how much comes from the board, given that a tile order would be expected. It turns out to be about half-bag, half-board, so that if this scheme could be used in tournaments, it would visibly cut down the number of matches needed to figure out the best player (though it would need a <a href="http://en.wikipedia.org/wiki/Rube_Goldberg_machine">Goldbergian apparatus</a> to implement in live games).<br /><br />Some other findings from the simulations:<br /><br /><ul><li>The blank is worth about 30 points to a good player, each S about 10.</li><li>The Q is a burden to whichever player receives it, effectively serving as a 5-point penalty: it reduces bingo opportunities, since it needs either a U or a blank for a chance at a bingo and its 50-point bonus.<br /></li><li>The J is essentially neutral pointwise.<br /></li><li>The X and the Z are each worth about 3-5 extra points to the player who receives them. The difficulty of playing them in bingoes is mitigated by their usefulness in short words.</li></ul>I have yet to draw any other conclusions about how I think the game should be modified, mainly because it's premature without testing these ideas out on human players. Any volunteers?<br /></div>]]>
        
    </content>
</entry>

<entry>
    <title>Geek talk: Running R processes remotely through ESS</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2011/04/geek-talk-running-r-processes-remotely-through-ess.html" />
    <id>tag:www.acthomas.ca,2011:/comment//1.44</id>

    <published>2011-04-17T04:47:14Z</published>
    <updated>2011-04-17T05:09:22Z</updated>

    <summary>I&apos;ve been wondering for a while how to use the convenience of Emacs on a local machine, while running R processes on a server remotely. According to http://www.xemacs.org/Documentation/packages/html/ess_3.html, this can be done with the ess-remote environment. 1) Use M-x shell to...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="emacsspeaksstatistics" label="Emacs Speaks Statistics" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="r" label="R" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[I've been wondering for a while how to use the convenience of Emacs on a local machine, while running R processes on a server remotely. According to <a href="http://www.xemacs.org/Documentation/packages/html/ess_3.html">http://www.xemacs.org/Documentation/packages/html/ess_3.html</a>, this can be done with the ess-remote environment.<br /><br />1) Use M-x shell to open a shell environment.<br /><br />2) Use ssh to connect to the server of interest. If desired, use "screen" to make the process resilient to disconnection.<br /><br />3) Run R.<br /><br />4) Use M-x ess-remote to enable the shell as an ESS process.<br /><br />The advantage of this process is the added convenience of an environment that allows instant execution of lines of R code. Files still must be loaded on the remote server, rather than referenced on the local machine, but it's still an improvement.<br /><br />]]>
        
    </content>
</entry>

<entry>
    <title>Big chances to change redistricting this year...</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2011/02/big-chances-to-change-redistricting-this-year.html" />
    <id>tag:www.acthomas.ca,2011:/comment//1.43</id>

    <published>2011-02-06T20:20:43Z</published>
    <updated>2011-02-06T20:45:02Z</updated>

    <summary>...assuming we can take advantage of them. State legislatures are preparing to conduct their decennial redistricting processes, now that the data from the last census has been processed. In states where the legislature does it directly, the result is known...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="gelmanwasright" label="gelmanwasright" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="redistricting" label="Redistricting" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[...assuming we can take advantage of them. <br /><br />State legislatures are preparing to conduct their decennial redistricting processes, now that the data from the last census has been processed. In states where the legislature does it directly, the result is known to be far less than fair for its constituents, in that either one party tries to take control of an "unfair" number of seats (<a href="http://en.wikipedia.org/wiki/2003_Texas_redistricting">remember the Oklahoma exodus!</a>) or the incumbents on both sides work to protect their own re-election prospects (<a href="http://en.wikipedia.org/wiki/Tip_O%27Neill">some people have all the luck</a>).<br /><br />There is hope that a bipartisan commission, whose members cannot run for office in their newly drawn districts, would be able to break the easily recognizable incumbency advantage and, from there, create a map that would be fairer to all parties. But how likely is this? In fact, there seems to have been very little difference in the maps produced by commissions and by legislatures in terms of the absolute performance of electoral systems, or in the change in their performance after redistricting has taken place.<br /><br />Read all about it in my "editorial" paper that proposes <a href="http://www.acthomas.ca/papers/redist-editorial.pdf">questions about the redistricting process that commissions should be asking</a> to guide their work this year. Not easy to publish a paper of questions unless you're near or past retirement age, let alone when those questions come from null results!<br />

]]>
        
    </content>
</entry>

<entry>
    <title>Catcher Spotting Data Now Available</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2011/01/catcher-spotting-data-now-available.html" />
    <id>tag:www.acthomas.ca,2011:/comment//1.42</id>

    <published>2011-01-03T18:22:20Z</published>
    <updated>2011-01-03T18:24:58Z</updated>

    <summary>Thanks to all those who took part in my trial of Catcher Spotting utilities. As promised, I&apos;ve posted the data for all those who want to work with it. The paper is posted here and is in submission for the JQAS...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="baseball" label="baseball" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="data" label="data" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="statistics" label="statistics" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[Thanks to all those who took part in my trial of Catcher Spotting utilities. As promised, I've posted <a href="http://www.acthomas.ca/data/catcher-spotting-data.zip">the data</a> for all those who want to work with it.<br /><br /><a href="http://www.acthomas.ca/papers/spotting-writeup.pdf">The paper is posted here</a> and is in submission for the <a href="http://www.bepress.org/jqas/">JQAS</a> special issue on the <a href="http://ncssors.wikidot.com/">NCSSORS</a> conference.<br />]]>
        
    </content>
</entry>

<entry>
    <title>Joe And Ted&apos;s Excellent Accomplishments: Streaks and the Evolving Sport of Baseball</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2010/10/joe-and-teds-excellent-accomplishments-streaks-and-the-evolving-sport-of-baseball.html" />
    <id>tag:www.acthomas.ca,2010:/comment//1.40</id>

    <published>2010-10-18T15:19:51Z</published>
    <updated>2010-10-18T15:30:11Z</updated>

    <summary>Summarizing my latest sports piece, and first peer-reviewed publication on baseball. Get it for free here. Even after nearly a century and a half of major league play, baseball still has no shortage of great questions and puzzles, one of them...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="baseball" label="baseball" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="hittingstreak" label="Hitting streak" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="statistics" label="statistics" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[Summarizing my latest <a href="http://www.bepress.com/jqas/vol6/iss4/7/">sports piece</a>, and first peer-reviewed publication on baseball. <a href="http://www.acthomas.ca/papers/hitting-streaks-writeup.pdf">Get it for free here.</a><br /><br />Even
after nearly a century and a half of major league play, baseball still
has no shortage of great questions and puzzles, one of them being the
phenomenon of the <span class="il">streak</span>. Just how remarkable was Joe DiMaggio's
56-game hitting <span class="il">streak</span>, and Ted Williams's less celebrated 84
consecutive games reaching base? If we could rerun the past 139 seasons
of baseball, how likely would we be to see a <span class="il">streak</span> of that length (or
longer) again? And how can we trust that the answer we get back is in
any way a reliable one, without the use of a time machine?<br />
<br />]]>
        <![CDATA[Any attempt to answer these questions starts with the supposition
that there are some outcomes that would be observed every time, and we
must consider the mathematics in a way that favors their appearance.
One possibility is that even if DiMaggio's hitting <span class="il">streak</span> was somehow
magical, other lesser streaks would be expected. After all, while it
might not be all that likely that Ty Cobb, Paul Molitor and Jimmy
Rollins would run up hitting streaks of 40, 39 and 38 games
respectively as their career bests, it isn't all that unusual to say
that in these hypothetical do-overs, someone would have done as well in
each person's place, and that streaks of roughly these lengths would
end up as numbers 6, 7 and 8 on the all-time list.<br />

<br />
To test these outcomes, I started with a publicly available
database on yearly player outcomes, and assumed that each player would
have the same number of games and plate appearances in each year. I
then produced several plate appearances for each player in each
hypothetical game by, essentially, the Strat-O-Matic or "Wheel of
Fortune" method - spin a (computer-based) wheel marked with "hit",
"walk" and "out", where each wedge on the wheel has as much space as
the probability of each event - and noted whether a hit (or a walk) was
recorded in the game. (This, of course, assumes that there is no "hot
hand" effect that would cause players on a <span class="il">streak</span> to keep performing
above their expected ability.) This is done repeatedly to get a large
number of histories to compare to the real McCoy.<br />
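<br />
The wheel-spinning step can be sketched in a few lines of Python. This is an illustration of the idea rather than the code behind the paper: the function names are mine, and the .325 average, 10% walk rate, 154 games and 4 plate appearances per game are placeholder values, not numbers from the database.<br />

```python
import random

def spin(p_hit, p_walk, rng):
    """One spin of the wheel: wedge sizes match the event probabilities."""
    u = rng.random()
    if p_hit > u:
        return "hit"
    if p_hit + p_walk > u:
        return "walk"
    return "out"

def simulate_season(p_hit, p_walk, n_games, pa_per_game, rng, count_walks=False):
    """Return one flag per game: did the batter keep the streak alive?

    With count_walks=False only a hit counts (hitting streaks);
    with count_walks=True a walk counts too (on-base streaks).
    """
    wanted = {"hit", "walk"} if count_walks else {"hit"}
    flags = []
    for _ in range(n_games):
        outcomes = {spin(p_hit, p_walk, rng) for _ in range(pa_per_game)}
        flags.append(not wanted.isdisjoint(outcomes))
    return flags

def longest_streak(flags):
    """Length of the longest run of consecutive successful games."""
    best = run = 0
    for alive in flags:
        run = run + 1 if alive else 0
        best = max(best, run)
    return best

# Replay one hypothetical player-season 1000 times; no "hot hand" is built in.
rng = random.Random(1941)
histories = [
    longest_streak(simulate_season(0.325, 0.10, 154, 4, rng))
    for _ in range(1000)
]
```

Repeating this for every player-season and ranking the longest streaks in each replay gives the simulated all-time lists that get compared with the real one.<br />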

<br />
For hitting streaks, it turns out that this method does a great job
of creating those lesser streaks for 1950 and onwards, but
overpredicts how long these high-ranked streaks would be from 1900
until 1950, and overshoots even more wildly for the corresponding streaks
in the 19th century. An easy fix for this is to allow the hit
probability to vary from day to day in these early eras - on some days
the batter has a higher average than normal, on some days a lower
average - and by choosing the right variabilities, those lesser streaks
in simulation for each era line up with their "real" counterparts. This
produces a highly educated guess about how rarely we would see a
56-game <span class="il">streak</span> or more: in a little less than 5% of the simulated histories since 1901, a
<span class="il">streak</span> at least as long as DiMaggio's would be observed. In my eyes,
that's certainly a remarkable record, whatever factors led to it.<br />
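<br />
To see why day-to-day fluctuation shortens streaks, here is a small sketch under an assumption of my own choosing for this illustration: each day's hit probability is drawn from a Beta distribution whose mean is the player's usual average. The paper's actual form of variability may differ, and the function names and numbers are hypothetical.<br />

```python
import random

def daily_hit_probability(p_hit, concentration, rng):
    """Draw today's per-plate-appearance hit probability.

    The draw is Beta-distributed with mean p_hit; a smaller concentration
    means more day-to-day swing. concentration=None means no swing at all.
    """
    if concentration is None:
        return p_hit
    return rng.betavariate(p_hit * concentration, (1.0 - p_hit) * concentration)

def game_hit_rate(p_hit, concentration, pa_per_game, n_games, rng):
    """Fraction of simulated games in which the batter gets at least one hit."""
    games_with_hit = 0
    for _ in range(n_games):
        p_today = daily_hit_probability(p_hit, concentration, rng)
        if any(p_today > rng.random() for _ in range(pa_per_game)):
            games_with_hit += 1
    return games_with_hit / n_games

# Same .300 average either way; only the day-to-day swing differs.
rng = random.Random(56)
steady = game_hit_rate(0.300, None, 4, 20000, rng)
streaky = game_hit_rate(0.300, 5.0, 4, 20000, rng)
```

The chance of at least one hit in a game is a concave function of the plate-appearance probability, so swinging that probability around a fixed mean lowers the share of games with a hit, and long streaks become rarer even though the batter's average is unchanged.<br />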

<br />
But even this easy tweak opens up a lot of questions. If this extra
fluctuation is enough to make the model look like the real world, what
real-world factors would produce it? One possibility is that games
pre-1940 were called due to darkness more often; another is that while
on the road, players would encounter many different types of ballparks;
most likely of all, the quality of opposing pitching was far more
variable than today, and a few good (or lucky) pitchers were far
better at stopping streaks than others.<br />

<br />
If this explanation is true, then we have a new issue to contend
with: in the aggregate, the past 60 years of streaks needed no extra
variability in day-to-day hitting to produce a <span class="il">streak</span> list that was
comparable to reality. Does that mean that the modern use of relief
pitching has created a game where a player's opposing pitching is
virtually indistinguishable from day to day? Or, more extremely, could
it mean that on the whole, any two given major league pitchers are
effectively indistinguishable in stopping hits from occurring? That's
one philosophy of baseball research, first suggested by Robert "Voros"
McCracken a little over a decade ago and still a hot topic for debate,
but it also has a taste of Stephen Jay Gould, who pointed out that the
variability between hitters has been decreasing over time. While Gould
suggested that this would mean the .400 hitter was a thing of the past,
the implication here is that this lower variability would encourage
longer hitting streaks: without pitchers that are dramatically
different from their counterparts, a player on a <span class="il">streak</span> would be less
likely to be shut down by an opposing ace today than 100 years ago.<br />

<br />
This method isn't nearly as successful for figuring out on-base
streaks; in fact, the same machinery that was just used for hits alone
grossly overestimates how long history's on-base streaks would last, no
matter how much reasonable difference in ability we estimate between
pitchers. While this certainly implies that the opposing pitchers still
vary widely in their ability to prevent bases-on-balls (the flip-side
to McCracken's argument), it gives us no shortage of alternative
explanations to consider as the season begins, and just as much
continuing mystery about this era of the game that we are privileged to
witness.]]>
    </content>
</entry>

</feed>
