<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>A.C. Thomas, Scientist</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/" />
    <link rel="self" type="application/atom+xml" href="http://www.acthomas.ca/comment/atom.xml" />
    <id>tag:www.acthomas.ca,2009-10-17:/comment//1</id>
    <updated>2012-04-09T02:56:09Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.32-en</generator>

<entry>
    <title>Prediction, The Big Discovery and Heartbreak</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2012/04/prediction-the-big-discovery-and-heartbreak.html" />
    <id>tag:www.acthomas.ca,2012:/comment//1.46</id>

    <published>2012-04-08T19:23:19Z</published>
    <updated>2012-04-09T02:56:09Z</updated>

    <summary>It&apos;s a year old, but I only just heard the story of what happened to a baseball researcher I first read about 10 years ago: http://espn.go.com/blog/sweetspot/post/_/id/6835/voros-mccracken-changed-the-game http://www.thepostgame.com/features/201101/sabermetrician-exile &quot;Voros&quot; McCracken had a particular insight about pitcher ability in the late 1990s, namely that a...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[It's a year old, but I only just heard the story of what happened to a baseball researcher I first read about 10 years ago:<br /><br /><a href="http://espn.go.com/blog/sweetspot/post/_/id/6835/voros-mccracken-changed-the-game">http://espn.go.com/blog/sweetspot/post/_/id/6835/voros-mccracken-changed-the-game</a><br /><a href="http://www.thepostgame.com/features/201101/sabermetrician-exile">http://www.thepostgame.com/features/201101/sabermetrician-exile</a><br /><br />"Voros" McCracken had a particular insight about pitcher ability in the late 1990s, namely that a subset of the data -- <a href="http://en.wikipedia.org/wiki/Defense_independent_pitching_statistics">"defense-independent" pitching statistics</a> -- was not only an excellent predictor of the runs allowed by a pitcher, but was also highly persistent from year to year. Others discovered the same principle around the same time, but it was <a href="http://www.baseballprospectus.com/article.php?articleid=878">the publication of this work</a> that got the attention: the strong claim that the differences between pitchers, on batted balls in play, were so small that they could be ignored. While it wasn't quite correct -- there is some predictive power in the remaining information -- it was enough to change people's ideas about how the game works.<br /><br />And now, McCracken is out of baseball, applying analytical methods to undisclosed professional soccer clubs, having made a meager living while working for the Boston Red Sox in the early 2000s. He was one of the figures I read about before going to grad school in statistics, so this news definitely got my attention.<br /><br />There are two things I take away from this whole story:<br /><br />1) After 12 years of exposure to real data and methods, I feel like finding the best set of predictors of success should have been *screamingly obvious*, the kind of assignment I could give to an undergrad with existing databases. 
But would it be obvious enough to construct a narrative around it, and convince the public? I can think of a good explanation after the fact, that there's a lot of general uncertainty once the bat hits the ball, like the angle of attack, but I've had years to think about it.<br /><br /> <div>2) Any research I do on sports is for the sake of teaching, or just as a hobby, and the value it brings me to share something with the world is a nice bonus. I'm not under pressure to find a brilliant discovery to keep my job -- at least, not when it comes to sports. <br /><br /><a href="http://www.andrewgelman.com/">Gelman</a> has said that the most noteworthy discoveries these days aren't the small effects that come out with more data, but the big ones that everyone else just missed. This definitely qualified as one of those. Whatever the next big discovery, I'm sure we'll all think it was obvious years later, even if it wasn't.<br /></div>]]>
        
    </content>
</entry>

<entry>
    <title>Statistics and Scrabble, Together At Last</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2011/07/statistics-and-scrabble-together-at-last.html" />
    <id>tag:www.acthomas.ca,2011:/comment//1.45</id>

    <published>2011-07-14T14:48:43Z</published>
    <updated>2011-07-14T15:15:35Z</updated>

    <summary>Sitting on my to-do list for a while now has been an exploration of Scrabble from an experimental design point of view; how to better design a tournament to make the variance as small as possible while still preserving the...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="experimentaldesign" label="experimental design" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="scrabble" label="scrabble" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[Sitting on my to-do list for a while now has been an exploration of Scrabble from an experimental design point of view: how to better design a tournament to make the variance as small as possible while still preserving the appearance of the home game to its players. One goal was to figure out a way to carry out a true "duplicate" version of Scrabble so that multiple pairs could have access to the same tiles, rather than the <a href="http://en.wikipedia.org/wiki/Duplicate_Scrabble">version currently popular</a> in Europe, which has no defensive element to it. <br /><br />I'm proud (relieved?) to say that I've finally <a href="http://arxiv.org/abs/1107.2456">finished the first draft</a> of this work for two-player head-to-head games, with a duplication method that ensures that if the game were repeated, each player would receive tiles from the reserve in the same sequence: think of the tiles being laid out in order (but unseen by the players), so that one player draws from the front and the other draws from the back. Like Lady and the Tramp with spaghetti:<br /><br /><img alt="tramp.jpg" src="http://www.acthomas.ca/comment/tramp.jpg" class="mt-image-none" style="" height="320" width="492" /><br />
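<br />A minimal sketch of that two-ended draw, in Python and purely for illustration. It is not the modified Quackle code: the names make_tile_order, DuplicateBag and deal_opening_racks, the seed, and the use of a seeded shuffle to stand in for a predetermined tile order are all assumptions. The standard 100-tile English distribution is fixed into one hidden row, the first player draws from the front, and the second draws from the back:<br />

<pre>
```python
from collections import deque
import random

# Standard English-language Scrabble distribution (100 tiles, "?" is a blank).
TILE_COUNTS = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "?": 2,
}


def make_tile_order(seed):
    """Return one fixed ordering of all 100 tiles (hypothetical helper)."""
    tiles = [letter for letter, count in TILE_COUNTS.items() for _ in range(count)]
    random.Random(seed).shuffle(tiles)
    return tiles


class DuplicateBag:
    """Tiles laid out in a row: player 0 draws from the front, player 1 from the back."""

    def __init__(self, order):
        self.tiles = deque(order)

    def draw(self, player, n):
        """Draw up to n tiles for the given player, stopping if the row runs out."""
        drawn = []
        while self.tiles and len(drawn) != n:
            drawn.append(self.tiles.popleft() if player == 0 else self.tiles.pop())
        return drawn


def deal_opening_racks(order):
    """Deal seven tiles to each player from opposite ends of the same row."""
    bag = DuplicateBag(order)
    return bag.draw(0, 7), bag.draw(1, 7)


order = make_tile_order(seed=2011)
rack_a, rack_b = deal_opening_racks(order)
print(rack_a, rack_b)
```
</pre>
Because each player only ever takes tiles from their own end, the k-th tile a player receives does not depend on how many tiles the opponent has drawn, which is what lets the same row be replayed by another pair of players.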
     <div><br />I modified the Scrabble simulator <a href="http://www.quackle.com/">Quackle</a> to accept a predetermined tile order, then simulated over one million matches between copies of Quackle's "Speedy Player", using each of 10,600 tile orders 100 times. One goal of this was to figure out how much of the variance in score comes from the tile order and how much comes from the play on the board once the tile order is fixed. The split turns out to be about half bag, half board, so if this scheme could be used in tournaments, it would noticeably cut down the number of matches needed to identify the best player (though it would need a <a href="http://en.wikipedia.org/wiki/Rube_Goldberg_machine">Goldbergian apparatus</a> to implement in live games).<br /><br />Some other findings from the simulations:<br /><br /><ul><li>The blank is worth about 30 points to a good player, each S about 10.</li><li>The Q is a burden to whichever player receives it, effectively a 5-point penalty: it reduces bingo opportunities, since it needs either a U or a blank for a chance at a bingo and its 50-point bonus.<br /></li><li>The J is essentially neutral pointwise.<br /></li><li>The X and the Z are each worth about 3-5 extra points to the player who receives them. The difficulty of playing them in bingoes is offset by their usefulness in short words.</li></ul>I have yet to draw any other conclusions about how I think the game should be modified, mainly because it's premature without testing these ideas out on human players. Any volunteers?<br /></div>]]>
        
    </content>
</entry>

<entry>
    <title>Geek talk: Running R processes remotely through ESS</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2011/04/geek-talk-running-r-processes-remotely-through-ess.html" />
    <id>tag:www.acthomas.ca,2011:/comment//1.44</id>

    <published>2011-04-17T04:47:14Z</published>
    <updated>2011-04-17T05:09:22Z</updated>

    <summary>I&apos;ve been wondering for a while how to use the convenience of Emacs on a local machine, while running R processes on a server remotely. According to http://www.xemacs.org/Documentation/packages/html/ess_3.html, this can be done with the ess-remote environment. 1) Use M-x shell to...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="emacsspeaksstatistics" label="Emacs Speaks Statistics" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="r" label="R" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[I've been wondering for a while how to use the convenience of Emacs on a local machine, while running R processes on a server remotely. According to <a href="http://www.xemacs.org/Documentation/packages/html/ess_3.html">http://www.xemacs.org/Documentation/packages/html/ess_3.html</a>, this can be done with the ess-remote environment.<br /><br />1) Use M-x shell to open a shell environment.<br /><br />2) Use ssh to connect to the server of interest. If desired, use "screen" to make the process resilient to disconnection.<br /><br />3) Run R.<br /><br />4) Use M-x ess-remote to enable the shell as an ESS process.<br /><br />The advantage of this setup is the convenience of an environment that allows instant execution of lines of R code. Files still must be loaded on the remote server, rather than referenced on the local machine, but it's still an improvement.<br /><br />]]>
        
    </content>
</entry>

<entry>
    <title>Big chances to change redistricting this year...</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2011/02/big-chances-to-change-redistricting-this-year.html" />
    <id>tag:www.acthomas.ca,2011:/comment//1.43</id>

    <published>2011-02-06T20:20:43Z</published>
    <updated>2011-02-06T20:45:02Z</updated>

    <summary>...assuming we can take advantage of them. State legislatures are preparing to conduct their decennial redistricting processes, now that the data from the last census has been processed. In states where the legislature does it directly, the result is known...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="gelmanwasright" label="gelmanwasright" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="redistricting" label="Redistricting" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[...assuming we can take advantage of them. <br /><br />State legislatures are preparing to conduct their decennial redistricting processes, now that the data from the last census has been processed. In states where the legislature does it directly, the result is known to be far less than fair for its constituents, in that either one party tries to take control of an "unfair" number of seats (<a href="http://en.wikipedia.org/wiki/2003_Texas_redistricting">remember the Oklahoma exodus!</a>) or the incumbents on both sides work to protect their own re-election prospects (<a href="http://en.wikipedia.org/wiki/Tip_O%27Neill">some people have all the luck</a>).<br /><br />There is hope that a bipartisan commission, whose members cannot run for office in their newly drawn districts, would be able to break the easily recognizable incumbency advantage and, from there, create a map that would be fairer to all parties. But how likely is this? In fact, there seems to have been very little difference in the maps produced by commissions and by legislatures in terms of the absolute performance of electoral systems, or in the change in their performance after redistricting has taken place.<br /><br />Read all about it in my "editorial" paper that proposes <a href="http://www.acthomas.ca/papers/redist-editorial.pdf">questions about the redistricting process that commissions should be asking</a> to guide their work this year. Not easy to publish a paper of questions unless you're near or past retirement age, let alone when those questions come from null results!<br />

]]>
        
    </content>
</entry>

<entry>
    <title>Catcher Spotting Data Now Available</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2011/01/catcher-spotting-data-now-available.html" />
    <id>tag:www.acthomas.ca,2011:/comment//1.42</id>

    <published>2011-01-03T18:22:20Z</published>
    <updated>2011-01-03T18:24:58Z</updated>

    <summary>Thanks to all those who took part in my trial of Catcher Spotting utilities. As promised, I&apos;ve posted the data for all those who want to work with it. The paper is posted here and is in submission for the JQAS...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="baseball" label="baseball" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="data" label="data" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="statistics" label="statistics" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[Thanks to all those who took part in my trial of Catcher Spotting utilities. As promised, I've posted <a href="http://www.acthomas.ca/data/catcher-spotting-data.zip">the data</a> for all those who want to work with it.<br /><br /><a href="http://www.acthomas.ca/papers/spotting-writeup.pdf">The paper is posted here</a> and is in submission for the <a href="http://www.bepress.org/jqas/">JQAS</a> special issue on the <a href="http://ncssors.wikidot.com/">NCSSORS</a> conference.<br />]]>
        
    </content>
</entry>

<entry>
    <title>Joe And Ted&apos;s Excellent Accomplishments: Streaks and the Evolving Sport of Baseball</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2010/10/joe-and-teds-excellent-accomplishments-streaks-and-the-evolving-sport-of-baseball.html" />
    <id>tag:www.acthomas.ca,2010:/comment//1.40</id>

    <published>2010-10-18T15:19:51Z</published>
    <updated>2010-10-18T15:30:11Z</updated>

    <summary>Summarizing my latest sports piece, and first peer-reviewed publication on baseball. Get it for free here. Even after nearly a century and a half of major league play, baseball still has no shortage of great questions and puzzles, one of them...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="baseball" label="baseball" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="hittingstreak" label="Hitting streak" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="statistics" label="statistics" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[Summarizing my latest <a href="http://www.bepress.com/jqas/vol6/iss4/7/">sports piece</a>, and first peer-reviewed publication on baseball. <a href="http://www.acthomas.ca/papers/hitting-streaks-writeup.pdf">Get it for free here.</a><br /><br />Even
after nearly a century and a half of major league play, baseball still
has no shortage of great questions and puzzles, one of them being the
phenomenon of the <span class="il">streak</span>. Just how remarkable were Joe DiMaggio's
56-game hitting <span class="il">streak</span>, and Ted Williams's less celebrated 84
consecutive games reaching base? If we could rerun the past 139 seasons
of baseball, how likely would we be to see a <span class="il">streak</span> of that length (or
longer) again? And how can we trust that the answer we get back is in
any way a reliable one, without the use of a time machine?<br />
<br />]]>
        <![CDATA[Any attempt to answer these questions starts with the supposition
that there are some outcomes that would be observed every time, and we
must consider the mathematics in a way that favors their appearance.
One possibility is that even if DiMaggio's hitting <span class="il">streak</span> was somehow
magical, other lesser streaks would be expected. After all, while it
might not be all that likely that Ty Cobb, Paul Molitor and Jimmy
Rollins would run up hitting streaks of 40, 39 and 38 games
respectively as their career bests, it isn't all that unusual to say
that in these hypothetical do-overs, someone would have done as well in
each person's place, and that streaks of roughly these lengths would
end up as numbers 6, 7 and 8 on the all-time list.<br />

<br />
To test these outcomes, I started with a publicly available
database on yearly player outcomes, and assumed that each player would
have the same number of games and plate appearances in each year. I
then produced several plate appearances for each player in each
hypothetical game by, essentially, the Strat-O-Matic or "Wheel of
Fortune" method - spin a (computer-based) wheel marked with "hit",
"walk" and "out", where each wedge on the wheel has as much space as
the probability of each event - and noted whether a hit (or a walk) was
recorded in the game. (This, of course, assumes that there is no "hot
hand" effect that would cause players on a <span class="il">streak</span> to keep performing
above their expected ability.) This is done repeatedly to get a large
number of histories to compare to the real McCoy.<br />
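<br />
A minimal sketch of the wheel-spinning idea, in Python, for readers who want to see the mechanics. It is an illustration under stated assumptions, not the code behind the paper: the function names, the fixed four plate appearances per game, and the 30% hit and 10% walk probabilities are all made up for the example.<br />

<pre>
```python
import random


def game_has_hit(p_hit, p_walk, plate_appearances, rng):
    """Spin the wheel once per plate appearance; True if any spin lands on 'hit'."""
    outcomes = rng.choices(
        ["hit", "walk", "out"],
        weights=[p_hit, p_walk, 1.0 - p_hit - p_walk],
        k=plate_appearances,
    )
    return "hit" in outcomes


def longest_streak(results):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for success in results:
        current = current + 1 if success else 0
        best = max(best, current)
    return best


def simulate_season(p_hit, p_walk, games, pa_per_game, rng):
    """Longest hitting streak in one simulated season with no 'hot hand' effect."""
    return longest_streak(
        game_has_hit(p_hit, p_walk, pa_per_game, rng) for _ in range(games)
    )


rng = random.Random(1941)
# Made-up inputs: 150 games, 4 plate appearances per game, 1000 replayed seasons.
streaks = [simulate_season(0.30, 0.10, 150, 4, rng) for _ in range(1000)]
print(max(streaks), sum(streaks) / len(streaks))
```
</pre>
Swapping the test in game_has_hit to look for either "hit" or "walk" gives the on-base version of the same simulation.<br />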

<br />
For hitting streaks, it turns out that this method does a great job
of recreating those lesser streaks from 1950 onwards, but
overpredicts how long these high-ranked streaks would be from 1900
until 1950, and overshoots even more wildly for the corresponding streaks
in the 19th century. An easy fix for this is to allow the hit
probability to vary from day to day in these early eras - on some days
the batter has a higher average than normal, on some days a lower
average - and by choosing the right amount of variability, those lesser streaks
in simulation for each era line up with their "real" counterparts. This
produces a highly educated guess about how rarely we would see a
<span class="il">streak</span> of 56 games or more: in a little less than 5% of the simulated
histories since 1901, a <span class="il">streak</span> at least as long as DiMaggio's would be
observed. In my eyes, that's certainly a remarkable record, whatever factors led to it.<br />
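<br />
A rough sketch of what day-to-day variation does, in Python. Everything specific here is an assumption for illustration: the function name, the normal wobble around the batter's usual probability (clamped to stay between 0 and 1), the size of the spread, and the fixed four plate appearances per game are not taken from the paper.<br />

<pre>
```python
import random


def longest_hitting_streak(p_hit, games, pa_per_game, spread, rng):
    """Longest run of games with a hit when the hit chance wobbles from day to day.

    spread = 0 gives the constant-probability model; larger values let the
    day's probability wander further from p_hit.
    """
    best = current = 0
    for _ in range(games):
        # Draw today's per-appearance hit probability, kept inside [0, 1].
        todays_p = min(1.0, max(0.0, rng.gauss(p_hit, spread)))
        hit_today = any(todays_p > rng.random() for _ in range(pa_per_game))
        current = current + 1 if hit_today else 0
        best = max(best, current)
    return best


def average_longest_streak(spread, seasons=500, seed=56):
    """Average longest streak over many replayed seasons (made-up inputs)."""
    rng = random.Random(seed)
    total = sum(
        longest_hitting_streak(0.30, 150, 4, spread, rng) for _ in range(seasons)
    )
    return total / seasons


print(average_longest_streak(0.0), average_longest_streak(0.15))
```
</pre>
The chance of at least one hit in a game is a concave function of the per-appearance probability, so spreading that probability around its average tends to lower the average per-game chance and shorten the simulated streaks, which is the direction of correction the early eras needed.<br />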

<br />
But even this easy tweak opens up a lot of questions. If this extra
fluctuation is enough to make the model look like the real world, what
real-world factors would produce it? One possibility is that games
pre-1940 were called due to darkness more often; another is that while
on the road, players would encounter many different types of ballparks;
most likely of all, the quality of opposing pitching was far more
variable than today, so that a few good (or lucky) pitchers were far
better at stopping streaks than others.<br />

<br />
If this explanation is true, then we have a new issue to contend
with: in the aggregate, the past 60 years of streaks needed no extra
variability in day-to-day hitting to produce a <span class="il">streak</span> list that was
comparable to reality. Does that mean that the modern use of relief
pitching has created a game where a player's opposing pitching is
virtually indistinguishable from day to day? Or, more extremely, could
it mean that on the whole, any two given major league pitchers are
effectively indistinguishable in stopping hits from occurring? That's
one philosophy of baseball research, first suggested by Robert "Voros"
McCracken a little over a decade ago and still a hot topic for debate,
but it also has a taste of Stephen Jay Gould, who pointed out that the
variability between hitters has been decreasing over time. While Gould
suggested that this would mean the .400 hitter was a thing of the past,
the implication here is that this lower variability would encourage
longer hitting streaks: without pitchers that are dramatically
different from their counterparts, a player on a <span class="il">streak</span> would be less
likely to be shut down by an opposing ace today than 100 years ago.<br />

<br />
This method isn't nearly as successful for figuring out on-base
streaks; in fact, the same machinery that was just used for hits alone
grossly overestimates how long history's on-base streaks would last, no
matter how much reasonable difference in ability we estimate between
pitchers. While this certainly implies that the opposing pitchers still
vary widely in their ability to prevent bases-on-balls (the flip-side
to McCracken's argument), it gives us no shortage of alternative
explanations to consider as the season begins, and just as much
continuing mystery about this era of the game that we are privileged to
witness.]]>
    </content>
</entry>

<entry>
    <title>A Catcher Spotting Tool: &quot;Hot Or Not?&quot; For Baseball Pitches</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2010/09/a-catcher-spotting-tool-hot-or-not-for-baseball-pitches.html" />
    <id>tag:www.acthomas.ca,2010:/comment//1.39</id>

    <published>2010-09-27T16:04:57Z</published>
    <updated>2010-09-27T16:25:33Z</updated>

    <summary>Catcher Spotting is a project I&apos;ve been working on casually for about 4 years, starting when I got curious about the last uncaptured bit of data from a baseball game: the set-up of the catcher, implying the intended target of...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[Catcher Spotting is a project I've been working on casually for about 4 years, starting when I got curious about the last uncaptured bit of data from a baseball game: the set-up of the catcher, which implies the intended target of each pitch before it's thrown by the pitcher. Every commentator knows that when a pitcher is "<a href="http://www.bbtia.com/home/2010/3/27/saturday-morning-rangers-notes-nine-days-and-counting.html">missing</a> <a href="http://www.nytimes.com/2010/06/14/sports/baseball/14yankees.html">his</a> <a href="http://www.usatoday.com/sports/baseball/2010-06-04-rangers-rays_N.htm">spots</a>", it's evidence of a loss of control, but attempts to measure the impact of this loss are utterly stymied by the lack of quality data out there.<br /><br />That's where the Catcher Spotting project comes in: I am very interested in figuring out exactly how this data can be collected on a wide scale. While commentators may agree on whether a pitcher has missed his spot, no one has yet established a way to codify it. And I'm far from convinced that technology can do it alone through video analysis, especially in cases of "intentional deception" like when a runner is on second. Crowdsourcing seems to be the obvious solution, which raises a question: how well can the task be distributed across many different coders?<br /><br />To that end, I've <a href="http://www.acthomas.ca/catcherspotting/">built an applet</a> that tries to answer that question: users simply click their mouse to indicate where the catcher has set up, and where the ball actually goes. By collecting data from (hopefully) many users, we should learn how many different human coders would be needed to get a reliable sense of a pitcher's intent, as signalled through a catcher. 
The idea is the same as sites like Hot Or Not, except that I'm explicitly concerned with how the same rater will judge different pitches, so that we can know how trustworthy a single rater could be.<br /><br /><a href="http://www.acthomas.ca/catcherspotting/">Please give the applet a try!</a><br />]]>
        
    </content>
</entry>

<entry>
    <title>Invite Us To Your Parties, Scientists, Because Applied Statisticians Bring the Good Stuff</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2010/05/invite-us-to-your-parties-scientists-because-applied-statisticians-bring-the-good-stuff.html" />
    <id>tag:www.acthomas.ca,2010:/comment//1.38</id>

    <published>2010-05-05T16:31:57Z</published>
    <updated>2010-05-05T16:45:42Z</updated>

    <summary>From an official response to the CRU email incident, largely clearing the actions of climate scientists: The Report points out where things might have been done better. One is to engage more with professional statisticians in the analysis of data. Another,...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="statistics" label="Statistics" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[From <a href="http://www.uea.ac.uk/mac/comm/media/press/CRUstatements/oxburgh">an official response to the CRU email incident</a>, largely clearing the actions of climate scientists:<br /><br /><blockquote>The Report points out where things might have been done better. One is
to engage more with professional statisticians in the analysis of data.
Another, related, point is that more efficacious statistical techniques
might have been employed in some instances (although it was pointed out
that different methods may not have produced different results).
Specialists in many areas of research acquire and develop the
statistical skills pertinent to their own particular data analysis
requirements. However, we do see the sense in engaging more fully with
the wider statistics community to ensure that the most effective and
up-to-date statistical techniques are adopted and will now consider
further how best to achieve this.</blockquote><div>I'm not sure anything that was said in the report will placate the denier crowd, but this part here gives me hope. There are plenty of people out there who are well experienced in cross-field <a class="zem_slink" href="http://en.wikipedia.org/wiki/Statistics" title="Statistics" rel="wikipedia">statistical analysis</a> and would love to sit down with real data and provide a sounding board. Indeed, it's one of the <a href="http://www.stat.cmu.edu/">founding purposes of my department</a>.<br /></div>]]>
        
    </content>
</entry>

<entry>
    <title>Because Cosma can say it better, and longer, than I can</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2010/04/because-cosma-can-say-it-better-and-longer-than-i-can.html" />
    <id>tag:www.acthomas.ca,2010:/comment//1.37</id>

    <published>2010-04-29T00:12:32Z</published>
    <updated>2010-04-29T00:17:33Z</updated>

    <summary>Our paper is posted: how homophily and contagion in social networks are confounded, given the impact of latent variables on past and present outcomes, as well as network ties, in social network analysis....</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="contagion" label="contagion" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="cosmashalizi" label="Cosma Shalizi" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="homophily" label="homophily" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="socialnetworks" label="social networks" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[<a href="http://www.cscs.umich.edu/%7Ecrshalizi/weblog/">Our</a> <a href="http://arxiv.org/abs/1004.4704">paper</a> is <a href="http://bactra.org/weblog/656.html">posted</a>: how homophily and contagion in social networks are confounded, given the impact of latent variables on past and present outcomes, as well as network ties, in social network analysis.<br />]]>
        
    </content>
</entry>

<entry>
    <title>A Worthy Repost of Someone Else&apos;s Material</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2010/04/a-worthy.html" />
    <id>tag:www.acthomas.ca,2010:/comment//1.36</id>

    <published>2010-04-22T20:27:54Z</published>
    <updated>2010-04-22T20:45:53Z</updated>

    <summary>How To Publish A Comment has come up twice in the last week: once at a conference, once at a blog post. Boy, I can&apos;t wait for this to happen to me....</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[<a href="http://scienceblogs.com/catdynamics/upload/2009/08/how_to_publish_a_scientific_co/How%20to%20Publish%20a%20Comment.pdf">How To Publish A Comment</a> has come up twice in the last week: once <a href="http://www.stat.harvard.edu/NESS10/">at a conference</a>, once <a href="http://cs.unm.edu/%7Eaaron/blog/archives/2010/04/people_v_the_sc.htm">at a blog post</a>. Boy, I can't wait for this to happen to me.<br />]]>
        
    </content>
</entry>

<entry>
    <title>The Statistics of the Putter&apos;s Game</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2010/03/the-statistics-of-the-putters-game.html" />
    <id>tag:www.acthomas.ca,2010:/comment//1.35</id>

    <published>2010-03-15T22:18:24Z</published>
    <updated>2010-03-15T22:46:43Z</updated>

    <summary>When I first saw this piece on improved putting statistics in the Wall Street Journal, I put on my typical skeptic&apos;s face when it comes to science in the news, and especially when it comes to the overplay of data...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[When I first saw <a href="http://online.wsj.com/article/SB10001424052748703791704575114071142473884.html?mod=WSJ_hpp_RIGHTTopCarousel">this piece on improved putting statistics</a> in the Wall Street Journal, I put on the skeptic's face I typically wear for science in the news, and especially for the overplay of data analysis and "new statistics" that crops up in this kind of reporting. <br /><br />On reading <a href="http://web.mit.edu/dfearing/www/doc/How%20to%20Catch%20a%20Tiger.pdf">the actual paper</a> by operations researchers Douglas Fearing, Jason Acimovic and Stephen Graves, I was pleasantly surprised at the care and attention they've put into the problem. Having a rich data set is essential, and they've got one in the PGA putting database; I would have cringed had the authors been forced to gather their own data and draw overly broad conclusions on that basis. <br /><br />The writers have deftly avoided the kinds of oversimplifications that make students of sports analysis cringe. Their model is simply stated -- figure out the factors that lead to making putts, and for those that weren't made, model how bad the misses are -- but the tricks to getting the computation right are subtle. Most importantly, they validate their models against data and resist the temptation to overfit, and they do well to produce a relevant quantity for each player (shots gained through putting, compared to a baseline) that can be predicted and therefore validated on a regular basis.<br /><br />I have the usual gripes about graphics and other statements; I want to see error bars on graphs, for example, and I really want to know about the predictive error -- that is, how well the putts-gained-per-round statistic will predict future putting performance (within one tournament, one month, one year, etc.) -- but all in all, I'm glad I read the paper myself and was reassured that the PGA is advertising a worthwhile product.<br />]]>
        
    </content>
</entry>

<entry>
    <title>Following the Shalizi Model for Blog Maintenance</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2010/02/following-the-shalizi-model-for-blog-maintenance.html" />
    <id>tag:www.acthomas.ca,2010:/comment//1.34</id>

    <published>2010-02-08T16:43:15Z</published>
    <updated>2010-02-08T16:52:33Z</updated>

    <summary>My attempt to put up a web presence is negated by the fact that I don&apos;t make many trivial updates or statements; as a result, I&apos;m less concerned with the immediate payoff of these sorts of writings and more about...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="blog" label="Blog" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="cosmashalizi" label="Cosma Shalizi" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="spam" label="Spam" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[My attempt to put up a web presence is negated by the fact that I don't make many trivial updates or statements; as a result, I'm less concerned with the immediate payoff of these sorts of writings and more concerned with making a longer statement. My friend and colleague <a href="http://www.cscs.umich.edu/%7Ecrshalizi/weblog/">Cosma Shalizi</a> pointed out to me when I started this site that there are two steady states for successful blogs: <br /><br /><ul><li>those that are fast to update, have lots of constant yet ephemeral traffic, and have their spam problems mitigated by a quality comment model, and</li><li>those that are infrequently updated, have occasional yet consistent traffic, and have their spam problems eliminated by removing the ability to comment.</li></ul>I've become aware that if there's anything I want to say, I want to think carefully about it first, and make it last in the end, mirroring Cosma's model. So goodbye, comments and trackbacks; if you want to respond to anything I write, you know how to find me.<br /><br />]]>
        
    </content>
</entry>

<entry>
    <title>Sitting this one out</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2010/01/sitting-this-one-out.html" />
    <id>tag:www.acthomas.ca,2010:/comment//1.33</id>

    <published>2010-01-19T20:03:19Z</published>
    <updated>2010-01-19T20:07:11Z</updated>

    <summary>As a former resident of Massachusetts, it&apos;s been interesting for me to watch the Facebook reaction to the Coakley-Brown race purely from the psychology angle. I don&apos;t trust the polls or the probabilities in this case because there&apos;s so little...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[As a former resident of Massachusetts, it's been interesting for me to watch the Facebook reaction to the Coakley-Brown race purely from the psychology angle. I don't trust the polls or the probabilities in this case because there's so little prior information on their reliability before special elections that are unlikely to be replicated. So I'll just sit back and wait for the obvious narratives to come rolling in before the Daily Show tonight.<br />]]>
        
    </content>
</entry>

<entry>
    <title>Off the grid</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2009/12/off-the-grid.html" />
    <id>tag:www.acthomas.ca,2009:/comment//1.32</id>

    <published>2009-12-19T22:11:35Z</published>
    <updated>2009-12-19T22:20:36Z</updated>

    <summary>For the first time since I&apos;ve started using email on a regular basis, I&apos;ll be without it for the next week as I spend quality time with the family. For that matter, it&apos;ll be my first time in a while...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[For the first time since I started using email on a regular basis, I'll be without it for the next week as I spend quality time with the family. For that matter, it'll be my first time in a while without pointing my eyes toward a screen. If you're reading this between December 20 and 27, you're clearly not doing the same thing.<br />]]>
        
    </content>
</entry>

<entry>
    <title>Tentative syllabus for 36-724: Applied Bayesian Statistical Computing</title>
    <link rel="alternate" type="text/html" href="http://www.acthomas.ca/comment/2009/12/tentative-syllabus-for-36-724-applied-bayesian-statistical-computing.html" />
    <id>tag:www.acthomas.ca,2009:/comment//1.31</id>

    <published>2009-12-19T21:48:31Z</published>
    <updated>2009-12-19T22:11:25Z</updated>

    <summary>As previously offered, this course was a full-semester, 12-unit course following three semester courses in mathematical statistics, regression modelling and computation. Now, as there is room only for seven weeks and no precursor course in computing, I&apos;m still...</summary>
    <author>
        <name>Andrew C. Thomas</name>
        
    </author>
    
    <category term="bayesianstatistics" label="bayesian statistics" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="statisticalcomputing" label="statistical computing" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-US" xml:base="http://www.acthomas.ca/comment/">
        <![CDATA[As previously offered, this course was a full-semester, 12-unit course following three semester courses in mathematical statistics, regression modelling and computation. Now, as there is room only for seven weeks and no precursor course in computing, I'm still working on how to pick the essential concepts and put them into a seven-week course. Here's what I've got so far.<br /><br />Carnegie Mellon University, Spring 2010: 36-724: Applied Bayesian Statistical Computing<br />Instructor: Andrew C. Thomas (acthomas at stat.cmu.edu)<br />Class Time/Place: MWF 11:30-12:20, CFA 211<br /><br />Required text: <br />Andrew Gelman and Jennifer Hill (2007) "<a href="http://www.stat.columbia.edu/%7Egelman/arm/">Data Analysis using Regression and Multilevel/Hierarchical Models</a>". Cambridge University Press. Buy the softcover version.<br /><br />Prerequisites: 36-705 "Intermediate Statistics", 36-707 "Intermediate Regression". If you have not taken these classes specifically, examine the syllabuses for these courses and make an appointment to see me within the first week of class.<br /><br />The goal of this course is to give a meaningful introduction to, and exploration of, Bayesian statistical methods through computational techniques in seven weeks. We will focus on the principles of Bayesian hierarchical modelling methods that can be programmed efficiently and remain scientifically valid, and methods for debugging without pulling too much hair out. We will not be explicitly covering discriminative machine-learning topics, but we will cover the same debugging concepts that will make things easier when coding them up.<br /><br />Programming language: R will be the supported language for the course, with the possible use of WinBUGS.<br /><br />Tentative outline of the course:<br /><br />Week 1: Introductions. 
"Central Dogma of Generative Modelling", one-level models, prior specifications and conjugacy; introduction to sampling and simulation in R.<br />Week 2: A reintroduction to Markov chain theory, beginning with discrete models and moving to one-dimensional continuous models.<br />Week 3: Generalized linear models. Grid sampling, the Metropolis-Hastings algorithm, Gibbs sampling.<br />Week 4: Gaussian multilevel models. Partial and full pooling of variance components; autocorrelation and cross-correlation in chains; diagnostics for convergence.<br />Week 5: Generalized multilevel models; posterior predictive checking. <br />Week 6: Varying-slope models in the multilevel context.<br />Week 7: Special topics to be determined; Bayesian graphical models, causal inference.<br /><br />If you have any suggestions for topics that ought to be considered, please let me know.<br />]]>
        
    </content>
</entry>

</feed>

