Monday, April 29, 2013

Simpson's Paradox

A North Slope real estate broker (named North) is trying to convince you that North Slope is a more affluent neighborhood than South Slope.  To prove it, he explains that professionals in North Slope earn a median income of $150,000, versus only $100,000 in South Slope.  Working class folks fare better in North Slope also, with hourly workers making $30,000 a year to South Slope's $25,000.

The South Slope real estate broker (named South) explains that North is crazy.  South Slope is much more affluent.  The median income in South Slope is $80,000 versus the North Slope median of $40,000.

Question: Who is lying, North or South?
Answer: It could be neither.
Consider the breakdown of income shown below.


We can see that North is not lying.  Half the hourly South Slope workers earn $20K and half $30K, for a median of $25K.  A similar calculation for the North Slope workers yields an hourly median of $30K.  For professionals in the South Slope, the median is $100K, with half earning $80K and half earning $120K.  In the North Slope, a similar calculation yields a median of $150K.

South is not lying either.  For the South Slope, the median is $80,000, since more than half of the workers make $80,000 or less and more than half make $80,000 or more (by the definition of the median, at least half must be at or above it and at least half must be at or below it).  For the North Slope, the median is $40,000.

What happened here?  The conflict between the wages by type of work and the overall wages arises because the mix of residents in each category differs between the two neighborhoods.  Though both professionals and hourly workers make more in the North Slope, hourly workers make up a far larger share of the residents in the North Slope than in the South Slope.  As a result, the overall median (or mean) income is lower in the North Slope.
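The effect is easy to reproduce. The sketch below uses hypothetical head counts and income levels, since the original breakdown is not reproduced here. The numbers are chosen only so that the group medians and overall medians match the figures quoted by North and South.

```python
from statistics import median

# Hypothetical head counts and income levels (assumptions, not the original
# table), chosen only so the medians match the figures quoted in the post.
north = {
    "hourly": [20_000] * 40 + [40_000] * 40,
    "professional": [100_000] * 10 + [200_000] * 10,
}
south = {
    "hourly": [20_000] * 10 + [30_000] * 10,
    "professional": [80_000] * 40 + [120_000] * 40,
}

def summarize(incomes_by_group):
    """Return the median per group plus the median over everyone combined."""
    result = {group: median(values) for group, values in incomes_by_group.items()}
    everyone = [x for values in incomes_by_group.values() for x in values]
    result["overall"] = median(everyone)
    return result

north_summary = summarize(north)
south_summary = summarize(south)
print("North:", north_summary)
print("South:", south_summary)
```

With these made-up counts, North wins within each category while South wins overall, purely because 80% of the hypothetical North Slope residents are hourly workers versus 20% in the South Slope.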

While Wikipedia has an entry for Simpson's Paradox, a specific example of which I described above, it seems that most people are unaware of it.  My motivation for writing about it is not the made-up example I present above but the fact that I encounter it so much in my everyday work.  I either make my clients very happy by explaining that the 'bad' effect they have found may well be spurious, or anger them when I explain that the interesting relationship they have found is a mere statistical anomaly.

Monday, November 12, 2012

The Worst Graph

One reason for quotes like "there are lies, damn lies, and statistics" is graphs like these:

This was on the front of money.com this morning with the caption: "Huge US Oil Boom ahead: The U.S. will overtake Saudi Arabia to become the world's biggest oil producer before 2020."

I was shocked at first glance, because I thought oil production was going to go up 10 or 20-fold from the tiny amount in 2011 to the huge amount in 2015.  That does indeed sound huge.  Then I looked at the left y-axis, where I can see it is only going from 8 million to 10 million barrels a day, an increase of about 25%.  


Fine, you say, but you can still easily see that the light blue bar is above the dark blue bar starting in 2025, showing the US overtakes Saudi Arabia.  


I'm afraid not.  The two bars are not Saudi Arabia versus US production but oil versus gas production, and it is not even clear whose production is depicted.  Is it the whole world, the US, Saudi Arabia?  The article puts US production at 5.8 million barrels a day in 2011, so it appears not to be US production, but other sources put it at closer to 9 million, so maybe it is the US.  


Ok, you say, despite the poor caption, at least you can clearly see that gas production begins to top oil production (in whatever country the graph is depicting) around 2025.  


Not really.  Since the oil is in millions of barrels per day and the gas is in billions of cubic meters (per day, per month, per year, who knows?), this is actually not the case.  The year 2030 shows oil at about 10 million barrels per day and gas at nearly 800 billion cubic meters.  Which is more?  Maybe the readers of Money can quickly translate these figures into BTUs or some useful measure of production output, but I sure can't tell you.


Fine, you say, but since they start at about the same level, we at least know that gas increases more than oil over the time period.  


Sorry, even that is incorrect.  Look at the scale on the left axis (oil), which starts at 8 and goes to 12, a 50% increase.  The right axis starts at 600 and goes to 800, a 33% increase.  Thus, oil goes from 8 to just over 10 (more than a 25% increase) while gas goes from a little over 600 to just a little under 800 (less than a 33% increase--maybe a little more than oil but maybe not).
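The axis arithmetic is simple enough to check in a few lines. This is only a sketch: the start and end values are the approximate axis readings mentioned above, not data from the article.

```python
def pct_increase(start, end):
    """Percentage change going from start to end."""
    return (end - start) / start * 100

oil_axis = pct_increase(8, 12)      # full span of the left (oil) axis
gas_axis = pct_increase(600, 800)   # full span of the right (gas) axis
oil_growth = pct_increase(8, 10)    # approximate oil reading, first bar to last

print(f"oil axis span: {oil_axis:.1f}%")
print(f"gas axis span: {gas_axis:.1f}%")
print(f"oil growth:    {oil_growth:.1f}%")
```

Because the two axes span different percentage ranges, equal bar heights do not mean equal growth.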


The only thing that appears to be correct about this graph is the year axis, until you realize that the first period covers only four years (2011-2015) while the other periods each cover five.

Friday, September 21, 2012

Election Polls

With the upcoming election, I have been following my favorite prediction site: electoral-vote.com.  That site has a big map showing current predictions state-by-state as well as the overall electoral vote prediction.  It also shows the Senate predictions.  It has been amazingly accurate in the past (though, of course, this doesn't mean that the site's predictions won't change considerably between now and the election).  The predictions are all based on some sort of averaging of polls, and the site shows the results of each poll.  What I have found interesting (and it has been noted on the site) is that some polls appear to lean toward Obama while others lean toward Romney.  In other words, the polls appear to have biases.

Why?  Theories abound about this, and much of it comes down to the polling methodology.  The most compelling reason I have seen comes from Nate Silver's blog on the New York Times site.  Silver's blog compares traditional polls, which call only land-line phones, with more modern polls, which call cell phones along with land-lines.

As shown by a chart in Silver's blog, there is a clear and consistent difference in every swing state between the two types of polls, with modern polls leaning toward Obama.  This is consistent with the idea that younger people are both more likely to vote for Obama and more likely to not have landlines.  This issue has been pointed out before, and a Pew Research Report in 2010 noted substantial differences in party affiliation between voters who had a landline and those who only had a cell phone.

There is no doubt that the percentage of homes without landlines is rising rapidly.  See, for example, the CDC Report from last year, showing that about 30% of adults did not have a landline in 2011, about twice the percentage in 2008.  This increase in wireless-only homes does not necessarily mean an increase in bias (more and more Republicans may be shedding their landlines, and thus the bias could fall even as wireless-only usage increases).  Still, the gap between the two types of polls indicates that a bias persists.



  

Friday, November 11, 2011

Born to Run?

About a year ago, I read a book called "Born to Run," by Christopher McDougall, who last week wrote an article in the New York Times Magazine on the same subject.

McDougall's basic premise is that we were faster and less injury-prone before we started wearing all these fancy running shoes and that they are what's causing running injuries. For example, in the New York Times article:

"Back in the ’60s, Americans 'ran way more and way faster in the thinnest little shoes, and we never got hurt,' Amby Burfoot, a longtime Runner’s World editor and former Boston Marathon champion, said during a talk before the Lehigh Valley Half-Marathon I attended last year. 'I never even remember talking about injuries back then,' Burfoot said. 'So you’ve got to wonder what’s changed.'"

Statistics frowns on such anecdotal evidence, though it does make a good story. Did we really run faster? There are a lot of facts that we can look at, though average times aren't among them. Marathon records (shown in Wikipedia) for men have indeed only downticked a little since the sixties. In 1970, Ron Hill of the UK (close enough, running-shoe-wise, to be considered American?) set a record of 2:09:29. This year, a new record of 2:03:38 was set (the most recent US record was 2:05:38 in 2002). Six minutes in 40 years doesn't seem like much, but is it because of the shoes or because the sport has matured? And are Americans seen less because running isn't really a big competitive sport here?

When you look at women's times, the changes are much more dramatic. Women began running marathons more recently, and fewer participated in the sport in general until relatively recently. In 1970, the women's marathon record was 3:02:53 (set by an American). In 2003, Paula Radcliffe (England) ran it in 2:15:25. That's a 47-minute improvement, or nearly 2 minutes per mile. In the 2011 New York Marathon, 40 women from the US bested the 1970 record time (see marathon site here for results).
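For anyone who wants to check the arithmetic, here is a small sketch using the record times quoted above. The marathon distance constant is the standard 26.2 miles, and the helper name `hms` is just for illustration.

```python
from datetime import timedelta

MARATHON_MILES = 26.219  # standard marathon distance, roughly 26.2 miles

def hms(hours, minutes, seconds):
    """Build a duration from an h:mm:ss finishing time."""
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

# Record times as quoted in the post.
men_gain = hms(2, 9, 29) - hms(2, 3, 38)
women_gain = hms(3, 2, 53) - hms(2, 15, 25)

# Improvement in the women's record, in minutes per mile.
women_per_mile = women_gain.total_seconds() / 60 / MARATHON_MILES

print("men's improvement:  ", men_gain)
print("women's improvement:", women_gain)
print(f"women's improvement per mile: {women_per_mile:.2f} minutes")
```

The men's record improved by a little under six minutes, while the women's improved by about 47 and a half minutes, or roughly 1.8 minutes per mile.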

So, I can't agree that we ran "way faster" 40 years ago. This doesn't mean that barefoot runners are slower than shod runners, because changes over the last forty years in the level of competition, and improvements in training and fitness, rather than shoes, might have been the factors contributing to improved times.

How about injury rates? Do people get more injuries with running shoes than without? Unfortunately, any data on injury rates is tainted by the changes in the makeup of the population that runs (from a small, highly fit population to a larger population more varied in fitness--think of the then-overweight President Clinton running with a stop at McDonald's post-jog), and there haven't been any studies that directly compare injuries over time for barefoot running against running in shoes. A good summary article is here.

A recent article in Nature, while not looking at historical data, supports McDougall's contention that running shoes can be more harmful than bare feet when running. The article is lead-authored by Daniel Lieberman, a big advocate of barefoot running, so his bias may have been to look at things he believed were helpful about barefoot running and not at aspects of barefoot running that may be harmful. The article looks at impact forces and not at injuries, and doesn't consider that runners with shoes may be able to change their stride to reduce the impact forces (McDougall says this is hard to do with running shoes, and, from my own experience, I tend to agree, though I don't think it is impossible).

The statistical net-net is that there is no direct evidence either way right now. I admit some bias, but the lack of evidence, given the power and money behind the shoe industry, tends to make me believe that, at best, fancy shoes are no better than bare feet. If there were an effect in favor of shoes, I would certainly think we'd have seen a study by now (this is something correctly pointed out by McDougall and other advocates of barefoot running). Therefore, don't be surprised if you see me running with feet au naturel someday soon.

Wednesday, March 30, 2011

Detecting cheating

In my professional work, I like being the statistical sleuth, trying to figure out whether a person or company cheated, and how much they cheated. Thus it was with a lot of interest that I read a recent article in USA Today describing suspicious activity surrounding some standardized tests in DC schools.

It seems that standardized test scores at certain DC schools have improved dramatically. For example, the article says, "in 2008, 84% of fourth-grade math students were listed as proficient or advanced, up from 22% for the previous fourth-grade class." Of course, this could just be part of an amazing turnaround.

However, the review found that this dramatic change corresponded with another interesting statistic: the school had a very high number of erased answers that were changed from wrong answers to right answers (WTR erasures). Again, here's what the article said: "On the 2009 reading test, for example, seventh-graders in one Noyes classroom averaged 12.7 wrong-to-right erasures per student on answer sheets; the average for seventh-graders in all D.C. schools on that test was less than 1. The odds are better for winning the Powerball grand prize than having that many erasures by chance."

Here's my problem with this logic: the calculation of the chances assumes that each student is acting independently and erasing much more than usual. In other words, the chances are calculated assuming that the students are randomly grouped by school with respect to the number of WTR erasures they have, and thus no school should have a particularly high or low number of erasures: number of erasures and the associated school would be statistically independent.

This statistical independence assumption falls apart if there is cheating, wherein teachers erase wrong answers and change them to correct answers after the test is completed. However, the statistical independence assumption could also fall apart for innocuous reasons.

Suppose the students at this school were instructed to arbitrarily fill in the last 10 questions immediately upon beginning the exam (this might be a good strategy if there is no penalty for guessing and if many students do not finish the exam). Then, the ones who get to the end of the test are erasing most of their guesses. This is a completely legitimate strategy, but it would raise the number of WTR erasures a great deal. A lot of more complicated test-taking strategies would also lead to more erasures, and if this school in particular taught those strategies, there would be a very high chance of far more erasures at this school than at others. Some of the people interviewed cited strategies that may have led to more erasures.
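To get a feel for how large this effect could be, here is a back-of-the-envelope sketch. Every parameter is a hypothetical assumption, not something reported in the article: 10 pre-filled questions, 4 answer choices, a student who reaches all 10 and works each one correctly 80% of the time, and wrong answers picked uniformly among the wrong options.

```python
from fractions import Fraction

def expected_erasures(n_prefilled, n_choices, p_correct):
    """Expected wrong-to-right and wrong-to-wrong erasures per student.

    Assumes (hypothetically) that the student pre-fills n_prefilled questions
    uniformly at random, later reaches all of them, and re-marks an answer
    only when the worked answer differs from the original guess.
    """
    p_guess_wrong = Fraction(n_choices - 1, n_choices)
    # Guess was wrong and the worked answer is right: wrong-to-right.
    wtr = n_prefilled * p_guess_wrong * p_correct
    # Guess was wrong, the worked answer is also wrong, and it is a different
    # wrong option (chosen uniformly among the wrong ones): wrong-to-wrong.
    p_different_wrong = Fraction(n_choices - 2, n_choices - 1)
    wtw = n_prefilled * p_guess_wrong * (1 - p_correct) * p_different_wrong
    return wtr, wtw

wtr, wtw = expected_erasures(10, 4, Fraction(4, 5))
print(f"expected WTR erasures per student: {float(wtr):.1f}")
print(f"expected WTW erasures per student: {float(wtw):.1f}")
```

Under these made-up assumptions, an entirely honest student would average about 6 WTR erasures and 1 WTW erasure. The point is only that a coached strategy can push erasure counts far above a district-wide average of less than 1 without any tampering.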

Thus, the high erasure rate, even WTR erasures, may have a relatively simple explanation: this school effectively coached the kids in test taking while other schools did not or coached the children differently.

The article provides a link to several documents summarizing the results of the analysis. What I find interesting is that the worst school in terms of WTR erasures, BS Monroe ES, also has a lot of WTW (wrong-to-wrong) erasures. On average, this school has between 2 and 3 WTW erasures per student, or about 1 WTW for every 5 WTR erasures. A more interesting, and I think more revealing, analysis would be to see how this ratio compares to the normal ratio. If the normal ratio is 1 WTW to 5 WTR, it indicates cheating may not have been the reason for the erasures (unless the cheaters were purposefully erasing some and changing them to wrong answers--which seems unlikely since there is no indication potential cheaters realized erasures could be detected at all). If the general ratio is far from 1 to 5, it would be another indicator of a different process going on at BS Monroe ES, perhaps involving cheating, though it is still hard to rule out other, innocuous explanations that involve test-taking strategy.

Another analysis would be to look at the WTR vs. WTW erasures student by student. Presumably, students who answered a higher percentage of un-erased problems correctly would have a better ratio of WTR to WTW erasures. If that were not true, then it would lead more clearly to the conclusion that someone else was doing the erasing.

The research revealed in the article shows the correlation of two things: a dramatic increase in test scores and a dramatic number of WTR erasures. Cheating is one explanation for these increases. Another, however, is the implementation of a smart test-taking strategy at the school, which might well be part of an overall program to increase the test scores and improve the school. A statistical test can have a seemingly dramatic result (less likely than winning the lottery), but while defeating a specific hypothesis (independence of erasures by school), it doesn't necessarily prove another hypothesis (cheating).




Tuesday, September 28, 2010

Throw away your cold medicine again?

A couple years ago, I wrote about a study that looked at the effect of a seawater nasal spray on the health of children (see that post).
Yesterday's New York Times explored a very similar claim. Anahad O'Connor's column, "Really? The Claim: Gargling With Salt Water Can Ease Cold Symptoms," looks at a study of 387 Japanese adults aged 18 to 65 (see this page for an abstract). Treatment groups gargled with PLAIN water or a "povidone-iodine" solution. Those gargling with plain water did the best, with 0.17 URTIs (upper respiratory tract infections) every 30 person-days, meaning about 1 in 6 get a URTI per month if they gargle with water. The control group had a rate of 0.26, meaning about 1 in 4 got a URTI. The iodine group had a rate of 0.24, also meaning about 1 in 4 got a URTI.
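The "1 in N" figures are just the reciprocals of the reported rates. A quick sketch follows, treating a rate per 30 person-days as a rough monthly chance (an approximation on my part, not something the study states):

```python
# URTI rates per 30 person-days, as quoted from the study abstract.
rates = {"water": 0.17, "povidone-iodine": 0.24, "control": 0.26}

# Reciprocal of each rate: roughly "1 in N people per month".
one_in = {group: 1 / rate for group, rate in rates.items()}
for group, n in one_in.items():
    print(f"{group}: about 1 in {n:.1f} per 30 person-days")

# Reduction in the water group's rate relative to the control group.
relative_reduction = (rates["control"] - rates["water"]) / rates["control"]
print(f"water vs. control: {relative_reduction:.0%} lower rate")
```

On these figures, the water group's rate is about a third lower than the control group's.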

So water looks pretty good. The only caveat, and it is the same as the issue I mentioned in the earlier post, is that the outcomes were self-measured. The people doing the gargling reported whether or not they had a URTI. In Japan, where the study was performed, there is a strong bias toward water gargling, at least according to the abstract of the study, which says: "Gargling to wash the throat is commonly performed in Japan, and people believe that such hygienic routine, especially with gargle medicine, prevents upper respiratory tract infections (URTIs)." In fact, the article reports that those in the control group gargled one time a day on average as well (but those in the treated group gargled around 3 times a day). This affinity for water gargling and the belief that it stops infection may result in water garglers reporting fewer infections, thus throwing the results of the study into question.

The New York Times, by the way, gives recommendations based on an upcoming book by Philip Hagen, to gargle with *salt* water, but cites this study, which is referring to *plain* water only.

My conclusion? If you THINK it is going to work, it's fairly likely water gargling will be effective, and it is a lot cheaper than buying some kind of preventative medicine. If you don't think it will work, this study provides little help in deciding whether it actually will work.

Monday, March 15, 2010

You asked for it, you got it. Toyota!

I think that's how the ad line went. When? Maybe 25 years ago.

Well, it seems to apply now. Sudden acceleration. Mention a problem with a car, any problem with any car, and people will start crawling out of the woodwork with the complaint. Why? It's a numbers game. There were more than 100,000 pri-i(?) sold in the US in 2005-9. With that many people driving them around, any tiny problem that is reported is going to be "substantiated" by others. Those of us old enough to remember the Audi 5000 found the high correlation between those Audis with sudden acceleration and those sold to 85-year-old ladies inexplicable (studies mostly concluded it was driver error--see a recent article here in Wired).

The latest, after the brake-related Prius recall, is the claim of sudden acceleration. A guy in California managed to call 911 while it was happening--pretty amazing, huh? Unless, of course, you made it all up. Here's what the current thoughts about it are (from wikipedia):
"On March 8, 2010, a 2008 Prius allegedly uncontrollably accelerated to 94 miles per hour on a California Highway (US), and the Prius had to be stopped with the verbal assistance of the California Highway Patrol as news cameras watched [86]. Subsequent to the event, media investigations uncovered suspicious information about the alleged runaway Prius driver, 61-year old James Sikes, including false police reports, suspect insurance claims, theft and fraud allegations, television aspirations, and bankruptcy.[87][88] Sikes was found to be US$19,000 behind in his Prius car payments and had $US700,000 in accumulated debt.[87] Sikes stated he wanted a new car as compensation for the incident.[87][89] Analyses by Edmunds.com and Forbes found Sikes' acceleration claims and fears of shifting to neutral implausible, with Edmunds concluding that "in other words, this is BS",[90] and Forbes comparing it to the balloon boy hoax.[88]"

Notwithstanding the apparent CA tale above, the reality is that the rare problem is a tough nut to crack statistically. Suppose there is an issue in 1 in 10,000 Prius cars and that this issue only crops up on 1 in 10,000 rides in those cars. Thus, it affects 1 in 100 million Prius rides. Even among those, it may be a very short-lived problem and not cause any injury or accident. Such a rare problem might be drowned out by other driver error problems, such as accidentally hitting the gas instead of the brake, perceiving that the car is accelerating when it is not, or hitting both the gas and the brake simultaneously in an attempt to hit the brake. Each of these things can be exceedingly rare (1 in a million) and still be 100 times as common as the real problem.
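Here is the same arithmetic spelled out, using the purely hypothetical rates from the paragraph above (none of these are measured figures):

```python
from fractions import Fraction

# Hypothetical rates from the text, not measured values.
p_car_affected = Fraction(1, 10_000)    # share of cars with the defect
p_ride_triggers = Fraction(1, 10_000)   # share of rides on which it shows up
p_driver_error = Fraction(1, 1_000_000) # per-ride chance of one kind of driver error

# Chance that a randomly chosen ride hits the real defect.
p_defect_per_ride = p_car_affected * p_ride_triggers

# How many driver-error incidents to expect for every real defect incident.
ratio = p_driver_error / p_defect_per_ride

print(f"real defect: 1 in {p_defect_per_ride.denominator:,} rides")
print(f"driver error is {ratio}x as common as the real defect")
```

With these numbers, any investigation has to sift through roughly 100 driver-error reports for every genuine one.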

There are other ways to go about teasing out rare events. In the lab, a machine could possibly simulate conditions that were occurring when the supposed sudden acceleration took place and see if it is repeatable. Yet these conditions are hard to figure out, as they are determined with the imperfect information of the person reporting the incident. As might be the case with the recent report, that person could be lying, but even if not, they are likely shaken up enough that they cannot remember the exact conditions very well. Consider airline crashes, where we often have very objective information (the black box), but it is still very difficult to figure out what happened and why.

One thing seems certain to be true: we won't know whether or not Prius cars are at fault for a long time to come, and far fewer of them will be bought in the next couple years.