Stories tagged statistics

Nov 24, 2010

Gaze into the crystal ball to see what your future holds...: That's right, it holds a whole bunch of JGordon. Deal with it. (Courtesy mararie)
So…

So, Buzzketeers, you’ve been keeping something from me.

I thought we had something. I thought that we had a solid relationship built on trust, like… like a really nice but not fancy bungalow built on bedrock that doesn’t lie to you, or withhold information. Yeah, sure, I sometimes mislead you, or pass on scientifically suspect material, but that’s different. That’s for my own entertainment.

And don’t act like you don’t know what I’m going to say. That’s the whole thing, isn’t it?

Yeah, I heard. I heard that y’all can predict the future. And you didn’t tell me. Me, JGordon.

When were you going to mention that the secret’s out, and that a Cornell University scientist had demonstrated a small but statistically significant propensity for people (you) to predict the future? Were you going to wait until after the holidays, so as not to spoil the “surprise” of my gifts to you? Whatever. I got you all cashmere scarves. I thought you’d really like them. Surprise.

I hear that you were shown a list of words, and were able to recall mostly words from the list that would later be randomly selected by a computer. That’s super neat. Thanks for telling me about it.

I hear, too, that you were able to correctly predict, 53 percent of the time, when a curtained computer screen would have a sexy image on it. Cool. Maybe if you spent less time looking at sexy pictures, our relationship wouldn’t be going the way it clearly is.

Three percentage points above a 50-50 chance may not seem like much, but we both know it’s significant. Nearly as significant as the fact that you never mentioned it to me.
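Whether three points over chance is “significant” depends mostly on how many guesses were made. Here’s a minimal sketch of an exact one-sided binomial test, assuming every guess is a fair 50-50 shot. The trial counts are made up for illustration; they are not the study’s actual numbers.

```python
from math import comb


def fair_guess_p_value(trials: int, hits: int) -> float:
    """One-sided exact binomial test for 50-50 guessing: the chance of scoring
    at least `hits` out of `trials` by luck alone."""
    lucky_ways = sum(comb(trials, k) for k in range(hits, trials + 1))
    return lucky_ways / 2**trials  # exact integer ratio, rounded once to a float


# The same 53 percent hit rate at two hypothetical sample sizes.
for trials, hits in ((100, 53), (1_000, 530)):
    print(trials, hits, round(fair_guess_p_value(trials, hits), 3))
```

With 100 guesses, a 53 percent score turns up by luck roughly 3 times in 10. With 1,000 guesses, it happens only about 3 times in 100, below the usual 5 percent cutoff. Same hit rate, very different verdict.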

Do you know how that feels? I’ll show you how it feels:

1: Tomorrow, most of you American Buzzketeers will eat turkey.

2: Tomorrow, one of you will be eaten by a turkey, or turkeys.

3: Prince Philip will say something of questionable taste at his grandson’s wedding, probably to a woman or a foreign dignitary.

4: Your dad has a secret family in another state.

How do you like them apples? It’s not very fun, is it?

In any case, while the study stood up to the careful peer review of The Journal of Personality and Social Psychology, it’s probably all statistical wheeling and dealing, and I’d like to think that we can get over this.

I trust you’d tell me if you knew we wouldn’t.

Sep 27, 2009

H1N1 vaccination (Courtesy AJC1)

How do I know it is safe?

"The recurring question is, 'How do we know it's safe?'" said Dr. Gregory Poland of the Mayo Clinic. What if, after getting a flu shot, a person goes home and then suddenly has a heart attack? Was the heart attack a side effect of the flu shot?

More than 3,000 people a day have a heart attack, even when no flu shots are given. And with no flu shots given, 14,000 to 19,000 miscarriages happen every week.

When we start giving flu shots to hundreds of millions of people, how do we tell side effects caused by the vaccination apart from what would have happened even without it?
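One way to frame the question is to ask how many heart attacks would show up soon after a shot purely by coincidence. This is a back-of-the-envelope sketch that assumes the vaccine does nothing at all and that heart attacks strike vaccinated and unvaccinated people at the same rate. The population and vaccination figures are rough assumptions, not numbers from the article.

```python
def expected_coincidences(daily_events: int, population: int,
                          vaccinated: int, window_days: int) -> float:
    """Events expected among vaccinated people within `window_days` of their shot
    purely by chance, assuming the vaccine has no effect on the event rate."""
    return daily_events * vaccinated * window_days / population


# 3,000 heart attacks a day (from the text); a population of roughly
# 300 million, 100 million people vaccinated, and a one-week window (assumed).
print(expected_coincidences(3_000, 300_000_000, 100_000_000, 7))  # 7000.0
```

Under these assumptions, even a perfectly harmless vaccine would be followed within a week by thousands of heart attacks. That is why monitoring has to compare rates, not just count cases.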

Intensive monitoring of side effects planned

This year there will be intense new monitoring.

Harvard Medical School scientists are linking large insurance databases that cover up to 50 million people with vaccination registries around the country for real-time checks of whether people see a doctor in the weeks after a flu shot and why. The huge numbers make it possible to quickly compare rates of complaints among the vaccinated and unvaccinated, said the project leader, Dr. Richard Platt, Harvard's population medicine chief.
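Comparing the vaccinated and unvaccinated comes down to comparing rates. The sketch below computes a risk ratio (the rate among the vaccinated divided by the rate among the unvaccinated) with an approximate 95 percent confidence interval from the usual log-scale formula. The counts are hypothetical, and this is not a description of the project’s actual statistical methods.

```python
from math import exp, log, sqrt


def risk_ratio_ci(cases_vax: int, n_vax: int, cases_unvax: int, n_unvax: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Risk ratio (vaccinated vs. unvaccinated) with an approximate confidence
    interval computed on the log scale; z = 1.96 gives about 95 percent."""
    rr = (cases_vax / n_vax) / (cases_unvax / n_unvax)
    se = sqrt(1 / cases_vax - 1 / n_vax + 1 / cases_unvax - 1 / n_unvax)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)


# Hypothetical: 120 doctor visits per 100,000 vaccinated vs. 100 per 100,000 unvaccinated.
rr, low, high = risk_ratio_ci(120, 100_000, 100, 100_000)
print(f"risk ratio {rr:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

With these made-up numbers the ratio is 1.20, but the interval runs from about 0.92 to 1.56. Because it includes 1, a bump this size could easily be chance. Bigger databases help precisely because larger counts shrink that interval.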

Johns Hopkins University will direct e-mails to at least 100,000 vaccine recipients to track how they're feeling, including the smaller complaints that wouldn't prompt a doctor visit. If anything seems connected, researchers can call to follow up with detailed questions.

The Centers for Disease Control and Prevention is preparing take-home cards that tell vaccine recipients how to report any suspected side effects to the nation's Vaccine Adverse Event Reporting System.

However the flu season turns out, the extra vaccine tracking promises a lasting impact.

"Part of what we hope is that it will teach us something about how to monitor the safety of all medical products quickly," said Harvard's Platt.

Source: Associated Press

Some numbers are more random than others

If you asked 100 people to choose a "random" digit from 0 through 9, you might expect no digit to be picked more often than any other. But studies show that people pick "7" six times more often than "5".

In the Iranian election, voters in 29 provinces chose among 4 candidates, yielding 116 vote totals. The last digits of these totals (how many votes each candidate got in each province) should have been spread randomly across the numerals 0 through 9.

They did not.

About six times as many totals ended in 7 as ended in 5.
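A standard way to ask “are these last digits too lopsided to be random?” is a chi-square goodness-of-fit test against an even spread over 0 through 9. This is a sketch of that general technique, not necessarily the method Cognitive Daily or the original analysts used. The vote totals in the example are invented.

```python
from collections import Counter

# Critical value of the chi-square distribution with 9 degrees of freedom
# at the 5 percent significance level, taken from standard tables.
CHI2_CRIT_9DF_05 = 16.919


def last_digit_chi_square(totals: list[int]) -> float:
    """Chi-square statistic for 'last digits are spread evenly over 0-9'."""
    counts = Counter(t % 10 for t in totals)
    expected = len(totals) / 10  # e.g. 11.6 of each digit for 116 totals
    return sum((counts[d] - expected) ** 2 / expected for d in range(10))


# Invented data, not the real returns: 116 totals with evenly spread last
# digits, and 116 totals that all end in 7.
samples = {
    "even": list(range(1_000, 1_116)),
    "all sevens": [10 * k + 7 for k in range(116)],
}
for name, totals in samples.items():
    stat = last_digit_chi_square(totals)
    verdict = "too lopsided for chance" if stat > CHI2_CRIT_9DF_05 else "consistent with chance"
    print(f"{name}: chi-square {stat:.1f}, {verdict}")
```

With only 116 numbers, chance alone produces some lumpiness. A formal test says how much lumpiness is too much, which eyeballing the counts can’t.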

This post in Cognitive Daily has a nice analysis of why the Iranian election is probably fraudulent.

Feb 28, 2009

The hounds of spring are on winter’s traces: It appears that winter and summer temperatures are yoked together. (This photo is for you, Thor. I'm a cat person myself. Title explained here.) (Courtesy sbpoet)

Last week, Candace asked whether there’s a connection between winter temperatures and summer temperatures. She noted that the winter of 2007-08 was pretty cold by recent standards, and the following summer was cool as well. Is there something going on here?

Liza searched the Web but couldn’t find anything definitive. I (after pooh-poohing the idea that this has been a warm winter – if you want to pay my heating bill, you’re welcome to it!) decided to crunch the numbers.

First, I went to this site. It records monthly average temperatures going back to December 1978.

Actually, it records temperature anomalies – whether the observed temperature in a given month is higher or lower than the average. They use the 20 years of 1979-1998 as their baseline. A reading of 1.00 means the temperature was 1 degree Centigrade (1.8 °F) warmer than the baseline average. (One degree may not sound like much, until you realize it means 1 degree warmer every minute of every hour of every day. It quickly adds up to a lot of heat.) A reading of -1.00 means it was 1 degree cooler.
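Computing an anomaly is just subtraction. You take the average over the baseline years, then measure every year against it. Here’s a minimal sketch for a single calendar month (the real dataset presumably does this separately for each month). The temperatures are made up.

```python
from statistics import mean


def anomalies(temps_by_year: dict[int, float],
              base_start: int = 1979, base_end: int = 1998) -> dict[int, float]:
    """Anomaly = observed temperature minus the average over the baseline years."""
    baseline = mean(t for year, t in temps_by_year.items()
                    if base_start <= year <= base_end)
    return {year: t - baseline for year, t in temps_by_year.items()}


# Hypothetical January averages in °C for three years.
print(anomalies({1979: -10.0, 1998: -8.0, 2008: -7.0}))
# {1979: -1.0, 1998: 1.0, 2008: 2.0}
```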

Using temperature anomalies is good for this exercise, as it removes the effects of global warming. Because global temperatures rose during the period under study, a “warm” winter in the late ‘70s might be considered only “average” today. In fact, that’s exactly what happened last winter. It was almost perfectly average by historical standards, but because recent winters have been so much warmer, it felt cold to us.

Getting back to Candace’s question: does a warm winter predict a warm summer? To answer this question, we have to calculate my all-time favorite statistical formula, the coefficient of correlation!

It sounds like a mouthful, but it’s a pretty easy concept to grasp. The coefficient of correlation measures how tightly two sets of numbers go together. For example, if you surveyed 100 people and asked each one what year they were born and how old they were, you would find that everyone born in 1990 was the same age, give or take a birthday. The first number (year of birth) and the second number (age) are almost perfectly linked.

OTOH, if you asked those people for the last digit in their telephone number, you would find no relationship whatsoever. A person born in 1990 is just as likely to have a phone number ending in 9 as ending in any other number, and the same goes for people born in every year.

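Calculating the coefficient of correlation (or “coco,” as I affectionately call her) requires wading through a truly horrific battery of equations, all to arrive at a number between -1 and 1. A coefficient of 1 means the two sets of numbers are perfectly synched together, rising and falling in step; a coefficient of -1 means they are perfectly synched in opposite directions (like year of birth and age: the later you were born, the younger you are); a coefficient of 0 means there is no connection whatsoever.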
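The “truly horrific battery of equations” actually fits in a few lines of code. This sketch assumes the standard Pearson coefficient of correlation (the post doesn’t name a variant). It measures how much the two lists vary together, scaled by how much each one varies on its own.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items each")
    mx, my = sum(xs) / n, sum(ys) / n
    # How the two lists move together around their averages...
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # ...divided by how much each one spreads out on its own.
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


print(pearson_r([1950, 1970, 1990], [58, 38, 18]))  # birth years vs. ages
```

Birth years and ages come out at -1, since age falls as birth year rises. On Python 3.10 or later, `statistics.correlation` computes the same Pearson coefficient.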

So, I went back to the temperature data. First, I defined “winter” the same way the weather bureau does: December, January and February, the three coldest months of the year. (None of that solstice-equinox nonsense here!) I defined “summer” as the three warmest months: June, July and August, again following weather bureau standards. Using the Northern Hemisphere Land figures (sorry, they didn’t have anything Minnesota-specific), I came up with an average anomaly for every winter and every summer. I crammed the numbers into the formula, turned the crank, and came up with a coefficient of…
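Turning monthly anomalies into seasonal ones takes one bit of care: a winter straddles New Year’s. Here’s a sketch that assumes the data arrive as a dict keyed by (year, month) and labels each winter by the year its January falls in. That labeling is a convention chosen here for illustration, and the values are hypothetical.

```python
from statistics import mean


def season_means(monthly: dict[tuple[int, int], float]) -> tuple[dict[int, float], dict[int, float]]:
    """Average the Dec-Jan-Feb (winter) and Jun-Jul-Aug (summer) anomalies.

    `monthly` maps (year, month) to an anomaly. Each winter is labeled by the
    year its January falls in, so winter 2008 = Dec 2007 + Jan 2008 + Feb 2008.
    Seasons with a missing month are skipped.
    """
    winters: dict[int, float] = {}
    summers: dict[int, float] = {}
    for y in {year for year, _ in monthly}:
        djf = [(y - 1, 12), (y, 1), (y, 2)]
        jja = [(y, 6), (y, 7), (y, 8)]
        if all(key in monthly for key in djf):
            winters[y] = mean(monthly[key] for key in djf)
        if all(key in monthly for key in jja):
            summers[y] = mean(monthly[key] for key in jja)
    return winters, summers


def winter_summer_pairs(winters: dict[int, float],
                        summers: dict[int, float]) -> tuple[list[float], list[float]]:
    """Line up each winter with the summer that follows it (same label year)."""
    years = sorted(set(winters) & set(summers))
    return [winters[y] for y in years], [summers[y] for y in years]


# Hypothetical anomalies for one winter and the summer after it.
monthly = {(2007, 12): -0.3, (2008, 1): -0.6, (2008, 2): -0.3,
           (2008, 6): 0.1, (2008, 7): 0.2, (2008, 8): 0.3}
winters, summers = season_means(monthly)
print(winter_summer_pairs(winters, summers))
```

The two lists that come out are what go into the correlation formula.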

(drum roll, please)

0.71

OK, now what does that mean?

Well, in general, a score below 0.30 is considered inconclusive. It’s too close to zero—the “relationship” could just be random. A score between 0.30 and 0.50 is generally considered moderate—there’s a connection there, but it’s somewhat weak. A score over 0.50 is generally considered strong—there’s definitely something important going on there.

(This is especially true in highly complex systems, like weather, where a lot of different factors can affect your results. In a very simple system, you’d probably want a result much closer to 1.)

It all boils down to this: we can be more than 99% certain that, yes, there is a connection between a warm winter and a warm summer, or a cold winter and a cool summer. How much of a connection? For that, we need another figure, the coefficient of determination.
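A figure like “more than 99% certain” comes from a significance test. A common one converts r into a t statistic. The sketch below assumes about 30 winter/summer pairs (roughly 1979 through 2008), since the post doesn’t give the exact count. It then compares the result against the two-tailed 1 percent critical value for 28 degrees of freedom.

```python
from math import sqrt

# Two-tailed critical t value at the 1 percent level, 28 degrees of freedom
# (n - 2 for n = 30 pairs), taken from standard t tables.
T_CRIT_01_DF28 = 2.763


def correlation_t(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson r from n pairs differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


t = correlation_t(0.71, 30)  # 30 pairs is an assumption, not a figure from the post
print(f"t = {t:.2f}; beats the 1% cutoff: {t > T_CRIT_01_DF28}")
```

Under those assumptions t comes out around 5.3, comfortably past 2.763.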

This one is much easier. Just square the coefficient of correlation: 0.71 squared is 0.5041. That means about 50% of the variability in summer temperatures is accounted for by winter temperatures.
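Why does squaring work? Suppose you fit a straight line that predicts summer anomalies (y) from winter anomalies (x). The square of the correlation then equals the share of the year-to-year spread in y that the line accounts for. Here y-hat is the line’s prediction for year i and y-bar is the average summer anomaly:

```latex
r^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
\qquad\text{so}\qquad 0.71^2 = 0.5041 \approx 50\%
```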

And “variability” is the key. Like I said, weather is an extremely complex system. Lots of things can affect the temperature for a day, a week, even a season. The fact that this winter is warmer than last winter does not guarantee that this coming summer will be warmer than last summer. (For example, the winter of 2003-04 was one of the warmest on record, but the following summer was one of the coolest in the study period.)

What this number does mean is that, of all the variation in next summer’s temperatures, about half seems to be connected to winter temperatures. And this winter was warmer than last winter.

Just for fun, I also ran the calculations the other way, to see if a warm summer predicts a warm winter. The coefficient of correlation was 0.54, and the coefficient of determination was 0.29. So, again, there is a connection, but it seems to be a good deal weaker.
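Running the numbers “the other way” mostly means changing which seasons get paired: each summer is matched with the winter that follows it rather than the one before. Here’s a sketch that assumes winters are labeled by the year their January falls in, so the winter after summer 2008 is labeled 2009. The example values are hypothetical.

```python
def summer_then_winter_pairs(winters: dict[int, float],
                             summers: dict[int, float]) -> tuple[list[float], list[float]]:
    """Pair summer y with winter y + 1 (December y through February y + 1)."""
    years = sorted(y for y in summers if y + 1 in winters)
    return [summers[y] for y in years], [winters[y + 1] for y in years]


print(summer_then_winter_pairs({2008: -0.4, 2009: 0.1}, {2008: 0.2, 2009: 0.3}))
# ([0.2], [0.1])
```

Summer 2009 drops out because its following winter isn’t in the data yet. The resulting lists go into the same correlation formula as before.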

A word of caution: one thing statisticians like to say is “correlation is not causation.” Partly because it’s fun to say, but mostly because it’s true. Just because two things are correlated does not mean one causes the other. We have not proven that warm winters cause warm summers. It could be that winter temps and summer temps are both boosted by some other factor – El Niño, perhaps. All we can say is that there is some sort of connection going on, and that it probably wouldn’t hurt to lay in some tanning cream now.

Jan 29, 2009

A new analysis of passenger survival rates aboard the Titanic reveals interesting cultural differences. Behavioral economists David Savage and Bruno Frey found that British passengers had a 10 percent lower chance of survival than any other nationality aboard the Titanic. These findings contradicted their original hypothesis:

"The Titanic was built in Great Britain, operated by British subjects, and manned by a British crew. It is to be expected that national ties were activated during the disaster and that the crew would give preference to British subjects, easily identified by their language."

The survival rate for American passengers was 12 percent higher than that of British passengers. Americans reportedly fought their way to the lifeboats, whereas the British politely waited in line.

Savage and Frey's analysis revealed other disparities in survival rates. Women were more likely to survive than men, and children (aged 15 or younger) were more likely to live than elderly passengers.

"Be British, boys, be British!" the captain, Edward John Smith, shouted out, according to witnesses.

The Titanic captain referred to the social norm of putting women and children first. The passenger survival data suggests that this did occur.

You can learn a lot more about one of the worst maritime disasters in history. The Science Museum of Minnesota is hosting an exhibit on the Titanic opening this summer.

I was very excited to listen to Barack Obama's inauguration address and hear him speak the words “science,” “data,” and “statistics” with pride and emphasis. We will keep a watchful eye over the next four years to make sure that science policy adheres to the agenda and principles that our new president has set out.

Nov 2, 2008

So, I open up my web browser this weekend to check the news, and I see the following three polls, all on the same page:

  • Rasmussen: Obama up by 5 points
  • Gallup: Obama up by 10 points
  • Zogby: McCain up by 1 point

These can’t all be right, can they?

Actually, they can. Or, at least, they can all be properly conducted, and just lead to wildly different results.

The only way to get a perfect result is to interview everyone in the country. (In fact, that’s exactly what we do on Election Day.) But that takes so much time and money that no individual pollster can do it. Instead, they interview several hundred people, maybe a couple thousand, and from there extrapolate what the country as a whole will do.

Now, mathematically, you can do this. You just can’t be sure of your answer. Here are a few of the reasons why.

Margin of error

Most opinion polls will state the margin of error. For example, a poll may report that Candidate X is ahead by 5 points, with a margin of error of plus-or-minus 3 points. That means the real answer could be as high as 8 points or as low as 2 points.
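Where does a “plus-or-minus 3 points” come from? For a simple random sample, the textbook margin of error for a single percentage depends mainly on how many people were interviewed. This sketch assumes simple random sampling. Real polls also adjust for weighting and design, which tends to widen the margin.

```python
from math import sqrt


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error for one percentage from a simple random sample of size n.

    z = 1.96 corresponds to 95 percent confidence, z = 1.645 to 90 percent.
    p = 0.5 is the worst case, the figure pollsters usually quote.
    """
    return z * sqrt(p * (1 - p) / n)


for n in (500, 1_000, 2_000):
    print(n, f"+/- {100 * margin_of_error(n):.1f} points")
```

That works out to roughly 4.4, 3.1, and 2.2 points. Quadrupling the sample only halves the margin. The quoted margin usually applies to each candidate’s share. The margin on the lead between two candidates is larger, up to about twice as large.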

(Sometimes, the margin of error is actually larger than the result. The poll shows Candidate X leading by 2 points, but with a margin of error of 4 points. Meaning, he could be ahead by 6, or he could actually be behind by 2! This seems to have happened a lot this year.)

A range of a few percentage points, when applied to a country with over 100 million voters, can lead to some pretty huge differences.

Confidence interval

In addition to reporting a margin of error, polls also report a confidence level for it, usually 90% or 95%. A 95% level means that, according to the laws of mathematics, if the same poll were repeated many times, its result would land within the margin of error of the real answer about 95% of the time.
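The 95% is easiest to see by simulation. Invent an electorate where the true answer is known, run many honest polls, and count how often the margin of error actually captures the truth. Here’s a sketch that assumes simple random sampling and a made-up true support of 52 percent.

```python
import random
from math import sqrt


def coverage(true_p: float, n: int, polls: int, z: float = 1.96, seed: int = 1) -> float:
    """Fraction of simulated polls whose estimate +/- margin of error contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(polls):
        # Interview n random voters; each supports the candidate with probability true_p.
        yes = sum(rng.random() < true_p for _ in range(n))
        est = yes / n
        moe = z * sqrt(est * (1 - est) / n)
        if est - moe <= true_p <= est + moe:
            hits += 1
    return hits / polls


# Hypothetical: 52% true support, 1,000 respondents per poll, 2,000 polls.
print(coverage(0.52, 1_000, 2_000))
```

The result lands close to 0.95. In other words, roughly 1 honest poll in 20 misses the truth by more than its margin of error through sheer bad luck.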

But what about the other 5% or 10% of the time? Well, the folks reporting the numbers don’t like to tell you this, but, mathematically speaking, a poll can do everything right and still miss the real result by more than its margin of error as much as 10% of the time.

There have been over 700 polls released this election season, and over 200 just in October. No doubt, many of the polls you have heard about fall into this category.
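Here’s a rough count of the expected misses at 95% and 90% confidence. It assumes all 700 polls were run flawlessly and each had its own independent chance of missing. Tracking polls that reuse respondents bend that assumption.

```latex
700 \times (1 - 0.95) = 35 \qquad 700 \times (1 - 0.90) = 70
```

So even flawless polling would have produced dozens of misses this season.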

Weighting

In most elections, more women vote than men. If you conduct a survey and talk to 100 men and 100 women, you are going to have to give the women’s answers more weight to accurately reflect the Election Day results.
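Here’s a minimal sketch of that reweighting (often called post-stratification). It assumes you have already guessed each group’s share of Election Day voters. All the numbers are hypothetical.

```python
def poststratify(groups: dict[str, tuple[int, float, float]]) -> tuple[dict[str, float], float]:
    """Reweight survey groups to their assumed share of actual voters.

    `groups` maps a name to (respondents, share of them backing Candidate X,
    assumed share of the real electorate); the assumed shares should sum to 1.
    Returns each group's per-respondent weight and the weighted support.
    """
    total = sum(n for n, _, _ in groups.values())
    weights = {name: target / (n / total) for name, (n, _, target) in groups.items()}
    support = sum(target * backing for _, backing, target in groups.values())
    return weights, support


# 100 men and 100 women interviewed; guess that women will be 54% of voters.
weights, support = poststratify({"women": (100, 0.60, 0.54), "men": (100, 0.40, 0.46)})
print(weights, round(support, 3))
```

Each woman’s answer counts 1.08 times and each man’s 0.92 times. Unweighted, Candidate X sits at 50 percent; weighted, 50.8 percent. Guess that women will be 52 percent of voters instead, and the very same interviews give 50.4 percent, so the answer moves with the turnout guess.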

How much more weight? That depends. Do you think this election will be pretty much the same as previous years? Is there something happening this year that will make a lot more women come out to vote? Or, perhaps, something that will attract a lot more men?

The fact is, nobody knows. Weighting is just educated guesswork. And this year, it is more complicated than usual:

  • Black voters are expected to come out in record numbers to support Barack Obama. How many will actually vote? Nobody knows.
  • Young people generally do not vote as much as other groups. But many analysts expect more young people to vote this year. How many more? Nobody knows.
  • Democrats, having lost the last two elections, are likely to turn out in larger numbers. How much larger? Nobody knows.
  • New voters. There has been a great push to register new voters, many of them poor or members of minority groups. These groups tend to vote Democratic. But the NY Times reports that as many as 60% of those registrations may be fake. If you are a poll taker, you really have no idea how many new voters there actually are.
  • Likely voters. Every pollster ends up talking to some people who will not vote on Election Day. Different polls use different methods of figuring out who is likely to actually vote—based on whether they voted last time, how much interest they have in the election, or just taking the voter’s word for it.

The different weighting factors used by the different polls probably account for most of the variability we see in the results.

Human factors

Let’s face it – humans are complicated and sometimes uncooperative beings. There are lots of ways they can foul up a perfectly good poll.

  • Lying. People have a tendency to tell a pollster what they think he wants to hear. Maybe they just want to be nice; maybe they want to avoid an argument. This skewing has been found in many, many types of polls.
  • Refusal. In any poll, a certain number of people refuse to participate—they don’t want to be bothered, or they don’t want to talk politics with a stranger. If refusals are more likely to come from one party than the other, this can skew results.
  • Hard-to-reach folks. For years, pollsters called people on the telephone. But today more and more people have cell phones, or call screening, and are hard to reach. Again, if people with such gadgets are more likely to support one candidate or the other, it will skew the results. (One blogger has noticed that John McCain does better in polls conducted during the week than on the weekend, and speculates that's when McCain supporters are home.)
  • Bias. Poll takers are only human. They have their own thoughts and opinions. And while they take great pains to be neutral, those opinions sometimes come through in the questions asked or the way the answers are interpreted. Most of the major polling companies are based in big cities like New York, Los Angeles, Boston, or Washington, DC – all cities that are heavily Democratic. This may explain why, in the last two elections, the Democrats did better in the polls than they did on Election Day.

So, with all these problems, how can we figure out who is going to win the election? Well, never fear – there is one sure-fire way to find out the winner:

Read the newspaper Wednesday morning.

And don’t forget to vote!

Oct 15, 2008

With the latest thrashing by the Devil Rays, are the Boston Red Sox showing that the statistically better team can always be eliminated by a "hot" team?

a) Yes: win streaks are more powerful than long-term stats
b) No: long-term statistics should be the deciding factor in these contests
c) Neither: head-to-head play is the only way to decide who the best team is
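To get a sense of how often the “better” team goes home early, here’s a sketch of the chance of winning a best-of-seven series. It assumes every game is independent and that the team wins each one with the same probability p. That’s a big simplification, since real games vary with pitchers, home field, and injuries. The value p = 0.55 is hypothetical.

```python
from math import comb


def series_win_prob(p: float, wins_needed: int = 4) -> float:
    """Chance of winning a first-to-`wins_needed` series (best-of-seven by default),
    assuming independent games each won with the same probability p."""
    q = 1 - p
    # k = losses before the clinching win; the clincher is the last game, so the
    # k losses can sit anywhere among the earlier (wins_needed - 1 + k) games.
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    )


print(round(series_win_prob(0.55), 3))  # hypothetical 55% per-game edge
```

Even with a 55 percent edge in every game, that team takes the series only about 61 percent of the time. The weaker team wins roughly 4 times in 10 without any streak magic.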

Jun 24, 2008

Oops! There's one!: So I guess it's 17 now. (Courtesy Minnete Layne)

Well, if you were feeling anxious about there being no more undiscovered sea monsters, chill out. There are still some out there. About 18, to be specific.

See, ever since Science’s parents (Magic and Critical Thought) stopped putting Science’s stuff up on the fridge, Science has really been going out of its way to make sure we all know how special it is.

We get it, Science, you’re great. Take it easy.

As if.

Science, in its latest flailing and pathetic play for attention, has announced that there are indeed more huge, unknown sea creatures out there, and it knows that there are 18 of them.

Okay, Science, whatever you say. Act like you know.

But, no, Science goes on to explain, here’s my reasoning: If we first decide that a body length exceeding 1.8 meters defines a large sea creature (which, by the way, makes JGordon a large sea creature by 3 cm when he goes swimming), we can then look at the rate at which large sea creatures have been discovered in the last 180 years or so. The rate of discovery for large sea creatures remains pretty strong, and if you consider the places large sea creatures could be hiding, deep in the oceans, or under polar ice, say, it’s very likely that there are quite a few of them left to find. Using some flashy statistical modeling, Science predicts that there could be as many as 18 of these large sea creatures still undiscovered.
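Science doesn’t show its work here, so this is a rough illustration only. It uses a toy model, not the researchers’ actual method: each period of searching finds the same fraction of whatever is still hidden. Discoveries then shrink by a constant ratio r from one period to the next, and the total still undiscovered is the rest of that shrinking series. The counts below are invented.

```python
def remaining_if_geometric(counts: list[int]) -> float:
    """Toy estimate of how many species are still undiscovered.

    Assumes (hypothetically) that each period finds the same fraction of the
    species still hidden, so per-period discoveries shrink by a constant ratio r.
    The ones left are the rest of that series:
    counts[-1] * (r + r**2 + ...) = counts[-1] * r / (1 - r).
    """
    if len(counts) < 2 or min(counts) <= 0:
        raise ValueError("need at least two positive counts")
    # Average shrink ratio per period, from the first and last counts.
    r = (counts[-1] / counts[0]) ** (1 / (len(counts) - 1))
    if r >= 1:
        raise ValueError("discoveries are not slowing down; the toy model can't extrapolate")
    return counts[-1] * r / (1 - r)


# Invented discovery counts per period, halving each time.
print(round(remaining_if_geometric([40, 20, 10, 5]), 1))
```

Here discoveries halve each period, so about 5 more creatures are waiting. Real discovery records are much noisier, which is where flashier modeling comes in.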

Science goes on to emphasize that there probably aren’t any monsters hiding out in lakes and lochs, and that accounts of sea serpents and their ilk can probably all be explained by known creatures, like colossal squid and 30-plus-foot oarfish. Ah, thanks for that, Science.

Still, Science doesn’t hold all the cards. It may know that there are 18 monsters still hiding out there, but I know exactly what they are. Deal with it, Science.

Anguirus
Baragon
Destroyah
Ebirah
Gabara
Ganime
Gigan
Gorosaurus
Kamoebas
King Caesar
King Ghidorah
Kumonga
Megalon
Minilla
Mothra
Rodan
Urkel
Varan

Apr 4, 2008

Fair and balanced: Or leaning to the left, as it often does. Journalism often presents science in less-than-accurate ways. (Courtesy nick farnhill)

You'll never find any of that here! ;-) But the American Association for the Advancement of Science recently discussed how the public receives and understands science news. The situation is discouraging – there’s a lot of bad information out there, much of it the result of sloppy reporting. One of the big culprits was a misunderstanding and misrepresentation of statistics.

Meanwhile, biochemist Michael White complains that the human desire to tell a good story often leads to misrepresentations of how science really works.