December 28, 2012
The Wall Street Journal
Statistical Habits to Add, or Subtract, in 2013
By Carl Bialik
In the year ending Monday, we saw some gains in statistical savvy: Data-crunching pollsters accurately forecast the outcome of the presidential election; the Memphis Grizzlies hired a vice president of basketball operations for his statistical expertise; and folks grew comfortable with the phrase "big data," to describe the billions of billions of bytes generated daily by information technology.
The growing importance of statistical analysis is set to be a theme of next year, too, with more than 150 professional organizations worldwide, including the American Statistical Association, designating 2013 as the International Year of Statistics.
All of which should increase the pressure on writers, public speakers, researchers and corporate officials to improve how they present data to the public. I asked a half-dozen professional statisticians for their pet peeves about how numbers are presented and what resolutions they would suggest statistically minded people make for the new year. I also solicited ideas from readers on my blog.
A major concern running through the responses is that data tend to be fuzzy—much fuzzier than they can seem when stated with neither margins of error nor qualification. "The most important numerical fallacy is that people tend to think of numbers as known, constant and having no variability," said Donald Berry, a biostatistician at the University of Texas MD Anderson Cancer Center in Houston.
Many readers agreed. Richard Hoffbeck, a retired research data analyst in Minneapolis, asked for reports on job numbers and economic forecasts to include estimates of uncertainty—the statistical margin of error that is common in poll reporting.
"I never see them mentioned," he said.
Statisticians and readers also suggested ways to avoid some common traps next year:
Be patient: Don't rush to anoint the next big stock, slugger or pill. A small sample size can yield extreme results just by chance. A company or athlete on a hot streak may be really good, but likely isn't as good as, say, two record weeks would suggest.
Statisticians call this phenomenon regression to the mean, and taking it into account involves using prior knowledge. In the case of a previously untouted stock or category of drug, the prior knowledge is that most stock prices don't skyrocket and most tested drugs don't work. So treat short-term gains with caution.
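For readers who want to see the effect, here is a minimal simulation sketch of a slugger's hot streak. Everything in it is an assumption made for illustration: 300 hitters who all share the same true .270 skill, 100 early at-bats, 450 later ones, and a fixed random seed. Real players differ in skill, so this setup overstates the drop, but the direction is the point.

```python
import random


def simulate_leader(num_players=300, true_avg=0.270,
                    early_ab=100, rest_ab=450, seed=2013):
    """Return (early, later) batting averages for the early-season leader.

    Every simulated hitter has the same true skill, so any early gap
    between players is luck alone (an illustrative assumption).
    """
    rng = random.Random(seed)

    def batting_avg(at_bats):
        # Each at-bat is an independent hit with probability true_avg.
        hits = sum(rng.random() < true_avg for _ in range(at_bats))
        return hits / at_bats

    early = [batting_avg(early_ab) for _ in range(num_players)]
    leader = max(range(num_players), key=lambda i: early[i])
    # The leader's later at-bats are drawn from the same true skill.
    later = batting_avg(rest_ab)
    return early[leader], later


early_avg, later_avg = simulate_leader()
print(f"Leader's early average: {early_avg:.3f}")
print(f"Leader's later average: {later_avg:.3f}")
```

The early leader is picked because of an extreme result, so the later average, which is not selected that way, tends to land closer to the shared .270.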
With a baseball player, a hot start is likely to fizzle. Prof. Berry predicts that whoever is leading the major leagues in batting average on May 15 next season won't be able to sustain the hot start.
"His batting average is going to drop," he said. "You can bank on it."
Provide context: One reader complained that too many commentators report the daily movement in stock indexes in terms of points gained or lost, rather than percentages. "These are meaningless numbers without knowing what the baseline is," the reader wrote.
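The reader's point can be checked with a few lines of arithmetic. This is a sketch only: the function name and the two index levels are hypothetical, chosen to show how the same point move means different things at different baselines.

```python
def percent_move(points, previous_close):
    """Convert a point change into a percentage of the prior close."""
    return 100.0 * points / previous_close


# The same 100-point gain against two hypothetical index levels.
for previous_close in (1_000, 13_000):
    pct = percent_move(100, previous_close)
    print(f"100 points on a close of {previous_close:,}: {pct:.2f}%")
```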
Context is also useful in medical studies.
A seemingly large effect from a drug discovered in an observational study carries much less weight than one found in an experimental, double-blind study, said Robert N. Rodriguez, president of the American Statistical Association. "Experimental studies are much more reliable for deciding whether a factor has a causal effect."
An experimental study would randomly assign people to two groups that receive different treatments, while an observational one would compare people in the real world who already get one of the two treatments.
Conversely, the fact that a single study doesn't find a meaningful effect doesn't mean there is none—the study might be flawed, or the effect might be too small to be picked up. "The crux of the problem is that people interpret an absence of evidence as evidence for absence," said Carlisle Rainey, a doctoral student in political science and statistics at Florida State University.
Gregory Taylor, a math teacher in Ottawa, said that too often the context of a percentage or fraction is left out. "A diet that is effective on eight out of 10 people won't necessarily be useful to me if it was only tested on teenagers," he said. "Too often people focus on the number, and miss what it's talking about."
Context can also help temper excitement about apparent record breakers. That box-office mark? Try adjusting it for ticket-price inflation.
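One simple way to make that adjustment is to rescale an old gross by the ratio of average ticket prices. The sketch below assumes that approach; the function name and all dollar figures are invented for illustration and are not real box-office data.

```python
def adjust_gross(gross, avg_ticket_then, avg_ticket_now):
    """Restate an old box-office gross in today's ticket prices."""
    return gross * avg_ticket_now / avg_ticket_then


# Hypothetical figures: an older film versus a recent "record" release.
old_film = adjust_gross(200_000_000, avg_ticket_then=4.00, avg_ticket_now=8.00)
new_film = 350_000_000  # already in today's prices

print(f"Older film, adjusted: ${old_film:,.0f}")
print(f"Recent film:          ${new_film:,.0f}")
```

Under these made-up numbers the older film sold more tickets' worth of business, even though its raw gross was smaller.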
Believe in miracles: Seemingly improbable events do happen. People win the lottery twice, have three children years apart with the same birth date, or bump into old acquaintances on the other side of the world.
To the people involved, such occurrences may seem providential, but it is worth remembering that with seven billion people in the world, the same thing could happen to a lot of people.
The probability of a seemingly surprising coincidence, like winning the lottery twice, "is actually quite high, if you mean anyone, anytime" winning for a second time, said Jessica Utts, who heads the statistics department at the University of California, Irvine.
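A back-of-the-envelope version of that reasoning: if an event has a tiny chance p of happening to any one person, the chance that it happens to at least one of n people is 1 - (1 - p)^n. The sketch below assumes the n people are independent and share the same p, which is a simplification, and the one-in-a-billion figure is hypothetical.

```python
import math


def prob_at_least_one(p, n):
    """Chance that an event of probability p hits at least one of n
    independent people: 1 - (1 - p)**n, computed stably for tiny p."""
    return -math.expm1(n * math.log1p(-p))


# Hypothetical: a one-in-a-billion event, with seven billion people.
chance = prob_at_least_one(1e-9, 7_000_000_000)
print(f"Chance it happens to someone: {chance:.4f}")
```

What is nearly impossible for a given person can be close to certain for somebody, somewhere.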