Wednesday, February 25, 2015

Statistical Generalization and the News Media

I must admit that this is far from the purview of this blog, but I am taking advantage of the fact that I have a blog that I can use as a soapbox.  I can understand the advertising industry using poor statistics to sell an item.  Example - "4 out of 5 mothers said 'great'."  Of course they didn't mention that 3 of those 4 said "oh, great. Another stupid survey."  Yes, every one of us has probably misused statistics at some time in our life. But I must admit to being furious about the almost incessant deluge of not just poor statistics but totally misleading, dangerously misleading generalization.  

When I was first doing my doctorate, my committee told me that 1,000 case studies for a population of 3 million was not sufficient to generalize - i.e., to draw a conclusion about what can be expected among 3 million people by testing a sample of 1,000.  Almost daily, Gallup, Fox News, or some other media outlet that people erroneously trust tells us that "all Americans believe ..." or "all Americans want ..." or "all Americans will ..." based upon fewer than 2,000 people. They are telling the gullible American public that what they're saying is true based upon 0.00057% of the US population.  (My original PhD study was based upon 0.03%, and academically that wasn't considered good enough.)  This is generally done on purpose, because people will buy things if they think 'everyone else' is buying them, and they will often vote for or like a candidate if they think that 'everyone else' likes and is going to vote for them. (How do I know this? Not from any statistics, but from observing people in a 37-year psych practice. So I could be wrong, but I bet you I'm not.) 
Now, I do have to admit that there are a lot of factors in deciding upon an adequate sample, along with a very lengthy and complex formula, but the basic idea is simple - the larger your sample, the lower your margin of error and the higher your confidence level.  If I have a room full of people - say 50 - and I want to generalize what they think about an issue, I have a much better chance of being correct the more people I ask.  
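For the room-of-50 example, the effect is easy to see in a few lines of Python. This is only a sketch of the standard textbook calculation (assuming a 95% confidence level, the worst-case proportion p = 0.5, and the finite-population correction for a small group):

```python
import math

def margin_of_error(n, N=None, p=0.5, z=1.96):
    """Worst-case 95% margin of error for a proportion (p = 0.5),
    with the finite-population correction applied when N is given."""
    e = z * math.sqrt(p * (1 - p) / n)
    if N is not None:
        e *= math.sqrt((N - n) / (N - 1))  # finite-population correction
    return e

# The room of 50: the more people you ask, the smaller the error.
for n in (5, 10, 25, 49):
    print(f"ask {n:2d} of 50 -> +/- {margin_of_error(n, N=50):.1%}")
```

Ask 5 of the 50 and the margin of error is around plus-or-minus 42 points; ask 49 of them and it drops to about 2. More respondents, less error - exactly the common-sense point.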
Here is a good example from the www.dummies.com article "Generalizing Statistical Results to the Entire Population":
"For example, a researcher wants to know how cable news channels have influenced the way Americans get their news. He also happens to be a statistics professor at a large research institution and has 1,000 students in his classes. He decides that instead of taking a random sample of Americans, which would be difficult, time-consuming, and expensive, he will just put a question on his final exam to get his students' answers. His data analysis shows that only 5 percent of his students read the newspaper and/or watch network news programs anymore; the rest watch cable news. For his class, the ratio of students who exclusively watch cable news compared to those students who don't is 20 to 1. The professor reports this and sends out a press release about it. The cable news channels pick up on it and the next day are reporting, "Americans choose cable news channels over newspapers and network news by a 20-to-1 margin!"
Do you see what's wrong with this picture? The problem is that the professor's conclusions go way beyond his study, which is wrong. He used the students in his statistics class to obtain the data that serves as the basis for his entire report and the resulting headline. Yet the professor reports the results about all Americans. It's safe to say that a sample of 1,000 college students taking a statistics class at the same time at the same college doesn't represent a cross section of America."   (ref.: http://www.dummies.com/how-to/content/generalizing-statistical-results-to-the-entire-pop.html)  
This is what organizations like Gallup and Fox do day after day after day. And we eat it up and repeat it as absolute truth ...  All Americans ... because Gallup or Fox did a study. Most of you who have followed me on Facebook for years know that the first thing I do when anyone quotes a study is to check out the study. And more often than I like, I have to try to find a nice way to tell a friend that they've fallen for a bunch of bullshit. 
You don't really need to be able to use the formula  E = z_α/2 / (2√n)  [margin of error]  or  n = (z² × p × q) / (ME² + z² × p × q / N)  [required sample size for a population of N]  to know the so-called pollsters are giving you bad statistics.  Since most people can't work those formulas, and even graduate students, who use statistics constantly, shudder at the thought, we can apply some good old-fashioned common sense. When some news media or politician or corporation tells you "Americans think ..." ask some questions: (1) How many people did they ask?  It's hard to accept that because you know what 1,250 people think, you therefore know what 350,000,000 Americans think.  (2) How did they gather their sample - i.e., is it really a random sample, or is it like the professor's above?  If Fox News talks to 6 young people in a Vermont town and then wants to tell us everyone in Vermont thinks ... I wouldn't believe it.  Knowing the difference between NYC and Upstate NY, do you think you could generalize what all New Yorkers think because they asked a group from NYC?  Do you think you could generalize what all Illinoisans think if the sample is all from Chicago?  I don't think people in Metropolis, IL would think so. (3) Was the sample representative?  In the example above, all of the respondents were college students taking a statistics class. That isn't representative.  In the Fox News feature about Vermont, all of the people to whom they spoke were young people from the same town.  Do you think someone can walk into your town, pick a handful of young people off the street, and be able to say what your entire state thinks or believes?  Common sense would say "no." Diversity within a town makes good random sampling difficult. Diversity within a state makes good random sampling and statistical analysis a nightmare. So how could you feel at all confident in a generalized statement "all Americans believe ... want ... think" based upon fewer than 2,000 people who may have come out of the same telephone directory?  
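For anyone curious what those two formulas actually produce, here is a sketch of them in Python. The numbers assume the conventions pollsters typically use - a 95% confidence level (z ≈ 1.96) and the worst-case proportion p = q = 0.5:

```python
import math

Z95 = 1.96  # critical z-value for a 95% confidence level

def conservative_moe(n, z=Z95):
    """Margin of error E = z / (2 * sqrt(n)) - the worst case, p = q = 0.5."""
    return z / (2 * math.sqrt(n))

def needed_sample(me, N, p=0.5, z=Z95):
    """Required sample size n = (z^2 * p * q) / (ME^2 + z^2 * p * q / N),
    rounded up to a whole respondent."""
    q = 1 - p
    return math.ceil((z ** 2 * p * q) / (me ** 2 + z ** 2 * p * q / N))

print(f"a poll of 1,250 people -> +/- {conservative_moe(1250):.1%}")
print(f"claimed sample for +/-3% of 350,000,000: {needed_sample(0.03, 350_000_000):,}")
```

Of course, these formulas only mean anything when the sample is genuinely random and representative - which is exactly what the questions above are designed to test.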

Last but not least, I have to admit something which every graduate student from the beginning of time has known but doesn't really want to admit.  There's an old adage: "figures lie and liars figure."  Some years ago I was teaching gerontology and supervised doctoral students. I was the one the grad students didn't want on their committee, because the only reason the church college put up with me was that I kept their statistics honest and they were going for a more prestigious accreditation. I haven't done any serious statistics since I stopped, but I still remember enough that I would give you almost any odds you wanted that I could take any set of data and draw a completely different conclusion without changing the data. How?  In almost every statistical analysis (especially the common ones used by most graduate students) the data, at some point, is in a cell, group, block, or something similar. All you have to do is change the definitions of the cells, groups, blocks, etc., so that some data ends up in a different cell. Take my word for it. It can be done.  Just ask any grad student whose data showed absolutely nothing and who then faced the real temptation to play with the statistics. The thing is, the grad student knows they're going to get caught, and that's bad.  Gallup, Fox News, et al., don't have that worry.  
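The cell-definition trick is easy to demonstrate. In this hypothetical sketch (the ages are made up for illustration), the data never changes - only the boundary that defines the "young" cell - and yet the headline flips:

```python
# Hypothetical data: the same 15 ages, tallied under two cell definitions.
ages = [22, 24, 27, 29, 31, 33, 36, 38, 41, 44, 47, 52, 58, 63, 67]

def tally(data, cutoff):
    """Split the data into 'young' and 'old' cells at the given age cutoff."""
    young = sum(1 for a in data if a < cutoff)
    return young, len(data) - young

for cutoff in (30, 40):
    young, old = tally(ages, cutoff)
    winner = "young" if young > old else "old"
    print(f"'young' = under {cutoff}: {young} vs {old} -> the {winner} are the majority")
```

Define "young" as under 30 and the old are the clear majority (11 to 4); move the boundary to 40 and suddenly the young win (8 to 7). Same data, opposite conclusion.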

So what can you believe?  Unfortunately, not much.  Everyone has an agenda, and I hate to admit that I have less confidence in professional integrity now than ever before. The professor in the example above had a contract and wanted to save money. The cable news company that contracted him was fine with that, and they skewed it a bit farther. It was bad statistics in the first place, made worse. You can be skeptical without being a pessimist. Be a scientist . . . a detective. Even if you have some level of confidence in your sources, check them out. Every properly prepared statistical analysis will tell you (i) how the research was designed, (ii) how the groups are defined, (iii) how the data were gathered, (iv) how the data were managed, and (v) what test(s) or formula(s) were used to arrive at the conclusion. If they are unwilling to tell you this information, don't believe a word they say.  If they do provide this information, then apply the questions: (1) How many people did they survey, question, or interview? (2) What percentage is their sample of the population to which they are generalizing the results? (3) Was the collection random? Disney, for example, has their data collectors watch a gate or some other physical feature and ask every nth adult who passes that spot. (4) Was the sample representative?  An entire sample from Bucksnort, TN (which is a real place southwest of Nashville in Duck Hollow) can't represent Tennessee, much less the US.  (5) Were the variables considered - race, sex, age, even time of day, season, holidays, etc.? Most of my doctoral students failed because they failed to consider the variables that can skew data.  

Even when you have extensive statistical experience, it is hard to know what to believe. But there are those things which cause red lights to start flashing and sirens to go off in my head. Don't just believe a source (news media, politician, political party, etc.) because you like them.  Use common sense - if I say that "everyone in Evansville, IN wants the teen drinking age lowered to 15" and you find that I interviewed 12 teenagers at Eastland Mall, my conclusions should be suspect.  What should you think about the statement "95% of people interviewed felt Michigan State is a horrible university" when you then find that the sample was 100% from Columbus, Ohio?   There was a movie (I'm not going to take time to try to find it) that had a woman pointing at her ear and saying something like "this is a highly developed bullshit detector" . . . you need to develop such a detector.  

It is bad enough that we have to tolerate news media, politicians, political parties, corporations, and all sorts of public figures handing us bullshit and expecting us to believe it.  What is worse is that so many people very innocently do believe it.  I've known educated people who believed that if it was in print it must be true.  It takes a lot of effort not to be misled.  Please make the effort to resist bad "facts," "statistics," and "studies," and do your best to encourage others to do likewise. We can do no more than try.  
