Tuesday, October 25, 2022

The Statistical "Big Lie"

Article by Doug Berdie

Over and over again, when news media report the results of polls, they proudly, yet inaccurately, present the so-called "margin of error" they believe is associated with those results.  For example, on the August 22 episode of "Morning Joe", the commentator boasted that the poll results he was reporting, based on 1,000 responses, had a "margin of error" of 3 points.  Lack of knowledge regarding what is technically referred to as "statistical precision" underlies this type of "Big Lie" that continually misleads the public.

Why are these so-called "margins of error" incorrectly reported?  Several key requirements are often violated, including:

(1) a scientifically accurate random sample of people was surveyed;
(2) all (i.e., 100%) of the people selected in that random sample actually responded to the survey;
(3) the questions people were asked were so clearly worded that everyone polled understood them in the same way--and in the way in which they were intended to be understood;
(4) the questions that were asked were not biased in any way--i.e., did not commit any of the many known wording problems that bias responses;
(5) the correct statistical formulae were used to calculate the "margin of error"/"statistical precision"--given that different types of questions require different formulae;
(6) the statistical confidence level associated with the reported "margin of error"/"statistical precision" was reported; and
(7) the "margin of error"/"statistical precision" value was reported for each question about which people were polled--as the value will vary question by question based on the answers people give.

Let's look at each of these critically important issues.  And, moving forward, we will refer to the "margin of error" by its more correct name, "statistical precision."

First, without responses collected from a truly random, representative sample of people, none of the statistical reporting can be considered accurate.  For example, if samples to be polled are drawn from a list of registered Democrats, results obtained can in no way be generalized to all eligible voters, all registered voters, or any group other than the one from which the sample was drawn.  Many polls these days are of people who've been recruited to "panels" (either internet or phone), and results from those groups, though inexpensive to obtain, reflect only people willing to sign up for such polls--not the public at large.  It's very expensive to obtain truly representative samples of the general public, and that's why many polling organizations take shortcuts in their selection process.  But those shortcuts ensure that statistical precision numbers presented on the assumption that the sample is representative are not correct.

Second, the formulae underlying statistical precision estimates evolved within the field of agriculture--assessing plant growth under varying conditions.  In those situations, if a plant did not grow, or grew less than another, that result was deemed relevant and the difference reported.  In other words, there was no "nonresponse problem" because every plant's behavior was registered.  Data from survey research methodological studies during the past 100 years have documented that nonresponse bias often exists in polls--especially when low percentages of those selected in the original random sample participate in the poll.  The extent of that bias varies from poll to poll--affecting statistical precision estimates in varying, usually unknown, ways.  It is common to "replace" nonrespondents with additional sample to get the desired number of poll responses, but this practice does not guarantee that the final sample is representative.  Only extensive follow-up techniques aimed at the original sample will yield a response rate high enough to give some faith that the statistical precision number generated is a decent estimate of what it would have been with a 100% response rate.

Third, vaguely worded questions abound in polls, and that results in respondents interpreting them in varying ways.  I once asked people in a poll if they believed "more cultural opportunities" were needed in their neighborhood.  When about 80% said "yes," we met with them to see what, specifically, they wanted.  It turned out that some of the people had interpreted "cultural opportunities" to mean opportunities to interact with different cultures and people from those cultures, whereas others interpreted the phrase to mean artistic events like concerts in the park, art fairs, and other such "cultural" events.  Hence, in that situation we had no real indication of how people in that neighborhood as a whole felt.  Only by extensive pretesting and wording revisions can poll sponsors reach a state where they can be confident that most respondents are "answering the same question."  And, in these times when poll results are wanted ASAP, this care in wording is often short-circuited.

Fourth, Stanley Payne's 1951 classic, "The Art of Asking Questions," has served as a guide that serious professional survey researchers use to avoid asking questions that lead to biased responses.  For example, presenting only one side of an issue in a question can influence responses by as much as 40 percentage points--or more.  This bias results from, for example, asking "Do you favor the U.S. response to the war in Ukraine?" as opposed to asking, "Do you favor the U.S. response to the war in Ukraine or do you disapprove of that response?"  Again, only by careful pretesting of questions can one discover (and repair) the many wording problems that bias results--and make statistical precision estimates meaningless.

Fifth, many formulae exist to calculate statistical precision--some for questions with only two response options (e.g., "Yes" or "No" questions), some for questions with more than two response options (e.g., "How do you feel about X?" with options "Strongly Approve," "Approve," "Neither Approve Nor Disapprove," "Disapprove," "Strongly Disapprove"), and some for questions that require a numeric response when one wishes to present an average (i.e., mean) as the result (e.g., "How many years have you lived in your current residence?").  And, because varying types of questions require different formulae, statistical precision estimates vary from question to question.
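
To make this concrete, here is a minimal Python sketch of two of the most common cases--the standard normal-approximation formula for a proportion ("Yes"/"No" questions) and the corresponding formula for a mean.  The numbers are illustrative (the 8-year standard deviation is hypothetical), not taken from any actual poll:

```python
import math

def moe_proportion(p, n, z=1.96):
    """Statistical precision for a two-option (proportion) question,
    normal approximation at roughly 95% confidence (z = 1.96)."""
    return z * math.sqrt(p * (1 - p) / n)

def moe_mean(sample_sd, n, z=1.96):
    """Statistical precision for a numeric question reported as a mean."""
    return z * sample_sd / math.sqrt(n)

# A 1,000-response poll splitting 50/50 on a Yes/No question:
print(f"+/- {moe_proportion(0.50, 1000) * 100:.1f} points")  # ~3.1 points
# "How many years have you lived in your current residence?"
# with a hypothetical sample standard deviation of 8 years:
print(f"+/- {moe_mean(8.0, 1000):.2f} years")                # ~0.50 years
```

Note that the two estimates are not even in the same units--one more reason a single "margin of error" cannot describe a whole poll.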

Sixth, statistical precision estimates are necessarily associated with given "confidence levels."  In other words, one can say (roughly speaking), "I'm 90% confident that the result is within +/- 4 percentage points."  (The word 'confidence' does not refer to psychological comfort but, rather, to the percentage of times a random sample would yield results such as those reported.)  And, because one can choose any confidence level one wants (with lower confidence levels yielding smaller "margins of error"), one can present pretty much whatever precision number one wants.  Hence, for honest reporting, it is critical to report the confidence level underlying the statistical precision being reported (most often 90% or 95%).
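
A short sketch--again using the hypothetical 1,000-response poll splitting 50/50--shows how much the reported precision shrinks simply by choosing a lower confidence level:

```python
import math
from statistics import NormalDist

def moe(p, n, confidence):
    """Precision for a proportion at a chosen two-sided confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z-score
    return z * math.sqrt(p * (1 - p) / n)

for conf in (0.80, 0.90, 0.95, 0.99):
    print(f"{conf:.0%} confidence: +/- {moe(0.50, 1000, conf) * 100:.1f} points")
# 80% confidence: +/- 2.0 points
# 90% confidence: +/- 2.6 points
# 95% confidence: +/- 3.1 points
# 99% confidence: +/- 4.1 points
```

Same data, four different "margins of error"--which is exactly why the confidence level must be reported alongside the precision figure.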

Seventh, even questions of the same type (e.g., "Yes" - "No" questions) will have varying statistical precision estimates with, in this case, the questions with responses closest to "50% - 50%" having the largest "margins of error."  So, one cannot just give one number for an entire poll and say: "The poll results are within X percentage points"!
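
The same sketch, holding the sample size and confidence level fixed and varying only the split of answers, illustrates the point:

```python
import math

def moe(p, n=1000, z=1.96):  # z = 1.96 for ~95% confidence
    return z * math.sqrt(p * (1 - p) / n)

for p in (0.50, 0.70, 0.90, 0.99):
    print(f"{p:.0%} 'Yes': +/- {moe(p) * 100:.1f} points")
# 50% 'Yes': +/- 3.1 points
# 70% 'Yes': +/- 2.8 points
# 90% 'Yes': +/- 1.9 points
# 99% 'Yes': +/- 0.6 points
```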

With the mid-term elections fast approaching, and news media reporting polling results almost daily, it is critically important that those of us who hear or read these poll results understand the above points that influence how "accurate" the polls are.  After the Trump-Clinton election, I recall hearing people say, "How could the polls have been so wrong?"  And, the answer to that question is that many of the above problems were inherent in those polls.  With the focus on getting speedy results into the media, there is often not time to collect truly reliable poll results.  For example, immediately after results are posted on a Wednesday, a major event may occur that people believe will change the opinions of voters.  So, there's a rush to get another poll conducted (without pretesting questions, "grabbing" anyone who agrees to be surveyed--without attention to representativeness, etc.).  Hence: "Beware!"  Because the underlying factors affecting reported poll results are often not presented, it's up to those of us who see such results to review them critically.

Doug Berdie, Ph.D., has been in the marketing research/public opinion business for 40+ years, has taught such courses at universities and for other organizations, and is senior author of the text Questionnaires: Design and Use.
