Ex Orienti Lux

Beware of Flawed Polling

Originally posted at The Moderate Voice

The contrast in quality between the news and opinion divisions of The Wall St. Journal never ceases to amaze me.

While the news reporting has won award after award and has become the definitive source of news for the American business and financial communities, the opinion pages are characterized by what some would consider mendacity, methodological sloppiness, or a genuine lack of understanding of the subject matter. Consider:

The 13 March, 2009, op-ed by Doug Schoen and Scott Rasmussen is just another indication of why one should beware of anything written in those sections. Both authors are pollsters and should therefore know better, yet they provide no full cross-tabs, sampling information, fielding dates or questionnaires for the polls they cite, and their analysis meanders back and forth between different polls. It is impossible to take any of the claims in the article seriously on their own merits.

[...]

In my first piece for The Moderate Voice, I discussed the problems with Mark Penn’s holding back the data products from his clients. Journalistic best practices likewise involve making the data products available, usually in the form of a filled-in questionnaire. Without that information, he was essentially telling his clients to take him at his word. In print, where real estate is at a premium, full cross-tabs or a filled-in questionnaire are infeasible, so all we get is sampling and fielding information. It usually looks like this: “The New York Times/Gallup poll called 1,000 adults evenly distributed across the country on the evenings of X, Y and Z. Respondents were evenly distributed for gender and race.” That is the bare minimum of information required to report on a poll. In web journalism, hyperlinking abolishes the real estate problem: you can link to PDFs of the data you’re citing. In fact, most people do. It’s responsible journalism.

This is not just a minor issue of protocol; it is of great import to any poll or analysis. The means of verifying a poll are its sampling and fielding. Without a carefully constructed sample, questionnaire and fielding plan, the data in a poll are absolutely meaningless. The meaning of a poll lies precisely in these details. Any credible pollster engaging in bona fide research knows this, which is why we spend so much time and effort making sure that all of these things are designed to acquire the right information to answer the questions being asked. Any discussion of a poll or its meaning will involve these details, and anyone who has ever worked at a polling firm in any capacity knows it.
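To make concrete why sample details matter so much, here is a minimal sketch of my own (not anything from the op-ed) of the textbook margin-of-error calculation for a simple random sample. It shows why reporting the sample size is the bare minimum: without it, a quoted percentage has no attached uncertainty at all.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample of size n.

    p=0.5 is the most conservative assumption (it maximizes the margin);
    z=1.96 is the critical value for a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)

# The canonical 1,000-adult poll carries roughly a +/-3.1 point margin:
print(round(margin_of_error(1000) * 100, 1))  # 3.1
```

This is the idealized formula for a simple random sample; real polls with weighting and clustering have larger effective margins, which is exactly why the full sampling information matters.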

They say, “Polling data show that Mr. Obama’s approval rating is dropping and is below where George W. Bush was in an analogous period in 2001. Rasmussen Reports data shows that Mr. Obama’s net presidential approval rating — which is calculated by subtracting the number who strongly disapprove from the number who strongly approve — is just six, his lowest rating to date”, to which I ask, “Which data?” They haven’t provided any information about the polls, made the full tabs available or behaved as any responsible pollster briefing a client would. We’re basically being told to “trust them” that these data exist and indicate what they report, without being able to look and see for ourselves. To put it bluntly, we don’t know who was selected for the poll or what they were asked. All we have is their word that the polls they cite:

1. exist,
2. ask neutrally worded questions,
3. use a scientifically constructed representative sample of the population,
4. were called evenly,
5. and actually say what Schoen and Rasmussen would have us believe that they do.
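The net approval figure they quote is, by their own description, simple arithmetic once you have the cross-tabs, and the cross-tabs are exactly what is missing. Here is a sketch using hypothetical percentages of my own invention, not numbers from any cited poll:

```python
def net_approval(strongly_approve, strongly_disapprove):
    """Net presidential approval as the op-ed defines it:
    the strongly-approve percentage minus the strongly-disapprove percentage."""
    return strongly_approve - strongly_disapprove

# Hypothetical cross-tab percentages -- the op-ed never supplies the real ones.
# Many different pairs would produce the same quoted net figure of six:
print(net_approval(38, 32))  # 6
print(net_approval(25, 19))  # 6
```

That ambiguity is the point: a net of six could mean an intensely divided electorate or a largely indifferent one, and without the underlying tabs, a reader cannot tell which.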

How many polls are they citing? Are they examining the shifts in a rigorous time-series analysis? We don’t know, because they won’t tell us. Nor do we know when these polls were fielded, or whether they were entirely different polls asking completely different questions of entirely different people, or successive waves of the same poll.

Then, in a completely astonishing move, they shift from analyzing these elusive Rasmussen data to analyzing equally elusive Gallup data (which they likewise fail to describe or cite), without ever explaining how they merged and reconciled two completely distinct datasets into one poll for analysis.

To put this into perspective, imagine a scientist writing up his findings from two completely distinct and unrelated experiments without doing any work to show that the experiments are in fact neither distinct nor unrelated, then moving back and forth between them at will, keeping whichever portions give him the findings he wants. Such a scientist would be laughed out of academia; yet when social scientists do the same, they are given prime journalistic real estate: the opinion pages of The Wall St. Journal.

In a nutshell, this is the problem. Some might suggest that Schoen and Rasmussen have engaged in such methodological sloppiness that one can only imagine they were grabbing anything they could find to prop up a controversial thesis. Their magpie approach to data analysis would embarrass any entry-level assistant analyst at any polling firm; but because most people are unfamiliar with the mechanics of polling, it has earned them headlines on the opinion pages of the WSJ.