How We’re Thinking About Polling the Midterms
By Johannes Fischer
Ahead of the midterms, the Data for Progress team wanted to share a few thoughts on how we’re thinking about this cycle and how we’re conducting our polling. Below, we take a look at the political environment as it stands today, offer some of our expectations, and conclude with an overview of how we approach the sampling and weighting process in our polls.
Political Environment
Since 1990, the “party in power,” or the party that holds the White House going into the midterm, has earned an average of 47.9 percent of the contested vote in a midterm election (excluding 2002). Research explains this as a “thermostatic” backlash effect: In democracies globally, voters consistently vote against the party in power during off-year elections.
It’s not just that history doesn’t favor Democrats this cycle: Biden’s weak approval rating over the winter and summer (it’s rising now) and high economic anxiety about gas prices and inflation are strong headwinds against Democrats. Yet, all at roughly the same time, three major things changed: The Supreme Court struck down Roe v. Wade, U.S. gas prices began to fall, and Democrats received positive media coverage for achievements including the landmark Inflation Reduction Act, the CHIPS Act, a gun safety bill, and targeted student debt relief.
Aside from improving gas prices and positive news coverage, the special elections that occurred post-Dobbs were a strong showing for Democrats, with candidates outperforming expectations (and winning) in elections that, based on historical trends, we would normally not consider competitive.
Forecasters, including FiveThirtyEight and the Economist, weigh special elections in their judgment of the cycle, and Democrats seem more optimistic about November than earlier in the cycle. We think the special elections represent an important piece of evidence: They seem to indicate Democratic voter turnout will be higher than in previous midterms.
This is good news. Due to rising education polarization and strong turnout in Democratic areas in the special elections, it’s clear that Democrats are energized over abortion, Biden’s legislative achievements, and the threat of fascist right-wing politicians. On top of this, Biden’s approval appears to be improving, which will likely help Democrats in November.
The bad news is that special elections are extremely low-turnout events, so the voters who take part in them make up only a very small share of the final electorate. Democratic performances in special elections appeared to cluster in counties and precincts where Biden did better in 2020, an indicator of turnout effects. In other words, these wins seem to have been driven by engaged Democrats more than persuaded moderates.
Democrats are likely in a better position than they were earlier in the cycle, when the Virginia gubernatorial loss illustrated their weakness. But because the special election wins appear to reflect turnout more than persuasion, Data for Progress is closely following how undecided voters nationally are weighing Biden’s approval, threats to abortion rights, and economic concerns against each other.
How We’re Polling
Inspired by the transparency from the New York Times, here is how we are sampling and weighting this cycle.
Sampling
For probability sampling (SMS, live caller, IVR, and mail), we use stratified sampling against a commercial voter file with inverse response propensity selection. Within the geography being sampled, we stratify on urbanicity, modeled education, age, modeled partisanship, modeled or self-reported race, and gender. We then sample using a score generated by intersecting a turnout model and a mode-specific response score, oversampling the voters most likely to vote and least likely to respond.
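To make the idea concrete, here is a minimal sketch of stratified sampling with an inverse response propensity score. It is an illustration under stated assumptions, not our production code: the field names (`stratum`, `turnout`, `response`), the helper names `selection_score` and `stratified_sample`, and the choice to combine the two models as turnout probability divided by response probability are all hypothetical. Within each stratum, it draws voters without replacement, favoring those with a high score.

```python
import random
from collections import defaultdict


def selection_score(turnout_prob, response_prob, floor=0.01):
    """Hypothetical score: high for likely voters who rarely respond.

    The ratio form and the floor are assumptions made for illustration.
    """
    return turnout_prob / max(response_prob, floor)


def stratified_sample(voters, n_per_stratum, seed=0):
    """Draw up to n_per_stratum voters from each stratum, without replacement,
    with selection chances that increase with selection_score."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for voter in voters:
        strata[voter["stratum"]].append(voter)

    sample = []
    for stratum in sorted(strata):
        keyed = []
        for voter in strata[stratum]:
            score = selection_score(voter["turnout"], voter["response"])
            if score <= 0:
                continue  # a voter with zero turnout probability is never drawn
            u = 1.0 - rng.random()  # uniform draw in (0, 1]
            # Efraimidis-Spirakis key: larger scores tend to produce larger keys.
            keyed.append((u ** (1.0 / score), voter))
        keyed.sort(key=lambda pair: pair[0], reverse=True)
        sample.extend(voter for _, voter in keyed[:n_per_stratum])
    return sample


# Toy voter file with made-up model scores, for demonstration only.
demo_rng = random.Random(1)
voters = [
    {
        "id": i,
        "stratum": (
            "urban" if i % 2 else "rural",
            "college" if i % 3 == 0 else "non-college",
        ),
        "turnout": demo_rng.uniform(0.05, 0.95),
        "response": demo_rng.uniform(0.01, 0.30),
    }
    for i in range(1000)
]
sample = stratified_sample(voters, n_per_stratum=25)
```

A real voter file would use many more strata (the full cross of urbanicity, education, age, partisanship, race, and gender) and a separate response score for each contact mode.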
For non-probability sampling, Data for Progress works with a series of web panel respondent marketplaces to recruit respondents. Web panel survey research requires a delicate balance between disqualifying professional or disingenuous respondents and preserving responses from genuine and low-socioeconomic-status respondents. To guard against selection biases among respondents, DFP maintains a complex set of quotas (using our weighting targets) and screening questions to ensure each survey draws on a genuine, representative sample of respondents.
Weighting
As a baseline, DFP monitors providers, respondents, and trends among responses to detect anomalies in respondents, and automatically disqualifies or downweights respondents based on a series of checks, including survey engagement, attention, truthfulness, and completion speed.
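As a rough illustration of how checks like these can turn into a disqualification or a downweight, here is a small sketch. Everything specific in it is an assumption made for the example: the field names (`seconds`, `attention_failures`), the function name `quality_multiplier`, the speed cutoff of 30 percent of the median completion time, and the 0.5 downweight are hypothetical and are not our actual thresholds.

```python
from statistics import median


def quality_multiplier(resp, median_seconds, speed_cutoff=0.3):
    """Return 0.0 to disqualify, 0.5 to downweight, 1.0 to keep as-is.

    All thresholds here are illustrative assumptions.
    """
    if resp["seconds"] < speed_cutoff * median_seconds:
        return 0.0  # finished implausibly fast relative to the typical respondent
    if resp["attention_failures"] >= 2:
        return 0.0  # failed multiple attention checks
    if resp["attention_failures"] == 1:
        return 0.5  # a single failure: keep, but count for less
    return 1.0


# Made-up responses for demonstration only.
responses = [
    {"id": "a", "seconds": 410, "attention_failures": 0},
    {"id": "b", "seconds": 95, "attention_failures": 0},
    {"id": "c", "seconds": 380, "attention_failures": 1},
    {"id": "d", "seconds": 450, "attention_failures": 2},
    {"id": "e", "seconds": 400, "attention_failures": 0},
]
typical = median(r["seconds"] for r in responses)
multipliers = {r["id"]: quality_multiplier(r, typical) for r in responses}
```

In this toy data, respondent `b` is dropped for speed, `d` is dropped for attention failures, and `c` is kept at half weight.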
We use raking with regularization, implemented in Python, to generate weights for our survey respondents. Raking is a procedure where data points are given a weight so that selected marginal distributions match given targets. Our weighting scheme also imposes additional conditions: (a) the weights are as close to uniform as possible, (b) the weights are constrained by upper and lower bounds, and (c) the weights sum to the survey N (i.e., the average weight is 1).
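For readers unfamiliar with raking, here is a simplified sketch of the basic idea. It uses plain iterative proportional fitting with clipping and rescaling as a stand-in; it is not the regularized procedure described above, and it only approximates conditions (b) and (c) rather than solving for them jointly. The function names `rake` and `weighted_share`, the bounds of 0.2 and 5.0, and the toy targets are all assumptions made for the example.

```python
def weighted_share(rows, weights, var, level):
    """Weighted share of respondents whose `var` equals `level`."""
    total = sum(weights)
    matched = sum(w for row, w in zip(rows, weights) if row[var] == level)
    return matched / total


def rake(rows, targets, lower=0.2, upper=5.0, max_iter=200, tol=1e-6):
    """Simplified raking: adjust weights one variable at a time until the
    weighted margins match `targets`, clipping weights to [lower, upper]
    and rescaling so they sum to the number of respondents."""
    n = len(rows)
    weights = [1.0] * n  # start from uniform weights
    for _ in range(max_iter):
        max_gap = 0.0
        for var, target in targets.items():
            total = sum(weights)
            current = {}
            for row, w in zip(rows, weights):
                current[row[var]] = current.get(row[var], 0.0) + w / total
            for level, share in target.items():
                max_gap = max(max_gap, abs(current.get(level, 0.0) - share))
            # Scale each respondent by target share / current share for their level.
            weights = [
                w * target[row[var]] / current[row[var]]
                for row, w in zip(rows, weights)
            ]
        # Clip extreme weights, then rescale so the average weight is 1.
        # If clipping binds, the rescale can nudge weights slightly past the bounds.
        weights = [min(max(w, lower), upper) for w in weights]
        scale = n / sum(weights)
        weights = [w * scale for w in weights]
        if max_gap < tol:
            break
    return weights


# Toy sample that over-represents college graduates; targets are made up.
rows = (
    [{"gender": "F", "educ": "college"}] * 30
    + [{"gender": "F", "educ": "non-college"}] * 20
    + [{"gender": "M", "educ": "college"}] * 30
    + [{"gender": "M", "educ": "non-college"}] * 20
)
targets = {
    "gender": {"F": 0.53, "M": 0.47},
    "educ": {"college": 0.40, "non-college": 0.60},
}
weights = rake(rows, targets)
```

In this toy case the bounds never bind, so the weighted margins land on the targets. When clipping does bind, the margins can only be matched approximately, which is one reason a regularized formulation that handles the bounds and the closeness-to-uniform condition together is attractive.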
We generate our weighting targets from the same voter file, census data, and high-quality turnout modeling for a likely voter universe. We produce and use models scored against the voter file to generate both an estimate of votes cast and the marginal distributions across our weighting variables. In addition to voter file features, turnout models are trained on past voter turnout, education models are based on self-reported data and calibrated to census estimates, and race (when not self-reported) is modeled based on a combination of self-reported data, census estimates, and name categorization.
The variables we use in weighting are: age, gender, education, urbanicity, race, vote recall, survey engagement, and joint distributions of those variables.
Johannes Fischer (@pollhannes) is the Lead Survey Methodologist at Data for Progress.