
Survey Statistics: a new paradigm

Over the next few posts, let’s dig into Michael Bailey’s A New Paradigm for Polling and the commentary it received from other wonderful survey statisticians. In this post, let’s get introduced to the response instrument and see what these folks have to say about it.

Bailey begins:

the polling field needs to move to a more general paradigm built around the Meng (2018) equation that characterizes survey error for any sampling approach, including nonrandom samples.

We discussed Meng 2018 “Statistical Paradises and Paradoxes” in this blog series here.
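For reference, here is Meng’s identity in its standard form (my restatement; notation may differ from Bailey’s). The error of a sample mean factors into three terms:

$$\bar{Y}_n - \bar{Y}_N \;=\; \underbrace{\hat{\rho}_{R,Y}}_{\text{data defect correlation}} \times \underbrace{\sqrt{\frac{N-n}{n}}}_{\text{data quantity}} \times \underbrace{\sigma_Y}_{\text{problem difficulty}}$$

where n is the number of respondents, N the population size, and $\hat{\rho}_{R,Y}$ the realized correlation between response R and outcome Y. Random sampling makes $\hat{\rho}_{R,Y}$ zero in expectation; nonrandom samples offer no such guarantee.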

Most of what we’ve discussed so far (e.g. poststratification, weighting) assumes that within groups based on covariates X, response R is independent of outcome Y. Random sampling (within X) achieves this.
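As a concrete refresher, here is a minimal poststratification sketch in Python (the cells, shares, and means are hypothetical, purely for illustration):

```python
# Hypothetical population shares for cells of X (e.g. age groups).
pop_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

# Hypothetical observed mean outcome Y among respondents in each cell.
sample_mean = {"18-34": 0.62, "35-64": 0.48, "65+": 0.41}

# Poststratified estimate: weight each cell mean by its population share.
# Unbiased only if, within each cell of X, response R is independent of Y.
estimate = sum(pop_share[c] * sample_mean[c] for c in pop_share)
print(f"poststratified estimate: {estimate:.3f}")  # 0.508
```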

In Section 4, Bailey looks at some methods for when R might depend on Y (within X), a “nonrandom” paradigm. These methods rely on a response instrument Z: a variable that affects the probability of response R but does not directly affect the outcome of interest Y.

For example, let Z = 0 if someone is forced to discuss politics (“high response protocol”), and Z = 1 if they are given the choice to discuss politics or opt out and discuss sports (“low response protocol”). From Bailey 2025 (with my additions in green):

Intuitively, the response instrument helps because we can compare observed Y between low versus high response protocols, which gives information about the dependence between Y and R. How this translates to an estimate of population Y depends on methods and assumptions Bailey doesn’t fully dive into here.
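To make that intuition concrete, here is a toy simulation (my own setup, not Bailey’s): Y = 1 respondents opt out more often under the low response protocol, and the gap between the two protocols’ observed means reveals the Y–R dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y = rng.binomial(1, 0.5, n)   # true outcome, population mean 0.5
z = rng.binomial(1, 0.5, n)   # randomized response instrument

# Toy response model: everyone answers under the high response
# protocol (z = 0); under the low response protocol (z = 1),
# y = 1 respondents opt out more often, so R depends on Y.
p_respond = np.where(z == 0, 1.0, np.where(y == 1, 0.4, 0.8))
r = rng.binomial(1, p_respond)

mean_high = y[(z == 0) & (r == 1)].mean()  # ~0.50: unbiased
mean_low = y[(z == 1) & (r == 1)].mean()   # ~0.33: selection bias
print(mean_high, mean_low)  # the gap signals Y-R dependence
```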

Sharon Lohr’s comments describe a sufficient assumption: no interaction between outcome Y and instrument Z in the model for response R. Confusingly, she uses “X” instead of “Z” for the instrument, so I’ve edited below:

All three models in Figure 1 perfectly fit the data. In Figure 1(a), the high response protocol (Z = 0) is “random” (R does not depend on Y), so our usual methods work. In Figure 1(c), there is no Z*Y interaction in the response model, so the response instrument methods work. In Figure 1(b), it isn’t clear what to do.
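In logistic-regression terms (my sketch of Lohr’s condition, not her exact specification), the identifying assumption is that the response model has no Z×Y interaction:

$$\text{logit}\, P(R = 1 \mid Y, Z) = \alpha + \beta Y + \gamma Z \qquad \text{(no interaction: Figure 1(c), identified)}$$

$$\text{logit}\, P(R = 1 \mid Y, Z) = \alpha + \beta Y + \gamma Z + \delta\, ZY \qquad \text{(interaction: Figure 1(b), } \delta \text{ unidentified)}$$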

Perhaps we should propagate uncertainty about these assumptions to our final results. Rod Little’s comments include a suggestion to “use Bayesian modeling, including a prior distribution for unidentified parameters.” Has anyone seen a good example of Bayesian modeling using response instruments?
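Here is a minimal sketch of what Little’s suggestion could look like (hypothetical numbers; the parameter delta and its prior are my illustrative choices). Because delta is unidentified, the data cannot update its prior, so prior uncertainty passes straight through to the interval:

```python
import numpy as np

rng = np.random.default_rng(1)

# Observed quantities (hypothetical numbers for illustration).
ybar_resp = 0.52   # mean outcome among respondents
resp_rate = 0.60   # fraction of the sample that responded

# Unidentified parameter: how nonrespondents differ from respondents.
# Put a prior on it instead of assuming it is zero (which is what
# ignoring nonresponse amounts to).
delta = rng.normal(loc=0.0, scale=0.05, size=10_000)  # prior draws
ybar_nonresp = ybar_resp + delta

# Implied population mean under each prior draw.
ybar_pop = resp_rate * ybar_resp + (1 - resp_rate) * ybar_nonresp

lo, hi = np.percentile(ybar_pop, [2.5, 97.5])
print(f"interval for population mean: [{lo:.3f}, {hi:.3f}]")
```

The width of that interval is driven entirely by the prior on the unidentified parameter, which is exactly the honest accounting of uncertainty Little is asking for.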

In his comments, Shiro Kuriwaki says that randomized instruments “give us leverage in the face of unobservable confounders, but no leverage comes for free.” He advocates more research into these methods. Agreed!

We’ve been focused on the response instrument Z. But we can’t forget about the response R itself, and how its dependence on Y varies across survey recruitment protocols (e.g. web versus text); see Sharon Lohr’s comments. And conditioning on more covariates X increases the plausibility that Y and R are independent within groups. So instead of pursuing a response instrument, we could focus on improved recruitment protocols and on enlarging our set of covariates X. Thoughts?

P.S. Bailey’s post also clarifies why Raphael Nishimura doesn’t like the term “representativeness” (see this blog comment): it’s too vague, and can be used to mean “matches some population demographics”, which may not guarantee much about the outcomes of interest Y.
