Survey Statistics: Thomas Lumley writes about Interviewing your Laptop

Thomas Lumley wrote a post about substituting LLMs for survey respondents.

Let’s connect it to our discussions.

Let:

  • X = what Lumley calls a “micro-demographic”
  • Y = vote choice

Lumley first describes two problems we might see when calibrating via poststratification (see our discussion here):

  1. E[Y|X] != E[Y|X, sample]: “recruits one member of each micro-demographic … the people might not be as representative as you wanted.”
  2. p(X) unknown: “how many high-metabolism world dominators are there in the current US voting population”

Lumley then describes these two problems when interviewing LLMs:

  1. E[Y|X] != E_LLM[Y|X]: “If political opinion experts don’t know … it’s hard to be confident that AI will”
  2. p(X) unknown

So problem 2 remains the same. And problem 1 is similar: if we can’t estimate E[Y|X] well, whether from a sample or from an LLM, poststratification isn’t guaranteed to give us a good estimate of E[Y].
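
To make the two problems concrete, here is a minimal sketch of the poststratification sum, E[Y] = sum over x of p(x) * E[Y|X=x]. Everything in it is an assumption for illustration: the three cells "a", "b", "c", their shares, and their means are made-up numbers, not anything from Lumley's post, and poststratify is just a hypothetical helper name.

```python
# Poststratification: E[Y] = sum over x of p(x) * E[Y | X = x].
# All cells and numbers below are hypothetical, chosen only to illustrate.

def poststratify(cell_means, cell_shares):
    """Weight each cell's mean of Y by that cell's population share."""
    return sum(cell_shares[x] * cell_means[x] for x in cell_shares)

# Assumed "truth": population shares p(X) and cell means E[Y|X].
p_x = {"a": 0.5, "b": 0.3, "c": 0.2}
true_mean = {"a": 0.6, "b": 0.4, "c": 0.5}
truth = poststratify(true_mean, p_x)

# Problem 1: the cell means we can get (from a sample or an LLM)
# differ from E[Y|X]. Here only cell "a" is off, by 0.1.
biased_mean = {"a": 0.7, "b": 0.4, "c": 0.5}
est_problem1 = poststratify(biased_mean, p_x)

# Problem 2: the cell means are right, but p(X) is guessed wrong.
guessed_p_x = {"a": 0.4, "b": 0.4, "c": 0.2}
est_problem2 = poststratify(true_mean, guessed_p_x)

print("truth:", round(truth, 4))
print("error from problem 1:", round(est_problem1 - truth, 4))
print("error from problem 2:", round(est_problem2 - truth, 4))
```

In this toy setup the error from problem 1 is the cell's share times the error in its mean, so a wrong E[Y|X] in a large cell matters more than the same error in a small one. Nothing here depends on whether the biased cell means came from people or from an LLM.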

In the comments, Joe Paxton and Lumley discuss empirical justifications. Has anyone seen any compelling ones?

p.s. I use square brackets for expectations for consistency with my past posts. E(Y|X) is equally valid notation.
