Survey Statistics: Thomas Lumley writes about Interviewing your Laptop

Thomas Lumley wrote a post, “Interviewing your Laptop”, about substituting LLMs for survey respondents.

Let’s connect it to our discussions.

Let:

X = what Lumley calls a “micro-demographic”
Y = vote choice

Lumley first describes two problems we might see when calibrating via poststratification (see our discussion here):

1. E[Y|X] != E[Y|X, sample]: “recruits one member of each micro-demographic … the people might not be as representative as you wanted.”
2. p(X) unknown: “how many high-metabolism world dominators are there in the current US voting population”

Lumley then describes these two problems when interviewing LLMs:

1. E[Y|X] != E_LLM[Y|X]: “If political opinion experts don’t know…it’s hard to be confident that AI will”
2. p(X) unknown

So problem 2 remains the same. And problem 1 is similar: if we can’t estimate the mean of Y conditional on X, poststratification isn’t guaranteed to give us a good estimate of the mean of Y.
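
To make the two failure modes concrete, here is a minimal Python sketch of poststratification, E[Y] = sum over x of E[Y|X=x] p(X=x). The function name `poststratify`, the two cells "a" and "b", and every number below are hypothetical and chosen only for illustration; nothing here comes from Lumley's post or from real data.

```python
def poststratify(cell_means, cell_shares):
    """Poststratified estimate of E[Y]: sum over cells x of E[Y|X=x] * p(X=x)."""
    total = sum(cell_shares.values())  # normalize in case shares do not sum to 1
    return sum(cell_means[x] * cell_shares[x] / total for x in cell_shares)


# Hypothetical "truth": two micro-demographics with known means and shares.
true_means = {"a": 0.30, "b": 0.60}    # E[Y|X]
true_shares = {"a": 0.50, "b": 0.50}   # p(X)

# Problem 1: the cell means are off (E[Y|X] != E_LLM[Y|X]), the shares are right.
llm_means = {"a": 0.40, "b": 0.60}

# Problem 2: the cell means are right, but p(X) is unknown and guessed wrong.
guessed_shares = {"a": 0.25, "b": 0.75}

truth = poststratify(true_means, true_shares)
est_problem_1 = poststratify(llm_means, true_shares)
est_problem_2 = poststratify(true_means, guessed_shares)

print(f"truth:           {truth:.3f}")
print(f"problem 1 only:  {est_problem_1:.3f}")
print(f"problem 2 only:  {est_problem_2:.3f}")
```

In this toy setup either problem alone moves the estimate away from the truth, and correct weights cannot repair wrong cell means (or the reverse).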

In the comments, Joe Paxton and Lumley discuss empirical justifications. Has anyone seen any compelling ones here?

p.s. I use square brackets for expectations for consistency with my past posts. E(Y|X) is equally valid notation.
