In June we discussed 2 flavors of calibration:
- Poststratification: Calibrate our estimates of means E[Y] to population data about another variable X.
- Intercept Correction: Calibrate our estimates of regressions E[Y|X] to aggregate data about E[Y].
Let’s focus on the 2nd. This is called the “Logit Shift” in Rosenman et al. 2023, “Intercept Correction” in Ghitza and Gelman 2020, “simple adjustment” on p.769 of Ghitza and Gelman 2013, and “calibration to election results” in Kuriwaki et al. 2024.
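To make the binary version concrete, here is a minimal sketch of the idea (my own illustration, not code from any of these papers): solve for the single constant shift that makes the average adjusted probability match the known aggregate.

import numpy as np
from scipy.optimize import brentq
from scipy.special import expit, logit

def logit_shift(p, target):
    """Find the shift delta such that mean(invlogit(logit(p) + delta)) == target."""
    return brentq(lambda d: expit(logit(p) + d).mean() - target, -20, 20)

rng = np.random.default_rng(0)
p = rng.uniform(0.1, 0.9, size=1000)   # hypothetical initial predictions
delta = logit_shift(p, target=0.30)    # calibrate to a known aggregate of 0.30
print(expit(logit(p) + delta).mean())  # ~0.30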
Rosenman et al. 2023 have a footnote:
Throughout this note, we focus on a binary outcome for simplicity. The same logic applies in cases of multinomial outcomes.
But something funny can happen with J >= 3 outcomes!
Start with an N by 3 table of initial predictions: each person's predicted probability of being White, Black, or Other. Suppose we know the correct aggregate data for this population should be P[White] = 0.4, P[Black] = 0.2, P[Other] = 0.4. So we want the column marginals to satisfy this. We also need the rows to continue to sum to 1.
As Evan Rosenman points out, we can update our predictions using Iterative Proportional Fitting (IPF), also known as "raking" in survey statistics. IPF alternates two scaling steps:

m_ij <- m_ij * u_j / (sum_r m_rj)   (column step: match column j's total to u_j)
m_ij <- m_ij * v_i / (sum_k m_ik)   (row step: match row i's total to v_i)

Here the "u" are the known aggregates and the "v" are 1, and we iterate the two steps until convergence.
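In code, a minimal sketch of this raking loop on our N by 3 table (with the column targets given as proportions rather than counts, so I scale column means instead of column sums):

import numpy as np

def rake(m, col_targets, iters=100, tol=1e-10):
    """Alternate column and row scaling until the column means hit col_targets."""
    m = m.copy()
    for _ in range(iters):
        m = m * col_targets / m.mean(0, keepdims=True)  # column step: u = known aggregates
        m = m / m.sum(1, keepdims=True)                 # row step: v = 1
        if np.allclose(m.mean(0), col_targets, atol=tol):
            break
    return m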
Homework 1: take exp(log(·)) of each entry and combine the two steps to show that our updates look like:
updated m_ij = exp(log(m_ij) + shift_j) / sum_k exp(log(m_ik) + shift_k)
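A quick numerical check of Homework 1 for a single IPF iteration, where (if I have done the algebra right) shift_j = log(target_j / current mean of column j):

import numpy as np

rng = np.random.default_rng(0)
m = rng.dirichlet([10, 2, 5], size=5)
targets = np.array([0.4, 0.2, 0.4])

# One IPF iteration: column step, then row normalization.
ipf = m * targets / m.mean(0, keepdims=True)
ipf /= ipf.sum(1, keepdims=True)

# Softmax form with shift_j = log(target_j / column mean).
z = np.exp(np.log(m) + np.log(targets / m.mean(0)))
print(np.allclose(ipf, z / z.sum(1, keepdims=True)))  # True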
Homework 2: now show that if J = 2, this simplifies to:
updated m_i1 = logit^-1(logit(m_i1) + shift_1 - shift_2)
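And a numerical check of Homework 2: for J = 2, the softmax form of the update collapses to a single logit shift by shift_1 - shift_2.

import numpy as np
from scipy.special import expit, logit

rng = np.random.default_rng(1)
m1 = rng.uniform(0.05, 0.95, size=5)  # P[class 1]; P[class 2] = 1 - m1
s1, s2 = 0.7, -0.3                    # arbitrary shifts

# Softmax form of the update for class 1 ...
z1, z2 = m1 * np.exp(s1), (1 - m1) * np.exp(s2)
updated = z1 / (z1 + z2)

# ... equals a single logit shift by s1 - s2.
print(np.allclose(updated, expit(logit(m1) + s1 - s2)))  # True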
For J = 2, this is monotone, so the updated probabilities preserve rank order. In other words, if person A has a higher probability of being White than person B, this stays true after the shifts. However, this simplification is not available for J >= 3: the updated probabilities depend not only on the shift for that race but also on the denominator, which sums across all races. So the adjustment can flip the rank order.
Here’s a little example:
import numpy as np

np.random.seed(123)
alpha = [10, 2, 5]
init_predictions = np.random.dirichlet(alpha, size=10000)
print("initial aggregates:", init_predictions.mean(0))

# One IPF iteration: scale columns to the targets, then renormalize rows.
targets = np.array([0.4, 0.2, 0.4])
preds = init_predictions * targets / init_predictions.mean(0, keepdims=True)
preds /= preds.sum(1, keepdims=True)

# Show two rows before/after
for r in [0, 23]:
    print("initial predictions for r =", r, ":", init_predictions[r])
    print("adjusted predictions for r =", r, ":", preds[r])
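Continuing from the snippet above, we can also count how often the ranking actually flips (a check I added; the exact fraction depends on the seed):

# Fraction of random pairs of people whose P[Black] ranking
# flips after the adjustment; should be small but nonzero.
rng = np.random.default_rng(0)
i, j = rng.integers(0, len(preds), size=(2, 100_000))
flips = ((init_predictions[i, 1] > init_predictions[j, 1])
         != (preds[i, 1] > preds[j, 1])).mean()
print("share of pairs whose P[Black] rank flipped:", flips)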
Homework 3: are you bothered by this? Why or why not?