In our post from two weeks ago, we started to learn about the Current Employment Statistics (CES) survey from the U.S. Bureau of Labor Statistics (BLS) that produces the monthly jobs report.
We wondered why establishment size is used in the stratification but not in the estimation. Commenters suggested contacting the BLS to ask, so I did! They wrote back quickly with initial resources and a timeline for their full answer, which I got on Friday (August 15, 2025) and will digest with you now.
Steve Mance, CES Branch Chief at the BLS, writes (all bolding is my own):
The CES private sector sample uses a stratified simple random sample design allocated by state, 8 business size classes, and 13 broad industries, with weights assigned as the inverse of the probability of selection. The estimation structure is composed of over 500 cells, defined by detailed NAICS industry (and region in construction). We aggregate estimates developed at this detailed industry level to form the national total private estimate.
This lines up with my understanding: Sample strata are defined by state, size, and (broad) industry. Estimation cells are defined by (detailed) industry and region.
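To make the weighting concrete, here's a minimal sketch (my own toy numbers, not the actual CES frame or allocation) of how inverse-probability weights fall out of a stratified simple random sample:

```python
import pandas as pd

# Toy strata (made-up counts): stratum = state x size class x industry.
frame = pd.DataFrame({
    "state":    ["CA", "CA", "WY", "WY"],
    "size":     ["1-9", "1000+", "1-9", "1000+"],
    "industry": ["retail"] * 4,
    "N_h":      [50_000, 400, 2_000, 10],  # establishments on the frame
    "n_h":      [500,    200, 100,   10],  # sample allocated to the stratum
})

# Within each stratum, a simple random sample of n_h from N_h units gives
# each unit a selection probability of n_h / N_h; its design weight is the
# inverse of that probability.
frame["weight"] = frame["N_h"] / frame["n_h"]
print(frame)
# Large-establishment strata are sampled at higher rates (smaller weights);
# the tiny WY "1000+" stratum is taken with certainty (weight = 1).
```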
Government estimates still use a quota design and include regional stratification for estimation in state and local education. State and Metro estimates use the same sample as the National data, but are calculated independently, and use different aggregation structures. In some cases they include explicit nonresponse weight adjustments defined by detailed NAICS within a broad industry.
This was new to me. Government job counts are based on a quota (not probability) sample. Estimation seems to use only (detailed) industry and region, same as for the private sector.
In most cases CES does not make explicit nonresponse adjustments (either through weight adjustments or imputation) in the National data. Our estimator is equivalent to imputing for non-respondents using the relative employment growth of respondents calculated at the estimation cell level.
As we discussed in the poststratification and imputation posts: we can use respondents to estimate per-cell employment, which is then aggregated using a known distribution across cells. (I have to throw in brackets for Andrew: E[E[Y | X, respondents]], where X defines the cells and Y is employment. The outer expectation is over X.)
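Here's a toy version of that equivalence (my own illustration, not BLS code): compute each cell's growth ratio from units that reported in both months, impute nonrespondents at that ratio, and note that the aggregated total matches simply scaling each cell's previous total by its ratio.

```python
import pandas as pd

units = pd.DataFrame({
    "cell":     ["A", "A", "A", "B", "B"],
    "weight":   [10.0, 10.0, 10.0, 5.0, 5.0],
    "emp_prev": [100.0, 200.0, 150.0, 80.0, 120.0],  # last month, all known
    "emp_curr": [105.0, 210.0, None, 88.0, None],    # None = nonrespondent
})

# Per-cell relative growth among units reporting in both months.
resp = units.dropna(subset=["emp_curr"])
num = (resp["weight"] * resp["emp_curr"]).groupby(resp["cell"]).sum()
den = (resp["weight"] * resp["emp_prev"]).groupby(resp["cell"]).sum()
ratio = num / den  # cell A: 1.05, cell B: 1.10

# Impute each nonrespondent by growing its previous level at its cell's ratio.
units["emp_hat"] = units["emp_curr"].fillna(
    units["emp_prev"] * units["cell"].map(ratio))

# Aggregating imputed unit values gives the same total as scaling each cell's
# previous total by its ratio -- the two estimators coincide.
total_imputed = (units["weight"] * units["emp_hat"]).sum()
prev_totals = (units["weight"] * units["emp_prev"]).groupby(units["cell"]).sum()
total_ratio = (prev_totals * ratio).sum()
print(total_imputed, total_ratio)  # both 5825.0
```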
In rare cases, late or non-response from large sample units that historically exhibit seasonal differences from the rest of the population is accounted for through explicit imputation. If one of these businesses fails to report CES data for the current month, its historical over-the-month change data may be used in the imputation to capture its expected seasonal movement.
If a large establishment doesn’t respond, they impute using its history.
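A minimal sketch of what that might look like, assuming a simple average of the unit's own same-month historical changes (the averaging rule is my guess, not the BLS formula):

```python
# Made-up history for one large establishment: its July/June employment
# ratios in recent years show a reliable seasonal jump.
same_month_ratios = {2022: 1.18, 2023: 1.22, 2024: 1.20}

last_reported = 12_000  # its June employment, which it did report
seasonal_ratio = sum(same_month_ratios.values()) / len(same_month_ratios)
imputed = last_reported * seasonal_ratio
print(round(imputed))  # 14400 -- carries forward its usual July expansion
```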
In the question we received, you noted a paper by Ken Copeland from 2003 that explored nonresponse in CES, and asked why we use size in allocation but not in our estimation structure (I’ll also address geography):
For design purposes, allocation must include state. CES is a Fed-State cooperative program, and we have to ensure a degree of reliability in estimates for small states. An optimum allocation designed for the National data could have unacceptably small sample sizes for some states. State sample size allocations are agreed upon by the CES Policy Council, which is composed of representatives from BLS and State Workforce Agencies, and typically updated every 5 years. (Within each state, the fixed sample size is reallocated to the industry/size strata annually.)
This is a great example of what Raphael Nishimura and Andrew said in our poststratification discussion: "Often a stratum will be purposely oversampled."
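To see why a nationally optimal allocation can shortchange small states, here's a toy allocation with made-up frame sizes (with equal within-stratum variances, Neyman allocation reduces to proportional allocation):

```python
# Made-up frame sizes; equal within-state variances for simplicity.
N = {"CA": 1_000_000, "TX": 800_000, "WY": 20_000}
n_total = 1_000

# Neyman allocation with equal variances: n_h proportional to N_h.
alloc = {s: round(n_total * N[s] / sum(N.values())) for s in N}
print(alloc)  # {'CA': 549, 'TX': 440, 'WY': 11} -- only 11 units for WY
```

Fixing state sample sizes first, as the CES Policy Council does, trades some national efficiency for usable state-level estimates.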
Steve Mance (BLS) continues:
Size class was also an important consideration of our sample redesign, developed in the 1990s and implemented in the early 2000s. Our previous quota design resulted in a sample heavily weighted toward large firms, which was considered a significant potential source of bias, although research done leading up to the redesign did not indicate differences between large and small firms as an important contributor to benchmark revisions.
Benchmark revisions are corrections to CES estimates based on the Quarterly Census of Employment and Wages (QCEW). The QCEW has nearly complete coverage, so we can use it to assess nonresponse bias in the CES. The BLS didn't find establishment size to be a big contributor to nonresponse bias, even though the quota sample was "heavily weighted toward large firms." So X (size) is associated with R (response). But if X isn't a big contributor to nonresponse bias, does that mean it is not associated with Y (jobs)? This confuses me.
If X is associated with R but not Y, then X is what Table 1 of Kuh et al. (2023) calls "inconsequential":
Their table was inspired by Table 1 of Little and Vartivarian (2005), which warns that adjusting for such an X can increase variance:
We discussed sparsified MRP as a way to avoid this increase in variance. Indeed, Little and Vartivarian 2005 conclude:
A more sophisticated approach is to apply random-effects models to shrink the weights, with more shrinkage for outcomes that are not strongly related to the covariates (e.g., Elliott and Little 2000). A flexible alternative to this approach is imputation based on prediction models…
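To see the variance point in action, here's a small simulation (my own, not from either paper): X predicts response but is independent of Y, so inverse-propensity weighting on X removes no bias and only inflates the variance of the estimated mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n = 2_000, 1_000
unweighted, weighted = [], []

for _ in range(n_sims):
    x = rng.binomial(1, 0.5, n)                       # X: e.g., large vs. small firm
    r = rng.binomial(1, np.where(x == 1, 0.8, 0.3))   # X strongly predicts response
    y = rng.normal(0, 1, n)                           # Y independent of X (and of R)
    ys, xs = y[r == 1], x[r == 1]
    unweighted.append(ys.mean())
    w = np.where(xs == 1, 1 / 0.8, 1 / 0.3)           # inverse response propensity
    weighted.append(np.average(ys, weights=w))

print("mean:", np.mean(unweighted), np.mean(weighted))  # both ~0: no bias either way
print("sd:  ", np.std(unweighted), np.std(weighted))    # weighted sd is larger
```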
Steve Mance (BLS) continues:
We have found that stratifying the estimation structure by detailed industry is the most important dimension in maintaining the ignorable nonresponse assumption you noted from Copeland (2003). Within the given broad, allocation industries, we have found a great deal of heterogeneity across—and relative homogeneity within—detailed industries in terms of relative job growth and response propensity. Construction is a notable exception where there is greater geographic heterogeneity, which is accounted for in the estimation cell structure.
This makes sense. They’ve found that detailed industry is most important for reducing bias.
It is certainly plausible that including information such as size and state in weight adjustments or imputation could reduce nonresponse error, and BLS is currently researching more directly accounting for nonresponse in CES using state, size, and other information. That said, previous work incorporating size and other characteristics in nonresponse adjustments, such as Copeland and Valliant (2007), did not yield significant improvements to aggregate revisions in CES.
This reiterates the point above: the BLS didn't find establishment size to be a big contributor to nonresponse bias. And they are continuing to research this.