Census LEHD: The Longitudinal Employer-Household Dynamics Program and What It Reveals About the American Workforce
The Census Bureau's Longitudinal Employer-Household Dynamics program is the closest thing the United States has to a complete administrative record of who works where, for how much, and how that changes over time. Built from Unemployment Insurance wage records that cover roughly 95 percent of private employment, LEHD links individual workers to their employers and then to their home addresses — producing a family of public data products that no survey could replicate at any affordable cost.
What LEHD Is and Where the Data Comes From
LEHD stands for Longitudinal Employer-Household Dynamics, a research program and data infrastructure housed in the Census Bureau's Center for Economic Studies. The program was established in the late 1990s through partnerships with state workforce agencies. Every quarter, participating states transmit their Unemployment Insurance wage records to the Census Bureau. UI wage records are administrative filings: every covered employer reports each employee's Social Security Number and gross earnings to the state as part of its UI tax obligation. Because UI coverage is broad — nearly all private-sector employers and most public employers must participate — the resulting file contains earnings records for the overwhelming majority of American workers.
The Census Bureau then links those UI records to two additional data sources. Employer records from the Quarterly Census of Employment and Wages connect each UI account to an establishment with a location, industry classification, and payroll total. Household records from the decennial Census, the American Community Survey, and other administrative sources such as Social Security and IRS tax files supply demographic attributes — age, sex, race and ethnicity, educational attainment, and place of residence. The result of that three-way linkage is the LEHD Infrastructure File, an internal longitudinal database that tracks individual workers across employers over time. The public never sees the micro-records directly; instead the Census Bureau produces several aggregated data products from the infrastructure file, each designed to reveal a different dimension of labor market dynamics while preserving the statistical confidentiality of individual workers and firms.
LEHD is often confused with the QCEW because both programs trace their origin to UI administrative records. The distinction is fundamental. QCEW processes UI account records at the establishment level: it reports total employment headcounts and total payroll aggregated by establishment, county, and industry for each quarter. LEHD goes one layer deeper and processes the individual worker earnings records that underlie those establishment aggregates. Because LEHD knows which Social Security Numbers appear on which employer's quarterly filing, it can track a worker moving from one employer to another, measure how many employees were hired versus how many were already in place, and connect worker demographics to employer characteristics in ways that aggregate payroll data cannot support.
Quarterly Workforce Indicators
The Quarterly Workforce Indicators are the core public product of the LEHD program. Published through the LEHD Explorer web application and the Census API, QWI reports quarterly employment counts, total payroll, and a set of labor market flow statistics (hires, separations, job creation, and job destruction) tabulated by state and county geography and NAICS industry, and broken out by four worker characteristics: age group, sex, educational attainment, and race and ethnicity.
The flow statistics are what distinguish QWI from every other federal employment series. Employment counts alone — the kind QCEW and the Current Employment Statistics both provide — describe a stock: how many people held jobs at a point in time. QWI measures the transactions that drive that stock. A county can maintain stable employment for a full year while simultaneously churning through workers at a high rate; the employment count looks flat while hires and separations tell a story of labor market volatility that matters for workforce planning, union organizing, and economic development analysis. Job creation and job destruction measure the employer side of the same transactions: how many positions were added at expanding establishments versus eliminated at contracting ones.
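The stock-versus-flow distinction can be made concrete with a small sketch. The worker IDs and employer names below are invented, and the hire and separation definitions are deliberately simplified relative to the official QWI definitions; the point is only that a flat employment count can hide substantial churn.

```python
# Toy worker-employer pairs observed in two consecutive quarters.
# All identifiers are invented for illustration.
q1_jobs = {("w1", "AcmeCo"), ("w2", "AcmeCo"), ("w3", "AcmeCo"), ("w4", "BetaInc")}
q2_jobs = {("w1", "AcmeCo"), ("w5", "AcmeCo"), ("w6", "AcmeCo"), ("w4", "BetaInc")}


def flows(prev_jobs, curr_jobs):
    """Return stock counts and simplified flow counts between two quarters."""
    hires = curr_jobs - prev_jobs        # pairs present now but not before
    separations = prev_jobs - curr_jobs  # pairs present before but not now
    return {
        "employment_prev": len(prev_jobs),
        "employment_curr": len(curr_jobs),
        "hires": len(hires),
        "separations": len(separations),
    }


summary = flows(q1_jobs, q2_jobs)
print(summary)
```

In this toy case employment is four jobs in both quarters, yet two hires and two separations occurred in between, which is exactly the kind of activity a stock series cannot show.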
The demographic cross-tabulation is analytically powerful in a way that is easy to underestimate. Because QWI classifies workers by age group, sex, education, and race or ethnicity (published as paired tabulations such as sex by age rather than all four at once), users can retrieve figures like new hires in the construction industry in a specific county broken out by age group and sex, a cross-tabulation that no establishment survey produces. The Bureau of Labor Statistics' Current Employment Statistics provides monthly employment by industry but carries no worker demographic detail at all. The Current Population Survey provides demographic detail but only at the national and large-state level. QWI combines geography, industry, and worker characteristics in a single consistent series, updated quarterly.
QWI data carry a publication lag of roughly twelve months. The most recent quarter available at any given time is typically about a year behind the current date, because states transmit UI records to the Census Bureau with their own reporting lags and the Census Bureau requires time for record linkage, quality review, and noise infusion before release. For longitudinal trend analysis over multiple years this lag is unimportant; for tracking the current quarter it is a significant limitation.
The QWI API lives at api.census.gov/data/timeseries/qwi and supports standard Census API query conventions. The LEHD Explorer at ledextract.ces.census.gov provides a point-and-click interface for generating time-series downloads without writing API queries. Both surfaces expose the same underlying QWI tabulations.
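As a rough illustration of what a query against that endpoint looks like, the sketch below only assembles a URL and does not send it. The helper name build_qwi_url is hypothetical, and the variable names used in the example call (Emp, HirA, Sep, industry) and the sa sub-path are assumptions that should be checked against the API's published variable list before use.

```python
from urllib.parse import urlencode

QWI_BASE = "https://api.census.gov/data/timeseries/qwi"


def build_qwi_url(endpoint, indicators, state_fips, year, quarter, **filters):
    """Assemble a county-level QWI query URL without sending the request."""
    params = [
        ("get", ",".join(indicators)),
        ("for", "county:*"),
        ("in", f"state:{state_fips}"),
        ("year", str(year)),
        ("quarter", str(quarter)),
    ]
    # Any extra keyword arguments become additional filter predicates.
    params.extend((name, str(value)) for name, value in filters.items())
    # Keep ':' '*' and ',' unescaped so the URL stays readable.
    return f"{QWI_BASE}/{endpoint}?{urlencode(params, safe=':*,')}"


url = build_qwi_url("sa", ["Emp", "HirA", "Sep"], "48", 2019, 4, industry="23")
print(url)
```

A real request would also need the key parameter; it is left out here so the example can be run without credentials.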
LODES: Origin-Destination Employment Statistics
The LEHD Origin-Destination Employment Statistics dataset is the most spatially granular federal labor market dataset available to the public. LODES is built from the same UI wage records and address linkages as QWI, but its output is a set of census-block-level files rather than quarterly flow statistics.
LODES is published in three file types. The Workplace Area Characteristics file counts workers at each work census block, broken out by earnings tier, age group, and industry sector. The Residence Area Characteristics file counts workers at each home census block with the same breakdowns. The Origin-Destination file pairs home census blocks with work census blocks, recording for each pair the number of jobs held by workers who live in the first block and work in the second, along with characteristics like earnings tier. A single state's OD file can contain tens of millions of block-pair records, because even a moderately dense labor market has many possible home-to-work combinations.
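The relationship among the three file types can be sketched with a toy OD table. The column names follow the LODES convention as I understand it (h_geocode for the home block, w_geocode for the work block, S000 for total jobs), but the block codes and counts are invented, and published WAC and RAC totals will not match a single OD file exactly because of out-of-state workers and disclosure processing.

```python
import pandas as pd

# Toy OD records with LODES-style column names; all values are invented.
# Geocodes are 15-character block IDs and must be kept as strings.
od = pd.DataFrame(
    {
        "h_geocode": ["480291101001000", "480291101001000", "484530002001001"],
        "w_geocode": ["484530011001002", "480291101001005", "484530011001002"],
        "S000": [3, 1, 5],
    }
)

# Summing over home blocks gives jobs by workplace block (the WAC view);
# summing over work blocks gives jobs by residence block (the RAC view).
jobs_by_work_block = od.groupby("w_geocode")["S000"].sum()
jobs_by_home_block = od.groupby("h_geocode")["S000"].sum()

print(jobs_by_work_block)
print(jobs_by_home_block)
```

When reading real files with pandas, passing dtype=str for the geocode columns avoids losing leading zeros in states with low FIPS codes.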
The practical applications of LODES are extensive. Transit planners use the OD matrix to estimate how many workers in a proposed bus rapid transit corridor actually travel in the direction the route serves, down to the census block. Economic developers use the WAC file to measure job concentration in specific industrial parks or opportunity zones. Commuting researchers compare OD matrices across years to measure how remote work reshaped travel patterns between 2019 and 2021: the sharp reduction in long-distance home-to-work flows that appeared in the 2021 LODES data relative to 2019 is among the most direct quantitative evidence of pandemic-era work-from-home adoption available at fine geographic scale.
LODES data begin in 2002 and are published annually with roughly an eighteen-month lag. The Census Bureau's OnTheMap application at onthemap.census.gov provides an interactive visualization layer over the LODES OD files, allowing users to draw arbitrary geographic areas and retrieve inflow, outflow, and internal commute statistics by industry, earnings tier, and worker age. OnTheMap also supports commute distance analysis and can generate workforce catchment area reports for a defined employment site. The underlying flat files are available for bulk download at lehd.ces.census.gov/data/lodes/, with separate files for each state and year.
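For bulk downloads it can help to generate file locations programmatically. The sketch below is built on assumptions the article does not confirm: a version folder named LODES8, a per-state od subfolder, and a file name of the form st_od_part_type_year.csv.gz. The helper name lodes_od_url is hypothetical, and the resulting path should be checked against the directory listing before relying on it.

```python
LODES_ROOT = "https://lehd.ces.census.gov/data/lodes"


def lodes_od_url(state_abbr, year, version="LODES8", part="main", job_type="JT00"):
    """Build the assumed location of one state's OD file for one year.

    The folder layout and file-name pattern are assumptions, not confirmed
    by the article; 'main' and 'JT00' are assumed labels for in-state
    residents and all jobs.
    """
    st = state_abbr.lower()
    filename = f"{st}_od_{part}_{job_type}_{year}.csv.gz"
    return f"{LODES_ROOT}/{version}/{st}/od/{filename}"


example_url = lodes_od_url("TX", 2021)
print(example_url)
```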
Job-to-Job Flows
The Job-to-Job Flows data product tracks the most consequential labor market event that neither surveys nor establishment records capture well: the moment a worker leaves one employer and begins work at another. Because LEHD links workers across employers over time using Social Security Numbers, it can observe when a worker who was employed at firm A in quarter T appears for the first time at firm B in quarter T+1 without an intervening quarter of non-employment. That pattern — employer A to employer B with no gap — defines a job-to-job transition in the LEHD framework.
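The definition above can be expressed as a short classification routine. This is a simplified sketch on invented records, assuming one employer per worker per quarter; the actual J2J methodology handles multiple jobholding and partial-quarter employment in ways this toy version does not.

```python
# Toy quarterly records: (worker, quarter index, employer). Invented data.
records = [
    ("w1", 1, "A"), ("w1", 2, "B"),  # direct move from A to B
    ("w2", 1, "A"), ("w2", 3, "C"),  # no employment observed in quarter 2
    ("w3", 1, "B"), ("w3", 2, "B"),  # stays with the same employer
]


def classify_transitions(records):
    """Label each employer change as direct or via a non-employment gap."""
    by_worker = {}
    for worker, quarter, employer in records:
        by_worker.setdefault(worker, {})[quarter] = employer

    results = []
    for worker, history in by_worker.items():
        quarters = sorted(history)
        for prev_q, next_q in zip(quarters, quarters[1:]):
            if history[prev_q] == history[next_q]:
                continue  # same employer, so no transition to record
            kind = "job_to_job" if next_q == prev_q + 1 else "via_nonemployment"
            results.append((worker, history[prev_q], history[next_q], kind))
    return results


transitions = classify_transitions(records)
print(transitions)
```

Only the first worker counts as a job-to-job transition here; the second changed employers too, but with an intervening quarter of non-employment.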
The J2J data product aggregates those transitions into rates and characteristics at the industry-geography level. Core measures include the job-to-job hire rate (what share of workers at a destination employer in a given quarter came directly from another employer rather than from non-employment), the job-to-job separation rate (what share of workers leaving an origin employer in a given quarter moved directly to another employer), and the distribution of origin and destination industries for workers who changed jobs. The product also reports the earnings change associated with employer-to-employer transitions, which is where J2J produces its most striking finding.
Workers who make direct employer-to-employer transitions — job switchers — consistently earn substantially more in the destination quarter than they earned in their origin quarter. Across most years and industries the average earnings gain from a job-to-job transition is in the range of seven to ten percent above what the same worker was earning before switching, compared to a gain of roughly one to three percent for workers who stayed at the same employer over the same period. This earnings premium on mobility has been documented in academic research using LEHD micro-data and is now visible in aggregate form in the public J2J statistics.
The J2J framework also enables a distinction that administrative data handles better than surveys: separating voluntary quits from involuntary layoffs. A worker who appears at a new employer in the very next quarter almost certainly left the prior employer voluntarily or at minimum had another job lined up. A worker who disappears from employment for one or more quarters before reappearing is more likely to have experienced an involuntary separation or a period of job search. J2J rates therefore serve as a proxy for labor market tightness from the worker side — when job-switching rates are high, workers are confident enough in alternative opportunities to move. The elevated J2J rates observed during 2021 and 2022 corroborate survey evidence about voluntary quits during that period and provide industry and geography detail that survey-based quit rates cannot match.
LEHD and Business Dynamics
LEHD provides the empirical foundation for one of the most influential findings in modern labor economics: that young firms, not large firms, are the primary engines of net job creation in the United States. The work of economists John Haltiwanger, Ron Jarmin, and Javier Miranda, conducted using LEHD infrastructure files and related Census longitudinal business data, overturned a decade of research that had credited small businesses with most job creation. The corrected finding is that firm age matters far more than firm size: new businesses in their first few years of operation create jobs at very high rates, but they also fail at high rates. The net job creation attributed to “small businesses” in earlier studies was almost entirely concentrated in young small businesses — startups.
LEHD data allow researchers to observe startup job creation at the county level and connect it to local economic conditions, venture capital availability, and industry mix. The geographic concentration of startup activity is visible in QWI job creation statistics: counties with dense venture-backed technology ecosystems in the San Francisco Bay Area, Seattle, Austin, and Boston show dramatically higher ratios of new-firm job creation to incumbent-firm job creation than comparable counties without that capital infrastructure. QWI's distinction between jobs created by establishment openings versus jobs created by expansion at continuing establishments is the key variable that makes this analysis possible at the county level.
High-growth firms — those that roughly double employment over a three-year window — contribute disproportionately to gross job creation even though they represent a small share of all businesses. LEHD-based research has found that a relatively small number of high-growth establishments account for the majority of net employment gains in most local economies, while the median establishment experiences little or no growth in any given year. This skewness in the growth distribution is one reason aggregate employment counts can look stable while the underlying churn of job creation and destruction is substantial.
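A toy calculation shows how a few fast growers can account for more than all of the net gain. The establishment names and headcounts are invented, and the threshold of 2.0 simply encodes the "roughly double over three years" description above rather than any official definition.

```python
# Toy establishment employment at the start and end of a three-year window.
# Names and counts are invented for illustration.
establishments = {
    "e1": (10, 25),
    "e2": (50, 51),
    "e3": (20, 20),
    "e4": (40, 36),
    "e5": (8, 16),
}


def growth_summary(establishments, threshold=2.0):
    """Return the high-growth set, total net change, and high-growth change."""
    high_growth = {
        name
        for name, (start, end) in establishments.items()
        if start > 0 and end / start >= threshold
    }
    net_change = sum(end - start for start, end in establishments.values())
    high_growth_change = sum(
        establishments[name][1] - establishments[name][0] for name in high_growth
    )
    return high_growth, net_change, high_growth_change


high_growth, net_change, high_growth_change = growth_summary(establishments)
print(sorted(high_growth), net_change, high_growth_change)
```

Two of the five establishments add 23 jobs between them while the net change across all five is 20, because the remaining three are flat or shrinking.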
Race, Ethnicity, and Wage Structure in LEHD
The QWI breakdowns by race and ethnicity are among the most analytically distinctive features of the LEHD data system. No other federal data source produces quarterly earnings and employment flows simultaneously by race or ethnicity, industry, and county. The BLS Occupational Employment and Wage Statistics survey provides occupational wage detail but does not break out figures by race or county simultaneously. The Current Population Survey provides race-by-occupation earnings but only at broad geographic levels. QWI allows a researcher to examine, for example, whether the earnings gap between Black and white workers in a specific metropolitan area's manufacturing sector narrowed or widened between 2015 and 2023.
The education dimension of QWI adds further analytical depth. The intersection of educational attainment, race or ethnicity, industry, and geography creates a framework for studying how returns to education vary across demographic groups and local labor markets. Research using LEHD has found that the earnings premium on a college degree varies considerably by geography: workers in labor markets with high concentrations of professional services industries see larger wage gaps between degree-holders and non-degree-holders than workers in labor markets dominated by manufacturing or logistics, where wages compress across education levels because union bargaining or industry wage norms dominate individual worker characteristics.
The longitudinal dimension of LEHD (the ability to follow the same Social Security Number across employer records over many years) supports cohort analysis that cross-sectional surveys cannot replicate. Researchers using LEHD infrastructure have tracked immigrant workers' earnings trajectories from first appearance in U.S. wage records through subsequent years, measuring how quickly earnings converge toward native-born levels and how that convergence rate varies by country of origin, industry of entry, and local labor market conditions. Similar cohort tracking has been applied to workers who entered the labor market during recessions, documenting the persistent earnings penalties associated with graduating into a weak job market.
COVID Labor Market Disruption Visible in LEHD
The pandemic-era disruption to American labor markets is unusually legible in LEHD products because LEHD captures flows, not just stocks. When millions of workers were separated from employment in March and April of 2020, QWI separation rates spiked dramatically — and because J2J data can distinguish employer-to-employer moves from employer-to-non-employment exits, the data are clear that these were not voluntary job switches but involuntary separations with no immediate destination employer.
The recovery from that shock produced equally distinctive patterns. Hire rates in leisure and hospitality surged in 2021 as vaccination rates rose and restrictions lifted, while hire rates in professional services remained elevated well into 2022, reflecting both genuine demand for workers and unusually high voluntary separation rates as workers tested their bargaining power in a tight labor market. The job-to-job transition rates for younger workers — those in the 25–34 age bracket visible in QWI's age group breakdowns — were especially elevated during 2021 and 2022, consistent with survey evidence that younger workers were disproportionately likely to voluntarily quit in search of higher pay or better conditions.
The LODES OD data for 2021 compared to 2019 provides the most direct quantitative record of the geographic reshaping produced by remote work. In 2019 LODES, the largest single-county OD flows in major metropolitan areas were from high-density residential counties to central business district counties — the classic suburb-to-downtown commute pattern. By 2021 those flows had contracted substantially, with a corresponding increase in within-county pairs where the home block and work block were in the same county. The practical interpretation is that workers who had previously commuted to central business districts were instead being credited to a work location in their home county, consistent with working from home or from a local satellite office rather than a downtown headquarters.
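The within-county comparison described above amounts to a simple share calculation. The sketch below assumes LODES-style columns (h_geocode, w_geocode, S000) and uses invented block codes and counts; the county is taken as the first five characters of each 15-character block code.

```python
import pandas as pd


def within_county_share(od):
    """Share of jobs whose home block and work block share a county."""
    home_county = od["h_geocode"].str[:5]  # state FIPS + county FIPS
    work_county = od["w_geocode"].str[:5]
    same_county_jobs = od.loc[home_county == work_county, "S000"].sum()
    return same_county_jobs / od["S000"].sum()


# Invented flows: one cross-county pair and one within-county pair per year.
pairs = {
    "h_geocode": ["484910201001000", "484910201001000"],
    "w_geocode": ["484530011001002", "484910203002001"],
}
od_2019 = pd.DataFrame({**pairs, "S000": [6, 4]})
od_2021 = pd.DataFrame({**pairs, "S000": [3, 7]})

share_2019 = within_county_share(od_2019)
share_2021 = within_county_share(od_2021)
print(share_2019, share_2021)
```

In this toy data the within-county share rises from 0.4 to 0.7, the same direction of change the article describes for real metropolitan areas.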
The sector-level hiring and separation flows during the 2021–2022 recovery period also illustrate LEHD's advantage over monthly CES estimates. CES provides net employment change each month by industry but cannot separate how much of that change came from new hires versus retained workers, or how many of the new hires came directly from competing employers versus from non-employment. QWI decomposed those flows by industry and age group, showing that the construction sector's strong post-pandemic employment growth was driven heavily by workers transitioning from other goods-producing industries rather than by new entrants to the labor force — a pattern that would have been invisible in net employment numbers alone.
OnTheMap and LEHD Explorer
The Census Bureau maintains two interactive tools that make LEHD products accessible without writing API queries or downloading large flat files. OnTheMap at onthemap.census.gov is a mapping interface that draws on LODES OD data. Users can define an area of interest by selecting a census geography or drawing a custom polygon, then retrieve inbound and outbound worker flows with filters for industry, earnings tier, age group, and year. The tool generates tabular and map outputs showing where workers employed in the defined area live and where residents of the defined area work. For site selection, transit planning, or retail trade area analysis, OnTheMap provides answers that previously required purchasing expensive proprietary commuting data.
LEHD Explorer at ledextract.ces.census.gov provides a form-based interface for extracting QWI time-series data. Users select a state, geography level, industry grouping, worker characteristic dimensions, and date range, then download a CSV or view a chart. The Explorer is most useful for analysts who need a multi-year QWI series without learning the API parameter structure. For programmatic access to QWI at scale, the API endpoint at api.census.gov/data/timeseries/qwi is more efficient; it follows the same query pattern as other Census APIs and supports filtering by geography, industry code, and the worker characteristics carried by the chosen tabulation (for example, sex and age group) in a single request.
How LEHD Compares to Other Federal Labor Data
LEHD occupies a unique position among federal labor market data sources. Its nearest relative in terms of input data is the QCEW: both begin with UI administrative records transmitted from state workforce agencies to the federal government. But QCEW processes those records at the establishment level, producing aggregate payroll and headcount statistics for every covered business location. LEHD processes the individual worker earnings records that underlie those establishment aggregates, enabling the individual linkage that makes demographic breakdowns and longitudinal tracking possible.
The Current Employment Statistics program produces the monthly employment numbers that appear in most news coverage of the labor market. CES is a survey of approximately 120,000 business establishments, providing estimates of total payroll employment and average hourly earnings for major industry sectors, updated monthly with only a three-week lag. CES is timely and widely understood. It carries no worker demographic information, covers only about one-third of all establishments via its sample frame, and measures net employment change rather than gross flows.
The American Community Survey provides labor force status, occupation, industry, earnings, commute mode, and many other labor market variables at fine geographic resolution. But ACS is a survey conducted by mailing questionnaires to residential addresses, with all the limitations of survey methodology: sample error, recall bias, and a one-year publication lag for the annual estimates or a five-year pooled period for tract-level data. LEHD is an administrative record system and is not subject to survey nonresponse or recall error within the domains it covers, though it does not cover self-employed workers or those working off the books.
The practical summary is that LEHD products are best suited to applications requiring the combination of worker demographics with employer geography and labor market flows at sub-state geographic resolution. No other federal data source provides that combination. The tradeoffs are the publication lag — roughly twelve months for QWI and eighteen months for LODES — and the exclusion of self-employment, the informal economy, and workers whose employers are not required to file UI wage records.
Accessing LEHD Data
The main LEHD portal at lehd.ces.census.gov links to all three public data products. QWI data are accessible through the Census API at api.census.gov/data/timeseries/qwi, with separate endpoints for each demographic tabulation: sex by age (/sa), sex by education (/se), and race by ethnicity (/rh). The API requires a free key obtainable at api.census.gov/data/key_signup.html. LODES flat files are available for direct download at lehd.ces.census.gov/data/lodes/, organized by state and year, with separate files for OD, WAC, and RAC. J2J data are available at api.census.gov/data/timeseries/qwi/j2j using the same API conventions as the QWI endpoint.
Researchers needing access to LEHD micro-data rather than the public aggregate products can apply for access through the Federal Statistical Research Data Center network. FSRDC locations at universities and federal facilities provide secure access to the LEHD Infrastructure Files under data use agreements that prohibit disclosure of individual worker or firm records. Published research using LEHD micro-data must pass Census Bureau disclosure avoidance review before publication. The public aggregate products — QWI, LODES, and J2J — have already passed through disclosure avoidance procedures including noise infusion, which adds small controlled perturbations to cell counts to prevent identification of individuals, and suppression of cells with very small counts.
The following Python script retrieves county-level construction employment for workers aged 25–34 for three benchmark years — 2019, 2022, and 2024 — and identifies which counties showed the strongest recovery in young construction employment. The script uses Texas as the example state but accepts any state FIPS code.
import requests
import pandas as pd

# Pull county-level employment for construction workers age 25-34
# from the Census QWI API, comparing 2019, 2022, and 2024.
# QWI endpoint: api.census.gov/data/timeseries/qwi/sa
# Variables used:
#   Emp      = employment count (beginning of quarter)
#   agegrp   = age group (A04 = 25-34 per QWI age classification)
#   industry = NAICS 2-digit sector (23 = Construction)
#   county   = returned for every county within a single state

API_KEY = "YOUR_CENSUS_API_KEY"  # register at api.census.gov/data/key_signup.html
BASE = "https://api.census.gov/data/timeseries/qwi/sa"

# We query Q4 of each year for a consistent late-year snapshot;
# Emp counts jobs held at the start of that quarter.
TARGET_QUARTERS = [
    ("2019", "4"),
    ("2022", "4"),
    ("2024", "4"),
]

# Texas (FIPS 48) as the example state; swap for any state FIPS
STATE_FIPS = "48"

frames = []
for year, quarter in TARGET_QUARTERS:
    params = {
        "get": "Emp",  # geography columns come back via "for" and "in"
        "for": "county:*",
        "in": "state:" + STATE_FIPS,
        "agegrp": "A04",   # age group 25-34
        "industry": "23",  # NAICS 23 = Construction
        "year": year,
        "quarter": quarter,
        "key": API_KEY,
    }
    resp = requests.get(BASE, params=params, timeout=60)
    resp.raise_for_status()
    if resp.status_code == 204:
        # Empty response: the quarter has likely not been published yet
        raise SystemExit("No QWI data returned for " + year + " Q" + quarter)
    raw = resp.json()
    cols = raw[0]
    rows = raw[1:]
    df = pd.DataFrame(rows, columns=cols)
    df = df.loc[:, ~df.columns.duplicated()]  # guard against repeated columns
    df["year"] = year  # label rows explicitly for the pivot below
    df["Emp"] = pd.to_numeric(df["Emp"], errors="coerce")
    frames.append(df)

# Combine all three years
all_data = pd.concat(frames, ignore_index=True)

# Pivot to wide format: one row per county, columns for each year
pivot = all_data.pivot_table(
    index="county",
    columns="year",
    values="Emp",
    aggfunc="first",
)
pivot.columns.name = None
pivot = pivot.reset_index()

# Rename year columns for clarity
rename_map = {}
for yr in ["2019", "2022", "2024"]:
    if yr in pivot.columns:
        rename_map[yr] = "emp_" + yr
pivot = pivot.rename(columns=rename_map)

# Require data in all three years to measure recovery
pivot = pivot.dropna(subset=["emp_2019", "emp_2022", "emp_2024"])

# Recovery ratio: 2024 employment relative to 2019 baseline
pivot["recovery_2024"] = (pivot["emp_2024"] / pivot["emp_2019"]).round(3)

# Absolute gain from 2019 to 2024
pivot["gain_2019_2024"] = pivot["emp_2024"] - pivot["emp_2019"]

# Sort by absolute gain descending to find counties with strongest recovery
top_counties = (
    pivot.sort_values("gain_2019_2024", ascending=False)
    .head(15)
    .reset_index(drop=True)
)
print("Top 15 Texas counties by growth in construction employment (age 25-34), 2019 to 2024:")
print(top_counties[["county", "emp_2019", "emp_2022", "emp_2024", "recovery_2024", "gain_2019_2024"]].to_string(index=False))

# Statewide summary
total_2019 = pivot["emp_2019"].sum()
total_2024 = pivot["emp_2024"].sum()
state_recovery = round(total_2024 / total_2019, 3)
print("")
print("Statewide young-construction employment 2019: " + str(int(total_2019)))
print("Statewide young-construction employment 2024: " + str(int(total_2024)))
print("Recovery ratio (2024 / 2019): " + str(state_recovery))
The script pivots the three-year query into a wide table and computes a recovery ratio (2024 employment divided by 2019 employment) alongside the absolute employment gain. Counties where the ratio exceeds 1.0 have fully recovered young construction employment relative to the pre-pandemic baseline; ratios below 1.0 indicate persistent shortfalls. The absolute gain column ranks counties by the magnitude of workforce addition, which is more relevant for infrastructure planning than the ratio alone in high-growth areas where both numerator and denominator have changed substantially.
What LEHD Cannot Tell You
LEHD's coverage of roughly 95 percent of private employment leaves meaningful gaps. Self-employed workers (independent contractors, sole proprietors, and gig economy participants working outside platform payroll arrangements) do not appear in UI wage records and therefore do not appear in LEHD. The informal economy is entirely excluded. Agricultural workers in states that do not extend UI coverage to farm labor are largely absent. Federal civilian employees and uniformed military personnel are covered by separate data systems and are only partially integrated into LEHD depending on the state and year.
The twelve-to-eighteen-month publication lag makes LEHD unsuitable for tracking current conditions. Analysts who need a read on last month's labor market must use CES, JOLTS, or state UI claims data. LEHD is a retrospective source: it describes what happened with precision after the administrative record-keeping cycle closes, rather than providing a real-time signal.
Finally, the noise infusion applied to public QWI, LODES, and J2J cells introduces measurement error that can be substantial for small geographic areas with thin employment in a specific demographic or industry cell. Users working with county-level data for small rural counties or with highly specific demographic and industry cross-tabulations should inspect the status flags that accompany each QWI indicator in the API response and treat small-cell estimates with appropriate caution.
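One way to act on that caution is to separate cells by their quality flag before analysis. The sketch below uses invented rows and assumes that the flag for Emp is a companion column named sEmp, with 1 meaning the value is fine, 5 meaning suppressed, and 9 meaning significantly distorted; both the column name and the code meanings should be verified against the current QWI schema.

```python
import pandas as pd

# Toy QWI-style rows; values are invented. Flags returned by the API arrive
# as strings, so a real pipeline would convert them to integers first.
cells = pd.DataFrame(
    {
        "county": ["001", "003", "005", "007"],
        "Emp": [1520, 12, 0, 48],
        "sEmp": [1, 9, 5, 1],
    }
)

RELIABLE_FLAGS = {1}  # assumed meaning: value published without caveats

is_reliable = cells["sEmp"].isin(RELIABLE_FLAGS)
reliable = cells[is_reliable]
flagged = cells[~is_reliable]

print(reliable)
print(flagged)
```

Keeping the flagged rows in a separate frame, rather than dropping them silently, makes it easier to report how much of a study area rests on distorted or suppressed cells.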
Within its coverage domain LEHD remains unmatched. The combination of near-universal coverage of formal employment, individual-level longitudinal linkage, employer geography, worker demographics, and both stock and flow measures in a single integrated data system has no equivalent in any other federal statistical program. For labor economists, workforce planners, community developers, and transit agencies that need to understand not just how many people work somewhere but who they are, where they live, and how they move between employers, LEHD is the essential starting point.
Related writing
BLS QCEW: The County-Level Employment and Wages Dataset Behind Every Local Economic Analysis — the establishment-level payroll aggregate that shares LEHD's UI source but differs fundamentally in structure.
Census ACS: The American Community Survey and the Federal Demographic Dataset Behind Every Policy Decision — the survey-based complement to LEHD for household demographics and labor force characteristics.
BLS OEWS: Occupational Employment and Wage Statistics — occupation-level wage benchmarks by industry and geography, the closest BLS analog to LEHD's earnings breakdowns.