
Census Current Population Survey: The Federal Database Behind the Official US Poverty and Unemployment Rates

May 24, 2026· 17 min read· AI Analytics


On the first Friday of most months, the Bureau of Labor Statistics releases the jobs report. That number moves financial markets, shapes monetary policy, and frames the political narrative about the economy. The official unemployment rate at the center of the release comes not from employer records but from a monthly household survey of roughly 60,000 housing units. The survey is sponsored jointly by the Census Bureau and BLS, and its origins date back to 1940. That survey, the Current Population Survey, also produces:

- the official US poverty rate, every September;
- annual health insurance coverage estimates;
- labor force participation rates by demographic group;
- the most comprehensive portrait of who is working, who isn't, and why.

This article covers the CPS survey design and rotation structure, how monthly labor force status is measured and how the unemployment rate is calculated, the full U-1 through U-6 spectrum of labor underutilization measures, the Annual Social and Economic Supplement and the methodology behind official poverty measurement, Mollie Orshansky's original threshold construction and its limitations, the Supplemental Poverty Measure added in 2011, key microdata fields for research use, how CPS compares to the establishment-based Current Employment Statistics and Quarterly Census of Employment and Wages programs, accessing CPS data through IPUMS-CPS, BLS, FRED, and the Census API, and a Python script that queries BLS and Census APIs to compare state unemployment rates, poverty rates, and labor force participation across all fifty states.

Survey design and the 4-8-4 rotation group structure

The Current Population Survey covers approximately 60,000 housing units per month, representing a sample of roughly 110,000 individuals. The sample is drawn from the Master Address File maintained by the Census Bureau. It uses a state-based design to support state-level estimates: counties or groups of counties are selected first, then housing units within them. The sample is weighted to reflect the civilian noninstitutional population, which means everyone except active-duty members of the Armed Forces and people living in institutions such as prisons, jails, and nursing homes. Residents of noninstitutional group quarters, such as college dormitories, remain in scope. Alaska and Hawaii are missing from some historical series, though both are included in current national estimates.

Housing units enter the CPS sample on a 4-8-4 rotation schedule. Each unit is interviewed for four consecutive months, rotated out of the sample for eight months, then returned for a final four consecutive months. The result is eight monthly interviews spread over a sixteen-month window. In any given month, the active sample consists of eight rotation groups at different points in this schedule. Approximately one-eighth of the sample is replaced each month: a freshly entering group is added, and the group completing its eighth and final interview is retired.

This design balances two competing interests:

- month-to-month continuity, which enables reliable estimates of change;
- sample refresh, which limits conditioning effects from repeated interviewing.

Adjacent monthly samples overlap by 75%, and samples twelve months apart overlap by 50%. The month-to-month overlap is the basis for the gross-flow statistics BLS publishes on worker transitions between labor force states.
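The rotation arithmetic can be checked with a small sketch. It assumes months are numbered as consecutive integers and treats each entering cohort (a "panel") as one rotation group; the function names are hypothetical illustrations, not Census code. The sketch reproduces the 75% month-to-month overlap cited in the text, plus the 50% overlap between samples twelve months apart.

```python
from __future__ import annotations

# Offsets (in months) after entry at which a panel is interviewed: 4 in, 8 out, 4 in.
IN_SAMPLE_OFFSETS = (0, 1, 2, 3, 12, 13, 14, 15)


def active_panels(month: int) -> set[int]:
    """Entry months of the eight rotation groups interviewed in `month`."""
    return {month - k for k in IN_SAMPLE_OFFSETS}


def month_in_sample(entry_month: int, month: int) -> int | None:
    """Interview number 1-8 for a panel in `month`, or None if rotated out."""
    offset = month - entry_month
    if offset in IN_SAMPLE_OFFSETS:
        return IN_SAMPLE_OFFSETS.index(offset) + 1
    return None


def overlap_share(month: int, lag: int) -> float:
    """Share of rotation groups interviewed in both `month` and `month + lag`."""
    shared = active_panels(month) & active_panels(month + lag)
    return len(shared) / len(IN_SAMPLE_OFFSETS)


print(overlap_share(100, 1))   # adjacent months
print(overlap_share(100, 12))  # same month, one year later
```

Each month, two groups (interviews 4 and 8) drop out of the next month's sample, which is where the 25% non-overlap comes from.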

The reference week for each monthly survey is the week (Sunday through Saturday) that contains the twelfth of the month. Interviewers ask about labor market activity during that specific week, and interviews are conducted in the following week. Most are done by telephone using computer-assisted telephone interviewing (CATI). In-person computer-assisted personal interviewing (CAPI) is used for households without telephone access and for initial contacts, typically in the first and fifth months in sample. Response rates have been a growing concern: CPS monthly response rates declined from approximately 92% in 2000 to around 70–75% by the early 2020s, increasing reliance on nonresponse weighting adjustments.
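Finding the reference week for a given month is a short date calculation. The sketch below implements only the standard rule (the Sunday-through-Saturday week containing the 12th). It does not model any schedule changes BLS may make around holidays, and `reference_week` is a hypothetical helper name.

```python
from __future__ import annotations

from datetime import date, timedelta


def reference_week(year: int, month: int) -> tuple[date, date]:
    """Return (Sunday, Saturday) of the week containing the 12th of the month."""
    twelfth = date(year, month, 12)
    # date.weekday(): Monday=0 ... Sunday=6, so this counts days back to Sunday.
    days_since_sunday = (twelfth.weekday() + 1) % 7
    start = twelfth - timedelta(days=days_since_sunday)
    return start, start + timedelta(days=6)


start, end = reference_week(2020, 4)
print(start, end)  # 2020-04-12 2020-04-18 (April 12, 2020 was a Sunday)
```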

Labor force status and the unemployment rate calculation

Every civilian noninstitutional person aged 16 and older in a sampled housing unit is classified into one of three mutually exclusive labor force statuses based on their activity during the reference week. Employed persons worked at least one hour for pay or profit during the reference week, or were temporarily absent from a job they held (on vacation, ill, on leave, on strike). Unpaid family workers who work fifteen or more hours per week in a family business are also counted as employed. The one-hour threshold is intentionally low — it captures gig work, part-time work, and casual labor, which critics argue overstates the effective employment level.

Unemployed persons meet three conditions simultaneously:

- They were without work during the reference week (did not work even one hour for pay).
- They made at least one active job-search effort in the four weeks ending with the reference week. Active efforts include contacting employers, sending applications, visiting an employment agency, contacting friends or relatives about jobs, checking union or professional registers, or taking other steps to find work.
- They were currently available to start work. Availability means being able to accept a job had one been offered during the reference week, except for temporary illness.

The requirement for active search is the critical gating criterion. Persons who want work but have not actively searched in the past four weeks are classified as not in the labor force, not as unemployed. The one exception is persons on temporary layoff who expect to be recalled to their job; they are counted as unemployed whether or not they searched.
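The classification rules above can be sketched as a small decision function. This is a simplified illustration, not the CPS instrument's actual logic. The record type and field names are hypothetical, and it omits edge cases such as the temporary-illness exception to availability. It does include the rule that persons on temporary layoff awaiting recall count as unemployed without searching.

```python
from dataclasses import dataclass


@dataclass
class ReferenceWeekActivity:
    """Hypothetical, simplified reference-week answers for one civilian aged 16+."""
    hours_worked_for_pay: float = 0.0
    temporarily_absent_from_job: bool = False
    unpaid_family_business_hours: float = 0.0
    searched_last_4_weeks: bool = False
    available_for_work: bool = False
    on_temporary_layoff: bool = False


def labor_force_status(p: ReferenceWeekActivity) -> str:
    # Employed: any paid work, a job one was temporarily absent from,
    # or 15+ hours of unpaid work in a family business.
    if (p.hours_worked_for_pay >= 1
            or p.temporarily_absent_from_job
            or p.unpaid_family_business_hours >= 15):
        return "employed"
    # Unemployed: available, and either searched in the past four weeks
    # or on temporary layoff awaiting recall (no search required).
    if p.available_for_work and (p.searched_last_4_weeks or p.on_temporary_layoff):
        return "unemployed"
    # Everyone else, including people who want work but did not search.
    return "not in labor force"


print(labor_force_status(ReferenceWeekActivity(hours_worked_for_pay=3)))
print(labor_force_status(ReferenceWeekActivity(available_for_work=True)))
```

The second call shows the gating criterion in action: a person who is available but has not searched lands outside the labor force.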

The labor force consists of employed plus unemployed persons. The labor force participation rate is the labor force divided by the civilian noninstitutional population aged 16 and older. The official unemployment rate, designated U-3 by BLS, is the count of unemployed persons divided by the labor force, expressed as a percentage.

In April 2020, at the height of the COVID-19 economic shock, the U-3 rate reached 14.7% as first published (14.8% after later seasonal-adjustment revisions). That was the highest rate since the monthly series began in 1948, and it fell rapidly afterward as displaced workers were recalled. The labor force participation rate fell to 60.2% that month as first published, the lowest since the early 1970s.
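The headline ratios follow directly from three levels. The sketch below uses hypothetical round numbers (in millions) rather than actual CPS estimates. The employment-population ratio is included because BLS publishes it alongside the other two.

```python
def labor_force_rates(employed: float, unemployed: float,
                      civ_noninst_pop_16plus: float) -> dict:
    """Headline CPS ratios, in percent, from level estimates in the same units."""
    labor_force = employed + unemployed
    return {
        "labor_force": labor_force,
        # U-3: unemployed as a share of the labor force
        "unemployment_rate": 100 * unemployed / labor_force,
        # labor force as a share of the civilian noninstitutional population 16+
        "participation_rate": 100 * labor_force / civ_noninst_pop_16plus,
        "employment_population_ratio": 100 * employed / civ_noninst_pop_16plus,
    }


# Hypothetical levels in millions of persons.
rates = labor_force_rates(employed=150.0, unemployed=10.0, civ_noninst_pop_16plus=250.0)
print(rates)
```

The two rates use different denominators. This is why the unemployment rate can fall while participation also falls, for example when people stop searching and leave the labor force.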

U-1 through U-6: the full spectrum of labor underutilization

BLS publishes six supplemental labor underutilization measures, designated U-1 through U-6, which capture progressively broader definitions of labor market distress. The official unemployment rate is U-3. Understanding the full spectrum is essential for interpreting labor market conditions, particularly during recessions when involuntary part-time work and discouraged workers move independently of the headline rate.

Measure	Definition
U-1	Persons unemployed 15 weeks or longer as a share of the labor force
U-2	Job losers and persons who completed temporary jobs as a share of the labor force
U-3	Official unemployment rate: total unemployed as a share of the civilian labor force
U-4	U-3 plus discouraged workers (want work but stopped searching because they believe no jobs are available) as a share of the labor force plus discouraged workers
U-5	U-4 plus all marginally attached workers (want work, available, searched in past 12 months but not in past 4 weeks) as a share of the labor force plus all marginally attached workers
U-6	U-5 plus persons employed part-time for economic reasons (working part-time because full-time work unavailable or economic conditions reduced their hours) as a share of the labor force plus marginally attached workers

Discouraged workers are a subset of marginally attached workers. They want work and are available, but have not searched in the past four weeks because they believe no jobs are available for them, citing reasons such as discrimination, lack of qualifications, or no jobs in their area. Marginally attached workers more broadly include persons who stopped searching for any reason, not just discouragement. Persons employed part-time for economic reasons, sometimes called involuntary part-time workers, are fully counted as employed in U-3 but surface in U-6. The gap between U-3 and U-6 widens substantially in recessions: in April 2020, U-6 peaked near 23% while U-3 peaked near 15%. That gap of roughly 8 percentage points was driven primarily by involuntary part-time work.
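The six definitions in the table translate directly into ratios. The sketch below takes hypothetical component counts (any consistent unit) and follows the table's numerators and denominators. Note that `marginally_attached` includes the discouraged workers, as the text explains.

```python
def underutilization_measures(labor_force, unemployed, unemployed_15wk_plus,
                              job_losers_and_temp_completed, discouraged,
                              marginally_attached, part_time_economic):
    """U-1 through U-6 in percent; `marginally_attached` includes `discouraged`."""
    lf, u = labor_force, unemployed
    return {
        "U-1": 100 * unemployed_15wk_plus / lf,
        "U-2": 100 * job_losers_and_temp_completed / lf,
        "U-3": 100 * u / lf,
        # U-4 to U-6 also add the extra groups to the denominator.
        "U-4": 100 * (u + discouraged) / (lf + discouraged),
        "U-5": 100 * (u + marginally_attached) / (lf + marginally_attached),
        "U-6": 100 * (u + marginally_attached + part_time_economic) / (lf + marginally_attached),
    }


# Hypothetical counts in thousands.
rates = underutilization_measures(labor_force=1000, unemployed=50, unemployed_15wk_plus=20,
                                  job_losers_and_temp_completed=25, discouraged=5,
                                  marginally_attached=12, part_time_economic=40)
for name, value in rates.items():
    print(f"{name}: {value:.2f}%")
```

Involuntary part-time workers enter only the U-6 numerator, not its denominator, because they are already in the labor force as employed.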

The Annual Social and Economic Supplement and official poverty measurement

Each year, typically in February through April, the Census Bureau augments the regular monthly CPS with the Annual Social and Economic Supplement — commonly called the ASEC or the March supplement, though sample collection now spans three months. The ASEC expands the sample to approximately 100,000 households and adds an extensive battery of questions covering income from all sources during the prior calendar year, health insurance coverage, poverty status, program participation (SNAP, Medicaid, housing assistance, Social Security, veterans benefits), and demographic characteristics. The ASEC is the primary source for the official US poverty rate, the Census Bureau's income distribution statistics, and annual health insurance coverage estimates.

The official poverty measure uses a set of 48 poverty thresholds developed by Mollie Orshansky at the Social Security Administration in 1963 and 1964. Orshansky built the thresholds from the Department of Agriculture's Economy Food Plan (the cheapest of four nutritionally adequate food plans) and a multiplier of three. The multiplier reflected a 1955 USDA food consumption survey finding that families of three or more spent approximately one-third of their after-tax income on food.

The 48 thresholds vary by:

- family size (one through nine or more persons);
- the number of related children under 18;
- for one- and two-person units, whether the householder is aged 65 or older.

For 2023, the threshold for a family of two adults and two children was approximately $30,900. The weighted average threshold for a single person was approximately $15,500.

A family or individual is classified as poor if their total pre-tax cash income — wages and salaries, self-employment income, Social Security, pension income, interest and dividends, rental income, and cash public assistance — falls below the applicable threshold. The measure has been updated annually for inflation using the Consumer Price Index for All Urban Consumers (CPI-U) since 1969. The official poverty rate in 2023 was approximately 11.1%, representing roughly 36 million people. Child poverty was approximately 15.3%; poverty among persons 65 and older was approximately 10.7%.
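The mechanics of the official measure reduce to three small steps: building a threshold as a food-plan cost times a multiplier, carrying it forward with CPI-U, and comparing pre-tax cash income against it. The sketch below is a simplified illustration with hypothetical function names. The food-plan cost and CPI-U index levels are made up, and the $30,900 threshold is the illustrative family-of-four figure from the text.

```python
from __future__ import annotations


def orshansky_threshold(annual_food_plan_cost: float, multiplier: float = 3.0) -> float:
    """Orshansky-style threshold: food plan cost times the inverse food budget share."""
    return annual_food_plan_cost * multiplier


def inflate_threshold(threshold: float, cpi_u_base: float, cpi_u_target: float) -> float:
    """Carry a threshold forward by the ratio of CPI-U index levels."""
    return threshold * cpi_u_target / cpi_u_base


def official_poverty_status(family_cash_income: float, threshold: float) -> tuple[bool, float]:
    """(is_poor, income-to-threshold ratio in percent) under the official rule."""
    ratio = 100 * family_cash_income / threshold
    return family_cash_income < threshold, ratio


base = orshansky_threshold(1_000.0)              # hypothetical annual food plan cost
updated = inflate_threshold(base, 100.0, 120.0)  # hypothetical CPI-U levels
is_poor, ratio = official_poverty_status(25_000.0, 30_900.0)
print(base, updated, is_poor, round(ratio, 1))
```

The income-to-threshold ratio is the quantity behind "below 50%" or "below 200% of poverty" tabulations.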

The Orshansky thresholds have been widely criticized since at least the 1970s. The core critiques are four:

1. Food spending as a share of family budgets has fallen sharply since 1955 (to roughly one-seventh rather than one-third), making the three-times multiplier outdated.
2. The thresholds do not vary by geographic cost of living, treating a family in rural Mississippi identically to one in Manhattan.
3. The income measure excludes non-cash benefits (SNAP, Medicaid, housing assistance) and tax credits such as the Earned Income Tax Credit, all of which substantially supplement low-income family resources.
4. The measure does not account for taxes paid or work-related expenses such as childcare and transportation.

A 1995 National Academy of Sciences panel report, Measuring Poverty: A New Approach, edited by Constance Citro and Robert Michael, recommended a substantially revised poverty measure addressing all four critiques. That report became the foundation for the Supplemental Poverty Measure.

The Supplemental Poverty Measure

The Census Bureau introduced the Supplemental Poverty Measure (SPM) in 2011 as an alternative to the official measure, following the NAS panel's 1995 recommendations and subsequent interagency development work. The SPM is published alongside the official measure each September. It does not replace the official measure, which remains the nation's official poverty statistic and the basis for the Department of Health and Human Services poverty guidelines used to set eligibility for many federal programs.

The SPM differs from the official measure in four principal respects:

1. Thresholds. The SPM threshold is based on out-of-pocket expenditures on food, clothing, shelter, and utilities by families near the 33rd percentile of the expenditure distribution. It is updated annually using a five-year moving average of Consumer Expenditure Survey data and adjusted for geographic differences in housing costs.
2. Resources. The SPM resource measure is comprehensive. It adds to cash income the value of non-cash benefits (SNAP, the National School Lunch Program, housing subsidies, the Low Income Home Energy Assistance Program) and the EITC and Child Tax Credit. It subtracts federal and state income taxes, payroll taxes, work-related expenses (transportation, childcare), and out-of-pocket medical costs.
3. Housing tenure. The SPM thresholds vary by tenure (owner with mortgage, owner without mortgage, renter) to capture differences in shelter costs.
4. Unit of analysis. The SPM uses the "SPM unit": all related people living together, plus cohabiting partners (and their relatives) and unrelated children such as foster children. This can differ from the Census family definition.

The SPM typically produces a lower poverty rate than the official measure for children. The main reason is that SNAP, housing subsidies, the EITC, and the Child Tax Credit are substantial supplements to family resources that the official measure does not count. The SPM produces higher poverty rates for the elderly, however, because out-of-pocket medical costs for seniors are large relative to their cash incomes and the SPM subtracts those costs from resources. This divergence is analytically significant. Neither measure counts the value of Medicare coverage as a resource, but only the SPM subtracts what seniors still pay out of pocket. As a result, the official measure understates how much Medicare's coverage gaps erode older Americans' resources.
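The opposite movements for children and the elderly fall out of the resource definitions alone. The sketch below is a deliberately simplified comparison that applies a fixed threshold to each measure. Every dollar amount and both SPM thresholds are hypothetical; real SPM thresholds vary by tenure and local housing costs. The class and function names are also illustrative.

```python
from dataclasses import dataclass


@dataclass
class FamilyResources:
    """Hypothetical annual amounts in dollars."""
    cash_income: float
    noncash_benefits: float = 0.0       # e.g., SNAP, school lunch, housing subsidies
    tax_credits: float = 0.0            # e.g., EITC, Child Tax Credit
    taxes_paid: float = 0.0             # income and payroll taxes
    work_expenses: float = 0.0          # commuting, childcare
    medical_out_of_pocket: float = 0.0


def official_resources(f: FamilyResources) -> float:
    return f.cash_income  # pre-tax cash income only


def spm_resources(f: FamilyResources) -> float:
    return (f.cash_income + f.noncash_benefits + f.tax_credits
            - f.taxes_paid - f.work_expenses - f.medical_out_of_pocket)


def poor(resources: float, threshold: float) -> bool:
    return resources < threshold


working_family = FamilyResources(cash_income=29_000, noncash_benefits=6_000,
                                 tax_credits=5_000, taxes_paid=2_200, work_expenses=3_000)
retiree = FamilyResources(cash_income=17_000, medical_out_of_pocket=4_500)

# Illustrative thresholds: (official, SPM) for each unit.
print(poor(official_resources(working_family), 30_900), poor(spm_resources(working_family), 31_000))
print(poor(official_resources(retiree), 15_500), poor(spm_resources(retiree), 15_000))
```

The working family is poor under the official measure but not under the SPM once benefits and credits are counted. The retiree shows the reverse once medical spending is subtracted.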

Key CPS microdata fields

CPS microdata — the underlying person- and household-level records behind the published statistics — is available in raw form at census.gov and in harmonized form through IPUMS-CPS. Research use of CPS microdata requires understanding the principal variable names used in both raw Census files and IPUMS-harmonized extracts. The following table covers the core variables for labor force and poverty analysis.

Variable	Source	Contents
PWSSWGT	Basic monthly	Person-level final weight; sum this weight across records to produce population estimates
PRTAGE	Basic monthly	Age in years (topcoded at 80 in some years)
PESEX	Basic monthly	Sex: 1 = male, 2 = female
PRDTRACE	Basic monthly	Detailed race recode; 1 = White only, 2 = Black only, 3 = American Indian/Alaska Native only, 4 = Asian only, 5 = Hawaiian/Pacific Islander only, 6+ = multiracial combinations
PEHSPNON	Basic monthly	Hispanic origin: 1 = Hispanic, 2 = non-Hispanic (separate from race per OMB standards)
PEEDUCA	Basic monthly	Educational attainment recode; 31–46 scale from less than first grade through doctoral degree
PEMLR	Basic monthly	Monthly labor force recode: 1 = employed at work, 2 = employed absent, 3 = unemployed on layoff, 4 = unemployed looking, 5 = not in labor force (retired), 6 = not in labor force (disabled), 7 = not in labor force (other)
PRUNTYPE	Basic monthly	Reason for unemployment: job loser, job leaver, reentrant, new entrant
PRERNWA	Outgoing rotation	Usual weekly earnings (collected only in months 4 and 8 of rotation — outgoing rotation groups)
OFFPOV	ASEC	Official poverty indicator: 1 = below poverty threshold, 2 = at or above threshold
POVLL	ASEC	Poverty level ratio: person's family income as a percentage of their applicable poverty threshold (e.g., 50 = income at 50% of threshold; 200 = income at double the threshold)
PRCITSHP	Basic monthly	Citizenship status: 1 = native born in US, 2 = born in US territory, 3 = born abroad to US parents, 4 = naturalized citizen, 5 = not a citizen

The outgoing rotation group variables (PRERNWA and related earnings fields) are collected only from households in their fourth or eighth interview month. As a result, cross-sectional earnings analysis can use only a quarter of the monthly sample. BLS uses the outgoing rotation group data to produce its quarterly Usual Weekly Earnings estimates for wage and salary workers. ASEC variables such as OFFPOV and POVLL are available only in the March supplement records and are not present in basic monthly CPS extracts.
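A weighted estimate from microdata is a sum of weights over the records in a category, not a raw count of records. The sketch below builds a tiny hypothetical DataFrame with raw basic-monthly names from the table and computes a weighted unemployment rate. It then selects the outgoing rotation groups using HRMIS, the raw month-in-sample variable.

All values are made up. Treating negative PEMLR and PRERNWA codes as "not in universe" is an assumption to check against the data dictionary for the file you use, which also documents implied decimals and topcodes.

```python
import pandas as pd

# Hypothetical records; -1 marks "not in universe" (assumed coding).
cps = pd.DataFrame({
    "PRTAGE":  [34, 52, 19, 70, 45, 28, 12],
    "PEMLR":   [1, 2, 4, 5, 1, 1, -1],
    "PWSSWGT": [1500.0, 1800.0, 2100.0, 1700.0, 1600.0, 2000.0, 1900.0],
    "HRMIS":   [4, 1, 8, 2, 8, 5, 4],     # month in sample, 1-8
    "PRERNWA": [95000, -1, -1, -1, 120000, -1, -1],
})

# Labor force universe: aged 16+ with a valid labor force recode.
universe = cps[(cps["PRTAGE"] >= 16) & (cps["PEMLR"] > 0)]
w = universe["PWSSWGT"]
employed = w[universe["PEMLR"].isin([1, 2])].sum()    # weighted population estimate
unemployed = w[universe["PEMLR"].isin([3, 4])].sum()
unemployment_rate = 100 * unemployed / (employed + unemployed)
print(f"Employed (weighted): {employed:,.0f}; unemployment rate: {unemployment_rate:.1f}%")

# Earnings are asked only in months 4 and 8 (outgoing rotation groups).
org = cps[cps["HRMIS"].isin([4, 8]) & (cps["PRERNWA"] >= 0)]
print(org[["HRMIS", "PRERNWA"]])
```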

IPUMS-CPS and data access

IPUMS-CPS, maintained by the Institute for Social Research and Data Innovation at the University of Minnesota, provides the most researcher-friendly access to CPS microdata. IPUMS harmonizes variable coding across survey years going back to 1962. This is a critical service, because the Census Bureau has changed variable names, coding schemes, and question wording dozens of times over six decades. IPUMS:

- renames variables to consistent names (AGE, SEX, RACE, EMPSTAT, POVERTY, EARNWEEK, WTFINL);
- codes missing values consistently;
- provides detailed universe statements for each variable in each year;
- supplies sample-specific technical documentation.

IPUMS-CPS requires free registration at ipums.org/cps. Custom data extracts can be generated through a web interface and downloaded as fixed-width, CSV, or Stata/SAS/SPSS files. Extracts can also be requested programmatically through the IPUMS API, which has client libraries for R (ipumsr) and Python (ipumspy).

Raw CPS basic monthly files are available at census.gov/data/datasets/time-series/demo/cps/cps-basic.html. These are fixed-width text files with a separate data dictionary for each survey month, and variable positions and lengths change across years.

BLS publishes CPS summary tables at bls.gov/cps. They cover the current and historical unemployment rate, labor force participation rate, employment-population ratio, duration of unemployment, reason for unemployment, and industry and occupation of employment.

FRED (fred.stlouisfed.org) publishes hundreds of CPS-derived series, including:

- UNRATE (U-3, seasonally adjusted);
- U6RATE;
- CIVPART (labor force participation rate);
- CLF16OV (civilian labor force level, BLS series LNS11000000);
- state-level series from BLS Local Area Unemployment Statistics.

Individual FRED series can be downloaded as CSV without registration, while FRED's web-service API requires a free API key.

The Census Bureau's API at api.census.gov provides access to ASEC aggregate tables for poverty rates, income distribution, and health insurance. The timeseries poverty endpoint (api.census.gov/data/timeseries/poverty) returns annual national and state poverty rates. The American Community Survey, also available through api.census.gov, applies the same official poverty thresholds to its own income questions and has a much larger sample than CPS. ACS 1-year estimates cover geographies with 65,000 or more people, and 5-year estimates extend down to counties, census tracts, and block groups. This makes ACS the appropriate source for sub-state geographic poverty analysis.

CPS vs. CES and QCEW: residence-based vs. establishment-based measurement

The CPS unemployment rate and the BLS Current Employment Statistics payroll jobs count are both released in the monthly jobs report but measure different things from different populations using fundamentally different methodologies. Understanding the distinction is essential for interpreting divergences between the two.

CPS is residence-based: it counts workers where they live. A person who lives in New Jersey and commutes to New York City is counted in CPS as a New Jersey worker. CPS counts all employment including agricultural workers, domestic workers paid in cash, self-employed persons (including gig economy workers), and unpaid family workers in family businesses. CPS also counts multiple jobholders once, based on their primary job; a person holding two part-time jobs appears in CPS as one employed person.

CES (Current Employment Statistics, also called the payroll survey) is establishment-based. It counts jobs where the establishment is located and derives counts from payroll records, not household interviews. The commuter from New Jersey to New York City is counted as a New York job in CES. CES excludes self-employed workers entirely: no gig workers, no freelancers, no sole proprietors. A person holding two payroll jobs is counted twice. CES counts are based on a monthly sample of roughly 120,000 businesses and government agencies covering more than 600,000 individual worksites.
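The coverage differences described above can be expressed as a rough bridge from a CPS person count to a CES-style job count. The sketch loosely follows the kind of concept adjustments BLS uses when comparing the two surveys, but it is simplified. It ignores, for example, workers absent without pay and population-control effects. All numbers and the function name are hypothetical.

```python
def payroll_comparable_employment(household_employed: float,
                                  agricultural: float,
                                  nonag_self_employed: float,
                                  unpaid_family: float,
                                  private_household: float,
                                  secondary_jobs: float) -> float:
    """Rough CPS-to-CES bridge; all inputs in the same units (e.g., thousands).

    Removes categories outside CES coverage, then adds second jobs, which
    CES counts separately because it counts jobs rather than people.
    """
    return (household_employed - agricultural - nonag_self_employed
            - unpaid_family - private_household + secondary_jobs)


# Hypothetical levels in thousands.
print(payroll_comparable_employment(161_000, 2_200, 9_500, 100, 700, 8_500))
```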

The Quarterly Census of Employment and Wages (QCEW) is a near-complete census of employment and wages at covered establishments. Coverage means coverage by state unemployment insurance, which excludes the self-employed, certain small agricultural employers, and some other categories. QCEW data are published about five months after the end of each quarter and provide a near-census benchmark for CES. Each year BLS benchmarks the CES series to March employment levels derived largely from QCEW, and publishes the revision alongside the January jobs report in early February.
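BLS describes distributing the March benchmark difference back across the months since the previous benchmark using a linear "wedge-back". The sketch below shows only that linear step, under simplified assumptions: a single series and twelve months from April through March. It ignores the separate updates to post-benchmark months and to the CES birth-death model. The function name and all levels are hypothetical.

```python
from __future__ import annotations


def wedge_back(ces_levels: list[float], benchmark_march_level: float) -> list[float]:
    """Spread a March benchmark revision linearly over the prior twelve months.

    `ces_levels` holds sample-based estimates from April of the prior year
    through March (12 values). Month k (k = 1..12) absorbs k/12 of the March
    revision, so April moves a little and March matches the benchmark.
    """
    if len(ces_levels) != 12:
        raise ValueError("expected 12 monthly levels, April through March")
    revision = benchmark_march_level - ces_levels[-1]
    return [level + revision * k / 12 for k, level in enumerate(ces_levels, start=1)]


# Hypothetical: flat sample-based level (thousands of jobs); benchmark is 120 lower.
revised = wedge_back([150_000.0] * 12, 149_880.0)
print(revised[0], revised[-1])
```

A linear wedge avoids introducing a one-month jump at the benchmark month, at the cost of assuming the error accumulated evenly over the year.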

As a result of these methodological differences, monthly employment changes in CPS and CES routinely diverge by hundreds of thousands, and the gap can reach millions during economic disruptions. In March and April 2020, the two surveys gave different readings of the COVID-19 shock. Part of the reason is that CPS captures gig workers, independent contractors, and self-employed persons who lost work but had no employer payroll on which a job loss could register.

BLS's monthly Employment Situation release acknowledges this duality explicitly: the headline unemployment rate comes from CPS, while the headline nonfarm payroll employment count comes from CES. Both numbers appear in the same press release and can move in different directions in the same month. This confuses media coverage that treats them as measuring the same thing.

Health insurance coverage estimates

The ASEC is the primary federal source for annual health insurance coverage estimates. Each September, alongside the poverty and income report, the Census Bureau publishes:

- the uninsured rate;
- coverage by type (employer-sponsored insurance, Medicaid/CHIP, Medicare, direct-purchase, military);
- coverage rates by demographic group.

For calendar year 2023, the ASEC reported an uninsured rate of approximately 8.0%, roughly 26 million people without coverage at any point in the year. That was also the year the Medicaid continuous enrollment unwinding began, in spring 2023.

There is persistent confusion between CPS ASEC insurance estimates and those from the National Health Interview Survey (NHIS) and the American Community Survey (ACS). The surveys ask different questions:

- The ASEC asks retrospectively about coverage during the prior calendar year and counts a person as uninsured only if they lacked coverage for the entire year.
- The ACS asks whether the respondent is covered at the time of the interview.
- NHIS asks about both current coverage and coverage over the past 12 months.

Because of these definitional differences, ASEC full-year uninsured rates are not directly comparable with ACS point-in-time rates for the same year. The Census Bureau publishes guidance comparing the CPS ASEC, ACS, and NHIS insurance measures.
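The effect of the reference period is easy to see with monthly coverage histories. The four people below are hypothetical, and month 12 stands in for a point-in-time question asked at the end of the year. The same people produce three different "uninsured rates" depending on which definition is applied.

```python
# Hypothetical monthly coverage histories: True = insured that month.
histories = {
    "A": [True] * 12,               # covered all year
    "B": [False] * 12,              # uninsured all year
    "C": [False] * 6 + [True] * 6,  # gained coverage mid-year
    "D": [True] * 11 + [False],     # lost coverage in the last month
}

n = len(histories)
uninsured_full_year = sum(not any(h) for h in histories.values()) / n
uninsured_point_in_time = sum(not h[-1] for h in histories.values()) / n
uninsured_any_month = sum(not all(h) for h in histories.values()) / n

print(f"uninsured all year:      {uninsured_full_year:.0%}")
print(f"uninsured at a point:    {uninsured_point_in_time:.0%}")
print(f"uninsured at least once: {uninsured_any_month:.0%}")
```

A full-year definition counts only person B. A point-in-time question catches anyone uncovered on the interview date, so by construction it includes B and D.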

Python: querying BLS LAUS and Census ACS for state-level comparisons

The following script retrieves state-level unemployment rates from the BLS Local Area Unemployment Statistics API, state poverty rates from the Census Bureau's ACS 1-year API, and national CPS aggregate series from FRED. It then builds a comparative table showing, for each state:

- the BLS LAUS annual average unemployment rate for 2024 and 2023;
- the year-over-year change;
- the ACS poverty rate;
- the ACS labor force participation rate.

It also computes a Pearson correlation between unemployment rates and poverty rates across states, to check whether states with higher unemployment also tend to have higher poverty. The script requires only requests and pandas. No API key is needed for FRED's CSV download or for light use of the Census API. The BLS API v2 accepts 25 series per request without a key; register a free key at bls.gov for 50 per request.

import requests
import pandas as pd
from io import StringIO

# ---------------------------------------------------------------------------
# Part 1: National unemployment rate time series from FRED (BLS CPS series)
# ---------------------------------------------------------------------------
# FRED provides key CPS aggregate series as downloadable CSV files.
# UNRATE   = official U-3 unemployment rate (seasonally adjusted)
# U6RATE   = U-6 (total unemployed + marginally attached + part-time for
#             economic reasons) as a share of the civilian labor force
#             plus the marginally attached
# CIVPART  = civilian labor force participation rate
# The fredgraph.csv download endpoint needs no API key (FRED's JSON
# web-service API does require a free key).

FRED_BASE = "https://fred.stlouisfed.org/graph/fredgraph.csv"

def fetch_fred(series_id: str, observation_start: str = "2010-01-01") -> pd.Series:
    # "cosd" sets the start date on fredgraph.csv; the result is also
    # filtered below in case the parameter is ignored.
    url = f"{FRED_BASE}?id={series_id}&cosd={observation_start}"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    # Missing observations are written as "."; the date column header has
    # varied over time ("DATE", "observation_date"), so read it by position.
    df = pd.read_csv(StringIO(resp.text), na_values=".")
    date_col = df.columns[0]
    df[date_col] = pd.to_datetime(df[date_col])
    series = df.set_index(date_col)[series_id].dropna().astype(float)
    return series[series.index >= pd.Timestamp(observation_start)]

unrate  = fetch_fred("UNRATE")
u6rate  = fetch_fred("U6RATE")
civpart = fetch_fred("CIVPART")

latest_date   = unrate.index[-1].strftime("%Y-%m")
latest_unrate = unrate.iloc[-1]
latest_u6     = u6rate.iloc[-1]
latest_lfpr   = civpart.iloc[-1]

print("=== National Labor Force Indicators (most recent month) ===")
print(f"  Reference month           : {latest_date}")
print(f"  U-3 unemployment rate     : {latest_unrate:.1f}%")
print(f"  U-6 unemployment rate     : {latest_u6:.1f}%")
print(f"  Labor force participation : {latest_lfpr:.1f}%")

# Peak COVID-19 comparison
covid_peak_unrate  = unrate["2020-04-01"]
covid_peak_lfpr    = civpart["2020-04-01"]
print(f"\n  COVID-19 peak U-3 (Apr 2020): {covid_peak_unrate:.1f}%")
print(f"  COVID-19 trough LFPR (Apr 2020): {covid_peak_lfpr:.1f}%")

# Year-over-year change
if len(unrate) >= 13:
    yoy_change = unrate.iloc[-1] - unrate.iloc[-13]
    direction  = "up" if yoy_change > 0 else "down"
    print(f"  YoY change in U-3: {abs(yoy_change):.1f} pp {direction}")

# ---------------------------------------------------------------------------
# Part 2: State-level unemployment rates from BLS LAUS
# ---------------------------------------------------------------------------
# BLS Local Area Unemployment Statistics (LAUS) produces state-level
# unemployment rates by combining CPS data with other inputs (see the note
# after the script). Documentation: https://www.bls.gov/lau/
# The BLS public API (v2) accepts up to 25 series per request without a
# registration key (50 with a free registered key).
# LAUS series ID layout: "LA" + seasonal code (S/U) + area code + measure.
#   LAUST<FIPS>0000000000003 = state unemployment rate, not seasonally
#   adjusted. The unadjusted series is used because BLS publishes annual
#   averages for unadjusted data; "LASST" is the seasonally adjusted twin.

STATE_FIPS = {
    "Alabama": "01", "Alaska": "02", "Arizona": "04", "Arkansas": "05",
    "California": "06", "Colorado": "08", "Connecticut": "09",
    "Delaware": "10", "Florida": "12", "Georgia": "13", "Hawaii": "15",
    "Idaho": "16", "Illinois": "17", "Indiana": "18", "Iowa": "19",
    "Kansas": "20", "Kentucky": "21", "Louisiana": "22", "Maine": "23",
    "Maryland": "24", "Massachusetts": "25", "Michigan": "26",
    "Minnesota": "27", "Mississippi": "28", "Missouri": "29",
    "Montana": "30", "Nebraska": "31", "Nevada": "32",
    "New Hampshire": "33", "New Jersey": "34", "New Mexico": "35",
    "New York": "36", "North Carolina": "37", "North Dakota": "38",
    "Ohio": "39", "Oklahoma": "40", "Oregon": "41", "Pennsylvania": "42",
    "Rhode Island": "44", "South Carolina": "45", "South Dakota": "46",
    "Tennessee": "47", "Texas": "48", "Utah": "49", "Vermont": "50",
    "Virginia": "51", "Washington": "53", "West Virginia": "54",
    "Wisconsin": "55", "Wyoming": "56",
}

def laus_series_id(fips: str) -> str:
    # LAUST + 2-digit state FIPS + 0000000000003 (20 characters in total)
    return f"LAUST{fips}0000000000003"

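BLS_API = "https://api.bls.gov/publicAPI/v2/timeseries/data/"

def fetch_bls_state_unemployment(state_fips_map: dict, year: int = 2024,
                                 registration_key=None) -> dict:
    """Fetch annual average unemployment rates for all states from BLS LAUS.

    Returns {state_name: {"2023": rate, "2024": rate}}. Uses the published
    annual average (period M13) when present; otherwise falls back to the
    mean of twelve monthly values, which approximates (and can differ by a
    tenth of a point from) the official annual average.
    """
    series_ids = [laus_series_id(fips) for fips in state_fips_map.values()]
    state_by_series = {
        laus_series_id(fips): name
        for name, fips in state_fips_map.items()
    }
    results = {}

    # 25 series per request without a key, 50 with one
    chunk_size = 50 if registration_key else 25
    for i in range(0, len(series_ids), chunk_size):
        chunk = series_ids[i : i + chunk_size]
        payload = {
            "seriesid": chunk,
            "startyear": str(year - 1),
            "endyear": str(year),
        }
        if registration_key:
            # annualaverage is a v2 option that requires a registration key
            payload["registrationkey"] = registration_key
            payload["annualaverage"] = True
        resp = requests.post(BLS_API, json=payload, timeout=60)
        resp.raise_for_status()
        data = resp.json()

        if data.get("status") != "REQUEST_SUCCEEDED":
            print(f"  [warn] BLS API chunk {i // chunk_size + 1}: {data.get('message', 'unknown error')}")
            continue

        for series in data.get("Results", {}).get("series", []):
            name = state_by_series.get(series["seriesID"], series["seriesID"])
            annual, monthly = {}, {}
            for d in series.get("data", []):
                try:
                    value = float(d["value"])
                except ValueError:
                    continue  # skip missing or footnoted values such as "-"
                if d["period"] == "M13":  # M13 = annual average
                    annual[d["year"]] = value
                elif d["period"].startswith("M"):
                    monthly.setdefault(d["year"], []).append(value)
            # Fallback: average a complete year of monthly rates
            for yr, values in monthly.items():
                if yr not in annual and len(values) == 12:
                    annual[yr] = round(sum(values) / 12, 1)
            if annual:
                results[name] = annual

    return results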

state_ur = fetch_bls_state_unemployment(STATE_FIPS, year=2024)

# ---------------------------------------------------------------------------
# Part 3: State poverty rates from Census API (ACS 1-year)
# ---------------------------------------------------------------------------
# Census API endpoint: ACS 1-year data profile table DP03, one row per state.
# (The detailed table B17001 gives the same poverty rate from counts:
#  B17001_002E persons below poverty / B17001_001E persons with poverty status.)
# Profile variable codes can shift between vintages; confirm them in
# https://api.census.gov/data/2023/acs/acs1/profile/variables.html
# DP03_0128PE = percent of all people below the poverty level
# DP03_0009PE = unemployment rate (ACS civilian labor force definition)
# DP03_0002PE = percent of population 16+ in the labor force (ACS LFPR;
#               includes the Armed Forces, unlike CIVPART)

CENSUS_YEAR = 2023
CENSUS_URL = (
    f"https://api.census.gov/data/{CENSUS_YEAR}/acs/acs1/profile"
    "?get=NAME,DP03_0128PE,DP03_0009PE,DP03_0002PE"
    "&for=state:*"
)

def to_pct(value):
    """Parse a Census API value; None for blanks or negative sentinel codes."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if number >= 0 else None

resp = requests.get(CENSUS_URL, timeout=30)
resp.raise_for_status()
rows = resp.json()
headers = rows[0]
acs_data = []
for row in rows[1:]:
    record = dict(zip(headers, row))
    acs_data.append({
        "state":        record["NAME"],
        "poverty_rate": to_pct(record.get("DP03_0128PE")),
        "acs_unemp":    to_pct(record.get("DP03_0009PE")),
        "acs_lfpr":     to_pct(record.get("DP03_0002PE")),
    })

acs_df = pd.DataFrame(acs_data).sort_values("state").reset_index(drop=True)

# ---------------------------------------------------------------------------
# Part 4: Build comparison table: BLS LAUS UR + Census ACS poverty + LFPR
# ---------------------------------------------------------------------------
print("\n=== State Comparison: Unemployment, Poverty, Labor Force Participation ===")
print(f"  Sources: BLS LAUS 2024 annual avg UR | Census ACS {CENSUS_YEAR} 1-year poverty/LFPR")
print(f"  {'State':<20}  {'LAUS UR 24':>10}  {'LAUS UR 23':>10}  {'YoY':>6}  "
      f"{'Poverty %':>10}  {'ACS LFPR':>9}")
print("  " + "-" * 75)

for _, row in acs_df.iterrows():
    sname     = row["state"]
    poverty   = row["poverty_rate"]
    acs_lfpr  = row["acs_lfpr"]
    ur_data   = state_ur.get(sname, {})
    ur_2024   = ur_data.get("2024")
    ur_2023   = ur_data.get("2023")

    if ur_2024 is None and ur_2023 is None:
        continue

    yoy_str = ""
    if ur_2024 is not None and ur_2023 is not None:
        delta   = ur_2024 - ur_2023
        sign    = "+" if delta >= 0 else "-"
        yoy_str = f"{sign}{abs(delta):.1f}"

    # pd.notna handles both None (from dicts) and NaN (from the DataFrame)
    ur24_str   = f"{ur_2024:.1f}%" if pd.notna(ur_2024) else "N/A"
    ur23_str   = f"{ur_2023:.1f}%" if pd.notna(ur_2023) else "N/A"
    pov_str    = f"{poverty:.1f}%" if pd.notna(poverty) else "N/A"
    lfpr_str   = f"{acs_lfpr:.1f}%" if pd.notna(acs_lfpr) else "N/A"

    print(f"  {sname:<20}  {ur24_str:>10}  {ur23_str:>10}  {yoy_str:>6}  "
          f"{pov_str:>10}  {lfpr_str:>9}")

# ---------------------------------------------------------------------------
# Part 5: Correlation check (unemployment vs. poverty across states)
# ---------------------------------------------------------------------------
merged = acs_df.copy()
merged["bls_ur_2024"] = merged["state"].map(
    {s: d.get("2024") for s, d in state_ur.items()}
)
clean = merged.dropna(subset=["bls_ur_2024", "poverty_rate"])

if len(clean) >= 10:
    corr = clean["bls_ur_2024"].corr(clean["poverty_rate"])
    print(f"\n  Pearson correlation (BLS UR 2024 vs ACS poverty rate): r = {corr:.3f}")
    print(f"  n = {len(clean)} states with both measures available")
    direction = "positive" if corr >= 0 else "negative"
    if abs(corr) > 0.5:
        print(f"  Strong {direction} correlation between state unemployment and poverty rates.")
    elif abs(corr) > 0.3:
        print(f"  Moderate {direction} correlation.")
    else:
        print("  Weak correlation: unemployment and poverty diverge at the state level.")
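With roughly fifty observations, the Pearson r printed by the script is itself a noisy estimate, and the 0.3 and 0.5 cutoffs used to label it are rules of thumb. A minimal sketch of a Fisher z-transform confidence interval for r follows; `correlation_ci` is a hypothetical helper, not part of the script, and the interval assumes the states behave like independent draws, which neighboring states with shared labor markets may not.

```python
import math
from statistics import NormalDist


def correlation_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Hypothetical helper: approximate CI for a Pearson r via Fisher's z-transform."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                               # transformed r is roughly normal
    se = 1.0 / math.sqrt(n - 3)                     # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical input: r = 0.5 across 50 states
lo, hi = correlation_ci(0.5, 50)
print(f"95% CI for r: [{lo:.2f}, {hi:.2f}]")   # prints 95% CI for r: [0.26, 0.68]
```

For r = 0.5 across 50 states the interval runs from about 0.26 to 0.68, which straddles both cutoffs, so a single year's label is best read loosely.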

The BLS LAUS state unemployment rates used here are based on CPS data but are not identical to direct CPS state estimates. Because state-level CPS samples are too small to support reliable direct estimates in many states, BLS applies a signal-plus-noise time-series model that combines the CPS state sample with payroll employment from the Current Employment Statistics establishment survey and unemployment insurance (UI) claims data to produce smoothed state-level estimates. The LAUS methodology is documented at bls.gov/lau/laumthd.htm. For states with very small CPS samples, such as Wyoming, Vermont, Alaska, and North Dakota, the modeled estimates lean more heavily on the payroll and UI claims inputs and less on the noisy CPS sample.
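The intuition behind leaning on the model in small states can be seen in a simple composite estimator, which weights a direct survey estimate against a model-based prediction by their relative precision. This is a textbook small-area shrinkage formula, not the LAUS signal-plus-noise model itself; `composite_estimate` is a hypothetical helper and every number below is invented for illustration.

```python
def composite_estimate(direct: float, direct_se: float,
                       model: float, model_var: float) -> tuple[float, float]:
    """Hypothetical helper: precision-weighted blend of a direct estimate and a model.

    model_var is the assumed variance of the true rate around the model
    prediction; direct_se is the sampling standard error of the direct estimate.
    Returns (blended estimate, weight placed on the direct estimate).
    """
    sampling_var = direct_se ** 2
    weight = model_var / (model_var + sampling_var)   # noisier sample -> smaller weight
    return weight * direct + (1 - weight) * model, weight


# Hypothetical small state: direct CPS rate 4.0% with a 1.0-point standard error
small, w_small = composite_estimate(4.0, 1.0, model=3.0, model_var=0.25)
# Hypothetical large state: same rates, but a 0.2-point standard error
large, w_large = composite_estimate(4.0, 0.2, model=3.0, model_var=0.25)
print(f"small state: {small:.2f}% (weight on CPS {w_small:.2f})")
print(f"large state: {large:.2f}% (weight on CPS {w_large:.2f})")
```

With a 1.0-point standard error the blend puts only 20% of its weight on the direct estimate (3.20%), while a 0.2-point standard error raises that to about 86% (3.86%).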

CPS data limitations and research notes

CPS estimates carry sampling error that is substantial for small subgroups. BLS publishes standard errors for CPS estimates, and researchers should not compare CPS point estimates for small demographic groups across years without testing whether the observed change exceeds its margin of error. The CPS sample is not designed to support reliable estimates below the state level; metropolitan area and county estimates require ACS 1-year or 5-year data. BLS does publish annual CPS state-level estimates of unemployment by race and Hispanic origin, but these carry wide confidence intervals for smaller states.
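The year-over-year test described above can be sketched as a two-sided z-test on the difference between two estimates. This minimal version treats the two years as independent samples; because CPS samples overlap from year to year, the true standard error of the difference is usually smaller, so the check tends to be conservative. `change_is_significant` is a hypothetical helper, the default of 90% confidence reflects the level BLS generally uses for CPS comparisons, and the rates and standard errors in the example are invented.

```python
import math
from statistics import NormalDist


def change_is_significant(est1: float, se1: float, est2: float, se2: float,
                          confidence: float = 0.90) -> tuple[float, bool]:
    """Hypothetical helper: z-test for the change between two survey estimates.

    Assumes independent samples, ignoring the covariance created by CPS
    sample overlap, which usually makes this test conservative.
    """
    diff = est2 - est1
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)           # SE of a difference of independents
    z = diff / se_diff
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 at 90%
    return z, abs(z) > crit


# Hypothetical subgroup rate: 6.1% (SE 0.5) one year, 7.0% (SE 0.5) the next
z, significant = change_is_significant(6.1, 0.5, 7.0, 0.5)
print(f"z = {z:.2f}, significant at 90%: {significant}")
```

With these hypothetical standard errors, a 0.9-point rise gives z of about 1.27, short of the roughly 1.645 needed at 90% confidence, so the change cannot be distinguished from sampling noise.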

The CPS misses several populations relevant to labor and poverty analysis. Institutionalized persons, including approximately 2.1 million incarcerated people, are excluded from the civilian noninstitutional population and therefore from both labor force and poverty statistics. This exclusion biases unemployment rates downward, since incarcerated persons disproportionately come from the demographic groups with the highest unemployment, and biases poverty rates downward, since incarcerated persons are overwhelmingly low-income before incarceration. Undercoverage of homeless persons, and of doubled-up households whose living quarters do not appear as separate housing units in the Master Address File, is a second source of bias.
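The direction of the exclusion bias follows from the fact that a pooled rate is a weighted average of group rates: adding back a group whose rate is above the published rate can only raise the pooled rate. A minimal sketch with invented counts follows; `pooled_rate` is a hypothetical helper, and the figures say nothing about the labor force status incarcerated people would actually have.

```python
def pooled_rate(groups: list[tuple[float, float]]) -> float:
    """Hypothetical helper: pooled rate in percent from (numerator, denominator) pairs,
    e.g. (unemployed, labor force) or (people in poverty, population)."""
    numerator = sum(n for n, _ in groups)
    denominator = sum(d for _, d in groups)
    return 100.0 * numerator / denominator


covered = (40.0, 1000.0)   # hypothetical covered population: 4.0% rate
excluded = (4.0, 20.0)     # hypothetical excluded group: 20.0% rate
print(f"published: {pooled_rate([covered]):.2f}%")
print(f"with excluded group: {pooled_rate([covered, excluded]):.2f}%")
```

The rate rises from 4.00% to 4.31% once the hypothetical excluded group is counted, which is the mechanism behind the downward bias; its real-world size depends on counts this sketch does not attempt to estimate.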

Survey response rates, which have fallen from about 92% to roughly 73% since 2000, introduce nonresponse bias if the propensity to respond is correlated with labor force status or income. Census Bureau research suggests that nonrespondents have somewhat lower employment rates than respondents with similar observed characteristics, and that the standard weighting adjustments do not fully correct for this. This is an active area of methodological research; the Census Bureau publishes periodic assessments of CPS data quality at census.gov/topics/employment/cps.
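A common weighting fix for nonresponse redistributes the weight of nonrespondents to respondents in the same adjustment cell. The sketch below is a generic weighting-class adjustment with an invented cell and weights, not the CPS production weighting, which adds further ratio-adjustment stages; `adjust_for_nonresponse` and `weighted_share` are hypothetical helpers. It shows why such a fix cannot remove bias that operates within a cell: the estimate is driven entirely by respondents.

```python
from collections import defaultdict


def adjust_for_nonresponse(units: list[dict]) -> list[dict]:
    """Hypothetical helper: weighting-class nonresponse adjustment.

    Each unit has 'cell', 'base_weight', and 'responded'. Respondents in a
    cell absorb the base weight of that cell's nonrespondents.
    """
    eligible = defaultdict(float)
    responding = defaultdict(float)
    for u in units:
        eligible[u["cell"]] += u["base_weight"]
        if u["responded"]:
            responding[u["cell"]] += u["base_weight"]
    empty = [c for c in eligible if responding[c] == 0]
    if empty:
        raise ValueError(f"cells with no respondents must be collapsed first: {empty}")
    return [
        {**u, "final_weight": u["base_weight"] * eligible[u["cell"]] / responding[u["cell"]]}
        for u in units if u["responded"]
    ]


def weighted_share(units: list[dict], flag: str) -> float:
    """Weighted share of adjusted units for which the boolean field `flag` is True."""
    total = sum(u["final_weight"] for u in units)
    return sum(u["final_weight"] for u in units if u[flag]) / total


# One hypothetical cell: 6 employed respondents, 2 not employed, 2 nonrespondents
units = (
    [{"cell": "A", "base_weight": 100.0, "responded": True, "employed": True}] * 6
    + [{"cell": "A", "base_weight": 100.0, "responded": True, "employed": False}] * 2
    + [{"cell": "A", "base_weight": 100.0, "responded": False, "employed": None}] * 2
)
adjusted = adjust_for_nonresponse(units)
print(f"adjusted employment share: {weighted_share(adjusted, 'employed'):.2f}")  # 0.75
```

The adjusted share is 0.75 because only respondents carry weight. If the two nonrespondents were in fact not employed, the true share in this cell would be 0.60, and no reweighting within the cell can recover that.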

For researchers linking CPS records across months to study worker transitions, the 4-8-4 rotation structure means records must be matched on household identifiers and each person's line number within the household, because the public-use CPS files do not carry a longitudinal person identifier. Matches are usually validated against characteristics that should not change between interviews, such as sex, or that change predictably, such as age. Matching success rates are typically 85–90%; unmatched records represent households that moved, refused to continue, or could not be located. IPUMS-CPS provides household and person linking identifiers (CPSID and CPSIDP) that substantially simplify this process.
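A minimal sketch of a month-to-month link, assuming two basic monthly public-use extracts already loaded into pandas. The column names follow the CPS public-use naming convention (HRHHID and HRHHID2 for the household, PULINENO for the person's line number, HRMIS for month in sample, PESEX and PRTAGE for sex and age) and should be checked against the data dictionary for the months in question. `link_consecutive_months` is a hypothetical helper and the tiny frames are invented test data. The sketch keeps only rotation groups due back next month and rejects links where sex changes or age moves by anything other than 0 or 1 year.

```python
import pandas as pd

PERSON_KEYS = ["HRHHID", "HRHHID2", "PULINENO"]
# Under 4-8-4, MIS 4 leaves for eight months and MIS 8 leaves for good,
# so only these months in sample are interviewed again the following month
CONTINUING_MIS = [1, 2, 3, 5, 6, 7]


def link_consecutive_months(month_t: pd.DataFrame,
                            month_t1: pd.DataFrame) -> tuple[pd.DataFrame, float]:
    """Hypothetical helper: link person records in month t to month t+1."""
    eligible = month_t[month_t["HRMIS"].isin(CONTINUING_MIS)]
    merged = eligible.merge(month_t1, on=PERSON_KEYS, how="inner", suffixes=("_t", "_t1"))
    # Validate: rotation advances by one, sex is unchanged, age rises by 0 or 1
    valid = (
        (merged["HRMIS_t1"] == merged["HRMIS_t"] + 1)
        & (merged["PESEX_t1"] == merged["PESEX_t"])
        & (merged["PRTAGE_t1"] - merged["PRTAGE_t"]).between(0, 1)
    )
    linked = merged[valid]
    match_rate = len(linked) / len(eligible) if len(eligible) else float("nan")
    return linked, match_rate


month_t = pd.DataFrame({
    "HRHHID": [1, 1, 2, 3], "HRHHID2": [10, 10, 20, 30], "PULINENO": [1, 2, 1, 1],
    "HRMIS": [1, 1, 4, 2], "PESEX": [1, 2, 1, 2], "PRTAGE": [40, 38, 55, 30],
})
month_t1 = pd.DataFrame({
    "HRHHID": [1, 1, 3], "HRHHID2": [10, 10, 30], "PULINENO": [1, 2, 1],
    "HRMIS": [2, 2, 3], "PESEX": [1, 1, 2], "PRTAGE": [41, 38, 30],
})
linked, rate = link_consecutive_months(month_t, month_t1)
print(linked[PERSON_KEYS])
print(f"match rate: {rate:.2f}")
```

Here two of the three continuing records link; the second person in household 1 is rejected because the recorded sex differs between months, and household 2 is never eligible because it is in its fourth month in sample.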