CDC Excess Deaths: The Federal Measure of How Many More Americans Died Than Expected

· 12 min read· AI Analytics

In an ordinary week a given state buries a fairly predictable number of people: predictable enough that a statistical model trained on the past several years can say, within a narrow band, how many deaths that week should produce. Excess mortality is what happens when reality overshoots the model: the gap between the deaths that actually occurred and the deaths that were expected. The CDC's National Center for Health Statistics computes that gap from the nation's death certificates, jurisdiction by jurisdiction and week by week, yielding roughly 57,000 records that together form the most complete federal measure of how many more Americans died than expected.

This article first covers what the excess-deaths dataset is and how the National Vital Statistics System frames it, then the central idea of measuring a mortality event by the gap between observed and expected deaths, and why that gap captures impacts a direct cause count cannot. It then turns to the baseline model (the over-dispersed Poisson, Farrington-style approach NCHS uses to estimate expected deaths from prior years, accounting for trend and seasonality) and the expected-deaths threshold it produces. Next come how COVID-19 made excess mortality central, and why it became the most complete measure of the pandemic's toll, capturing undiagnosed COVID deaths and the indirect deaths of disrupted care; the jurisdiction-by-week structure and the role of cause groupings; the provisional nature of recent weeks and the death-certificate processing lag; and how the data is published through data.cdc.gov. The article closes with a Python workflow that pulls excess-death rows, sums excess by year and state, and compares observed against expected for a jurisdiction over time, and with the caveats (model dependence, reporting lag, and the difference between excess and attributable mortality) that every analyst must internalize before drawing conclusions.

What the dataset is

The National Vital Statistics System (NVSS) is the federal government's near-census record of every death in the United States, assembled by the National Center for Health Statistics from the death certificates that the states are legally responsible for registering. Excess mortality is not a separate collection of facts; it is an analysis built on top of the NVSS death counts. NCHS takes the count of deaths actually recorded in a jurisdiction in a week—the observed number—and compares it against a modeled estimate of how many deaths that jurisdiction and week should have produced based on its own recent history—the expected number. The difference is the excess. Surfaced through CDC data services, the excess-deaths record we store comprises roughly 57,000 rows.

In our database this record is stored as the table cdc_excess_deaths, with the grain of one row per jurisdiction by week (and cause grouping). A single jurisdiction—a state, or the United States as a whole—observed across several years of weeks, split between an all-cause grouping and a grouping that excludes a major cause, generates a long ribbon of rows, each carrying its own observed count, its own expected threshold, and its own excess estimate. The columns capture the place, the week, what was actually observed, what was expected, the gap, and whether the gap was large enough to count:

jurisdiction          -- United States, or a state / reporting area
week_ending_date      -- the Saturday ending the MMWR week observed
outcome               -- cause grouping (e.g. all causes; all causes
                         excluding COVID-19) the row pertains to
observed_number       -- deaths actually recorded for the cell
threshold             -- expected-deaths threshold (upper bound of the
                         modeled baseline) for that jurisdiction-week
average_expected      -- the central expected count from the baseline
excess_estimate       -- observed minus expected (the excess), floored
                         at zero in most NCHS releases
exceeds_threshold     -- flag: did observed exceed the threshold?
data_completeness     -- provisional / weighting indicator for the week
type                  -- predicted (weighted) vs unweighted (raw provisional)

The observed_number and threshold columns are the load-bearing pair, and the comparison between them is the whole point of the dataset. The threshold is the upper bound of the modeled baseline: NCHS treats observed deaths as significantly elevated when they cross it, and the exceeds_threshold flag records exactly that crossing for each jurisdiction-week. The excess_estimate is the headline number—the count of deaths beyond expectation—and because expected deaths are themselves a modeled estimate with uncertainty, NCHS typically publishes the excess in a range rather than as a single point. The outcome column is what lets the same jurisdiction-week appear more than once: an all-cause row and a row for all causes excluding a major contributor are distinct rows with the same place and date, and an analyst who sums without filtering on outcome will double-count. The week_ending_date anchors everything to the standard MMWR week, and the completeness and type columns are the analyst's warning that the most recent weeks are still filling in.
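The double-counting hazard is easy to reproduce on a toy frame. The sketch below uses made-up numbers and assumes the column names from the listing above; it illustrates the filtering step and is not NCHS output.

```python
import pandas as pd

# Hypothetical rows: each jurisdiction-week appears once per outcome grouping.
rows = pd.DataFrame({
    "jurisdiction": ["Ohio"] * 4,
    "week_ending_date": ["2020-12-05", "2020-12-05", "2020-12-12", "2020-12-12"],
    "outcome": ["All causes", "All causes, excluding COVID-19"] * 2,
    "excess_estimate": [900, 250, 1000, 300],
})

# Wrong: summing across outcome rows counts every week twice.
naive_total = rows["excess_estimate"].sum()

# Right: select one cause grouping by exact match, then aggregate.
all_cause = rows[rows["outcome"] == "All causes"]
all_cause_total = all_cause["excess_estimate"].sum()

# Guard: after filtering, each (jurisdiction, week) pair should be unique.
assert not all_cause.duplicated(["jurisdiction", "week_ending_date"]).any()

print("naive sum:", naive_total)
print("all-cause sum:", all_cause_total)
```

An exact match matters here: a substring test for "All causes" would also match the "All causes, excluding COVID-19" rows and reintroduce the double count.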

The idea: observed minus expected

The conceptual core of this dataset is older than any pandemic and simpler than its machinery suggests. To measure the mortality impact of an event—a pandemic, a heat wave, a hurricane, an economic shock—you cannot simply count the deaths labeled with that event's cause, because the label is unreliable and incomplete. Instead you ask a counterfactual question: how many people would have died in this place and time had the event not occurred? That counterfactual is the expected count, and the difference between what actually happened and that counterfactual is the excess. Excess mortality is, in essence, a way of letting the deaths themselves reveal the size of an event, without depending on anyone correctly attributing each death to a cause.

This indirection is precisely what gives excess mortality its analytic power. A direct cause count—deaths with a particular disease written on the certificate—captures only the deaths that were diagnosed, certified, and coded to that cause. It misses two large categories. The first is undiagnosed deaths directly caused by the event but never recognized as such: people who died of an infection that was never tested for, or whose certificate named a complication rather than the underlying cause. The second, and often larger, is indirect deaths—deaths the event caused not by its direct mechanism but by its disruption of everything around it: a heart attack untreated because the patient avoided an overwhelmed emergency room, a cancer that progressed because screening was delayed, an overdose that rose with isolation and disrupted treatment. Excess mortality sweeps in all of these, because it does not ask why each person died—only whether more people died than the baseline predicted.

The all-cause framing is therefore both the method's strength and the source of its most important caveat. Because excess deaths measure total mortality against expectation, the number is agnostic about cause: it does not distinguish a death the event caused from a death that would have occurred anyway in an unusually deadly year for unrelated reasons, nor does it credit the event with deaths it indirectly prevented (for example, the well-documented drop in some categories of death when behavior changes during a crisis). Excess mortality is the net deviation of total deaths from baseline, which is usually the right thing to measure for a large event but must always be read as a net, all-cause figure rather than as a tally of deaths attributable to a single cause.

The baseline model and the expected-deaths threshold

Everything in the dataset rests on the expected count, and the expected count is the output of a statistical model fitted to the jurisdiction's own recent past. NCHS estimates expected deaths using an over-dispersed Poisson regression of the kind associated with the Farrington family of methods used in public-health surveillance. The model is trained on several prior years of weekly death counts for each jurisdiction and learns the structure of normal mortality: its long-run trend (slow changes from population growth and aging) and its strong seasonality (the pronounced winter peak from respiratory illness and cold, the summer trough). From that structure it can project, for each future week, how many deaths that week should produce if nothing unusual happens.

Two technical choices in that sentence carry real weight. Counts of deaths are modeled with a Poisson-type distribution because they are counts of independent-ish events; the over-dispersion matters because weekly death counts vary more than a plain Poisson would predict, and ignoring that extra variance would make the expected band too narrow and flag ordinary fluctuations as excess. The Farrington-style approach is built precisely for this surveillance task: it produces not just a central expected value but an upper threshold—a level that observed deaths are unlikely to exceed by chance alone—and it is designed to be robust to the past outbreaks already sitting in the training data, so that a prior bad flu season does not inflate the baseline and mask a new event.
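To make the mechanics concrete, here is a minimal sketch of an over-dispersed (quasi-) Poisson baseline with a linear trend and one annual harmonic, fitted to synthetic data. It is not the NCHS implementation: the function names, the single harmonic, the normal-approximation upper bound with z = 1.645, and the synthetic series are all assumptions made for illustration, and the Farrington refinements (down-weighting past outbreaks, the variance-stabilizing transform) are deliberately omitted.

```python
import numpy as np

WEEKS_PER_YEAR = 52.18

def design(t):
    # Intercept, linear trend (in years), and one annual sine/cosine pair.
    w = 2 * np.pi * t / WEEKS_PER_YEAR
    return np.column_stack([np.ones_like(t), t / WEEKS_PER_YEAR,
                            np.sin(w), np.cos(w)])

def fit_quasi_poisson(t, y, iters=25):
    """Log-link Poisson fit by IRLS, plus a Pearson dispersion estimate."""
    X = design(t)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(iters):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        XtW = X.T * mu                   # IRLS weights equal mu for a log link
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    mu = np.exp(X @ beta)
    # Dispersion > 1 means the counts vary more than a plain Poisson allows.
    phi = np.sum((y - mu) ** 2 / mu) / (len(y) - X.shape[1])
    return beta, max(phi, 1.0)

def expected_and_threshold(beta, phi, t_new, z=1.645):
    # Central expected count and an upper bound using variance = phi * mu.
    mu = np.exp(design(t_new) @ beta)
    return mu, mu + z * np.sqrt(phi * mu)

# Synthetic training data: five years of weekly counts with a mild trend,
# a winter peak, and negative-binomial noise to create over-dispersion.
rng = np.random.default_rng(0)
t_train = np.arange(260, dtype=float)
true_mu = 1000 * np.exp(0.01 * t_train / WEEKS_PER_YEAR
                        + 0.10 * np.cos(2 * np.pi * t_train / WEEKS_PER_YEAR))
r = 200.0  # smaller r means more over-dispersion
y_train = rng.negative_binomial(r, r / (r + true_mu)).astype(float)

beta, phi = fit_quasi_poisson(t_train, y_train)
t_next = np.arange(260, 312, dtype=float)
expected, threshold = expected_and_threshold(beta, phi, t_next)

print("estimated dispersion:", round(float(phi), 2))
print("first projected week:", round(float(expected[0])), round(float(threshold[0])))
```

Setting phi to 1 in `expected_and_threshold` reproduces the plain-Poisson band, which is visibly narrower; that is the failure mode the over-dispersion term guards against.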

The practical output is the pair of numbers in every row: a central expected count and an upper threshold. NCHS treats a week as showing significant excess when the observed count crosses the threshold, and it estimates the excess two ways, against the central expected value and against the threshold, which is why published excess figures come as a range bracketing a higher and a lower estimate. The crucial interpretive point is that expected deaths are a model output, not a fact. A different baseline window, a different treatment of trend, or a different handling of past anomalies would shift the expected line and therefore the excess. The dataset is honest about this by publishing thresholds and ranges rather than a single authoritative count, and any serious use of the data should carry that uncertainty forward rather than collapsing the excess to a single deceptively precise number.
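The two-way estimate reduces to a few lines of arithmetic. The helper below is a hypothetical illustration with made-up counts; it assumes both estimates are floored at zero, in line with the floor described for excess_estimate.

```python
def excess_range(observed, average_expected, threshold):
    """Return (lower, higher) excess estimates for one jurisdiction-week.

    lower  = observed minus the upper threshold
    higher = observed minus the central expected count
    Both are floored at zero.
    """
    lower = max(observed - threshold, 0)
    higher = max(observed - average_expected, 0)
    return lower, higher

# A week well above the threshold: both estimates are positive.
print(excess_range(observed=1250, average_expected=1000, threshold=1100))

# A week above the central estimate but inside the band: the lower bound is zero.
print(excess_range(observed=1050, average_expected=1000, threshold=1100))
```

Summing the lower and higher columns separately across weeks carries the range through an aggregation instead of collapsing it to one number.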

How COVID-19 made excess mortality central

Excess mortality existed as a surveillance concept long before 2020—it has been used for a century to size influenza seasons and to count the dead from heat waves and hurricanes—but COVID-19 moved it from a specialist tool to a central public measure of the pandemic. The reason was the gap between the official COVID-19 death count and the toll the country was actually absorbing, and excess mortality is the instrument that measured the gap.

Early in the pandemic the official tally undercounted for concrete, structural reasons. Testing was scarce, so many people who died of COVID-19 were never tested and their deaths were certified to pneumonia, respiratory failure, or simply “natural causes.” Certification practice varied by jurisdiction and improved over time, so the direct count was uneven across places and across the timeline. And the official count, by construction, could only ever capture deaths attributed to the virus—it could say nothing about the indirect deaths the pandemic caused by overwhelming hospitals, deterring people from seeking care, delaying surgeries and screening, and colliding with a worsening overdose crisis. Excess mortality captured all of it at once: by comparing total deaths against the pre-pandemic baseline, it counted the undiagnosed COVID-19 deaths, the indirect deaths from disrupted care, and the secondary surges in other causes, without needing any of them correctly labeled.

The result is that excess mortality came to be regarded as the most complete measure of the pandemic's mortality toll, and the dataset documents it directly. The all-cause excess ran substantially above the directly attributed COVID-19 count, especially in the first waves and in places where testing lagged. The weekly, jurisdiction-resolved structure made the waves legible—the sharp spring 2020 surge in the Northeast, the summer Sun Belt wave, the broad winter peaks—and the cause-grouping rows let analysts separate the COVID-labeled excess from the remainder. The same machinery that sized the pandemic is what makes the dataset durable beyond it: the method applies unchanged to any future event that raises mortality above its expected baseline, which is why NCHS maintains it as standing infrastructure rather than a one-time pandemic product.

Jurisdiction, week, and cause grouping

The grain of the data—one row per jurisdiction by week by cause grouping—is what makes it analytically rich, and each of the three dimensions does distinct work. The jurisdiction dimension is the geography: the United States as a whole plus the individual states and reporting areas. Because each jurisdiction gets its own baseline model fitted to its own history, the expected line for Florida (older, with a high baseline death rate) differs structurally from that of Utah (younger, lower baseline)—which is exactly right, since the comparison should always be a place against its own past, never one place's raw counts against another's.

The week dimension uses the standard MMWR week—the epidemiological week ending on a Saturday that the CDC uses across its surveillance systems—so that excess mortality lines up cleanly with case counts, hospitalizations, and the other weekly public-health series. Weekly resolution is what lets the data show the timing and shape of a surge rather than just its annual total: the lead and lag between waves across regions, the speed of a rise, the duration of a plateau. It is also why the dataset is sensitive to reporting lag—the finer the time grain, the more the most recent observations are still incomplete.
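When joining other series to this one, it helps to map arbitrary dates onto the same week key. The sketch below assumes the standard MMWR convention (weeks run Sunday through Saturday, and week 1 is the first week with at least four days in the new calendar year); the function names are hypothetical.

```python
from datetime import date, timedelta

def week_ending_saturday(d):
    # Python's weekday(): Monday=0 ... Saturday=5, Sunday=6.
    return d + timedelta(days=(5 - d.weekday()) % 7)

def _week1_start(year):
    # Week 1 is the Sunday-to-Saturday week that contains January 4,
    # which is the first week with at least four days in the new year.
    jan4 = date(year, 1, 4)
    return jan4 - timedelta(days=(jan4.weekday() + 1) % 7)

def mmwr_week(d):
    """Return (mmwr_year, mmwr_week) for a calendar date."""
    year = d.year
    if d < _week1_start(year):
        year -= 1          # early-January days can belong to the prior year
    elif d >= _week1_start(year + 1):
        year += 1          # late-December days can belong to the next year
    return year, (d - _week1_start(year)).days // 7 + 1

print(week_ending_saturday(date(2020, 4, 8)))
print(mmwr_week(date(2020, 4, 11)))
print(mmwr_week(date(2021, 1, 2)))
```

Keying joins on the week-ending Saturday rather than on a calendar week number avoids the year-boundary cases where the MMWR year and the calendar year disagree.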

The cause grouping dimension is the subtlest and the most often mishandled. The dataset typically carries, for each jurisdiction-week, both an all-cause row and a row computed excluding a major cause (for the pandemic period, all causes excluding COVID-19). The two together let an analyst decompose the total excess: comparing all-cause excess against the excess that remains when the named cause is removed isolates how much of the deviation the named cause directly carried versus how much came from everything else—the indirect and undiagnosed deaths. The practical hazard is purely mechanical: because the same place and week appear in more than one outcome row, every aggregation must filter to a single cause grouping first. Summing across outcome rows without that filter double-counts the deaths, and it is the single most common error in working with this dataset.
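The decomposition can be sketched by pivoting the two outcome rows side by side. The numbers are made up and the column names are assumed from the listing earlier in the article. Because each outcome row has its own modeled baseline and floor, treat the subtraction as an approximation rather than an exact accounting.

```python
import pandas as pd

# Hypothetical rows for one jurisdiction over two weeks.
rows = pd.DataFrame({
    "jurisdiction": ["Texas"] * 4,
    "week_ending_date": ["2020-07-18", "2020-07-18", "2020-07-25", "2020-07-25"],
    "outcome": ["All causes", "All causes, excluding COVID-19"] * 2,
    "excess_estimate": [1500, 400, 1600, 450],
})

# One column per outcome; unstack raises if a (place, week, outcome) key repeats,
# which doubles as a check that the grain is what we think it is.
wide = (rows.set_index(["jurisdiction", "week_ending_date", "outcome"])
            ["excess_estimate"].unstack("outcome"))

ALL = "All causes"
EXCL = "All causes, excluding COVID-19"

# Portion of the excess carried by the named cause vs. everything else.
wide["covid_labeled"] = wide[ALL] - wide[EXCL]
wide["remainder"] = wide[EXCL]
wide["covid_share"] = wide["covid_labeled"] / wide[ALL]

print(wide[["covid_labeled", "remainder", "covid_share"]])
```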

Provisional data and the reporting lag

No feature of this dataset matters more for honest analysis than the fact that the most recent weeks are incomplete. Death data does not arrive instantly. A death must be certified, the certificate registered by the state, and the record transmitted to NCHS before it can be counted—and for deaths that require investigation, the certification itself can take weeks or months. The consequence is a systematic reporting lag: the count for last week is not wrong so much as unfinished, and it will rise as late-arriving certificates are processed.

NCHS handles this in two ways that the analyst must respect. First, the most recent weeks are published as provisional and clearly flagged as incomplete; the completeness and type indicators in the data exist precisely to mark which weeks are still filling in. Second, to keep the leading edge of the series from looking like a cliff, NCHS produces a weighted version that adjusts recent provisional counts upward to estimate where they will land once reporting completes, based on the historical pace of certificate arrival—so the published series can include both an unweighted (raw, undercounted) and a weighted (predicted) view of the recent weeks.

The discipline this imposes is absolute: never read the last several weeks of the raw series as a decline. A drop at the right edge of an excess chart is almost always the reporting lag, not a fall in deaths, and naive analyses that take the most recent provisional weeks at face value will confidently announce that a surge is ending when it is merely unrecorded. The lag also varies by jurisdiction—some states report quickly, others slowly—so cross-state comparisons at the leading edge are doubly unsafe. The dataset is authoritative for established weeks and completed periods; for the most recent weeks, use the weighted estimate if at all, treat the unweighted counts as lower bounds, and expect upward revision.

Analytical uses

A weekly, jurisdiction-resolved, baseline-referenced record of total mortality supports a distinctive range of analysis that a direct cause count cannot.

Sizing the full toll of an event is the foundational use. By summing the excess across the weeks of a surge for a jurisdiction, an analyst recovers the complete mortality impact—direct, indirect, and undiagnosed—and by comparing it against the directly attributed cause count, quantifies how much the official tally missed. The cause-grouping rows refine this further, decomposing the total excess into the named cause and the remainder. This is the analysis that produced the most credible estimates of the pandemic's true cost, and it transfers directly to heat waves, hurricanes, and future epidemics.

Comparing jurisdictions and timing waves exploits the geographic and weekly structure. Because each jurisdiction is measured against its own baseline, excess—ideally expressed relative to the expected count or per capita—is a more honest cross-state comparison than raw death counts, surfacing which states absorbed proportionally larger tolls and how the waves rippled across regions in time. Joining to other federal data extends the reach: aligning excess mortality by MMWR week with the CDC's case, hospitalization, and vaccination series to relate the death deviation to the epidemic curve, or pairing it with cause-specific mortality to ask which causes drove the non-COVID excess. Throughout, excess mortality serves as the outcome—the bottom-line measure of harm—that the upstream surveillance series help explain.
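Expressing excess relative to the expected count is a one-line transformation once a single outcome has been selected. The sketch below uses two invented jurisdictions and assumes the observed_number and average_expected column names from the listing earlier in the article.

```python
import pandas as pd

# Invented all-cause rows for a large and a small jurisdiction.
df = pd.DataFrame({
    "jurisdiction": ["Large-state", "Large-state", "Small-state", "Small-state"],
    "observed_number": [5500, 5600, 650, 700],
    "average_expected": [5000, 5000, 500, 500],
})

# Weekly excess against the central expected count, floored at zero.
df["excess"] = (df["observed_number"] - df["average_expected"]).clip(lower=0)

# Sum over the period first, then divide: percent excess for the whole window.
by = df.groupby("jurisdiction")[["excess", "average_expected"]].sum()
by["pct_excess"] = 100 * by["excess"] / by["average_expected"]

print(by.sort_values("pct_excess", ascending=False))
```

In this toy data the larger jurisdiction has the bigger raw excess while the smaller one has the bigger proportional toll, which is the distinction raw counts hide.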

Python workflow: excess deaths from CDC data services

The script below pulls excess-death rows from the NCHS dataset on data.cdc.gov—the Socrata API that ships the observed counts, the expected-deaths thresholds, and the excess estimates already computed—and performs two core analyses: summing the estimated excess by year and state to find the heaviest state-years, and lining up observed deaths against the expected threshold week by week for a single jurisdiction to count the weeks that breached it and locate the peak. No API key is required for modest volumes. Because NCHS Socrata dataset identifiers and field names vary between releases, the script isolates the dataset id in one place and resolves the jurisdiction, week, outcome, observed, threshold, and excess column names defensively rather than hard-coding them; critically, it filters to a single cause grouping before aggregating so the same jurisdiction-week is not double-counted, and it drops the national rollup row before summing across states. Any production use should be validated against the current data.cdc.gov catalog and should honor the provisional and weighting flags discussed above.

import requests
import pandas as pd

# CDC excess deaths come from NCHS, built on the National Vital
# Statistics System. The published estimates live on data.cdc.gov
# (Socrata), one row per jurisdiction x week (x outcome grouping),
# carrying the observed count, an expected/threshold value from an
# over-dispersed Poisson baseline, and the estimated excess.
# No API key is required for modest volumes; the API returns JSON.
SODA = "https://data.cdc.gov/resource"

# The 4x4 Socrata dataset id changes across NCHS releases; isolate it
# here and confirm against the current data.cdc.gov catalog. xkkf-xrst
# is the NCHS "Excess Deaths Associated With COVID-19" resource.
EXCESS_DATASET = "xkkf-xrst"


def fetch(dataset, where=None, select=None, limit=60000):
    # Socrata accepts SoQL query parameters ($where, $select, $limit).
    params = {"$limit": limit}
    if where:
        params["$where"] = where
    if select:
        params["$select"] = select
    url = f"{SODA}/{dataset}.json"
    r = requests.get(url, params=params, timeout=120)
    r.raise_for_status()
    return pd.DataFrame(r.json())


def _col(df, *names):
    # Resolve the first matching column name actually present; NCHS
    # field names vary by release (state vs jurisdiction, etc.).
    for n in names:
        if n in df.columns:
            return n
    return None


def _num(df, col):
    return pd.to_numeric(df[col], errors="coerce")


# --- 1. Sum estimated excess by year and state -----------------------
# The "excess" column is observed minus expected (floored at zero in
# most NCHS releases). Restrict to a single outcome grouping so cause
# categories are not double-counted, then roll up by year and state.
def excess_by_year_state(outcome="All causes"):
    df = fetch(EXCESS_DATASET)
    if df.empty:
        print("No rows returned.")
        return df
    geo = _col(df, "state", "jurisdiction", "geography")
    wk = _col(df, "week_ending_date", "week", "date")
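    # "type" is the weighted/unweighted flag, not a cause grouping, so it
    # is resolved on its own rather than used as a fallback for outcome.
    out = _col(df, "outcome", "group")
    typ = _col(df, "type")
    exc = _col(df, "excess_estimate", "excess_higher_estimate", "excess_lower_estimate")
    if not (geo and wk and exc):
        print("Expected columns not found in this release.")
        return df
    if out:
        # Exact match: a substring test on "All causes" would also keep
        # "All causes, excluding COVID-19" rows and double-count.
        df = df[df[out].astype(str).str.strip().str.lower() == outcome.strip().lower()]
    if typ:
        # Assumption: where a release carries both weighted and unweighted
        # rows, the weighted ones are labeled "Predicted"; keep only those
        # so each jurisdiction-week is counted once.
        pred = df[typ].astype(str).str.contains("predicted", case=False, na=False)
        if pred.any():
            df = df[pred]
    df = df.copy()  # work on a copy so the column assignments below are safe
    df["wk"] = pd.to_datetime(df[wk], errors="coerce")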
    df["yr"] = df["wk"].dt.year
    df["exc"] = _num(df, exc).clip(lower=0)
    # Drop the national rollup row so states are not double-counted.
    df = df[df[geo].astype(str).str.lower() != "united states"]
    by = (df.dropna(subset=["yr", "exc"])
            .groupby(["yr", geo])["exc"].sum().reset_index())
    print("Top state-years by estimated excess deaths:")
    for _, row in by.sort_values("exc", ascending=False).head(15).iterrows():
        print(f"  {int(row['yr'])}  {str(row[geo])[:20]:<20} {row['exc']:>12,.0f}")
    return by


# --- 2. Compare observed against expected for one jurisdiction -------
# For a single state, line up observed deaths against the expected
# threshold week by week and flag the weeks that breached it.
def observed_vs_expected(state="United States", outcome="All causes"):
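    # Filter client-side: the geography field name varies by release, so a
    # server-side $where on a hard-coded "state" column can fail.
    df = fetch(EXCESS_DATASET)
    geo = _col(df, "state", "jurisdiction", "geography")
    if df.empty or not geo:
        print(f"No rows for {state}.")
        return df
    df = df[df[geo].astype(str).str.strip().str.lower() == state.strip().lower()]
    if df.empty:
        print(f"No rows for {state}.")
        return df
    wk = _col(df, "week_ending_date", "week", "date")
    out = _col(df, "outcome", "group")
    typ = _col(df, "type")
    obs = _col(df, "observed_number", "observed", "number_of_deaths")
    thr = _col(df, "upper_bound_threshold", "threshold", "average_expected_count", "expected")
    if not (wk and obs and thr):
        print("Expected columns not found in this release.")
        return df
    if out:
        # Exact match so "All causes" does not also select the
        # "All causes, excluding COVID-19" rows.
        df = df[df[out].astype(str).str.strip().str.lower() == outcome.strip().lower()]
    if typ:
        # Assumption: weighted rows are labeled "Predicted"; prefer them.
        pred = df[typ].astype(str).str.contains("predicted", case=False, na=False)
        if pred.any():
            df = df[pred]
    df = df.copy()
    df["wk"] = pd.to_datetime(df[wk], errors="coerce")
    df["obs"] = _num(df, obs)
    df["thr"] = _num(df, thr)
    df = df.dropna(subset=["wk", "obs", "thr"]).sort_values("wk")
    if df.empty:
        # idxmax below would raise on an empty frame.
        print(f"No usable rows for {state} ({outcome}).")
        return df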
    breached = (df["obs"] > df["thr"]).sum()
    print(f"\n{state} ({outcome}): {breached:,} of {len(df):,} weeks "
          f"exceeded the expected-deaths threshold.")
    peak = df.loc[(df["obs"] - df["thr"]).idxmax()]
    print(f"  Peak excess week: {peak['wk'].date()}  "
          f"observed {peak['obs']:,.0f}  threshold ~{peak['thr']:,.0f}")
    return df


excess_by_year_state("All causes")
observed_vs_expected("United States", "All causes")

Two practical notes apply. First, the script reads the excess estimate NCHS publishes rather than re-fitting the baseline, which is the right default—reproducing the expected count correctly requires several years of jurisdiction-specific weekly history, the over-dispersed Poisson specification with its trend and seasonality terms, and the robust handling of past anomalies, all of which the published threshold and expected columns already encode. An analysis that wants a different baseline must pull the historical weekly counts and fit its own model, and should expect a different excess as a result. Second, the script clips the excess at zero and filters to a single outcome before summing, but serious work must go further: it should restrict to completed weeks (or use the weighted estimate) so the reporting lag does not depress recent totals, and it should carry the higher/lower excess range rather than a single point, because the expected count is itself uncertain. Treating a provisional recent week as final, or a point estimate as exact, manufactures false precision from a fundamentally model-based, still-arriving series.

Limitations and analytical caveats

Excess mortality is the most complete federal measure of an event's mortality impact, but it is a modeled, derived quantity, and it carries structural limitations that an analyst must internalize before drawing conclusions from it.

The expected count depends on the model. Excess is observed minus expected, and expected is the output of a baseline fitted to past years—so the excess inherits every assumption in that baseline. The choice of training window, the handling of trend and seasonality, the treatment of prior anomalies, and the over-dispersion specification all shift the expected line and therefore the excess. Different reputable models produce different excess estimates for the same event, sometimes by a meaningful margin. The NCHS figures are credible and transparent, but they are one estimate among several defensible ones, and an analysis should treat the published excess as a model-dependent quantity with genuine uncertainty—which is why NCHS publishes a range and a threshold rather than a single number.
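The sensitivity to the training window shows up even with a deliberately crude baseline. The sketch below replaces the NCHS model with a plain average of the same calendar week in prior years, purely to illustrate the point; the counts and the helper name are invented.

```python
from statistics import mean

# Invented deaths for the same calendar week in six prior years, oldest first.
# The upward drift stands in for population growth and aging.
history = [980, 1000, 1015, 1040, 1060, 1085]
observed = 1300

def excess_with_window(history, observed, years):
    """Excess against a naive baseline: the mean of the last `years` values."""
    expected = mean(history[-years:])
    return observed - expected

for years in (3, 5, 6):
    print(years, "year window ->", round(excess_with_window(history, observed, years), 1))
```

With a rising series, a longer window reaches back to lower counts, pulls the expected value down, and inflates the excess. A model with an explicit trend term behaves differently again, which is the sense in which the excess is model-dependent.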

Recent weeks are provisional and lag-depressed. As the provisional-data section stressed, the most recent weeks undercount because certificates are still arriving, and the undercount varies by jurisdiction. The leading edge of the series will almost always appear to fall, and that apparent fall is the reporting lag, not a real decline in deaths. Any analysis of the current situation must use the weighted estimate or treat the latest unweighted weeks as lower bounds, and any cross-state comparison at the leading edge is compromised by differential reporting speed. The data is authoritative for completed periods; it is not a real-time monitor of last week's deaths.

Excess is net, all-cause, and not the same as attributable. Excess mortality measures the total deviation of all deaths from baseline, which means it nets together everything that moved mortality in either direction. It cannot, by itself, say which deaths a particular event caused: it folds in undiagnosed and indirect deaths (its great strength) but also any unrelated reason deaths might have run high or low that period, and it offsets deaths the event indirectly prevented. Equating the all-cause excess with deaths attributable to a single named cause over-reads the measure—the cause-grouping decomposition helps, but the residual after removing a named cause is itself a mix of indirect effects and unrelated variation, not a clean count.

Small jurisdictions and short windows are noisy. In a small reporting area, or for a single week, the expected band is wide relative to the count and ordinary fluctuation can look like excess or mask it; the threshold logic is designed to absorb this, but excess estimates for small populations and short periods are inherently unstable and should be aggregated up to a state-season or longer before they bear weight. Held with these caveats in mind, the cdc_excess_deaths table remains the definitive federal answer to a deceptively simple question—how many more Americans died than expected—and the most complete measure available of the full mortality toll of the events, from pandemics to heat waves, that bend the country's death curve above its baseline.

Related writing

CDC Injury Mortality: The Federal Record of How Americans Die from Firearms, Overdoses, and Crashes — The cause-specific companion drawn from the same National Vital Statistics System: where excess mortality measures the net deviation of total deaths from baseline, the injury-mortality file resolves the external-cause deaths—overdoses chief among them—that drove much of the pandemic's indirect excess.

CDC Nutrition, Physical Activity, and Obesity: The Federal Surveillance Record of American Health Behavior — The chronic-disease upstream of the deaths excess mortality counts: the NPAO behavioral surveillance records the diet and activity patterns that drive the leading natural causes of death, both reported as stratified state-level series from CDC surveillance systems.

CDC Foodborne Outbreak Database: The Federal Record Behind 25,000 Annual Illness Clusters — Another CDC surveillance system that sizes a health event by counting its cases and deaths against an expected background, complementing the excess-mortality view of how the country detects and measures sudden departures from the public-health norm.