CDC WONDER: The Federal Mortality Database Behind Every Cause-of-Death Analysis
Every cause-of-death statistic published in an academic paper, a public health report, or a news headline ultimately traces back to a single federal system: CDC WONDER. The Wide-ranging Online Data for Epidemiologic Research platform, hosted by the Centers for Disease Control and Prevention, is the canonical access point for all registered deaths in the United States since 1999 — roughly 3 million deaths per year, each coded with an ICD-10 cause, paired with demographic characteristics, and localized to a county or state. No other dataset has the same combination of geographic depth, cause specificity, and temporal span for US mortality.
What CDC WONDER Is
CDC WONDER is a web-based query system that gives public access to multiple CDC surveillance databases, but in practice it is best known for its mortality data: the Underlying Cause of Death and Multiple Cause of Death databases, both built from the national mortality file compiled by the National Center for Health Statistics (NCHS). This file records every death registered in the United States from 1999 to the most recent completed year. Final data appear roughly twelve months after the end of the reference year; provisional counts are posted within weeks of registration but remain incomplete for several months. The underlying cause, defined as the disease or injury that initiated the chain of events leading to death, is the single ICD-10 code behind nearly every mortality statistic you have read about heart disease, cancer, overdoses, or COVID-19.
The WONDER query interface at wonder.cdc.gov allows users to group deaths by year, month, state, county, age group, sex, race, Hispanic origin, education, place of death, and ICD-10 code or code range. It returns death counts, population denominators sourced from Census Bureau estimates, crude rates, and age-adjusted rates with 95% confidence intervals. It is free, requires no registration for most queries, and is the starting point for the overwhelming majority of peer-reviewed mortality research published in the United States.
The Death Certificate: Source of Record
Every death in the United States requires a death certificate. The certificate is completed by a physician, medical examiner, or coroner — the choice depends on the manner of death and state law — and filed with the vital statistics office of the state where the death occurred. State vital statistics agencies register and digitize the certificates, then transmit them to NCHS, which compiles the national mortality file.
A death certificate contains more information than most analysts realize. In addition to the date and place of death, it records the decedent's age, sex, race, Hispanic origin, educational attainment, marital status, and birthplace. It records the place of death (hospital, home, nursing facility, emergency room, hospice, or other). It records the manner of death: natural, accident, suicide, homicide, or undetermined. And it records the cause of death in a structured two-part format.
Part I of the cause-of-death section asks the certifier to list the immediate cause of death on line (a), and then, on subsequent lines (b), (c), and (d), any intermediate conditions that led to the immediate cause, ending with the underlying cause — the condition that set the whole process in motion. Part II asks for other significant conditions that contributed to death but were not part of the causal chain. NCHS nosologists code all stated conditions into ICD-10 and apply selection rules to identify the single underlying cause. The resulting file is the Multiple Cause of Death (MCOD) database, and the underlying cause field is what drives the WONDER statistics.
Death certificate quality varies. Cause-of-death reporting by physicians is inconsistent for conditions like sepsis, dementia, and diabetes, which are frequently listed as contributing causes but not certified as underlying causes. Forensic autopsies — required for many external-cause deaths — produce more specific coding. Drug overdose deaths are among the better-coded causes because toxicology is routine, though the specific substance codes depend on what was tested for and whether the certifier specified the drug class or left it as “unspecified.”
ICD-10: The Coding Architecture
Causes of death in the WONDER system are coded using ICD-10, the International Classification of Diseases, Tenth Revision, published by the World Health Organization. ICD-10 codes have a letter-plus-number structure: a letter prefix identifying the chapter, followed by two digits for the category and, for most codes, a decimal and one further digit for specificity (e.g., I21.0 for acute transmural myocardial infarction of the anterior wall).
The major ICD-10 chapters relevant to US mortality research are:
- A00–B99: Infectious and parasitic diseases. Includes tuberculosis, HIV, and septicemia. COVID-19 is coded separately under U07.1, which sits in the special-purpose U chapter rather than in A00–B99.
- C00–D48: Neoplasms. Cancer in all forms, the second leading cause of death in the US at approximately 600,000 deaths per year. C34 covers lung cancer; C18–C20 colorectal; C50 breast.
- I00–I99: Diseases of the circulatory system. Heart disease (I20–I25 for ischemic heart disease, I50 for heart failure) and stroke (I60–I69) together account for roughly one in four US deaths — the single largest cause-of-death chapter.
- J00–J99: Diseases of the respiratory system. Chronic obstructive pulmonary disease (J40–J44), influenza (J09–J11), and pneumonia (J12–J18).
- F00–F99: Mental and behavioral disorders. Includes alcohol use disorder (F10) and drug use disorders (F11–F19), though these appear as underlying cause less often than the external cause codes for poisoning.
- V01–Y89: External causes of morbidity and mortality. This is the chapter that captures deaths by mechanism rather than disease. Transport accidents are V01–V99. Falls are W00–W19. Drug poisonings (the preferred term for overdoses) are X40–X44 (accidental), X60–X64 (intentional self-harm), X85 (assault by drugs), and Y10–Y14 (undetermined). Suicide as a whole spans X60–X84, with X65–X84 covering means other than drug poisoning. Assault (homicide) is X85–Y09.
For drug overdose research, the external cause codes X40–X44 and X60–X64 identify the intent, while the drug-specific codes in T36–T50 identify the substance involved. The most analytically critical codes in current research are T40.1 (heroin), T40.2 (natural and semi-synthetic opioids such as oxycodone and hydrocodone), T40.3 (methadone), and T40.4 (synthetic opioids other than methadone, which captures illicitly manufactured fentanyl and its analogs). These T-codes appear as contributing cause codes in the MCOD file rather than as the underlying cause; the external cause code is typically the underlying cause of an overdose death.
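The two-layer coding described above can be sketched in a few lines. This is a minimal illustration rather than NCHS code: the helper names `overdose_intent` and `opioid_types` are hypothetical, and the sketch assumes codes may arrive with or without the decimal point (X42, T40.4, or T404).

```python
# Hypothetical helpers illustrating the intent/substance split.
# Assumption: codes may be written with or without the decimal point.

INTENT_RANGES = {
    "unintentional": ("X40", "X44"),
    "suicide": ("X60", "X64"),
    "homicide": ("X85", "X85"),
    "undetermined": ("Y10", "Y14"),
}

OPIOID_T_CODES = {
    "T401": "heroin",
    "T402": "natural and semi-synthetic opioids",
    "T403": "methadone",
    "T404": "synthetic opioids other than methadone",
}


def overdose_intent(underlying):
    """Return the intent category for a drug-poisoning underlying cause, else None."""
    category = underlying.replace(".", "").upper()[:3]
    for intent, (low, high) in INTENT_RANGES.items():
        # Three-character categories compare correctly as plain strings.
        if low <= category <= high:
            return intent
    return None


def opioid_types(contributing):
    """Return the opioid classes named among the contributing-cause codes."""
    found = []
    for code in contributing:
        key = code.replace(".", "").upper()[:4]
        label = OPIOID_T_CODES.get(key)
        if label is not None and label not in found:
            found.append(label)
    return found


print(overdose_intent("X42"), opioid_types(["T40.4", "T40.1", "F19.1"]))
```

A death is counted as an opioid overdose only when both checks succeed: the underlying cause gives the intent, and at least one contributing code names an opioid class.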
The WONDER Query Interface and Suppression Rules
The WONDER online query tool is organized around grouped tabulation requests. A user selects a grouping variable — year, state, age group, sex, race — and a cause-of-death filter, then submits. The system returns a table with death counts, population, crude rates (deaths per 100,000), and age-adjusted rates. Multiple grouping variables can be combined, subject to a suppression rule that protects individual privacy.
The suppression rule is important: any cell with fewer than ten deaths is suppressed and replaced with the notation “Suppressed.” This is analogous to the BLS suppression rules in QCEW and serves the same disclosure-avoidance purpose. For national or large-state queries on common causes of death, suppression is rarely triggered. For county-level queries on rare causes or small demographic subgroups, suppression becomes pervasive. Analysts working at fine geographic or demographic granularity need to either aggregate across years or use restricted-use data.
The WONDER system also enforces a “minimum cell size” constraint on intersecting multiple grouping variables. Querying deaths simultaneously by county, race, and age group for a rare cause will produce almost entirely suppressed tables. The practical workaround is to aggregate either the geography (to state) or the cause-of-death codes (to a broader chapter) until cells exceed the suppression threshold. For researchers who need unsuppressed county-level data with full demographic and cause detail, NCHS offers restricted-use data through a research data center application process.
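The effect of pooling on suppression can be illustrated with a toy table. The counts below are invented for the example, and the column names are assumptions rather than the WONDER export format.

```python
import pandas as pd

SUPPRESSION_THRESHOLD = 10  # cells with fewer than ten deaths are suppressed

# Hypothetical county-by-year death counts for a rare cause (not real data).
deaths = pd.DataFrame({
    "county": ["A", "A", "A", "B", "B", "B"],
    "year": [2020, 2021, 2022, 2020, 2021, 2022],
    "deaths": [4, 6, 3, 12, 15, 11],
})

# Cells that a single-year, county-level table would hide.
deaths["suppressed"] = deaths["deaths"] < SUPPRESSION_THRESHOLD

# Pool the three years per county, then apply the same rule to the pooled cells.
pooled = deaths.groupby("county", as_index=False)["deaths"].sum()
pooled["suppressed"] = pooled["deaths"] < SUPPRESSION_THRESHOLD

print(deaths)
print(pooled)
```

County A is suppressed in every single year but clears the threshold once the years are combined. In practice the pooling has to happen in the query itself, by dropping the year grouping, because suppressed cells cannot be summed after the fact.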
Age-Adjusted Mortality Rates
Crude death rates are dominated by age structure. A county whose population is 35% over age 65 will have a far higher crude death rate than a county that is 8% over 65 — even if the underlying health of residents at each age is identical. Age-adjusted rates correct for this by applying the observed age-specific death rates to a fixed standard population, producing a rate that reflects what mortality would be if all compared populations had the same age distribution.
The CDC uses the 2000 US Standard Population as the reference for most WONDER age-adjusted rates. The adjustment is a weighted average of age-specific death rates, with the weights set by the 2000 age distribution. This means age-adjusted rates from WONDER are internally comparable across geographies, demographic groups, and time periods — but not directly comparable to age-adjusted rates from other countries or other historical periods that used different reference populations.
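The weighted average can be written out directly. The sketch below uses three broad age groups and illustrative weights, not the actual 2000 US Standard Population weights; the function names and the two counties are hypothetical.

```python
# Illustrative weights for three broad age groups. These are NOT the actual
# 2000 US Standard Population weights, which use finer age bands.
STANDARD_WEIGHTS = {"0-44": 0.65, "45-64": 0.22, "65+": 0.13}


def crude_rate(deaths, population):
    """Deaths per 100,000 across all age groups combined."""
    return sum(deaths.values()) / sum(population.values()) * 100_000


def age_adjusted_rate(deaths, population, weights=STANDARD_WEIGHTS):
    """Direct standardization: weighted average of age-specific rates."""
    return sum(
        weights[group] * deaths[group] / population[group] * 100_000
        for group in weights
    )


# Two hypothetical counties with identical age-specific death rates
# (100, 500, and 4,000 per 100,000) but very different age structures.
older = {
    "population": {"0-44": 40_000, "45-64": 25_000, "65+": 35_000},
    "deaths": {"0-44": 40, "45-64": 125, "65+": 1_400},
}
younger = {
    "population": {"0-44": 70_000, "45-64": 22_000, "65+": 8_000},
    "deaths": {"0-44": 70, "45-64": 110, "65+": 320},
}

for name, county in [("older", older), ("younger", younger)]:
    crude = crude_rate(county["deaths"], county["population"])
    adjusted = age_adjusted_rate(county["deaths"], county["population"])
    print(name, "crude:", round(crude, 1), "age-adjusted:", round(adjusted, 1))
```

The crude rates differ by a factor of about three, while the age-adjusted rates come out identical, because the only difference between the two counties is their age mix.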
Age-adjusted rates are what appear in every headline about racial health disparities, state-level overdose comparisons, or historical mortality trends. When a study reports that Black Americans have a 30% higher cardiovascular mortality rate than white Americans, or that West Virginia has the highest drug overdose death rate in the country, those figures are age-adjusted rates from WONDER data.
The Opioid Crisis in Three Waves
No dataset tells the story of the US opioid crisis more completely than CDC WONDER. Drug overdose deaths (underlying cause X40–X44, X60–X64, X85, Y10–Y14) grew from roughly 17,000 in 1999 to about 108,000 in 2022, a more than sixfold increase and a major contributor to the stagnation and decline of US life expectancy over the period. Epidemiologists identify three distinct waves visible in the WONDER data.
The first wave, from the late 1990s through approximately 2010, was driven by prescription opioids. Deaths involving natural and semi-synthetic opioids (T40.2, covering oxycodone and hydrocodone) and methadone (T40.3) rose steeply as physicians prescribed opioids for chronic non-cancer pain at rates far above historical norms, encouraged by pharmaceutical manufacturers' marketing and flawed assurances about addiction risk. The geographic pattern in WONDER data tracks the distribution of high-prescribing medical practices — concentrated in Appalachia, the rural South, and parts of the Mountain West.
The second wave began around 2010 as prescription opioid supply tightened following regulatory action and reformulation of OxyContin, and many prescription opioid users shifted to heroin (T40.1) as a cheaper and more accessible alternative. Heroin overdose deaths peaked around 2016 at roughly 15,000 per year.
The third wave, ongoing and dominant, is illicitly manufactured fentanyl and its analogs, coded under T40.4 (synthetic opioids other than methadone). Fentanyl is roughly 50 to 100 times more potent than morphine by weight, which makes it economically attractive to traffickers: a potentially lethal dose is on the order of two milligrams. Beginning around 2013 and accelerating after 2016, fentanyl contaminated not only the heroin supply but eventually the stimulant supply as well, leading to overdose deaths in people who had no history of opioid use. By 2022, synthetic opioids (primarily fentanyl) were involved in roughly 74,000 overdose deaths per year, nearly 70% of all drug overdose mortality. This shift is visible at the state and county level in WONDER, with hotspot geographies evolving from Appalachian prescription opioid areas in the early 2000s to a much broader national distribution by 2020.
“Deaths of Despair” and the Case–Deaton Research
In 2015, economists Anne Case and Angus Deaton published a paper in the Proceedings of the National Academy of Sciences documenting an extraordinary reversal: all-cause mortality among middle-aged non-Hispanic white Americans had been rising since the late 1990s, while mortality continued to fall for nearly all other demographic groups and in all peer countries. The cause was not a resurgence of infectious disease or cardiovascular mortality but three specific causes: drug and alcohol poisoning, chronic liver disease and cirrhosis, much of it alcohol-related (ICD-10 K70, K73–K74), and suicide (X60–X84). Case and Deaton termed these “deaths of despair”: deaths driven by cumulative disadvantage, loss of economic status, and deteriorating social fabric among working-class whites who had seen manufacturing employment collapse and wages stagnate for two decades.
The entire analysis was built from CDC WONDER data. The dataset's demographic granularity (race, age, sex) combined with its cause-of-death specificity made it possible to isolate the signal — a mortality increase in a specific demographic-cause combination — against the background of declining overall mortality. The paper has been cited thousands of times and directly shaped federal policy attention toward rural and working-class health, opioid prescribing regulation, and mental health investment. WONDER data enables ongoing replication and extension of this research for any researcher with an internet connection.
COVID-19 Mortality and Excess Death Analysis
COVID-19 was assigned an ICD-10 code (U07.1) in early 2020, and NCHS began including it in the WONDER database with provisional weekly data within weeks of deaths being registered. By year-end 2020, COVID-19 was the third leading cause of death in the United States, behind heart disease and cancer, with roughly 350,000 deaths attributed to U07.1 as the underlying cause. In 2021 it remained in third place with approximately 417,000 deaths as the underlying cause; counts of deaths merely involving COVID-19 run higher.
Direct COVID-19 mortality undercounts the pandemic's full impact on mortality, for two reasons. First, some COVID deaths were coded to other underlying causes (pneumonia, respiratory failure) when COVID was not recognized or not listed as the underlying condition. Second, the pandemic disrupted healthcare access and mental health broadly, producing indirect deaths from delayed cardiac care, missed cancer screenings, and acute mental health crises. Excess mortality analysis compares total observed deaths in 2020 and 2021 to the trend-expected number based on 2015 through 2019 data. Published estimates vary with the modeling approach but are on the order of one million excess deaths over the two years, substantially above the direct U07.1 count. The analysis can be reproduced from WONDER by pulling all-cause death counts by week and state and comparing them to pre-pandemic baselines.
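A minimal version of the excess-death calculation fits a straight line to the pre-pandemic years and projects it forward. The counts below are illustrative round numbers, not NCHS figures, and a linear trend is only one of several baseline models used in the literature.

```python
import numpy as np

# Hypothetical annual all-cause death counts (illustrative, not NCHS figures).
baseline_years = np.array([2015, 2016, 2017, 2018, 2019])
baseline_deaths = np.array([2_700_000, 2_740_000, 2_780_000, 2_820_000, 2_860_000])
observed = {2020: 3_380_000, 2021: 3_460_000}

# Fit a linear trend to 2015-2019. Years are centered on the first baseline
# year to keep the fit numerically well conditioned.
origin = baseline_years[0]
slope, intercept = np.polyfit(baseline_years - origin, baseline_deaths, deg=1)

excess = {}
for year, deaths in observed.items():
    expected = slope * (year - origin) + intercept
    excess[year] = deaths - expected
    print(year, "expected:", round(expected), "excess:", round(excess[year]))

print("total excess:", round(sum(excess.values())))
```

Weekly, state-level versions follow the same pattern, with a seasonal term added to the baseline so that ordinary winter peaks are not counted as excess.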
The COVID-19 data in WONDER also illustrates the provisional-versus-final data distinction. Provisional data, available with a lag of a few weeks, reflects deaths registered but not yet fully coded; it undercounts by 5 to 10% because some certificates take months to reach final processing. Final data, released roughly twelve months after the reference year, is the authoritative count used in peer-reviewed research.
Python: Opioid Overdose Deaths by State from the NCHS MCOD File
The CDC WONDER online interface is convenient for exploratory queries but has limitations for bulk analysis: suppressed cells, rate limits, and no programmatic API for the full granularity of the MCOD file. For reproducible research, the preferred approach is to work directly with the NCHS public-use mortality data files, which NCHS distributes as compressed fixed-width ASCII files via the CDC FTP server.
The following script downloads the 2022 public-use mortality file, parses the fixed-width format, identifies opioid-involved drug overdose deaths using the T40.1–T40.4 multiple-cause codes, and counts deaths by state. The multiple cause structure is essential here: an opioid overdose death will have an external cause code (e.g., X42) as the underlying cause and a T40.x code as a contributing cause in the record-axis conditions. Neither field alone tells the full story.
import io
import zipfile

import pandas as pd
import requests

# NCHS publishes the multiple cause of death (MCOD) public-use files at
# https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/mortality/
# Each year is a ZIP archive containing one fixed-width ASCII file.
# The column positions below follow the 2003-revision record layout; confirm
# them against the NCHS documentation for the data year before trusting output.
YEAR = "2022"
FTP_URL = (
    "https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/mortality/"
    "mort" + YEAR + "us.zip"
)

# Opioid-related ICD-10 drug codes found among the multiple-cause fields
# T40.1 = heroin, T40.2 = natural/semi-synthetic opioids, T40.3 = methadone
# T40.4 = synthetic opioids other than methadone (fentanyl group)
OPIOID_CODES = {"T401", "T402", "T403", "T404"}

# Drug-poisoning underlying causes: X40-X44 (accidental), X60-X64 (intentional
# self-harm), X85 (assault), Y10-Y14 (undetermined intent).
OVERDOSE_CAUSES = set(
    ["X4" + str(i) for i in range(5)]
    + ["X6" + str(i) for i in range(5)]
    + ["X85"]
    + ["Y1" + str(i) for i in range(5)]
)

# Fixed-width fields as (start, end) positions, 1-indexed and inclusive.
STATE = (21, 22)         # state of occurrence; blank in public-use files from 2005 on
UNDERLYING = (146, 149)  # underlying cause, ICD-10 without the decimal point
RECORD_AXIS_START = 344  # first of 20 record-axis condition slots
RECORD_AXIS_SLOTS = 20   # each slot is 5 characters wide
RECORD_LENGTH = RECORD_AXIS_START - 1 + 5 * RECORD_AXIS_SLOTS


def field(line, span):
    """Slice a 1-indexed, inclusive (start, end) span out of a record."""
    start, end = span
    return line[start - 1:end].strip()


def record_axis_codes(line):
    """Return the 4-character ICD-10 code from every record-axis slot."""
    first = RECORD_AXIS_START - 1
    slots = range(first, first + 5 * RECORD_AXIS_SLOTS, 5)
    return {line[i:i + 4].strip() for i in slots}


print("Downloading", FTP_URL)
resp = requests.get(FTP_URL, timeout=300)
resp.raise_for_status()

records = []
with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
    # The data file's name varies by year, so take the largest archive member.
    member = max(zf.infolist(), key=lambda info: info.file_size)
    with zf.open(member) as f:
        for raw in io.TextIOWrapper(f, encoding="latin-1"):
            line = raw.rstrip("\r\n").ljust(RECORD_LENGTH)
            # Keep only deaths whose underlying cause is a drug poisoning
            # and whose record-axis conditions include an opioid T-code.
            if field(line, UNDERLYING)[:3] not in OVERDOSE_CAUSES:
                continue
            if not (record_axis_codes(line) & OPIOID_CODES):
                continue
            records.append({"state_fips": field(line, STATE)})

opioid_od = pd.DataFrame(records, columns=["state_fips"])
print("Opioid-involved overdose deaths in file:", len(opioid_od))

# Count opioid overdose deaths by state. With a public-use file the state field
# is blank, so this table is only meaningful for restricted-use data.
state_counts = (
    opioid_od[opioid_od["state_fips"] != ""]
    .groupby("state_fips")
    .size()
    .reset_index(name="opioid_overdose_deaths")
    .sort_values("opioid_overdose_deaths", ascending=False)
)

# 2022 state population denominators in thousands (approximate, illustrative
# subset keyed by FIPS code). Replace with official Census Bureau estimates and
# match the keys to however the state field is coded in your file.
POP = {
    "06": 39029, "48": 30030, "12": 22610, "36": 19678, "17": 12813,
    "42": 12972, "39": 11799, "13": 10912, "37": 10699, "26": 10034,
    "34": 9261, "23": 1383, "46": 909, "38": 779, "50": 647,
}
state_counts["population_thousands"] = state_counts["state_fips"].map(POP)
state_counts = state_counts.dropna(subset=["population_thousands"])
state_counts["rate_per_100k"] = (
    state_counts["opioid_overdose_deaths"]
    / (state_counts["population_thousands"] * 1000)
    * 100000
).round(1)

print(
    state_counts[["state_fips", "opioid_overdose_deaths", "rate_per_100k"]]
    .head(15)
    .to_string(index=False)
)
Several implementation details are worth noting. The fixed-width layout is documented in the record layout that NCHS publishes for each data year; column positions can change between years, so always verify against the year-specific documentation. Public-use files from 2005 onward omit state and county identifiers, so a state-level tabulation like this one requires the restricted-use version of the file, while the public-use file supports only national counts with these filters. The T-codes appear in the record-axis conditions in five-character slots, with the ICD-10 code (minus its decimal point) in the first four positions, so comparing only the first four characters is sufficient. Population denominators for rate calculation are not in the mortality file itself. WONDER attaches them automatically, but with the raw files they must be joined from Census Bureau intercensal or Vintage estimates.
This framework extends to any cause-of-death subgroup: swap the drug poisoning external cause filters for suicide codes (X60–X84) or cardiovascular codes (I00–I99), adjust the multiple-cause filters, and the same pipeline produces state or county death counts for any ICD-10 cause group of interest.
Connecting WONDER to Other Federal Health Datasets
WONDER mortality data becomes more analytically powerful when paired with federal datasets that measure the upstream conditions driving the deaths it records.
- CMS Medicare Part D prescribing data — opioid prescribing rates by physician and county are available from CMS and map directly onto the first wave of opioid overdose mortality visible in WONDER. High prescribing counties in 2006–2012 predict high overdose mortality in the following five years with substantial reliability. See Medicare Part D: Opioid Prescribing Patterns.
- CDC BRFSS behavioral risk surveillance — the Behavioral Risk Factor Surveillance System measures self-reported health behaviors, chronic conditions, and healthcare access at the state and sometimes sub-state level. Smoking rates, obesity prevalence, and physical inactivity — all measured in BRFSS — are upstream predictors of the cardiovascular and respiratory mortality patterns visible in WONDER. See CDC BRFSS: Behavioral Risk Factor Surveillance.
- CMS Hospital Quality data — hospital-level outcome measures including 30-day mortality and readmission rates for heart attack, heart failure, and pneumonia. Geographic overlap between high WONDER cardiovascular mortality and low CMS hospital quality scores identifies areas where healthcare delivery quality may be compounding population health risk. See CMS Hospital Quality: Outcome Measures and Star Ratings.
Limitations and Analytical Cautions
The twelve-month lag to final data is the primary operational limitation. Provisional WONDER data is available much sooner but undercounts by 5 to 10% due to incomplete certificate processing. For trend analysis over completed years, this is not an issue; for surveillance of emerging causes — a new synthetic opioid, a novel infectious disease — provisional data requires careful framing.
Race and ethnicity data quality in WONDER improved substantially in 2003 when NCHS adopted a revised race standard, and again in subsequent years as Hispanic origin reporting improved. Analyses of racial mortality disparities that span the pre- and post-2003 period require caution because the apparent racial composition of the death file changed with the coding standard, not just with underlying population health.
Urban–rural comparisons using WONDER are complicated by the suppression of county-level data for small rural counties with fewer than ten deaths in the relevant cause-of-death category. County-level tabulations therefore tend to understate rural overdose mortality, because sparsely populated counties drop out of the table even when their per-capita rates are high. The CDC publishes a six-category urban–rural classification scheme (the NCHS Urban–Rural Classification Scheme for Counties) that can be merged with WONDER data to partially address this, though it cannot recover suppressed cells.
The underlying cause framework, by design, attributes each death to a single code. This works well for deaths from a single clear disease but creates ambiguity for complex multimorbid deaths. A person who dies of pneumonia after a stroke after years of type 2 diabetes may have any of those conditions as the underlying cause depending on how the certifier fills out the form and how NCHS applies selection rules. Analyses of conditions that frequently appear as contributing causes — diabetes, obesity, sepsis — should use the multiple cause fields from the MCOD file rather than restricting to underlying cause alone.
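The gap between the two counting methods is easy to see on a toy table. The records below are invented, the column names are assumptions, and codes are written without the decimal point.

```python
import pandas as pd

# Hypothetical death records: one underlying cause plus every condition
# mentioned on the certificate (not real data).
records = pd.DataFrame({
    "underlying": ["E119", "I219", "J189", "I64", "E112"],
    "all_conditions": [
        ["E119", "I10"],
        ["I219", "E119"],
        ["J189", "I64", "E119"],  # pneumonia after stroke, with diabetes
        ["I64"],
        ["E112", "N189"],
    ],
})

DIABETES_CATEGORIES = {"E10", "E11", "E12", "E13", "E14"}


def is_diabetes(code):
    """True when the three-character ICD-10 category is diabetes mellitus."""
    return code[:3] in DIABETES_CATEGORIES


underlying_count = int(records["underlying"].map(is_diabetes).sum())
any_mention_count = int(
    records["all_conditions"]
    .map(lambda codes: any(is_diabetes(c) for c in codes))
    .sum()
)

print("diabetes as underlying cause:", underlying_count)
print("diabetes mentioned anywhere:", any_mention_count)
```

In this toy table diabetes is the underlying cause of two deaths but is mentioned on four certificates, which is the kind of gap the multiple cause fields are meant to expose.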