Every death by suicide in the United States is recorded on a death certificate, coded by a nosologist to a single underlying cause, and counted into a national total that the country has tracked for the better part of a century. From those certificates the CDC builds a deceptively small table: roughly 6,400 suicide-rate records, each one a single number (a rate per 100,000 people) pinned to a year and a slice of the population. Suicide is one of the few leading causes of death in America that rose for most of two decades, and this record is the baseline that every prevention program is measured against.
This article covers what the suicide-mortality dataset is and where it sits in the National Vital Statistics System; how a death becomes a coded statistic, from the death certificate to the ICD codes for intentional self-harm; the difference between crude and age-adjusted rates and why age-adjustment is what makes seven decades of data comparable; the long-run trend—the mid-century baseline, the late-1990s low, and the long rise through the 2000s and 2010s; the breakdowns that drive prevention—sex, age group, and method, and what each one reveals; the policy and crisis-response architecture the data underpins, including the 988 Suicide and Crisis Lifeline; how the suicide record joins to the broader injury-mortality and excess-deaths data; a Python workflow that pulls the rates from CDC's public portals and computes the trend and the sex and age breakdowns; and the caveats—coding limits, small-number instability, and the lag and revision of mortality data—that any honest analysis must carry.
What the dataset is
The suicide-mortality record is a product of the National Vital Statistics System (NVSS), the cooperative federal–state system through which the CDC's National Center for Health Statistics (NCHS) compiles the country's births and deaths. It is the same death-certificate pipeline that produces NCHS's life tables, its leading-causes-of-death rankings, and its broader mortality files; suicide is simply one cause, isolated and tabulated over time. What makes the suicide extract distinct is its long reach and its demographic granularity: rather than a single annual count, it is a set of rates (deaths per 100,000 population) published for many combinations of year, sex, age group, and, in the method-resolved series, means of death, stretching from the second half of the twentieth century into recent years.
In our database this record is stored as the table cdc_suicide, with roughly 6,400 rows. The grain is one row per rate—a single year crossed with a single demographic breakdown. The same calendar year therefore contributes many rows: the all-population age-adjusted rate, the rate for males and the rate for females, the rate for each standard age band, and so on. A typical row carries the year, the breakdown that defines the slice, the rate itself, and whether the rate is crude or age-adjusted:
year -- calendar year of death (e.g. 1950 .. recent)
breakdown_type -- the dimension this row stratifies on: sex / age / method / all
breakdown_value -- the specific slice: "Male", "15-24 years", "Firearm", "All persons"
rate -- deaths per 100,000 population
rate_type -- "age-adjusted" or "crude"
std_population -- standard population used for age-adjustment (e.g. 2000 US std)
icd_basis -- ICD revision in force for that year's coding (e.g. ICD-10)
unit -- "per 100,000 population"
notes -- coding-change footnotes, suppression flags, comparability caveats
The load-bearing columns are rate, rate_type, and the breakdown_type/breakdown_value pair. The rate is the headline statistic, but it is meaningless without its type: a crude rate is the raw number of suicide deaths divided by the population, while an age-adjusted rate re-weights the age-specific rates to a fixed standard population so that the figure is not distorted by a changing age structure, a distinction the age-adjustment section returns to. The breakdown pair is what turns a single national number into the stratified series that prevention work depends on: it tells you whether a row describes everyone, one sex, one age band, or one method. The icd_basis column is the quiet but essential companion: because the way a cause of death is coded changed across the decades, the ICD revision in force for a given year is what tells an analyst whether two years' rates are strictly comparable.
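To make the grain concrete, here is a minimal sketch of pulling one comparable series out of a table shaped like the schema above. The rows are invented placeholder values rather than real rates, and the helper name `select_series` is hypothetical; only the column names come from the schema.

```python
import pandas as pd

# Placeholder rows in the cdc_suicide shape; the rates are made up for illustration.
rows = [
    {"year": 2000, "breakdown_type": "all", "breakdown_value": "All persons",
     "rate": 10.0, "rate_type": "age-adjusted"},
    {"year": 2000, "breakdown_type": "all", "breakdown_value": "All persons",
     "rate": 10.5, "rate_type": "crude"},
    {"year": 2000, "breakdown_type": "sex", "breakdown_value": "Male",
     "rate": 17.0, "rate_type": "age-adjusted"},
    {"year": 2001, "breakdown_type": "all", "breakdown_value": "All persons",
     "rate": 11.0, "rate_type": "age-adjusted"},
]
table = pd.DataFrame(rows)


def select_series(df, breakdown_type, breakdown_value, rate_type):
    """Return one rate per year for exactly one slice and one rate type."""
    mask = (
        (df["breakdown_type"] == breakdown_type)
        & (df["breakdown_value"] == breakdown_value)
        & (df["rate_type"] == rate_type)
    )
    series = df.loc[mask].set_index("year")["rate"].sort_index()
    # More than one row per year means the filter did not pin down a single slice.
    if series.index.has_duplicates:
        raise ValueError("filter matched several rows for the same year")
    return series


national = select_series(table, "all", "All persons", "age-adjusted")
print(national)
```

Filtering on all three columns at once is the point of the design: leaving out any one of them silently blends slices or rate types into the same series.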
From a death certificate to a coded statistic
Every figure in this dataset begins with a single document: the death certificate. When a person dies, a certifier (typically a physician, a medical examiner, or a coroner) completes the cause-of-death section, recording the immediate cause, the underlying conditions, and the manner of death (natural, accident, suicide, homicide, or undetermined). The certificate is registered with the state vital-records office, which transmits a standardized electronic record to NCHS. There, trained nosologists and, increasingly, automated coding systems translate the certifier's narrative into the formal codes of the International Classification of Diseases and select, by a fixed set of rules, the single underlying cause of death: the disease or injury that initiated the chain of events leading to death. Suicide statistics are tabulated on this underlying-cause basis.
Under the tenth revision of the classification (ICD-10), in use for US mortality coding since 1999, suicide (formally, intentional self-harm) is identified by the external-cause codes X60 through X84, together with a small number of related codes (for example Y87.0, which covers sequelae of intentional self-harm). The X60–X84 block is itself organized by mechanism: self-poisoning, intentional self-harm by firearm discharge, by hanging, strangulation and suffocation, by drowning, and so on. That structure is exactly what lets the dataset carry a method breakdown without leaving the underlying-cause framework: the same codes that establish a death as a suicide also record how it occurred. Earlier years in the long series were coded under prior ICD revisions (ICD-9, ICD-8, and back), each with its own code ranges for suicide, which is why the icd_basis column matters: a revision boundary can introduce a comparability step in the series that is an artifact of recoding rather than a real change in behavior.
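The way one code block carries both "this was a suicide" and "by this method" can be sketched as a small lookup. The function name `classify_self_harm` and the coarse group labels are this example's own choices, not an NCHS product; the related codes outside X60–X84, such as Y87.0, are deliberately left out of scope.

```python
import re

# Coarse method groups within the ICD-10 intentional self-harm block X60-X84.
# The labels are this sketch's own; the code ranges follow the ICD-10 block layout.
METHOD_RANGES = [
    (60, 69, "poisoning"),
    (70, 70, "hanging, strangulation, suffocation"),
    (71, 71, "drowning"),
    (72, 74, "firearm"),
    (75, 84, "other or unspecified means"),
]


def classify_self_harm(code):
    """Return a method label for an X60-X84 code, or None for any other code."""
    match = re.fullmatch(r"X(\d{2})(?:\.\d)?", code.strip().upper())
    if not match:
        return None
    number = int(match.group(1))
    for low, high, label in METHOD_RANGES:
        if low <= number <= high:
            return label
    return None


for example in ("X72", "x70", "X64", "X59", "Y87.0"):
    print(example, "->", classify_self_harm(example))
```

A `None` result means only that the code falls outside the X60–X84 block, so a production classifier would handle the related codes separately instead of treating them as non-suicides.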
One feature of this pipeline shapes everything downstream: the manner of death is a determination, not an observation. A death is counted as a suicide only when the certifier affirmatively classifies it so, to a standard of evidence that varies across jurisdictions and over time. Deaths where intent cannot be established are coded as undetermined or as accidents, and a body of research argues that some true suicides are recorded under those headings—particularly some single-vehicle crashes and some drug poisonings, where intent is hard to prove. This is not a flaw in the data so much as a property of it, and it means the suicide count is best read as a careful lower-bound count of deaths affirmatively determined to be self-inflicted, a point the caveats section develops.
Crude versus age-adjusted: why the rates are comparable across decades
The single most important methodological feature of this dataset is age-adjustment, and understanding it is the difference between reading the data correctly and being misled by it. Suicide risk is not uniform across the lifespan; rates differ markedly by age. The age composition of the US population, meanwhile, has shifted enormously over seven decades—the post-war baby boom, its passage through middle age, and the broad aging of the population have all changed the share of people in higher- and lower-risk age bands. A crude rate—total suicides divided by total population—mixes the true change in age-specific risk with the change in the population's age structure, so a rising crude rate could in principle reflect nothing more than an aging population.
Age-adjustment removes that confound. The method computes the suicide rate within each age band separately, then combines those age-specific rates using the age distribution of a fixed standard population rather than the population of the year in question. NCHS has long used the year-2000 US standard population as the reference for adjusting recent mortality data. Because every year's age-adjusted rate is weighted to the same standard structure, the resulting figures can be compared directly across years and across groups: a difference between two age-adjusted rates reflects a real difference in age-specific risk, not a difference in who happened to be alive. This is why long-run suicide trends, and comparisons between, say, two states or two decades, are almost always reported on the age-adjusted basis. The dataset carries both rate types, and the rate_type and std_population columns are what keep them from being silently mixed.
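The arithmetic of direct age-adjustment fits in a few lines. In this sketch the two age bands, the standard weights, and all counts are invented for illustration (a real calculation would use the full set of standard age bands); the function names are hypothetical.

```python
def age_adjusted_rate(deaths, population, standard_weights):
    """Direct standardization: weight each age-specific rate by a fixed standard share."""
    total_weight = sum(standard_weights.values())
    adjusted = 0.0
    for band, weight in standard_weights.items():
        age_specific = deaths[band] / population[band] * 100_000
        adjusted += age_specific * weight / total_weight
    return adjusted


def crude_rate(deaths, population):
    """Total deaths over total population, per 100,000."""
    return sum(deaths.values()) / sum(population.values()) * 100_000


# Invented numbers: age-specific risk is identical in both years
# (10 and 20 per 100,000), but the population is older in year B.
standard = {"young": 0.6, "old": 0.4}  # hypothetical standard shares
deaths_a = {"young": 60, "old": 80}
pop_a = {"young": 600_000, "old": 400_000}
deaths_b = {"young": 40, "old": 120}
pop_b = {"young": 400_000, "old": 600_000}

for label, deaths, pop in (("A", deaths_a, pop_a), ("B", deaths_b, pop_b)):
    print(f"year {label}: crude {crude_rate(deaths, pop):.1f}, "
          f"age-adjusted {age_adjusted_rate(deaths, pop, standard):.1f}")
```

With these inputs the crude rate moves from 14.0 to 16.0 purely because the population aged, while the age-adjusted rate stays at 14.0 in both years, which is the confound the method is designed to remove.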
The practical rule for an analyst is simple and strict. Use age-adjusted rates for trends and between-group comparisons—the long-run line, the male-versus-female gap, the state rankings. Use crude or age-specific rates when the age structure is itself the subject—when the question is which age band is at highest risk, the age-specific rate is exactly what you want, and age-adjusting it would defeat the purpose. Comparing a crude rate from one year against an age-adjusted rate from another, or comparing age-adjusted rates built on different standard populations, is one of the most common ways to draw a wrong conclusion from this data.
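One way to enforce that rule in code is a guard that refuses to compare mismatched rows. This is a sketch: `check_comparable` is a hypothetical helper, the rows are plain dictionaries keyed by the schema's column names, and the standard-population strings are illustrative values.

```python
def check_comparable(row_a, row_b):
    """Raise ValueError if two rate rows should not be compared directly."""
    if row_a["rate_type"] != row_b["rate_type"]:
        raise ValueError("rate types differ: crude vs age-adjusted")
    if row_a["rate_type"] == "age-adjusted" and (
        row_a["std_population"] != row_b["std_population"]
    ):
        raise ValueError("age-adjusted rates use different standard populations")
    return True


# Illustrative rows; the rates are placeholders, not real figures.
r1 = {"rate": 10.0, "rate_type": "age-adjusted", "std_population": "2000 US std"}
r2 = {"rate": 12.0, "rate_type": "age-adjusted", "std_population": "2000 US std"}
r3 = {"rate": 11.0, "rate_type": "crude", "std_population": None}

print(check_comparable(r1, r2))
try:
    check_comparable(r1, r3)
except ValueError as err:
    print("refused:", err)
```

Failing loudly is a deliberate choice here: a mixed comparison produces a plausible-looking number, so an exception is easier to catch than a wrong conclusion.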
The long-run trend
The shape of the seven-decade trend is the dataset's central story, and it is not a smooth one. Suicide has been a leading cause of death in the United States throughout the period—persistently among the top ten or so causes overall, and far higher in the rankings for younger age groups, where the burden of chronic disease is low and injury causes dominate. Across the second half of the twentieth century the age-adjusted national rate moved within a band rather than climbing steadily, and toward the end of the 1990s it reached one of its lower points of the modern era.
What distinguishes suicide from most other leading causes of death is what happened next. Where age-adjusted mortality from heart disease and many cancers fell over the same span, the suicide rate rose substantially through the 2000s and 2010s, a sustained increase that ran counter to the broad national trend of falling death rates and made suicide a focus of public-health alarm. The rise was not uniform—it was steeper in some age groups and demographic segments than others, and it interacted with the parallel rise in drug-overdose deaths in ways the injury-mortality literature has examined under the heading of “deaths of despair.” The most recent years in the series show the trend flattening or easing in places rather than continuing its earlier climb, but the level remains far above the late-1990s low. The dataset is built precisely so that this arc can be read cleanly: because the rates are age-adjusted to a common standard, the rise is a real change in risk and not an artifact of the aging population, and that is what gives the trend its weight as evidence.
For an analyst, the trend is also a lesson in not over-reading any single year. Suicide rates move gradually; large year-over-year swings in a stratified cell are far more often a sign of a small denominator than of a real behavioral shift. The value of a long series is that it lets the signal—the multi-decade direction—separate from the year-to-year noise, and it is the long series, not the latest data point, that prevention policy is built on.
The breakdowns that drive prevention: sex, age, and method
A single national rate tells a prevention program almost nothing about where to act. The value of this dataset is in its stratification, and three breakdowns—sex, age group, and method—carry most of the actionable signal. They are also tightly interrelated, and reading them together is what turns the data from description into strategy.
Sex shows one of the most durable patterns in all of mortality data: males die by suicide at a substantially higher rate than females, consistently across the entire series. This coexists with the well-documented finding, visible in survey and clinical data rather than in death records, that suicide attempts and ideation are more common among females. The reconciliation lies largely in method: males more often use means with high lethality, particularly firearms, so a given attempt is more likely to be fatal. The method breakdown in the dataset, resolved through the X60–X84 code structure, therefore does double duty—it is both an epidemiological description and the most direct lever in prevention, because means restriction (reducing access to highly lethal methods at the moment of crisis) is among the best-evidenced interventions in the field. Firearm and suffocation methods in particular have driven much of the modern trend, and tracking method-specific rates is how a program knows whether a means-focused intervention is working.
Age is the third axis, and it is where the age-specific rates earn their place alongside the age-adjusted ones. Suicide rates vary by age across the lifespan in patterns that have themselves shifted over the decades, and the relative burden differs by what other causes of death are competing: among adolescents and young adults, where chronic disease is rare, suicide ranks among the leading causes of death even though the absolute rate is lower than at older ages, which is why it commands attention out of proportion to its raw count. Crossing age with sex and method—older males and firearms, younger people and the methods most accessible to them—is what lets a program target the right intervention to the right population, and it is exactly the cross-tabulation the dataset's grain is designed to support.
The data behind prevention policy and the 988 Lifeline
This dataset is not an academic artifact; it is the measurement instrument of national suicide-prevention policy. Prevention strategy in the United States is organized around an explicit public-health framework—reflected in the National Strategy for Suicide Prevention and in CDC's own technical guidance—that treats suicide as a preventable outcome with identifiable risk and protective factors, addressable through interventions ranging from means restriction and crisis services to economic supports and clinical care. Every such strategy needs a baseline and a yardstick, and the age-adjusted suicide rate is it: prevention goals are written as reductions in that rate, and progress is judged by whether the rate moves.
The most visible piece of the response architecture is the crisis-services side. In July 2022 the United States launched 988, the three-digit Suicide and Crisis Lifeline, replacing the older ten-digit national suicide-prevention line with an easily dialed number routed to a national network of local crisis centers. 988 is the acute, person-facing complement to the population-level statistics in this dataset: where the mortality record measures the outcome the system is trying to prevent, the Lifeline is one of the interventions meant to bend it. The two are linked analytically—crisis-line volume, response times, and outcomes are part of the same prevention enterprise whose ultimate metric is the suicide rate—but they are distinct data systems, and this dataset is the mortality half: the baseline that 988 and the broader strategy aim, over years, to lower.
Because the data is stratified, it also tells policy where the response is most needed. A method breakdown that shows firearms driving the trend points toward means-safety programs; an age and sex profile that concentrates risk in a particular group directs outreach and clinical resources accordingly; a geographic series—suicide rates have long been higher in some regions than others—helps target the rollout of crisis services. The dataset, in short, is both the scoreboard and part of the playbook.
Joining to injury mortality and excess deaths
The suicide record is most powerful as one face of the broader mortality system rather than in isolation, because it shares its provenance—the same death certificates, the same ICD coding, the same NVSS pipeline—with the rest of NCHS's mortality data. Two companion records matter most.
The first is the broader injury-mortality record. Suicide is one category of injury death, sitting alongside the unintentional injuries (drug overdoses, motor-vehicle crashes, falls) and the other intentional ones (homicide) that together make up the external causes. Reading suicide against the rest of the injury picture is what places it in context: the parallel rise of suicide and drug-overdose mortality through the 2010s is visible only when the two are read together, and the boundary between an intentional self-poisoning and an unintentional overdose is precisely where the manner-of-death determination is hardest—so the suicide and overdose series are not just neighbors but partly entangled at the coding margin. Any serious analysis of one benefits from holding the other in view.
The second is the excess-deaths framework, which measures how many more people died in a period than the historical baseline would predict. Suicide enters that conversation in two ways: as a cause whose own trend can be assessed against its expected baseline, and as one of the causes scrutinized during periods of social and economic disruption, when the question of whether crisis conditions moved the suicide rate becomes urgent. Because excess-deaths analysis and the suicide series are both built on the same underlying-cause-coded NVSS data, they can be reconciled directly: the suicide rate is a component that the excess-deaths lens can isolate, baseline, and test for deviation. The common thread across all three records is the death certificate and the ICD code, which is what lets them be joined into a single coherent picture of how Americans die.
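The "isolate, baseline, and test for deviation" step can be sketched with a deliberately simple baseline. The straight-line fit below is an assumption made for illustration, not the model NCHS uses for its excess-deaths estimates, and the rates are invented; `excess_over_trend` is a hypothetical name.

```python
import numpy as np


def excess_over_trend(years, rates, baseline_end):
    """Fit a line to years <= baseline_end; return observed minus expected afterwards."""
    years = np.asarray(years, dtype=float)
    rates = np.asarray(rates, dtype=float)
    base = years <= baseline_end
    # Center the years so the least-squares fit is numerically well behaved.
    center = years[base].mean()
    slope, intercept = np.polyfit(years[base] - center, rates[base], deg=1)
    expected = slope * (years - center) + intercept
    return {
        int(y): float(obs - exp)
        for y, obs, exp in zip(years, rates, expected)
        if y > baseline_end
    }


# Invented series: a steady rise through 2013, then a jump in 2014.
years = [2010, 2011, 2012, 2013, 2014, 2015]
rates = [10.0, 11.0, 12.0, 13.0, 15.0, 15.0]
excess = excess_over_trend(years, rates, baseline_end=2013)
for year, value in excess.items():
    print(year, f"{value:+.2f} per 100,000 vs baseline")
```

A positive value says the observed rate sat above what the baseline period would have predicted; whether that gap is meaningful still depends on the uncertainty of both the baseline and the observed rate.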
Python workflow: pulling suicide rates from CDC's public portals
The script below pulls age-adjusted suicide rates from CDC's public data, builds the long-run national trend, and compares the most recent year across sex and across age groups. It uses the NCHS data portal's JSON (Socrata) extract of suicide death rates, which serves without authentication and traces to the same NVSS death records that underlie CDC WONDER; the comments note the WONDER XML request API as the canonical alternative for fully custom queries. Because column names differ slightly between releases, the script discovers the working year and rate columns at runtime rather than hard-coding them, and it keeps crude and age-adjusted rates from being mixed by reading the rate type. No API key is required for the public data.
import pandas as pd
import requests

# CDC WONDER -- Underlying Cause of Death, no API key required.
# WONDER exposes an XML request/response API at:
#   https://wonder.cdc.gov/controller/datarequest/D76   (1999-2020)
#   https://wonder.cdc.gov/controller/datarequest/D158  (2018-2023, single race)
# For a clean, scriptable pull this example uses the NCHS data-portal
# (Socrata) extract of suicide death rates, which serves JSON
# without authentication. Both trace to the same NVSS death records.
#
# This script:
#   1. Pulls suicide rates by year, sex, and age group
#   2. Builds the long-run national trend (age-adjusted rate by year)
#   3. Compares the most recent year across sex and across age groups

PORTAL = "https://data.cdc.gov/resource"
# NCHS "Death rates for suicide, by sex, race, Hispanic origin, and age"
DATASET = "9j2v-jamp.json"


def fetch(where=None, limit=50000):
    """Download rows from the portal; `where` is an optional SoQL filter."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where
    r = requests.get(f"{PORTAL}/{DATASET}", params=params, timeout=120)
    r.raise_for_status()
    return pd.DataFrame(r.json())


def num(s):
    """Coerce a column to numbers; unparseable values become NaN."""
    return pd.to_numeric(s, errors="coerce")


def find_col(frame, *keys):
    """First column named exactly like a key, else the first containing one."""
    for exact in (True, False):
        for key in keys:
            for c in frame.columns:
                name = c.lower()
                if (name == key) if exact else (key in name):
                    return c
    return None


def rows_matching(frame, pattern):
    """Rows in which any cell matches the regex, ignoring case."""
    if frame.empty:
        return frame
    text = frame.astype(str).agg(" ".join, axis=1)
    return frame[text.str.contains(pattern, case=False, regex=True)]


def main():
    df = fetch()
    if df.empty:
        print("No rows returned.")
        return

    # Column names vary by release; discover the working ones at runtime.
    year_col = find_col(df, "year")
    rate_col = find_col(df, "estimate", "rate")
    if year_col is None or rate_col is None:
        print("Could not find year and rate columns.")
        return
    df["year"] = num(df[year_col])
    df["rate"] = num(df[rate_col])
    df = df.dropna(subset=["year", "rate"])

    # Keep crude and age-adjusted rates apart: when a unit column names the
    # rate type, the trend and the sex comparison use age-adjusted rows only.
    unit_col = find_col(df, "unit")
    adjusted = df
    if unit_col:
        is_adj = df[unit_col].astype(str).str.contains("age-adjusted", case=False)
        adjusted = df[is_adj]
    else:
        print("Warning: no rate-type column found; rate types may be mixed.")

    # --- 1. Long-run national trend (both sexes, all ages) -------------
    # The all-population stratum may be labelled either way, so match both.
    both = rows_matching(adjusted, r"\b(?:both sexes|all persons)\b")
    trend = both.groupby("year")["rate"].mean().sort_index()
    if not trend.empty:
        lo, hi = trend.idxmin(), trend.idxmax()
        print("National age-adjusted suicide rate (per 100,000):")
        print(f"  lowest:  {trend[lo]:.1f} in {int(lo)}")
        print(f"  highest: {trend[hi]:.1f} in {int(hi)}")
        print(f"  change {int(trend.index.min())}-{int(trend.index.max())}: "
              f"{trend.iloc[-1] - trend.iloc[0]:+.1f} per 100,000")

    # --- 2. Most recent year, by sex ----------------------------------
    recent_adj = adjusted[adjusted["year"] == adjusted["year"].max()]
    print("Most recent year, by sex:")
    for label in ("male", "female"):
        # Word boundaries stop "male" from also matching "female".
        sub = rows_matching(recent_adj, rf"\b{label}\b")
        if not sub.empty:
            print(f"  {label}: {sub['rate'].mean():.1f} per 100,000")

    # --- 3. Most recent year, by age group ----------------------------
    # Age-specific rates are not age-adjusted, so use all recent rows here.
    recent = df[df["year"] == df["year"].max()]
    age_col = find_col(df, "age")
    if age_col:
        by_age = (recent.groupby(age_col)["rate"].mean()
                        .sort_values(ascending=False))
        print("Highest-rate age groups (most recent year):")
        print(by_age.head(5).round(1).to_string())


if __name__ == "__main__":
    main()
Two practical notes apply. First, the trend calculation here averages within year for simplicity; a rigorous version must select exactly the all-persons, age-adjusted rows for the national line and the precisely matched stratum for each comparison, because silently blending crude with age-adjusted rates, or all-ages with age-specific rows, is the single easiest way to corrupt the result—the rate_type and breakdown columns exist to prevent exactly that, and any production query should filter on them explicitly. Second, for fully custom cross-tabulations—suicide by method crossed with age and sex, or by state and year—CDC WONDER's Underlying Cause of Death database is the authoritative source: it applies NCHS's confidentiality suppression rules (small cells are suppressed, and rates based on small numbers are flagged as unreliable) automatically, which a raw portal pull does not, and it lets the analyst specify the exact ICD-10 code range (X60–X84 and related codes) that defines a suicide death.
Limitations and analytical caveats
The suicide-mortality record is the most authoritative public measure of suicide in the United States, but it carries limitations that an analyst must hold firmly in mind and, given the subject, must communicate honestly rather than letting a number stand alone.
A suicide is a determination, and some are missed. A death is counted as a suicide only when the certifier affirmatively classifies it as one, and the standard of evidence varies across jurisdictions, certifier types, and time. Research consistently finds that the true number of suicides is somewhat higher than the recorded count, with some deaths classified as undetermined or as accidents (especially certain drug poisonings and single-vehicle crashes) that are, in reality, suicides. The dataset is best read as a careful, consistent lower-bound measure of deaths determined to be self-inflicted, not as a complete census of suicidal death, and any cross-jurisdiction comparison inherits whatever differences exist in how aggressively intent is established.
Small numbers are unstable, and rates can mislead. When a stratified cell (a narrow age band in a small population, or a specific method in a single year) rests on few deaths, its rate becomes statistically fragile, prone to large swings that reflect chance rather than any change in risk. NCHS flags rates based on small numbers as unreliable and suppresses very small cells to protect confidentiality, and any analysis that drills into fine strata must respect those flags rather than treat a volatile rate as a signal. The corollary is that the dataset is strongest at describing broad trends and large groups and weakest at fine-grained, single-year, single-cell claims.
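How quickly a rate becomes fragile can be made concrete with a rough reliability check. This sketch treats the death count as a Poisson variable, so the relative standard error is 1 divided by the square root of the count. The thresholds (suppress below 10 deaths, flag below 20) are assumptions set as parameters and should be confirmed against the source's own documentation; `rate_with_reliability` is a hypothetical helper.

```python
import math


def rate_with_reliability(deaths, population, unreliable_below=20, suppress_below=10):
    """Return (rate per 100,000, relative standard error, flag) for one cell."""
    if deaths < suppress_below:
        # Too few deaths to publish: report nothing but the flag.
        return None, None, "suppressed"
    rate = deaths / population * 100_000
    # Poisson assumption: relative standard error of the count is 1/sqrt(deaths).
    rse = 1 / math.sqrt(deaths)
    flag = "unreliable" if deaths < unreliable_below else "ok"
    return rate, rse, flag


# Invented cells of very different size.
for deaths, population in ((5, 100_000), (16, 200_000), (400, 1_000_000)):
    print(deaths, population, rate_with_reliability(deaths, population))
```

A cell built on 16 deaths carries a relative standard error of 25 percent, while one built on 400 deaths carries 5 percent, which is why single-cell swings in small strata say little about real change.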
Mortality data lags and is revised, and coding changes introduce comparability breaks. Final mortality figures appear well after the year they describe—death records must be received, processed, coded, and reconciled—so the most recent years in any snapshot may be provisional and subject to upward revision as late records arrive. Over the long series, the periodic revision of the ICD (the move from ICD-9 to ICD-10 in 1999 being the most consequential for the modern data) changed how suicide is coded and can introduce a step in the series that is an artifact of recoding; the icd_basis column and NCHS's published comparability ratios exist precisely to bridge those boundaries, and ignoring them can turn a coding change into a phantom trend.
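Bridging a revision boundary with a comparability ratio is a one-line scaling, sketched below. The ratio of 0.98 and the rates are placeholders chosen for illustration, not published values; a real analysis would take the cause-specific ratio from NCHS's comparability study. The function name is hypothetical.

```python
def bridge_to_new_revision(old_rates, comparability_ratio):
    """Scale rates coded under the older ICD revision onto the newer revision's basis."""
    return {year: rate * comparability_ratio for year, rate in old_rates.items()}


# Placeholder rates coded under the older revision, and a placeholder ratio.
old_basis = {1997: 11.0, 1998: 10.0}
HYPOTHETICAL_RATIO = 0.98  # illustrative only; not the published suicide ratio

bridged = bridge_to_new_revision(old_basis, HYPOTHETICAL_RATIO)
for year, rate in bridged.items():
    print(year, f"{rate:.2f} per 100,000 on the newer coding basis")
```

Applying the ratio to the older years, instead of leaving the step in the series, keeps a pure recoding effect from being read as a change in risk.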
A rate is an outcome, not an explanation. The dataset records that suicides occurred and at what rate; it does not record why. The risk and protective factors, the circumstances, the access to means, the role of mental health and substance use: none of that lives in the mortality file, which captures the fact of death and a coded method, not the chain of causes behind it. Drawing causal conclusions from rate movements alone, such as attributing a rise to a single policy or condition, over-reads what a mortality series can bear; the rate is the thing to be explained, and the explanation has to come from elsewhere.
Held with these caveats, the cdc_suicide table is a uniquely valuable resource: a long, age-adjusted, demographically resolved record of one of the few leading causes of death in America that rose for most of two decades—the federal baseline against which a national prevention effort, from the 988 Lifeline to means-restriction programs, measures whether it is bending the curve back down. If you or someone you know is in crisis, the 988 Suicide and Crisis Lifeline can be reached by calling or texting 988.
Related writing
CDC Injury Mortality: The Federal Record of How Americans Die from Firearms, Overdoses, and Crashes — Suicide is one category of injury death, and reading it against firearm, overdose, and crash mortality from the same NVSS pipeline is what places the suicide trend in its true context, including the entangled coding boundary between intentional self-poisoning and unintentional overdose.
CDC Excess Deaths: The Federal Measure of How Many More Americans Died Than Expected — Built on the same underlying-cause-coded death records, the excess-deaths lens can isolate and baseline the suicide rate, testing whether periods of social and economic disruption moved it above what history would predict.
CMS Opioid Treatment Programs: The Federal Record of Medicare Access to Addiction Care — The parallel rise of suicide and overdose mortality runs through substance-use disorder, and the access-to-treatment record on the addiction-care side is part of the same public-health response whose ultimate outcome measure is the mortality rate.