FMCSA Crash Data: The Federal Database Behind Large Truck and Bus Crashes
The FMCSA crash file records every state-reported crash involving a federally regulated commercial truck or bus (large trucks over 10,000 pounds and motorcoaches). It is the database that turns roughly five thousand large-truck fatalities a year into a structured, carrier-keyed record, one that feeds federal safety scoring, litigation, and a decade of policy debate over how unsafe America's heaviest vehicles really are. We hold a snapshot of this file as the table fmcsa_crashes, covering 258,057 crashes.
What the FMCSA crash file is
The Federal Motor Carrier Safety Administration (FMCSA) is the modal safety agency within the US Department of Transportation responsible for reducing crashes, injuries, and fatalities involving large trucks and buses. Its crash file is drawn from the Motor Carrier Management Information System (MCMIS), the master safety-data repository that aggregates inspections, compliance reviews, enforcement cases, and crashes for every entity that holds a USDOT number. The crash component of MCMIS is what we mirror as fmcsa_crashes: a record of crashes, reported by the states, that involved a commercial motor vehicle within FMCSA's jurisdiction.
FMCSA does not collect crashes itself. The data originates with state and local police who respond to a crash, complete a state crash report, and — when the crash meets the federal reportability criteria and involves a qualifying commercial vehicle — transmit it to FMCSA through a state safety agency. This is a federal-state partnership: the states own the police reports and the reporting pipeline, and FMCSA defines the criteria, ingests the records, and stores them in MCMIS. The consequence, explored at length below, is that the completeness and timeliness of the federal crash file are only as good as each state's reporting, and they vary widely.
The crucial scoping concept is reportability. Not every collision involving a truck enters the file. A crash is reportable to FMCSA only if it involves a qualifying commercial motor vehicle — a truck with a gross vehicle weight rating over 10,000 pounds, or a vehicle displaying a hazardous-materials placard, or a bus designed to carry nine or more people including the driver — and it results in at least one of three outcomes:
- a fatality — any person killed;
- an injury to any person who, as a result of the crash, is transported for immediate medical treatment away from the scene; or
- a tow-away — a vehicle disabled badly enough by the crash that it must be towed from the scene.
If a crash produces none of those three outcomes — a minor fender-bender with no injuries, no fatality, and both vehicles drivable away — it is not reportable, and it does not appear in the file regardless of how dramatic it looked. This threshold is the single most important fact about the dataset's scope: fmcsa_crashes is a census of consequential commercial-vehicle crashes, not of all commercial-vehicle crashes, and the reportability rule is what draws that line.
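The reportability rule reduces to a two-part test: a qualifying vehicle, and at least one qualifying outcome. The sketch below is only an illustration of that logic. The function and parameter names are invented here and are not FMCSA field names, and the seat-count check is applied to any vehicle as a simplification of the bus criterion.

```python
def is_qualifying_vehicle(gvwr_lbs, hazmat_placard, seats_incl_driver):
    """Truck over 10,000 lb GVWR, a placarded hazmat vehicle, or a 9+ seat bus."""
    return gvwr_lbs > 10_000 or hazmat_placard or seats_incl_driver >= 9


def is_reportable(gvwr_lbs, hazmat_placard, seats_incl_driver,
                  fatalities, transported_injuries, vehicles_towed):
    """True only when a qualifying vehicle AND a qualifying outcome are both present."""
    qualifying_outcome = (fatalities > 0
                          or transported_injuries > 0
                          or vehicles_towed > 0)
    return (is_qualifying_vehicle(gvwr_lbs, hazmat_placard, seats_incl_driver)
            and qualifying_outcome)


# A heavy tractor-trailer in a fender-bender: no death, no transported injury, no tow.
print(is_reportable(80_000, False, 2,
                    fatalities=0, transported_injuries=0, vehicles_towed=0))

# The same truck in a crash where one vehicle has to be towed from the scene.
print(is_reportable(80_000, False, 2,
                    fatalities=0, transported_injuries=0, vehicles_towed=1))
```

Note that vehicle size alone never makes a crash reportable; the outcome test has to pass as well.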
The schema concept
Each row in the crash file is one reportable crash, and the columns describe the crash along a small number of analytically important axes. The exact field names differ between the various FMCSA download formats, but the conceptual schema is stable, and it is worth walking through because the column set is what determines which questions the file can and cannot answer.
Identity and provenance. Every crash carries a report number (a state-assigned identifier for the underlying police report) and a report state — the state whose agency reported the crash, which is the single most important provenance field because reporting completeness is a state-level phenomenon. There is a unique crash sequence identifier within MCMIS so that a crash can be referenced and de-duplicated.
When and where. The crash date and time place the event in the calendar and clock, enabling analysis of seasonality, day-of-week, and the well-known time-of-day risk patterns associated with driver fatigue. Location fields — county, route or highway, and (in the fuller extracts) latitude and longitude — place the crash geographically, though location precision varies with the quality of the originating police report.
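As a rough sketch of how the date and time fields support that kind of analysis, the snippet below derives hour-of-day and day-of-week from a few made-up rows. The column names CRASH_DATE and CRASH_TIME, the HHMM time format, and the midnight-to-6-a.m. "overnight" band are all assumptions for illustration; actual extracts may name and format these fields differently.

```python
import pandas as pd

# Illustrative rows only; column names and formats are assumed, not FMCSA's.
crashes = pd.DataFrame({
    "CRASH_DATE": ["2023-01-07", "2023-01-09", "2023-01-09", "bad-date"],
    "CRASH_TIME": ["0315", "1440", "2305", "0900"],
})

# Combine date and HHMM time; unparseable values become NaT instead of raising.
stamp = pd.to_datetime(
    crashes["CRASH_DATE"] + " " + crashes["CRASH_TIME"],
    format="%Y-%m-%d %H%M",
    errors="coerce",
)
crashes["HOUR"] = stamp.dt.hour
crashes["DAY_OF_WEEK"] = stamp.dt.day_name()

# Flag an overnight band (00:00 to 05:59 here) for fatigue-oriented breakdowns.
crashes["OVERNIGHT"] = crashes["HOUR"].between(0, 5)

print(crashes[["HOUR", "DAY_OF_WEEK", "OVERNIGHT"]])
```

Rows whose timestamp cannot be parsed keep a missing hour and are not flagged as overnight, so they drop out of time-of-day tallies rather than distorting them.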
The carrier. The defining feature of the FMCSA crash file, and what distinguishes it from a generic crash database, is that each crash is tied to the responsible motor carrier by USDOT number. That single field is the join key to the motor carrier census (the table we hold as fmcsa_carriers), and it is what makes carrier-level analysis possible at all. Through the USDOT number a crash inherits the carrier's legal name, fleet size, operation type, commodity, and entire safety history. Without it, the crash file would be a pile of incidents; with it, the file becomes the outcome variable in a carrier-level safety model.
The vehicle. The record captures the vehicle configuration — whether the commercial unit was a straight truck, a truck-tractor pulling one or more trailers, a bus, or another configuration — along with the gross vehicle weight rating category that establishes that the vehicle met the reportability weight threshold. Configuration matters because the crash dynamics of a single straight truck differ sharply from those of a multi-trailer combination.
Severity and consequences. Three fields encode the outcome that made the crash reportable in the first place: the count of fatalities, the count of injuries, and a tow-away flag. These determine the severity tier — fatal crashes (one or more deaths), injury crashes (no death but at least one transported injury), and tow-away-only crashes (property damage severe enough to require a tow, with no death or transported injury). A separate hazmat release flag records whether hazardous materials were released as a result of the crash, the field that isolates the small but high-consequence population of hazmat-involved crashes.
Environmental conditions. The crash file records the light condition (daylight, dark, dawn or dusk), the weather (clear, rain, snow, fog), and the road-surface condition (dry, wet, icy) at the time of the crash. These environmental covariates support analysis of how much of large-truck crash risk is concentrated in adverse conditions versus ordinary clear-and-dry driving — a question with direct relevance to speed-limiter, following-distance, and automatic-emergency-braking policy.
What the schema conspicuously lacks is as important as what it contains. The file records that a crash happened and what its consequences were, but it does not assign fault, and — in the underlying MCMIS data — it does not record crash causation or preventability. That omission is foundational to how the data may legitimately be used, and it is the subject of the Crash Preventability Determination Program below and of the caveats at the end.
CSA, the Crash Indicator BASIC, and preventability
The crash file does not sit inert in a warehouse; it is an active input to FMCSA's flagship safety program, CSA — Compliance, Safety, Accountability — launched in 2010 to identify high-risk carriers from the continuous stream of roadside and crash data rather than waiting for an infrequent on-site review. CSA's analytical engine, the Safety Measurement System (SMS), scores carriers across seven behavior categories called BASICs (Behavior Analysis and Safety Improvement Categories). Six of the seven are built from roadside-inspection violations; the seventh is built directly from this crash file.
That seventh category is the Crash Indicator BASIC. It measures a carrier's history and pattern of crash involvement: the count of reportable crashes, weighted by severity (fatal and injury crashes weigh more heavily than tow-aways) and by recency, then normalized against the carrier's exposure — its number of power units and its vehicle-miles or inspection volume — and converted into a percentile rank relative to a peer group of carriers with a similar crash count. A carrier whose Crash Indicator percentile climbs above an intervention threshold becomes a target for FMCSA attention, escalating from warning letters to focused inspections to full investigations.
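The shape of that calculation can be sketched in a few lines. This is a simplified illustration, not FMCSA's implementation: the severity and recency weights are placeholders, exposure is reduced to a bare power-unit count, and the percentile is taken across all carriers rather than within crash-count peer groups. The data and column names are invented.

```python
import pandas as pd

# Placeholder weights for illustration; the SMS methodology defines the real ones.
SEVERITY_WEIGHT = {"tow": 1, "injury": 2, "fatal": 2}


def time_weight(months_ago):
    """More recent crashes count more; crashes outside 24 months count zero."""
    if months_ago < 6:
        return 3
    if months_ago < 12:
        return 2
    if months_ago < 24:
        return 1
    return 0


crashes = pd.DataFrame({
    "DOT_NUMBER": ["A", "A", "B", "C", "C", "C"],
    "SEVERITY":   ["fatal", "tow", "tow", "injury", "tow", "tow"],
    "MONTHS_AGO": [3, 20, 30, 8, 2, 14],
})
power_units = pd.Series({"A": 10, "B": 10, "C": 40}, name="PU")

# Each crash contributes severity weight x recency weight.
crashes["WEIGHT"] = (crashes["SEVERITY"].map(SEVERITY_WEIGHT)
                     * crashes["MONTHS_AGO"].map(time_weight))

# Sum per carrier, normalize by exposure, then rank as a percentile.
measure = crashes.groupby("DOT_NUMBER")["WEIGHT"].sum() / power_units
percentile = measure.rank(pct=True) * 100

print(pd.DataFrame({"measure": measure, "percentile": percentile}))
```

Carrier C has the most crashes but the largest fleet, so its measure lands below carrier A's; that reordering is the point of normalizing by exposure.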
The Crash Indicator BASIC has always been treated differently from the other six. Historically it was not displayed publicly in the same way as the inspection BASICs, and the reason goes to the heart of the dataset: the raw MCMIS crash record does not say whether the truck driver was at fault. A carrier whose truck is struck from behind at a red light by a distracted passenger-car driver accumulates a reportable crash exactly as a carrier whose fatigued driver runs a stop sign does. To penalize a carrier's public safety score on the basis of crashes it could not have prevented struck many in the industry as fundamentally unfair, and it became one of the most contested features of the entire CSA program.
FMCSA's answer was the Crash Preventability Determination Program (CPDP). Under the CPDP, a carrier may submit a Request for Data Review and contest a specific crash, supplying police reports, photographs, video, and other documentation, and FMCSA adjudicates whether the crash was not preventable by the carrier. The program began as a demonstration in 2017, covering a defined set of crash types — for example, a commercial vehicle struck in the rear, struck while legally stopped or parked, struck by a vehicle that crossed the centerline, struck by a wrong-way driver, or involved in a crash caused by an impaired or suicidal individual — and was made permanent and expanded in 2020 to cover additional eligible crash categories. A crash found not preventable is flagged as such and removed from the carrier's Crash Indicator calculation, so that it no longer counts against the carrier's safety score. The CPDP is, in effect, a structured way of grafting a preventability judgment onto a dataset that was never designed to carry one — and it is the reason any serious carrier-level crash analysis must distinguish total reportable crashes from crashes that survive a preventability review.
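In analysis terms, that distinction amounts to keeping two counts per carrier. The sketch below assumes a hypothetical boolean column, NOT_PREVENTABLE, standing in for a CPDP determination; how such a flag is actually exposed depends on the extract in use.

```python
import pandas as pd

# NOT_PREVENTABLE is a hypothetical flag standing in for a CPDP determination.
crashes = pd.DataFrame({
    "DOT_NUMBER":      ["111", "111", "111", "222"],
    "NOT_PREVENTABLE": [True, False, False, False],
})

# Every reportable crash, regardless of any review outcome.
total = crashes.groupby("DOT_NUMBER").size().rename("total_reportable")

# Only crashes that were not cleared by a preventability review.
counted = (crashes[~crashes["NOT_PREVENTABLE"]]
           .groupby("DOT_NUMBER")
           .size()
           .rename("counted_after_review"))

summary = pd.concat([total, counted], axis=1).fillna(0).astype(int)
print(summary)
```

Reporting both columns side by side makes it explicit which figure a downstream rate or ranking is built on.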
Trends: large-truck crash fatalities
The headline number the crash file ultimately rolls up to is the annual count of people killed in crashes involving large trucks, and it has been the focus of national safety attention for two decades. Fatalities in crashes involving large trucks run on the order of roughly five thousand deaths a year in the United States — a figure that includes truck occupants, but is dominated by the occupants of the other vehicles and by pedestrians and cyclists, because in a collision between a loaded combination vehicle weighing up to 80,000 pounds and a passenger car, the physics are merciless and one-sided.
The trajectory is the part that has driven policy. Large-truck-involved fatalities fell during the 2008–2009 recession as freight volumes collapsed and trucks logged fewer miles, then entered a sustained post-2009 rise as the economy and freight activity recovered and grew. Over the following decade the count climbed substantially from its recession trough, reaching the highest levels in a generation. Because fatality counts track exposure — total truck vehicle-miles traveled — some of the rise simply reflects more trucks driving more miles. But the fatality rate per mile, after a long historical decline, stopped improving and in periods even worsened, which is what elevated truck safety from a steady-state concern to an active policy crisis and prompted renewed regulatory pushes on speed limiters, automatic emergency braking, and underride protection.
Several recurring mechanisms sit behind the aggregate. Underride — a smaller vehicle sliding beneath a truck's trailer in a rear or side impact, so that the passenger compartment strikes the trailer floor rather than its bumper — produces a grossly disproportionate share of fatalities relative to the number of crashes it represents, which is why rear underride guards are federally mandated and side underride guards have been the subject of sustained rulemaking pressure. Driver fatigue remains a central causal theme: the entire federal hours-of-service regime, and the electronic-logging-device mandate that enforces it, exists to limit how long a commercial driver may operate before resting, because a fatigued driver of an 80,000-pound vehicle is one of the most dangerous things on a highway. And the ordinary kinematic factors — speed and following distance — loom large precisely because a fully loaded truck needs far greater stopping distance than a car; a truck traveling too fast for conditions or following too closely cannot physically stop in time, which is the rationale behind speed-limiter proposals and forward-collision-warning and automatic-emergency-braking requirements.
Data quality: MCMIS, FARS, and the completeness debate
No honest treatment of the FMCSA crash file can skip its data-quality story, because the completeness of the file has been the subject of formal study and persistent criticism for as long as the file has existed. The central issue is that state reporting completeness varies. Because the crashes flow from the states, a crash is only in MCMIS if the responding officer correctly identified the vehicle as a reportable commercial motor vehicle, the state captured the necessary federal data elements, and the state transmitted the record to FMCSA in a timely and complete form. States differ markedly in how well they do each of those steps. Some report nearly every qualifying crash; others miss a substantial fraction, especially tow-away and non-fatal injury crashes, which are easier to overlook than fatalities.
FMCSA monitors this through a formal data-quality program and publishes data-quality maps (through the SAFER and A&I web systems) that grade each state on the completeness, timeliness, accuracy, and consistency of its crash and inspection reporting. Those grades are essential context for anyone ranking states or comparing carriers domiciled in different states: a state that reports diligently will, all else equal, show more crashes than a state that under-reports, and that difference is an artifact of reporting behavior, not of road safety. A naive ranking of states by reported crash count measures reporting volume, not danger.
The completeness problem is thrown into sharp relief by comparing MCMIS with the other major federal source of truck-crash data: FARS, the Fatality Analysis Reporting System, maintained by the National Highway Traffic Safety Administration (NHTSA). The two systems have fundamentally different scopes and are not interchangeable:
- FARS is a fatal-crash census. It records every crash on a US public road that results in a death within thirty days, across all vehicle types, with deep detail on each fatality. For large trucks it captures fatal crashes essentially completely, because deaths are hard to miss — but it captures only fatal crashes, nothing about injury or tow-away events.
- MCMIS records all reportable crashes — fatal, injury, and tow-away — but only for FMCSA-regulated commercial vehicles, and only as completely as the states report. So MCMIS is far broader in severity coverage (it sees the non-fatal crashes FARS never touches) but is acknowledged to be incomplete, whereas FARS is narrower in severity (fatal only) but close to complete within that narrow band.
The practical upshot is that for counting truck-crash fatalities, FARS is the authoritative source and FMCSA's own published trend tables lean on it; for analyzing the full severity mix, carrier-level patterns, and the non-fatal majority of crashes, MCMIS is the only option — with its completeness caveats firmly in mind. The long-running crash-data-completeness debate — documented in reports from the DOT Inspector General and the Government Accountability Office and in FMCSA's own data-quality reviews — is fundamentally about closing the gap between the crashes MCMIS captures and the crashes that actually occur, and it is the reason any absolute count drawn from this file should be read as a lower bound.
What you can do with the crash file
Despite the caveats — or really because of how analysts learn to work around them — the crash file supports a rich set of analyses, and the value multiplies once it is joined to the carrier census on USDOT number.
Carrier-level crash rates. The flagship analysis is computing a crash rate normalized by fleet size — crashes per power unit, or per 100 power units — for each carrier, by counting that carrier's crashes in the file and dividing by its reported power-unit count from the census. This is the workhorse metric for comparing carriers of different sizes: a large fleet will naturally have more crashes than an owner-operator, so only the rate is comparable. The same idea underlies the Crash Indicator BASIC's normalization, and it is exactly what the worked example below computes.
State reporting completeness. By ranking states on reported crash counts and reading those counts against FMCSA's published data-quality grades — or against an external benchmark such as FARS fatal counts for the same state — an analyst can characterize which states report well and which under-report, which is a prerequisite for any cross-state comparison.
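One way to sketch the FARS benchmark is a per-state ratio of fatal crashes seen in each source. The state codes and counts below are invented purely to show the mechanics. Because the two systems define vehicles and crashes somewhat differently, the ratio should be read as a screening signal rather than an exact completeness rate.

```python
import pandas as pd

# Hypothetical fatal large-truck crash counts per state from each source.
mcmis_fatal = pd.Series({"AA": 95, "BB": 40, "CC": 120}, name="mcmis")
fars_fatal = pd.Series({"AA": 100, "BB": 80, "CC": 118}, name="fars")

bench = pd.concat([mcmis_fatal, fars_fatal], axis=1)
bench["ratio"] = bench["mcmis"] / bench["fars"]

# Low ratios suggest fatal crashes that reached FARS but not MCMIS.
print(bench.sort_values("ratio"))
```

A state sitting well below the others on this ratio is a candidate for under-reporting, and its non-fatal counts deserve extra caution.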
Severity mix. Decomposing the file into fatal, injury, and tow-away crashes reveals the severity distribution of commercial-vehicle crashes and how it shifts across vehicle configurations, conditions, or carrier types — for instance, whether multi-trailer combinations skew toward more severe outcomes.
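A cross-tabulation normalized by row is a compact way to express that comparison. The sketch below uses made-up rows and assumes a derived SEVERITY column and a VEHICLE_CONFIG column; neither name is taken from an FMCSA file layout.

```python
import pandas as pd

# Illustrative rows; both column names are assumptions.
crashes = pd.DataFrame({
    "VEHICLE_CONFIG": ["straight", "straight", "tractor-semi",
                       "tractor-semi", "tractor-semi", "bus"],
    "SEVERITY":       ["tow", "injury", "fatal", "tow", "tow", "injury"],
})

# normalize="index" turns counts into shares within each configuration.
mix = pd.crosstab(crashes["VEHICLE_CONFIG"], crashes["SEVERITY"],
                  normalize="index")
print(mix.round(2))
```

Each row sums to one, so configurations with very different crash volumes can be compared on severity share alone.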
Hazmat-involved crashes. Filtering on the hazmat-release flag isolates the small, high-consequence population of crashes in which hazardous materials were released, a distinct safety and emergency-response concern that draws on FMCSA's joint jurisdiction with the Pipeline and Hazardous Materials Safety Administration.
Fleet-size normalization through the census join. The single most powerful move is joining the crash file to fmcsa_carriers on USDOT number, which attaches each crash to the responsible carrier's fleet size, operation type, commodity, domicile state, and safety history. That join converts the crash file from a list of incidents into the outcome layer of a carrier-level dataset, enabling questions like whether new entrants, owner-operators, hazmat carriers, or particular fleet-size bands crash at elevated rates — the census supplies the denominator and the covariates, and the crash file supplies the outcome.
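A sketch of the fleet-size-band version of that question follows, using invented carriers and arbitrary band edges. The one design choice worth noting is that the merge starts from the census rather than from the crash counts, so carriers with no crashes remain in the denominator.

```python
import pandas as pd

# Hypothetical per-carrier crash counts and census fleet sizes.
per_carrier = pd.DataFrame({"DOT_NUMBER": ["1", "2", "3", "4"],
                            "crashes": [1, 3, 12, 40]})
census = pd.DataFrame({"DOT_NUMBER": ["1", "2", "3", "4", "5"],
                       "PU": [1, 15, 150, 800, 6]})

# Start from the census so zero-crash carriers still contribute power units.
df = census.merge(per_carrier, on="DOT_NUMBER", how="left").fillna({"crashes": 0})

# Band edges are arbitrary illustrations, not an FMCSA classification.
df["BAND"] = pd.cut(df["PU"], bins=[0, 5, 50, 500, float("inf")],
                    labels=["1-5", "6-50", "51-500", "501+"])

by_band = df.groupby("BAND", observed=True)[["crashes", "PU"]].sum()
by_band["per_100_pu"] = 100 * by_band["crashes"] / by_band["PU"]
print(by_band)
```

Pooling crashes and power units within a band before dividing avoids the instability of averaging rates from very small fleets.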
A worked example in Python
There are two practical paths into FMCSA crash data. For live, per-carrier summaries, FMCSA's QCMobile API (the same documented REST service that serves the carrier census) returns a carrier's crash totals, with the fatal, injury, and tow-away breakdown, summarized over the prior twenty-four months, given a USDOT number and a free web key. For population-level work, FMCSA publishes the SMS download files on the downloads page of its A&I site: a bulk extract of every reportable crash inside the current SMS retention window, one row per crash, keyed to USDOT number and therefore directly joinable to a census snapshot.
The workflow below does both. It pulls one carrier's crash summary from the QCMobile API, then loads the bulk crash file and the carrier census, computes the fatal/injury/tow-away severity mix across the whole file, counts crashes per carrier, joins to the census on USDOT number to attach fleet size, and ranks the largest fleets by crashes per 100 power units — the fleet-size-normalized rate that makes carriers of different sizes comparable.
import pandas as pd
import requests
# ---------------------------------------------------------------------------
# FMCSA Crash Data Analysis
#
# Two complementary data sources:
#
# 1. QCMobile API (live, per-carrier crash summaries)
#      https://mobile.fmcsa.dot.gov/qc/services/carriers/<dot>?webKey=KEY
#      Requires a free webKey from https://mobile.fmcsa.dot.gov/QCDevsite/
#      The /carriers/<dot> response carries a crashTotal and the
#      fatal/injury/tow counts summarized over the prior 24 months.
#
# 2. SAFER / SMS bulk extract (full-population crash file)
#      https://ai.fmcsa.dot.gov/SMS/Tools/Downloads.aspx
#      A pipe/CSV snapshot of every reportable crash in the SMS window,
#      one row per crash, keyed to the carrier's USDOT number.
#
# The bulk crash file holds only crashes inside the rolling SMS retention
# window (about two years), so it is NOT a full historical census; for
# long-run trends use the published FMCSA "Large Truck and Bus Crash
# Facts" tables instead.
# ---------------------------------------------------------------------------
WEB_KEY = "YOUR_WEBKEY_HERE"
BASE = "https://mobile.fmcsa.dot.gov/qc/services"


def get_carrier(dot_number):
    url = f"{BASE}/carriers/{dot_number}"
    r = requests.get(url, params={"webKey": WEB_KEY}, timeout=20)
    r.raise_for_status()
    return r.json().get("content", {}).get("carrier", {})


# Single-carrier crash summary by USDOT number
c = get_carrier(76830) # example DOT number
print(f"Legal name: {c.get('legalName')}")
print(f"Power units: {c.get('totalPowerUnits')}")
print(f"Crashes (24m): {c.get('crashTotal')}")
print(f" fatal: {c.get('fatalCrash')}")
print(f" injury: {c.get('injCrash')}")
print(f" tow-away: {c.get('towawayCrash')}")
# ---------------------------------------------------------------------------
# Population-level analysis: join the bulk crash file to the carrier census
# and compute crashes per 100 power units for the largest carriers.
#
# crashes.csv : one row per reportable crash (column REPORT_SEQ_NO etc.),
#               with DOT_NUMBER, REPORT_STATE, REPORT_DATE, FATALITIES,
#               INJURIES, TOW_AWAY, HAZMAT_RELEASED.
# census.csv  : the motor carrier census, one row per USDOT number,
#               with DOT_NUMBER and NBR_POWER_UNIT.
# ---------------------------------------------------------------------------
crashes = pd.read_csv("crashes.csv", dtype=str, low_memory=False)
census = pd.read_csv("census.csv", dtype=str, low_memory=False)
crashes.columns = [col.strip().upper() for col in crashes.columns]
census.columns = [col.strip().upper() for col in census.columns]
# Normalize the crash severity flags
crashes["FATALITIES"] = pd.to_numeric(crashes["FATALITIES"], errors="coerce").fillna(0)
crashes["INJURIES"] = pd.to_numeric(crashes["INJURIES"], errors="coerce").fillna(0)
crashes["IS_FATAL"] = crashes["FATALITIES"] > 0
crashes["IS_INJURY"] = (~crashes["IS_FATAL"]) & (crashes["INJURIES"] > 0)
crashes["IS_TOW"] = (~crashes["IS_FATAL"]) & (~crashes["IS_INJURY"])
# Severity mix across the whole file (fatal / injury / tow-away)
n = len(crashes)
print(f"\nReportable crashes in window: {n:,}")
print(f" fatal: {crashes['IS_FATAL'].sum():>8,} "
      f"({100 * crashes['IS_FATAL'].mean():4.1f}%)")
print(f" injury: {crashes['IS_INJURY'].sum():>8,} "
      f"({100 * crashes['IS_INJURY'].mean():4.1f}%)")
print(f" tow-away: {crashes['IS_TOW'].sum():>8,} "
      f"({100 * crashes['IS_TOW'].mean():4.1f}%)")

# Count crashes per carrier
per_carrier = (crashes.groupby("DOT_NUMBER")
                      .size()
                      .rename("crashes")
                      .reset_index())
# Attach fleet size from the census
census["PU"] = pd.to_numeric(census["NBR_POWER_UNIT"], errors="coerce")
fleet = census[["DOT_NUMBER", "PU"]]
merged = per_carrier.merge(fleet, on="DOT_NUMBER", how="left")
# Crashes per 100 power units, restricted to fleets large enough that the
# rate is not dominated by a single event (>= 100 power units here).
big = merged[merged["PU"] >= 100].copy()
big["RATE"] = 100 * big["crashes"] / big["PU"]
top = big.sort_values("RATE", ascending=False).head(15)
print("\nHighest crash rate among 100+ power-unit fleets:")
print(f"{'DOT':>9} {'PU':>7} {'crashes':>8} {'per 100 PU':>10}")
for _, row in top.iterrows():
    print(f"{row['DOT_NUMBER']:>9} {int(row['PU']):>7,} "
          f"{int(row['crashes']):>8,} {row['RATE']:>10.2f}")

# Rank states by raw reportable-crash count (a reporting-volume view,
# NOT a safety ranking -- see the data-quality caveats below).
by_state = crashes["REPORT_STATE"].value_counts().head(10)
print("\nReportable crashes by reporting state (top 10):")
for st, cnt in by_state.items():
    print(f" {st:<4} {cnt:>8,}")
A few implementation notes. The QCMobile API nests the carrier object under content.carrier in the JSON response, and the crash summary fields it returns (crashTotal and the fatal, injury, and tow-away counts) are already summarized over the rolling twenty-four-month window FMCSA uses for SMS, so they will not match a multi-year historical total. On the bulk side, reading every column as a string with dtype=str and only then coercing the numeric fatality and injury counts with pd.to_numeric(..., errors='coerce') is the robust pattern, because the crash files encode missing counts as blank strings that would otherwise force pandas into mixed-type columns. The severity tiers are derived rather than stored: a crash is treated as fatal if it has any fatalities, as an injury crash if it has injuries but no fatality, and as tow-away otherwise, a strict precedence so that each crash lands in exactly one tier. The crashes-per-100-power-units rate is restricted to fleets of at least 100 power units in the example, because for a one- or two-truck carrier a single crash produces a wild rate that tells you nothing; the right floor depends on the question, but normalizing tiny fleets without a minimum-exposure filter is a classic way to produce a meaningless leaderboard. And the state ranking is deliberately labeled a reporting-volume view, not a safety ranking, for the data-quality reasons spelled out above.
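The string-first read pattern is easy to see on a tiny in-memory file. The three rows below are invented to exercise a blank count and a non-numeric count; the column names follow the worked example.

```python
import io

import pandas as pd

# A made-up extract with a blank count and a non-numeric count.
raw = io.StringIO(
    "DOT_NUMBER,FATALITIES,INJURIES\n"
    "0012345,1,\n"
    "67890,,2\n"
    "24680,unknown,0\n"
)

# Read everything as text so identifiers are never reinterpreted as numbers.
df = pd.read_csv(raw, dtype=str)

# Coerce only the count columns; blanks and junk become NaN, then 0.
for col in ("FATALITIES", "INJURIES"):
    df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0).astype(int)

print(df)
print(df.dtypes)
```

The identifier column stays exactly as written in the file, which keeps it safe to use as a join key, while the counts end up as clean integers.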
Caveats and limitations
The crash file is indispensable, but it carries several structural limitations that any serious analysis must account for.
State under-reporting. As stressed throughout, the file is only as complete as state reporting, and that completeness varies by state and skews toward missing non-fatal crashes. Absolute crash counts from MCMIS are lower bounds, cross-state comparisons are confounded by reporting differences, and FMCSA's own data-quality grades should be consulted before any state-level claim. For fatal counts specifically, prefer FARS.
No fault or causation field. The record says a crash occurred and what its consequences were, but the underlying MCMIS data assigns no fault and no cause. A carrier accumulates a reportable crash whether its driver caused the collision or was a stationary victim of someone else's. This is precisely why the Crash Preventability Determination Program exists, and why raw crash counts — including a carrier's Crash Indicator inputs — should never be read as a measure of fault. Any causal interpretation requires the police narrative or a preventability adjudication that the file itself does not contain.
Exposure is required to normalize. A raw crash count is almost never the right unit of comparison. A carrier with more trucks, or one that drives more miles, will have more crashes for reasons unrelated to safety. Meaningful comparison demands an exposure denominator — power units (available in the census), or better, vehicle-miles traveled, which the census does not contain and which generally must be sourced separately. Crashes per power unit, used in the worked example, is a serviceable proxy, but it is a proxy: it assumes utilization is similar across carriers, which it is not.
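A small worked comparison shows why the proxy can mislead. The two fleets below are hypothetical and identical in size and crash count; only their annual mileage differs.

```python
def per_100_power_units(crashes, power_units):
    """Crash rate normalized by fleet size."""
    return 100 * crashes / power_units


def per_million_miles(crashes, vehicle_miles):
    """Crash rate normalized by distance driven."""
    return 1_000_000 * crashes / vehicle_miles


# Two hypothetical 50-truck fleets with the same crash count, different utilization.
local = {"crashes": 4, "power_units": 50, "miles": 1_000_000}
long_haul = {"crashes": 4, "power_units": 50, "miles": 5_000_000}

for name, fleet in (("local", local), ("long_haul", long_haul)):
    pu_rate = per_100_power_units(fleet["crashes"], fleet["power_units"])
    mile_rate = per_million_miles(fleet["crashes"], fleet["miles"])
    print(f"{name:<10} per 100 PU: {pu_rate:.1f}   per million miles: {mile_rate:.1f}")
```

The per-power-unit rates are identical, yet the per-mile rates differ by a factor of five, which is the utilization effect the power-unit proxy cannot see.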
The two-year SMS retention window. The crash data that feeds the Safety Measurement System — and the bulk SMS download files — covers only a rolling window of roughly the most recent two years. The QCMobile crash totals reflect the same twenty-four-month horizon. This means the operational crash file is not a complete historical archive: a crash older than the window has aged out of the SMS files even though it happened. For long-run trend analysis spanning many years, the published FMCSA Large Truck and Bus Crash Facts tables (which draw on FARS for fatalities and on the full crash data) are the appropriate source rather than the rolling SMS extract.
Carrier attribution can be imperfect. The join key, the USDOT number, is recorded by the reporting officer or state from whatever identifying information was available at the scene, and it can be missing, wrong, or attributed to the wrong carrier — particularly for leased equipment, owner-operators working under another carrier's authority, or rental units. Crashes with a missing or erroneous USDOT number either drop out of carrier-level analysis or, worse, attach to the wrong carrier, so the census join is robust in aggregate but imperfect for any single carrier's exact count.
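Part of this problem can be audited before any rates are computed. The sketch below, on invented rows, uses the merge indicator in pandas to separate crashes that match a census carrier from those with a missing USDOT number or one that is absent from the census. It cannot detect a valid number that belongs to the wrong carrier.

```python
import pandas as pd

# Invented crash rows: a missing key, an unknown key, and a key with stray spaces.
crashes = pd.DataFrame({"DOT_NUMBER": ["100", "100", None, "999", " 200 "]})
census = pd.DataFrame({"DOT_NUMBER": ["100", "200"], "PU": [12, 3]})

# Trim whitespace so formatting noise does not masquerade as a mismatch.
crashes["DOT_NUMBER"] = crashes["DOT_NUMBER"].str.strip()

# indicator=True adds a _merge column recording where each row found a match.
audit = crashes.merge(census, on="DOT_NUMBER", how="left", indicator=True)

audit["STATUS"] = "matched"
audit.loc[audit["_merge"] == "left_only", "STATUS"] = "not in census"
audit.loc[audit["DOT_NUMBER"].isna(), "STATUS"] = "missing DOT number"

print(audit["STATUS"].value_counts())
```

Reporting the unmatched share alongside any carrier-level result gives readers a sense of how much of the crash file the join actually covered.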
Related writing
FMCSA motor carrier census: the federal database behind 2 million registered trucking companies — the registration layer this crash file joins to on USDOT number, where each crash inherits a carrier's fleet size, operation type, and safety history; read together, the census is the denominator and the crash file is the outcome.
NHTSA vehicle safety complaints: the database behind auto defect investigations and recalls — the passenger-vehicle counterpart from FMCSA's sister highway-safety agency, the same NHTSA that runs FARS, where consumer complaints rather than reportable crashes drive defect investigations and recalls.
NTSB aviation accidents: the federal database behind every investigated air crash — the aviation analogue, an independent-board accident database built, unlike MCMIS, around explicit probable-cause determinations, illuminating by contrast just how deliberately the FMCSA crash file omits fault.