Several times a year, a federal inspector walks into every working mine in the United States—down the shaft of an underground coal mine, across the benches of a surface quarry, through the mill of a metal operation—and writes down, standard by standard, what is out of compliance. Each of those findings becomes a citation or an order, and the Mine Safety and Health Administration records it: roughly 3.07 million violations, one row per citation, each keyed to a mine ID and an operator, scored for how likely it was to hurt someone and how culpable the operator was, and tagged with the penalty the government proposed and the amount it ultimately collected. It is the most complete enforcement record of any industrial-safety regime in the country.
This article covers what the violations dataset is and how the 1977 Mine Act frames it; the mandatory-inspection regime that makes mining unique among American workplaces; the anatomy of a violation record—the 30 CFR standard cited, the gravity and negligence assessments, and the all-important Significant and Substantial flag; the enforcement escalation ladder, from an ordinary citation through unwarrantable-failure findings to withdrawal orders and the pattern-of-violations process that the Upper Big Branch disaster forced into the open; how penalties are proposed, specially assessed, contested, and settled before the Review Commission; how the violations table joins to the MSHA mines inventory and the accidents data through the mine ID; a Python workflow that pulls the MSHA open-data violations file and computes the Significant and Substantial rate by operator and a ranking of operators by assessed penalty; and the caveats—contested-versus-final amounts, operator-name churn, inspection-intensity confounds, and reporting lag—that every analyst must internalize before drawing conclusions.
What the dataset is
The MSHA violations dataset is the agency's record of enforcement actions written against mine operators and the independent contractors who work at mines. Every time an inspector observes a condition or practice that violates one of the mandatory safety and health standards, the inspector documents it as a citation or, for more serious situations, an order. The dataset is the violations record itself: one row per citation or order, on the order of 3.07 million rows accumulated over decades of inspection. Within that total, a large and consequential subset of roughly 812,000 records (about 26 percent) carries the highest-stakes designation, Significant and Substantial.
In our database this record is stored as the table msha_violations, with the grain of one row per citation or order: a single mine inspected repeatedly over many years contributes one row for each separate violation ever written against it. The columns capture who was cited, at which mine, under which standard, how dangerous and how culpable the condition was, and what it cost:
mine_id -- 7-digit MSHA mine identification number
operator_name -- the operator (or contractor) cited
contractor_id -- contractor ID where a contractor, not the operator, is cited
violation_no -- the citation or order number (unique with mine_id)
issue_dt -- date the citation or order was issued
section_of_act -- enforcement authority: 104(a) citation, 104(d) order, 107(a), etc.
cfr_standard -- the 30 CFR mandatory standard cited
sig_sub -- Significant and Substantial flag (Y / N)
negligence -- none / low / moderate / high / reckless disregard
likelihood -- gravity: how likely an injury/illness was
injury_illness -- expected severity if the event occurred
no_affected -- number of persons potentially affected
proposed_penalty -- the dollar penalty MSHA proposed
amount_assessed -- the penalty after contest / settlement (final)
viol_status -- open, closed, contested, delinquent
The mine_id is the load-bearing column. The MSHA mine identification number is a persistent seven-digit identifier assigned to every mine the agency regulates, and it never changes even when the mine is sold and the operator name on its citations changes, a property that turns out to matter enormously for accountability analysis. Together with the violation_no, the mine ID uniquely identifies each citation, and it is the key that ties a violation to the same mine's inventory record and to any accidents that occurred there. The section_of_act field records which enforcement authority the inspector invoked (the difference between an ordinary citation and a withdrawal order), and the cfr_standard field records the specific mandatory standard in Title 30 of the Code of Federal Regulations that was violated. But the analytic heart of the record is the cluster of assessment fields: the Significant and Substantial flag, the negligence finding, and the gravity assessment (likelihood, expected severity, and persons affected) together describe how dangerous the violation was and how blameworthy the operator was, and they are what drive the penalty and, for analysis, what separate a paperwork lapse from a condition that could kill.
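The key structure described above can be checked before any analysis. The following is a minimal sketch on made-up rows, assuming the msha_violations column names from the listing; the mine IDs and violation numbers are invented for illustration, and the leading-zero problem it guards against only arises if the IDs were ever parsed as numbers.

```python
import pandas as pd

# Hypothetical sample rows; IDs and citation numbers are invented.
viol = pd.DataFrame({
    "mine_id": ["1234567", "1234567", "100003", "1234567"],
    "violation_no": ["8000001", "8000002", "8000001", "8000002"],
    "sig_sub": ["Y", "N", "N", "N"],
})

# Mine IDs are seven digits. A numeric round-trip can drop a leading zero,
# so keep them as strings and left-pad before any join.
viol["mine_id"] = viol["mine_id"].astype(str).str.zfill(7)

# The text treats (mine_id, violation_no) as the unique key; flag any repeats.
key = ["mine_id", "violation_no"]
dupes = viol[viol.duplicated(subset=key, keep=False)]
print(f"{len(dupes)} rows share a (mine_id, violation_no) key")
```

Duplicate keys in a real extract would usually point to a loading problem (for example, the same file appended twice) rather than to two genuine citations, so it is worth resolving them before counting anything.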
The Mine Act regulatory frame
The governing statute is the Federal Mine Safety and Health Act of 1977, universally called the Mine Act. It consolidated and dramatically strengthened the federal mine-safety regime that had grown up in pieces over the preceding decades—most importantly the Coal Mine Health and Safety Act of 1969, passed after the Farmington, West Virginia explosion killed seventy-eight miners—and it created the Mine Safety and Health Administration within the Department of Labor as the single agency responsible for safety and health at all mines. Crucially, the 1977 Act extended comprehensive federal protection beyond coal to metal and nonmetal mines as well, bringing the entire universe of US mining—coal, metal, stone, sand and gravel, and other nonmetal operations—under one statute and one inspectorate.
What makes the Mine Act unusual, and what shapes the entire dataset, is the mandatory-inspection mandate. Unlike most American workplace-safety enforcement, which is largely complaint-driven and reaches only a small fraction of establishments, the Mine Act requires MSHA to inspect every mine on a fixed schedule, in full, regardless of whether anyone has complained. Underground mines must be inspected in their entirety four times a year; surface mines must be inspected twice a year. There is no analogue to this in the general-industry world, and it is the reason the violations record is so dense and so comprehensive: nearly every operating mine in the country is visited by a federal inspector multiple times a year, and each visit generates citations. The dataset is not a sample of problem mines that drew attention—it is, as close as any enforcement record gets, a census of the conditions found across the whole industry.
The standards the inspectors enforce live in Title 30 of the Code of Federal Regulations. 30 CFR contains the mandatory safety and health standards—hundreds of them, covering everything from roof control, ventilation, and methane and respirable-dust limits in underground coal, to guarding of machinery, electrical safety, ground control, and haulage-road design at surface operations, to the training requirements that apply everywhere. When a record cites a standard, it cites a specific section of 30 CFR, and the structure of the citation universe by standard is itself informative: a handful of standards—guarding, housekeeping, electrical, examinations—account for a large share of all citations, because they are the conditions an inspector most reliably encounters. The cfr_standard field is what lets an analyst move from raw citation counts to a substantive picture of what is going wrong, mine by mine and commodity by commodity.
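Moving from raw counts to a picture by standard can be sketched with a simple tally. The sample values below are made up, and the "part.section" string layout of cfr_standard is an assumption about how the field is stored; the helper name cfr_part is hypothetical.

```python
from collections import Counter

# Made-up cfr_standard values; the "part.section" format is an assumption.
cited = ["75.400", "75.400", "56.14107", "75.202(a)",
         "56.14107", "77.404(a)", "75.400"]


def cfr_part(standard: str) -> str:
    """Return the 30 CFR part, i.e. the text before the first dot."""
    return standard.strip().split(".", 1)[0]


by_standard = Counter(cited)                    # most-cited individual standards
by_part = Counter(cfr_part(s) for s in cited)   # coarser view by CFR part

print(by_standard.most_common(2))
print(by_part.most_common())
```

Grouping by part is a quick proxy for mine type, since Part 75 covers underground coal, Part 77 surface coal, and Part 56 surface metal and nonmetal operations.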
Two structural features of the regime distinguish it from the kind of state-administered, primacy-delegated enforcement common in environmental law. First, mine-safety enforcement is federal end to end: it is a federal MSHA inspector, not a state official, who conducts the inspection, scores the gravity and negligence, and writes the citation, so the data is uniform in a way that aggregations of fifty state programs are not. Second, the enforcement is strict-liability and non-discretionary at the front end: if a violation exists, the inspector is required to cite it, and the operator's lack of knowledge is generally not a defense to the violation itself—it goes instead to the negligence finding and the size of the penalty. This is why the citation count is so high and why the gravity and negligence assessments, rather than the bare existence of a violation, carry the analytic signal.
Citations, orders, and the Significant and Substantial tier
Not all violations are equal, and the dataset encodes a graduated severity that is the key to using it well. At the base is the ordinary Section 104(a) citation: the inspector observes a violation, issues a citation, and sets a time by which the operator must abate it. Most of the 3.07 million records are 104(a) citations, and most of those describe conditions that, while genuine violations, were unlikely to cause serious harm—a missing guard rail, an unposted record, an accumulation of combustible material that had not yet reached a dangerous state.
Sitting above the ordinary citation is the single most important designation in the entire dataset: Significant and Substantial (S&S). An S&S violation is one that the inspector judges reasonably likely to result in a serious injury or illness. The legal test, refined through decades of Review Commission and court decisions, asks whether the violation contributes to a discrete safety hazard and whether, assuming continued normal mining operations, that hazard is reasonably likely to cause an injury of a reasonably serious nature. The S&S flag is therefore not a measure of whether harm occurred—it is a forward-looking judgment about danger—and it is the field that separates the citations that matter for worker safety from the large background of technical non-compliance. Of the roughly 3.07 million violations in the record, on the order of 812,000 carry the S&S flag, and the S&S rate—the share of a mine's or an operator's citations that are S&S—is one of the most widely used summary measures of how dangerous an operation's compliance posture is.
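The S&S rate is a plain ratio, and the national figure follows directly from the approximate totals quoted above. A small sketch, with a hypothetical mine added for comparison:

```python
def ss_rate(ss_count: int, total: int) -> float:
    """Share of citations flagged S&S; 0.0 when there are no citations."""
    return ss_count / total if total else 0.0


# Approximate national totals quoted in the text.
national = ss_rate(812_000, 3_070_000)
print(f"National S&S share: {national:.1%}")

# Hypothetical mine: 40 of its 95 citations were flagged S&S.
mine = ss_rate(40, 95)
print(f"Mine S&S rate: {mine:.1%} ({mine / national:.1f}x the national share)")
```

Comparing a mine's rate to the national share is only a first-pass screen; a small denominator makes the rate noisy, which is why any operator-level comparison should set a minimum citation count.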
Beneath the S&S flag sits the structured gravity assessment that produces it. For each violation the inspector records the likelihood that the hazard would result in an event (no likelihood, unlikely, reasonably likely, highly likely, or that an injury or illness has already occurred), the expected severity if the event did occur (no lost workdays, lost workdays or restricted duty, permanently disabling, or fatal), and the number of persons potentially affected. Alongside gravity sits the negligence finding, which grades the operator's culpability on a scale from none through low, moderate, and high to reckless disregard. These structured fields are the machinery that converts an inspector's judgment into a penalty under MSHA's assessment formula, and for the analyst they are far richer than the binary S&S flag: they let you distinguish a high-negligence, highly-likely, potentially-fatal violation affecting a dozen miners from a low-negligence, unlikely, no-lost-workdays paperwork lapse, even though both appear in the record as a single citation.
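To make the contrast concrete, the structured fields can be used to screen for the high-negligence, likely, severe end of the range. This is an illustrative filter only: it is not MSHA's penalty formula or its S&S test, the thresholds are arbitrary choices, and the category strings are assumed to match the labels listed above, which may differ from the codes in the actual extract.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    negligence: str      # none / low / moderate / high / reckless disregard
    likelihood: str      # no likelihood ... highly likely / occurred
    injury_illness: str  # no lost workdays ... permanently disabling / fatal
    no_affected: int


HIGH_NEGLIGENCE = {"high", "reckless disregard"}
LIKELY = {"reasonably likely", "highly likely", "occurred"}
SEVERE = {"permanently disabling", "fatal"}


def is_high_concern(v: Violation) -> bool:
    """Illustrative screen only; not MSHA's penalty or S&S logic."""
    return (v.negligence.lower() in HIGH_NEGLIGENCE
            and v.likelihood.lower() in LIKELY
            and v.injury_illness.lower() in SEVERE)


records = [
    Violation("high", "highly likely", "fatal", 12),
    Violation("low", "unlikely", "no lost workdays", 1),
]
flagged = [v for v in records if is_high_concern(v)]
print(len(flagged), "of", len(records), "records pass the screen")
```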
Unwarrantable failure, withdrawal orders, and the escalation ladder
The Mine Act gives inspectors a graduated set of more severe enforcement tools for operators who do not respond, and the section_of_act field is where that escalation ladder becomes visible in the data. Understanding the ladder is what lets an analyst tell the difference between an operator with many ordinary citations and an operator the agency considers genuinely recalcitrant.
The pivotal concept is the unwarrantable-failure finding. When an inspector concludes that a violation resulted not merely from negligence but from the operator's aggravated conduct—more than ordinary negligence, a serious lack of reasonable care, sometimes described as conduct constituting indifference or a knowing disregard—the inspector issues the violation under Section 104(d) with an unwarrantable-failure designation. The first such finding is a 104(d)(1) citation; a subsequent unwarrantable failure within a defined window triggers a 104(d) order. A 104(d) order is a withdrawal order: it does not merely require abatement, it requires the operator to withdraw miners from the affected area until the condition is fixed. Separately, Section 107(a) gives inspectors the power to issue an imminent-danger withdrawal order whenever a condition could reasonably be expected to cause death or serious injury before it could be abated through ordinary means—the emergency stop button of the regime, pulling miners out of harm's way immediately, independent of any violation. The presence of 104(d) and 107(a) actions against a mine is a far stronger signal of dangerous operation than a high raw citation count, because each one represents a deliberate agency judgment that the operator either acted with aggravated culpability or created an immediate threat to life.
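One way to surface these rungs in the data is to bucket section_of_act values into coarse tiers. The sketch below assumes the field holds strings such as "104(a)" or "104(d)(1)"; the actual encoding in a given extract should be checked, and the tier labels and the function name ladder_tier are hypothetical.

```python
def ladder_tier(section_of_act: str) -> str:
    """Map a section_of_act value to a coarse rung of the escalation ladder."""
    s = section_of_act.replace(" ", "").lower()
    if s.startswith("107(a)"):
        return "imminent-danger order"
    if s.startswith("104(d)"):
        return "unwarrantable failure"
    if s.startswith("104(a)"):
        return "ordinary citation"
    return "other"  # anything this sketch does not classify


sample = ["104(a)", "104(d)(1)", "107(a)", "104(a)", "104(g)(1)"]
tiers = [ladder_tier(s) for s in sample]

# Count the actions that signal aggravated conduct or an immediate threat.
elevated = sum(t in {"unwarrantable failure", "imminent-danger order"}
               for t in tiers)
print(tiers, elevated)
```

Counting elevated actions per mine, rather than total citations, follows the point made above: each one reflects a deliberate agency judgment rather than a routine finding.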
At the top of the ladder is the pattern of violations (POV) process, the most powerful and most contested enforcement tool in the Mine Act. The statute provides that a mine with a pattern of S&S violations can be placed on POV status, after which any S&S violation triggers a withdrawal order—not just a citation—and the mine remains under that heightened regime until it can demonstrate a clean inspection. POV is designed for exactly the operator that treats ordinary citations and penalties as a cost of doing business: it takes the question out of the penalty calculus and puts it into the operation of the mine itself, because under POV a single S&S finding can shut a section down. For decades, however, the process had a procedural step—a “potential pattern” warning and a cure period—that, combined with the ability to contest the underlying violations, meant it was almost never actually used. That dormancy is the thread that runs directly into the most important episode in the dataset's history.
Upper Big Branch and the reform of pattern-of-violations enforcement
On April 5, 2010, an explosion at the Upper Big Branch mine in Raleigh County, West Virginia, killed twenty-nine miners—the worst US coal-mining disaster in four decades. The investigations that followed found that the mine had a long and severe history of violations, including a high rate of S&S citations and repeated findings related to the very hazards—inadequate ventilation and the accumulation of explosive coal dust and methane—that the explosion involved. The disaster became the defining case study in why a mine's violation history matters: the warning signs were not hidden, they were sitting in MSHA's own enforcement record, and the question the disaster forced was why the agency's strongest tool, the pattern-of-violations process, had never been used against a mine whose record so plainly fit the statute's description.
Part of the answer was the contest system. Because operators could—and routinely did—contest S&S violations, and because contested violations were not counted toward a pattern while they remained under appeal, an operator could insulate itself from POV simply by appealing enough citations to keep its “final” S&S count below the trigger. The reforms that followed Upper Big Branch addressed this directly. MSHA revised the POV regulation to eliminate the toothless “potential pattern” warning step and the cure period, and—most consequentially for the data—to allow the agency to consider violations that had been cited but not yet finally adjudicated when identifying a pattern, rather than only those that had survived every appeal. The practical effect was to make POV a live tool again. For the analyst, Upper Big Branch is the permanent reminder of what the dataset is for: it is not an accounting artifact but an early-warning system, and the entire reform turned on the recognition that a mine's accumulated, S&S-weighted violation history is a leading indicator of catastrophe, provided the contest backlog is not allowed to erase it.
Penalties: proposed, specially assessed, contested, and settled
A violation in the dataset carries two distinct dollar figures, and the gap between them is one of the most important things to understand before using the penalty data. The proposed_penalty is the amount MSHA proposes for the citation; the amount_assessed is the amount that survives after the operator's response, contest, and any settlement. They are frequently not the same number, and treating them interchangeably will systematically misstate what mine-safety enforcement actually costs operators.
Most penalties are set by a regular-assessment formula. The Mine Act and MSHA's penalty regulation in 30 CFR Part 100 direct the agency to weigh statutory factors—the operator's history of previous violations, the size of the operation, the operator's negligence, the gravity of the violation, the operator's good faith in achieving rapid compliance after notification, and the effect of the penalty on the operator's ability to continue in business—and the regulation converts the gravity and negligence findings on the citation into a point total that maps to a dollar amount. This is why the structured gravity and negligence fields are not merely descriptive: they mechanically determine the proposed penalty. For the most serious violations MSHA can bypass the formula and impose a special assessment, a discretionary, individually justified penalty reserved for violations of unusual gravity or for flagrant conduct—the statute provides for substantially elevated penalties for flagrant violations, those reflecting a reckless or repeated failure to abate a known serious hazard.
Once proposed, a penalty enters the contest system. An operator may contest the violation, the S&S designation, the gravity or negligence findings, or the penalty amount before the Federal Mine Safety and Health Review Commission (FMSHRC), an independent adjudicatory body separate from MSHA. The great majority of contested cases settle rather than going to a full hearing before an administrative law judge, and settlements commonly reduce the penalty, vacate the S&S designation, or lower the negligence finding. This is the source of the gap between the proposed and the assessed amount, and it is why a rigorous analysis of penalty severity must decide explicitly which figure it means—proposed penalties measure what MSHA initially sought, assessed penalties measure what operators ultimately owed—and must account for the fact that recent citations are disproportionately still in contest, so their assessed amounts have not yet settled. The viol_status field, which marks records as open, closed, contested, or delinquent, is the field that lets the analyst hold these distinctions straight.
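The contest gap can be expressed as the share of proposed dollars that survives as assessed dollars. The sketch below uses made-up rows and makes two assumptions that should be checked against the real data: that a viol_status of "closed" marks a final record, and that a still-contested record carries no assessed amount yet.

```python
rows = [
    # (viol_status, proposed_penalty, amount_assessed), made-up values
    ("closed", 1000.0, 600.0),
    ("closed", 500.0, 500.0),
    ("contested", 20000.0, 0.0),
    ("closed", 2500.0, 1300.0),
]


def retention_ratio(rows, final_statuses=("closed",)):
    """Assessed dollars divided by proposed dollars over the chosen statuses."""
    proposed = sum(p for s, p, a in rows if s in final_statuses)
    assessed = sum(a for s, p, a in rows if s in final_statuses)
    return assessed / proposed if proposed else None


ratio_final = retention_ratio(rows)                           # closed records only
ratio_naive = retention_ratio(rows, ("closed", "contested"))  # mixes in open contests
print(f"final only: {ratio_final:.0%}, mixing in contested: {ratio_naive:.0%}")
```

The two ratios differ sharply because a single large contested citation adds to the proposed total without adding anything final to the assessed total, which is the distortion the paragraph above warns about.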
Joining to the mines inventory and the accidents data
The violations table is most valuable not in isolation but as one facet of MSHA's integrated open-data ecosystem, and the mine_id is the universal join key that makes the integration possible. Two joins matter most.
The first is to the mines inventory. MSHA maintains a master table of every mine it regulates—its name, its operator and controller, the commodity it produces (coal versus metal/nonmetal, and the specific mineral), whether it is underground or surface, its state and county, its current operating status (active, abandoned, temporarily idled), and, critically, measures of its size such as employment and the hours worked. Joining msha_violations to the inventory by mine ID is what turns a raw citation count into something interpretable. The single most important reason is normalization: a large mine that works hundreds of thousands of miner-hours a year and is inspected four times will naturally accumulate more citations than a small intermittent quarry, so comparing operators on raw counts mostly measures size and inspection exposure. Normalizing citations—and especially S&S citations—by hours worked or by inspection-hours, both of which come from the inventory and the related inspection data, is what produces a defensible violations-per-hour rate that can fairly be compared across mines. The inventory join is also what supplies the commodity and mine-type breakdowns that any honest analysis requires, since the hazards and the relevant standards differ sharply between underground coal and a surface sand-and-gravel pit.
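A minimal sketch of the normalization join follows, on made-up data. The inventory column hours_worked is an assumed name, and scaling to 200,000 hours is an illustrative choice borrowed from injury-rate conventions, not something the MSHA files prescribe.

```python
import pandas as pd

viol = pd.DataFrame({
    "mine_id": ["1234567"] * 6 + ["7654321"] * 2,
    "sig_sub": ["Y", "N", "N", "Y", "N", "N", "Y", "Y"],
})
# Hypothetical inventory extract; hours_worked is an assumed column name.
mines = pd.DataFrame({
    "mine_id": ["1234567", "7654321"],
    "mine_type": ["Underground", "Surface"],
    "hours_worked": [600_000, 40_000],
})

counts = (viol.assign(ss=viol["sig_sub"].eq("Y"))
              .groupby("mine_id", as_index=False)
              .agg(violations=("ss", "size"), ss=("ss", "sum")))

# validate="one_to_one" raises if either side repeats a mine_id.
rates = counts.merge(mines, on="mine_id", how="left", validate="one_to_one")
rates["ss_per_200k_hours"] = rates["ss"] / rates["hours_worked"] * 200_000
print(rates[["mine_id", "violations", "ss", "ss_per_200k_hours"]])
```

In this toy data the larger mine has three times as many citations but a far lower S&S rate per hour worked, which is the reversal that raw counts hide.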
The second join is to the accidents, injuries, and fatalities data, also keyed by mine ID, in which MSHA records the accidents, injuries, illnesses, and deaths reported at each mine. This is the join that lets an analyst test the dataset's central hypothesis: whether a mine's violation history—its raw citation rate, its S&S rate, its 104(d) and POV history—predicts the accidents and fatalities that subsequently occur there. Ordering violations and accidents in time for each mine, the analyst can ask whether the mines that go on to suffer serious injuries or deaths had elevated S&S rates beforehand, which categories of violation are most predictive of which categories of harm, and how long the lead time is between a deteriorating compliance record and an event. This is precisely the analysis Upper Big Branch made unavoidable, and it is the reason the violations data exists in a form that joins cleanly to the harm it is meant to prevent.
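The time-ordering step can be sketched as a look-back window: for each accident, compute the mine's S&S rate over the citations issued in the preceding period and nothing after. The rows, the 365-day window, and the function name prior_ss_rate are all illustrative assumptions.

```python
from datetime import date, timedelta

violations = [
    # (mine_id, issue_dt, sig_sub), made-up rows
    ("1234567", date(2019, 3, 10), "N"),
    ("1234567", date(2019, 9, 2), "Y"),
    ("1234567", date(2019, 11, 20), "Y"),
    ("1234567", date(2020, 3, 5), "Y"),   # issued after the accident
]
accident = ("1234567", date(2020, 1, 15))


def prior_ss_rate(violations, mine_id, event_date, window_days=365):
    """S&S share among a mine's citations issued in the window before an event."""
    start = event_date - timedelta(days=window_days)
    prior = [ss for m, d, ss in violations
             if m == mine_id and start <= d < event_date]
    if not prior:
        return None
    return sum(ss == "Y" for ss in prior) / len(prior)


rate = prior_ss_rate(violations, *accident)
print(f"S&S rate in the year before the accident: {rate:.1%}")
```

Excluding citations issued on or after the event date matters: inspections typically intensify after an accident, so including them would make the violation history look predictive when it is partly a consequence.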
Analytical uses
A national, mine-resolved, S&S-scored, penalty-bearing record of mine-safety enforcement supports a distinctive set of analyses that no other industrial-safety dataset can match.
Identifying repeat and chronic violators is the most direct use. Because the mine ID persists across ownership changes, an analyst can track a mine's compliance record over its whole life, and because the operator and controller fields name the responsible parties, the same analysis rolls up to the corporate level, surfacing operators whose mines, as a group, run elevated S&S rates or accumulate unwarrantable-failure and POV actions. This is the repeat-violator detection that the pattern-of-violations process exists to act on, performed on the open data.
Analyzing penalty patterns and the contest gap exploits the dual penalty fields. Comparing proposed to assessed amounts across operators, commodities, and time reveals how much of MSHA's initially-proposed enforcement actually survives the contest process, which operators contest most aggressively, and how the gap has shifted as the contest backlog and settlement practice have changed—a window into the gap between the enforcement the agency seeks and the enforcement it collects.
Linking violation history to disasters is the analytic payoff already described: joining to the accidents data to test whether S&S-weighted violation history is a leading indicator of serious injuries and fatalities, and to build the kind of forward-looking risk score that could flag the next Upper Big Branch before it happens. Finally, standard-level and commodity-level diagnostics—aggregating citations by 30 CFR standard, mine type, and commodity—show where the industry's recurring hazards concentrate, telling the agency and researchers where the standards are most often breached and where targeted inspection and outreach would do the most good.
Python workflow: violations from the MSHA open data
The script below pulls the MSHA violations and mines files from the agency's open-data extracts, normalizes the Significant and Substantial flag and the penalty amount, and computes three core metrics: the S&S rate by operator (restricted to operators with enough citations for the rate to be meaningful), a ranking of operators by total penalty, and the national S&S share. The files are published as pipe-delimited text inside ZIP archives at no charge and with no API key. Because the MSHA extract column names are upper-snake and have shifted across releases, the script resolves the working column names defensively rather than hard-coding them. The penalty column is resolved the same way, and the proposed-penalty column is tried first, so the ranking reflects proposed rather than assessed amounts whenever that column is present. Any production use should be validated against the current MSHA open-data definition files and should be explicit about whether it means proposed or assessed penalties.
import io
import zipfile

import pandas as pd
import requests

# MSHA Open Government Data -- the agency publishes its enforcement files
# as flat delimited extracts at no charge and with no API key. The
# "Violations" file holds one row per citation or order (issued since
# 2000); the "Mines" file holds one row per mine in the inventory. Both
# are joinable on MINE_ID.
# Open data portal: https://arlweb.msha.gov/opengovernmentdata/ogimsha.asp
# data.gov listing: https://catalog.data.gov/dataset/msha-violations-dataset
# The files are pipe-delimited text shipped inside a ZIP, with a header
# row of upper-snake column names. The column set can shift between
# releases, so resolve names defensively and confirm against the current
# MSHA open-data definition files.
VIOLATIONS_ZIP = "https://arlweb.msha.gov/OpenGovernmentData/DataSets/Violations.zip"
MINES_ZIP = "https://arlweb.msha.gov/OpenGovernmentData/DataSets/Mines.zip"


def load_zip(url, sep="|"):
    """Download a ZIP extract and read its first member as an all-string frame."""
    r = requests.get(url, timeout=600)
    r.raise_for_status()
    zf = zipfile.ZipFile(io.BytesIO(r.content))
    name = zf.namelist()[0]
    with zf.open(name) as fh:
        # dtype=str keeps mine IDs and citation numbers intact (no lost zeros).
        return pd.read_csv(fh, sep=sep, dtype=str, low_memory=False,
                           encoding="latin-1")


def col(frame, *candidates):
    # MSHA column names are upper-snake and have shifted across releases;
    # resolve them rather than hard-coding a single spelling.
    lower = {c.lower(): c for c in frame.columns}
    for cand in candidates:
        if cand.lower() in lower:
            return lower[cand.lower()]
    raise KeyError(f"none of {candidates} in {list(frame.columns)[:12]}...")


viol = load_zip(VIOLATIONS_ZIP)
# Loaded for the inventory join (normalization by hours worked); the
# rollups below do not use it.
mines = load_zip(MINES_ZIP)
print(f"Violation records loaded: {len(viol):,}")

c_mine = col(viol, "MINE_ID")
c_op = col(viol, "VIOLATOR_NAME", "CONTROLLER_NAME")
c_ss = col(viol, "SIG_SUB", "SIGNIFICANT_SUBSTANTIAL")
# The first match wins, so this is the proposed penalty when that column
# exists. The totals below are labeled with whichever column resolved.
c_penalty = col(viol, "PROPOSED_PENALTY", "AMOUNT_PAID", "AMOUNT_DUE")
print(f"Penalty column in use: {c_penalty}")

# Coerce the penalty to numeric; blanks and dashes -> 0.
viol["_penalty"] = pd.to_numeric(viol[c_penalty], errors="coerce").fillna(0)
# Normalize the S&S flag to a clean boolean (the file uses Y / N / blank).
viol["_ss"] = viol[c_ss].fillna("").str.strip().str.upper().eq("Y")

# --- 1. S&S rate by operator -------------------------------------------
# What share of an operator's citations were Significant and Substantial?
by_op = viol.groupby(c_op).agg(
    violations=(c_mine, "size"),
    ss=("_ss", "sum"),
    penalty=("_penalty", "sum"),
)
by_op["ss_rate"] = by_op["ss"] / by_op["violations"].clip(lower=1)

# Restrict to operators with enough citations for the rate to mean something.
material = by_op[by_op["violations"] >= 100].copy()
print("\nHighest S&S rate (operators with 100+ citations):")
for op, row in material.sort_values("ss_rate", ascending=False).head(15).iterrows():
    print(f" {str(op)[:42]:<42} {row['ss_rate']:6.1%} "
          f"({int(row['ss']):,} of {int(row['violations']):,})")

# --- 2. Operators ranked by total penalty ------------------------------
print(f"\nTop 15 operators by total penalty ({c_penalty}):")
for op, row in by_op.sort_values("penalty", ascending=False).head(15).iterrows():
    print(f" {str(op)[:42]:<42} ${row['penalty']:>14,.0f} "
          f"({int(row['violations']):,} citations)")

# --- 3. National S&S share ---------------------------------------------
ss_total = int(viol["_ss"].sum())
print(f"\nNational: {ss_total:,} S&S citations of {len(viol):,} "
      f"({ss_total / max(len(viol), 1):.1%} of all violations)")
Two practical notes apply. First, the operator-level rollup in the script is deliberately coarse: it groups on the free-text violator name, which—as the caveats section stresses—is not a stable corporate identifier, so a rigorous corporate-accountability analysis must first resolve operator and controller names to entities, ideally using the controller fields in the mines inventory rather than the violator string on the citation. Second, the script ranks operators by raw totals; to compare operators fairly you must join to the mines inventory and normalize by hours worked or inspection exposure, because the largest operators will top a raw-count or raw-penalty ranking simply by virtue of size and the four-times-a-year inspection mandate. The MSHA open-data files supply the mine-level employment and hours needed to do exactly that, and the per-release definition files document the authoritative column definitions for each extract.
Limitations and analytical caveats
The violations dataset is the most comprehensive public record of industrial-safety enforcement in the United States, but it carries structural limitations that an analyst must internalize before drawing conclusions from it.
Proposed is not assessed, and recent citations are still in flux. The penalty an analysis reports depends entirely on which dollar field it uses, and the two diverge because of contest and settlement. Worse, the divergence is not uniform over time: the most recent citations are disproportionately still under contest, so their assessed amounts have not yet settled to a final figure and will tend to fall. Any penalty total that mixes long-final older citations with still-contested recent ones, or that treats the proposed amount as what operators paid, will misstate the cost of enforcement. State the field, and account for the contest pipeline at the leading edge.
Operator names churn, but mine IDs persist. The single most common mistake in mine-accountability analysis is to aggregate on the operator name. Mines change hands; the same physical mine can carry several operator names over its life, and a single corporate family can operate under dozens of distinct operator and subsidiary names. The operator string on a citation is free text and is not a reliable corporate key. The mine ID, by contrast, is stable, and the inventory's controller fields are the proper basis for rolling mines up to the responsible corporate parent. An analysis that ranks “operators” by the raw name field will both fragment a single bad actor across many names and conflate unrelated firms that happen to share one.
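The fragmentation effect is easy to reproduce. In the sketch below the citations, the operator names, and the controller_of lookup (standing in for the inventory's controller fields) are all invented for illustration.

```python
from collections import defaultdict

# Made-up citations: one mine cited under several spellings and owners.
citations = [
    ("1234567", "ACME COAL CO"),
    ("1234567", "ACME COAL COMPANY"),
    ("1234567", "ACME COAL CO."),
    ("1234567", "NEW RIVER MINING LLC"),
    ("7654321", "ACME COAL CO"),
]
# Hypothetical inventory lookup: mine_id -> controller identifier.
controller_of = {"1234567": "C001", "7654321": "C001"}

by_name = defaultdict(int)
by_controller = defaultdict(int)
for mine_id, name in citations:
    by_name[name] += 1                          # free-text key fragments
    by_controller[controller_of[mine_id]] += 1  # stable key via mine_id

print(dict(by_name))
print(dict(by_controller))
```

Grouping on the name splits five citations across four keys, while the controller rollup keeps them together. Note that a lookup built from the current inventory attributes a mine's whole history to its present controller, so an analysis spanning ownership changes needs dated controller records instead.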
Inspection intensity confounds raw counts. Because the Mine Act mandates fixed-frequency, full inspections, citation counts are driven heavily by how much a mine is inspected, which in turn depends on its size, its type, and whether it operated at all in a given period. A large, continuously-running underground mine inspected four times a year will out-cite a small intermittent surface pit not because it is more dangerous but because it is exposed to far more inspection-hours. Comparing mines or operators on raw citation totals therefore mostly measures size and inspection exposure; only a rate normalized by hours worked or inspection-hours—both available through the inventory join—supports a fair comparison of compliance.
An S&S flag is a judgment, and a citation is not an accident. The S&S designation and the gravity findings are an inspector's forward-looking judgment about danger, made under a legal test that has been litigated for decades; they are not measurements, and a contested S&S finding can be vacated on appeal. Conversely, a violation is a finding of non-compliance, not evidence that harm occurred: the link between violations and actual accidents is the hypothesis the accidents-data join exists to test, not an assumption to be smuggled in. And reporting lag applies throughout: a citation written in the field must be entered, processed, and, for the assessed amount, adjudicated before it is final in the extract, so the most recent months are systematically incomplete.
Held with these caveats in mind, the msha_violations table is a uniquely valuable resource: a mine-resolved, S&S-scored, penalty-bearing census of the conditions federal inspectors find across the entire US mining industry. It is the enforcement record that, when its warning signs are read in time, is meant to keep miners from becoming the next entry in the accidents data.
Related writing
OSHA Severe Injury Reports: The Federal Record of Amputations and Hospitalizations Since 2015 — The general-industry counterpart to mine-safety enforcement: where MSHA inspects every mine on a mandatory schedule and records the violations it finds, OSHA captures the severe injuries employers are required to report, and reading the two together contrasts a census-style inspection regime with a self-reported harm record.
OSHA 300A Injury and Illness Data: The Federal Database Behind Establishment-Level Workplace Injury Rates — The establishment-level injury-rate data that does for general industry what normalizing MSHA citations by hours worked does for mining: it turns raw counts into rates comparable across very differently sized workplaces.
Compliance Screening Across 30+ Federal Enforcement Lists: How the Risk Score Works — An operator's mine-safety violation history is one of the enforcement signals that feeds a cross-agency risk score, and this piece shows how S&S rates and penalty histories combine with debarment, sanctions, and other federal records into a single screening view.