Seismic record: using the USGS earthquake catalog to analyze fault risk and induced seismicity
The USGS National Earthquake Information Center (NEIC) maintains a continuous record of global seismicity reaching back to 1900. The result, the Comprehensive Catalog or ComCat, is the authoritative public repository for earthquake event data: origin time, location, depth, magnitude, and dozens of derived quality metrics for every significant earthquake worldwide. The NEIC now locates roughly 20,000 earthquakes a year (about 55 per day), the vast majority too small to feel without instrumentation. The catalog makes all of them queryable, which turns earthquake science from a discipline of memorable disasters into a discipline of statistical patterns: fault activity rates, seismic recurrence intervals, and the detection of anomalous seismicity caused by human activity rather than tectonic stress.
What the USGS earthquake catalog covers
ComCat is maintained by the USGS National Earthquake Information Center (NEIC) in Golden, Colorado. It aggregates data from hundreds of seismic networks worldwide, applying consistent location and magnitude algorithms to produce a unified global catalog. The NEIC's authoritative products are supplemented by regional networks—the California Integrated Seismic Network, the Pacific Northwest Seismic Network, the New Madrid Seismic Network—each of which contributes detections at lower magnitude thresholds than the global network can achieve.
Coverage thresholds vary by region and era. Globally, ComCat provides substantially complete coverage of M4.5+ earthquakes from roughly 1960 onward, when the World-Wide Standardized Seismographic Network (WWSSN) was established. M2.5+ events are available globally from 1900 onward, but completeness at that threshold deteriorates sharply for pre-1960 events in seismically quiet regions and for pre-1930 events everywhere. In the contiguous United States, where network density is high, the catalog is substantially complete to M1.0 from regional networks, and to M2.0 in most regions from the national network.
The practical scale of the catalog: roughly 15,000 events per year reach M4 or larger globally. The United States alone records tens of thousands of M1.0+ events annually from its dense regional seismic networks. A full extract of the global catalog from 1900 to present at M2.5+ threshold yields several million event records.
The “historical” versus “instrumental” catalog distinction matters for research design. Pre-instrumental records (pre-1900 for most regions) are reconstructed from contemporary accounts—newspaper reports, damage surveys, felt-area studies. They have uncertain magnitudes, imprecise locations, and unknown completeness. Instrumental records from early seismograph networks (1900–1960) have better location precision but inconsistent magnitude calibration. Modern digital seismograph records (post-1970, and especially post-1990) have standardized waveform data, systematic quality metrics, and reliable magnitude estimates. Any analysis spanning these eras requires explicit completeness modeling.
Data structure
Each ComCat event record contains a structured set of fields. The core fields in CSV or GeoJSON output:
- id — the USGS event identifier, a globally unique string combining a network code and a numeric or alphanumeric sequence. Example: us7000abc1. The network code prefix identifies the authoritative network for that event.
- time — origin time in UTC milliseconds since epoch (in the API) or ISO 8601 format (in CSV). All times are UTC; converting to local time requires the event coordinates.
- latitude / longitude — WGS84 decimal degrees. Precision reflects the seismic network's locating capability; rural areas with sparse station coverage produce larger location uncertainties than well-instrumented urban corridors.
- depth — hypocenter depth in kilometers below the surface. Depth is often the least precisely constrained parameter, particularly for small events. Induced seismicity analysis relies heavily on depth: injection-related events cluster at the depth of injection formations (typically 1–4 km), while tectonic events on deeper faults occur at 5–20 km or more.
- mag / magType — the reported magnitude and its type code. Both fields are required for any inter-catalog comparison.
- magSource / locationSource — the network that computed the magnitude and the network responsible for the hypocenter location. These may differ: the NEIC may refine a location initially computed by a regional network.
- gap — the azimuthal gap (the field is named gap in CSV and GeoJSON output): the largest azimuthal angle between adjacent stations used to locate the event, in degrees. A gap above 180° indicates that the event lies outside the station network, which leaves the horizontal location poorly constrained. Locations for events with large azimuthal gaps should be treated as approximate.
- dmin — the minimum distance, in degrees, from the event to the nearest seismic station (the field is named dmin in CSV and GeoJSON output). Small values indicate well-constrained locations; large values indicate the event was distant from all recording instruments.
- rms — the root-mean-square residual of the travel-time fit, in seconds. Low RMS values indicate that the computed location fits the observed arrival times well. High RMS can indicate a poorly constrained solution or unusual wave propagation.
- nst — number of seismic stations (phases) used in the location. Events located with fewer than four stations are considered poorly constrained.
- place — a human-readable description of the event location, such as “14 km SSW of Cushing, Oklahoma.” This is generated automatically from the coordinates and the GeoNames geographic database; it is useful for display but not for spatial analysis.
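The field list above can be made concrete with a short parsing sketch. It assumes the GeoJSON layout in which each feature carries a top-level id, a properties object (with mag, magType, time, nst, gap, dmin, rms, place) and a geometry whose coordinates are ordered longitude, latitude, depth. The sample record, the function names, and the quality thresholds are illustrative assumptions, not part of the catalog.

```python
from datetime import datetime, timezone

# Hypothetical record shaped like one feature of a ComCat GeoJSON response.
# All values are invented for illustration.
SAMPLE_FEATURE = {
    "id": "us7000abc1",
    "properties": {
        "mag": 3.4, "magType": "ml", "time": 1446336000000,
        "place": "14 km SSW of Cushing, Oklahoma",
        "nst": 24, "gap": 62.0, "dmin": 0.12, "rms": 0.21,
    },
    "geometry": {"type": "Point", "coordinates": [-96.80, 35.87, 4.6]},
}

def parse_feature(feature):
    """Flatten one GeoJSON feature into a plain dict with a UTC datetime."""
    props = feature["properties"]
    lon, lat, depth_km = feature["geometry"]["coordinates"]
    return {
        "id": feature["id"],
        # The API reports origin time as milliseconds since the Unix epoch.
        "time": datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc),
        "lat": lat, "lon": lon, "depth_km": depth_km,
        "mag": props["mag"], "mag_type": props["magType"],
        "nst": props.get("nst"), "gap": props.get("gap"),
    }

def is_well_located(event, max_gap=180.0, min_stations=4):
    """Apply the gap and station-count rules of thumb described above.

    Events with missing quality metrics are treated as not well located.
    """
    if event["gap"] is None or event["nst"] is None:
        return False
    return event["gap"] <= max_gap and event["nst"] >= min_stations

event = parse_feature(SAMPLE_FEATURE)
print(event["id"], event["time"].isoformat(), is_well_located(event))
```

Treating missing metrics as a failed quality check is a conservative choice; an analysis that cannot afford to lose those events might keep them in a separate, flagged subset instead.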
Magnitude types explained
The magType field is among the most commonly misunderstood aspects of earthquake catalog data. Multiple magnitude scales exist because each measures a different aspect of the seismic signal, and different scales are appropriate at different magnitude ranges and distances. Comparing magnitudes across catalogs without accounting for scale differences produces systematic errors.
Mw (moment magnitude) is the scientific standard for moderate to large earthquakes. It is derived from the seismic moment—the product of the fault area, the average slip, and the shear modulus of the surrounding rock. Mw does not saturate at large magnitudes (unlike Mb and Ms), making it the only scale that reliably distinguishes, say, an M8.5 from an M9.0. All modern authoritative magnitude estimates for earthquakes above M5.5 should be Mw or one of its moment-tensor variants (Mwc, Mwb, Mww). When you see a news report describe “the earthquake was magnitude 7.8,” they are almost certainly quoting Mw.
Ml (local magnitude, the original Richter scale) was designed by Charles Richter in 1935 for Southern California earthquakes recorded on a specific seismograph type. It remains in widespread use for small earthquakes (M1–M4) because it is computationally straightforward and well-calibrated for regional networks. Ml saturates above roughly M6 and is not valid for teleseismic distances (events more than a few hundred kilometers away). Most US regional network magnitude estimates for small events are Ml.
Mb (body-wave magnitude) measures the amplitude of short-period body waves (P-waves) and is used primarily for teleseismic events in the M4–M7 range. It saturates above M6.5–7.0, which means large earthquakes appear systematically smaller in Mb than in Mw. Cold War-era nuclear test monitoring relied on Mb for global event detection, so historical catalogs from that period are often Mb-dominated.
Ms (surface-wave magnitude) measures long-period surface waves and is better than Mb at capturing energy in the M6–M8 range, but it is not reliably measurable for deep earthquakes (below ~70 km), where surface waves are weak relative to body waves. Ms was the dominant scale in global catalogs from roughly the 1960s through the 1990s.
Md (duration magnitude) is a local-network estimate based on the duration of the seismic coda rather than the peak amplitude. It is common in regional catalogs for very small events (M<2) where amplitude-based scales are difficult to compute. Md is a rough proxy and should not be compared directly to Mw or Ml without a local calibration correction.
The practical implication: any analysis that aggregates or trends ComCat data across decades needs to either restrict to a single magnitude type (accepting the resulting completeness constraints) or apply published conversion equations to produce a homogeneous magnitude scale. Mixing Mb, Ms, and Mw records and treating them as equivalent will produce apparent magnitude trends that are artifacts of scale transitions rather than physical changes in seismicity.
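Before choosing between the two options, it helps to see how the mix of magnitude types changes over time in the extract at hand. The sketch below tallies magType by decade and shows the "restrict to one family" option. The input format (year, magType pairs), the function names, and the toy records are assumptions for illustration. No conversion equations are included, because those must come from a published, regionally appropriate source.

```python
from collections import Counter, defaultdict

# Moment-magnitude type codes named in the text, lower-cased.
MW_FAMILY = {"mw", "mwc", "mwb", "mww"}

def magtype_mix_by_decade(events):
    """Count magnitude types per decade from (year, mag_type) pairs."""
    mix = defaultdict(Counter)
    for year, mag_type in events:
        decade = (year // 10) * 10
        mix[decade][mag_type.lower()] += 1
    return {decade: dict(counts) for decade, counts in sorted(mix.items())}

def restrict_to_family(events, family=MW_FAMILY):
    """Keep only events whose magnitude type belongs to one family."""
    return [(year, t) for year, t in events if t.lower() in family]

# Invented toy records, only to show the shape of the output.
toy = [(1975, "mb"), (1978, "ms"), (1996, "mwc"), (2005, "mww"), (2011, "Mww")]
print(magtype_mix_by_decade(toy))
print(restrict_to_family(toy))
```

A decade-by-decade table that shifts from mb and ms toward mww is the signal that a trend computed on the raw mag column would partly reflect the change of scale.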
The Oklahoma induced seismicity story
The most consequential finding in the ComCat data over the past two decades is the Oklahoma induced seismicity episode: an increase of several hundredfold in annual M3+ earthquake counts between 2009 and 2015 that the USGS concluded was primarily caused by high-volume wastewater injection from oil and gas production. Oklahoma went from averaging one to two M3+ earthquakes per year through the 2000s to recording more than 900 M3+ events in 2015 alone. For several years Oklahoma had more M3+ earthquakes than California.
The mechanism is well-established in reservoir mechanics. Hydraulic fracturing itself is rarely the direct cause of significant induced seismicity; the primary driver is the injection of large volumes of produced water (brine that comes up with oil and gas) into deep disposal wells penetrating permeable formations. The injected fluid migrates through the formation, increasing pore pressure on pre-existing faults, and can trigger slip on faults that are already close to the failure stress threshold. The Arbuckle formation—a deep saline aquifer underlying much of Oklahoma—was the primary disposal target and sits adjacent to ancient basement faults that were seismically quiet before high-volume injection began.
The ComCat record captures the full episode with high spatial and temporal resolution. Mapping M3+ earthquake epicenters in Oklahoma by year from 2000 to 2020 shows the seismicity nucleating near high-volume injection wells starting around 2009, spreading outward as pore pressure migrated through the formation, peaking in 2015–2016, and declining after Oklahoma Corporation Commission regulations restricted injection volumes at disposal wells within proximity zones of M3+ earthquakes beginning in 2015. By 2018, M3+ rates had dropped by roughly 70 percent from the 2015 peak.
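The year-by-year comparison described above reduces to a count of events per calendar year and a ratio against the peak year. A minimal sketch, assuming events have already been reduced to (year, magnitude) pairs; the function names and the toy numbers are invented and are not Oklahoma data.

```python
from collections import Counter

def yearly_counts(events, min_mag=3.0):
    """Count events at or above min_mag per calendar year."""
    return Counter(year for year, mag in events if mag >= min_mag)

def decline_from_peak(counts, year):
    """Return (peak_year, fractional decline of `year` relative to the peak)."""
    peak_year = max(counts, key=counts.get)
    return peak_year, 1.0 - counts[year] / counts[peak_year]

# Invented toy catalog, only to exercise the functions.
toy_events = ([(2014, 3.1)] * 2 + [(2015, 3.5)] * 10
              + [(2018, 3.2)] * 3 + [(2015, 2.0)])
counts = yearly_counts(toy_events)
peak_year, drop = decline_from_peak(counts, 2018)
print(peak_year, round(drop, 2))
```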
The USGS published annual induced seismicity hazard maps from 2016 onward that combine ComCat seismicity rates with the induced-seismicity distinction to produce one-year probabilistic ground-motion forecasts. These maps explicitly separate natural tectonic hazard from induced hazard—a distinction that matters for insurance underwriting, building code policy, and regulatory liability questions, since the induced component is human-controlled in a way that tectonic seismicity is not.
How to access the data
The USGS Earthquake Hazards Program provides two primary access paths for ComCat data.
The FDSN event web service API is available at earthquake.usgs.gov/fdsnws/event/1/. It follows the international FDSN (Federation of Digital Seismograph Networks) web service standard, which means the same query syntax works against USGS and many international seismological agencies. Key query parameters:
- starttime / endtime — ISO 8601 timestamps defining the time window. Example: starttime=2015-01-01&endtime=2016-01-01.
- minmagnitude — lower magnitude threshold. Setting minmagnitude=2.5 returns the global catalog minimum. Values below 2.5 require regional network coverage and are not globally complete.
- minlatitude / maxlatitude / minlongitude / maxlongitude — bounding box for geographic filtering. For Oklahoma: approximately minlatitude=33.6&maxlatitude=37.0&minlongitude=-103.0&maxlongitude=-94.4.
- format — output format. geojson returns a FeatureCollection directly loadable into GIS tools and mapping libraries. csv returns a flat file suitable for data analysis. quakeml returns the full XML format used in seismological research.
- maxdepth / mindepth — depth filtering in kilometers. Useful for isolating shallow induced seismicity from deeper tectonic events.
- limit / offset — pagination controls. The API returns a maximum of 20,000 events per request; large time windows require either pagination or time-window subdivision.
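The parameters above combine into a query URL, and the 20,000-event cap can be sidestepped by splitting a long extract into shorter windows. The sketch below only builds URLs and makes no network request. It assumes the standard FDSN query endpoint under the base path given above; the helper names and the choice of one-year windows are illustrative.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed endpoint: the FDSN "query" method under the base path in the text.
BASE_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"

# Approximate Oklahoma bounding box quoted in the parameter list.
OKLAHOMA_BOX = {
    "minlatitude": 33.6, "maxlatitude": 37.0,
    "minlongitude": -103.0, "maxlongitude": -94.4,
}

def build_query_url(start, end, **params):
    """Assemble one event query URL for the window [start, end)."""
    query = {
        "format": "geojson",
        "starttime": start.isoformat(),
        "endtime": end.isoformat(),
    }
    query.update(params)
    return f"{BASE_URL}?{urlencode(query)}"

def yearly_windows(first_year, last_year):
    """Split a multi-year extract into one-year (start, end) windows."""
    return [(date(y, 1, 1), date(y + 1, 1, 1))
            for y in range(first_year, last_year + 1)]

urls = [build_query_url(start, end, minmagnitude=3.0, **OKLAHOMA_BOX)
        for start, end in yearly_windows(2009, 2015)]
print(urls[0])
```

One-year windows are comfortable for an M3+ regional query. A lower magnitude threshold or a global extent may need monthly windows, or the limit and offset parameters, to stay under the per-request cap.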
The bulk download interface at earthquake.usgs.gov/earthquakes/search/ provides a graphical query builder that generates API URLs. For large historical extracts (multi-year, global), the Comprehensive Earthquake Catalog (ComCat) bulk download files are available directly from USGS as compressed CSV archives. For programmatic bulk access, the libcomcat Python library wraps the FDSN API with automatic pagination, rate-limit handling, and response parsing.
Three research use cases
Fault proximity analysis from earthquake clusters
The USGS Quaternary Fault and Fold Database maps known active faults in the United States—structures with documented Holocene or late Quaternary surface rupture. But many seismically active faults are buried under sediment, lack surface expression, or have not been mapped by field geologists. ComCat earthquake clusters reveal these cryptic faults.
The technique is straightforward in principle. Extract all M1.5+ earthquakes in a target region over a 20-year window. Apply a spatial clustering algorithm—DBSCAN works well given the irregular density of seismicity—to group events by their horizontal location within a few kilometers. Each cluster corresponds to a fault segment or fault zone. The elongation direction of a cluster indicates the fault strike. The depth distribution of cluster members indicates the seismogenic depth range. Faults identified this way can be prioritized for paleoseismic investigation and included in site-specific hazard analyses even before surface mapping confirms them.
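The clustering and strike steps can be sketched in a few dozen lines. To stay dependency-free, this example uses a small brute-force DBSCAN written in pure Python rather than a library implementation, and it estimates strike from the principal axis of each cluster's epicenters. The function names, the eps and min_samples values, and the synthetic points are all assumptions for illustration. A real catalog would call for a spatial index and tuned parameters.

```python
import math

def to_local_km(lat, lon, lat0, lon0):
    """Equirectangular projection to km east and north of (lat0, lon0)."""
    x = math.radians(lon - lon0) * 6371.0 * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * 6371.0
    return x, y

def dbscan(points, eps_km, min_samples):
    """Label each (x, y) point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1          # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)   # core point: keep growing the cluster
    return labels

def strike_azimuth(points):
    """Azimuth (degrees clockwise from north, 0-180) of the long axis."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # angle from east, CCW
    return (90.0 - math.degrees(theta)) % 180.0

# Synthetic epicenters in km: a northeast-trending line plus two outliers.
points = [(0.5 * i, 0.5 * i) for i in range(10)] + [(50.0, 50.0), (-40.0, 10.0)]
labels = dbscan(points, eps_km=1.0, min_samples=3)
members = [p for p, lab in zip(points, labels) if lab == 0]
strike = strike_azimuth(members)
print(labels, strike)
```

The strike returned here is ambiguous by 180 degrees, which is why it is folded into the 0-180 range. Depth is ignored on purpose: the text clusters on horizontal location and examines the depth distribution of each cluster afterwards.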
This approach is now standard practice in the seismic hazard community. The 2023 National Seismic Hazard Model revision incorporated cluster-derived fault geometries in regions where the instrumental catalog was dense enough to constrain fault orientation reliably. Infrastructure siting decisions—pipelines, dam foundations, critical facilities—increasingly use ComCat cluster analysis as an early-stage screening step before committing to field investigation budgets.
Induced seismicity detection
Separating induced from tectonic earthquakes in the ComCat record requires combining several lines of evidence, none individually definitive. The statistical signature of induced seismicity differs from tectonic seismicity in several ways.
Depth distribution is the first discriminant. Injection-induced events cluster at injection formation depths—typically 1–5 km in Oklahoma's Arbuckle disposal scenario—while tectonic earthquakes on basement faults in stable continental regions occur at 5–20 km. A seismicity cluster with a bimodal depth distribution suggests both induced (shallow) and triggered tectonic (deeper) events. Spatial correlation with injection well locations is the second discriminant: if M3+ events cluster within a few kilometers of a high-volume disposal well with no prior seismic history, and if the seismicity onset correlates temporally with injection volume increase, the induced hypothesis is strongly supported.
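The depth and proximity discriminants can be turned into a simple screening pass that flags events for closer study. The sketch below is a rough illustration only: the record layout, the function names, the 5 km thresholds, and the coordinates are assumptions, and a flag is a line of evidence rather than a classification.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def screen_event(event, wells, max_depth_km=5.0, max_distance_km=5.0):
    """Return evidence flags for one event against a list of disposal wells."""
    nearest = min(
        haversine_km(event["lat"], event["lon"], w["lat"], w["lon"])
        for w in wells
    )
    return {
        "shallow": event["depth_km"] <= max_depth_km,
        "near_well": nearest <= max_distance_km,
        "nearest_well_km": nearest,
    }

# Invented coordinates, only to exercise the function.
wells = [{"lat": 35.88, "lon": -96.81}]
near = screen_event({"lat": 35.87, "lon": -96.80, "depth_km": 4.0}, wells)
far = screen_event({"lat": 36.50, "lon": -98.00, "depth_km": 12.0}, wells)
print(near, far)
```

The temporal test described above (onset of seismicity against increases in injection volume) needs the well's injection history and is left out of this sketch.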
The USGS induced seismicity team has published open-source R and Python tools for this analysis, including the GMMs package for ground-motion model comparison and scripts for correlating ComCat seismicity with EPA underground injection control (UIC) well records. The EPA UIC program (Class II wells) provides injection volume and pressure data by well for commercial disposal wells, enabling quantitative correlation with spatially and temporally co-located seismicity.
Historical hazard assessment by region
Earthquake recurrence modeling—estimating how often a region experiences M6+ or M7+ events—is the foundation of building code seismic design provisions and insurance catastrophe modeling. The ComCat provides the instrumental catalog component of this analysis; paleoseismic trenching provides the longer-term record. For regions with dense instrumental coverage since 1960 and minimal completeness artifacts, the 60-year instrumental catalog is sufficient to estimate recurrence rates for M5+ events and to extrapolate to larger magnitudes using Gutenberg-Richter relations.
A practical workflow: extract all M4+ events from the ComCat for a target state or region from 1970 (post-WWSSN, high completeness) to present. Fit a Gutenberg-Richter magnitude-frequency distribution to estimate the b-value and the annual rate at the completeness threshold. Extrapolate to M6, M7, and M8 using the fitted distribution. The result is a mean recurrence interval for large events, with uncertainty bounds that reflect the catalog duration.
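The workflow above can be sketched with the Aki-Utsu maximum-likelihood estimator for the b-value, which is one common way to fit a Gutenberg-Richter distribution; the text does not say which fitting method to use, so that choice is an assumption, as are the function names and the 0.1-unit magnitude binning.

```python
import math

def fit_gutenberg_richter(mags, completeness_mag, years, bin_width=0.1):
    """Estimate (b_value, annual_rate_at_Mc) from events at or above Mc.

    Uses the Aki-Utsu estimator: b = log10(e) / (mean(M) - (Mc - bin/2)).
    """
    complete = [m for m in mags if m >= completeness_mag]
    if len(complete) < 2:
        raise ValueError("too few events above the completeness magnitude")
    mean_mag = sum(complete) / len(complete)
    b_value = math.log10(math.e) / (mean_mag - (completeness_mag - bin_width / 2))
    annual_rate = len(complete) / years
    return b_value, annual_rate

def recurrence_interval_years(annual_rate_mc, b_value, completeness_mag, target_mag):
    """Mean years between events of target_mag or larger, by extrapolation."""
    rate = annual_rate_mc * 10 ** (-b_value * (target_mag - completeness_mag))
    return 1.0 / rate

# With b = 1, each magnitude unit lowers the rate tenfold.
for target in (6.0, 7.0, 8.0):
    print(target, recurrence_interval_years(10.0, 1.0, 4.0, target))
```

The extrapolation inherits every weakness of the fit: a short catalog or a b-value that is off by a tenth changes the M7 and M8 intervals substantially, which is why the uncertainty bounds mentioned above matter more than the point estimate.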
This analysis reveals dramatic regional variation that aggregate national statistics obscure. California averages an M6.7+ event roughly every 6–7 years on the statewide rate. The Cascadia subduction zone produces M9 events on a recurrence interval of 200–500 years—a rate not estimable from the 125-year instrumental catalog alone, requiring paleoseismic turbidite and land-subsidence evidence to constrain. The New Madrid Seismic Zone in the central US produced three M7.5+ events in 1811–1812 but has been relatively quiet instrumentally, creating a long-return-period hazard that the instrumental catalog systematically underestimates relative to paleoseismic evidence.
Cross-references to related datasets
The USGS ShakeMap system generates near-real-time ground-motion maps for significant earthquakes by combining recorded waveform data with ground-motion prediction equations. ShakeMap products—peak ground acceleration, peak ground velocity, macroseismic intensity—are available via the same ComCat API and event IDs, enabling linkage between the earthquake event record and the estimated ground-motion footprint. FEMA's HAZUS earthquake loss estimation model uses ShakeMap output as its primary hazard input, translating ground-motion maps into structural damage and economic loss estimates at the census tract level.
For induced seismicity research, the EPA Underground Injection Control (UIC) program is the most important complementary dataset. The UIC Class II program regulates injection wells used for oil and gas brines, including the high-volume disposal wells implicated in induced seismicity episodes. Well-level injection volume and pressure data, submitted to state agencies under UIC permits, provides the “forcing function” data for injection-seismicity correlation. State databases vary in accessibility; Oklahoma, Kansas, and Texas maintain public injection volume databases that can be joined to ComCat seismicity by geographic proximity and time period.
NOAA's Storm Events Database includes earthquake entries for historical felt events that caused damage or injuries, providing a parallel record useful for validating historical ComCat entries and assessing damage impacts from events in the pre-instrumental era. Multi-hazard risk assessments frequently need to correlate ComCat seismicity with NOAA storm events to assess compound exposure in regions where seismic and hydrometeorological hazards overlap.
The NOAA National Geophysical Data Center (now NCEI) maintains a historical tsunami database that links to large subduction zone earthquakes in ComCat. Coastal infrastructure risk analyses in the Pacific Northwest, Alaska, and Hawaii require integrating earthquake source parameters from ComCat with tsunami run-up models, as the ground-shaking and inundation hazards have different geographic footprints and different structural damage mechanisms.
Limitations
Detection threshold is the first structural limitation of the catalog. Every seismic network has a minimum magnitude below which events are routinely missed, and that threshold varies with network density, station noise floor, and local geology. An M2.0 earthquake in central Oklahoma within a dense monitoring network will be detected and located precisely. The same event in rural Montana, with stations 150 km apart, may fall below the detection threshold, or may be located with ±20 km uncertainty. Analyses that assume uniform catalog completeness across geography produce biased results wherever network density varies significantly.
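One quick way to compare detection thresholds between regions is to estimate a magnitude of completeness for each regional extract. The sketch below uses the maximum-curvature heuristic, which takes the most populated magnitude bin as the estimate. This method is not named in the text and is brought in here as an assumption; it is often treated as a lower bound on the true completeness magnitude. The function name and the toy magnitudes are invented.

```python
from collections import Counter

def completeness_magnitude(mags, bin_width=0.1):
    """Maximum-curvature estimate: the centre of the most populated bin."""
    bins = Counter(round(m / bin_width) for m in mags)
    peak_bin = max(bins, key=bins.get)
    return round(peak_bin * bin_width, 10)

# Invented toy magnitudes for a dense network and a sparse one.
dense_region = [1.0] * 3 + [1.5] * 9 + [2.0] * 5 + [2.5] * 2 + [3.0]
sparse_region = [2.0] * 2 + [2.5] * 6 + [3.0] * 3 + [3.5]
print(completeness_magnitude(dense_region), completeness_magnitude(sparse_region))
```

Comparing rates between two regions is only meaningful above the larger of their two completeness estimates.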
Historical catalog completeness degrades rapidly before 1960. Magnitude estimates for many pre-1960 events rely on isoseismal area (the geographic extent of reported shaking intensity) rather than calibrated waveform data, and those estimates carry uncertainties of 0.3–0.5 magnitude units for well-documented events and larger for events with sparse felt reports. The 1906 San Francisco earthquake, the most consequential in US history, has an Mw estimate that has been revised from 8.3 (the original Richter estimate) to 7.9 in subsequent reanalyses using modern methods applied to historical seismograms. Because radiated energy scales as 10^(1.5·M), a 0.4-unit revision of this kind corresponds to roughly a factor of four in energy release, and an uncertainty of 0.3–0.5 units to a factor of about three to six, with correspondingly large uncertainty in ground-motion modeling.
Magnitude scale heterogeneity within the catalog requires careful handling. The transition from Mb-dominated records (1960s–1990s) to Mw-dominated records (1990s–present) introduces systematic apparent trends if records are aggregated without scale normalization. The USGS provides Mw conversion equations for common scale pairs, but conversions introduce their own uncertainty and are calibrated regionally; a global conversion applied to a regional catalog may introduce bias.
Finally, the catalog's temporal and spatial completeness varies with operational history. NEIC resources, station network changes, and procedural revisions all affect what enters the catalog and how. Periods of reduced network operation (station outages, communication failures in remote areas) can create apparent seismicity gaps that are observational artifacts rather than physical quiescence. Long-term trend analyses should account for known operational discontinuities in the contributing networks.