
EPA Greenhouse Gas Reporting Program: The Facility-Level Emissions Database Behind US Climate Accountability

· AI Analytics

Every year, approximately 8,000 American industrial facilities (power plants, oil refineries, natural gas processing plants, cement kilns, aluminum smelters) submit detailed reports to the US Environmental Protection Agency quantifying how many metric tons of greenhouse gas each one emitted. The result is the Greenhouse Gas Reporting Program (GHGRP): the most granular, facility-resolved public dataset of US stationary-source emissions in existence. Combined with reports from upstream suppliers of fossil fuels and industrial gases, EPA estimates that the program accounts for roughly 85 to 90 percent of total US greenhouse gas emissions. For researchers tracking the power sector's coal-to-gas transition, financial analysts assessing climate transition risk, and satellite scientists comparing top-down methane measurements against reported inventories, GHGRP data is the authoritative regulatory baseline.

What the GHGRP is and who must report

EPA established the GHGRP under the authority of Section 114 of the Clean Air Act, with implementing regulations published at 40 CFR Part 98. For most source categories, the reporting threshold is 25,000 metric tons of carbon dioxide equivalent (CO2e) per year of direct greenhouse gas emissions; a few categories must report regardless of size. Any facility whose emissions meet or exceed the threshold in a given year is legally required to submit an annual report to EPA by March 31 covering the previous calendar year. After notifying EPA, a facility may stop reporting once its reported emissions stay below 25,000 metric tons CO2e for five consecutive years, or below 15,000 metric tons CO2e for three consecutive years.
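
The threshold and off-ramp logic can be sketched as two small checks. This is a simplified illustration, not a compliance tool: it assumes a threshold-based source category (some categories report regardless of size), treats the history list as consecutive reporting years with the most recent last, and ignores the notification and process-change conditions in 40 CFR 98.2(i). The function names are hypothetical.

```python
# Simplified sketch of the GHGRP reporting threshold and the 40 CFR 98.2(i)
# off-ramp. Assumes a threshold-based source category; ignores notification
# requirements and process-change provisions.
REPORTING_THRESHOLD_T = 25_000  # metric tons CO2e per year
LOW_THRESHOLD_T = 15_000        # shorter off-ramp for very low emitters


def must_report(annual_co2e_t: float) -> bool:
    """True if a year's direct emissions meet or exceed the threshold."""
    return annual_co2e_t >= REPORTING_THRESHOLD_T


def may_discontinue(history_t: list[float]) -> bool:
    """True if the most recent years satisfy either off-ramp condition.

    history_t: reported annual CO2e (metric tons) for consecutive years,
    most recent year last.
    """
    last5, last3 = history_t[-5:], history_t[-3:]
    five_below = len(last5) == 5 and all(x < REPORTING_THRESHOLD_T for x in last5)
    three_below = len(last3) == 3 and all(x < LOW_THRESHOLD_T for x in last3)
    return five_below or three_below


print(must_report(26_400))                                                # True
print(may_discontinue([31_000, 24_000, 22_000, 21_000, 20_500, 19_800]))  # True
print(may_discontinue([24_000, 23_000, 22_000]))                          # False
```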

The program covers 41 source categories, each governed by its own subpart of 40 CFR Part 98 with prescribed calculation or measurement methodologies. The roughly 8,000 reporting facilities represent the largest stationary emitters in the American economy. EPA reviews and verifies the submitted data, then publishes it in the autumn following the March 31 deadline: 2023 emissions data, for instance, became publicly available in late 2024. This lag reflects the time required for facilities to compile their annual reports and for EPA to run quality assurance checks before release.

What the GHGRP does not cover is equally important for understanding its scope. Agriculture, including livestock methane and nitrous oxide from fertilized soils, is not reported: farm operations have no active GHGRP reporting requirement. Mobile sources (cars, trucks, aircraft, ships) and residential and commercial buildings are not covered as direct emitters, because no individual vehicle or building approaches the reporting threshold; the CO2 from the fuels they burn is captured only indirectly, through reports filed by upstream fuel suppliers. EPA's national accounting of these sectors appears instead in the Inventory of U.S. Greenhouse Gas Emissions and Sinks. The GHGRP therefore captures the industrial and energy sectors comprehensively while leaving out distributed sources, such as agriculture, that account for most of the remaining 10 to 15 percent of US emissions.

Industrial source categories

The 41 source categories span the full breadth of heavy industry. Power generation (Subpart D) covers electric generating units burning fossil fuels; it is the sector most commonly associated with US climate policy because it was long the largest single-sector emitter, historically responsible for roughly 30 to 35 percent of total US greenhouse gas emissions before transportation overtook it in the late 2010s. Petroleum and natural gas systems (Subpart W) is the largest sector by facility count, encompassing onshore petroleum and natural gas production, gathering and boosting stations, natural gas processing plants, transmission pipelines and storage facilities, liquefied natural gas (LNG) import and export terminals, and local distribution companies. Petroleum refineries (Subpart Y) cover the processing of crude oil into fuels and petrochemical feedstocks.

Iron and steel production (Subpart Q) covers integrated steelmaking at blast furnaces and basic oxygen furnaces. Cement production (Subpart H) covers both combustion emissions from kilns and the process emissions from calcination of limestone—the conversion of calcium carbonate to calcium oxide and CO2—which are substantial and independent of fuel choice. Pulp and paper manufacturing (Subpart AA), chemical manufacturing across numerous subparts, aluminum production (Subpart F), and electronics manufacturing (Subpart I, covering fluorinated gas emissions from semiconductor fabrication) round out the major manufacturing categories.
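
Because calcination releases one mole of CO2 per mole of CaO produced (CaCO3 → CaO + CO2), the process emissions follow directly from molar masses. The sketch below is a stoichiometric illustration only: it assumes all CaO in the product came from carbonate, whereas Subpart H's prescribed methods work from measured clinker and kiln dust data. The helper name and the clinker figures in the last line are hypothetical.

```python
# Stoichiometric sketch of limestone calcination: CaCO3 -> CaO + CO2.
# Molar masses in g/mol (rounded standard values).
M_CACO3 = 100.09
M_CAO = 56.08
M_CO2 = 44.01


def calcination_co2_t(cao_t: float) -> float:
    """Process CO2 (metric tons) released to produce cao_t tons of CaO from CaCO3."""
    return cao_t * M_CO2 / M_CAO


# Per ton of product CaO and per ton of pure limestone consumed
print(f"{calcination_co2_t(1.0):.3f} t CO2 per t CaO")   # 0.785
print(f"{M_CO2 / M_CACO3:.3f} t CO2 per t CaCO3")        # 0.440

# Hypothetical kiln: 1,000,000 t clinker at 65% CaO by mass
print(f"{calcination_co2_t(1_000_000 * 0.65):,.0f} t CO2")
```

This ratio is why cement process emissions are "independent of fuel choice": switching kiln fuels changes only the combustion share, not the roughly 0.785 t of CO2 bound up in each ton of carbonate-derived CaO.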

Waste-sector sources are also covered: municipal solid waste landfills (Subpart HH) must report methane generated by decomposing organic waste; industrial wastewater treatment systems (Subpart II) report methane from anaerobic treatment processes; and underground coal mines (Subpart FF) report methane liberated from coal seams during extraction. This breadth means the GHGRP provides a single federal reporting framework across industries that otherwise use incompatible reporting conventions.

The six greenhouse gases

Facilities report emissions of six main categories of greenhouse gases (plus nitrogen trifluoride and other fluorinated gases, reported mainly by electronics manufacturers and fluorinated-gas producers), all expressed as metric tons of CO2-equivalent (CO2e) using Global Warming Potential (GWP) values over a 100-year time horizon (GWP100) drawn from IPCC Assessment Reports: primarily AR4 values, with a regulatory shift toward AR5 values underway.

Carbon dioxide (CO2) is the dominant gas by volume, arising from combustion of fossil fuels and industrial process reactions such as limestone calcination in cement kilns. By definition CO2 has a GWP100 of 1.0 and serves as the reference gas for all other conversions.

Methane (CH4) is the second most consequential gas in the GHGRP dataset. Its GWP100 is 25 under IPCC AR4, the value the GHGRP has historically applied, and 28 under AR5 (34 when AR5's climate-carbon feedbacks are included). A metric ton of methane leaking from a natural gas compressor station therefore counts as 25 to 34 metric tons CO2e, depending on the GWP basis. Because methane is also a short-lived climate pollutant with an atmospheric lifetime of about 12 years, its near-term warming impact is even larger: its GWP20 is roughly 80 times that of CO2.

Nitrous oxide (N2O) has a GWP100 of 298 under IPCC AR4 and 265 under AR5. Industrial sources reported in the GHGRP include adipic acid and nitric acid manufacturing, combustion of fossil fuels at high temperatures, and some waste treatment processes. N2O has an atmospheric lifetime of roughly 114 years, making it a very long-lived climate forcer.

Hydrofluorocarbons (HFCs) are synthetic gases used as refrigerants, foam-blowing agents, and aerosol propellants following the phase-out of ozone-depleting chlorofluorocarbons under the Montreal Protocol. Their GWP100 values range from roughly a hundred for the least potent species to 14,800 (AR4; 12,400 under AR5) for HFC-23 (CHF3), a byproduct of manufacturing the refrigerant HCFC-22.

Perfluorocarbons (PFCs) arise primarily from aluminum smelting during anode effects (brief disruptions in the electrolytic reduction process) and from semiconductor manufacturing processes. CF4 has a GWP100 of approximately 6,630 under AR5 (7,390 under AR4) and an atmospheric lifetime of about 50,000 years, making it among the most persistent greenhouse gases known.

Sulfur hexafluoride (SF6) is used as an insulating and arc-quenching medium in high-voltage electrical switchgear, circuit breakers, and gas-insulated substations. Its GWP100 is approximately 23,500 under AR5 (22,800 under AR4), and like CF4 it is extremely long-lived, with an atmospheric lifetime of approximately 3,200 years. Even small fractional leaks from aging switchgear have a significant climate impact in CO2e terms, and SF6 emissions from the electric power sector show up distinctively in GHGRP data for utilities operating large transmission infrastructure.
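
Converting reported gas masses to CO2e is a weighted sum of mass times GWP100. The sketch below, using hypothetical facility quantities, shows how the choice between AR4 and AR5 values shifts the total. The table covers only a few gases and uses AR5 values without climate-carbon feedbacks; the function name is illustrative.

```python
# GWP100 values (IPCC AR4, and AR5 without climate-carbon feedbacks)
# for a handful of GHGRP gases; extend the tables as needed.
GWP100 = {
    "AR4": {"CO2": 1, "CH4": 25, "N2O": 298, "HFC-23": 14_800, "CF4": 7_390, "SF6": 22_800},
    "AR5": {"CO2": 1, "CH4": 28, "N2O": 265, "HFC-23": 12_400, "CF4": 6_630, "SF6": 23_500},
}


def to_co2e(tons_by_gas: dict[str, float], basis: str = "AR4") -> float:
    """Sum of (metric tons of gas x GWP100) on the chosen IPCC basis."""
    table = GWP100[basis]
    missing = sorted(set(tons_by_gas) - set(table))
    if missing:
        raise KeyError(f"No {basis} GWP100 value for: {missing}")
    return sum(tons * table[gas] for gas, tons in tons_by_gas.items())


# Hypothetical compressor station: 50,000 t CO2, 1,000 t CH4, 2 t N2O
station = {"CO2": 50_000, "CH4": 1_000, "N2O": 2}
for basis in ("AR4", "AR5"):
    print(f"{basis}: {to_co2e(station, basis):,.0f} t CO2e")
# AR4: 75,596 t CO2e
# AR5: 78,530 t CO2e
```

The AR5 total is about 4 percent higher here because methane dominates the non-CO2 mass; for an N2O-heavy source such as a nitric acid plant, the AR5 total would come out lower, since N2O's AR5 value is below its AR4 value.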

What the data reveals: power sector decline and natural gas controversy

Year-over-year GHGRP data traces the structural shift in the US power sector with unusual precision. Aggregate GHGRP-reported CO2 from electric power generation has fallen substantially as coal-fired generating units retire and are replaced by natural gas combined-cycle plants, wind, and solar. The GHGRP captures this not just in aggregate but plant by plant: each facility carries a persistent GHGRP facility ID, and the annual emissions reported under it trace the plant's operating history, fuel-switching decisions, and eventual retirement. The James H. Miller Jr. Electric Plant in Alabama, operated by Alabama Power, has historically ranked among the highest-emitting single facilities in the GHGRP dataset, as is typical of large coal-fired generators operating at high capacity factors.
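
Tracking one facility's history under its facility ID reduces to a few summary statistics. The sketch below uses made-up annual values and a hypothetical helper name. A facility disappearing from later years is ambiguous on its own: it may have retired, used the off-ramp, or simply not yet had its data published.

```python
def summarize_trajectory(emissions_by_year: dict[int, float]) -> dict:
    """Peak year, last reported year, and percent decline from peak."""
    if not emissions_by_year:
        raise ValueError("no reported years")
    years = sorted(emissions_by_year)
    peak_year = max(years, key=emissions_by_year.__getitem__)
    last_year = years[-1]
    peak, last = emissions_by_year[peak_year], emissions_by_year[last_year]
    return {
        "peak_year": peak_year,
        "last_reported_year": last_year,
        "decline_from_peak_pct": 100 * (peak - last) / peak if peak else 0.0,
    }


# Hypothetical coal plant that converts units to gas, then shrinks further
plant = {2011: 20.0e6, 2015: 18.5e6, 2019: 12.0e6, 2021: 9.0e6}
print(summarize_trajectory(plant))
# {'peak_year': 2011, 'last_reported_year': 2021, 'decline_from_peak_pct': 55.0}
```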

Natural gas systems present the most contested segment of the GHGRP. Subpart W (petroleum and natural gas systems) covers the broadest array of facility types and the most complex calculation methodologies: gathering and boosting stations, gas processing plants, transmission compressor stations, underground storage fields, and local distribution pipelines each have distinct prescribed measurement and estimation approaches. Critics have consistently argued that this bottom-up, engineering-calculation-based approach underestimates actual methane emissions compared to direct measurement. EPA's 2024 revision of Subpart W, the largest since the program's inception, responded by revising several calculation methodologies to better reflect emission factors measured in field studies.

The top-10 highest-emitting facilities in any given GHGRP year are dominated by large coal-fired power plants and, increasingly, large oil refineries and natural gas processing complexes. ExxonMobil's integrated refinery and petrochemical complex in Baytown, Texas consistently ranks among the top individual facilities. Because facility-level data includes parent company attribution, GHGRP enables analysis of which corporations are responsible for the largest shares of US stationary-source emissions—a capability used extensively by financial analysts constructing transition-risk assessments.

Scope 1 only: the FLIGHT tool and what it shows

A fundamental boundary condition of the GHGRP is that it reports only Scope 1 emissions—direct emissions from sources owned or controlled by the reporting facility. A manufacturing plant that purchases electricity from the grid does not include the power plant's emissions in its GHGRP report; those appear in the power plant's own report. Scope 2 (purchased electricity and heat) and Scope 3 (value chain emissions upstream and downstream) are entirely outside the GHGRP framework. This distinction is critical when using GHGRP data for corporate accountability analysis: a company that has electrified its manufacturing and shifted to grid power has reduced its own GHGRP footprint but may have done nothing to reduce aggregate system emissions if the grid is still coal-heavy.
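
The electrification example can be made concrete with a short calculation. All quantities below are hypothetical, and grid intensity is expressed in kilograms CO2 per MWh; the point is only that a facility's own GHGRP (Scope 1) figure and the system-wide total can move in opposite directions.

```python
def emissions_view(onsite_co2_t: float, grid_mwh: float, grid_kg_per_mwh: float) -> dict:
    """Split a facility's footprint into Scope 1 (what it reports to the GHGRP)
    and Scope 2 (emitted at power plants, which report it themselves)."""
    scope2_t = grid_mwh * grid_kg_per_mwh / 1000  # kg -> metric tons
    return {
        "scope1_ghgrp_t": onsite_co2_t,
        "scope2_t": scope2_t,
        "system_total_t": onsite_co2_t + scope2_t,
    }


# Hypothetical plant before and after replacing gas-fired process heat with
# electric heating, on a coal-heavy grid (900 kg CO2/MWh)
before = emissions_view(onsite_co2_t=100_000, grid_mwh=50_000, grid_kg_per_mwh=900)
after = emissions_view(onsite_co2_t=10_000, grid_mwh=200_000, grid_kg_per_mwh=900)
for label, view in (("before", before), ("after", after)):
    print(label, {k: f"{v:,.0f}" for k, v in view.items()})
```

In this toy case the facility's GHGRP figure falls by 90 percent, while the system total rises from 145,000 to 190,000 t.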

EPA's primary public interface to the GHGRP dataset is the Facility Level Information on GreenHouse Gases Tool (FLIGHT), accessible at ghgdata.epa.gov/ghgp/main.do. FLIGHT provides an interactive map of all reporting facilities color-coded by emission level, facility search by name or location, sector filter to isolate specific industry categories, and year-over-year time-series visualization for individual facilities or aggregated sector totals. Users can download facility-level CSV exports filtered by state, sector, gas type, and reporting year. FLIGHT is the entry point for most non-programmatic GHGRP research and is regularly used by journalists and NGOs to identify the largest emitting facilities in a given state or congressional district.

Subpart W data within FLIGHT deserves particular attention for its complexity. Petroleum and natural gas systems report using one of several prescribed methodologies depending on facility type: gathering and boosting stations use one set of emission factors, gas processing plants use another, and transmission and storage operators use still others, while LNG facilities have their own calculation procedures. This methodological heterogeneity means that Subpart W reported totals cannot be directly compared across facility types without understanding the underlying estimation approach, and it creates systematic uncertainty when aggregating to sector-level totals.

Data access: ECHO and Envirofacts

The GHGRP data is available through multiple programmatic access channels. One bulk download route is ECHO, EPA's Enforcement and Compliance History Online system at echo.epa.gov, which hosts GHG bulk download ZIP files containing facility-level aggregated data (GHGRP_RLPS_YYYY.csv or equivalent) and subpart-level detail files. File names and layouts vary between releases, so it is worth listing each archive's contents before parsing it. These bulk files are the appropriate starting point for large-scale quantitative analysis.

The Envirofacts database at envirofacts.epa.gov/enviro/ provides query access to GHGRP data and numerous other EPA databases. Envirofacts accepts queries expressed as URL path segments, returns results in JSON or XML, and is useful when programmatic filtering by facility, year, or subpart is needed without downloading the full bulk file. Documentation at envirofacts.epa.gov/enviro/ef-metadata/ describes the available tables and field definitions.
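
A minimal URL builder illustrates the path-segment query style. This is a sketch under stated assumptions: the base URL `https://data.epa.gov/efservice` and the segment order (table, then column/value pairs, then output format) should be checked against the Envirofacts metadata documentation, and the table name `pub_dim_facility` and its columns are hypothetical placeholders.

```python
from typing import Mapping, Optional
from urllib.parse import quote

# Assumed Envirofacts Data Service base URL; confirm against EPA's documentation.
EF_BASE = "https://data.epa.gov/efservice"


def envirofacts_url(table: str,
                    filters: Optional[Mapping[str, object]] = None,
                    fmt: str = "JSON") -> str:
    """Build a path-style query: BASE/table/col/value/.../format."""
    parts = [EF_BASE, table]
    for column, value in (filters or {}).items():
        # Percent-encode values so spaces or slashes cannot break the path
        parts += [column, quote(str(value), safe="")]
    parts.append(fmt)
    return "/".join(parts)


# Hypothetical table and column names
url = envirofacts_url("pub_dim_facility", {"state": "TX", "year": 2022})
print(url)
# https://data.epa.gov/efservice/pub_dim_facility/state/TX/year/2022/JSON
```

Fetching the result is then an ordinary HTTP GET, for example `json.loads(urllib.request.urlopen(url).read())`, subject to whatever row limits or paging the service applies.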

Reported fields in GHGRP data include: facility name and EPA facility ID, parent company name, street address, city, state, zip code, latitude and longitude (enabling GIS analysis and satellite data joining), primary NAICS industry code, reporting year, total reported direct emissions in metric tons CO2e, emissions disaggregated by greenhouse gas, subpart-level emission breakdowns, and whether the facility has a linked operating permit under the Title V Clean Air Act program. The latitude/longitude fields are particularly valuable for cross-referencing GHGRP-reported emissions with satellite-derived emission plumes.
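
Joining a satellite plume detection to the nearest reporting facility is typically a great-circle distance search over the latitude/longitude fields. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km and a brute-force scan. The facility IDs, coordinates, and the 5 km matching radius are hypothetical choices, and a join over thousands of facilities would normally use a spatial index instead.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_facility(plume_lat, plume_lon, facilities, max_km=5.0):
    """facilities: iterable of (facility_id, lat, lon).

    Returns (facility_id, distance_km) for the closest facility within
    max_km, or None if nothing is close enough.
    """
    best = None
    for fid, lat, lon in facilities:
        d = haversine_km(plume_lat, plume_lon, lat, lon)
        if d <= max_km and (best is None or d < best[1]):
            best = (fid, d)
    return best


facilities = [("FAC-A", 31.000, -102.000), ("FAC-B", 31.020, -102.000)]
print(nearest_facility(31.019, -102.000, facilities))
```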

Climate policy applications

GHGRP data underpins a wide range of climate policy and accountability work. EPA uses GHGRP facility-level data as the empirical foundation for major regulatory rulemakings under the Clean Air Act, most importantly the greenhouse gas standards for power plants under Section 111: New Source Performance Standards for new units under Section 111(b) and emission guidelines for existing units under Section 111(d), which together set CO2 limits for new and existing generating units. The facility-level data helps quantify the emission reductions achievable under proposed standards and informs the regulatory impact analysis.

Non-governmental organizations and climate accountability projects cross-reference GHGRP data with corporate emissions self-reported to CDP (formerly the Carbon Disclosure Project) to identify discrepancies between what companies report to voluntary frameworks and what their facilities are legally required to report to EPA. Because CDP collects corporate-level Scope 1 totals (alongside Scope 2 and Scope 3) and the GHGRP captures facility-level Scope 1 data with parent company attribution, reconciling the two datasets reveals whether corporate disclosures are consistent with the underlying regulatory reporting.
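
A reconciliation typically starts by rolling facility emissions up to each parent company. The sketch below assumes a hypothetical parent field format that lists owners with percentage shares, and weights emissions by ownership (an equity-share view). Real GHGRP parent-company fields, and the consolidation approach a company uses in its CDP response, may differ; a corporate Scope 1 total also includes sub-threshold and non-US sites the GHGRP never sees. All names and figures are invented.

```python
import re
from collections import defaultdict

# Hypothetical parent-company field format: "Name (pct%); Name (pct%)"
SHARE_RE = re.compile(r"\s*(.+?)\s*\((\d+(?:\.\d+)?)%\)\s*")


def parse_parents(field: str) -> list[tuple[str, float]]:
    """Split a parent field into (company, ownership fraction) pairs."""
    pairs = []
    for chunk in field.split(";"):
        m = SHARE_RE.fullmatch(chunk)
        if m:
            pairs.append((m.group(1), float(m.group(2)) / 100))
        elif chunk.strip():
            pairs.append((chunk.strip(), 1.0))  # no share given: assume 100%
    return pairs


def equity_totals(facilities: list[tuple[str, float]]) -> dict[str, float]:
    """Ownership-weighted GHGRP CO2e per parent from (parent_field, co2e_t) rows."""
    totals: dict[str, float] = defaultdict(float)
    for parent_field, co2e_t in facilities:
        for company, share in parse_parents(parent_field):
            totals[company] += co2e_t * share
    return dict(totals)


def gap_pct(ghgrp_t: float, disclosed_t: float) -> float:
    """Positive when the self-reported figure is below the GHGRP-derived total."""
    return 100 * (ghgrp_t - disclosed_t) / ghgrp_t


rows = [
    ("Acme Energy (60%); Beta Corp (40%)", 1_000_000),
    ("Acme Energy (100%)", 500_000),
    ("Gamma LLC", 200_000),
]
totals = equity_totals(rows)
print(totals)
print(f"Acme gap vs. a disclosed 990,000 t: {gap_pct(totals['Acme Energy'], 990_000):.1f}%")
```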

Financial analysts and ESG rating providers use GHGRP data to construct transition risk assessments under the Task Force on Climate-related Financial Disclosures (TCFD) framework. A company with significant GHGRP-reported emissions from coal or oil assets faces stranded-asset risk under plausible carbon pricing scenarios; GHGRP data makes this risk quantifiable at the facility level rather than relying on company-provided disclosures. Academic researchers use GHGRP data as the baseline for econometric studies of the relationship between carbon pricing, plant investment decisions, and emission trajectories.
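
Facility-level emissions make a first-pass carbon-cost screen straightforward. The sketch assumes a facility pays the full carbon price on every reported ton (no free allocation and no cost pass-through), and all facility names, margins, and price scenarios are hypothetical.

```python
def carbon_cost_ratio(facilities: dict[str, tuple[float, float]],
                      prices_usd_per_t: list[float]) -> dict[str, dict[float, float]]:
    """Annual carbon cost as a multiple of annual operating margin.

    facilities: {name: (reported CO2e in metric tons, annual margin in USD)}
    """
    return {
        name: {p: co2e_t * p / margin for p in prices_usd_per_t}
        for name, (co2e_t, margin) in facilities.items()
    }


fleet = {
    "Coal unit": (10_000_000, 300_000_000),
    "Gas combined cycle": (2_000_000, 150_000_000),
}
for name, by_price in carbon_cost_ratio(fleet, [25, 50, 100]).items():
    cells = ", ".join(f"${p}/t: {r:.2f}x" for p, r in by_price.items())
    print(f"{name:<20} {cells}")
```

A ratio above 1.0 means the carbon cost alone would exceed the facility's margin at that price, a crude signal of stranded-asset risk that a fuller analysis would refine with dispatch and pass-through assumptions.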

State-level cap-and-trade programs use their own facility reporting systems rather than GHGRP directly. California's program under AB 32 uses CARB (California Air Resources Board) mandatory reporting, which has overlapping but distinct methodology from EPA's. However, the existence of the GHGRP provides a federal crosswalk: facilities in California that report to CARB generally also report to the GHGRP, and researchers routinely use both datasets together to validate emission estimates across methodological frameworks.
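
Once facilities are matched across the two programs, validation is a per-facility comparison. The sketch assumes a crosswalk already maps both datasets onto a shared key (the IDs here are hypothetical) and flags differences above an arbitrary 5 percent tolerance. Because the programs' methodologies and GWP bases can differ, a flagged gap is a prompt for investigation rather than evidence of error.

```python
def compare_programs(ghgrp_t: dict[str, float], carb_t: dict[str, float],
                     tol_pct: float = 5.0):
    """Return (matched rows, IDs present in only one program).

    Each matched row is (facility_id, ghgrp, carb, diff_pct, flagged),
    where diff_pct is the CARB figure relative to the GHGRP figure.
    """
    rows = []
    for fid in sorted(ghgrp_t.keys() & carb_t.keys()):
        g, c = ghgrp_t[fid], carb_t[fid]
        diff_pct = 100 * (c - g) / g if g else float("inf")
        rows.append((fid, g, c, diff_pct, abs(diff_pct) > tol_pct))
    unmatched = sorted(ghgrp_t.keys() ^ carb_t.keys())
    return rows, unmatched


ghgrp = {"F1": 100_000, "F2": 200_000, "F3": 50_000}
carb = {"F1": 102_000, "F2": 230_000, "F4": 40_000}
rows, unmatched = compare_programs(ghgrp, carb)
for row in rows:
    print(row)
print("unmatched:", unmatched)
```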

Methane satellite validation and the measurement controversy

The most technically significant debate surrounding GHGRP data in recent years concerns methane emissions from the petroleum and natural gas sector. Multiple independent satellite and aerial measurement programs have systematically found higher methane emissions from oil and gas producing regions than Subpart W bottom-up calculations suggest—sometimes by factors of two to five in aggregate.

The Copernicus Sentinel-5P satellite, carrying the TROPOMI (TROPOspheric Monitoring Instrument) sensor, has generated global methane column measurements since 2018 at a spatial resolution of approximately 5.5×7 km (7×7 km before August 2019). TROPOMI data has been used in dozens of peer-reviewed studies comparing observed methane columns over the Permian Basin, Appalachian Basin, and Haynesville Shale to bottom-up inventories, including GHGRP-reported values. A recurring finding across these studies is that top-down satellite-derived emission estimates exceed bottom-up calculations by 30 to 100 percent in the most actively studied regions.

GHGSat, a commercial satellite operator, and Carbon Mapper—a collaboration between NASA's Jet Propulsion Laboratory and Planet Labs—provide higher-resolution point-source attribution, capable of identifying individual facility-level super-emitters that contribute disproportionately to aggregate sector emissions. MethaneSAT, launched in 2024 by the Environmental Defense Fund, is specifically designed to quantify regional oil and gas methane emissions with sub-field spatial resolution and to enable attribution to individual operators.
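
The super-emitter point is a statement about how concentrated emissions are. Two small helpers, with hypothetical names, quantify that concentration for any list of per-source emission rates; the example distribution is invented to show the calculation, not drawn from any measurement campaign.

```python
def share_of_top(emissions: list[float], top_n: int) -> float:
    """Fraction of total emissions from the top_n largest sources."""
    ranked = sorted(emissions, reverse=True)
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


def sources_for_share(emissions: list[float], target: float) -> int:
    """Smallest number of largest sources whose combined emissions reach target (0-1)."""
    ranked = sorted(emissions, reverse=True)
    total = sum(ranked)
    running = 0.0
    for count, value in enumerate(ranked, start=1):
        running += value
        if running >= target * total:
            return count
    return len(ranked)


# Hypothetical basin: 95 routine sources at 1 unit each, 5 super-emitters at 20
basin = [1.0] * 95 + [20.0] * 5
print(f"Top 5 of 100 sources emit {share_of_top(basin, 5):.0%} of the total")
print(f"Sources needed for half the total: {sources_for_share(basin, 0.5)}")
```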

This top-down vs. bottom-up discrepancy has a specific technical explanation. Subpart W calculation methodologies are based on emission factors derived from industry-provided equipment surveys and engineering models. These approaches tend to undercount episodic, high-volume emission events (blowdowns, equipment failures, venting during unplanned maintenance) because such events are infrequent and poorly represented by average emission factors and periodic surveys. Satellite data, by contrast, observes whatever is being emitted at the time of each overpass, including super-emitter events. EPA's 2024 Subpart W revision acknowledged this discrepancy directly, significantly revising emission factors upward for several source types and adding reporting of large release events. A companion super-emitter response program, under which third-party aerial or satellite detection of anomalously high emissions from a specific site triggers an operator investigation, was established in EPA's separate methane standards for new and existing oil and gas sources.
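
The mechanism described above can be shown with a toy comparison between a factor-based annual estimate and the same estimate plus a few episodic releases that the factors do not represent. Component counts, emission factors, and event rates below are invented for illustration; they are not Subpart W factors.

```python
def factor_estimate_t(counts: dict[str, int], factors_t_per_yr: dict[str, float]) -> float:
    """Bottom-up estimate: component count times an average annual emission factor."""
    return sum(n * factors_t_per_yr[kind] for kind, n in counts.items())


def episodic_t(events: list[tuple[float, float]]) -> float:
    """Mass from discrete releases given as (rate in t/hour, duration in hours)."""
    return sum(rate * hours for rate, hours in events)


counts = {"pneumatic_controller": 400, "compressor_seal": 20}
factors = {"pneumatic_controller": 1.5, "compressor_seal": 30.0}  # t CH4 per unit-year
events = [
    (2.0, 10),    # an emergency blowdown
    (0.5, 720),   # a stuck-open tank hatch venting for a month
]

bottom_up = factor_estimate_t(counts, factors)
with_events = bottom_up + episodic_t(events)
print(f"Factor-based: {bottom_up:,.0f} t CH4")
print(f"Including episodic releases: {with_events:,.0f} t CH4 "
      f"({100 * (with_events / bottom_up - 1):.0f}% higher)")
```

Two events out of hundreds of components raise this toy total by about a third, which is the kind of gap a snapshot measurement catches and an average factor does not.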

Python example: analyzing 2022 facility-level GHG data

The following script downloads the ECHO GHGRP bulk data, normalizes column names across year variations, filters to the 2022 reporting year, and performs four analyses: top-10 emitting states by aggregate CO2e; average and total emissions per facility by broad industry sector using NAICS code prefixes; identification of the top-20 highest-emitting individual facilities with their parent companies; and a sector composition pie chart of total US reported emissions saved to a PNG file. The script uses only pandas, matplotlib, and Python standard library modules. No API key is required.

import io
import zipfile
import urllib.request
import pandas as pd
import matplotlib.pyplot as plt

# ---------------------------------------------------------------------------
# EPA GHGRP Bulk Download via ECHO
# Facility-level aggregated emissions for 2022
# Source: https://echo.epa.gov/files/echodownloads/ghg_download.zip
# ---------------------------------------------------------------------------

GHG_ZIP_URL = (
    "https://echo.epa.gov/files/echodownloads/ghg_download.zip"
)

def fetch_ghg_data() -> pd.DataFrame:
    """Download the ECHO GHGRP bulk zip and load the facility summary CSV."""
    print("Downloading ECHO GHGRP bulk data...")
    with urllib.request.urlopen(GHG_ZIP_URL, timeout=120) as resp:
        raw = resp.read()
    zf = zipfile.ZipFile(io.BytesIO(raw))
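    # The archive may hold several CSVs (facility-level summary plus subpart
    # detail files); list them all and prefer a facility summary such as
    # GHGRP_RLPS_<year>.csv when one is recognizable by name
    csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
    if not csv_names:
        raise ValueError(f"No CSV found in zip. Contents: {zf.namelist()}")
    print(f"  CSV files in archive: {csv_names}")
    chosen = next((n for n in csv_names if "rlps" in n.lower()), csv_names[0])
    print(f"  Loading: {chosen}")
    with zf.open(chosen) as fh: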
        df = pd.read_csv(fh, encoding="latin-1", low_memory=False)
    return df

def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize column names to lowercase with underscores."""
    df.columns = (
        df.columns
        .str.strip()
        .str.lower()
        .str.replace(r"[^a-z0-9]+", "_", regex=True)
        .str.strip("_")
    )
    return df

def find_col(df: pd.DataFrame, candidates: list[str]) -> str:
    """Return the first candidate column name that exists in df."""
    for c in candidates:
        if c in df.columns:
            return c
    raise KeyError(f"None of {candidates} found. Available: {list(df.columns[:30])}")

def main() -> None:
    df = fetch_ghg_data()
    df = normalize_columns(df)

    print(f"\nShape: {df.shape}")
    print(f"Columns (first 20): {list(df.columns[:20])}\n")

    # Identify key columns (column names vary slightly by year)
    col_co2e  = find_col(df, ["total_reported_direct_emissions", "co2e_emission", "ghg_quantity", "total_co2e"])
    col_state = find_col(df, ["state", "facility_state", "state_name"])
    # The sector grouping below relies on NAICS code prefixes, so accept only NAICS columns
    col_sector = find_col(df, ["primary_naics_code", "primary_naics", "naics_code"])
    col_name   = find_col(df, ["facility_name", "name"])
    col_year   = find_col(df, ["reporting_year", "year", "ghg_reporting_year"])

    # Coerce emissions to numeric
    df[col_co2e] = pd.to_numeric(df[col_co2e], errors="coerce")

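    # Filter to 2022 if multi-year file; coerce the year column first because it
    # may be read as text, which would make the == 2022 comparison match nothing
    df[col_year] = pd.to_numeric(df[col_year], errors="coerce")
    if df[col_year].nunique() > 1:
        df = df[df[col_year] == 2022].copy()
        print(f"Filtered to 2022: {len(df):,} facilities")
    else:
        # Single-year file: the "2022" labels below assume that is the year it covers
        print(f"Single-year file: {len(df):,} facilities, year={df[col_year].iloc[0]}")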

    total_mt = df[col_co2e].sum()
    print(f"Total reported CO2e (metric tons): {total_mt:,.0f}")
    print(f"Total reported CO2e (billion metric tons): {total_mt / 1e9:.3f}\n")

    # ------------------------------------------------------------------
    # 1. Top-10 emitting states
    # ------------------------------------------------------------------
    by_state = (
        df.groupby(col_state)[col_co2e]
        .sum()
        .sort_values(ascending=False)
        .head(10)
        .reset_index()
    )
    by_state.columns = ["state", "total_co2e_mt"]
    by_state["share_pct"] = by_state["total_co2e_mt"] / total_mt * 100
    print("Top-10 emitting states (2022):")
    print(f"  {'State':<6} {'Total CO2e (t)':>18}  {'Share':>7}")
    print("  " + "-" * 40)
    for _, row in by_state.iterrows():
        print(f"  {row['state']:<6} {row['total_co2e_mt']:>18,.0f}  {row['share_pct']:>6.1f}%")
    print()

    # ------------------------------------------------------------------
    # 2. Average emissions per facility by sector (NAICS prefix)
    # ------------------------------------------------------------------
    df["sector_group"] = df[col_sector].astype(str).str[:2]
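    sector_map = {
        "22": "Power Generation",
        "31": "Food/Textile Mfg",
        "32": "Paper/Chem/Refining Mfg",  # 322 paper, 324 refining, 325 chemicals, 327 cement
        "33": "Metals/Equipment Mfg",     # 331 primary metals: iron/steel, aluminum
        "21": "Mining/Oil & Gas",
        "48": "Transportation/Pipeline",
        "56": "Waste Management",
        "23": "Construction",
        "11": "Agriculture/Forestry",
    }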
    df["sector_label"] = df["sector_group"].map(sector_map).fillna("Other")
    by_sector = (
        df.groupby("sector_label")[col_co2e]
        .agg(["sum", "count", "mean"])
        .rename(columns={"sum": "total_mt", "count": "facilities", "mean": "avg_mt"})
        .sort_values("total_mt", ascending=False)
    )
    print("Emissions by sector:")
    print(f"  {'Sector':<25} {'Facilities':>12}  {'Total CO2e (t)':>18}  {'Avg/Facility (t)':>18}")
    print("  " + "-" * 80)
    for label, row in by_sector.iterrows():
        print(
            f"  {label:<25} {int(row['facilities']):>12,}  "
            f"{row['total_mt']:>18,.0f}  {row['avg_mt']:>18,.0f}"
        )
    print()

    # ------------------------------------------------------------------
    # 3. Top-20 individual highest-emitting facilities
    # ------------------------------------------------------------------
    col_company = None
    for c in ["parent_company_name", "company_name", "parent_company"]:
        if c in df.columns:
            col_company = c
            break
    col_city = None
    for c in ["city", "facility_city"]:
        if c in df.columns:
            col_city = c
            break

    top20_cols = [col_name, col_state, col_co2e]
    if col_company:
        top20_cols.insert(1, col_company)
    if col_city:
        top20_cols.insert(-1, col_city)

    top20 = (
        df[top20_cols]
        .dropna(subset=[col_co2e])
        .nlargest(20, col_co2e)
        .reset_index(drop=True)
    )
    top20.index += 1
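    print("Top-20 highest-emitting individual facilities (2022):")
    print(f"  {'#':>3}  {'Facility':<45}  {'Parent company':<30}  {'State':<6}  {'CO2e (t)':>14}")
    print("  " + "-" * 108)
    for rank, row in top20.iterrows():
        name = str(row[col_name])[:44]
        # Parent company column is optional; print blank if absent or missing
        company = str(row[col_company])[:29] if col_company and pd.notna(row[col_company]) else ""
        state = str(row[col_state])
        co2e = row[col_co2e]
        print(f"  {rank:>3}  {name:<45}  {company:<30}  {state:<6}  {co2e:>14,.0f}")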
    print()

    # ------------------------------------------------------------------
    # 4. Sector composition pie chart
    # ------------------------------------------------------------------
    sector_totals = by_sector["total_mt"].copy()
    threshold = total_mt * 0.02          # merge sectors < 2% into "Other"
    small = sector_totals[sector_totals < threshold]
    large = sector_totals[sector_totals >= threshold]
    if not small.empty:
        other_val = small.sum()
        if "Other" in large.index:
            large = large.copy()
            large["Other"] += other_val
        else:
            large = pd.concat([large, pd.Series({"Other": other_val})])

    fig, ax = plt.subplots(figsize=(9, 7))
    wedges, texts, autotexts = ax.pie(
        large,
        labels=large.index,
        autopct="%1.1f%%",
        startangle=140,
        pctdistance=0.78,
        wedgeprops={"linewidth": 0.8, "edgecolor": "white"},
    )
    for at in autotexts:
        at.set_fontsize(9)
    ax.set_title(
        "US GHGRP Reported Emissions by Sector (2022)\n"
        "Total = facility-reported CO\u2082-equivalent, metric tons",
        fontsize=12,
        pad=16,
    )
    plt.tight_layout()
    plt.savefig("ghgrp_sector_composition_2022.png", dpi=150)
    print("Pie chart saved to ghgrp_sector_composition_2022.png")

if __name__ == "__main__":
    main()

Running this script against a recent GHGRP year file should show total direct emissions of roughly 2.5 to 2.7 billion metric tons of CO2e reported by roughly 8,000 facilities, representing the covered portion of US stationary-source emissions. Texas consistently leads state totals by a large margin due to its concentration of petrochemical manufacturing, natural gas processing, and power generation. Louisiana, Florida, Pennsylvania, and Indiana typically follow. The sector composition pie will show electric power generation and oil and gas operations jointly accounting for 60 to 70 percent of reported CO2e, with manufacturing (refineries, chemicals, cement and lime) making up most of the remainder. The top-20 facility list will be dominated by large coal power plants and integrated refinery/petrochemical complexes, with the largest single facilities each reporting more than ten million metric tons of CO2e per year.

For the regulatory rulemaking pipeline through which EPA translates GHGRP emission data into binding Clean Air Act standards—including the APA notice-and-comment process, OIRA review, and the Congressional Review Act mechanisms that govern how power plant CO2 rules and Subpart W methane rules are finalized—see Federal Register: The Official Rulemaking Journal Behind 90,000 Pages of Annual US Regulatory Activity, covering the full federal rulemaking process from NPRM to final rule.

For campaign finance data that tracks the political economy of climate regulation—including contributions from energy sector PACs and trade associations that participate in GHGRP comment proceedings and lobby on Subpart W methodology—see FEC Committee Filings: The Campaign Finance Database Behind $14 Billion in Election Spending, covering the OpenFEC API, Super PAC disclosures, and bulk contributor file analysis.