By the numbers: using EEOC charge statistics to find discrimination patterns by industry and employer
Every year, roughly 70,000 workers file a discrimination charge with the Equal Employment Opportunity Commission. The agency publishes annual aggregate statistics showing how many charges alleged race discrimination, sex discrimination, disability, age, or retaliation — broken down by industry, state, and resolution type. Since 2017, charge-level data has also been released under FOIA to journalists and researchers. Together, these two datasets constitute a systematic record of alleged workplace discrimination in the United States. Almost nobody uses them rigorously.
This article covers the EEOC's statutory authority, the structure of both data tiers, the fields available in each, the analytical approaches that surface industry patterns and employer repeat appearances, and the limitations that make the data a floor on discrimination rather than a ceiling.
The EEOC's statutory mandate
The Equal Employment Opportunity Commission is an independent federal agency created by Title VII of the Civil Rights Act of 1964, codified at 42 U.S.C. § 2000e et seq. Its enforcement jurisdiction spans five major federal statutes:
- Title VII of the Civil Rights Act (1964): prohibits employment discrimination based on race, color, religion, sex, and national origin. Covers employers with 15 or more employees.
- Age Discrimination in Employment Act (ADEA, 1967): prohibits discrimination against workers age 40 and older. Covers employers with 20 or more employees.
- Americans with Disabilities Act (ADA, 1990): prohibits discrimination against qualified individuals with disabilities; requires reasonable accommodation. The ADA Amendments Act of 2008 substantially broadened the definition of disability, generating a measurable spike in disability-basis charges after 2009. Covers employers with 15 or more employees.
- Equal Pay Act (EPA, 1963): requires equal pay for equal work regardless of sex, and does not require that a charge be filed before suit. The EPA is enforced jointly by the EEOC and private plaintiffs.
- Genetic Information Nondiscrimination Act (GINA, 2008): prohibits the use of genetic information in employment decisions. Charge volumes under GINA are small but nonzero.
Before a worker can file a lawsuit under Title VII or the ADA, they must first exhaust administrative remedies by filing a charge with the EEOC within 180 days (or 300 days in states with a fair employment practices agency) of the alleged discriminatory act. This exhaustion requirement channels nearly all employment discrimination litigation through the EEOC charge process — meaning the charge record is a near-complete census of attempted federal enforcement, not a sample.
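The 180/300-day rule above is plain date arithmetic. A minimal sketch follows, assuming calendar-day counting from the alleged act and ignoring weekend or holiday adjustments, tolling, and continuing-violation questions. The function name and the `state_has_fepa` flag are illustrative, not part of any EEOC tool.

```python
from datetime import date, timedelta


def charge_filing_deadline(act_date: date, state_has_fepa: bool) -> date:
    """Return the last filing day under the 180/300-day rule.

    Simplified: counts calendar days only and ignores tolling or
    weekend/holiday adjustments.
    """
    window_days = 300 if state_has_fepa else 180
    return act_date + timedelta(days=window_days)


act = date(2023, 1, 10)  # hypothetical date of the alleged act
deadline_180 = charge_filing_deadline(act, state_has_fepa=False)
deadline_300 = charge_filing_deadline(act, state_has_fepa=True)
print(deadline_180, deadline_300)
```

When working with charge-level extracts, the same arithmetic can help sanity-check filing dates, though the extract itself does not record the date of the alleged act.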
The EEOC's primary remedial tool is conciliation: mediation between the charging party and the respondent employer, conducted under confidentiality. If conciliation fails and the EEOC finds reasonable cause, the agency may file suit in federal district court. In practice, the agency litigates roughly 200 cases per year out of 70,000 charges — about 0.3%. The rest are resolved through conciliation, closed without action, or result in the charging party receiving a right-to-sue letter to pursue their own claim.
The two data tiers
EEOC data comes in two fundamentally different forms with different access requirements, different granularity, and different analytical utility.
Tier 1: public annual aggregate statistics
The EEOC publishes annual charge statistics at:
https://www.eeoc.gov/data/charge-statistics
The statistics cover 1997 through the most recent completed fiscal year (the EEOC operates on an October 1 – September 30 fiscal year). The data is available as Excel files and as downloadable tables. The aggregate statistics are the entry point for pattern analysis because they are freely available, consistent across years, and sufficient to answer most distributional questions about the charge record.
The published aggregate tables organize charges along several dimensions. The basis dimension classifies each charge by the legal theory alleged: race, sex (including pregnancy), national origin, religion, color, reprisal (retaliation for protected activity), age, disability, equal pay/compensation, and genetic information. A single charge can allege multiple bases simultaneously. The EEOC counts each basis separately, so aggregate basis counts sum to more than the total charge count.
The statute dimension identifies which law the charge was filed under: Title VII, ADEA, ADA, EPA, or GINA. The resolution_type field classifies outcomes: no reasonable cause, reasonable cause, merit resolution (which includes reasonable cause findings and withdrawals with benefits, i.e., settlements), administrative closure, and right-to-sue letters. Industry breakdowns use NAICS sector codes. State breakdowns use the state where the charge was filed, not necessarily where the employer is headquartered.
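The multi-basis counting rule is easy to trip over when computing shares. The sketch below uses three made-up charges (the IDs and bases are purely illustrative) to show why basis counts exceed the charge count and why basis shares computed against total charges can sum past 100%.

```python
from collections import Counter

# Hypothetical charges; each lists every basis alleged in that charge.
charges = [
    {"charge_id": "A", "bases": ["race", "retaliation"]},
    {"charge_id": "B", "bases": ["disability"]},
    {"charge_id": "C", "bases": ["sex", "age", "retaliation"]},
]

# Count each basis once per charge that alleges it.
basis_counts = Counter(b for c in charges for b in c["bases"])
total_charges = len(charges)
total_basis_allegations = sum(basis_counts.values())

# Shares are computed against total charges, so they overlap.
basis_share = {b: n / total_charges for b, n in basis_counts.items()}

print(total_charges, total_basis_allegations)
print(basis_share)
```

The practical consequence: compare a basis count to the total charge count, never to the sum of basis counts, when reporting what fraction of charges allege a given basis.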
Tier 2: FOIA-released charge-level data
The more granular dataset is the charge-level extract released under FOIA requests. ProPublica, the Washington Post, and several academic researchers have obtained versions of this data going back to the 2010s. The EEOC has resisted broad release of charge-level data on the grounds that it could identify charging parties who sought confidential proceedings. FOIA-released extracts typically include:
- Charge number: an alphanumeric identifier tied to the district office that received the charge. The format encodes the year and office, enabling time-series analysis at the office level.
- Filing date and closure date: enabling processing time analysis by office, which shows dramatic variation. Some district offices resolve charges in under a year; others exceed three years.
- Respondent name and size: the employer named in the charge. Size fields use employer-reported employee count at the time of filing. This is the field that enables employer repeat-appearance analysis.
- Basis and statute: same taxonomy as the aggregate statistics.
- Resolution type and monetary benefits: dollar amounts recovered through conciliation or settlement, if any. This is the field that drives the “$535M+ recovered annually” headline figure the EEOC publishes.
The charge-level data does not contain the charging party's name, the specific facts alleged, or the details of any conciliation agreement — all of which are protected by 42 U.S.C. § 2000e-5(b). What it does contain is sufficient to identify which employers appear most frequently, in which jurisdictions, under which legal theories.
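The processing-time analysis mentioned in the field list can be sketched with pandas. The column names (`office`, `filing_date`, `closure_date`) and the rows below are assumptions for illustration; a real FOIA extract will use its own field names and should be checked against its data dictionary.

```python
import pandas as pd

# Hypothetical extract; office labels and dates are made up.
charges = pd.DataFrame({
    "office": ["X", "X", "Y", "Y", "Y"],
    "filing_date": ["2019-01-15", "2019-03-01", "2018-06-01",
                    "2018-09-10", "2019-02-01"],
    "closure_date": ["2019-10-15", "2020-01-01", "2021-06-01",
                     None, "2022-02-01"],
})
charges["filing_date"] = pd.to_datetime(charges["filing_date"])
charges["closure_date"] = pd.to_datetime(charges["closure_date"])

# Charges still open have no closure date; exclude them from duration stats.
closed = charges.dropna(subset=["closure_date"]).copy()
closed["days_open"] = (closed["closure_date"] - closed["filing_date"]).dt.days

# Median is less sensitive than the mean to a few very old charges.
by_office = closed.groupby("office")["days_open"].agg(["count", "median"])
print(by_office)
```

Dropping open charges biases durations downward for offices with large backlogs, so the count of still-open charges is worth reporting alongside the median.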
The merit resolution rate
The single most important figure in the aggregate statistics is the merit resolution rate: the fraction of charges that result in either a reasonable cause finding or a monetary settlement. In fiscal year 2023, the EEOC resolved approximately 67,000 charges. Of those, about 18% were merit resolutions — meaning roughly four in five charges were closed without any finding in the charging party's favor and without monetary benefit to the employee.
This figure is widely cited as evidence that most discrimination charges are meritless. That interpretation is wrong in a specific technical sense. The EEOC classifies a charge as “no reasonable cause” when it closes without a reasonable cause determination, but this classification includes charges that were abandoned by the charging party, charges where the employer provided a facially neutral explanation that EEOC investigators found plausible without conducting a full investigation, and charges against employers too small to be covered by the relevant statute. It does not mean the EEOC affirmatively concluded no discrimination occurred.
The administrative closure category is particularly important to understand. Charges are administratively closed when the EEOC cannot locate the charging party, when the charging party fails to respond to requests, or when the charge is referred to a state agency. Administrative closure does not reflect any assessment of merit. Once administrative closures are removed from the denominator, the merit resolution rate is materially higher than 18%.
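The effect of the denominator choice can be shown with a few lines of arithmetic. The counts below are hypothetical, chosen only so that the total and the headline rate resemble the figures quoted above. The administrative closure count in particular is an assumption and should be replaced with the published number for the year under study.

```python
# Hypothetical resolution counts; NOT actual EEOC figures.
resolutions = {
    "no_reasonable_cause": 40_000,
    "administrative_closure": 15_000,  # assumed for illustration
    "merit_resolution": 12_000,
}

total = sum(resolutions.values())
merit = resolutions["merit_resolution"]
admin = resolutions["administrative_closure"]

# Headline rate: merit resolutions over all resolutions.
rate_all = merit / total
# Adjusted rate: drop closures that involved no assessment of merit.
rate_excl_admin = merit / (total - admin)

print(f"All resolutions: {rate_all:.1%}")
print(f"Excluding administrative closures: {rate_excl_admin:.1%}")
```

Reporting both rates side by side makes clear how much of the headline figure reflects process attrition and not findings on the merits.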
What the merit resolution rate does tell you — accurately — is that the EEOC's enforcement capacity is vastly outpaced by charge volume. With roughly 2,000 investigators handling 70,000 charges annually, many investigations are necessarily superficial. ProPublica's 2013 investigation found that the agency was closing thousands of charges each year under what investigators called “rapid charge processing” — a speed-over-thoroughness approach that treated no-cause closures as a workload management tool rather than a substantive legal conclusion.
Monetary recovery without admission
The EEOC reports recovering $535 million or more in monetary benefits annually through pre-litigation administrative channels — conciliation agreements and mediation settlements — before any lawsuit is filed. This figure excludes amounts recovered through litigation, which the agency tracks separately.
Nearly every EEOC conciliation agreement and virtually every employer settlement of an EEOC charge contains a no-admission-of-wrongdoing clause. The employer pays a specified amount, may agree to implement training programs or submit to monitoring, and explicitly does not admit that any discrimination occurred. This is not unique to EEOC conciliations — it is the standard structure of regulatory settlements across federal enforcement — but it has specific consequences for how the charge-level data should be read.
An employer that appears in the charge record with multiple merit resolutions is not an employer that has admitted to discrimination. It is an employer that has paid money to make charges go away. Paid resolutions are empirically correlated with actual discrimination, but the correlation is imperfect. Settlement value is driven by litigation risk, litigation cost, and the employer's business incentive to avoid adverse publicity, not purely by the underlying merits of the discrimination claim. A large employer with high litigation exposure will settle cases that a small employer would fight, independent of the underlying conduct.
This caveat does not make employer-level charge analysis useless. Controlling for employer size (using the respondent size field in the charge-level data), charge rates per employee are a meaningful signal. An employer generating disability discrimination charges at three times the sector average, controlling for size, is worth investigating further — whether through DOL OFCCP compliance reviews, EEO-1 workforce composition data, or direct records requests.
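A size-adjusted comparison of this kind might look like the sketch below. Employer names, sectors, and counts are invented. The sector benchmark is computed by pooling charges and headcount across the employers in the table, which is one reasonable choice among several. The 3x threshold mirrors the example in the text and is not an official standard.

```python
import pandas as pd

# Hypothetical employer-level rollup of a charge-level extract.
employers = pd.DataFrame({
    "employer": ["Alpha Logistics", "Beta Freight", "Gamma Hotels", "Delta Inns"],
    "sector": ["Transportation", "Transportation", "Accommodation", "Accommodation"],
    "employees": [2_000, 20_000, 8_000, 2_000],
    "disability_charges": [30, 40, 8, 2],
})

# Charges per 1,000 employees for each employer.
employers["rate_per_1k"] = (
    employers["disability_charges"] / employers["employees"] * 1000
)

# Sector benchmark: pooled charges over pooled headcount.
sector = employers.groupby("sector", as_index=False)[
    ["disability_charges", "employees"]
].sum()
sector["sector_rate_per_1k"] = (
    sector["disability_charges"] / sector["employees"] * 1000
)

employers = employers.merge(sector[["sector", "sector_rate_per_1k"]], on="sector")
employers["ratio_to_sector"] = (
    employers["rate_per_1k"] / employers["sector_rate_per_1k"]
)

# Flag employers at or above three times their sector's pooled rate.
flagged = employers[employers["ratio_to_sector"] >= 3.0]
print(flagged[["employer", "rate_per_1k", "ratio_to_sector"]])
```

Because the pooled benchmark includes the employer being tested, very large employers pull the sector rate toward their own. A leave-one-out benchmark is a sensible refinement when one firm dominates a sector.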
Large employer repeat patterns and pattern-or-practice authority
The charge-level data enables employer-level analysis that the aggregate statistics cannot support. Fortune 500 employers appear repeatedly in the charge record, in patterns that reflect both their size and, in some cases, structural features of their workforces and management practices. Amazon, Walmart, and several major logistics and hospitality employers have appeared in disability discrimination charges at rates that journalists and advocates have argued exceed what size alone explains.
The EEOC has independent authority to address systemic discrimination under 42 U.S.C. § 2000e-6, which authorizes the Attorney General (and, since 1972, the EEOC itself) to bring “pattern or practice” suits against employers engaged in an unlawful employment practice as a regular course of conduct. Pattern or practice suits bypass the individual charge process: the agency can bring suit directly without a specific charging party, using statistical evidence and anecdotal case samples to establish that discrimination was the employer's standard operating procedure.
Pattern or practice litigation is expensive, takes years, and the EEOC brings fewer than ten such cases per year. But the investigative threshold for pattern or practice findings — the aggregate charge rate plus employer-level workforce composition data — is exactly what the combination of EEOC charge statistics and EEO-1 data enables a researcher to compute.
EEO-1 companion data: workforce composition filings
The EEOC collects workforce composition data from private employers with 100 or more employees and federal contractors with 50 or more employees through the EEO-1 Component 1 report. EEO-1 data classifies employees by sex and by race/ethnicity across ten job categories:
- Executive/Senior-Level Officials and Managers
- First/Mid-Level Officials and Managers
- Professionals
- Technicians
- Sales Workers
- Administrative Support Workers
- Craft Workers
- Operatives
- Laborers and Helpers
- Service Workers
The EEOC historically kept EEO-1 aggregate data closely held, releasing only national and industry-level summaries. In 2022, after a prolonged FOIA fight led by the Center for Investigative Reporting and joined by several civil rights organizations, the EEOC released company-level EEO-1 data for the 2016 and 2017 reporting years covering tens of thousands of employers. Subsequent releases have extended this coverage forward.
The analytical value of joining charge data to EEO-1 is the ability to compute a charge rate per employee and set it against workforce composition by demographic group and job category. EEO-1 records only sex and race/ethnicity, so the comparison applies to those bases and not to disability or age. A company with a high race discrimination charge rate whose EEO-1 filing shows very few employees of color in the two management categories is expressing a consistent signal: the workforce composition and the enforcement record are telling the same story. The join requires entity resolution between the EEOC's respondent name field (free text, inconsistently spelled) and the employer name in the EEO-1 data. This is non-trivial: a single company with multiple subsidiaries may appear under dozens of different names across the two datasets.
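A first pass at the entity resolution problem is usually plain string normalization before any fuzzy matching. The sketch below is one minimal approach, not the method used in any published investigation. The suffix list is a short illustrative assumption, and the employer names are invented.

```python
import re
from collections import defaultdict

# Assumption: a small, incomplete list of legal-form suffixes to strip.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "llc", "corp", "corporation",
    "co", "company", "ltd", "lp",
}


def normalize_employer_name(name: str) -> str:
    """Lowercase, strip punctuation and legal suffixes, collapse whitespace."""
    cleaned = name.lower().replace("'", "")
    cleaned = re.sub(r"[^a-z0-9\s]", " ", cleaned)
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


# Hypothetical respondent names as they might appear in free text.
respondent_names = [
    "ACME Stores, Inc.",
    "Acme Stores LLC",
    "acme  stores",
    "Zenith Freight Co.",
]

# Group raw spellings under their normalized key.
groups = defaultdict(list)
for raw in respondent_names:
    groups[normalize_employer_name(raw)].append(raw)

for key, variants in groups.items():
    print(key, "<-", variants)
```

Normalization of this kind collapses spelling variants but does nothing for subsidiaries trading under unrelated names. Those need a parent-company lookup table or manual review.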
Python: loading and analyzing EEOC aggregate statistics
The EEOC aggregate statistics are published as Excel workbooks. The charge count tables are the most useful starting point. Here is a pipeline to parse a locally downloaded workbook and compute year-over-year trends by basis, with a focus on the disability charge surge after the ADA Amendments Act took effect in January 2009. Sheet names, column names, and the header row can differ between releases, so check them against the file you download:

```python
import pandas as pd

# Download the "Charge Statistics (Charges filed with EEOC)" Excel workbook from
# https://www.eeoc.gov/data/charge-statistics and save it locally.
# The workbook contains multiple sheets; "All Statutes" is the broadest view.
WORKBOOK_PATH = "eeoc_charge_stats.xlsx"

xl = pd.ExcelFile(WORKBOOK_PATH)
print(xl.sheet_names)
# Typical sheets: 'Title VII', 'ADEA', 'ADA', 'EPA', 'GINA', 'All Statutes'

# Load the "All Statutes" sheet: total charges across all bases.
# header=3 assumes the column headers sit on the fourth row of the sheet.
df_all = pd.read_excel(xl, sheet_name="All Statutes", header=3)

# The EEOC Excel layout: year as column headers, basis as rows.
# Reshape to long format for analysis.
df_all = df_all.rename(columns={df_all.columns[0]: "basis"})
df_all = df_all.dropna(subset=["basis"])
df_all["basis"] = df_all["basis"].astype(str).str.strip()
df_long = df_all.melt(id_vars="basis", var_name="year", value_name="charge_count")
df_long["year"] = pd.to_numeric(df_long["year"], errors="coerce")
df_long = df_long.dropna(subset=["year"])
df_long["year"] = df_long["year"].astype(int)
df_long["charge_count"] = pd.to_numeric(df_long["charge_count"], errors="coerce")

# Pivot to a year-by-basis matrix. pivot_table sums repeated row labels,
# where plain pivot would raise on duplicate (year, basis) pairs.
pivot = df_long.pivot_table(
    index="year", columns="basis", values="charge_count", aggfunc="sum"
)

# Year-over-year change in disability charges (ADA + ADA Amendments)
disability_cols = [c for c in pivot.columns if "disability" in str(c).lower()]
pivot["disability_total"] = pivot[disability_cols].sum(axis=1)
pivot["disability_yoy"] = pivot["disability_total"].pct_change() * 100
print("Year-over-year disability charge change (%):")
print(pivot[["disability_total", "disability_yoy"]].loc[2005:2015].round(1))

# Identify sectors with rising disability charges post-ADAAA (2009+).
# Requires the industry-level breakdown file from the same EEOC data page.
INDUSTRY_PATH = "eeoc_charges_by_industry.xlsx"
df_ind = pd.read_excel(INDUSTRY_PATH, sheet_name="Disability", header=3)

# Compare 2008 and 2013 disability charge counts by NAICS sector
# to identify which industries absorbed the post-ADAAA surge.
df_ind_long = df_ind.melt(
    id_vars=["NAICS Sector", "Industry"], var_name="year", value_name="charges"
)
df_ind_long["year"] = pd.to_numeric(df_ind_long["year"], errors="coerce")
df_ind_long = df_ind_long.dropna(subset=["year"])
df_ind_long["charges"] = pd.to_numeric(df_ind_long["charges"], errors="coerce")

pre = df_ind_long[df_ind_long["year"] == 2008].set_index("Industry")["charges"]
post = df_ind_long[df_ind_long["year"] == 2013].set_index("Industry")["charges"]
pre = pre[pre > 0]  # skip industries with no 2008 charges to avoid dividing by zero
change = ((post - pre) / pre * 100).dropna().sort_values(ascending=False)
print("Industries with largest disability charge increase 2008-2013:")
print(change.head(10).round(1))
```

The post-2009 disability charge data reveals a pattern consistent with the ADAAA's expanded definition of disability. The manufacturing and transportation and warehousing sectors, industries with physical job demands where employers historically relied on narrow definitions of disability to avoid accommodation obligations, show the steepest charge increases. Healthcare and social assistance, where disability-related issues often arise in the context of direct care workers, also shows sustained elevated charge volumes.
Cross-reference: NLRB, OSHA, and OFCCP
EEOC charge data produces its strongest analytical results when combined with enforcement records from three other agencies that operate in adjacent legal territory.
NLRB unfair labor practice cases
Retaliation for union activity frequently coincides with EEOC charges alleging retaliation for protected activity. An employer facing a union organizing campaign who simultaneously fires several workers who were union supporters will often generate both an NLRB Section 8(a)(3) charge (discrimination based on union membership) and an EEOC retaliation charge (for complaining about working conditions or filing a prior EEOC charge). The NLRB case management system is publicly searchable at nlrb.gov, and bulk case data is available via the NLRB data download. Matching employer names across NLRB ULP cases and EEOC charges by date range surfaces employers who face coordinated labor and civil rights enforcement pressure simultaneously.
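Matching by employer and date range can be sketched as a join followed by a date-gap filter. Everything below is illustrative: the `employer_key` column assumes names have already been normalized to a shared key, the rows are made up, and the 180-day window is an arbitrary assumption to tune against the data.

```python
import pandas as pd

# Hypothetical, already-normalized employer keys and filing dates.
eeoc = pd.DataFrame({
    "employer_key": ["acme stores", "acme stores", "zenith freight"],
    "eeoc_filed": pd.to_datetime(["2022-03-01", "2023-08-15", "2022-05-20"]),
})
nlrb = pd.DataFrame({
    "employer_key": ["acme stores", "zenith freight"],
    "nlrb_filed": pd.to_datetime(["2022-04-10", "2020-01-05"]),
})

WINDOW_DAYS = 180  # assumption: treat filings within ~6 months as concurrent

# Pair every EEOC charge with every NLRB case for the same employer,
# then keep pairs whose filing dates fall within the window.
pairs = eeoc.merge(nlrb, on="employer_key")
pairs["gap_days"] = (pairs["eeoc_filed"] - pairs["nlrb_filed"]).dt.days.abs()
overlap = pairs[pairs["gap_days"] <= WINDOW_DAYS]

employers_with_overlap = sorted(overlap["employer_key"].unique())
print(overlap)
print(employers_with_overlap)
```

The join produces every charge-case pair per employer, which can grow quickly for large respondents. Filtering each dataset to the study period first keeps it manageable.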
OSHA Section 11(c) retaliation complaints
Section 11(c) of the Occupational Safety and Health Act prohibits retaliation against workers who report safety violations. OSHA investigates Section 11(c) complaints under a separate administrative process, and the complaint data is obtainable through FOIA from OSHA's Integrated Management Information System (IMIS). Workers who experienced both unsafe conditions and retaliation often file both an OSHA 11(c) complaint and an EEOC charge simultaneously, particularly in manufacturing and construction sectors. The overlap is most visible at the employer level: companies with elevated EEOC retaliation charge rates that also appear frequently in OSHA 11(c) data are expressing a consistent organizational pattern.
DOL OFCCP compliance evaluations
Federal contractors with 50 or more employees and federal contracts of $50,000 or more are subject to affirmative action requirements under Executive Order 11246 and Section 503 of the Rehabilitation Act. The Department of Labor's Office of Federal Contract Compliance Programs conducts compliance evaluations — desk audits and onsite reviews — and pursues conciliation agreements when it finds systemic compensation disparities or workforce underutilization. OFCCP conciliation agreement data is public and includes the employer name, the contract number, the violation type, and the monetary remedy. Joining OFCCP conciliation data to EEOC charge data by employer name identifies federal contractors whose EEOC charge profiles are reinforced by OFCCP enforcement findings: the same demographic disparities visible in the EEOC charge record are also visible in the workforce composition data the OFCCP reviews in its compliance evaluations.
The OFCCP also makes available a database of compliance evaluations by establishment at dol.gov/agencies/ofccp, including whether evaluations were closed with violations or without. This is a lower-bar indicator than a conciliation agreement: a “closure with violations” means the OFCCP found problems but did not pursue formal enforcement. Employers with frequent OFCCP violations and elevated EEOC charge rates are the intersection the pattern-or-practice framework is designed to address.
Limitations of the charge record as an enforcement signal
The EEOC charge record is a floor on employment discrimination, not a ceiling. Several structural features of the administrative process cause systematic undercounting:
- Mandatory arbitration agreements. Many large employers require employees to sign arbitration agreements as a condition of employment, waiving the right to pursue discrimination claims in court. Workers subject to these agreements may still file EEOC charges — the right to file a charge cannot be waived — but if the EEOC issues a right-to-sue letter, the worker cannot litigate in federal court; they must arbitrate. Workers who understand that their arbitration clause limits their post-EEOC options may be less likely to file in the first place.
- Geographic concentration of charge filing. EEOC district office location shapes charge filing patterns. Workers far from an EEOC office, or in states with weak state fair employment agencies, file at lower rates. The charge-per-employee rate varies enormously across states and sectors for reasons that reflect access to the process as much as underlying discrimination rates.
- Documentation requirements favor sophisticated claimants. Workers who are able to preserve documentary evidence (emails, performance reviews, pay stubs) and who can articulate a legal theory of discrimination produce charges that are more likely to survive initial screening. Low-wage workers in industries with limited documentation trails, including agricultural, domestic, and gig-economy work, are structurally disadvantaged at the charge filing stage.
- Employer size thresholds exclude large swaths of employment. The 15-employee threshold for Title VII and ADA leaves uncovered roughly 11% of private-sector workers employed by the smallest firms. These workers have remedies only if their state has a broader fair employment practices law.
Where to get the data
The aggregate charge statistics are freely downloadable from the EEOC website. The charge-level dataset requires a FOIA request to the EEOC's FOIA office at foia.eeoc.gov. Prior FOIA releases obtained by ProPublica and other news organizations are available through their published data repositories and through PACER if they were produced in litigation. The EEO-1 company-level data released after the 2022 FOIA fight is available through the EEOC's employer information report portal at eeocdata.org/pdfs/2016-EEO-1-Data-Release-Notes.pdf and subsequent release notes.
The EEOC also publishes a separate data resource at eeocdata.org covering workforce statistics from EEO-1 reports at the aggregate level, organized by industry, metropolitan area, and job category. This site is the companion to the charge statistics and enables the charge-rate-per-employee computation when company-level EEO-1 data is not available: you can benchmark a sector's charge rate against its reported workforce demographic composition to ask whether sectors with lower minority or female workforce representation also generate different patterns of discrimination charges by basis.
For researchers seeking to replicate or extend prior investigative work, the best starting point is ProPublica's EEOC data documentation from their 2013 investigation and the Reveal/CIR reporting on EEO-1 data. Both organizations published detailed methodology notes on the entity resolution challenges and the meaning of each field. The technical problems — employer name normalization, joining charge data to EEO-1 across subsidiary structures, distinguishing individual from systemic enforcement findings — are well-documented at this point. The data, the tools, and the legal framework are all available. What the charge record requires is the analytical attention it rarely receives.