
CFPB Consumer Complaint Database: The Federal Record Behind 3 Million Financial Product Complaints

May 25, 2026 · 14 min read · AI Analytics

CFPB · Consumer Finance · Complaints · Banking · Federal Data

The Consumer Financial Protection Bureau complaint database contains every consumer complaint submitted to the CFPB since 2012: more than 3 million complaints about mortgages, credit cards, student loans, debt collection, and credit reporting. Each record carries the company response, the resolution outcome, and an optional consumer narrative. That makes it the most comprehensive federal record of retail financial product failures and the primary data source for consumer finance enforcement prioritization.

What the CFPB complaint database is

The Consumer Financial Protection Bureau was created by Title X of the Dodd-Frank Wall Street Reform and Consumer Protection Act, signed into law on July 21, 2010. Dodd-Frank vested the CFPB with supervisory and enforcement authority over banks with more than $10 billion in assets, nonbank mortgage originators, payday lenders, private student lenders, debt collectors, credit reporting agencies, and other providers of consumer financial products and services. Among its statutory mandates was the collection of consumer complaints and the creation of a public-facing complaint database.

The CFPB launched its complaint intake portal in July 2011, initially accepting complaints about credit cards only. Mortgage complaints were added in December 2011, and bank accounts and services in March 2012. The public database, which makes individual complaint records accessible to anyone, went live in June 2012. Credit reporting was added in October 2012, and by 2013 the database also covered student loans, vehicle loans, consumer loans, and money transfers. Debt collection followed in July 2013 and prepaid cards in 2014. The database has grown continuously since, reaching approximately 3 million published complaints as of this writing.

The legal basis for the public database sits in 12 U.S.C. § 5493(b)(3), which requires the CFPB to “facilitate the centralized collection of, monitoring of, and response to consumer complaints regarding consumer financial products or services.” The CFPB's implementing policy interpreting that authority to permit public disclosure has been challenged by industry but has survived each challenge; the database remains fully public with no registration requirement.

The database serves several distinct audiences. Regulators use complaint volume and patterns to prioritize supervisory examinations and enforcement investigations. Researchers use it to study consumer harm, product design failures, and racial disparities in service quality. Journalists use it to identify companies with anomalous complaint rates. Consumers use it to check whether their complaint experience with a company is typical or exceptional. Companies use it to monitor competitor complaint rates and benchmark their own performance.

Complaint lifecycle

A complaint enters the database through a defined sequence of steps. The CFPB describes this as the complaint process, and each stage leaves a record in the published data.

Submission. A consumer submits a complaint via the CFPB website at consumerfinance.gov, by phone at 855-411-CFPB, by mail, or by fax. The CFPB screens each submission to verify that it involves a financial product or service within the Bureau's jurisdiction and that it contains sufficient information to route to the company. Complaints outside CFPB jurisdiction (e.g., about healthcare billing or retail purchases not involving a financial product) are redirected to the appropriate agency.

Routing. Accepted complaints are routed to the named company, typically within one business day via the CFPB's secure company portal. The company receives the full complaint text, the product and issue classification, the consumer's contact information, and any documents the consumer attached. The CFPB also notifies relevant state and federal agencies with oversight authority over the company.

Company response window. Companies are expected to respond within 15 calendar days of receiving the complaint. A response must include the company's characterization of the complaint and the action taken or proposed. If the company needs more than 15 days to resolve the matter, it may provide an interim response and then a final response, with a hard deadline of 60 calendar days. The CFPB tracks whether each response was timely and publishes that indicator in the database as a binary “Yes / No” field.
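The two windows described above reduce to simple date arithmetic. The following is a minimal sketch, assuming both windows are counted in calendar days starting from the date the complaint was sent to the company; the function names and the three labels are hypothetical, and the published timely indicator remains the CFPB's own determination rather than something this code reproduces.

```python
from datetime import date, timedelta

INITIAL_WINDOW_DAYS = 15  # initial response window described in the text
FINAL_WINDOW_DAYS = 60    # outer limit for a final response


def response_deadlines(sent_to_company: date) -> tuple[date, date]:
    """Return (initial_due, final_due), counted in calendar days.

    Assumption: day 0 is the date the complaint was sent to the company.
    """
    return (
        sent_to_company + timedelta(days=INITIAL_WINDOW_DAYS),
        sent_to_company + timedelta(days=FINAL_WINDOW_DAYS),
    )


def classify_response(sent_to_company: date, responded_on: date) -> str:
    """Label a response relative to the two windows (labels are illustrative)."""
    initial_due, final_due = response_deadlines(sent_to_company)
    if responded_on <= initial_due:
        return "within 15 days"
    if responded_on <= final_due:
        return "within 60 days"
    return "past 60 days"
```

Applied to the date_sent_to_company field, this gives a rough way to see how close to each deadline a hypothetical response date would fall.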

Publication. The CFPB publishes complaints in the database approximately 15 days after the company responds, or 15 days after the company has had the complaint for 60 days — whichever comes first. This means the published database typically lags the current date by about two to three months for recently filed complaints. The published record includes the product, issue, company name, company response type, timely response indicator, state, ZIP code (sometimes masked to three digits), date received, and date sent to company.

Consumer feedback. After the company responds, the CFPB notifies the consumer and gives them 60 days to review the response and indicate whether they are satisfied. A consumer who believes the company's response did not address their complaint can “dispute” the resolution. This dispute indicator is published alongside the response record and is the primary measure of consumer dissatisfaction with company handling.

Product and issue taxonomy

The CFPB classifies every complaint into a two-level product hierarchy and a two-level issue hierarchy. The product taxonomy has been revised several times since 2011; the current version consolidates earlier categories into 12 top-level product groups.

Product category	Representative sub-products
Credit reporting / personal consumer reports	Credit reporting, Background check, Personal financial data
Debt collection	Credit card debt, Medical debt, Auto debt, Mortgage debt, I do not know
Mortgage	Conventional home mortgage, FHA mortgage, VA mortgage, Home equity loan or line
Credit card or prepaid card	General-purpose credit card, Store credit card, Government benefit card, Payroll card
Checking or savings account	Checking account, Savings account, CD (certificate of deposit)
Student loan	Federal student loan servicing, Private student loan
Vehicle loan or lease	Loan, Lease, Title loan
Money transfer / virtual currency / money service	Domestic wire transfer, International money transfer, Virtual currency
Personal loan	Installment loan, Line of credit, Pawn loan
Payday loan / rent-to-own / title loan	Payday loan, Installment loan (payday), Rent-to-own
Credit reporting — other	Employment background check, Rental background check, Other personal consumer report
Debt or credit management	Debt settlement, Credit repair services, Mortgage modification or foreclosure avoidance

The issue taxonomy contains approximately 76 top-level issue types, each with multiple sub-issues. Issues vary by product; a credit reporting complaint can carry issues like “Incorrect information on your report” (the most common issue in the entire database by a large margin), “Problem with a credit reporting company's investigation into an existing problem,” “Improper use of your report,” or “Unable to get your credit report or credit score.” A debt collection complaint might carry “Attempts to collect debt not owed,” “Communication tactics,” “False statements or representation,” or “Took or threatened to take negative or legal action.” Issue codes are standardized across companies within a product category, enabling cross-company comparison of complaint type distribution.

Company coverage

More than 4,700 distinct companies appear in the CFPB complaint database. The CFPB normalizes company names when routing complaints — subsidiary names are typically mapped to a parent entity to ensure consistency. Volume is heavily concentrated in a small number of large institutions. The three national credit bureaus (Equifax, Experian, and TransUnion) alone account for a plurality of all complaints in the database, reflecting both the frequency of credit report errors and the scale of consumer contact with these institutions.

Company	Primary product category	Est. total complaints
Equifax, Inc.	Credit reporting	~760,000
Experian Information Solutions Inc.	Credit reporting	~680,000
TransUnion Intermediate Holdings, Inc.	Credit reporting	~530,000
Ocwen Financial Corporation	Mortgage servicing	~90,000
Navient Solutions, LLC	Student loan servicing	~85,000
Bank of America, N.A.	Credit card / mortgage / checking	~82,000
Wells Fargo & Company	Mortgage / credit card / checking	~80,000
JPMorgan Chase & Co.	Credit card / mortgage / checking	~75,000
Citibank, N.A.	Credit card / checking	~60,000
Portfolio Recovery Associates, LLC	Debt collection	~55,000

Raw complaint counts are not a direct measure of company quality. Large companies serve more consumers and therefore receive more complaints in absolute terms. The CFPB does not publish complaint rates normalized by customer base size, because companies do not publicly disclose active customer counts. Analysts who wish to compute normalized rates must independently estimate denominator values from FDIC call report data, annual reports, or regulatory filings. The CFPB itself uses complaint data alongside supervisory examination findings, not as a standalone performance metric.
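Once a denominator has been estimated, the normalization itself is a one-line calculation. The sketch below assumes hypothetical company names, complaint counts, and customer counts; none of these figures come from the CFPB or from any regulatory filing, and the quality of the result depends entirely on the denominator estimate.

```python
def complaints_per_100k(complaints: int, customers: int) -> float:
    """Complaints per 100,000 customers for one company."""
    if customers <= 0:
        raise ValueError("customers must be positive")
    return complaints * 100_000 / customers


# Hypothetical figures, for illustration only: (complaints, estimated customers).
example = {
    "Bank A": (8_000, 40_000_000),
    "Bank B": (1_200, 2_000_000),
}

rates = {name: complaints_per_100k(c, n) for name, (c, n) in example.items()}

# Rank by normalized rate rather than by raw count.
ranked = sorted(rates, key=rates.get, reverse=True)
for name in ranked:
    print(f"{name}: {rates[name]:.1f} complaints per 100k customers")
```

In this made-up example the smaller company ranks first on the normalized rate despite having far fewer complaints in absolute terms, which is the distortion raw counts hide.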

Consumer narratives

When submitting a complaint, consumers are given the option to include a free-text narrative describing what happened. The CFPB publishes these narratives only when the consumer explicitly provides consent, using a checkbox labeled “Yes, I consent to publishing my narrative.” Approximately 60 percent of complaints in the database include a published narrative.

Narratives are the most analytically rich component of the database. While the product, issue, and response fields are structured categorical data, the narrative captures the specifics of the consumer experience — the name of the representative who gave incorrect information, the exact dollar amounts at issue, the number of dispute attempts made, the specific language in a collection letter. Narratives have been used in academic research to train text classifiers for complaint issue detection, to identify novel complaint patterns before they appear in structured fields, and to study linguistic characteristics of complaints from different demographic groups.

The CFPB scrubs narratives before publication. The Bureau removes details that could allow a reader to identify the consumer (names of specific employees, internal account numbers not needed for context) and redacts personal identifying information such as Social Security numbers, full account numbers, and precise dates of birth. Narratives are otherwise published as submitted; the CFPB does not correct spelling, grammar, or factual claims.

The presence of a narrative correlates with complaint complexity and consumer engagement. Complaints with narratives tend to have higher dispute rates, suggesting that consumers who took the time to write a narrative were more invested in the outcome and more likely to challenge an unsatisfactory response. Complaints with narratives also tend to involve more serious alleged harm — fraudulent accounts, identity theft, prolonged servicer errors, and wrongful collections appear at higher rates in narrative complaints than in complaints submitted without one.

Geographic and demographic patterns

Every published complaint includes a two-letter state code and, where the consumer provided it and the CFPB determined that publication would not identify the consumer, a five-digit ZIP code. For less populous ZIPs, the CFPB truncates to three digits and appends “XX” to signal that the full code has been masked. This produces a small number of unresolvable geographic observations but preserves state-level analysis for the entire dataset.
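Code that consumes the ZIP field needs to handle both the full and the masked form. The sketch below assumes the masked form is exactly three digits followed by "XX", as described above; the function name and the return shape are hypothetical.

```python
import re

_FULL_ZIP = re.compile(r"\d{5}")
_MASKED_ZIP = re.compile(r"(\d{3})XX", re.IGNORECASE)


def parse_zip(raw):
    """Return (zip3, zip5, masked) for a published ZIP value.

    zip5 is None when the value is masked; all three are empty/False
    when the value is missing or in an unexpected format.
    """
    value = (raw or "").strip()
    if _FULL_ZIP.fullmatch(value):
        return value[:3], value, False
    match = _MASKED_ZIP.fullmatch(value)
    if match:
        return match.group(1), None, True
    return None, None, False
```

Keeping the three-digit prefix for both forms allows a coarser but consistent geographic grouping when full and masked records are mixed.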

Per-capita complaint rates vary substantially by state, even after adjusting for product mix differences. States with large populations of consumers carrying subprime credit profiles generate more credit-reporting and debt-collection complaints per capita than states with wealthier consumer bases. States with historically large concentrations of subprime mortgage originations generated disproportionate mortgage complaint volume during the period 2012–2016. Florida, California, Georgia, Texas, and New York typically rank in the top five states by absolute complaint volume; on a per-capita basis, Delaware, Nevada, and Georgia often outpace larger states due to product mix and consumer demographics.

The database does not collect or publish race, ethnicity, age, income, or other demographic characteristics of the complaining consumer. Demographic analysis requires linking complaint ZIP codes to American Community Survey area-level data (for example, ZIP Code Tabulation Areas) as a proxy. Researchers using this method have documented that ZIP codes with higher proportions of Black and Hispanic residents generate higher per-capita complaint rates for debt collection and credit reporting products, consistent with documented patterns of predatory collection practices targeted at minority communities.

ZIP-level analysis is complicated by the masking policy described above and by the fact that the ZIP is the consumer's mailing address at the time of complaint, which may not correspond to the location where the financial product was originated or where the harm occurred. A consumer who moved from Georgia to Ohio after taking out a Georgia mortgage will appear in the Ohio ZIP. For most research purposes, state-level analysis avoids these complications.

Company response rates and timeliness

Every complaint in the published database carries a “timely response” indicator reflecting whether the company responded within 15 calendar days of receiving the complaint. Across the full database, approximately 97 percent of complaints have a timely response. The CFPB treats untimely responses as a supervisory concern; persistent untimely response rates flag companies for closer examination during supervisory reviews.

Product category	Est. total complaints	Avg. timely response
Credit reporting / personal consumer reports	~1,500,000	98%
Debt collection	~440,000	97%
Mortgage	~370,000	97%
Credit card or prepaid card	~230,000	98%
Checking or savings account	~180,000	97%
Student loan	~90,000	97%
Vehicle loan or lease	~60,000	97%
Money transfer / virtual currency	~50,000	96%

Company response types are categorical and appear in the published record as one of: “Closed with explanation,” “Closed with monetary relief,” “Closed with non-monetary relief,” “Closed without relief,” “Closed with relief,” “In progress,” and “Untimely response.” The dominant response type across the database is “Closed with explanation,” which accounts for roughly 60 to 70 percent of all closed complaints depending on the product category. “Closed with monetary relief” typically accounts for 10 to 15 percent and represents the most commercially significant outcome for consumers.

The consumer dispute rate measures how often consumers challenged the company response as inadequate. Overall dispute rates run approximately 18 to 22 percent across the database, but vary substantially by product. Mortgage complaints have historically carried higher dispute rates than credit card complaints, reflecting the higher stakes of the underlying dispute and the complexity of servicer error resolution. Credit reporting dispute rates have varied with the surge dynamics described in the next section.

The credit reporting complaint surge of 2020–2023

The single most dramatic trend in the complaint database since its inception is the extraordinary growth of credit reporting complaints between 2020 and 2023. Before 2020, credit reporting was already the largest product category in the database. The pandemic years triggered a surge that dwarfed prior volume and reshaped the database's composition. In 2020, the CFPB received approximately 282,000 credit reporting complaints. By 2021 this had risen to approximately 465,000. In 2022 the database recorded more than 500,000 credit reporting complaints in a single year — a category that had received fewer than 60,000 complaints in 2015.

Equifax, Experian, and TransUnion each received more than 200,000 complaints per year at the peak. In 2022, Equifax alone received approximately 290,000 complaints — more than the entire database had received across all companies and all products in its first two years of operation. The concentration of volume at the three national bureaus made credit reporting the defining characteristic of the database during this period.

The surge had identifiable structural causes. The CARES Act of 2020 required creditors to report accounts placed in forbearance as current rather than delinquent, but implementation errors by furnishers and bureaus created a wave of inaccurate negative tradelines. CFPB research published in 2022 documented that hundreds of thousands of consumers who had entered COVID-19 forbearance programs found incorrect derogatory marks on their credit reports as a result of furnisher reporting errors. The same period saw rapid growth in credit repair service providers — both legitimate and fraudulent — that incentivized consumers to mass-file disputes through the CFPB complaint portal as a dispute escalation mechanism.

The CFPB had already taken enforcement action against Equifax before the surge, over the 2017 data breach that exposed 147 million consumers. Its response to the surge itself centered on heightened supervisory attention to bureau dispute processing practices. In 2022, the CFPB issued guidance characterizing certain bureau practices for handling mass disputes, including automatic deletion without investigation of disputes submitted by credit repair companies, as potentially unfair or deceptive. The bureaus disputed this characterization. Complaint volumes declined modestly from their 2022 peak but remained elevated through 2023 and 2024 relative to pre-pandemic levels.

The surge illustrated both the database's value and its limitations as an enforcement signal. The volume spike was a genuine early warning of systematic bureau error at scale. But the signal was complicated by the noise introduced by credit repair companies filing complaints on behalf of consumers who had not independently identified a specific inaccuracy — a form of database use that the CFPB has not definitively prohibited but has discouraged. Analysts working with 2020–2023 credit reporting data should treat the surge period as structurally distinct from prior years.

Data access: CFPB public API and bulk download

The CFPB complaint database is accessible through two primary channels: a REST API and a bulk CSV download.

REST API. The API is available at https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/. It requires no API key or authentication. The API supports filtering by company name, product, issue, date range, state, ZIP code, response type, and several other fields. Results are returned as JSON with an Elasticsearch-style response structure containing a hits object with total and hits arrays. Each hit's _source field contains the complaint record. Pagination uses frm (offset) and size parameters; the maximum page size is 100 records. Aggregation queries are supported via the same endpoint when no_aggs is omitted. The API has no documented rate limit but responds slowly to large paginated requests; adding 200 to 300 milliseconds between requests is sufficient to avoid server-side throttling.

Bulk download. The full database is available as a CSV download from the CFPB's consumer complaint search page at consumerfinance.gov/data-research/consumer-complaints/. The CSV export returns all published complaints matching the current search criteria; without any filters applied, this is the full database of 3+ million records. The CSV is UTF-8 encoded and contains the same fields available via the API; the consumer narrative column is populated only for complaints where the consumer consented to publication. File size for the full database is approximately 1.5 GB uncompressed. For most analytical purposes, downloading the full CSV and querying it locally is faster than paginating through the API.
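A file of that size is best read as a stream rather than loaded into memory at once. The following sketch uses only the standard library and assumes the CSV header uses the column names "Company" and "Consumer complaint narrative"; those names are an assumption and should be checked against the header row of the downloaded file. The function name summarize_csv is hypothetical.

```python
import csv
from collections import Counter
from typing import TextIO

# Assumed column headers; confirm against the first row of the actual export.
COMPANY_COL = "Company"
NARRATIVE_COL = "Consumer complaint narrative"


def summarize_csv(handle: TextIO, top_n: int = 5):
    """Stream a complaints CSV and return (total, with_narrative, top_companies).

    Rows are processed one at a time, so memory use stays flat
    regardless of file size.
    """
    reader = csv.DictReader(handle)
    by_company: Counter = Counter()
    total = 0
    with_narrative = 0
    for row in reader:
        total += 1
        by_company[(row.get(COMPANY_COL) or "Unknown").strip()] += 1
        if (row.get(NARRATIVE_COL) or "").strip():
            with_narrative += 1
    return total, with_narrative, by_company.most_common(top_n)
```

To run it against a download, open the file with open(path, newline="", encoding="utf-8") and pass the handle to summarize_csv; the file name depends on how the export was saved.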

Consumer Complaint Database API v2. The CFPB has piloted an updated API endpoint at api.consumerfinance.gov/data-research/consumer-complaints/search (note: different domain, no /api/v1/ path). This endpoint uses similar parameters but has a different JSON response structure. Both endpoints return the same underlying complaint data; the v1 endpoint at consumerfinance.gov is the current production version and is more reliable for bulk access.

Data dictionary. The CFPB publishes a data dictionary covering all published fields at consumerfinance.gov/data-research/consumer-complaints/ under the “About this data” section. The dictionary describes field names, allowed values, and the encoding for masked ZIP codes and other partially redacted values.

Python workflow: querying the CFPB API for complaints by company and product

The following script queries the CFPB complaint API for Equifax credit reporting complaints, analyzes the top issues and company response types, displays geographic distribution, surfaces a sample of consumer narratives, and then runs a comparison query to retrieve aggregate complaint totals for all three national credit bureaus across the same date range. No third-party libraries are required; only the Python standard library is used.

import json
import time
import urllib.request
import urllib.parse
from collections import defaultdict

# ---------------------------------------------------------------------------
# CFPB Consumer Complaint Database -- Equifax Credit Reporting Analysis
#
# Public API endpoint (no API key required):
#   https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/
#
# Key query parameters:
#   company          -- exact company name filter (URL-encoded)
#   product          -- product filter (partial match supported)
#   date_received_min -- ISO date, inclusive lower bound (YYYY-MM-DD)
#   date_received_max -- ISO date, inclusive upper bound (YYYY-MM-DD)
#   field            -- fields to return (comma-separated)
#   size             -- records per page (max 100 for this endpoint)
#   frm              -- pagination offset
#   sort             -- sort order (e.g. "created_date_desc")
#   no_aggs          -- set to True to skip aggregation computation
#
# Response structure:
#   {
#     "hits": {
#       "total": N,
#       "hits": [ { "_source": { ...complaint fields... } }, ... ]
#     },
#     "aggregations": { ... }  # if no_aggs not set
#   }
#
# Key complaint fields (_source):
#   complaint_id          -- unique CFPB complaint identifier
#   product               -- top-level product (e.g. "Credit reporting")
#   sub_product           -- sub-product (e.g. "Credit reporting")
#   issue                 -- primary issue type
#   sub_issue             -- sub-issue detail
#   company               -- company name as normalized by CFPB
#   company_response      -- company response to consumer
#   timely                -- "Yes"/"No" -- responded within 15 calendar days
#   consumer_disputed     -- "Yes"/"No"/"N/A" -- consumer disputed resolution
#   date_received         -- date CFPB received the complaint (YYYY-MM-DD)
#   date_sent_to_company  -- date forwarded to company
#   consumer_consent_provided -- whether narrative is published
#   complaint_what_happened   -- consumer narrative (if consent given)
#   state                     -- two-letter state code
#   zip_code                  -- 5-digit ZIP (may be masked to 3 digits)
# ---------------------------------------------------------------------------

BASE_URL = (
    "https://www.consumerfinance.gov"
    "/data-research/consumer-complaints/search/api/v1/"
)
PAGE_SIZE = 100      # max allowed per request for this endpoint
REQUEST_DELAY = 0.3  # seconds between paginated requests

COMPANY   = "EQUIFAX, INC."
PRODUCT   = "Credit reporting, credit repair services, or other personal consumer reports"
DATE_FROM = "2022-01-01"
DATE_TO   = "2023-12-31"


def fetch_complaints(company: str, product: str,
                     date_from: str, date_to: str) -> list[dict]:
    """Paginate through all complaints for a given company/product/date range."""
    all_hits: list[dict] = []
    offset = 0

    params_base = {
        "company": company,
        "product": product,
        "date_received_min": date_from,
        "date_received_max": date_to,
        "size": str(PAGE_SIZE),
        "sort": "created_date_desc",
        "no_aggs": "true",
    }

    print(f"Fetching complaints: company={company}, product={product[:30]}...")
    print(f"  Date range: {date_from} to {date_to}")

    while True:
        params = dict(params_base)
        params["frm"] = str(offset)
        url = BASE_URL + "?" + urllib.parse.urlencode(params)

        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                data = json.loads(resp.read().decode("utf-8"))
        except Exception as exc:
            print(f"  Request error at offset {offset}: {exc}")
            break

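        hits = data.get("hits", {}).get("hits", [])
        total = data.get("hits", {}).get("total", 0)
        # Some Elasticsearch-style responses wrap the count as {"value": N, ...};
        # unwrap it so the comparisons and formatting below receive an int.
        if isinstance(total, dict):
            total = total.get("value", 0)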

        if not hits:
            break

        all_hits.extend(h["_source"] for h in hits)

        if offset == 0:
            print(f"  Total matching complaints: {total:,}")

        offset += PAGE_SIZE
        if offset >= total:
            break

        if offset % 1000 == 0:
            print(f"  Fetched {offset:,} / {total:,}...")

        time.sleep(REQUEST_DELAY)

    print(f"  Retrieved {len(all_hits):,} complaint records\n")
    return all_hits


def analyze_issues(complaints: list[dict]) -> None:
    """Tabulate top issues and sub-issues."""
    issue_counts: dict[str, int] = defaultdict(int)
    sub_issue_counts: dict[str, int] = defaultdict(int)

    for c in complaints:
        issue = c.get("issue") or "Unknown"
        sub   = c.get("sub_issue") or "Not specified"
        issue_counts[issue] += 1
        sub_issue_counts[f"{issue} > {sub}"] += 1

    total = len(complaints) or 1

    print("=== Top 15 Issues ===")
    print(f"{'Issue':<55} {'Count':>8} {'Pct':>7}")
    print("-" * 73)
    for issue, cnt in sorted(issue_counts.items(), key=lambda x: x[1], reverse=True)[:15]:
        print(f"{issue[:54]:<55} {cnt:>8,} {cnt / total * 100:>6.1f}%")
    print()

    print("=== Top 10 Sub-Issues ===")
    print(f"{'Sub-Issue':<70} {'Count':>8}")
    print("-" * 80)
    for key, cnt in sorted(sub_issue_counts.items(), key=lambda x: x[1], reverse=True)[:10]:
        print(f"{key[:69]:<70} {cnt:>8,}")
    print()


def analyze_responses(complaints: list[dict]) -> None:
    """Tabulate company response types and timeliness."""
    response_counts: dict[str, int] = defaultdict(int)
    timely_counts:   dict[str, int] = defaultdict(int)
    disputed_total  = 0
    disputed_yes    = 0

    for c in complaints:
        resp    = c.get("company_response") or "No response"
        timely  = c.get("timely", "")
        dispute = c.get("consumer_disputed", "")

        response_counts[resp] += 1
        if timely in ("Yes", "No"):
            timely_counts[timely] += 1
        if dispute in ("Yes", "No"):
            disputed_total += 1
            if dispute == "Yes":
                disputed_yes += 1

    total = len(complaints) or 1
    timely_total = sum(timely_counts.values()) or 1

    print("=== Company Response Types ===")
    print(f"{'Response':<55} {'Count':>8} {'Pct':>7}")
    print("-" * 73)
    for resp, cnt in sorted(response_counts.items(), key=lambda x: x[1], reverse=True):
        print(f"{resp[:54]:<55} {cnt:>8,} {cnt / total * 100:>6.1f}%")
    print()

    timely_yes = timely_counts.get("Yes", 0)
    print("=== Timely Response Rate ===")
    print(f"  Timely (within 15 days): {timely_yes:,}  ({timely_yes / timely_total * 100:.1f}%)")
    print(f"  Not timely:              {timely_counts.get('No', 0):,}  "
          f"({timely_counts.get('No', 0) / timely_total * 100:.1f}%)")
    print()

    if disputed_total:
        print("=== Consumer Dispute Rate ===")
        print(f"  Disputed:     {disputed_yes:,}  ({disputed_yes / disputed_total * 100:.1f}%)")
        print(f"  Not disputed: {disputed_total - disputed_yes:,}  "
              f"({(disputed_total - disputed_yes) / disputed_total * 100:.1f}%)")
        print()


def analyze_narratives(complaints: list[dict], sample_n: int = 5) -> None:
    """Show a sample of consumer narratives where consent was provided."""
    narratives = [
        c for c in complaints
        # "or ''" guards against records where the field is present but null
        if (c.get("complaint_what_happened") or "").strip()
        and c.get("consumer_consent_provided") == "Consent provided"
    ]

    print("=== Consumer Narratives ===")
    print(f"  Complaints with published narrative: {len(narratives):,} "
          f"({len(narratives) / (len(complaints) or 1) * 100:.1f}%)")
    print()

    for i, c in enumerate(narratives[:sample_n], 1):
        cid   = c.get("complaint_id", "N/A")
        date  = c.get("date_received", "N/A")
        issue = c.get("issue", "N/A")
        text  = (c.get("complaint_what_happened") or "").strip()
        # Truncate long narratives for display
        snippet = text[:300] + "..." if len(text) > 300 else text
        print(f"--- Complaint {i}: ID={cid}  Date={date}  Issue={issue} ---")
        print(snippet)
        print()


def analyze_geography(complaints: list[dict], top_n: int = 15) -> None:
    """Complaint count by state."""
    state_counts: dict[str, int] = defaultdict(int)
    for c in complaints:
        st = c.get("state") or "Unknown"
        state_counts[st] += 1

    total = len(complaints) or 1
    print(f"=== Top {top_n} States by Complaint Volume ===")
    print(f"{'State':<8} {'Count':>10} {'Pct':>7}")
    print("-" * 28)
    for st, cnt in sorted(state_counts.items(), key=lambda x: x[1], reverse=True)[:top_n]:
        print(f"{st:<8} {cnt:>10,} {cnt / total * 100:>6.1f}%")
    print()


# ---------------------------------------------------------------------------
# Main: run all analyses for Equifax credit reporting complaints (2022-2023)
# ---------------------------------------------------------------------------

complaints = fetch_complaints(COMPANY, PRODUCT, DATE_FROM, DATE_TO)

if complaints:
    analyze_issues(complaints)
    analyze_responses(complaints)
    analyze_geography(complaints)
    analyze_narratives(complaints, sample_n=3)
else:
    print("No complaints retrieved -- check parameters and network access.")


# ---------------------------------------------------------------------------
# Bonus: compare Equifax vs Experian vs TransUnion complaint volumes
# in the same period using the aggregation endpoint
# ---------------------------------------------------------------------------

def get_company_total(company: str, product: str,
                      date_from: str, date_to: str) -> int:
    """Return the total complaint count for a company via a size=0 query."""
    params = {
        "company": company,
        "product": product,
        "date_received_min": date_from,
        "date_received_max": date_to,
        "size": "0",
    }
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            data = json.loads(resp.read().decode("utf-8"))
    except Exception as exc:
        print(f"  Error fetching total for {company}: {exc}")
        return 0
    total = data.get("hits", {}).get("total", 0)
    # Some Elasticsearch-style responses wrap the count as {"value": N, ...}.
    if isinstance(total, dict):
        total = total.get("value", 0)
    return total


bureaus = [
    ("EQUIFAX, INC.",           "Equifax"),
    ("EXPERIAN INFORMATION SOLUTIONS INC.", "Experian"),
    ("TRANSUNION INTERMEDIATE HOLDINGS, INC.", "TransUnion"),
]

print("=== Credit Bureau Complaint Comparison (2022-2023) ===")
print(f"{'Bureau':<16} {'Total Complaints':>18}")
print("-" * 36)
for company_key, label in bureaus:
    total = get_company_total(company_key, PRODUCT, DATE_FROM, DATE_TO)
    print(f"{label:<16} {total:>18,}")
    time.sleep(REQUEST_DELAY)
print()

Key implementation notes. The fetch_complaints function handles pagination automatically, logging progress at every 1,000-record interval. The API's maximum size parameter of 100 records per request means that a dataset of 50,000 complaints requires 500 separate HTTP requests; the 300-millisecond delay alone adds about two and a half minutes at that scale, before counting network time. For very large pulls (200,000+ complaints), the bulk CSV download is substantially faster. The get_company_total function uses size=0 to retrieve only the total hit count without fetching individual records, making it efficient for comparison queries across multiple companies.

Company name matching in the API uses exact string matching against the CFPB's normalized company name field. The canonical names can be found by browsing the CFPB's complaint search UI at consumerfinance.gov, selecting a company from the autocomplete suggestions, and noting the exact string used. Common pitfalls include punctuation differences (the API requires EQUIFAX, INC. with a period, not EQUIFAX INC without one) and the use of full legal entity names rather than trade names. Subsidiary complaints may be filed under the subsidiary name even when the CFPB has routed them to the parent; querying both parent and subsidiary names and de-duplicating by complaint ID is the safest approach for comprehensive company-level analysis.
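The parent-plus-subsidiary approach described above comes down to merging several result lists on the complaint identifier. A minimal sketch, assuming each record is a dict with a complaint_id key as in the API's _source objects; the function name merge_by_complaint_id is hypothetical.

```python
def merge_by_complaint_id(*result_sets):
    """Merge lists of complaint records, keeping the first record seen per ID.

    IDs are compared as strings so that 123 and "123" count as the same
    complaint. Records without an ID are skipped rather than guessed at.
    """
    merged = {}
    for records in result_sets:
        for rec in records:
            cid = rec.get("complaint_id")
            if cid is None:
                continue
            merged.setdefault(str(cid), rec)
    return list(merged.values())
```

Passing the parent-name results first means the parent's copy of a record wins whenever the same complaint appears under both names.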

Narrative analysis at scale benefits from standard NLP tooling. After retrieving narratives, a common workflow is to tokenize and vectorize them using scikit-learn's TF-IDF vectorizer, run k-means clustering to identify common complaint themes not captured by the structured issue taxonomy, and then review cluster centroids to assign labels. This approach has been used in published research to identify emerging complaint categories — such as chatbot-driven customer service failures and buy-now-pay-later billing disputes — before those categories appeared in the CFPB's structured product and issue taxonomy.
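To make the weighting step concrete without a third-party dependency, the sketch below computes TF-IDF weights with the standard library only. It is a simplified stand-in, not scikit-learn's implementation: it uses a smoothed inverse document frequency, skips vector normalization and stop-word removal, and omits the clustering step entirely. All function names are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes."""
    return TOKEN.findall(text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} mapping per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1
        vectors.append({
            # term frequency times a smoothed inverse document frequency
            term: (cnt / length)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, cnt in counts.items()
        })
    return vectors


def top_terms(vector: dict[str, float], k: int = 3) -> list[str]:
    """Return the k highest-weighted terms of one document."""
    ranked = sorted(vector.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:k]]
```

Terms that are frequent in one narrative but rare across the collection get the highest weights, which is what lets a later clustering step group narratives by distinctive vocabulary rather than by common words.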