HMDA Mortgage Lender Filings: The Federal Record of Who Reports Under the Home Mortgage Disclosure Act

12 min read · AI Analytics
Tags: CFPB, HMDA, Mortgage, Fair Lending, Federal Data

Every year, thousands of banks, credit unions, and mortgage companies hand the federal government a line-by-line account of who applied for a home loan, who got one, and on what terms—and before any of that loan-level detail makes sense, there has to be a register of which institutions filed in the first place. That register is the HMDA filer panel: roughly 34,700 filer-year records spanning 2018 through 2023, one row per institution per filing year, each naming a lender by its Legal Entity Identifier and recording which federal agency oversees it. It is the index to the most important fair-lending dataset in the United States—the answer to the question that has to be settled before any analysis of mortgage discrimination can begin: who is in the data, and who is not?

This article covers what the filer panel is and how it relates to the loan-level data it indexes. It then turns to the Home Mortgage Disclosure Act, its implementing rule, Regulation C, and the transparency purpose that animates the whole regime; the migration of HMDA oversight from the Federal Reserve to the Consumer Financial Protection Bureau under the Dodd-Frank Act, with the FFIEC coordinating the multi-agency collection; the loan/application register itself and the 2015 final rule that vastly expanded the data points reported; the privacy and re-identification debate that expansion provoked; and the coverage thresholds that decide which institutions must report, along with the litigation that has reshaped them. The second half is practical: how the filer panel keys the loan-level LAR through the Legal Entity Identifier; the analytical uses, from panel turnover and market concentration to fair-lending screening; and a Python workflow that pulls the filer panel and lender aggregations from the FFIEC HMDA Data Browser API. It closes with the caveats every analyst must internalize: coverage gaps, threshold changes over time, and the limits of a registry that records who filed rather than what they did.

What the dataset is

HMDA data comes in two layers, and confusing them is the most common mistake an analyst makes. The first layer is the loan/application register (LAR): one row per mortgage application or loan, the line-level record of an individual borrower's request, the property, the action the lender took, and—since the 2015 rule—dozens of attributes about the applicant and the loan. The second layer, the subject of this piece, is the filer panel: the registry of the institutions that filed those registers. Each panel row corresponds not to a loan but to an institution in a given year—a filer-year record. The panel is assembled from the transmittal sheet each reporter submits with its LAR, the cover document that identifies the filing institution and summarizes its submission.

In our database this registry is stored as the table hmda_filers, with a grain of one row per institution per filing year: a bank that reports in each of six years contributes six rows. The roughly 34,700 filer-year records span filing years 2018 through 2023, the period governed by the modern, expanded Regulation C, in which every filer is identified by its Legal Entity Identifier. The columns identify the filer and describe its reporting posture rather than any individual loan:

lei                    -- Legal Entity Identifier (the global 20-char ID)
respondent_name        -- the filing institution's name
activity_year          -- the calendar year the data covers (filing year)
agency_code            -- the federal supervisor (OCC, FRS, FDIC, NCUA,
                          CFPB, HUD) that oversees this filer
respondent_state       -- the institution's home state
tax_id                 -- employer identification number of the filer
record_count           -- number of LAR rows the institution submitted
edit_status            -- whether the submission passed validation
total_lines            -- transmittal-sheet line count for the filing
quarterly_filer        -- whether the institution also files quarterly
prior_lei              -- predecessor LEI after a merger / identifier change

The lei is the load-bearing column. The Legal Entity Identifier is a twenty-character, globally unique code assigned to a financial entity under an international standard, and HMDA adopted it with the 2015 rule as the permanent key for every filer. It is the join key that ties a panel row to the same institution's loan-level LAR rows, and—because the LEI is used far beyond HMDA—to that institution's identity across other regulatory datasets. The agency_code records which of the federal financial regulators supervises the filer, a consequence of the fact that HMDA is a multi-agency collection rather than a single agency's database. The respondent_name and activity_year make each row human-readable and time-stamped, and record_count turns the panel into a lightweight volume index: even before touching the LAR, the panel tells you how many loan rows each institution reported in each year. The panel, in short, is the table of contents for the loan-level data, and a dataset in its own right about the structure of the mortgage industry that reports.
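Because so much hangs on the lei column, it is worth checking identifiers before joining on them. The sketch below assumes the LEI follows the ISO 17442 layout: 20 uppercase alphanumeric characters whose last two digits are MOD 97-10 check digits. The function names and the sample prefix are hypothetical and used only for illustration; this is a format check, not a lookup against any registry.

```python
import string

# Map each LEI character to its numeric value: digits stay as-is, A=10 ... Z=35.
_VALUES = {c: str(i) for i, c in enumerate(string.digits + string.ascii_uppercase)}


def _to_number(text: str) -> int:
    """Expand letters to their numeric values and read the result as one integer."""
    return int("".join(_VALUES[c] for c in text))


def check_digits(prefix18: str) -> str:
    """Compute the two trailing check digits for an 18-character LEI prefix."""
    return f"{98 - _to_number(prefix18 + '00') % 97:02d}"


def is_valid_lei(lei: str) -> bool:
    """True if `lei` is 20 uppercase alphanumerics that pass the MOD 97-10 check."""
    if len(lei) != 20 or any(c not in _VALUES for c in lei):
        return False
    if not lei[18:].isdigit():
        return False
    return _to_number(lei) % 97 == 1


# Hypothetical 18-character prefix, completed with computed check digits.
sample_prefix = "EXAMPLEBANK0000001"
sample_lei = sample_prefix + check_digits(sample_prefix)
print(sample_lei, is_valid_lei(sample_lei))
```

A check like this catches truncated or mistyped keys early, which matters because a bad LEI silently drops rows in every downstream join.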

HMDA, Regulation C, and the transparency purpose

The Home Mortgage Disclosure Act (HMDA) was enacted in 1975, in the aftermath of a national reckoning with redlining—the practice by which lenders and insurers systematically refused to extend credit in particular neighborhoods, frequently ones defined by the race of their residents, regardless of the creditworthiness of individual applicants. Congress's diagnosis was that the public and regulators could not see the pattern: lending decisions happened one application at a time, behind closed doors, and there was no systematic record of where mortgage credit flowed and where it did not. HMDA's answer was disclosure. Rather than dictating to whom lenders must lend, the statute compels them to report their lending—to make the geographic and, over time, the demographic distribution of mortgage credit visible—on the theory that sunlight is the most effective regulator of discriminatory patterns.

HMDA is implemented through Regulation C, the body of rules that specifies who must report, what they must report, and how. Over its history HMDA has served three intertwined purposes, all written into the statute and the regulation. First, it helps determine whether financial institutions are serving the housing needs of their communities, the transparency function that connects HMDA to the Community Reinvestment Act. Second, it gives public officials information to guide the distribution of public investment to attract private capital where it is needed. Third—and increasingly the dominant use—it assists in identifying possible discriminatory lending patterns and enforcing the antidiscrimination laws. HMDA itself does not prohibit discrimination; it is a disclosure statute. The prohibitions live in the Equal Credit Opportunity Act (ECOA) and the Fair Housing Act, and HMDA data is the principal evidentiary engine that makes enforcement of those laws possible—the dataset regulators and litigants mine to find the disparities that prompt a closer look.

From the Federal Reserve to the CFPB

For most of HMDA's history, rule-writing authority sat with the Federal Reserve Board, which maintained Regulation C. That changed with the Dodd-Frank Wall Street Reform and Consumer Protection Act of 2010, the post-financial-crisis statute that created the Consumer Financial Protection Bureau (CFPB) and consolidated consumer-financial rulemaking authority in it. Dodd-Frank transferred HMDA rulemaking to the CFPB and—significantly—directed the Bureau to expand the data HMDA collects, adding several new data points by statute and authorizing the Bureau to require still more. The crisis had made the mortgage market's opacity intolerable: HMDA as it stood had not captured the pricing and underwriting characteristics that would have exposed the spread of high-cost and nontraditional loans, and Congress wanted the next iteration of the data to see what the last one had missed.

But HMDA was never a single-agency program, and it remains a coordinated, multi-agency collection. The mechanism that holds it together is the Federal Financial Institutions Examination Council (FFIEC), the interagency body whose members include the CFPB, the Office of the Comptroller of the Currency, the Federal Reserve, the Federal Deposit Insurance Corporation, and the National Credit Union Administration. The FFIEC operates the HMDA data collection and publication platform—the system that receives the registers, runs validation, assembles the national dataset, and publishes it. This is why the panel carries an agency_code: a given filer is supervised by whichever federal regulator has jurisdiction over it—national banks by the OCC, state member banks by the Federal Reserve, state nonmember banks by the FDIC, credit unions by the NCUA, and many independent mortgage companies by the CFPB—and that supervisor is the agency that examines the institution for HMDA compliance. The CFPB writes the rule; the FFIEC runs the plumbing; the individual agencies supervise their own filers. The panel is the place where that division of labor becomes visible, one institution at a time.

The loan/application register and the 2015 expansion

The loan/application register is what the panel indexes, and understanding the panel requires understanding what the LAR contains. The original HMDA data was comparatively thin: the geography of the property, the loan type and purpose, the action taken (originated, denied, withdrawn, approved-not-accepted), the amount, and—after a 1989 amendment—the applicant's race, ethnicity, sex, and income. That was enough to map the geographic and demographic distribution of lending, but not enough to control for the legitimate, credit-based factors a lender weighs, which left every disparity open to the rejoinder that it reflected differences in creditworthiness rather than discrimination.

The 2015 final rule, which the CFPB issued under its new Dodd-Frank authority and which took effect for data collected beginning in 2018, transformed the register. It substantially expanded the number of data points, adding fields that for the first time let an analyst account for the pricing and underwriting of a loan. Among the most consequential additions were the rate spread (how far the loan's price sat above a benchmark, the central indicator of a high-cost loan), the applicant's age, the credit score relied on, the debt-to-income ratio, the property value, the loan term and other loan features, and detail about the channel and the loan officer. Taken together, these fields moved HMDA from a map of where credit flowed to a far richer record of on what terms and to whom—the difference between observing that a neighborhood received fewer loans and being able to model whether two otherwise-similar applicants were treated differently. The choice of 2018 as the start year for the expanded data is exactly why our filer panel begins there: it is the first year of the modern, LEI-keyed, data-rich HMDA regime, and the panel from 2018 forward indexes registers built to the new specification.

The privacy and re-identification debate

The same expansion that made HMDA so much more powerful for fair-lending analysis created a genuine tension that the CFPB has had to manage ever since: the richer the loan-level record, the easier it becomes to re-identify an individual borrower from data that is supposed to be public and anonymous. A single mortgage record that combines a precise property location, an exact loan amount, the borrower's age, income, credit score, debt-to-income ratio, and property value is, in a small enough geography, very nearly a fingerprint. Someone who already knows a few facts about a neighbor—roughly when they bought, roughly what they paid—could plausibly pick their record out of the public file and learn their income and credit score. The whole point of HMDA is public disclosure, but disclosing sensitive financial details about identifiable individuals is a different thing entirely.

The CFPB's response was to draw a deliberate line between the data that institutions report and the data the public can see. The full, unmodified loan-level data is collected and is available to regulators for supervision and enforcement; but the publicly released version is modified to reduce re-identification risk. The Bureau applied a balancing test—weighing the benefit of each data point's public disclosure against the privacy risk it created—and decided to withhold some fields from the public file entirely, to release others only as ranges or rounded values rather than exact figures, and to bin or truncate the most sensitive combinations. The result is that the public HMDA data is intentionally coarser than the data the institutions filed. For an analyst this is a structural fact, not a defect: a model of lending disparities built on the public file is working with binned ages, rounded amounts, and redacted fields, and conclusions must be framed accordingly. The filer panel is unaffected by this—it identifies institutions, not borrowers—which is part of why the panel is such a clean, low-risk way to reason about the structure of who reports without touching any of the privacy-sensitive loan-level detail.
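To make the coarsening concrete, here is a minimal sketch of the three moves described above: withholding a field, binning a value, and rounding an amount. The bin edges, the interval width, and the choice of withheld field are assumptions made for illustration, not the Bureau's published specification, and the function names are hypothetical.

```python
# Illustrative age bins: (exclusive upper bound, label). Assumed, not official.
AGE_BINS = [(25, "<25"), (35, "25-34"), (45, "35-44"),
            (55, "45-54"), (65, "55-64"), (75, "65-74")]


def bin_age(age: int) -> str:
    """Replace an exact age with a coarse range label."""
    for upper, label in AGE_BINS:
        if age < upper:
            return label
    return ">74"


def round_to_midpoint(amount: float, width: int = 10_000) -> int:
    """Replace an exact amount with the midpoint of its fixed-width interval."""
    return int(amount // width) * width + width // 2


def coarsen(record: dict, withheld=("credit_score",)) -> dict:
    """Build a public-style record: drop withheld fields, bin age, round amount."""
    public = {k: v for k, v in record.items() if k not in withheld}
    public["age"] = bin_age(record["age"])
    public["loan_amount"] = round_to_midpoint(record["loan_amount"])
    return public


reported = {"age": 37, "loan_amount": 243_500, "credit_score": 712}
print(coarsen(reported))
```

Any disparity model built on the public file sees only the coarsened side of this transformation, which is why estimates that depend on exact ages or amounts should be framed as approximate.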

Reporting thresholds and who must file

HMDA does not require every entity that ever makes a home loan to report. Coverage is threshold-based, and the thresholds are the single most important fact about the panel, because they define its boundary—every analysis of the filer registry is, at bottom, an analysis of who crossed the reporting threshold. An institution is a covered HMDA reporter only if it meets a set of criteria: it is a financial institution of a covered type, it has a home or branch office in a metropolitan area, it meets an asset-size test (for depository institutions), and—the pivotal criterion—it originated at least a specified number of covered loans in each of the two preceding calendar years.

Those loan-count thresholds are set separately for the two product families HMDA covers. Closed-end mortgage loans (ordinary purchase and refinance mortgages) carry one origination threshold, and open-end lines of credit (principally home equity lines) carry another. An institution that originates fewer than the threshold number of a product in either of the two prior years is not required to report that product. The 2015 rule set the closed-end threshold at 25 originations in each of the two preceding years; it was later raised by rule and then restored to 25 after a court challenge. The open-end threshold has been higher and has also changed over time. The practical effect is to exclude the smallest-volume lenders from reporting, on the rationale that the compliance burden of reporting outweighs the analytic value of a handful of loans from a marginal originator.
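The loan-count test can be sketched in a few lines. This covers only the origination-count criterion for one product family; the institution-type, location, and asset-size tests are not modeled. The function name is hypothetical, and the value 100 is used purely as an example of a higher threshold.

```python
def meets_loan_threshold(prior_year_counts, threshold):
    """Return True if originations met `threshold` in EACH of the two prior years.

    prior_year_counts: (count two years back, count one year back) for one
    product family (closed-end loans or open-end lines of credit).
    """
    two_back, one_back = prior_year_counts
    return two_back >= threshold and one_back >= threshold


# Hypothetical lender: 60 and 80 closed-end originations in the two prior years.
small_lender = (60, 80)

# Under a 25-loan closed-end threshold the lender must report.
covered_at_25 = meets_loan_threshold(small_lender, threshold=25)

# Under an example higher threshold of 100, the same lender drops out of
# the panel with no change in its actual lending.
covered_at_100 = meets_loan_threshold(small_lender, threshold=100)

print(covered_at_25, covered_at_100)
```

The second call is the important one for panel work: the lender did nothing differently, yet it would vanish from the registry.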

The thresholds have been litigated and adjusted, and this is not an academic point for anyone studying the panel over time. When the CFPB raised the open-end threshold and later raised the closed-end threshold, it changed which institutions were required to file—and a legal challenge to a threshold increase resulted in a court vacating one of the higher thresholds, restoring a lower one and pulling a band of smaller lenders back into the reporting population. The upshot is that the composition of the panel from one year to the next reflects not only real entry and exit in the mortgage market but also regulatory and judicial changes to the coverage line. A year-over-year change in the count of filers can mean that lenders entered or left the business, or it can mean that the threshold moved and a stratum of small lenders crossed in or out of coverage. Distinguishing the two requires knowing the threshold history for each year, and treating the panel's boundary as a moving, policy-determined object rather than a fixed census of mortgage lenders.

Joining the panel to the loan-level data and beyond

The filer panel is most valuable as the index that makes the loan-level data tractable, and the lei is the universal join key that connects it. Three joins matter most.

The first and most important is to the loan/application register itself. Every LAR row carries the LEI of the institution that filed it, so joining hmda_filers to the LAR by LEI and activity year attaches an institution's full identity (name, supervising agency, home state, total reported volume) to each of its loans. This is what lets an analyst move fluidly between the two grains: aggregate the LAR up to the lender to validate a panel row's record_count, or push a panel row's institution attributes down onto every loan to compare, say, the denial rates of bank filers with those of independent mortgage company filers. Because the panel is small and the LAR is enormous, the panel is the natural place to do filtering and selection: pick the institutions you care about from the panel, then pull only their LAR rows.
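A minimal sketch of both directions of that join, using plain dictionaries and toy rows. The LEIs here are short placeholders (real ones are 20 characters), the LAR column names are assumptions, and a real workflow would do the same thing in SQL or a dataframe library.

```python
from collections import Counter

# Toy panel rows keyed by (lei, activity_year). All values are placeholders.
panel = [
    {"lei": "LEI_A", "activity_year": 2022, "respondent_name": "Alpha Bank",
     "agency_code": "OCC", "record_count": 3},
    {"lei": "LEI_B", "activity_year": 2022, "respondent_name": "Beta Mortgage",
     "agency_code": "CFPB", "record_count": 2},
]

# Toy LAR rows; the action_taken values are illustrative.
lar = [
    {"lei": "LEI_A", "activity_year": 2022, "action_taken": 1},
    {"lei": "LEI_A", "activity_year": 2022, "action_taken": 3},
    {"lei": "LEI_A", "activity_year": 2022, "action_taken": 1},
    {"lei": "LEI_B", "activity_year": 2022, "action_taken": 1},
]

# Index the small panel by its composite key.
panel_by_key = {(p["lei"], p["activity_year"]): p for p in panel}

# Push down: attach institution attributes to each loan row.
enriched = []
for row in lar:
    filer = panel_by_key.get((row["lei"], row["activity_year"]))
    if filer is None:
        continue  # LAR row with no panel entry; investigate separately
    enriched.append({**row,
                     "respondent_name": filer["respondent_name"],
                     "agency_code": filer["agency_code"]})

# Aggregate up: count LAR rows per filer-year and compare with record_count.
lar_counts = Counter((r["lei"], r["activity_year"]) for r in lar)
mismatches = {
    key: (p["record_count"], lar_counts.get(key, 0))
    for key, p in panel_by_key.items()
    if p["record_count"] != lar_counts.get(key, 0)
}
print(mismatches)
```

Here the second filer's panel row claims two records while the LAR holds one, which is exactly the kind of discrepancy the validation pass is meant to surface.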

The second join is across years, by LEI. Because the LEI is persistent, the panel can be read longitudinally: the same identifier appearing in successive activity years traces an institution's continuous presence as a reporter, and its disappearance flags an exit—whether through failure, acquisition, or simply falling below the reporting threshold. The prior_lei linkage, where present, stitches together identifiers across mergers and reorganizations so that a renamed or acquired institution can be followed through the change. This longitudinal view is the foundation of panel-turnover and market-structure analysis.
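The longitudinal reading can be sketched as follows, assuming prior_lei on a row names the identifier the same institution used before a change. The toy identifiers are placeholders, and real data may need manual reconciliation where the linkage is missing.

```python
from collections import defaultdict

# Toy filer-year rows. OLD1 becomes NEW1 in 2020; GONE stops filing after 2019.
rows = [
    {"lei": "OLD1", "activity_year": 2018, "prior_lei": None},
    {"lei": "OLD1", "activity_year": 2019, "prior_lei": None},
    {"lei": "NEW1", "activity_year": 2020, "prior_lei": "OLD1"},
    {"lei": "NEW1", "activity_year": 2021, "prior_lei": "OLD1"},
    {"lei": "GONE", "activity_year": 2018, "prior_lei": None},
    {"lei": "GONE", "activity_year": 2019, "prior_lei": None},
    {"lei": "STAY", "activity_year": 2018, "prior_lei": None},
    {"lei": "STAY", "activity_year": 2019, "prior_lei": None},
    {"lei": "STAY", "activity_year": 2020, "prior_lei": None},
    {"lei": "STAY", "activity_year": 2021, "prior_lei": None},
]

# Map each predecessor identifier to the identifier that replaced it.
successor = {r["prior_lei"]: r["lei"] for r in rows if r["prior_lei"]}


def canonical(lei):
    """Follow successor links so old and new identifiers share one key."""
    seen = set()
    while lei in successor and lei not in seen:  # `seen` guards against cycles
        seen.add(lei)
        lei = successor[lei]
    return lei


# Collect the activity years observed for each stitched institution.
years_seen = defaultdict(set)
for r in rows:
    years_seen[canonical(r["lei"])].add(r["activity_year"])

last_year = max(r["activity_year"] for r in rows)
spans = {lei: (min(ys), max(ys)) for lei, ys in years_seen.items()}
exited = {lei for lei, (_, end) in spans.items() if end < last_year}
print(spans, exited)
```

Without the stitching step, OLD1 would be counted as an exit in 2019 and NEW1 as an entry in 2020, even though one institution filed throughout.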

The third, broader join is to other datasets keyed on the same entity. The LEI is a global standard used well beyond HMDA, and the supervising agency, name, and tax identifier in the panel make it possible to relate a HMDA filer to the same institution's call reports, its CRA evaluation, its consumer-complaint record, and its enforcement history. A bank that shows striking denial-rate disparities in its HMDA data is a more interesting object when its CRA rating, its complaint volume, and any enforcement actions can be set beside it—and the panel, by pinning down exactly which legal entity filed, is what makes that cross-dataset assembly reliable rather than a guess based on name matching.

Analytical uses

A national, LEI-keyed, multi-year registry of mortgage reporters supports a distinctive set of analyses, some that the panel answers on its own and some where it is the necessary scaffolding for work on the loan-level data.

Panel composition and turnover is the use the registry answers directly. Counting filers per year, by supervising agency and by home state, and tracking which LEIs enter and exit, produces a map of the structure of the reporting mortgage industry and how it shifts: the rise of independent mortgage companies relative to depository institutions, consolidation among banks, and the churn at the small end of the market. The essential caveat, developed in the limitations section below, is that some of the observed turnover is driven by threshold changes rather than real market movement, so turnover metrics must be read against the coverage history.
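As a rough sketch, composition and year-to-year churn reduce to a grouped count and two set differences. The rows below are toy values with placeholder LEIs, and the function name is hypothetical; the counts say nothing about why a filer entered or left.

```python
from collections import Counter, defaultdict

# Toy panel rows: (lei, activity_year, agency_code). All values are placeholders.
panel = [
    ("L1", 2021, "OCC"), ("L2", 2021, "NCUA"), ("L3", 2021, "CFPB"),
    ("L1", 2022, "OCC"), ("L3", 2022, "CFPB"), ("L4", 2022, "CFPB"),
    ("L1", 2023, "OCC"), ("L4", 2023, "CFPB"), ("L5", 2023, "FDIC"),
]

# Composition: number of filers per (year, supervising agency).
filers_by_year_agency = Counter((year, agency) for _, year, agency in panel)

# Membership: the set of LEIs present in each activity year.
leis_by_year = defaultdict(set)
for lei, year, _ in panel:
    leis_by_year[year].add(lei)


def yearly_churn(leis_by_year):
    """Count LEIs entering and exiting between each pair of consecutive years."""
    years = sorted(leis_by_year)
    churn = {}
    for prev, cur in zip(years, years[1:]):
        churn[(prev, cur)] = {
            "entered": len(leis_by_year[cur] - leis_by_year[prev]),
            "exited": len(leis_by_year[prev] - leis_by_year[cur]),
        }
    return churn


churn = yearly_churn(leis_by_year)
print(dict(filers_by_year_agency))
print(churn)
```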

Market concentration exploits the record_count in the panel, or the application counts from the Data Browser. By ranking filers by reported volume and computing the share of all records held by the largest handful, an analyst can measure how concentrated mortgage origination has become and watch that concentration change over time—a structural question about competition in the mortgage market that the panel can answer before any loan-level work begins.
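The top-of-panel share described above is a one-line computation once the counts are in hand. The sketch below also includes the Herfindahl-Hirschman index, a standard concentration measure that is an addition here and not something the panel itself reports. The counts are toy values and the function names are hypothetical.

```python
def top_k_share(record_counts, k):
    """Share of all records held by the k largest filers."""
    counts = sorted(record_counts, reverse=True)
    total = sum(counts)
    return sum(counts[:k]) / total if total else 0.0


def hhi(record_counts):
    """Sum of squared record shares, on a 0-to-1 scale."""
    total = sum(record_counts)
    if not total:
        return 0.0
    return sum((c / total) ** 2 for c in record_counts)


# Toy record_count values for four filers in one activity year.
counts = [500, 300, 150, 50]
print(f"top-2 share: {top_k_share(counts, 2):.1%}, HHI: {hhi(counts):.4f}")
```

Computing both per activity year gives a concentration series, subject to the same threshold caveat as turnover: a change in coverage alters the denominator even if the largest lenders are unchanged.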

Fair-lending screening is where the panel earns its keep as scaffolding. The classic HMDA analysis—comparing denial rates, or the incidence of high-cost loans, across racial and ethnic groups, controlling for income and, since 2018, for credit characteristics—is performed on the loan-level data, but it is structured by the panel: the panel is how an analyst selects the institutions to examine, attaches the supervising agency that would handle a referral, and sizes each lender's footprint. Finally, CRA and community-needs assessment brings the geographic distribution of a panel's lending to bear on whether an institution is serving its assessment area—the original transparency purpose of the statute, now executed with a far richer dataset than the drafters of 1975 could have imagined.
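A bare-bones version of the denial-rate comparison might look like the sketch below. It assumes the HMDA action_taken coding in which 1 is an origination, 2 is approved but not accepted, and 3 is a denial, and it takes those three outcomes as the denominator, which is one common convention and not the only one. The group labels and rows are placeholders, and no income or credit controls are applied, so the output is a screening signal only.

```python
from collections import defaultdict

ORIGINATED, APPROVED_NOT_ACCEPTED, DENIED = 1, 2, 3
DECIDED = {ORIGINATED, APPROVED_NOT_ACCEPTED, DENIED}

# Toy LAR rows: (lei, applicant group, action_taken). Placeholder values.
lar = [
    ("LEI_A", "group_x", 1), ("LEI_A", "group_x", 1),
    ("LEI_A", "group_x", 1), ("LEI_A", "group_x", 3),
    ("LEI_A", "group_y", 1), ("LEI_A", "group_y", 3),
    ("LEI_A", "group_y", 3), ("LEI_A", "group_y", 4),  # 4 falls outside DECIDED
]


def denial_rates(rows):
    """Denial rate per (lei, group) among applications the lender decided."""
    decided = defaultdict(int)
    denied = defaultdict(int)
    for lei, group, action in rows:
        if action not in DECIDED:
            continue
        decided[(lei, group)] += 1
        if action == DENIED:
            denied[(lei, group)] += 1
    return {key: denied[key] / n for key, n in decided.items()}


def disparity_ratio(rates, lei, group, reference):
    """Ratio of a group's denial rate to the reference group's, or None."""
    ref = rates.get((lei, reference))
    rate = rates.get((lei, group))
    if not ref or rate is None:
        return None  # undefined when the reference rate is missing or zero
    return rate / ref


rates = denial_rates(lar)
ratio = disparity_ratio(rates, "LEI_A", "group_y", reference="group_x")
print(rates, ratio)
```

In practice the lei values fed into a screen like this come from the panel, which also supplies the supervising agency and the lender's overall volume for context.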

Python workflow: the filer panel and lender aggregations

The script below uses the FFIEC HMDA Data Browser API—the public service the FFIEC and CFPB operate over the published HMDA data—to pull the filer panel for each activity year and a single lender's loan aggregations. It computes three of the core metrics: the panel size and turnover across filing years (how many institutions reported each year, and how many stopped or started filing between the first and last available year); market concentration (the share of all reported records held by the fifty largest filers in the most recent year); and a single large lender's originated-loan footprint by state. No API key is required. The /filers endpoint returns the panel-style registry our table mirrors, and the /aggregations endpoint returns counts and loan-amount sums for a lender keyed by its LEI; any production use should page through results and validate against the current Data Browser documentation, and should remember that the threshold history governs how the year-over-year panel counts must be interpreted.

import requests
from collections import defaultdict

# FFIEC HMDA Data Browser API -- no API key required for public data.
# Two endpoints are used together:
#   1. /filers       -- the registry of institutions that filed in a year
#                       (this is the panel-style record our table mirrors)
#   2. /aggregations -- counts and loan-amount sums for a lender, sliced
#                       by year / state / action taken, keyed by the LEI
# Docs: https://ffiec.cfpb.gov/documentation/api/data-browser/
BASE = "https://ffiec.cfpb.gov/v2/data-browser-api/view"


def filers(year):
    # Returns {"institutions": [{lei, name, count, period}, ...]} -- one
    # entry per institution that filed a loan/application register that year.
    r = requests.get(f"{BASE}/filers", params={"years": year}, timeout=120)
    r.raise_for_status()
    return r.json().get("institutions", [])


def aggregations(lei, years, action="1"):
    # /aggregations needs at least one HMDA data filter; actions_taken=1
    # restricts to originated loans. Returns {"aggregations": [...]} with
    # a count and a sum per requested slice.
    params = {"years": years, "leis": lei, "actions_taken": action}
    r = requests.get(f"{BASE}/aggregations", params=params, timeout=120)
    r.raise_for_status()
    return r.json().get("aggregations", [])


# --- 1. Panel size and turnover across filing years --------------------
years = ["2018", "2019", "2020", "2021", "2022", "2023"]
panels = {}
for y in years:
    try:
        panels[y] = {f["lei"]: f["name"] for f in filers(y)}
    except requests.HTTPError:
        continue  # a year may not yet be published

print("Filers reporting per year:")
for y in years:
    if y in panels:
        print(f"  {y}: {len(panels[y]):,} institutions")

# Filers present in the first available year but gone by the last.
if len(panels) >= 2:
    first, last = sorted(panels)[0], sorted(panels)[-1]
    dropped = set(panels[first]) - set(panels[last])
    added = set(panels[last]) - set(panels[first])
    print(f"\nPanel turnover {first} -> {last}: "
          f"{len(dropped):,} stopped filing, {len(added):,} newly filing")


# --- 2. Concentration: share of records held by the largest filers ----
# Use the most recent year's panel and rank by reported record count.
rows = []
if panels:
    latest = sorted(panels)[-1]
    # Fetched once here and reused in section 3 below.
    rows = sorted(filers(latest), key=lambda f: -f.get("count", 0))
    total = sum(f.get("count", 0) for f in rows) or 1
    top50 = sum(f.get("count", 0) for f in rows[:50])
    print(f"\n{latest}: top 50 of {len(rows):,} filers account for "
          f"{top50 / total:.1%} of all reported records")


# --- 3. A single lender's originated-loan footprint by state ----------
# Pick the largest filer in the latest year and aggregate its originations.
# Slices that come back without a "state" field are pooled under
# "(nationwide)", so check the response shape before reading the table.
if rows:
    biggest = rows[0]  # rows is already sorted by descending count
    aggs = aggregations(biggest["lei"], latest, action="1")
    by_state = defaultdict(lambda: [0, 0.0])
    for a in aggs:
        st = a.get("state") or "(nationwide)"
        by_state[st][0] += a.get("count", 0)
        by_state[st][1] += float(a.get("sum", 0) or 0)
    top = sorted(by_state.items(), key=lambda kv: -kv[1][0])[:10]
    print(f"\n{biggest['name']} ({latest}) -- top states by originations:")
    for st, (n, amt) in top:
        print(f"  {st:<14} {n:>8,} loans   ${amt/1e9:>6.2f}B")

Two practical notes apply. First, the panel-turnover calculation in the script is deliberately simple: it treats any LEI present in the first year but absent in the last as a departure, and the reverse as an arrival. A rigorous reading must net out the threshold-driven churn, since an institution that “left” the panel may simply have dropped below a raised reporting threshold rather than exited the mortgage business. It should also follow the prior_lei links so that an institution that merely changed identifiers through a merger is not double-counted as both a departure and an arrival. Second, for national-scale work (reproducing the full LAR, building a multi-year fair-lending panel, or assembling a market-structure picture joined to other datasets), the FFIEC's bulk filer/panel files and the full LAR downloads are far more efficient than thousands of paginated API calls, and they ship with the authoritative, version-stamped field definitions for each year's collection.

Limitations and analytical caveats

The filer panel is the most authoritative public record of who reports mortgage lending in the United States, but it is a coverage-defined registry, and several features must be held in mind before drawing conclusions from it.

The panel is not a census of mortgage lenders. Because coverage is threshold-based, the registry deliberately omits the smallest-volume originators and any lender that fails the type, location, or asset tests. A study that treats the panel as the universe of mortgage lenders will systematically miss the long tail of small and non-covered originators—and in some local markets that tail is not negligible. The panel answers the question “who is required to report?” precisely; it does not answer “who makes home loans?” The two questions have meaningfully different answers, and conflating them overstates how complete the picture is at the small end of the market.

Threshold changes confound year-over-year comparisons. The reporting thresholds have moved over the panel's span (raised by rule, and in at least one instance restored by court order), and each change reshapes the population of filers independently of any real movement in the market. A drop or jump in the filer count between two years may be a coverage artifact rather than a market event. Any longitudinal analysis of the panel must be anchored to the threshold history for each year, and turnover metrics should be computed only across years with comparable coverage, or explicitly adjusted for the threshold that applied.
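One way to enforce the comparable-coverage rule is to label each activity year with the coverage regime that applied and compute turnover only between adjacent years that share a label. The sketch below assumes the analyst supplies that mapping; the regime labels are placeholders, not the actual threshold history, and the function name is hypothetical.

```python
def comparable_year_pairs(regime_by_year):
    """Return consecutive year pairs that fall under the same coverage regime."""
    years = sorted(regime_by_year)
    return [
        (a, b)
        for a, b in zip(years, years[1:])
        if regime_by_year[a] == regime_by_year[b]
    ]


# Placeholder labels only: fill these in from the threshold history that
# actually applied to each activity year before using the result.
regime_by_year = {
    2018: "regime_1", 2019: "regime_1",
    2020: "regime_2", 2021: "regime_2", 2022: "regime_2",
    2023: "regime_3",
}

pairs = comparable_year_pairs(regime_by_year)
print(pairs)
```

Year pairs that straddle a regime change are dropped, so any turnover reported for the remaining pairs cannot be a pure coverage artifact of a threshold move between them.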

The panel records who filed, not what they did. The registry identifies institutions and summarizes their submissions; it does not, by itself, say anything about lending outcomes—denials, pricing, disparities. Those questions live in the loan-level LAR, and the LAR's own limitations then apply: the public file is coarsened to protect privacy, missing and “not applicable” codes are common in particular fields, and the data describes the action taken on an application without capturing the full credit context behind it. The panel is the right tool for market-structure and coverage questions and the indispensable index for the loan-level work, but it cannot be made to answer fair-lending questions on its own.

Identity is stable but not seamless. The LEI is a durable key, which is a major strength, but mergers, acquisitions, and reorganizations still introduce seams: an institution can change identifiers, and following a lender continuously across such events requires the prior_lei linkage and, sometimes, manual reconciliation. Name matching to non-LEI datasets is fragile for the usual reasons, and even within HMDA the mapping of an institution to its corporate parent is not always explicit in the panel. Analysts who need a clean corporate-family view—all the entities under one holding company—must build that mapping from outside the panel rather than assuming the registry supplies it.

Held with these caveats in mind, the hmda_filers table is a uniquely useful resource: roughly 34,700 filer-year records that name, year by year and agency by agency, exactly which institutions opened their mortgage lending to public scrutiny under the Home Mortgage Disclosure Act—the index that turns the most important fair-lending dataset in the country from a wall of loan rows into a record you can navigate, one reporter at a time.
