Four times a year, the people paid to change the mind of the United States government have to write down what they are doing. Every registered lobbyist and every firm or organization that employs one must file a public report naming the client, the specific bills and rules they pushed, the agencies and chambers of Congress they contacted, and the money that changed hands. The result is the closest thing the country has to a ledger of paid influence. This dataset captures its most analytically useful dimension: roughly 1,300 lobbying issue-area records. Each record is tagged with one of the standardized topic codes that let an analyst see, across thousands of filings, which issues the influence industry is spending its money on.
This article covers what the lobbying-disclosure record is and how the Lobbying Disclosure Act of 1995 created it; the 2007 reforms that strengthened it; who has to register and file, and the distinction between the lobbying firm and the client it works for; the two core filings—the quarterly LD-2 activity report and the semiannual LD-203 contribution report—and the fields each carries; the standardized general issue-area codes that are the subject of this dataset and how they turn free-text lobbying into a countable taxonomy; the Clerk of the House and Secretary of the Senate systems that publish the data and the public APIs they expose; how the record joins to the campaign-finance and foreign-agent disclosures to form the third leg of money-in-politics transparency; a Python workflow that pulls filings, tallies activity by issue-area code, and ranks clients by reported spend; the analytical uses, from tracking issue surges around major legislation to mapping the revolving door; and the caveats—self-reporting, the registration threshold, and the shadow lobbying it leaves uncounted—that every analyst must keep in mind.
What the dataset is
Federal lobbying disclosure is the body of public reports that paid lobbyists, and the entities that employ them, are required by law to file describing whom they lobby and on what. The reports are submitted to, and published by, the Clerk of the United States House of Representatives and the Secretary of the United States Senate. Each filing is rich. It names the client on whose behalf the lobbying was done, the lobbying firm or in-house filer doing it, the specific issues lobbied (in free text), the general issue-area codes that classify those issues, the government entities contacted (chambers of Congress and federal agencies), the individual lobbyists who did the work, and the income or expenses for the reporting period.
This dataset captures the issue-area dimension of those filings. A single quarterly report can list several distinct lobbying activities. Each one is tagged with one of the standardized topic codes (TAX for taxation, HCR for health, DEF for defense, TRD for trade, and so on). Those issue-area records, roughly 1,300 of them in our table, let an analyst see which subjects attract the most lobbying attention. In our database the record is stored as the table lobbying_activity. It is joinable to the underlying filers and clients and complementary to the campaign-finance and foreign-agent records. The grain is one row per lobbying activity (a client–issue-area pairing within a filing), so a firm that lobbied on tax, trade, and health for one client in one quarter contributes three rows. The columns answer who lobbied for whom, on which issue area, contacting which parts of government, and at what reported cost:
filing_uuid -- the parent LD-2 filing this activity belongs to
filing_year -- the calendar year of the reporting period
filing_period -- quarter: Q1, Q2, Q3, Q4 (LD-2 is quarterly)
registrant_name -- the lobbying firm or in-house organization filing
client_name -- the client on whose behalf the lobbying was done
general_issue_code -- standardized issue-area code (TAX, HCR, DEF, TRD...)
general_issue_desc -- the human-readable label for the code
specific_issues -- free-text description of bills/rules lobbied
government_entities -- chambers and agencies contacted (House, Senate, EPA...)
lobbyist_names -- individual lobbyists credited on this activity
income -- reported income (lobbying-firm filers)
expenses -- reported expenses (in-house/self filers)
The general_issue_code is the load-bearing column for this dataset. It turns a mountain of free-text descriptions into a countable taxonomy. Because every activity must be tagged with one of a fixed set of roughly eighty codes, an analyst can aggregate across the entire corpus. That makes it possible to ask which issue areas draw the most lobbying, how that mix shifts over time, and which clients and firms concentrate on which subjects. The filing_uuid and filing_year/filing_period fields anchor each activity to its parent filing and reporting period, so the data can be rolled up by quarter or year. The registrant_name and client_name are the two parties whose relationship the disclosure exists to expose: the firm being paid and the interest paying it. The income and expenses fields, reported in ranges or exact figures depending on the filer, are the dollar dimension that lets the influence industry be sized and ranked.
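The sketch below makes the activity grain concrete by flattening one filing into rows shaped like the lobbying_activity table. It assumes a filing dict laid out like the Senate LDA API's JSON; the key names listed in the docstring are assumptions, and other sources will differ. For brevity it omits the lobbyist and code-description columns. One design point follows directly from the grain: income and expenses are reported once per filing, so a row-level table repeats them on every activity.

```python
def filing_to_activity_rows(filing):
    """Flatten one LD-2 filing dict into one row per lobbying activity.

    Key names ("registrant", "client", "lobbying_activities",
    "general_issue_code", "description", "government_entities") are assumed
    to follow the Senate LDA API's JSON; adjust them for other sources.
    """
    registrant = (filing.get("registrant") or {}).get("name")
    client = (filing.get("client") or {}).get("name")
    rows = []
    for act in filing.get("lobbying_activities") or []:
        # Entities may arrive as {"name": ...} dicts or as bare strings.
        entities = [
            e.get("name") if isinstance(e, dict) else str(e)
            for e in act.get("government_entities") or []
        ]
        rows.append({
            "filing_uuid": filing.get("filing_uuid"),
            "filing_year": filing.get("filing_year"),
            "filing_period": filing.get("filing_period"),
            "registrant_name": registrant,
            "client_name": client,
            "general_issue_code": act.get("general_issue_code"),
            "specific_issues": act.get("description"),
            "government_entities": entities,
            # Dollar figures belong to the whole filing, so they repeat on
            # every activity row: sum them per filing, never per row.
            "income": filing.get("income"),
            "expenses": filing.get("expenses"),
        })
    return rows
```

A filing with TAX, TRD, and HCR activities yields three rows, matching the three-row example above. Summing income across those rows would count the filing's income three times.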
The Lobbying Disclosure Act and the 2007 reforms
The legal backbone of US lobbying transparency is the Lobbying Disclosure Act of 1995 (LDA). Before it, federal lobbying was governed by a 1946 statute so riddled with loopholes—narrow definitions, weak registration triggers, almost no enforcement—that vast amounts of paid influence went entirely unreported. The 1995 Act replaced that regime with a far more comprehensive one. It defined a lobbyist functionally, by what the person actually does and how much of their time it consumes, rather than by a formal title; it required lobbyists and the firms and organizations that employ them to register and then to file regular public reports on their activity; and it centralized the filings with the two congressional officers—the Clerk of the House and the Secretary of the Senate—who publish them. The animating principle was disclosure rather than prohibition: the Act does not limit lobbying, which is a constitutionally protected exercise of the right to petition the government; it makes lobbying visible.
A decade later the regime was substantially strengthened by the Honest Leadership and Open Government Act of 2007 (HLOGA), passed in the wake of high-profile lobbying scandals. HLOGA tightened the system in several consequential ways. It moved the LD-2 activity report from a semiannual to a quarterly schedule, making the data timelier. It created a new semiannual contribution report—the LD-203—requiring registrants and individual lobbyists to disclose their political contributions and certain payments honoring or benefiting covered officials, and to certify compliance with the gift and travel rules. It mandated electronic filing and public, searchable, downloadable online databases, which is precisely what makes bulk analysis of the kind this article describes possible. And it stiffened the civil and criminal penalties for failing to file or filing falsely. Together, the 1995 Act and the 2007 reforms produced the modern structure: a functional definition of lobbying, mandatory registration, quarterly activity reporting, semiannual contribution reporting, and electronic public databases.
Who registers and files
The disclosure system turns on three roles, and keeping them straight is essential to reading the data. The registrant is the entity that files. It is either a lobbying firm retained to lobby on behalf of others, or an organization that employs in-house lobbyists to lobby on its own behalf. The client is the interest the lobbying is done for: a corporation, a trade association, a labor union, a university, a foreign or domestic government, an advocacy group. And the lobbyist is the individual who does the contacting. When a corporation hires a K Street firm, the firm is the registrant and the corporation is the client, and the two are different parties. When a corporation lobbies through its own government-affairs staff, the corporation is both the registrant and the client. The income/expenses distinction in the data reflects exactly this: firms report income received from clients, while in-house filers report expenses incurred.
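The income/expenses split can be sketched as a small helper. This illustrates one assumption; it is not how the disclosure systems themselves classify filers. The helper treats a filing whose registrant and client names match as in-house and reads the matching dollar field. Real-world names vary in spelling, so the equality test is only a first approximation.

```python
def filer_role(filing):
    """Heuristic: an in-house filer lobbies for itself, so the registrant and
    client names match; otherwise treat the registrant as a lobbying firm."""
    reg = ((filing.get("registrant") or {}).get("name") or "").strip().casefold()
    cli = ((filing.get("client") or {}).get("name") or "").strip().casefold()
    return "in-house" if reg and reg == cli else "firm"


def reported_amount(filing):
    """Read income for firm filers and expenses for in-house filers."""
    field = "expenses" if filer_role(filing) == "in-house" else "income"
    try:
        return float(filing.get(field))
    except (TypeError, ValueError):
        return 0.0  # missing or non-numeric amount
```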
The obligation to register is not triggered by any lobbying contact whatsoever; it kicks in once activity crosses defined thresholds. In broad terms, an individual must register as a lobbyist for a particular client only if they make more than one lobbying contact for that client and spend at least a meaningful share—twenty percent—of their time over a quarter on lobbying activities for that client, and the registrant must register only once its lobbying income from or expenses for a client exceed dollar thresholds that are periodically adjusted for inflation. These thresholds are deliberate policy choices, and they matter enormously for interpreting the data: a great deal of real influence activity falls below them by design and therefore never appears in the disclosure record at all. The data is a complete and authoritative account of registered lobbying; it is not a complete account of all efforts to influence the federal government, a distinction the caveats section returns to.
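The helpers below show how the two triggers combine by encoding the simplified tests described above. The function names are hypothetical, and the statute's detailed definitions and exemptions are left out. The registrant threshold is a parameter rather than a hard-coded figure, because it is periodically adjusted for inflation.

```python
def individual_must_register(contacts, lobbying_hours, total_hours, time_share=0.20):
    """Simplified individual trigger: more than one lobbying contact for the
    client AND at least `time_share` of the person's time over the quarter
    spent on lobbying activities for that client."""
    if total_hours <= 0:
        return False
    return contacts > 1 and lobbying_hours / total_hours >= time_share


def registrant_must_register(amount, threshold):
    """Simplified registrant trigger: lobbying income from (or expenses for)
    a client exceeding a dollar threshold supplied by the caller, since the
    real figure is inflation-adjusted; no current value is assumed here."""
    return amount > threshold
```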
The LD-2, the LD-203, and the issue-area codes
The disclosure regime rests on two core forms. The LD-2 is the quarterly lobbying activity report—the workhorse filing and the source of this dataset. Each LD-2 covers one registrant–client relationship for one quarter, and it lists the specific issues lobbied, the general issue-area codes for those issues, the houses of Congress and the federal agencies contacted, the individual lobbyists who worked the account, and the income (for firms) or expenses (for in-house filers) attributable to the lobbying. A registrant with many clients files many LD-2s; a single LD-2 with several distinct lobbying activities generates several rows in a dataset built at the activity grain. The LD-203 is the semiannual contribution report introduced by HLOGA: filed by both registrants and individual lobbyists, it discloses federal political contributions and certain other payments, and carries the lobbyist's certification of compliance with the congressional gift and travel rules. The LD-2 answers “who lobbied on what?”; the LD-203 answers “and where did their political money go?”
The feature that makes the LD-2 data tractable at scale, and that defines this dataset, is the system of general issue-area codes. The disclosure forms do not leave every filing to describe its subjects only in idiosyncratic free text. Instead, they require each lobbying activity to be classified under one of a fixed set of roughly eighty standardized three-letter codes. Familiar examples include TAX (taxation), HCR (health issues), DEF (defense), TRD (trade, both domestic and foreign), BUD (budget and appropriations), ENV (environment and superfund), TEC (telecommunications), FIN (financial institutions, investments, and securities), and TRA (transportation). Because the codes are standardized and mandatory, they convert a corpus of millions of free-text descriptions into a countable taxonomy. An analyst can tally activity by code to find the most-lobbied issues, watch a code surge as a major bill moves, or profile a client by the mix of codes it lobbies. The codes are coarse: a single code like HCR spans everything from drug pricing to hospital reimbursement to medical research. The free-text specific_issues field therefore remains indispensable for fine-grained work, but the codes are what make the high-level picture possible.
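Once the data is at the activity grain, profiling a client by its code mix takes only a short aggregation. This sketch assumes rows with client_name and general_issue_code keys, as in the column list above. It returns each client's share of activities per code.

```python
from collections import Counter, defaultdict


def code_mix(activity_rows):
    """Profile each client by the share of its activities under each code."""
    by_client = defaultdict(Counter)
    for row in activity_rows:
        code = row.get("general_issue_code")
        if code:
            by_client[row.get("client_name")][code] += 1
    profile = {}
    for client, counts in by_client.items():
        total = sum(counts.values())
        # most_common() orders each client's codes from largest share down.
        profile[client] = {code: n / total for code, n in counts.most_common()}
    return profile
```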
The House and Senate disclosure systems
Lobbying filings are published through two parallel channels, reflecting the LDA's requirement that filings go to both chambers. The Secretary of the Senate's Office of Public Records operates the lobbying-disclosure system at lda.senate.gov, and the Clerk of the House operates a parallel system, with the public-facing disclosure search at disclosurespreview.house.gov. Registrants file the same information with both offices, so the two systems carry substantially the same underlying record. Both expose the data for analysis beyond the simple web search: the Senate system provides a documented public REST API and the House system provides bulk downloads of filings by year and quarter, and basic use of either requires no special access—the Senate API offers an optional free registration that raises rate limits but is not required for light querying.
For an analyst, the practical implication is that the lobbying record is one of the more accessible federal disclosure datasets. The Senate API returns structured JSON with the filing, its client and registrant, the nested lobbying activities and their issue-area codes, the government entities contacted, and the dollar figures, so the issue-area dataset described here can be reconstructed directly from the API response. The House bulk files are the more efficient route for whole-corpus, multi-year analysis, since they ship every filing for a period in one download rather than requiring thousands of paginated API calls. Because both chambers publish the same filings, a careful pipeline can also use one as a check on the other—a useful guard against the occasional missing, amended, or terminated filing.
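A cross-check between the two chambers' copies might look like the following. This section does not describe the House bulk-file layout, so the sketch assumes both sources have already been parsed into dicts with the same field names. It matches on a hypothetical relationship-period key (registrant, client, year, period) rather than on any identifier the two systems might share.

```python
def reconcile(senate_rows, house_rows):
    """Compare two copies of the filing record on a relationship-period key.

    Each row is a dict with registrant_name, client_name, filing_year and
    filing_period; parsing those out of each source is assumed done.
    """
    def key(row):
        return (
            row["registrant_name"].strip().casefold(),
            row["client_name"].strip().casefold(),
            row["filing_year"],
            row["filing_period"],
        )

    senate = {key(r) for r in senate_rows}
    house = {key(r) for r in house_rows}
    return {
        "both": senate & house,
        "senate_only": senate - house,  # candidates for a House-side gap
        "house_only": house - senate,   # candidates for a Senate-side gap
    }
```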
Joining to campaign finance and foreign-agent data
The lobbying record is most powerful as one leg of a three-legged stool of money-in-politics transparency, and it is designed to be joined to the other two. Together they answer overlapping but distinct questions about how organized interests engage the federal government.
The first companion is campaign-finance data from the Federal Election Commission. The lobbying record shows who is paid to press the government on which issues. The FEC record shows who funds the campaigns of the officials being pressed: the contributions, the political action committees, the independent expenditures. The two intersect directly. The HLOGA-mandated LD-203 contribution report links a lobbyist's own political giving to their lobbying registration. At the organizational level, an analyst can set a corporation's or trade association's lobbying spend on a given issue beside the contributions its affiliated PAC makes to members of the committees with jurisdiction over that issue. That juxtaposition (lobbying activity plus campaign money, aligned to the same issue and the same lawmakers) is the heart of influence analysis.
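The issue-aligned juxtaposition can be sketched with plain dictionaries, and every input here is hypothetical. Lobbying rows carry an amount already apportioned to one issue code; LD-2 dollars cover the whole filing, so any per-issue split is the analyst's own assumption. Contribution rows are keyed by an organization name already matched to the lobbying client. The jurisdiction mapping runs from each issue code to the lawmakers on the relevant committees.

```python
from collections import defaultdict


def influence_table(lobbying, contributions, jurisdiction, issue_code):
    """Pair each organization's lobbying on one issue with its PAC giving to
    lawmakers on the committees with jurisdiction over that issue."""
    members = jurisdiction.get(issue_code, set())
    spend = defaultdict(float)
    for row in lobbying:
        if row["general_issue_code"] == issue_code:
            spend[row["client_name"]] += row["amount"]
    giving = defaultdict(float)
    for c in contributions:
        if c["recipient"] in members:
            giving[c["org"]] += c["amount"]
    orgs = set(spend) | set(giving)
    # Largest lobbying spend first; ties broken alphabetically.
    return sorted(
        ((o, spend.get(o, 0.0), giving.get(o, 0.0)) for o in orgs),
        key=lambda t: (-t[1], t[0]),
    )
```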
The second companion is foreign-agent registration under the Foreign Agents Registration Act (FARA). The LDA and FARA are deliberately complementary regimes that divide the universe of influence work: lobbying on behalf of most domestic clients is disclosed under the LDA, while representation of foreign governments and foreign political parties—and certain other foreign principals—is disclosed under FARA, which demands more detailed disclosure of the relationship and the activity. The statutes contain an exemption mechanism that channels a registrant toward one regime or the other, so an analyst tracing influence on a foreign-policy or trade question must consult both: the LDA issue-area record (with codes like TRD for trade or FOR for foreign relations) for domestically retained lobbying, and the FARA record for work done on behalf of foreign principals. Read together, the FEC, LDA, and FARA datasets give a far more complete map of organized influence than any one of them alone.
Analytical uses
A standardized, dollar-tagged, time-stamped record of registered lobbying supports a distinctive set of analyses, and the issue-area codes are what make most of them possible at scale.
Sizing the influence industry is the most immediate use. Summing reported income and expenses by registrant and by client ranks the biggest lobbying firms and the biggest-spending interests, and tracking those totals over time measures whether lobbying spending is rising or falling and where it is concentrating. Because the data carries both the firm and the client, an analyst can study the structure of the market—which firms dominate, which clients spread their work across many firms, how the in-house and contract segments compare.
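The market-structure questions reduce to a few aggregations over filing-grain rows. This sketch computes each firm's share of contract-lobbying income and the number of firms each client retains. It also computes the Herfindahl-Hirschman index, a standard concentration measure that the text does not mention; it is used here as one way to summarize dominance. The sketch assumes numeric income values and uses a registrant-equals-client heuristic to skip in-house filers.

```python
from collections import defaultdict


def market_structure(filing_rows):
    """Firm income shares, firms retained per client, and a concentration index.

    Expects filing-grain dicts with registrant_name, client_name and a numeric
    income; rows where registrant == client are treated as in-house and skipped.
    """
    income_by_firm = defaultdict(float)
    firms_by_client = defaultdict(set)
    for row in filing_rows:
        if row["registrant_name"] == row["client_name"]:
            continue  # in-house filer: no firm-client market relationship
        income_by_firm[row["registrant_name"]] += row["income"]
        firms_by_client[row["client_name"]].add(row["registrant_name"])
    total = sum(income_by_firm.values())
    shares = {f: amt / total for f, amt in income_by_firm.items()} if total else {}
    # Herfindahl-Hirschman index: sum of squared shares (1.0 = a single firm).
    hhi = sum(s * s for s in shares.values())
    firms_per_client = {c: len(fs) for c, fs in firms_by_client.items()}
    return shares, firms_per_client, hhi
```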
Mapping issue surges around legislation exploits the issue-area codes and the quarterly cadence together. When a major bill moves (a tax overhaul, a health-care reform, a defense authorization, a financial regulation), the relevant issue-area code typically spikes, as interests on every side staff up to shape the outcome. Plotting activity by code over quarters turns the lobbying record into a quarter-by-quarter map of where the legislative pressure is, often visible before the policy outcome is.
Profiling agencies and chambers uses the government-entities field to see which parts of government draw the most lobbying attention on which issues. It shows which agencies are the most contacted and which chamber of Congress each industry concentrates on. The field names chambers and agencies, not committees. Attributing pressure to a particular committee therefore means tracing the bills named in the specific-issues text to the committees that hold them.
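The issue-surge idea can be sketched as a trailing-baseline comparison: flag any quarter whose activity count for a code is well above that code's recent average. The window and multiplier defaults are arbitrary illustrations, not established cutoffs. The input is assumed to be per-code lists of quarterly activity counts, ordered oldest first.

```python
def flag_surges(counts_by_quarter, window=4, factor=1.5):
    """counts_by_quarter maps an issue code to its quarterly activity counts,
    oldest first. Returns {code: [indexes of quarters whose count is at least
    `factor` times the mean of the preceding `window` quarters]}."""
    surges = {}
    for code, series in counts_by_quarter.items():
        hits = []
        for i in range(window, len(series)):
            baseline = sum(series[i - window:i]) / window
            if baseline > 0 and series[i] >= factor * baseline:
                hits.append(i)
        if hits:
            surges[code] = hits
    return surges
```

Index 4 in a six-quarter series, for example, is the fifth quarter. Mapping indexes back to year and quarter labels is left to the caller.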
Finally, tracing the revolving door and the money-in-politics picture brings the joins to bear. The individual-lobbyist field, matched against records of prior government service, surfaces former officials and staff who now lobby—the revolving door between government and the influence industry. Combined with the FEC and FARA joins, the lobbying record lets an analyst assemble a rounded portrait of how a given interest engages Washington: what it pays firms to lobby on, which lawmakers and agencies it targets, where its campaign money goes, and whether any of its work is done on behalf of foreign principals. That integrated view—not any single number from any single dataset—is the payoff of treating the lobbying record as a queryable corpus rather than a search box.
Python workflow: tallying issue areas and ranking clients
The script below pulls a sample of LD-2 filings for one quarter (a fixed number of API pages, not the full corpus) from the Senate's public Lobbying Disclosure Act REST API. It then computes three of the core metrics:
- lobbying activity by general issue-area code (which subjects drew the most filings);
- the top clients by reported spending for the period (combining firm income and in-house expenses);
- the most-contacted government entities.

No API key is required for light use, though the Senate offers a free registration that raises the rate limit. The House bulk downloads at disclosurespreview.house.gov are the more efficient route for whole-corpus, multi-year work. Requirements: requests and pandas.
```python
import requests
import pandas as pd
from collections import Counter

# Senate LDA REST API -- the Office of Public Records of the Secretary of
# the Senate publishes Lobbying Disclosure Act filings through a public API.
# Registration for a free key raises the rate limit; unauthenticated calls
# work for light use. The House (disclosurespreview.house.gov) exposes a
# parallel public search and bulk download of the same filings.
#   filings: https://lda.senate.gov/api/v1/filings/
#   docs:    https://lda.senate.gov/api/redoc/v1/
BASE = "https://lda.senate.gov/api/v1"
HEADERS = {"User-Agent": "lda-analysis/1.0"}
# HEADERS["Authorization"] = "Token YOUR_API_KEY"  # optional, raises limits


def fetch_filings(year, filing_type="Q2", pages=5, page_size=100):
    """Follow the API's paginated "next" links for up to `pages` pages."""
    # filing_type Q1..Q4 are the quarterly LD-2 activity reports.
    rows = []
    url = f"{BASE}/filings/"
    params = {
        "filing_year": year,
        "filing_type": filing_type,
        "page_size": page_size,
    }
    for _ in range(pages):
        r = requests.get(url, params=params, headers=HEADERS, timeout=120)
        r.raise_for_status()
        body = r.json()
        rows.extend(body.get("results", []))
        url = body.get("next")
        params = None  # the "next" link already carries the query and paging
        if not url:
            break
    return rows


def money(x):
    # income (firm filers) or expenses (in-house filers) as a float;
    # a missing or non-numeric value counts as zero.
    try:
        return float(x)
    except (TypeError, ValueError):
        return 0.0


def analyze(filings):
    # --- 1. Lobbying activity by general issue-area code ------------------
    issue_counts = Counter()
    for f in filings:
        for act in f.get("lobbying_activities") or []:
            # The API names this field "general_issue_code" (TAX, HCR, ...).
            code = (act.get("general_issue_code") or "").strip()
            if code:
                issue_counts[code] += 1
    print("Lobbying activity by issue-area code (top 15):")
    for code, n in issue_counts.most_common(15):
        print(f"  {code:<6} {n:>6,} activities")

    # --- 2. Top clients by reported spending -----------------------------
    # income and expenses are filing-level figures, so each filing counts once.
    spend = Counter()
    for f in filings:
        client = (f.get("client") or {}).get("name") or "(unknown)"
        spend[client] += money(f.get("income")) + money(f.get("expenses"))
    print("\nTop 15 clients by reported lobbying spend in this sample:")
    for client, amt in spend.most_common(15):
        print(f"  {client[:42]:<42} ${amt:>14,.2f}")

    # --- 3. Most-lobbied government entities ------------------------------
    entities = Counter()
    for f in filings:
        for act in f.get("lobbying_activities") or []:
            for ent in act.get("government_entities") or []:
                name = ent.get("name") if isinstance(ent, dict) else str(ent)
                if name:
                    entities[name] += 1
    print("\nMost-contacted government entities (top 15):")
    for name, n in entities.most_common(15):
        print(f"  {name[:46]:<46} {n:>6,}")


if __name__ == "__main__":
    # pages * page_size caps the pull (here at most 500 filings), so this is
    # a sample of the quarter; raise `pages` for a complete pull.
    filings = fetch_filings(2024, "Q2", pages=5)
    print(f"LD-2 filings pulled: {len(filings):,}\n")
    analyze(filings)
```
Two things about this script deserve emphasis. First, the spending calculation sums both income and expenses because the two filer types report on opposite sides of the ledger. A lobbying firm reports the income it received from a client, while an organization lobbying in-house reports the expenses it incurred. A client total that ignored one or the other would systematically undercount whole categories of filers. Production work should also account for two further points. Some filers report dollar figures as ranges rather than exact amounts. Amended and terminated filings can double-count a relationship if they are not deduplicated by filing identifier. Second, the issue-area tally is only as clean as the codes. A single filing can legitimately carry several codes, one per listed activity. The script counts every activity's code, which is the right grain for “how much activity touched this issue” but not for “how many filings were primarily about this issue.” Choosing the grain to match the question is the whole art of working with this data.
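The two grains can be computed side by side. This sketch first drops exact duplicates by filing_uuid, for example from overlapping API pulls. It then counts codes per activity and, as an alternative swapped in here, by fractional counting. Under fractional counting, each filing splits one unit evenly across its codes, so the per-code totals sum to the number of filings. That is only one rough proxy for the filing-level question. Whether an amended filing shares an identifier with the original depends on the source, so deduplicating amendments is left as an open assumption.

```python
from collections import Counter, defaultdict


def dedupe_by_uuid(filings):
    """Keep the first copy of each filing_uuid, e.g. after overlapping pulls."""
    unique = {}
    for f in filings:
        unique.setdefault(f["filing_uuid"], f)
    return list(unique.values())


def code_counts_two_grains(filings):
    """Return (activities per code, fractional filings per code)."""
    per_activity = Counter()
    per_filing_share = defaultdict(float)
    for f in dedupe_by_uuid(filings):
        codes = [a.get("general_issue_code") for a in f.get("lobbying_activities") or []]
        codes = [c for c in codes if c]
        per_activity.update(codes)
        for c in codes:
            # Each filing contributes one unit in total, split across its codes.
            per_filing_share[c] += 1 / len(codes)
    return per_activity, dict(per_filing_share)
```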
Limitations and analytical caveats
The lobbying-disclosure record is the most comprehensive public account of paid federal influence in existence, but it is a self-reported regime with structural limits that an analyst must internalize before drawing conclusions.
The data is self-reported, and the categories are coarse. Filers describe their own activity, choose their own issue-area codes, and write their own specific-issue text, and they do so under deadline pressure and with an understandable interest in describing their work in the most favorable light. The result is real variation in how comparable activities get coded—two firms doing similar work may tag it under different issue-area codes, and the free-text descriptions range from precise bill numbers to anodyne phrases like “issues related to appropriations.” The issue-area codes are also broad: a single code can span an entire policy domain, so code-level counts measure attention to a domain, not to any specific bill. Cross-filer and cross-issue comparisons should be made with this coding variation firmly in mind.
The registration threshold leaves real activity uncounted. The LDA's registration triggers—the time-spent test for individuals and the dollar thresholds for registrants—are policy choices that, by design, exclude a great deal of genuine influence work. Activity that falls below the thresholds, or that is structured to fall below them, never enters the record. Critics describe this gap as shadow lobbying: influence work—strategic advice, grassroots and grasstops campaigns, coalition management, advertising aimed at policymakers—that shapes outcomes but is not classified as registrable lobbying and therefore goes undisclosed. The disclosure record is a complete account of registered lobbying; treating it as the complete account of all efforts to influence the government overstates what it covers.
Dollar figures are imprecise, and reporting lags. The income and expense figures are estimates the filer attributes to lobbying, often reported in good-faith ranges and not audited. They do not capture the full cost of an influence campaign, much of which sits in non-lobbying line items. The data also appears on a delay. A quarter's LD-2s are filed after the quarter ends and are subject to later amendment. The most recent period in any snapshot is therefore incomplete, and the dollar totals can move as amendments come in. The record is authoritative for established patterns and multi-quarter trends; it is not a real-time meter of this week's lobbying.
Entity resolution is the hard part of every join. The registrant and client names are free text, and the same corporation appears under many variants: parent and subsidiary, abbreviated and spelled out, with and without legal suffixes. Any analysis that aggregates by organization, or that joins the lobbying record to the FEC or FARA data, lives or dies on the quality of the name normalization applied first. A naive count by raw name will fragment a single interest across a dozen spellings and badly understate its true footprint. The codes are clean; the names are not, and the names are where the analytically interesting joins have to happen.
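A minimal normalization pass might look like the sketch below. The suffix list is an assumption and deliberately short. The function folds case and strips punctuation, a leading "the", and trailing legal suffixes. It cannot merge a parent with its subsidiaries; that needs a curated alias table.

```python
import re

# Hypothetical, deliberately short list of trailing legal-form words.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation", "co", "company",
    "llc", "lp", "llp", "ltd", "plc",
}


def normalize_org(name):
    """Reduce an organization name to a comparison key."""
    # Case-fold, then turn every character other than letters, digits,
    # "&" and spaces into a space.
    tokens = re.sub(r"[^a-z0-9& ]+", " ", name.casefold()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()  # drop trailing legal suffixes, e.g. "Co., Inc."
    if tokens and tokens[0] == "the":
        tokens = tokens[1:]
    return " ".join(tokens)
```

With this pass, "Acme Corp.", "ACME CORPORATION", and "The Acme Company, Inc." all reduce to the same key, "acme".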
Held with those caveats, the lobbying_activity table is a uniquely valuable resource: a standardized, dollar-tagged, time-stamped record of who is paid to influence the federal government and on which issues—the legislative half of the money-in-politics picture that, joined to the campaign-finance and foreign-agent records, lets an analyst trace organized influence from the issue, to the firm, to the lawmaker, to the money.
Related writing
FARA Foreign Agent Registrations: The Federal Database Behind Foreign Lobbying and Influence Disclosure — The foreign-principal companion to the LDA: where this dataset discloses lobbying on behalf of domestic clients, FARA discloses representation of foreign governments and parties, and the two regimes deliberately divide the universe of paid influence between them.
FEC Committees: The Federal Registry of Every PAC, Super PAC, and Campaign Committee — The campaign-finance leg of the money-in-politics picture: lobbying shows who is paid to press the government on which issues, while the FEC record shows who funds the campaigns of the officials being pressed, and the LD-203 ties the two together.
Congressional Voting Records: The Federal Database Behind Every House and Senate Roll Call Vote — The outcome side of the influence story: lobbying activity by issue area maps where the legislative pressure is, and the roll-call record shows how that pressure resolved into the votes of the very members and committees the filings name.