CRS Reports: The Federal Database Behind Congress’s Own Nonpartisan Research

For most of the past century, the single most authoritative plain-English explainer of almost any federal law or program existed, and members of Congress could read it, but the public could not. The Congressional Research Service writes nonpartisan analysis on demand for the lawmakers it serves, and for decades those reports circulated only inside the Capitol. A 2018 appropriations law finally pried them open. The result is a corpus of roughly 23,200 public CRS reports: one row per report, each carrying a unique product number, a title, a subject, a date, and a link to the full text. It is a uniquely citable, neutral account of how American government actually works, written for Congress by its own research service and now readable by everyone.

This article covers what the Congressional Research Service is and why it sits inside the Library of Congress rather than in an executive agency; the long history of reports that were written for Congress but withheld from the public, and the Consolidated Appropriations Act of 2018 that ended that arrangement; the anatomy of a CRS report—the product number, the title, the subject or policy area, the date, and the full text—and the all-important fact that the same product number is revised over time into multiple versions; how CRS earns its reputation as the most citable nonpartisan secondary source on US federal policy and who relies on it; how the corpus joins to statutes, the CFR, and the committees it serves; the analytical uses, from mapping the policy agenda over time to tracing how an explainer of a law evolves as the law changes; a Python workflow that pulls the EveryCRSReport bulk index, tallies reports by subject and year, and searches titles by keyword; and the caveats—the public corpus is not the whole corpus, reports are secondary sources, and a research archive was never designed to be a clean dataset—that every analyst must keep in mind.

What the dataset is

The Congressional Research Service, universally abbreviated CRS, is the in-house public-policy research arm of the United States Congress. It is staffed by analysts, attorneys, economists, and subject specialists whose job is to answer, confidentially and on demand, the substantive questions that members and committees confront across the entire range of federal policy—tax, defense, health, immigration, energy, trade, appropriations, and everything else Congress legislates on. Much of that work is private: a memo to a single office, a briefing for a committee. But a large share of it takes the form of CRS reports: standing written products that explain a statute, a program, an agency, or an emerging issue in careful, neutral, heavily sourced prose. Those reports—the non-confidential ones—are the subject of this piece.

In our database the corpus is stored as the table crs_reports, with the grain of one row per report: each row is a single CRS product, identified by its product number, with the title, the subject or policy area, the publication or update date, and a link to the document itself. The corpus comprises roughly 23,200 reports, searchable by topic and keyword. The columns capture what each report is, what it is about, when it was last revised, and where to read it:

product_number   -- unique CRS product ID (e.g. R44636, RL30315, IF10244)
title            -- the report title (carried on each version)
subject          -- subject / policy area topic(s) assigned to the report
policy_area      -- broad policy domain the report falls under
date             -- latest publication or update date
version_date     -- date of a specific revision (same number, many versions)
summary          -- the report's abstract / overview
url              -- link to the full document (HTML and PDF)
source           -- crsreports.congress.gov / community mirror

The product_number is the load-bearing column. Every CRS report carries a unique product number, and the leading letters encode the product type: the long-form analytical reports begin with R or the older RL, while the short, two-page In Focus briefs begin with IF. That number is the persistent identifier that lets a report be cited, retrieved, and—crucially—tracked across its revisions. Because CRS revises a report whenever the law or program it describes changes, the same product number can have several versions over time, each with its own date and its own text. A report on a tax provision written before a reform, and the same report rewritten after the reform passes, share a product number but are different documents. Any analysis of the corpus has to decide, deliberately, whether its unit is the product number (the report as an evolving thing) or the version (a specific dated snapshot)—a distinction the limitations section returns to.
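
The prefix convention described above can be sketched in a few lines of Python. This is a minimal illustration, not an official parser: the SERIES mapping covers only the three prefixes named in this article, the helper name parse_product_number is hypothetical, and any other prefix is labeled "other" rather than guessed at.

```python
import re

# Prefix-to-series mapping limited to the three prefixes named in the text.
# Anything else is reported as "other" (an assumption of this sketch).
SERIES = {
    "R": "report",
    "RL": "report (older series)",
    "IF": "In Focus brief",
}

# Letters followed by digits, e.g. R44636, RL30315, IF10244.
PRODUCT_RE = re.compile(r"^([A-Z]+)(\d+)$")


def parse_product_number(number):
    """Split a product number into (prefix, digits, series label)."""
    m = PRODUCT_RE.match(number.strip().upper())
    if m is None:
        raise ValueError(f"not a recognizable product number: {number!r}")
    prefix, digits = m.groups()
    return prefix, digits, SERIES.get(prefix, "other")


for n in ["R44636", "RL30315", "IF10244"]:
    print(n, parse_product_number(n))
```

Normalizing the identifier once, at load time, keeps later grouping by series from depending on stray whitespace or lowercase input.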

What CRS is and why it lives in the Library of Congress

CRS is a department of the Library of Congress, and that placement is not an accident of bureaucracy—it is the structural guarantee of the service's independence. The Library of Congress is a legislative-branch institution, the research library of the Congress itself, and locating CRS inside it keeps the service firmly on the legislative side of the constitutional line, answerable to Congress rather than to the executive agencies whose programs it so often analyzes. A report on how the Department of Energy administers a program is written by an arm of Congress, not by the Department of Energy—and that separation is precisely what makes the report worth reading.

The service traces its origins to the early twentieth century, when Congress created a Legislative Reference Service within the Library to supply lawmakers with factual research. The Legislative Reorganization Act of 1970 renamed it the Congressional Research Service and substantially expanded its mandate, charging it with providing Congress not merely with reference facts but with analysis—objective, nonpartisan assessment of policy alternatives and their consequences. That 1970 charge is the institutional DNA of the modern service: CRS does not advocate, does not recommend policy, and does not take sides. It lays out what a law does, how a program operates, what the options are, and what is known and contested about their effects, and it leaves the choosing to the elected members who are its clients.

That client relationship is the second defining structural fact. CRS works for the Congress—exclusively, confidentially, and on demand. Its primary audience is the member offices and committees that request its work, and historically its products were treated as a resource for those clients rather than as publications for the world. The analysts are accountable to Congress as an institution, not to either party or to any individual member, which is what lets a report be genuinely nonpartisan: it is not produced to win an argument but to inform a body whose members hold every position on the argument. Understanding that CRS is Congress's own research staff—not a think tank, not an executive agency, not an advocacy organization—is the key to understanding both the authority of the reports and the history of their public availability.

The reports the public could not read, and the 2018 law that opened them

For most of the service's history, CRS reports were not systematically available to the public. They were written for members and committees, and the prevailing position was that they were a work product of the legislative branch for the use of Congress—not documents Congress had chosen to publish. There was no legal bar on a member sharing a report with a constituent, and reports leaked into public view constantly: individual offices posted them, advocacy organizations collected and republished them, and over the years private and nonprofit efforts assembled large unofficial archives. But there was no official, comprehensive, public release. The paradox was stark—some of the most trusted, neutral analysis of federal policy in existence, paid for by taxpayers, was available to the public only through back channels and incomplete mirrors.

That changed with the Consolidated Appropriations Act of 2018, which directed the public release of non-confidential CRS reports. The law instructed the Library of Congress to establish a public website and to publish the agency's non-confidential reports there going forward, along with a body of existing reports. The result is crsreports.congress.gov, the official public home of the corpus, where reports are now posted and updated as a matter of course. The word that does the work in the statute is non-confidential: what the 2018 Act opened was the standing report products, not the confidential memoranda and custom analyses that CRS prepares for individual offices. The confidential client work—the bespoke briefing for a single committee, the private memo—remains private, as the service's confidential relationship with its congressional clients requires.

The practical consequence is the dataset this article describes. Since 2018 there has been an authoritative public source for non-confidential CRS reports, and alongside the official site the long-running community mirrors—most prominently EveryCRSReport, which predates the official release and continues to offer a clean bulk index—make the corpus accessible at scale for analysis. The reports were always public goods in spirit; the 2018 law made them public goods in fact, and turned a scattered set of leaked PDFs into a structured, queryable archive of Congress's own nonpartisan research.

Anatomy of a report: product number, subject, date, versions

A CRS report has a recognizable, stable structure, and understanding its components is what lets the corpus be read as data rather than as a pile of documents. Each report addresses a single policy topic—a statute, a program, an agency, an emerging issue—and is built to be a self-contained, neutral explainer of that one thing.

The product number, discussed above, is the spine. It uniquely identifies the report and signals its type: the R and RL series are the substantial analytical reports, often running to dozens of pages, while the IF (In Focus) series are the short, dense, two-page primers designed to be read quickly. The title states the report's subject in plain terms—CRS titles are descriptive rather than clever, because the report is a reference work, not a piece of advocacy. The subject or policy area situates the report within the landscape of federal policy: reports are tagged with topic terms that let a reader find all the reports on, say, trade, or immigration, or appropriations, and those tags are what make subject-level analysis of the corpus possible.

The date is more subtle than it looks, and it is the feature most likely to trip up an analyst. A CRS report is not a one-time publication; it is a living document that the service revises whenever the underlying law or program changes. When a tax bill passes, the report explaining the relevant section of the tax code is updated; when a program is reauthorized, the report on that program is rewritten. Each revision is a new version carrying its own date, but all the versions share the same product number. The corpus therefore has two natural units. One is the report as an evolving entity—the product number, with a most-recent date and a chain of prior versions behind it. The other is the version—a specific dated snapshot of the text as it stood at one moment. A count of reports and a count of versions are different numbers, and a question like “how did the explanation of this statute change after the law was amended?” can only be answered by working at the version level. The full text of each version—in HTML and PDF—is the payload: the actual neutral exposition that is the reason the corpus is valuable at all.
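
The two units can be made concrete with a small pandas sketch. The rows below are invented for illustration (the product numbers and dates are placeholders, not real reports), and the column names follow the schema listed earlier in this article.

```python
import pandas as pd

# Hypothetical version-level rows: one row per dated snapshot of a report.
versions = pd.DataFrame(
    {
        "product_number": ["R00001", "R00001", "R00001",
                           "IF00002", "RL00003", "RL00003"],
        "version_date": ["2016-03-01", "2018-01-15", "2021-06-30",
                         "2019-05-10", "2009-11-02", "2020-02-20"],
    }
)
versions["version_date"] = pd.to_datetime(versions["version_date"])

# Unit 1: the version (every dated snapshot counts once).
n_versions = len(versions)

# Unit 2: the report (every product number counts once).
n_reports = versions["product_number"].nunique()

# Collapse versions to one row per report, keeping both ends of the chain.
per_report = versions.groupby("product_number")["version_date"].agg(
    first_version="min", latest_version="max", n_versions="count"
)

print(f"{n_versions} versions across {n_reports} reports")
print(per_report)
```

Keeping first_version and latest_version side by side makes it explicit which date any later tally is built on.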

Why CRS reports are the most citable nonpartisan secondary source on federal policy

CRS reports occupy a peculiar and valuable niche in the ecosystem of writing about American government. They are not primary sources—they are not the statute, the regulation, or the court opinion itself. But they are the most authoritative secondary source available on how those primary sources actually work, and several features combine to give them that standing.

The first is neutrality by design. Because CRS exists to serve a body whose members hold every political position, its reports are written to be usable by all of them. They describe what a law does and what the competing arguments are without endorsing any of them; they explain the structure of a debate rather than taking a side in it. That discipline—explaining without advocating—is exactly what makes a CRS report trustworthy to a reader who does not share the writer's politics, because the writer's politics are deliberately absent. The second is authority and sourcing: the reports are written by subject-matter specialists, are carefully and transparently sourced, and are revised as the facts change, which gives them an accuracy and currency that ordinary journalism and most advocacy writing cannot match. The third is accessibility: a CRS report translates the dense machinery of a statute or a program into clear prose, making it the document a reader reaches for to understand a law before—or instead of—reading the law itself.

The consequence is that CRS reports are relied on far beyond the chamber that commissions them. Journalists cite them as the neutral baseline account of how a contested program works; researchers and scholars use them as authoritative starting points and as citable summaries of current law; courts have cited CRS reports as persuasive secondary authority on the operation of federal statutes; and the public—now that the reports are freely available—turns to them as the plain-language explainer of policy questions that the primary sources answer only opaquely. A CRS report is, in effect, the document that a careful person reaches for when they want to understand a federal law accurately and without spin, which is a rarer thing than it should be and is exactly why opening the corpus to the public mattered.

Joining to statutes, the CFR, and the committees CRS serves

The corpus is most powerful not as a standalone library but as a connective layer over the rest of the federal record, and several joins make that connectivity concrete—though, as the caveats stress, most of them run through the report text rather than through a tidy key column.

The most natural join is to statutes and the bills that become them. A great many CRS reports are, in substance, explainers of a particular law: its purpose, its provisions, its history, and the issues it raises. Because the reports name and discuss the public laws, the bill numbers, and the United States Code sections they describe, the corpus can be linked—through citation extraction from the text—to the legislative record and to enacted law, turning a CRS report into the human-readable companion to a statute. A reader who has the text of a law can find the CRS report that explains it; an analyst who has a CRS report can trace it back to the legislation it analyzes.

A parallel join runs to the Code of Federal Regulations. Where a statute directs an agency to write rules, the CRS report on that statute frequently discusses the implementing regulations, citing the relevant CFR parts and explaining how the agency has carried out the legislative mandate. The reports therefore sit at the seam between the statute and the regulation, narrating the path from what Congress enacted to what the executive branch actually did with it—the same seam that much of federal regulatory analysis lives on. And the corpus joins, conceptually, to the committees CRS serves: the service's work tracks the jurisdiction of the congressional committees that request it, so the distribution of reports across subjects maps onto the committee structure of Congress and, through it, onto the policy agenda of the legislature at any given moment. Reading the corpus by subject over time is, in effect, reading the changing preoccupations of the committees that drive American lawmaking.
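
Because these joins run through the report text, a first pass at them is usually a citation extractor. The sketch below is a rough illustration under stated assumptions: the three regular expressions cover only one common spelling each of public-law, U.S. Code, and CFR citations, the sample sentence is written for this example rather than quoted from a report, and extract_citations is a hypothetical helper name. Real report text uses more citation styles than these patterns catch.

```python
import re

# Illustrative patterns only; each covers a single common citation style.
PATTERNS = {
    "public_law": re.compile(r"\bP\.L\.\s*(\d+)-(\d+)\b"),
    "usc": re.compile(r"\b(\d+)\s+U\.S\.C\.\s*§+\s*(\d+[A-Za-z]?)"),
    "cfr": re.compile(r"\b(\d+)\s+C\.?F\.?R\.?\s+(?:[Pp]arts?\s+|§+\s*)?(\d+)"),
}


def extract_citations(text):
    """Return {kind: [captured groups, ...]}, de-duplicated, in text order."""
    found = {}
    for kind, pattern in PATTERNS.items():
        seen = []
        for groups in pattern.findall(text):
            if groups not in seen:
                seen.append(groups)
        found[kind] = seen
    return found


# Sample sentence written for this sketch, in the style of report prose.
sample = (
    "See P.L. 111-148, codified in part at 42 U.S.C. §18031, with rules at "
    "45 C.F.R. Part 155; see also 26 U.S.C. §36B and, again, P.L. 111-148."
)
cites = extract_citations(sample)
for kind, values in cites.items():
    print(kind, values)
```

The extracted tuples (for example a title and section of the U.S. Code) are what would be matched against a statute or regulation table; how well that match works depends on how consistently the reports cite.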

Analytical uses

A dated, subject-tagged archive of Congress's own nonpartisan research supports a distinctive set of analyses.

Mapping the policy agenda over time is the most direct use. Because every report carries a subject and a date, tallying reports by subject and year reveals what Congress was paying analytical attention to and how that attention shifted: the rise of a subject area as a new issue forces itself onto the agenda, the surge of reports around a major piece of legislation, the steady baseline of perennial topics like appropriations and the budget. The corpus becomes a barometer of the legislative agenda, with the volume of nonpartisan research on a topic standing in as a rough measure of how much that topic occupied Congress.
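
A subject-by-year tally of this kind is a cross-tabulation. The sketch below uses a handful of invented rows shaped like the crs_reports table (the product numbers, subjects, and dates are placeholders), and it adds a within-year share so that years with different report volumes can be compared.

```python
import pandas as pd

# Hypothetical rows shaped like the crs_reports table described above.
reports = pd.DataFrame(
    {
        "product_number": ["R1", "R2", "IF3", "R4", "IF5", "R6"],
        "subject": ["Trade", "Trade", "Immigration",
                    "Appropriations", "Trade", "Immigration"],
        "date": ["2017-04-01", "2018-09-12", "2018-02-03",
                 "2018-11-20", "2019-01-08", "2019-07-15"],
    }
)
reports["year"] = pd.to_datetime(reports["date"]).dt.year

# Rows are subjects, columns are years, cells are report counts.
agenda = pd.crosstab(reports["subject"], reports["year"])

# Share of each year's reports going to each subject (columns sum to 1).
share = agenda.div(agenda.sum(axis=0), axis=1)

print(agenda)
print(share.round(2))
```

Raw counts track how much was written; shares track how attention was divided. Which one answers the question depends on whether total output changed over the period.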

Tracing how an explainer evolves exploits the version structure. Because the same product number is revised as the underlying law changes, an analyst can line up the versions of a single report in time and read how CRS's neutral account of a statute or program changed—watching, for instance, the report on a tax provision rewrite itself after a reform, or the report on a program update its description as the program is reauthorized. This turns the corpus into a record not just of policy but of how policy was understood and explained at each stage, a kind of running, authoritative commentary on the moving target of federal law.
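
One simple way to read two versions against each other is a line-level diff. The sketch below uses Python's standard difflib on two made-up snapshots; the text is placeholder prose, not taken from any report, and changed_lines is a hypothetical helper name. Real inputs would be the extracted text of two dated versions of the same product number.

```python
import difflib

# Two made-up snapshots of the same report, split into lines.
before = [
    "Section A sets the rate at 10 percent.",
    "Section B is unchanged.",
]
after = [
    "Section A sets the rate at 12 percent.",
    "Section B is unchanged.",
    "Section C adds a reporting requirement.",
]


def changed_lines(old, new):
    """Return (removed, added) lines between two versions of a report."""
    removed, added = [], []
    for line in difflib.ndiff(old, new):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
        # Lines starting with "  " are unchanged; "? " lines are diff hints.
    return removed, added


removed, added = changed_lines(before, after)
print("removed:", removed)
print("added:  ", added)
```

A line diff is only as good as the line splitting; for long reports, splitting on sentences or paragraphs first tends to give more readable changes.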

Citation and full-text mining brings the report bodies into play. Because the full text is available, the corpus can be searched and parsed at scale: extracting the statutes, regulations, agencies, and prior reports each report cites builds a graph linking the explainers to the law they explain and to one another; running keyword and topic searches across titles and bodies surfaces every report touching a given issue.

Grounding other analysis is the quiet, pervasive use. Because CRS reports are neutral and authoritative, they serve as the reference layer that makes sense of harder datasets (the explainer of the program behind a spending series, the account of the statute behind an enforcement record), so that a number in another federal dataset can be interpreted against an authoritative description of the law that produced it.

Python workflow: the EveryCRSReport bulk index by subject and year

CRS reports are published officially at crsreports.congress.gov, but for bulk analysis the community mirror EveryCRSReport is the more convenient source: it offers a flat CSV index of the entire corpus plus one JSON metadata file per report, with no API key and no rate limit on the index. The script below downloads the index, computes the count of reports by latest-update year, samples a slice of the index to tally reports by subject area (subjects live in each report's metadata rather than in the index), and runs a keyword search across report titles (which are carried in the index itself). To keep the subject example fast it samples a few hundred reports rather than fetching all of the roughly 23,200 metadata files; a production run would page through the full index and cache the JSON locally. Requirements: requests and pandas.

import csv
import io
from collections import Counter

import pandas as pd
import requests

# EveryCRSReport.com publishes the full corpus of public CRS reports as a
# bulk index plus one JSON metadata file per report. No API key is required.
# The official source is crsreports.congress.gov, but the community mirror
# below offers a flat index that is far easier to pull at scale.
#   - index (CSV):  https://www.everycrsreport.com/reports.csv
#   - per report:   https://www.everycrsreport.com/reports/<NUMBER>.json
INDEX_URL = "https://www.everycrsreport.com/reports.csv"
BASE      = "https://www.everycrsreport.com/"


def load_index(url=INDEX_URL):
    r = requests.get(url, timeout=120)
    r.raise_for_status()
    text = r.content.decode("utf-8", errors="replace")
    reader = csv.reader(io.StringIO(text))
    rows = list(reader)
    # The index is a header row + one row per report. Columns:
    #   number, metadata_path, sha1, latest_pub_date, title,
    #   latest_pdf_filename, latest_html_filename
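    cols = ["number", "metadata", "sha1", "date", "title", "pdf", "html"]
    header = rows[0]
    # Rename the known columns by position and keep the file's own names for
    # any extra columns, so a wider index does not raise a length mismatch.
    names = cols[: len(header)] + header[len(cols):]
    return pd.DataFrame(rows[1:], columns=names)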


def report_meta(number):
    # The metadata JSON carries the report's subject "topics" (a list of
    # strings at the top level) plus a "versions" array -- the same product
    # number revised over time, most recent first.
    r = requests.get(f"{BASE}reports/{number}.json", timeout=60)
    r.raise_for_status()
    return r.json()


idx = load_index()
idx["date"] = pd.to_datetime(idx["date"], errors="coerce")
print(f"Public CRS reports in index: {len(idx):,}")

# --- 1. Reports per year (by latest publication / update date) ----------
idx["year"] = idx["date"].dt.year
print("\nReports by latest-update year (most recent 10):")
for yr, n in idx["year"].value_counts().sort_index(ascending=False).head(10).items():
    print(f"  {int(yr)}  {n:>5,}")

# --- 2. Tally by subject area --------------------------------------------
# Subjects are NOT in the flat index; they live in each report's metadata
# "topics" array. Fetching all ~23k JSON files is slow, so this takes the
# first 400 index rows: a convenience slice, not a random sample. Use
# idx.sample(n=400, random_state=0) for an unbiased draw.
sample = idx.head(400)
subjects = Counter()
for number in sample["number"]:
    try:
        meta = report_meta(number)
    except (requests.RequestException, ValueError):
        # Skip reports whose metadata is missing, unreachable, or not valid JSON.
        continue
    for topic in meta.get("topics", []):
        name = topic.get("topic") if isinstance(topic, dict) else str(topic)
        if name:
            subjects[name] += 1

print("\nTop 15 subject areas (sampled):")
for name, n in subjects.most_common(15):
    print(f"  {name[:44]:<44} {n:>4,}")

# --- 3. Keyword search across report titles ------------------------------
# Titles ARE in the index, so this needs no per-report fetch: just filter
# the title column of the DataFrame we already loaded.
def search_titles(df, keyword):
    # regex=False treats the keyword as literal text, so "U.S." or "(" is safe.
    mask = df["title"].fillna("").str.contains(keyword, case=False, regex=False)
    return df.loc[mask, ["number", "title"]]

print("\nTitles matching 'tariff' (first 20):")
for _, row in search_titles(idx, "tariff").head(20).iterrows():
    print(f"  {row['number']}: {row['title'][:60]}")

Two design choices in the script are worth drawing out. First, the index and the metadata carry different things on purpose: the flat reports.csv index is small and cheap and gives you the product number, the latest date, the title, and the document links, but the subject topics live only in the per-report JSON, which is why the subject tally has to fetch the metadata files while the title search can run entirely off the index. For anything beyond exploration you should download the JSON files once and store them, rather than re-fetching on every run. Second, the date in the index is the latest publication or update date, so a count by year built from the index measures when reports were most recently touched, not when they first appeared—a report originally written years ago and updated last month counts in the current year. To study original-publication dates or the full revision history you have to descend into the versions array in each report's metadata, where every dated snapshot is recorded. The distinction between the report and its versions, raised throughout this article, is not a pedantic one—it is the difference between two genuinely different counts.
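
Both points can be sketched together: a small on-disk cache so each metadata file is fetched once, and a helper that reads first and latest dates out of the versions array. This is a sketch under assumptions, not a description of the mirror's format: it assumes each entry in "versions" carries a "date" string in ISO form, and the names cached_meta, version_span, and the crs_cache directory are hypothetical. The fetch argument is any function that returns the metadata dict for a product number, such as report_meta from the script above; the demo uses a stand-in so it runs without network access.

```python
import json
import tempfile
from pathlib import Path


def cached_meta(number, fetch, cache_dir="crs_cache"):
    """Return metadata for one report, calling fetch(number) only on a miss."""
    path = Path(cache_dir) / f"{number}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    meta = fetch(number)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(meta), encoding="utf-8")
    return meta


def version_span(meta):
    """Return (first_date, latest_date, n_versions) from a metadata dict.

    Assumes each version has an ISO-formatted "date" string; only the
    YYYY-MM-DD prefix is kept, so sorting the strings sorts the dates.
    """
    dates = sorted(
        v["date"][:10] for v in meta.get("versions", []) if v.get("date")
    )
    if not dates:
        return None, None, 0
    return dates[0], dates[-1], len(dates)


# Stand-in for a network fetch, returning hand-written placeholder metadata.
calls = []

def fake_fetch(number):
    calls.append(number)
    return {
        "number": number,
        "versions": [
            {"date": "2021-06-30T00:00:00"},
            {"date": "2016-03-01T00:00:00"},
        ],
    }


with tempfile.TemporaryDirectory() as tmp:
    first = cached_meta("R00001", fake_fetch, cache_dir=tmp)
    second = cached_meta("R00001", fake_fetch, cache_dir=tmp)  # served from disk

print(version_span(second), "fetches:", len(calls))
```

Sorting the dates rather than trusting array order avoids depending on whether the versions are listed newest first.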

Limitations and analytical caveats

The public CRS corpus is an extraordinary resource, but it is a research archive opened to the public by statute—not a dataset engineered for analysis—and several features must be held in mind before drawing conclusions from it.

The public corpus is not the whole corpus. What the 2018 law opened was the non-confidential standing reports; the confidential client work (the custom memoranda and bespoke analyses CRS prepares for individual offices and committees) remains private, as it must, given the confidential relationship between the service and its congressional clients. The public reports are a large slice of CRS's output, but they are a slice, and not a random one: the most tailored, request-specific work is by design excluded. Any inference about the total volume or distribution of the service's work drawn from the public corpus alone will therefore understate the volume and may misstate the mix.

Reports are secondary sources, not the law. A CRS report is an authoritative description of a statute, a regulation, or a program—but it is a description, written by an analyst, not the binding text itself. It carries no legal force; it can simplify, it reflects the state of the law at its revision date, and where it characterizes contested questions it is summarizing a debate rather than resolving it. Citing a CRS report as if it were the statute, or treating its account as the final word on a disputed legal question, over-reads what a secondary source can bear. The report is the best available guide to the primary sources; it is not a substitute for them.

The report-versus-version distinction breaks naive counts. Because the same product number is revised repeatedly, the corpus has two units, and conflating them produces wrong numbers. A count of distinct product numbers and a count of versions are different totals; a tally of “reports per year” built on latest-update dates attributes an old report to the year of its most recent revision, not its origin. Trend analysis that does not decide deliberately whether its unit is the report or the version, and whether its dates are first-publication or latest-update, will measure an artifact of the revision process rather than a real change in the underlying activity. The single most common mistake with this corpus is to treat the latest-update date as a publication date.

Subject tagging and metadata are uneven, and a research archive is not a clean dataset. The subject and policy-area tags are a real and useful structure, but they were created to help readers find reports, not to support rigorous classification, so they vary in granularity and consistency across the corpus and over time. The metadata is shaped around documents—titles, dates, file links—rather than around the normalized fields an analyst would design, which is why subject-level work requires reaching into per-report metadata and why citation linking has to be extracted from free text rather than read from a key column. Held with these caveats in mind, the crs_reports table is a uniquely valuable resource: roughly 23,200 neutral, authoritative, citable explanations of how American federal law and policy actually work—Congress's own research, written for Congress and, since 2018, readable by everyone.

Related writing

Congressional Voting Records: The Federal Database Behind Every House and Senate Roll Call Vote — CRS reports explain the policy questions Congress confronts; the roll-call record shows how members actually voted on them, and reading the two together connects the nonpartisan analysis of a bill to the partisan tally that decided it.

Regulations.gov: The Federal Database Behind 25 Million Public Comments on US Rulemaking — Where a CRS report narrates the statute and the agency rule it authorizes, the rulemaking docket records the public's response to that rule, and the two sit on opposite ends of the same statute-to-regulation seam.

FARA Foreign Agent Registrations: The Federal Database Behind Foreign Lobbying and Influence Disclosure — CRS supplies Congress with neutral analysis of the issues before it; the FARA registry discloses the foreign interests trying to shape those same issues, the disinterested and the interested halves of the information environment lawmakers operate in.