GAO Reports: The Federal Database Behind Congress’s Watchdog

When Congress wants to know whether a federal program is working—whether the money reached the people it was meant for, whether a contract was let cleanly, whether an agency is vulnerable to fraud—it does not, in the first instance, hold a hearing. It asks the Government Accountability Office, the nonpartisan audit, evaluation, and investigative arm of the legislative branch, often called the congressional watchdog. GAO answers in writing, and each answer carries a product number, a title, the agency it examined, a date, and a set of recommendations. Our catalog holds roughly 2,800 of those products: the institutional memory of federal oversight, keyed and searchable.

This article covers what the GAO reports dataset is and how the office that produces it works; GAO's place in the legislative branch and its independence from the agencies it examines; the request-and-mandate model that determines what GAO investigates; the structure of a GAO product and the product-number identifier that keys the data; the two flagship recurring products—the biennial High-Risk List and the annual duplication-and-fragmentation report; the recommendation-tracking system and the financial benefits GAO attributes to its work; the bid-protest decisions that make GAO a quasi-judicial forum for federal contracting disputes; how the reports catalog joins to the spending and legislative datasets it scrutinizes; a Python workflow that pulls reports from gao.gov, tallies them by topic and year, and searches titles by keyword or agency; and the caveats—a request-driven workload, the gap between recommendation and implementation, and the limits of a title-and-topic catalog—that every analyst must hold before drawing conclusions.

What the dataset is

The Government Accountability Office publishes the results of its work as a stream of discrete products: reports (the detailed written studies that are GAO's primary output), testimonies (the statements GAO officials deliver before congressional committees, usually distilling a body of report work), and legal decisions (formal rulings, most prominently the bid-protest decisions discussed below, along with appropriations-law opinions). Every one of these is assigned a unique product number and posted to the office's public website. Our gao_reports table catalogs roughly 2,800 of those products at a grain of one row per product: each report, testimony, or legal decision is a single row, keyed by its product number and searchable by topic and by the agency examined.

The columns capture who and what each product is about, when it was issued, and the office's findings—the identifier, the title, the subject and agency examined, the publication date, and the recommendations GAO made:

product_number      -- GAO product ID, e.g. GAO-24-106 (the primary key)
title               -- the report or testimony title
product_type        -- report, testimony, or legal decision
agency_examined     -- the department or agency the product evaluates
topic               -- subject area, e.g. health care, defense, cybersecurity
publish_date        -- date the product was released to the public
requester           -- the committee or member who requested the work (if any)
recommendations     -- count of recommendations GAO made
recs_open_closed    -- implementation status of those recommendations
financial_benefit   -- dollar benefit GAO attributes to the work (if any)
summary             -- the highlights / what-GAO-found abstract
url                 -- link to the full product on gao.gov

The product_number is the load-bearing column. The GAO product number—the familiar GAO-YY-NNN identifier, where YY is the fiscal year and NNN is a sequence number—is a persistent, citable handle that uniquely names a single product across the office's entire output. It is how a GAO report is cited in a committee print, in a subsequent GAO product, and in the agency responses that the report provokes; and it is the key that lets the catalog be joined to anything else that references a report by number. The agency_examined and topic columns are what make the catalog navigable: because GAO's remit spans the entire federal government, the ability to filter to the products that examine a particular department—or that bear on a particular subject—is what turns a flat list into an oversight index. The recommendations and recs_open_closed columns carry the substance of the office's influence: a GAO report's power is not the finding but the recommendation, and whether that recommendation has been implemented is the measure of whether the report changed anything.
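Because the product number is the join key, it helps to validate and split it before using it. The sketch below is a minimal parser for the GAO-YY-NNN pattern described above. The helper names (`ProductNumber`, `parse_product_number`) are hypothetical, and the optional trailing letter suffix is an assumption about how some product numbers are written, not a documented rule of the catalog.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: "GAO-", a two-digit fiscal year, "-", a numeric sequence,
# and optionally a short letter suffix (an assumption, tolerated if present).
_PRODUCT_RE = re.compile(r"^GAO-(\d{2})-(\d+)([A-Z]{0,4})$")


class ProductNumber(NamedTuple):
    fiscal_year: int  # the two-digit year exactly as written in the identifier
    sequence: int     # the sequence number within that fiscal year
    suffix: str       # trailing letters, or "" when there are none


def parse_product_number(raw: str) -> Optional[ProductNumber]:
    """Split a product number into its parts; return None if it does not match."""
    match = _PRODUCT_RE.match(raw.strip().upper())
    if match is None:
        return None
    year, sequence, suffix = match.groups()
    return ProductNumber(int(year), int(sequence), suffix)


print(parse_product_number("GAO-24-106"))
```

Returning None rather than raising lets a caller count and inspect the identifiers that do not fit the pattern instead of silently dropping them from a join.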

What GAO is and where it sits

The Government Accountability Office was created in 1921 by the Budget and Accounting Act, the same law that established the modern federal budget process and the Bureau of the Budget (now the Office of Management and Budget). For most of its history it was called the General Accounting Office, a name that reflected its original mission as an auditor of government expenditures; in 2004 Congress renamed it the Government Accountability Office to capture the far broader role it had grown into: not merely checking the books but evaluating whether programs achieve their goals and recommending how government can work better and cost less. The office is led by the Comptroller General of the United States, who is appointed by the President with the advice and consent of the Senate to a single, fixed fifteen-year term. That term is deliberately longer than any President's tenure, and the Comptroller General can be removed only by impeachment or by a joint resolution of Congress for cause; both are structural guarantees of the independence the office's work requires.

The most important structural fact about GAO, and the one that gives its products their authority, is that it is part of the legislative branch. It works for Congress, not for the President, and it is independent of the executive-branch agencies whose programs it examines. This separation is what distinguishes GAO from an agency inspector general, who sits inside a single department and reports within it. GAO is outside the agencies entirely, answerable only to Congress, and that vantage point lets it take a government-wide view—to study a problem, like improper payments or fragmentation of effort, that cuts across many agencies and that no single inspector general is positioned to see whole. The office is also rigorously nonpartisan: its staff are career analysts, auditors, economists, scientists, and lawyers, not political appointees, and the office's credibility—the reason a GAO finding carries weight with members of both parties—rests on the perception, carefully guarded, that its conclusions follow the evidence rather than any political program.

The request-and-mandate model

GAO does not pick most of its own work. Its agenda is set largely by Congress, and understanding how a piece of GAO work comes to exist is essential to interpreting the dataset, because the distribution of reports across topics and agencies reflects what Congress asked about as much as where the problems are. Work arrives through three channels. The first and largest is the congressional request: a committee chair, a ranking member, or an individual member writes to GAO asking it to study a question. GAO prioritizes requests from committee and subcommittee leadership—the chairs and ranking members with jurisdiction over the relevant program—because a chamber's oversight committees are the office's principal clients. The second is the statutory mandate: Congress writes into law a standing requirement that GAO review a particular program, audit a particular fund, or report on a particular subject, often annually. The third, smaller channel is work the Comptroller General initiates under the office's own authority to study matters within its broad statutory remit.

The consequence for the data is direct. The topical shape of the catalog—why there are many products on, say, defense acquisition or health-care programs and fewer on some other corner of government—tracks congressional attention and the committees that are most active in requesting oversight. A surge of reports on a subject usually signals that the subject became politically salient, that a crisis drew committee attention, or that a new statutory mandate kicked in—not necessarily that the underlying problem worsened in that year. An analyst reading the volume of GAO products on a topic as a proxy for the severity of the problem is, in part, measuring the intensity of congressional interest. This request-driven character is GAO's defining feature and the first thing the caveats section returns to.

The High-Risk List and the duplication report

Two recurring GAO products are far better known than any single study, and both repay attention because they distill the office's government-wide vantage into a standing scorecard. The first is the High-Risk List. Beginning in 1990, GAO has published—now at the start of each new Congress, on a biennial cadence—a list of the federal programs and operations it judges most vulnerable to fraud, waste, abuse, and mismanagement, or most in need of transformation to address economy, efficiency, or effectiveness challenges. The list has named some areas for decades: the management of major weapons-system acquisitions, the financial condition of certain federal programs, the government's exposure to improper payments, and the security of federal information systems have all been long-running high-risk designations. The list is not merely a complaint; for each high-risk area GAO defines the criteria for removal and tracks progress against them, so that areas can be added when a problem emerges and removed when an agency has demonstrably fixed it. Because the same areas recur, the High-Risk List is one of the few instruments that lets Congress and the public watch a chronic management problem either improve or fester over a span of years.
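The add-and-remove bookkeeping of the High-Risk List maps onto plain set arithmetic. The sketch below compares two editions of the list; the function name `diff_editions` is hypothetical, and the area labels are placeholders rather than real High-Risk List entries.

```python
def diff_editions(previous, current):
    """Compare two editions of a list of high-risk areas."""
    prev, curr = set(previous), set(current)
    return {
        "added": sorted(curr - prev),     # newly designated in the later edition
        "removed": sorted(prev - curr),   # dropped from the later edition
        "retained": sorted(curr & prev),  # carried over unchanged
    }


# Placeholder labels only; these are not real High-Risk List entries.
edition_old = ["Area A", "Area B", "Area C"]
edition_new = ["Area A", "Area C", "Area D"]

changes = diff_editions(edition_old, edition_new)
for kind, areas in changes.items():
    print(f"{kind}: {', '.join(areas)}")
```

A comparison like this only works if area names are spelled consistently across editions, so in practice the labels would need to be normalized first.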

The second flagship is the annual duplication and fragmentation report. Under a statutory mandate enacted in 2010, GAO reports each year on federal programs, agencies, offices, and initiatives that have duplicative goals or activities, and it identifies opportunities to reduce fragmentation, overlap, and duplication and to achieve cost savings or enhance revenue. The report has surfaced, year after year, the surprising extent to which the federal government runs many separate programs aimed at the same objective—multiple job-training programs, overlapping food-safety responsibilities split across agencies, redundant data systems—and it attaches to each finding a recommendation and, where GAO can estimate one, a potential financial benefit from consolidation or reform. Both flagship products illustrate the through-line of GAO's value: the office's comparative advantage is seeing across the whole of government what no single agency, and no single inspector general, is positioned to see, and converting that cross-cutting view into specific, trackable recommendations.

Recommendations and reported financial benefits

The unit of GAO's influence is the recommendation. A report does not merely describe a problem; it tells a named agency, specifically, what to do about it—tighten a control, change a process, collect data it is not collecting, close a gap in oversight. GAO then does something that distinguishes it from a one-off study: it tracks the implementation status of every recommendation it makes. Each recommendation is carried as open or closed, and a closed recommendation is marked as either implemented or, less often, closed without implementation when circumstances have made it moot or the agency has declined to act and GAO judges further pursuit unproductive. The office follows up with agencies over the years it can take for a recommendation to be acted on, and it publicizes the priority recommendations it considers most important for an agency to address. This tracking is what makes the catalog more than a library: it turns each report into an accountability instrument whose effect can be measured, and it lets an analyst ask not just what GAO recommended but whether anyone did it.

GAO also reports the financial benefits it attributes to its work—the dollars saved, recovered, or reallocated, and the revenue enhanced, that flow from agencies and Congress acting on its findings and recommendations. The office tallies these benefits annually and uses the total, set against its own appropriation, to express a return on the public investment in oversight. These figures should be read for what they are—GAO's own attribution of benefits to its work, which necessarily involves judgment about causation and is not an audited, independently verified accounting—but the recommendation-tracking machinery behind them is rigorous, and the financial-benefit and recommendation-status fields in the catalog are exactly the columns that let a reader move from cataloguing reports to evaluating whether the office's work changes how the government spends money.

Bid-protest and legal decisions

One category of GAO product is not a study at all but a ruling, and it is large enough and consequential enough to deserve its own treatment: the bid-protest decision. Under the Competition in Contracting Act, GAO serves as a forum where a disappointed bidder for a federal contract (a company that believes an agency conducted a procurement improperly) can file a protest, and GAO adjudicates it. The Comptroller General's office reviews whether the agency followed procurement law and the terms of the solicitation, and issues a decision sustaining or denying the protest. A protest can challenge the terms of a solicitation before bids are due, or the evaluation and award after the fact; and a timely protest generally triggers an automatic stay that pauses award or performance while GAO decides. These decisions make GAO a quasi-judicial actor in the machinery of federal contracting, which runs to hundreds of billions of dollars a year, and they are a primary reason GAO sits at the intersection of oversight and procurement.

For the dataset, bid-protest and other legal decisions enter the catalog as products with their own identifiers and dates, distinguishable by product type. They are analytically distinct from reports and testimonies: where a report evaluates a program and recommends improvements, a bid-protest decision resolves a specific dispute about a specific procurement and names the parties—the protester, the agency, and the awardee. That makes the legal-decision side of the catalog the natural bridge to the contract-spending record: a sustained protest is a signal that an award shown in the spending data was challenged, and potentially overturned or recompeted, for reasons GAO has documented in writing. An analyst tracing the contested awards behind a program's spending will find that thread running through GAO's decisions.

Joining to the spending and legislative data

The GAO reports catalog is most valuable not in isolation but as the oversight layer over the programs, dollars, and statutes that the rest of the federal record describes. Several joins matter.

The first is to the spending record. A great deal of GAO work is, at bottom, about money: whether a program spent its appropriation effectively, whether a contract delivered value, whether improper payments are leaking out of a benefit program. Joining the reports catalog to the federal spending data by the agency examined (and, for bid-protest decisions, by the specific procurement and the parties) lets an analyst put a GAO finding next to the dollars it concerns: to see the size of the program a high-risk designation flags, the value of a contract a protest contested, or the magnitude of the improper payments a report quantifies. The reports tell you where the oversight concern is; the spending data tells you how much is at stake.
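A minimal sketch of the agency-level join, using pandas. The reports columns follow the schema listed earlier; the spending columns (`agency_name`, `obligated_amount`) and the helper names are assumptions for illustration, and all values are made up. Agency names rarely match exactly across datasets, so the lowercase-and-collapse-whitespace normalization here stands in for what would need to be a proper crosswalk.

```python
import pandas as pd


def normalize_agency(name: str) -> str:
    # Crude key: lowercase and collapse whitespace. A real join needs a crosswalk.
    return " ".join(name.lower().split())


def reports_vs_spending(reports: pd.DataFrame, spending: pd.DataFrame) -> pd.DataFrame:
    """Put per-agency product and recommendation counts next to obligated dollars."""
    per_agency = (
        reports.assign(agency_key=reports["agency_examined"].map(normalize_agency))
        .groupby("agency_key")
        .agg(products=("product_number", "nunique"),
             recommendations=("recommendations", "sum"))
        .reset_index()
    )
    dollars = (
        spending.assign(agency_key=spending["agency_name"].map(normalize_agency))
        .groupby("agency_key", as_index=False)["obligated_amount"]
        .sum()
    )
    # indicator=True adds a _merge column flagging agencies with no spending match.
    return per_agency.merge(dollars, on="agency_key", how="left", indicator=True)


# Toy rows with made-up identifiers and amounts, for illustration only.
reports = pd.DataFrame({
    "product_number": ["GAO-00-1", "GAO-00-2", "GAO-00-3"],
    "agency_examined": ["Agency A", "agency  a", "Agency B"],
    "recommendations": [2, 3, 1],
})
spending = pd.DataFrame({
    "agency_name": ["Agency A", "Agency A"],
    "obligated_amount": [100.0, 50.0],
})

profile = reports_vs_spending(reports, spending)
print(profile)
```

The left join keeps every examined agency, and the `_merge` column shows which ones failed to match a spending record, which is usually a naming problem rather than a true absence of spending.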

The second join is to the legislative record. GAO sits at the center of a feedback loop with Congress: members and committees request the work, and the findings in turn feed hearings, inform legislation, and drive the appropriations and authorizing decisions that fund and shape the programs GAO examined. Relating the reports catalog to the committees and members who requested the work—and to the bills, hearings, and votes that follow—reconstructs that loop: it shows which oversight findings members acted on, which recommendations were written into law, and how a GAO report on a problem connects to the legislative response. The same nonpartisan analytic support that CRS provides to Congress and the roll-call record of how members ultimately voted are the natural companions to GAO's evaluations of whether the programs those votes created are working.

The third join is to the agency record itself. Because every product names the agency it examines, the catalog can be pivoted into an agency-by-agency oversight profile—how many products examine each department, how many open recommendations each carries, which sit on the High-Risk List—and that profile can be set against the agency's regulatory output, its inspector-general findings, and its spending. The product number that keys the catalog is also the citation by which agency responses, follow-up reports, and congressional references point back to a given finding, so the identifier itself is the thread that stitches the oversight record together over time.

Analytical uses

A keyed, dated, agency-tagged catalog of federal oversight supports a distinctive set of analyses that no single report conveys.

Oversight intensity by agency and topic is the most immediate use. Aggregating products by the agency examined and by topic, and trending those counts over time, reveals where congressional oversight concentrates and how its focus shifts with events—the rise of cybersecurity products, the recurring weight of defense acquisition, the surges that follow a crisis or a new mandate. The necessary caution, developed in the caveats, is that this measures attention as much as it measures the underlying problem, because GAO's agenda is set by what Congress asks about.
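One way to sketch the topic-by-year trend uses only the standard library. The function names are hypothetical and the rows are made up; the `topic` and `publish_date` keys follow the catalog schema. Expressing each topic as a share of that year's products, rather than a raw count, controls for years in which the catalog as a whole is simply larger. It does not remove the attention effect described above.

```python
from collections import Counter
from datetime import date


def topic_year_counts(rows):
    """Count products per (topic, publication year)."""
    counts = Counter()
    for row in rows:
        counts[(row["topic"], row["publish_date"].year)] += 1
    return counts


def topic_share_by_year(counts):
    """Convert raw counts into each topic's share of that year's products."""
    totals = Counter()
    for (_, year), n in counts.items():
        totals[year] += n
    return {key: n / totals[key[1]] for key, n in counts.items()}


# Made-up rows for illustration only.
rows = [
    {"topic": "defense", "publish_date": date(2020, 1, 5)},
    {"topic": "defense", "publish_date": date(2020, 6, 9)},
    {"topic": "health care", "publish_date": date(2020, 3, 2)},
    {"topic": "defense", "publish_date": date(2021, 2, 1)},
]

counts = topic_year_counts(rows)
shares = topic_share_by_year(counts)
for (topic, year), share in sorted(shares.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(f"{year}  {topic:<12} {share:.0%}")
```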

Recommendation follow-through exploits the recommendation-status fields: computing, by agency, the share of GAO recommendations that have been implemented versus those that remain open for years turns the catalog into a scorecard of which agencies act on oversight and which let findings languish. Paired with the High-Risk List, it tells a story about whether a chronic problem is being addressed. Tracking the High-Risk List across Congresses follows the additions and removals to see which long-standing management problems finally got fixed and which have resisted every recommendation. And linking findings to dollars and votes brings the spending and legislative joins to bear—putting the financial benefit GAO attributes to a body of work next to the appropriations it concerned, or tracing a sustained bid protest from GAO's decision to the contested award in the spending record—which is exactly the cross-cutting, follow-the-money analysis the watchdog exists to enable.
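The follow-through scorecard can be sketched as a per-agency share of recommendations by status. This assumes one record per recommendation and three hypothetical status labels (`open`, `closed_implemented`, `closed_not_implemented`); the catalog's actual encoding of recs_open_closed may differ, and the agencies and rows below are made up.

```python
from collections import defaultdict

# Hypothetical status labels; the catalog's real encoding may differ.
OPEN = "open"
IMPLEMENTED = "closed_implemented"


def follow_through(recs):
    """Return, per agency, the share of recommendations implemented and still open."""
    totals = defaultdict(int)
    implemented = defaultdict(int)
    still_open = defaultdict(int)
    for agency, status in recs:
        totals[agency] += 1
        if status == IMPLEMENTED:
            implemented[agency] += 1
        elif status == OPEN:
            still_open[agency] += 1
    return {
        agency: {
            "implemented": implemented[agency] / totals[agency],
            "open": still_open[agency] / totals[agency],
        }
        for agency in totals
    }


# Made-up (agency, status) pairs for illustration only.
recs = [
    ("Agency A", "closed_implemented"),
    ("Agency A", "open"),
    ("Agency A", "closed_not_implemented"),
    ("Agency A", "closed_implemented"),
    ("Agency B", "closed_implemented"),
]

scorecard = follow_through(recs)
for agency, shares in scorecard.items():
    print(f"{agency}: {shares['implemented']:.0%} implemented, {shares['open']:.0%} open")
```

A share computed this way treats every recommendation equally, so an agency with a handful of recommendations can look better or worse than one with hundreds; reporting the denominator alongside the share guards against that.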

Python workflow: pulling reports from gao.gov

The script below pulls GAO reports and testimonies from gao.gov's public report listing, tallies them two ways—by publication year and by topic—and provides a title search that filters the catalog by keyword or by the name of an agency. No API key is required for public data. Because the parameter and field names on gao.gov change between site releases, the script isolates the listing path and discovers the working product-number, title, date, and topic column names at runtime rather than hard-coding them; any production use should be validated against the current gao.gov reports browse interface and should page through the full result set rather than the small sample the example fetches.

import requests
import pandas as pd
from collections import Counter

# GAO publishes its product catalog through gao.gov. The site exposes
# report listings and feeds; no API key is required for public data.
# Product numbers follow the GAO-YY-NNN pattern (for example GAO-24-106) and
# are the stable primary key for every report, testimony, and legal decision.
#
# Endpoints and parameter names on gao.gov change between site releases, so
# the report-listing path and field names are isolated here; confirm them
# against the current gao.gov reports browse/search before production use.
BASE = "https://www.gao.gov"
LIST = BASE + "/reports-testimonies"


def fetch_reports(params=None, pages=20, page_size=50):
    # Walk the paginated report-listing feed and collect raw records.
    rows = []
    params = dict(params or {})
    for page in range(pages):
        params["page"] = page
        params["items_per_page"] = page_size
        r = requests.get(LIST, params=params,
                         headers={"Accept": "application/json"}, timeout=60)
        r.raise_for_status()
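        try:
            batch = r.json()
        except ValueError:
            # The listing returned something other than JSON (for example HTML);
            # stop paging instead of crashing.
            break
        # A feed may return a bare list or wrap the records in a "results" key.
        items = batch if isinstance(batch, list) else batch.get("results", [])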
        if not items:
            break
        rows.extend(items)
    return pd.DataFrame(rows)


def _col(df, *needles):
    # Return the first column whose name contains all of the needles.
    for c in df.columns:
        u = c.upper()
        if all(n.upper() in u for n in needles):
            return c
    return None


def analyze(df):
    if df.empty:
        print("No reports returned.")
        return
    num = _col(df, "PRODUCT") or _col(df, "NUMBER") or _col(df, "GAO")
    title = _col(df, "TITLE")
    date = _col(df, "RELEASE") or _col(df, "PUBLISH") or _col(df, "DATE")
    topic = _col(df, "TOPIC") or _col(df, "SUBJECT") or _col(df, "AGENCY")

    print(f"{len(df):,} products in the pull")

    # --- Reports by year -------------------------------------------------
    if date:
        df[date] = pd.to_datetime(df[date], errors="coerce")
        by_year = df[date].dt.year.value_counts().sort_index()
        for yr, n in by_year.tail(6).items():
            print(f"  {int(yr)}: {n:,} products")

    # --- Reports by topic ------------------------------------------------
    if topic:
        top = Counter(df[topic].dropna().astype(str)).most_common(8)
        print("Top topics:")
        for name, n in top:
            print(f"  {name[:40]:<40} {n:>4}")
    return num, title, date, topic


# --- Keyword / agency title search --------------------------------------
def search_titles(df, term):
    title = _col(df, "TITLE")
    if not title:
        return df.iloc[0:0]
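    # regex=False treats the term as a literal keyword, so characters such as
    # "(" or "." in an agency name are matched as written, not as a pattern.
    hits = df[df[title].astype(str).str.contains(term, case=False, na=False, regex=False)]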
    return hits


if __name__ == "__main__":
    reports = fetch_reports()
    analyze(reports)
    # Example title search, showing the first three columns (uncomment to run):
    # print(search_titles(reports, "cybersecurity")[list(reports.columns[:3])])

Two practical notes apply. First, the topic and year tallies in the script are a first pass: they count products as the listing returns them, without distinguishing reports from testimonies from legal decisions, and without weighting by the size or significance of the product. A serious analysis should split by product type—treating a multi-volume report, a one-paragraph testimony, and a bid-protest decision as different things—and should normalize topic counts against the volume of activity in that subject area before reading the counts as a measure of oversight pressure. Second, for catalog-scale work—profiling every agency, trending the High-Risk List across Congresses, or joining the full recommendation-status record to the spending data—GAO's structured listings and feeds, refreshed on a schedule and reconciled against the product-number key, are far more reliable than scraping many paginated pages, and they carry the authoritative product metadata that ad hoc pulls miss.

Limitations and analytical caveats

The GAO reports catalog is the most authoritative public record of federal oversight in the United States, but it carries structural limitations that an analyst must internalize before drawing conclusions from it.

The agenda is request-driven, so the catalog reflects attention, not just problems. Because most GAO work originates in congressional requests and statutory mandates, the distribution of products across agencies and topics is shaped by what Congress chose to ask about. A subject with many GAO reports is a subject Congress paid attention to; a thinly covered corner of government may be poorly run and simply unexamined—the absence of GAO products is not evidence of the absence of problems. Reading report volume as a direct proxy for the severity or prevalence of mismanagement is the single most common misreading of this data, and the request model is the reason.

A recommendation is not an outcome. GAO can recommend; it cannot compel. Agencies are not legally bound to implement GAO's recommendations, and Congress may or may not act on its findings. Many recommendations remain open for years, and some are closed without ever being implemented. The catalog faithfully records what GAO recommended and whether it has been acted on, but a report's existence—even a forceful one with a large attributed financial benefit—does not mean the problem was fixed. The recommendation-status fields are precisely the corrective to this, and they should be consulted rather than assuming that a finding produced a change.

Attributed financial benefits are GAO's own estimates. The dollar figures GAO reports for savings, recoveries, and revenue enhancements are the office's attribution of benefits to its work, and attribution (deciding how much of a saving flowed from a recommendation rather than from other forces) necessarily involves judgment. These figures are reported in good faith and are useful as an order-of-magnitude indication of impact, but they are not independently audited accounting, and they should not be summed and cited as a precise ledger of taxpayer dollars saved.

The catalog is a summary, and some work is restricted. The title, topic, agency, and summary fields compress a substantial written product (often dozens or hundreds of pages, with methodology, findings, agency comments, and appendices) into a few coded fields; the full texture lives in the product itself, not in the catalog row. And not all of GAO's work is public: some products contain sensitive or classified material and are issued in restricted form, so the public catalog is the published record of GAO's output, not the complete record of everything the office has examined. Treating a catalog row as a substitute for reading the report, or treating the public catalog as exhaustive, over-reads what the dataset can bear.

Held with these caveats in mind, the gao_reports table is a uniquely valuable resource: a keyed, dated, agency-tagged catalog of the published work of Congress's nonpartisan watchdog—the institutional memory of federal oversight, from the High-Risk List to the bid-protest decision—and the natural companion to the spending and legislative records whose programs, dollars, and statutes GAO exists to hold to account.

Related writing

CRS Reports: The Federal Database Behind Congress’s Own Nonpartisan Research — The Congressional Research Service is GAO's sibling in the legislative branch, supplying Congress with nonpartisan policy and legal analysis; where CRS explains and frames an issue, GAO audits and evaluates how the resulting programs perform, and the two together form the analytic backbone of congressional oversight.

Congressional Voting Records: The Federal Database Behind Every House and Senate Roll Call Vote — GAO sits at the front of a feedback loop in which members request oversight work and then act on it, so the roll-call record of how members ultimately voted is the legislative downstream of the findings GAO reports back to the committees that asked.

USASpending Contracts: The Federal Record of Every Dollar the Government Buys — A great deal of GAO's work concerns federal contracting—from program-cost evaluations to the bid-protest decisions it adjudicates—and joining the reports catalog to the contract-spending record puts each oversight finding next to the dollars and the specific awards it concerns.