NIH Research Portfolio: The Federal Database Behind $50 Billion in Annual Biomedical Grants
NIH RePORTER (RePORT Expenditures and Results), the project database within the NIH Research Portfolio Online Reporting Tools, covers every NIH-funded research project since 1985: more than 500,000 active and historical grants, funded from an annual budget approaching $50 billion and spanning every disease area, institution, and principal investigator in US biomedical research.
The National Institutes of Health: structure and scale
The National Institutes of Health is a component of the Department of Health and Human Services and the largest single funder of biomedical research in the world. Congress appropriated approximately $48 billion to NIH in fiscal year 2024, a figure that has grown from roughly $13 billion in 2000 through a combination of doubling initiatives, inflation adjustments, and emergency pandemic appropriations. NIH employs approximately 20,000 people across its main campus in Bethesda, Maryland and associated facilities; another 300,000 researchers at universities, medical schools, hospitals, and private research institutes receive NIH funding to conduct their work.
NIH divides its work between extramural research and intramural research. Extramural research—grants, cooperative agreements, and contracts awarded to external institutions—accounts for roughly 80% of the NIH budget. This is the population visible in RePORTER: competitive awards made to universities, medical schools, teaching hospitals, research institutes, and small businesses following peer review. Intramural research, conducted by NIH scientists on the Bethesda campus and at the NIH Clinical Center, accounts for the remaining 20% and does not appear in RePORTER as grant records; it is a separate federal research enterprise with its own leadership structure and scientific agenda.
NIH comprises 27 Institutes and Centers (ICs), each with a statutory mission and a separate congressional appropriation. The ICs are not interchangeable: each has its own scientific agenda, peer review study sections, paylines, program officers, and culture. The largest ICs by budget — NCI, NIAID, NHLBI, NIMH, NINDS — collectively account for more than half of NIH's extramural spending. Smaller ICs such as NIMHD (minority health disparities), FIC (Fogarty International Center for global health), and NCCIH (complementary and integrative medicine) operate with budgets under $500 million per year and fund research that the larger ICs might not prioritize.
Grant mechanisms: the activity code taxonomy
Every NIH award carries an activity code that identifies the funding mechanism. The taxonomy has grown to more than 150 distinct codes, but a dozen account for the overwhelming majority of NIH's extramural spending and are essential to understanding the database.
Research Project Grants (R series). The R01 is the flagship of the NIH portfolio and the basic unit of investigator-initiated biomedical research. R01s typically run three to five years at $250,000–$500,000 per year in direct costs; most request no more than $250,000 direct per year, the ceiling for the simplified modular budget format. Getting an R01 is a career milestone; losing one can force a lab closure. The R21 Exploratory/Developmental Grant supports higher-risk, preliminary research over two years at a maximum of $275,000 total direct costs, and is designed for ideas too early-stage to survive R01 review. The R03 Small Research Grant provides up to $100,000 total over two years for methodology development and pilot data collection. The R34 planning grant funds clinical trial infrastructure before full trial initiation.
Program Project Grants and Centers (P series). The P01 Program Project Grant funds a coordinated cluster of interrelated research projects under a single administrative core, typically at $1–3 million per year total. P01s are prestige grants and expensive to administer—reviewers evaluate each component project and the overall program integration simultaneously. The P30 Core Grant supports a shared research resource (a biostatistics core, a high-throughput sequencing facility, a tumor tissue repository) that serves multiple investigators at an institution. Cancer centers at major academic medical centers are funded as P30s from NCI. The P50 Specialized Center designates centers of excellence around a specific disease or research theme.
Cooperative Agreements (U series). Cooperative agreements differ from grants in that NIH program staff have substantial programmatic involvement in the research—they are not passive funders. The U01 is the cooperative agreement equivalent of the R01. The U10 funds clinical trials networks, the multi-site infrastructure conducting phase III trials too large for any single institution. The UH2 and UH3 are phased cooperative agreements used for translational research requiring milestone-based decision points before full funding commitment. Many of NIH's largest single awards by dollar value are U mechanisms, particularly in infectious disease and cardiovascular clinical trials.
Career Development (K series). K awards fund the transition from mentored to independent research careers. The K99/R00 Pathway to Independence Award is the highest-profile: it provides two years of mentored support (K99) followed by up to three years of independent support (R00), and it is portable, following the investigator to their first faculty position regardless of institution. The K23 Mentored Patient-Oriented Research Career Development Award supports clinician-scientists building clinical research careers. The K08 targets physician-scientists in basic or translational research. The K01 is a mentored research scientist development award for non-physician researchers.
Fellowship Awards (F series). F30 and F31 awards fund predoctoral fellows—graduate students pursuing MD/PhD or PhD degrees in biomedical fields. The F32 funds postdoctoral fellows for two to three years of mentored research experience. Fellowship applicants submit their own applications and are reviewed independently, but the award is institutional: the stipend is paid through the fellow's host institution, and the record appears in RePORTER with the institution listed as the grantee organization.
Training Grants (T series). T32 Institutional National Research Service Awards fund institutional training programs: the university applies for funds to support a defined number of predoctoral and postdoctoral trainees in a specified research area for five years. Individual trainees are not listed in RePORTER; the record reflects the institutional award. T32 grants are among the most stable in the NIH portfolio—they are difficult to obtain and difficult to lose, and they anchor research training programs at medical schools for decades.
Small Business Innovation Research and Small Business Technology Transfer (SBIR/STTR). Phase I SBIR awards (activity code R43) fund feasibility studies at approximately $300,000 over six months to one year. Phase II SBIR awards (R44) fund full development at approximately $1.7 million over two years. STTR awards (R41/R42) require a formal collaboration between the small business and a research institution. These appear in RePORTER alongside academic grants and are frequently excluded from analyses focused on investigator-initiated academic research.
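The mechanism families described above can be turned into a small lookup for grouping RePORTER records. This is a minimal sketch, not an official NIH table: the function name, the family labels, and the choice to file R00 under career development (as the second phase of the K99/R00) are assumptions made for illustration.

```python
# Hypothetical grouping built from the mechanisms described in this section.
SBIR_STTR_CODES = {"R41", "R42", "R43", "R44"}

# Judgment call: R00 is the independent phase of the K99/R00 award,
# so it is grouped with the K series rather than the R series.
OVERRIDES = {"R00": "Career development"}

FAMILY_BY_PREFIX = {
    "R": "Research project",
    "P": "Program project / center",
    "U": "Cooperative agreement",
    "K": "Career development",
    "F": "Fellowship",
    "T": "Training",
}


def mechanism_family(activity_code: str) -> str:
    """Map an activity code such as 'R01' or 'UH3' to a coarse family label."""
    code = (activity_code or "").strip().upper()
    # SBIR/STTR codes start with R, so they must be checked before the prefix.
    if code in SBIR_STTR_CODES:
        return "SBIR/STTR"
    if code in OVERRIDES:
        return OVERRIDES[code]
    return FAMILY_BY_PREFIX.get(code[:1], "Other")


if __name__ == "__main__":
    for code in ["R01", "R43", "UH3", "K99", "R00", "T32"]:
        print(code, "->", mechanism_family(code))
```

Checking the SBIR/STTR set first makes it easy to drop small-business awards from an analysis of investigator-initiated academic research, since a plain prefix match would count them as R-series grants.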
Funding distribution by Institute and Center
The NCI (National Cancer Institute) consistently receives the largest IC allocation, approximately $7.5 billion per year—roughly 15% of the entire NIH budget and the largest single program in US science funding outside the defense budget. Cancer research benefits from decades of concentrated advocacy through disease organizations, survivor communities, and the political salience of cancer across party lines. The National Cancer Act of 1971 and its successors have systematically privileged NCI within NIH's internal budget process. The Biden-era Cancer Moonshot renewed this commitment with specific research targets around early detection and therapeutic development.
NIAID (National Institute of Allergy and Infectious Diseases) typically receives the second-largest allocation at approximately $6 billion per year in non-pandemic years. NIAID's remit covers infectious diseases, allergic diseases, immunology, and transplantation—a portfolio that spans from basic viral biology to large-scale HIV clinical trials networks. NHLBI (National Heart, Lung, and Blood Institute) follows at approximately $3.7 billion, covering cardiovascular disease, pulmonary conditions, hematology, and sleep disorders. NIMH (National Institute of Mental Health) funds psychiatric and neuroscience research at roughly $2.3 billion per year.
Geographic concentration in NIH funding is pronounced and self-reinforcing. The Boston-Cambridge cluster (Harvard, MIT, Boston University, Tufts, Brigham and Women's Hospital, Massachusetts General Hospital, Dana-Farber Cancer Institute) is the single largest recipient of NIH extramural dollars. The Baltimore corridor (Johns Hopkins University, Johns Hopkins Medicine, University of Maryland) follows. The San Francisco Bay Area (UCSF, Stanford, UC Berkeley) and New York City (Columbia, NYU Langone, Weill Cornell, Albert Einstein, Memorial Sloan Kettering) are the other major clusters. Together these four metropolitan areas account for a share of total NIH extramural spending that exceeds their share of US research faculty by a wide margin, reflecting the compounding advantage of established infrastructure, pilot funding networks, and administrative expertise.
Paylines, peer review, and success rates
NIH does not fund every application that peer reviewers recommend. The payline is the percentile cutoff for funding in a given fiscal year, IC, and mechanism: because lower percentiles are better, applications numerically at or below the payline are generally funded and those above it generally are not. Paylines vary by IC and by mechanism and are published prospectively by most ICs at the start of each fiscal year. R01 paylines have historically fallen in the range of the 8th to 15th percentile, meaning that only applications ranked in roughly the top 8–15% are funded as a matter of course. The overall R01 success rate in 2023 was approximately 20%, higher than most paylines. The gap reflects awards made above the payline through program officer exceptions, and a difference in denominators: percentiles rank an application against everything its study section reviewed, while success rates count awards against applications reviewed in the fiscal year, with same-year resubmissions counted once.
NIH peer review operates in two stages. Applications first go to a Scientific Review Group (study section) managed by the Center for Scientific Review (CSR) or an IC-specific review office. Study sections are standing panels of 15–25 scientists who convene three times per year to evaluate proposals in a defined scientific area. Each application is assigned to three primary reviewers who produce written critiques scored on five criteria: significance, innovation, approach, investigators, and environment. The full panel then discusses the application, and each member scores it on a 1–9 scale (lower is better); the mean of those scores, multiplied by ten, becomes the overall impact score of 10 to 90. Applications scoring in the top half of the section receive a percentile rank, a normalized score relative to all applications reviewed by that section over three meetings. Applications in the lower half are triaged (not discussed) and receive no percentile.
Percentile-ranked applications then go to the IC's National Advisory Council, which reviews them for programmatic relevance and issues final funding recommendations. Program officers retain discretion to recommend funding applications above the payline (exceptions require justification and are monitored) or to skip applications just below the payline. A modular R01 requesting $250,000 per year in direct costs follows a simplified budget process; applications requesting more than $250,000 direct per year require a detailed, line-item budget that is reviewed as part of the application.
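The direction of the payline comparison is easy to get backwards, so a minimal sketch may help. It assumes the simple rule described above (lower percentile is better, and an application at or below the payline is inside it) and ignores program officer exceptions; the function name and the payline of 12 are hypothetical.

```python
from typing import Optional


def within_payline(percentile: Optional[float], payline: float) -> bool:
    """Return True when a percentile-ranked application is at or inside the payline.

    Lower percentiles are better. None stands for a triaged (not discussed)
    application, which has no percentile and cannot fall inside the payline.
    """
    if percentile is None:
        return False
    return percentile <= payline


if __name__ == "__main__":
    hypothetical_payline = 12  # illustrative value, not a published payline
    for pct in [5, 12, 13, None]:
        print(pct, within_payline(pct, hypothetical_payline))
```

An application at the 13th percentile against a payline of 12 is the kind of case where funding would require a documented exception.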
Indirect costs and what “total cost” means
Every NIH grant includes direct costs (salaries, supplies, animals, patient care, equipment) and indirect costs, formally called Facilities and Administrative (F&A) costs. The indirect cost rate is negotiated between each institution and the HHS Division of Cost Allocation and is expressed as a percentage of Modified Total Direct Costs (MTDC), which excludes capital equipment, patient care, and subcontract amounts over $25,000 from the base. Negotiated rates reflect the genuine cost of maintaining research infrastructure: laboratory construction, utilities, administrative staff, compliance systems, and institutional support for sponsored programs.
Rates vary substantially by institution type. Major research universities typically negotiate rates of 50–60% of MTDC for on-campus laboratory research. Institutions in expensive real estate markets with heavy capital investment — Harvard, MIT, and several others — have negotiated rates in the 60–68% range for specific cost pools. Hospital rates tend to be lower, 35–50%, because clinical operations absorb some overhead. Foreign institutions are capped at 8% under NIH policy regardless of their actual infrastructure costs.
This means the total cost figure in RePORTER is not the amount scientists spend on experiments. On a $500,000 total cost R01 at a university with a 60% indirect rate, as much as $187,500 flows to the institution as overhead recovery, leaving roughly $312,500 in direct costs; the overhead share is smaller when part of the direct budget falls outside the MTDC base. The RePORTER API reports award_amount (total cost) separately from direct_cost_amt, making this decomposition auditable but not automatic. Analyses that conflate total cost with research spending will systematically overstate the money available for the research itself.
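The overhead arithmetic can be worked through explicitly. The sketch below assumes total cost equals direct cost plus the F&A rate applied to the MTDC portion of direct cost; the function name and the mtdc_share parameter (the fraction of direct costs inside the MTDC base) are hypothetical simplifications, since real budgets exclude specific line items rather than a flat share.

```python
def split_total_cost(
    total_cost: float, fa_rate: float, mtdc_share: float = 1.0
) -> tuple[float, float]:
    """Split a total award into (direct, indirect) dollars.

    Assumes: total = direct + fa_rate * (mtdc_share * direct).
    mtdc_share = 1.0 means every direct dollar is in the MTDC base.
    """
    if fa_rate < 0 or not 0 <= mtdc_share <= 1:
        raise ValueError("fa_rate must be >= 0 and mtdc_share within [0, 1]")
    direct = total_cost / (1.0 + fa_rate * mtdc_share)
    return direct, total_cost - direct


if __name__ == "__main__":
    direct, indirect = split_total_cost(500_000, 0.60)
    print(f"direct ${direct:,.0f}, indirect ${indirect:,.0f}")
    # prints: direct $312,500, indirect $187,500
```

With every direct dollar in the base, the 60% rate consumes 37.5% of the total award, not 60% of it, because the rate applies to direct costs rather than to the total.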
The RePORTER database and API
RePORTER, at reporter.nih.gov, provides the public-facing searchable interface to all NIH-funded grants. Each award record contains the project number, PI names and NIH-assigned PI IDs, institution, abstract, MeSH disease terms, fiscal year, total and direct costs, activity code, funding IC, study section, project start and end dates, and for multi-component awards, subproject identifiers. Records go back to fiscal year 1985, though structured fields become substantially richer after 2000.
The project number format encodes the grant history. A number like 1R01CA123456-01 reads as: type digit (1 = new award), activity code (R01), funding IC (CA = NCI), six-digit serial number (123456), and support year suffix (01 = first year of the project period). A non-competing continuation carries type digit 5 and the next year suffix, so the second year of the same grant is 5R01CA123456-02. Stripping the type digit and year suffix to recover the core project number (R01CA123456) is a prerequisite to computing lifetime grant value across multiple budget periods.
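The project number layout lends itself to a regular expression. The following is a sketch under stated assumptions: it covers the fields named above plus an optional trailing suffix after the support year (such as an amendment or supplement marker), and the class and function names are hypothetical. Subproject identifiers and other variants are not handled.

```python
import re
from typing import NamedTuple, Optional

PROJECT_NUM_RE = re.compile(
    r"^(?P<type>\d)?"                    # application type digit, e.g. 1 or 5
    r"(?P<activity>[A-Z][A-Z0-9]{2})"    # activity code, e.g. R01, UH3
    r"(?P<ic>[A-Z]{2})"                  # funding IC, e.g. CA
    r"(?P<serial>\d{6})"                 # six-digit serial number
    r"(?:-(?P<year>\d{2})(?P<suffix>[A-Z0-9]*))?$"  # support year + optional suffix
)


class ProjectNumber(NamedTuple):
    application_type: Optional[str]
    activity_code: str
    ic_code: str
    serial: str
    support_year: Optional[str]
    suffix: str

    @property
    def core(self) -> str:
        """Core project number: the key shared by every budget period."""
        return f"{self.activity_code}{self.ic_code}{self.serial}"


def parse_project_num(project_num: str) -> ProjectNumber:
    m = PROJECT_NUM_RE.match(project_num.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized project number: {project_num!r}")
    return ProjectNumber(
        m["type"], m["activity"], m["ic"], m["serial"], m["year"], m["suffix"] or ""
    )


if __name__ == "__main__":
    # Made-up award amounts for illustration only.
    awards = [
        ("1R01CA123456-01", 400_000),
        ("5R01CA123456-02", 410_000),
        ("1R21AI654321-01", 150_000),
    ]
    lifetime: dict[str, int] = {}
    for num, cost in awards:
        core = parse_project_num(num).core
        lifetime[core] = lifetime.get(core, 0) + cost
    print(lifetime)
```

Grouping on the core number is what lets the two R01 budget periods in the example collapse into a single lifetime total.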
The NIH Reporter API v2 at https://api.reporter.nih.gov/v2/projects/search accepts HTTP POST requests with a JSON criteria body and requires no API key. Pagination is handled via offset and limit parameters; the maximum page size is 500 records. The criteria object supports fiscal year filtering, activity code arrays, IC codes, full-text search across titles and abstracts, PI name, institution name, state, and MeSH term filtering. The response includes a meta object with the total result count, enabling loop-based pagination to retrieve complete result sets.
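A request body for the search endpoint might be assembled as in the sketch below. The fiscal_years, offset, and limit keys follow the description above; the activity_codes and agencies criteria keys are assumptions about the API's field names and should be checked against the current documentation. The helper names are hypothetical, and no network call is made here.

```python
from typing import Iterable, Iterator, Optional

MAX_LIMIT = 500  # maximum page size stated above


def build_search_payload(
    fiscal_year: int,
    activity_codes: Optional[Iterable[str]] = None,
    agencies: Optional[Iterable[str]] = None,
    offset: int = 0,
    limit: int = MAX_LIMIT,
) -> dict:
    """Build the JSON body for a POST to /v2/projects/search."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    criteria: dict = {"fiscal_years": [fiscal_year]}
    if activity_codes:
        criteria["activity_codes"] = list(activity_codes)  # assumed key name
    if agencies:
        criteria["agencies"] = list(agencies)  # assumed key name
    return {"criteria": criteria, "offset": offset, "limit": limit}


def page_offsets(total: int, limit: int = MAX_LIMIT) -> Iterator[int]:
    """Yield the offset of each page needed to cover `total` records."""
    return iter(range(0, total, limit))


if __name__ == "__main__":
    print(build_search_payload(2024, activity_codes=["R01"], agencies=["NCI"]))
    print(list(page_offsets(1_200)))  # [0, 500, 1000]
```

Narrow criteria such as a single IC and mechanism keep each result set small, which shortens the pagination loop and makes partial failures cheaper to retry.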
iCite, the NIH's citation impact tool linked to RePORTER, provides the Relative Citation Ratio (RCR) for publications arising from NIH-funded grants. RCR normalizes citation counts to the citation field of each paper and to the year of publication, enabling cross-field comparison of research impact. Linking RePORTER grant records to iCite publication records via the PubMed ID allows analysis of which funding mechanisms, ICs, institutions, and disease areas produce the highest citation impact per dollar of federal investment.
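A citation-impact-per-dollar figure could be computed along the following lines once grant costs, grant-to-PubMed-ID links, and RCR values are in hand. This is a toy sketch: the function name, the input shapes, and all the numbers are hypothetical, and it deliberately ignores the double counting that arises when one paper acknowledges several grants.

```python
def rcr_per_million(
    grant_costs: dict[str, float],
    grant_pmids: dict[str, list[int]],
    rcr_by_pmid: dict[int, float],
) -> dict[str, float]:
    """Sum the RCR of each grant's linked papers and divide by cost in $ millions.

    Papers missing from rcr_by_pmid contribute zero.
    """
    result = {}
    for project, cost in grant_costs.items():
        rcr_sum = sum(rcr_by_pmid.get(pmid, 0.0) for pmid in grant_pmids.get(project, []))
        result[project] = rcr_sum / (cost / 1e6) if cost else 0.0
    return result


if __name__ == "__main__":
    # Made-up data for illustration only.
    costs = {"R01CA000001": 2_000_000, "U01AI000002": 5_000_000}
    links = {"R01CA000001": [101, 102], "U01AI000002": [103]}
    rcr = {101: 1.5, 102: 2.5, 103: 1.0}
    print(rcr_per_million(costs, links, rcr))
    # {'R01CA000001': 2.0, 'U01AI000002': 0.2}
```

Summed RCR is only one possible numerator; a median or a count of papers above a threshold would be less sensitive to a single highly cited publication.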
Research spending by disease category: RCDC
NIH publishes annual estimates of spending by disease category through the Research, Condition, and Disease Categorization (RCDC) system. RCDC assigns each funded project to one or more of 317 disease categories using automated text mining of project titles, abstracts, and MeSH terms. The process is not a simple keyword match—it uses a fingerprint algorithm that weights term combinations—but the assignments are approximate, and NIH publishes the methodology and the annual estimates together so that analysts can evaluate the classification logic.
The 2024 RCDC estimates place cancer spending at approximately $7.5 billion, the largest single category and the one most closely aligned with a single IC (NCI). Infectious disease spending, including the sustained COVID-19 research portfolio, exceeded $8 billion in fiscal years 2021 and 2022 before declining as emergency supplementals wound down. Alzheimer's disease research received approximately $3.6 billion in 2024, reflecting a decade-long bipartisan congressional push to double and then sustain Alzheimer's funding following the National Alzheimer's Project Act. HIV/AIDS received approximately $3.5 billion. Diabetes research received roughly $1.1 billion. Mental health, spread across NIMH, NIDA, and several other ICs, received approximately $2.4 billion in aggregate RCDC estimates.
The RCDC categories are not mutually exclusive—a grant studying HIV co-infection in people with diabetes might appear in both the HIV/AIDS and diabetes categories—and the sum of RCDC category totals substantially exceeds total NIH extramural spending for this reason. Using RCDC estimates as additive totals is a common analytical error. They are useful for within-category trend analysis over time and for comparing disease areas to each other at a given point in time.
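The non-additivity is easy to see with a toy example. The sketch below uses made-up grants and category assignments, not RCDC data, and the function name is hypothetical: each grant's full cost is credited to every category it is assigned to, so the category totals sum to more than the money actually spent.

```python
from collections import Counter


def category_totals(grants: list[dict]) -> Counter:
    """Credit each grant's full cost to every category it is assigned to."""
    totals: Counter = Counter()
    for grant in grants:
        for category in grant["categories"]:
            totals[category] += grant["cost"]
    return totals


if __name__ == "__main__":
    # Made-up grants for illustration only.
    grants = [
        {"project": "A", "cost": 400_000, "categories": ["HIV/AIDS", "Diabetes"]},
        {"project": "B", "cost": 600_000, "categories": ["Cancer"]},
    ]
    totals = category_totals(grants)
    actual_spending = sum(g["cost"] for g in grants)
    print(dict(totals))
    print("sum of categories:", sum(totals.values()))  # 1400000
    print("actual spending:  ", actual_spending)       # 1000000
```

Grant A is counted once under HIV/AIDS and once under diabetes, which is exactly why category totals can be compared with one another but not added together.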
The COVID-19 funding surge and ARPA-H
NIH received emergency supplemental appropriations for COVID-19 research in fiscal years 2020 and 2021 that created a funding surge visible in RePORTER. Approximately $2.3 billion in supplemental COVID-19 research funds flowed through NIH in fiscal year 2020 alone, funding urgent virology, clinical trials, diagnostics, and therapeutics research outside the normal annual grant cycle. NIAID received the largest share, funding the ACTIV (Accelerating COVID-19 Therapeutic Interventions and Vaccines) public-private partnership that produced the large U mechanism cooperative agreements for clinical trials networks. The RADx (Rapid Acceleration of Diagnostics) initiative generated its own cluster of awards. The PASC (Post-Acute Sequelae of SARS-CoV-2) research program, focused on long COVID, produced a wave of U01 awards beginning in 2021 that continue to generate data.
Operation Warp Speed vaccine grants were primarily led by BARDA (Biomedical Advanced Research and Development Authority), a sister agency to NIH within HHS, not by NIH directly. The NIH collaboration on COVID vaccines operated primarily through NIAID's Vaccine Research Center and its existing cooperative agreement relationships rather than through new RePORTER-visible awards. Researchers analyzing COVID funding who focus exclusively on RePORTER will miss the BARDA contract awards that funded the Moderna, Pfizer-BioNTech, and Janssen vaccine manufacturing scale-up; those appear in USASpending.gov as contract records, not grants.
The Advanced Research Projects Agency for Health (ARPA-H) was established by Congress in 2022 as a separate entity within HHS, modeled loosely on DARPA, to fund high-risk, high-reward biomedical research aimed at breakthrough capabilities rather than incremental scientific advances. ARPA-H received approximately $2.5 billion over its first three fiscal years and operates with a program manager model rather than peer review—individual program managers have substantial discretionary authority to fund or terminate projects. ARPA-H awards appear in USASpending.gov as contracts and Other Transaction Agreements (OTAs), not as NIH RePORTER grants; ARPA-H is explicitly separate from NIH's grant-making structure despite being located administratively within HHS. Analysts conflating ARPA-H spending with NIH spending will encounter methodological problems in any longitudinal analysis of the federal biomedical research portfolio.
Python: querying the NIH Reporter API v2
The following script connects to the NIH Reporter API v2, pages through FY2024 grant records, and computes five analyses: total funding by IC, the top 20 institutions by aggregate grant funding, the distribution of activity codes by count and total dollars, the ten highest-funded individual research projects, and total funding by state. The API requires no authentication and accepts POST requests with a JSON criteria body.
```python
import time

import pandas as pd
import requests

# NIH Reporter API v2: https://api.reporter.nih.gov
# No API key required. POST requests with JSON criteria bodies.
# Max 500 records per request; paginate with the offset parameter.
API = "https://api.reporter.nih.gov/v2/projects/search"

# Assumption from the API documentation at the time of writing: offset may not
# exceed 14,999, so a single query returns at most 15,000 records.
MAX_OFFSET = 14_999


def fetch_active_grants(fiscal_year: int = 2024, limit: int = 500) -> list[dict]:
    """
    Page through NIH grants for a given fiscal year, highest award first.
    Returns the project records retrieved, up to the API's offset cap.
    """
    results = []
    offset = 0
    while True:
        payload = {
            "criteria": {
                "fiscal_years": [fiscal_year],
            },
            "offset": offset,
            "limit": limit,
            "sort_field": "award_amount",
            "sort_order": "desc",
        }
        resp = requests.post(API, json=payload, timeout=60)
        resp.raise_for_status()
        data = resp.json()
        hits = data.get("results", [])
        total = data.get("meta", {}).get("total", 0)
        results.extend(hits)
        print(f"  fetched {len(results):,} / {total:,}", end="\r")
        if len(results) >= total or not hits:
            break
        offset += len(hits)
        if offset > MAX_OFFSET:
            # Avoid an HTTP error on the next request; the result is partial.
            print("\nStopped at the offset cap; narrow the criteria "
                  "(for example by IC) to reach the remaining records.")
            break
        time.sleep(1.0)  # keep to roughly one request per second
    return results


print("Fetching FY2024 active grants...")
grants = fetch_active_grants(fiscal_year=2024)
print(f"\nTotal grant records: {len(grants):,}")

# -----------------------------------------------------------------------
# Build a flat dataframe from the nested JSON
# -----------------------------------------------------------------------
rows = []
for g in grants:
    # Nested objects can be null in the response, hence the "or {}" guards.
    org = g.get("organization", {}) or {}
    ic = g.get("agency_ic_admin", {}) or {}
    rows.append({
        "appl_id": g.get("appl_id"),
        "project_num": g.get("project_num"),
        "activity_code": g.get("activity_code"),
        "fiscal_year": g.get("fiscal_year"),
        "ic_code": ic.get("abbreviation"),
        "ic_name": ic.get("name"),
        "total_cost": g.get("award_amount") or 0,
        "direct_cost": g.get("direct_cost_amt") or 0,
        "title": g.get("project_title"),
        "org_name": org.get("org_name"),
        "org_state": org.get("org_state"),
        "project_start": g.get("project_start_date"),
        "project_end": g.get("project_end_date"),
        "pi_names": "; ".join(
            p.get("full_name", "") for p in (g.get("principal_investigators") or [])
        ),
        "terms": "; ".join(g.get("terms") or []),
    })

df = pd.DataFrame(rows)
print(f"Dataframe shape: {df.shape}")

# -----------------------------------------------------------------------
# 1. Total funding by NIH Institute / Center (IC)
# -----------------------------------------------------------------------
ic_funding = (
    df.groupby(["ic_code", "ic_name"])["total_cost"]
    .agg(total_awarded="sum", grant_count="count")
    .sort_values("total_awarded", ascending=False)
    .reset_index()
)
ic_funding["total_awarded_B"] = ic_funding["total_awarded"] / 1e9
print("\nFunding by NIH Institute / Center (FY2024):")
print(ic_funding.to_string(index=False))

# -----------------------------------------------------------------------
# 2. Top 20 institutions by total active grant funding
# -----------------------------------------------------------------------
inst_funding = (
    df.groupby("org_name")["total_cost"]
    .agg(total_awarded="sum", grant_count="count")
    .sort_values("total_awarded", ascending=False)
    .head(20)
    .reset_index()
)
inst_funding["total_awarded_M"] = inst_funding["total_awarded"] / 1e6
print("\nTop 20 institutions by total FY2024 NIH funding ($ millions):")
print(inst_funding[["org_name", "grant_count", "total_awarded_M"]].to_string(index=False))

# -----------------------------------------------------------------------
# 3. Activity code distribution
# -----------------------------------------------------------------------
activity_dist = (
    df.groupby("activity_code")["total_cost"]
    .agg(count="count", total="sum")
    .sort_values("count", ascending=False)
    .reset_index()
)
activity_dist["share_pct"] = (
    activity_dist["count"] / activity_dist["count"].sum() * 100
).round(1)
print("\nActivity code distribution (top 20 by grant count):")
print(activity_dist.head(20).to_string(index=False))

# -----------------------------------------------------------------------
# 4. Top 10 highest-funded individual active research projects
# -----------------------------------------------------------------------
top_projects = (
    df.nlargest(10, "total_cost")[
        ["project_num", "activity_code", "ic_code", "org_name",
         "pi_names", "total_cost", "title"]
    ]
    .reset_index(drop=True)
)
print("\nTop 10 highest-funded active NIH research projects (FY2024):")
for _, row in top_projects.iterrows():
    cost_M = row["total_cost"] / 1e6
    title = (row["title"] or "")[:80]  # title can be missing
    print(f"  [{row['activity_code']} | {row['ic_code']}] ${cost_M:.1f}M - {row['org_name']}")
    print(f"    {row['project_num']}  {title}...")

# -----------------------------------------------------------------------
# 5. Geographic concentration: total NIH funding by state
# -----------------------------------------------------------------------
state_funding = (
    df.groupby("org_state")["total_cost"]
    .agg(total="sum", grants="count")
    .sort_values("total", ascending=False)
    .head(15)
    .reset_index()
)
state_funding["total_B"] = state_funding["total"] / 1e9
print("\nTop 15 states by total FY2024 NIH funding ($ billions):")
print(state_funding[["org_state", "grants", "total_B"]].to_string(index=False))
```
Pagination is the primary technical challenge when pulling the full NIH portfolio. RePORTER's total active grant population for a single fiscal year exceeds 50,000 records, and the API's 500-record maximum page size requires more than 100 requests to retrieve the complete set. The meta.total field in each response provides the total result count, enabling a clean termination condition. The API documentation also caps offset at 14,999, so a single query can return at most 15,000 records; a full fiscal-year pull has to be split into narrower queries, for example one per IC or activity code, and the pieces concatenated. NIH's API guidance asks clients to keep to about one request per second, and requests at that interval have been observed to be reliable in practice. The nested JSON structure (organization fields under an organization object, PI names as an array of objects, IC information under agency_ic_admin) requires careful unpacking before flat-file analysis. The direct_cost_amt field is populated separately from award_amount and enables the direct-versus-indirect cost decomposition that is essential for meaningful spending comparisons across institution types.
Part of the Federal Regulatory Data Hub.