Medicare Part D Prescribing Data: Every Drug Prescribed by Every Medicare Provider
Since 2015, the Centers for Medicare & Medicaid Services has published a dataset that would have been unthinkable a decade earlier: the prescribing record of every physician, nurse practitioner, and other provider who wrote at least eleven claims for a given drug under Medicare's outpatient drug benefit. More than one million providers. More than 5,500 distinct drugs. More than $100 billion in annual prescription spending made visible in a public CSV file. The Medicare Part D Prescribing Data is among the most granular provider-level drug utilization records published anywhere.
What Medicare Part D Is
Medicare Part D is the outpatient prescription drug benefit created by the Medicare Prescription Drug, Improvement, and Modernization Act of 2003 and launched on January 1, 2006. Before Part D, traditional Medicare (Parts A and B) covered hospital care and physician services but paid nothing for drugs dispensed at retail pharmacies. The law filled that gap by adding an optional prescription drug benefit delivered through private plan sponsors — standalone Prescription Drug Plans (PDPs) for beneficiaries in traditional fee-for-service Medicare, and Medicare Advantage Prescription Drug plans (MA-PDs) bundled with Medicare Advantage coverage.
Enrollment exceeded 50 million people as of 2024. Annual Part D drug spending surpassed $200 billion, making it one of the largest single purchasers of pharmaceutical products in the United States. The federal government subsidizes premiums and cost-sharing for low-income beneficiaries through the Low-Income Subsidy (LIS) program — sometimes called Extra Help — which in 2024 covered roughly 13 million enrollees. LIS beneficiaries typically pay nominal or zero copays, making them a useful reference population in the data because formulary and cost-sharing variation does not suppress their utilization in the same way it might for non-LIS enrollees.
Part D plans maintain formularies: lists of covered drugs organized into tiers with different cost-sharing levels. Formulary design affects which drugs appear in the prescribing data, because a drug that is not on a plan's formulary generates few Part D claims for that plan's enrollees, mostly through coverage exceptions and transition fills.
The CMS Dataset: Structure and Scale
CMS publishes the Medicare Part D Prescribers by Provider and Drug dataset annually through the CMS Open Data Portal at data.cms.gov. Each row in the file represents a unique provider–drug pair: one prescriber writing one drug during the calendar year. Under CMS's cell-size suppression policy, provider–drug pairs with fewer than eleven claims are excluded, and beneficiary counts between 1 and 10 are left blank, to protect beneficiary privacy. Suppression removes a meaningful share of low-volume prescriber–drug combinations but has little effect on aggregate totals for drugs whose claims are concentrated among higher-volume prescribers.
The core columns in each annual file are:
- prscrbr_npi — the provider's 10-digit National Provider Identifier, the stable key for joining to NPPES and other CMS datasets
- prscrbr_last_org_name / prscrbr_first_name — provider name as registered with NPPES
- prscrbr_type — specialty label derived from Medicare enrollment records (e.g., “Internal Medicine,” “Family Practice,” “Psychiatry”)
- prscrbr_city / prscrbr_state_abrvtn / prscrbr_zip5 — practice location
- brnd_name / gnrc_name — drug name as filled (a trademarked brand name, or the generic name for generic products) and the generic (chemical) drug name
- tot_clms — total claims (prescriptions filled) for the provider–drug pair
- tot_day_suply — aggregate days of drug supply dispensed
- tot_drug_cst — total drug cost in dollars paid by the plan plus beneficiary cost-sharing (gross cost before rebates)
- tot_benes — count of distinct beneficiaries who received the drug from this provider
- ge65_sprsn_flag / ge65_bene_sprsn_flag — flags marking suppressed claim and beneficiary counts for the subset of beneficiaries aged 65 and older
Beneficiary demographic fields such as bene_avg_age and bene_age_lt_65_cnt appear in the provider-level summary file (one row per prescriber) rather than in the provider–drug file.
The 2022 dataset, the year used in the examples below, contains approximately 29 million rows covering more than 1.1 million distinct providers and 5,700 unique drugs. Bulk CSV downloads exceed 3 GB uncompressed, and an API at data.cms.gov supports filtered queries without downloading the full file. A companion file, Medicare Part D Prescribers by Geography and Drug, aggregates across all providers for each drug nationally and by state.
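Because the bulk file runs to several gigabytes, one practical pattern is to stream it in chunks with pandas and keep only the rows and columns you need. The sketch below assumes a local copy of the provider-and-drug CSV and lower-cases headers on read, since CMS downloads may use mixed-case names such as Prscrbr_NPI. KEEP_COLS, load_part_d, and the sample rows are illustrative, not part of any CMS tooling.

```python
from io import StringIO
from typing import Callable, Optional

import pandas as pd

# Columns used in this article's examples (lower-cased names).
KEEP_COLS = [
    "prscrbr_npi", "prscrbr_type", "prscrbr_state_abrvtn",
    "brnd_name", "gnrc_name", "tot_clms", "tot_day_suply",
    "tot_drug_cst", "tot_benes",
]
NUMERIC_COLS = ["tot_clms", "tot_day_suply", "tot_drug_cst", "tot_benes"]


def load_part_d(
    source,
    row_filter: Optional[Callable[[pd.DataFrame], pd.Series]] = None,
    chunksize: int = 500_000,
) -> pd.DataFrame:
    """Stream a Part D provider-and-drug CSV in chunks, keeping selected rows.

    `source` is a path or file-like object. Every column is read as text so
    NPIs keep their exact 10-digit form for joins; numeric columns are then
    converted explicitly, and blank (suppressed) counts become NaN.
    """
    kept = []
    reader = pd.read_csv(
        source,
        chunksize=chunksize,
        dtype=str,
        usecols=lambda col: col.lower() in KEEP_COLS,  # match regardless of case
    )
    for chunk in reader:
        chunk.columns = chunk.columns.str.lower()
        for col in NUMERIC_COLS:
            if col in chunk.columns:
                chunk[col] = pd.to_numeric(chunk[col], errors="coerce")
        if row_filter is not None:
            chunk = chunk[row_filter(chunk)]
        kept.append(chunk)
    if not kept:
        return pd.DataFrame(columns=KEEP_COLS)
    return pd.concat(kept, ignore_index=True)


# Usage with two made-up rows and mixed-case headers (as CMS downloads may use).
sample = StringIO(
    "Prscrbr_NPI,Prscrbr_Type,Prscrbr_State_Abrvtn,Brnd_Name,Gnrc_Name,"
    "Tot_Clms,Tot_Day_Suply,Tot_Drug_Cst,Tot_Benes\n"
    "1234567890,Family Practice,WV,Norco,Hydrocodone/Acetaminophen,40,900,1200.50,\n"
    "1234567890,Family Practice,WV,Lipitor,Atorvastatin Calcium,25,2250,300.00,12\n"
)
opioid_rows = load_part_d(
    sample,
    row_filter=lambda c: c["gnrc_name"].str.lower().str.contains("hydrocodone", na=False),
)
print(opioid_rows[["prscrbr_npi", "gnrc_name", "tot_clms", "tot_benes"]])
```

Reading every column as text first keeps NPIs intact for later joins and makes the handling of blank, suppressed cells explicit rather than leaving it to type inference.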
Specialty and Drug Class Correlation
The strongest structural pattern in the data is the tight correlation between provider specialty and drug class. Oncologists appear overwhelmingly in rows for oral targeted cancer therapies such as osimertinib, ibrutinib, and lenalidomide; infused checkpoint inhibitors like pembrolizumab and nivolumab are administered in clinics and billed under Part B, so they rarely appear in Part D data at all. Psychiatrists' prescribing is dominated by second-generation antipsychotics (quetiapine, aripiprazole, olanzapine) and antidepressants (sertraline, escitalopram, venlafaxine), although primary care providers also write a large share of antidepressant claims. Endocrinologists concentrate in insulin products and, increasingly, GLP-1 receptor agonists. Rheumatologists cluster around self-administered biologics like adalimumab and conventional disease-modifying drugs like methotrexate.
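The specialty-drug correlation can be measured directly: for each drug, find the specialty that writes the largest share of its claims. A minimal sketch, assuming a DataFrame with the lower-cased columns gnrc_name, prscrbr_type, and tot_clms; the function name and the rows are made up for illustration.

```python
import pandas as pd


def top_specialty_by_drug(df: pd.DataFrame) -> pd.DataFrame:
    """For each generic drug, the specialty writing the most claims and its share.

    Expects lower-cased Part D columns gnrc_name, prscrbr_type, tot_clms.
    """
    by_pair = df.groupby(["gnrc_name", "prscrbr_type"], as_index=False)["tot_clms"].sum()
    drug_total = by_pair.groupby("gnrc_name")["tot_clms"].transform("sum")
    by_pair["share"] = by_pair["tot_clms"] / drug_total
    # idxmax returns the row label of the highest-share specialty per drug
    top = by_pair.loc[by_pair.groupby("gnrc_name")["share"].idxmax()]
    return top.sort_values("share", ascending=False).reset_index(drop=True)


# Made-up rows for illustration only
rows = pd.DataFrame({
    "gnrc_name": ["Osimertinib Mesylate", "Osimertinib Mesylate",
                  "Quetiapine Fumarate", "Quetiapine Fumarate"],
    "prscrbr_type": ["Hematology-Oncology", "Internal Medicine",
                     "Psychiatry", "Family Practice"],
    "tot_clms": [90, 10, 600, 400],
})
print(top_specialty_by_drug(rows))
```

Sorting by share puts the most specialty-concentrated drugs first, which is where the correlation is strongest.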
Pain management specialists and primary care providers show the highest opioid claim volumes in aggregate, but the distribution within each specialty is wide. A pain management specialist in the 99th percentile of opioid claims may prescribe fifteen or twenty times as many opioid doses per beneficiary as a peer in the 50th percentile — a gap that investigators and regulators have used to identify outlier prescribers.
Brand versus generic prescribing rates also track specialty. Primary care providers, who often prescribe off-patent medications for chronic conditions, show the highest generic rates — frequently above 80 percent. Specialists in oncology, HIV treatment, and rheumatology show much higher brand rates because the drugs in those classes often lack generic equivalents. The data makes it straightforward to compute a provider-level generic prescribing rate and compare it to specialty peers, a metric that pharmacy benefit managers and health plans use in formulary compliance programs.
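A provider-level generic rate compared with specialty peers can be sketched as follows. The brand/generic rule is an assumption: a row is treated as generic when brnd_name is blank or simply repeats gnrc_name, so the sketch works under either labeling convention; check it against the CMS data dictionary for your data year. Provider IDs in the example are placeholders, not real NPIs.

```python
import pandas as pd


def generic_rate_vs_peers(df: pd.DataFrame) -> pd.DataFrame:
    """Provider-level generic share of claims compared with the specialty median.

    Assumption: a row counts as generic when brnd_name is blank or repeats
    gnrc_name (case-insensitive), which covers either labeling convention.
    """
    brand = df["brnd_name"].fillna("").str.strip().str.lower()
    generic = df["gnrc_name"].fillna("").str.strip().str.lower()
    is_generic = (brand == "") | (brand == generic)

    per_provider = (
        df.assign(generic_clms=df["tot_clms"].where(is_generic, 0))
        .groupby(["prscrbr_npi", "prscrbr_type"], as_index=False)[["generic_clms", "tot_clms"]]
        .sum()
    )
    per_provider["generic_rate"] = per_provider["generic_clms"] / per_provider["tot_clms"]
    median = per_provider.groupby("prscrbr_type")["generic_rate"].transform("median")
    per_provider["gap_vs_peers"] = per_provider["generic_rate"] - median
    return per_provider


# Made-up rows; "A", "B", "C" stand in for NPIs
rows = pd.DataFrame({
    "prscrbr_npi": ["A", "A", "B", "B", "C"],
    "prscrbr_type": ["Family Practice"] * 5,
    "brnd_name": ["", "Eliquis", "Atorvastatin Calcium", "Eliquis", None],
    "gnrc_name": ["Lisinopril", "Apixaban", "Atorvastatin Calcium", "Apixaban", "Metformin Hcl"],
    "tot_clms": [80, 20, 50, 50, 40],
})
print(generic_rate_vs_peers(rows))
```

Comparing against the specialty median rather than the mean keeps a few extreme prescribers from dragging the peer baseline.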
Geographic Variation in Drug Utilization
State-level aggregation reveals dramatic variation in drug utilization patterns that reflect both clinical practice norms and socioeconomic factors. Opioid prescribing rates per 1,000 Medicare beneficiaries have historically varied several-fold between the lowest-prescribing states (Hawaii, New York, California) and the highest-prescribing states (Alabama, Tennessee, West Virginia, Kentucky). These geographic disparities mirror findings from the DEA's Automation of Reports and Consolidated Orders System (ARCOS) data on opioid wholesale shipments, but the Part D data adds provider-level granularity that ARCOS, which captures shipments to pharmacies and hospitals rather than to individual prescribers, cannot provide.
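Rates per 1,000 beneficiaries need a denominator that the provider-and-drug file does not contain, so the sketch below assumes you have already summed opioid claims by state from Part D and obtained Part D enrollment by state from a separate CMS source. The function and the state figures are illustrative, not real data.

```python
import pandas as pd


def rates_per_1000(claims: pd.Series, enrollees: pd.Series) -> pd.DataFrame:
    """Claims per 1,000 enrollees by state, and each state's multiple of the lowest rate.

    Both inputs are indexed by state; states missing from either are dropped.
    """
    rate = (claims / enrollees * 1000).dropna()
    out = rate.rename("claims_per_1000").to_frame()
    out["multiple_of_lowest"] = out["claims_per_1000"] / out["claims_per_1000"].min()
    return out.sort_values("claims_per_1000", ascending=False)


# Illustrative numbers only, not real state figures
claims = pd.Series({"State A": 600_000, "State B": 60_000, "State C": 1_200_000})
enrollees = pd.Series({"State A": 400_000, "State B": 200_000, "State C": 2_000_000})
print(rates_per_1000(claims, enrollees))
```

The multiple_of_lowest column expresses the "varied several-fold" comparison directly: the highest-rate state divided by the lowest.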
Geographic variation in brand prescribing rates, average drug cost per claim, and total beneficiary drug spending is similarly pronounced. Rural states with older, sicker beneficiary populations and fewer provider alternatives for specialty care often show higher per-beneficiary drug costs. Metropolitan areas on the coasts show faster uptake of newly approved specialty drugs, partly because academic medical centers and specialty practices are more concentrated there.
The Opioid Crisis in Public Data
The Medicare Part D Prescribing Data played a central role in journalistic and regulatory investigations of the opioid crisis. ProPublica was analyzing Part D prescriber records before CMS's first official release in 2015, which covered calendar year 2013: its Prescriber Checkup tool, launched in 2013 on data obtained through a Freedom of Information Act request, let readers look up any Medicare provider by name, specialty, or location. ProPublica later combined prescribing records with its Dollars for Docs database of pharmaceutical industry payments, which draws on Open Payments data.
The methodology for identifying outlier prescribers is straightforward: compute each provider's opioid claim rate per beneficiary (or per total prescription), then compare it to the distribution of peers within the same specialty and state. Providers in the top one percent of opioid claims within their specialty peer group — especially those also receiving manufacturer payments for opioid products — became the starting point for investigative reporting. The data revealed that fewer than one percent of Part D opioid prescribers accounted for more than 25 percent of all opioid claims, a concentration that was not apparent from any other public data source.
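The peer comparison can be sketched in pandas using the per-total-prescription variant: each provider's opioid share of all Part D claims, ranked within its specialty and state, plus the concentration measure (the share of opioid claims written by the top one percent of prescribers). This assumes the full provider-and-drug file (all drugs, not just opioids) with lower-cased columns and a boolean opioid mask such as the substring filter used in the script later in this article; the function names and rows are hypothetical.

```python
import math

import pandas as pd


def opioid_share_percentiles(df: pd.DataFrame, is_opioid: pd.Series) -> pd.DataFrame:
    """Each provider's opioid share of claims, ranked within specialty x state peers.

    `df` holds all of a provider's Part D rows (not just opioids), with
    lower-cased columns; `is_opioid` is a boolean mask aligned to `df`.
    """
    keys = ["prscrbr_npi", "prscrbr_type", "prscrbr_state_abrvtn"]
    per_provider = (
        df.assign(opioid_clms=df["tot_clms"].where(is_opioid, 0))
        .groupby(keys, as_index=False)[["opioid_clms", "tot_clms"]]
        .sum()
    )
    per_provider["opioid_share"] = per_provider["opioid_clms"] / per_provider["tot_clms"]
    # Percentile rank in (0, 1] within the specialty x state peer group
    per_provider["peer_pctile"] = per_provider.groupby(
        ["prscrbr_type", "prscrbr_state_abrvtn"]
    )["opioid_share"].rank(pct=True)
    return per_provider


def top_share(claims: pd.Series, top_fraction: float = 0.01) -> float:
    """Share of all claims written by the top `top_fraction` of prescribers."""
    ordered = claims.sort_values(ascending=False)
    n_top = max(1, math.ceil(len(ordered) * top_fraction))
    return float(ordered.iloc[:n_top].sum() / ordered.sum())


# Made-up rows; "P1".."P4" stand in for NPIs
rows = pd.DataFrame({
    "prscrbr_npi": ["P1", "P1", "P2", "P2", "P3", "P4"],
    "prscrbr_type": ["Family Practice"] * 6,
    "prscrbr_state_abrvtn": ["KY"] * 6,
    "gnrc_name": ["Oxycodone Hcl", "Lisinopril", "Oxycodone Hcl",
                  "Lisinopril", "Lisinopril", "Hydrocodone/Acetaminophen"],
    "tot_clms": [90, 10, 20, 80, 100, 15],
})
mask = rows["gnrc_name"].str.lower().str.contains("oxycodone|hydrocodone")
ranked = opioid_share_percentiles(rows, mask)
print(ranked)
print(top_share(ranked["opioid_clms"]))
```

In a real peer group of hundreds of providers, a peer_pctile of 0.99 or above corresponds to the top one percent; in this four-provider toy example the top prescriber is simply rank 1.0.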
CMS has also used prescribing data in program-integrity actions. Its regulations allow it to revoke Medicare enrollment for a pattern of abusive Part D prescribing (42 CFR 424.535(a)(14)), and Part D plans must deny claims for prescribers on CMS's preclusion list (42 CFR 423.120(c)(6)). Where such an action is a matter of public record, joining it to the prescribing data by NPI shows the prescribing pattern that preceded it. The OIG Exclusions List (LEIE), which covers broader healthcare fraud and misconduct, is a separate but related dataset that should be joined to Part D records in any comprehensive compliance review.
The Specialty Drug Revolution
Recent Part D prescribing data document a structural shift in pharmaceutical spending driven by two categories: GLP-1 receptor agonists and high-cost oral cancer therapies.
Semaglutide (Ozempic, Wegovy) and tirzepatide (Mounjaro, Zepbound) began appearing in the top ten drugs by total Part D cost in 2022 and reached the top three by 2024. Because Part D is barred by statute from covering drugs used for weight loss, that volume comes mainly from the diabetes-indicated brands, Ozempic and Mounjaro. A single provider who writes GLP-1 prescriptions for a hundred patients may show a tot_drug_cst figure exceeding $500,000 annually for those products alone, because list prices for injectable semaglutide ran roughly $1,000 per month. Part D drug cost figures in the public data are gross costs before manufacturer rebates, so they overstate net federal spending (GLP-1 manufacturers offer substantial rebates to secure formulary placement), but the data correctly captures the scale of the shift in prescribing volume.
High-cost oral cancer drugs constitute the other major cost driver: targeted kinase inhibitors such as ibrutinib and osimertinib, and immunomodulators such as lenalidomide. Infused immunotherapies, including checkpoint inhibitors and CAR-T cell therapy, are billed under Medicare Parts B and A and largely fall outside this dataset. The oral agents frequently exceed $150,000 per patient per year at list price and are prescribed almost exclusively by oncology specialists. Their concentration is visible in the provider-level data: a relatively small group of oncologists, including those at NCI-designated cancer centers, accounts for a disproportionate share of the highest-cost drug claims in the file.
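To make the gross-versus-net point concrete, the arithmetic for the hundred-patient GLP-1 example can be sketched as follows. It assumes every patient fills all twelve months at a $1,000 monthly list price, and it applies the 30 to 60 percent rebate range cited under Data Caveats purely for illustration; actual manufacturer rebates are confidential.

```python
def annual_gross_cost(patients: int, monthly_list_price: float, months: int = 12) -> float:
    """Gross (pre-rebate) annual cost, assuming every patient fills every month."""
    return patients * monthly_list_price * months


def net_cost_range(gross: float, rebate_low: float, rebate_high: float) -> tuple[float, float]:
    """Net cost after rebates, from the deepest to the shallowest assumed rebate."""
    return gross * (1 - rebate_high), gross * (1 - rebate_low)


gross = annual_gross_cost(patients=100, monthly_list_price=1_000)
low, high = net_cost_range(gross, rebate_low=0.30, rebate_high=0.60)
print(f"gross: ${gross:,.0f}")                # gross: $1,200,000
print(f"net:   ${low:,.0f} to ${high:,.0f}")  # net:   $480,000 to $840,000
```

The gross figure is what tot_drug_cst would show for this panel; the net range is only a what-if, since the public file carries no rebate data.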
How to Access the Data
The canonical access point is the CMS Open Data Portal at data.cms.gov. Navigate to “Medicare Part D Prescribers” under Provider Summary by Type of Service. The data is published as three datasets: by Provider (one row per prescriber, aggregated across all drugs), by Provider and Drug (one row per prescriber–drug pair), and by Geography and Drug (aggregated at the state and national level). The by-provider-and-drug file is the most useful for investigative work.
CMS also exposes the data through an API. Queries can filter by state, specialty, or drug name and return CSV or JSON without downloading the full multi-gigabyte file. For full-dataset analysis, the direct bulk CSV is preferable: paging through tens of millions of rows with offset-based requests is slow and subject to rate limits.
Python: Download, Filter, and Analyze Opioid Prescribing
The following script downloads the Medicare Part D Prescribers by Provider and Drug file through the CMS API, filters to opioid drugs using generic name matching, and computes total claims, average cost per claim, and claims per provider by state and by specialty. It assumes the endpoint returns at most 50,000 rows per request and paginates automatically; confirm the current limit on the dataset's API page.

```python
from io import StringIO

import pandas as pd
import requests

# ------------------------------------------------------------------
# Step 1: Medicare Part D Prescribers by Provider and Drug dataset
# CMS publishes annual files at:
# https://data.cms.gov/provider-summary-by-type-of-service/medicare-part-d-prescribers/medicare-part-d-prescribers-by-provider-and-drug
#
# Each row = one provider x one drug combination.
# Key columns (lower-cased after download, since CMS files may use
# mixed case such as "Prscrbr_NPI"):
#   prscrbr_npi           -- provider NPI
#   prscrbr_last_org_name, prscrbr_first_name
#   prscrbr_type          -- specialty label (e.g. "Internal Medicine")
#   prscrbr_state_abrvtn  -- 2-letter state
#   brnd_name             -- drug name as filled
#   gnrc_name             -- generic drug name (always populated)
#   tot_clms              -- total claims
#   tot_day_suply         -- total days supply
#   tot_drug_cst          -- total drug cost ($)
#   tot_benes             -- beneficiary count (blank when 1-10)
# ------------------------------------------------------------------
# API endpoint for the 2022 file. Each year's file has its own ID, and
# CMS has revised its API over time: confirm the endpoint, query
# parameters, and row cap on the dataset's API page before running.
PART_D_URL = "https://data.cms.gov/resource/4euq-7snb.csv"

# ------------------------------------------------------------------
# Step 2: Opioid filter. Generic names containing any of these
# substrings (case-insensitive) count as opioids; the list is not
# exhaustive.
# ------------------------------------------------------------------
OPIOID_TERMS = [
    "oxycodone", "hydrocodone", "morphine", "fentanyl",
    "oxymorphone", "hydromorphone", "methadone", "tramadol",
    "buprenorphine", "codeine", "tapentadol", "meperidine",
]


def is_opioid(drug_name: str) -> bool:
    name = str(drug_name).lower()
    return any(term in name for term in OPIOID_TERMS)


# ------------------------------------------------------------------
# Step 3: Paginate (assumed cap of 50,000 rows/request). Each page is
# filtered to opioids before it is kept, so memory holds only opioid
# rows instead of the full ~29 million-row file.
# ------------------------------------------------------------------
LIMIT = 50_000
offset = 0
rows_scanned = 0
frames = []
while True:
    params = {"$limit": LIMIT, "$offset": offset, "$order": "prscrbr_npi"}
    resp = requests.get(PART_D_URL, params=params, timeout=120)
    resp.raise_for_status()
    if not resp.text.strip():
        break  # empty body: no more pages
    page = pd.read_csv(StringIO(resp.text))
    if page.empty:
        break
    page.columns = page.columns.str.lower()
    rows_scanned += len(page)
    # Step 4: keep only this page's opioid rows
    frames.append(page[page["gnrc_name"].apply(is_opioid)])
    if len(page) < LIMIT:
        break  # short page: this was the last one
    offset += LIMIT

if not frames:
    raise SystemExit("No rows downloaded; check PART_D_URL and query parameters.")
opioids = pd.concat(frames, ignore_index=True)
print(f"Rows scanned: {rows_scanned:,}; opioid rows kept: {len(opioids):,}")

# Numeric coercion: suppressed counts (e.g. tot_benes of 1-10) arrive
# blank and become NaN, which the groupby sums below skip.
for col in ["tot_clms", "tot_drug_cst", "tot_day_suply", "tot_benes"]:
    opioids[col] = pd.to_numeric(opioids[col], errors="coerce")


# ------------------------------------------------------------------
# Steps 5-6: Opioid claims, average cost per claim, and claims per
# provider, grouped by state and by specialty
# ------------------------------------------------------------------
def summarize(frame: pd.DataFrame, key: str) -> pd.DataFrame:
    return (
        frame.groupby(key)
        .agg(
            total_claims=("tot_clms", "sum"),
            total_cost=("tot_drug_cst", "sum"),
            provider_count=("prscrbr_npi", "nunique"),
        )
        .assign(
            avg_cost_per_claim=lambda x: x["total_cost"] / x["total_claims"],
            # Raw totals track population; this ratio compares across groups
            claims_per_provider=lambda x: x["total_claims"] / x["provider_count"],
        )
        .sort_values("total_claims", ascending=False)
        .reset_index()
    )


state_summary = summarize(opioids, "prscrbr_state_abrvtn")
print("\nTop 10 states by opioid claim volume:")
print(state_summary.head(10).to_string(index=False))

specialty_summary = summarize(opioids, "prscrbr_type")
print("\nTop 10 specialties by opioid claim volume:")
print(specialty_summary.head(10).to_string(index=False))
```

Raw state totals largely track how many providers and beneficiaries a state has, so compare states on claims_per_provider (or on claims per beneficiary) rather than on total_claims. Run against the 2022 dataset, that comparison places West Virginia, Alabama, and Tennessee consistently in the top tier of per-provider opioid claim volume, while Hawaii, New York, and Minnesota anchor the low end. The specialty summary shows pain management, family practice, and internal medicine as the top three specialties by total opioid claim count, but addiction medicine and anesthesiology show the highest average cost per opioid claim, reflecting the use of buprenorphine products such as Suboxone for opioid use disorder treatment; these cost more per claim than many legacy opioid analgesics.
Identifying Outlier Prescribers: The ProPublica Methodology
The outlier identification approach behind ProPublica's Prescriber Checkup and subsequent investigations is a peer comparison. One simple version: for each provider, compute opioid claims per beneficiary (tot_clms / tot_benes across opioid rows). Then compute the mean and standard deviation of that metric across all providers with the same specialty and state. A provider more than two standard deviations above the specialty mean is flagged for review; one more than three standard deviations above is a high-priority outlier. Summing tot_benes across a provider's opioid rows counts a patient once per opioid drug received, so the per-beneficiary denominator is approximate.
This method has limitations. Specialty labels in Part D are derived from Medicare enrollment records and are sometimes imprecise — a provider enrolled as “Internal Medicine” may practice in a pain clinic context not reflected in the specialty field. Volume thresholds matter too: a provider with 12 opioid claims (just above the suppression cutoff) who happens to specialize in palliative care for cancer patients may show a high per-beneficiary opioid rate for clinically appropriate reasons. Investigators typically apply minimum volume filters (e.g., at least 50 total opioid claims) before flagging outliers, and they cross-reference the Open Payments data to ask whether the high-prescribing provider is also receiving manufacturer payments for opioid products.
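The z-score version of the peer comparison, with a minimum-volume filter, might look like the sketch below. It assumes you have already built one row per provider with hypothetical columns opioid_clms and opioid_benes by summing that provider's opioid rows, so the beneficiary denominator carries the overcount noted above. The 50-claim floor and the two- and three-standard-deviation thresholds follow the text.

```python
import pandas as pd


def flag_outliers(per_provider: pd.DataFrame, min_claims: int = 50) -> pd.DataFrame:
    """Z-score each provider's opioid claims per beneficiary against specialty x state peers.

    Expects one row per provider with columns prscrbr_npi, prscrbr_type,
    prscrbr_state_abrvtn, opioid_clms, opioid_benes (hypothetical names).
    """
    df = per_provider[per_provider["opioid_clms"] >= min_claims].copy()
    df["clms_per_bene"] = df["opioid_clms"] / df["opioid_benes"]
    peers = df.groupby(["prscrbr_type", "prscrbr_state_abrvtn"])["clms_per_bene"]
    # Sample standard deviation (ddof=1, the pandas default)
    df["z"] = (df["clms_per_bene"] - peers.transform("mean")) / peers.transform("std")
    df["flag"] = "none"
    df.loc[df["z"] > 2, "flag"] = "review"
    df.loc[df["z"] > 3, "flag"] = "high_priority"
    return df


# Made-up peer group: ten typical providers, one outlier, one low-volume provider
providers = pd.DataFrame({
    "prscrbr_npi": [f"N{i}" for i in range(12)],
    "prscrbr_type": ["Pain Management"] * 12,
    "prscrbr_state_abrvtn": ["TN"] * 12,
    "opioid_clms": [100] * 10 + [600, 12],
    "opioid_benes": [50] * 10 + [30, 1],
})
print(flag_outliers(providers)[["prscrbr_npi", "clms_per_bene", "z", "flag"]])
```

One consequence of using the sample standard deviation: in a peer group of n providers, no single provider's z-score can exceed (n − 1)/√n, so groups with fewer than eleven members can never produce a three-standard-deviation flag. Small specialty-state cells may need to be pooled, for example at the national level.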
Joining to Other Federal Datasets
The Part D data connects to five other federal datasets. The first three join on the NPI key with high match rates; ARCOS and FAERS join by geography or drug name rather than by provider:
- NPPES Provider Registry (nppes.cms.hhs.gov) — the authoritative source for provider specialty taxonomy, practice address, and organization affiliation. The NPPES bulk download adds standardized taxonomy codes (e.g., 207Q00000X for Family Medicine) that are more consistent than the specialty labels in the Part D file itself.
- CMS Open Payments (openpaymentsdata.cms.gov) — manufacturer payments to physicians, advanced practice providers, and teaching hospitals reported under the Physician Payments Sunshine Act. Joining by NPI surfaces the payment-to-prescribing correlation that is the central variable in conflict-of-interest research. The cms-open-payments article on this site covers the join methodology in detail.
- HHS OIG Exclusions List (LEIE) (oig.hhs.gov/exclusions) — providers excluded from Medicare and Medicaid for fraud, patient abuse, or other misconduct. Excluded providers are barred from federal health care program payment, so a provider whose exclusion was in effect during the data year yet who still appears in the Part D prescribing file warrants scrutiny. Compare the LEIE exclusion and reinstatement dates with the data year before drawing conclusions, and note that many older LEIE records carry no NPI. The hhs-oig-exclusions article covers LEIE structure and screening automation.
- DEA ARCOS — opioid wholesale shipment data by pharmacy, released through litigation and covering roughly 2006 through 2014. ARCOS provides shipment volumes at the pharmacy level; comparing it with Part D prescriber data aggregated to the same geographic areas and years allows cross-validation of opioid flow estimates, though the years covered by both sources overlap only briefly.
- FDA FAERS (Adverse Event Reporting System) — spontaneous adverse event reports by drug, submitted voluntarily by clinicians and consumers and as required by manufacturers. Joining FAERS drug name strings to Part D generic names surfaces drugs with both high prescribing volume and many adverse event reports relative to that volume, a signal used in pharmacovigilance research.
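Screening Part D prescribers against the LEIE by NPI can be sketched as below. The LEIE layout is an assumption to verify against the OIG's file documentation: columns named NPI, EXCLDATE, and REINDATE, dates stored as YYYYMMDD, and an all-zero NPI where none was recorded. Because the Part D file is annual, the sketch can only show that an exclusion was in effect at some point during the data year, not that a specific claim followed it. The function name and records are hypothetical.

```python
import pandas as pd


def excluded_during_year(part_d: pd.DataFrame, leie: pd.DataFrame, year: int) -> pd.DataFrame:
    """Part D rows whose NPI had an LEIE exclusion in effect at some point in `year`.

    Assumed LEIE layout: NPI, EXCLDATE, REINDATE columns; dates as YYYYMMDD;
    all-zero NPI where none is recorded. Read both files with dtype=str so
    NPIs are not mangled into floats.
    """
    leie = leie.copy()
    leie["NPI"] = leie["NPI"].astype(str).str.zfill(10)
    leie = leie[leie["NPI"] != "0" * 10]  # no usable NPI: needs name-based matching instead
    excl = pd.to_datetime(leie["EXCLDATE"].astype(str), format="%Y%m%d", errors="coerce")
    rein = pd.to_datetime(leie["REINDATE"].astype(str), format="%Y%m%d", errors="coerce")

    year_start = pd.Timestamp(year=year, month=1, day=1)
    year_end = pd.Timestamp(year=year, month=12, day=31)
    # Exclusion began on or before year end and was not lifted before the year began
    active = (excl <= year_end) & (rein.isna() | (rein >= year_start))
    hits = leie.loc[active, ["NPI"]].assign(excl_date=excl[active], rein_date=rein[active])
    hits = hits.drop_duplicates("NPI")

    keyed = part_d.assign(npi=part_d["prscrbr_npi"].astype(str).str.zfill(10))
    return keyed.merge(hits, left_on="npi", right_on="NPI", how="inner")


# Made-up records for illustration
part_d = pd.DataFrame({
    "prscrbr_npi": ["1111111111", "2222222222", "3333333333"],
    "gnrc_name": ["Oxycodone Hcl", "Lisinopril", "Sertraline Hcl"],
    "tot_clms": [40, 30, 20],
})
leie = pd.DataFrame({
    "NPI": ["1111111111", "2222222222", "0000000000"],
    "EXCLDATE": ["20210315", "20230601", "20200101"],
    "REINDATE": ["00000000", "00000000", "00000000"],
})
print(excluded_during_year(part_d, leie, 2022))
```

Here the second provider's exclusion starts after 2022 and the third LEIE record has no NPI, so only the first provider is returned for the 2022 data year.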
Data Caveats
Several limitations shape how the data should be interpreted. First, cost figures are gross costs before rebates. Manufacturer rebates, especially for diabetes drugs such as insulins and GLP-1 agonists and for some cancer drugs, can reduce net federal spending by 30 to 60 percent relative to the gross cost figure. The public data cannot be used to determine actual federal net spending per drug.
Second, the suppression of claims below eleven means the dataset undercounts total claims and beneficiaries for any drug with a diffuse prescribing pattern. Drugs written by many providers each writing a small number of prescriptions — typical of newly approved drugs in their first year on market — are disproportionately suppressed.
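A toy calculation shows why diffuse prescribing is hit harder by the eleven-claim cutoff. The claim counts are made up: two drugs each have 500 claims, one written by three high-volume prescribers and the other spread across 91 prescribers, most of whom write five claims each.

```python
def visible_share(claims_per_provider: list[int], threshold: int = 11) -> float:
    """Fraction of a drug's claims that survive a minimum-claims-per-row cutoff."""
    total = sum(claims_per_provider)
    visible = sum(c for c in claims_per_provider if c >= threshold)
    return visible / total if total else 0.0


concentrated = [200, 150, 150]  # 500 claims from 3 prescribers
diffuse = [5] * 90 + [50]       # 500 claims from 91 prescribers
print(visible_share(concentrated))  # 1.0
print(visible_share(diffuse))       # 0.1
```

Both drugs have the same true volume, but only a tenth of the diffuse drug's claims would appear in the public file.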
Third, the data reflects filled prescriptions, not written ones. A prescription that a beneficiary never fills, whether because of cost, formulary exclusion, or preference, generates no claim and does not appear in the data. Because fill rates vary across drugs, conditions, and populations, claim counts are a systematically biased measure of prescribing intent.
Fourth, prescriber specialty is assigned at the provider level and does not vary by claim. A provider enrolled as “General Practice” who prescribes chemotherapy to three patients will appear in the oncology drug rows under that general practice label, making specialty-based peer grouping imperfect for generalist providers working outside their primary specialty.
The Regulatory and Research Ecosystem
The Medicare Part D Prescribing Data is cited in hundreds of peer-reviewed studies each year covering topics from antibiotic prescribing stewardship to opioid policy to formulary design effects. The journals JAMA Internal Medicine and Annals of Internal Medicine have published repeated analyses using the dataset to examine geographic variation, specialty variation, and the effects of drug rebate policy.
On the regulatory side, CMS, the HHS Office of Inspector General, and the DOJ have all used Part D prescribing data in healthcare fraud investigations. The False Claims Act allows qui tam relators — whistleblowers with inside knowledge — to file complaints alleging Medicare fraud; the public prescribing data often corroborates or contradicts claims in those complaints by providing an independent measure of how a provider's prescribing volume compared to peers before and after the alleged fraudulent conduct.
State attorneys general investigating pharmaceutical manufacturers for deceptive marketing of opioids have used the data to link manufacturer promotional spending (from Open Payments or from litigation discovery) to prescribing changes at the provider level in their states. The Kentucky, Ohio, and West Virginia opioid manufacturer lawsuits each incorporated Part D prescribing analyses as supporting evidence.
Related writing: CMS Open Payments covers the Sunshine Act dataset that links manufacturer payments to prescribers — join it to Part D by NPI to surface the payment-to-prescribing signal.
HHS OIG Exclusions (LEIE) covers the federal healthcare fraud blacklist — screen every NPI in your Part D analysis against the LEIE to detect excluded providers still generating claims.
CDC Drug Overdose Mortality Data documents the three federal mortality datasets that quantify the opioid epidemic at the county and state level — the demand-side counterpart to the supply-side prescribing signal in Part D.