
Medicare Part B Data: Every Procedure Billed to Medicare and What It Paid

AI Analytics
Tags: Federal Data, CMS, Medicare, Healthcare

Every year the Centers for Medicare & Medicaid Services publishes a file that no industry in America would voluntarily release about itself: the exact dollar amount each physician and supplier charged Medicare, how many procedures they performed, and what Medicare actually paid. The Medicare Part B Physician and Supplier Public Use File is price transparency at scale—1 million providers, 12,000 procedure codes, and more than $400 billion in annual payments laid bare in a single CSV.

What Medicare Part B covers

Medicare is divided into parts. Part A covers hospital inpatient stays, skilled nursing facilities, and hospice care. Part B covers everything else that happens outside the hospital walls: physician office visits, outpatient procedures, diagnostic imaging (X-rays, MRIs, CT scans), clinical laboratory tests, durable medical equipment (wheelchairs, oxygen concentrators, CPAP machines), ambulance services, mental health visits, physical and occupational therapy, and preventive services including annual wellness visits and cancer screenings.

Part B is financed by a combination of monthly premiums paid by beneficiaries (set at roughly 25 percent of program costs) and general federal revenues. In 2023, Part B spending exceeded $430 billion—a figure that dwarfs the entire healthcare budgets of most countries. Understanding where that money goes requires the Physician and Supplier Public Use File.

The Physician and Supplier Public Use File

CMS began releasing provider-level Part B payment data publicly in 2014, following a federal court ruling that struck down a decades-old injunction the American Medical Association had obtained in 1979 to block disclosure. The injunction had kept physician payment data hidden for 35 years. Once it fell, CMS published data going back to 2012 and has released annual updates ever since.

The file is published at data.cms.gov/provider-summary-by-type-of-service in two variants: a provider-level summary (one row per provider, aggregating all their services) and a provider-service-level file (one row per provider per HCPCS code). The service-level file is the more analytically powerful of the two.

Dataset structure and key fields

Each row in the provider-service file represents a unique combination of provider and procedure code. The key fields are:

Field | Description
Rndrng_NPI | National Provider Identifier, the unique 10-digit ID for every provider
Rndrng_Prvdr_Type | Provider specialty (e.g., Ophthalmology, Cardiology, Physical Therapy)
HCPCS_Cd | Healthcare Common Procedure Coding System code (CPT or HCPCS Level II)
HCPCS_Desc | Plain-English description of the procedure or service
Place_Of_Srvc | F (facility) or O (office/non-facility); affects reimbursement rates
Tot_Srvcs | Total number of services billed for this provider/code combination
Tot_Benes | Distinct beneficiaries receiving this service from this provider
Tot_Sbmtd_Chrg | Total amount the provider charged (the “sticker price”)
Tot_Mdcr_Alowd_Amt | Total Medicare allowed amount (what Medicare agreed the service is worth)
Tot_Mdcr_Pymt_Amt | Total Medicare payment (allowed minus beneficiary cost-sharing)
Tot_Mdcr_Stdzd_Amt | Standardized payment, with the geographic wage index removed

A single mid-size ophthalmology practice might appear in hundreds of rows, one for each combination of physician and HCPCS code they billed. The full provider-service file for a recent year typically contains around 9 to 11 million rows.

CPT and HCPCS: the procedure code taxonomy

Every Part B claim is identified by a procedure code. There are two layers. CPT (Current Procedural Terminology) codes are five-digit numeric codes maintained by the American Medical Association, covering physician services: office visits (99202–99215 after the 2021 E&M revisions that eliminated 99201), surgical procedures, and diagnostic tests. HCPCS Level II codes begin with a letter and cover services CPT does not: durable medical equipment (E-codes), drugs administered in the office (J-codes), ambulance services (A-codes), and orthotics and prosthetics (L-codes).

The 2021 revision to Evaluation and Management (E&M) coding deserves special mention because it visibly altered the payment distribution in Part B data. CMS and the AMA overhauled the office visit codes so that the visit level is selected by medical decision-making complexity or total time, rather than by documented history and exam elements. Code 99201 (lowest-level new patient visit) was eliminated. Payments for moderate- and high-complexity visits (99214, 99215) increased substantially. Any longitudinal analysis of E&M billing must account for this structural break.

Common high-value HCPCS clusters to watch: J-code drugs (infusions and injections billed under Part B rather than Part D), E-codes (power wheelchairs and respiratory equipment are historical fraud hotspots), and imaging codes in the 70000 series (CT, MRI, nuclear medicine).
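
The two-layer taxonomy can be turned into a rough classifier for the HCPCS_Cd column. The sketch below is only an illustration: `classify_code` and `LEVEL_II_FAMILIES` are hypothetical names, and the letter-to-family mapping covers just the families mentioned above, not the full Level II code set.

```python
# Illustrative mapping of HCPCS Level II letter prefixes to the families
# named in the text. This is not a complete list of Level II categories.
LEVEL_II_FAMILIES = {
    "A": "ambulance services",
    "E": "durable medical equipment",
    "J": "drugs administered in the office",
    "L": "orthotics and prosthetics",
}


def classify_code(code: str) -> str:
    """Return a coarse label for a procedure code (hypothetical helper)."""
    code = code.strip().upper()
    if len(code) != 5:
        return "invalid"
    if code.isdigit():
        # Five digits: a CPT code. The 70000 series is the imaging range.
        return "CPT imaging" if code.startswith("7") else "CPT"
    if code[0].isalpha() and code[1:].isdigit():
        # One letter followed by four digits: HCPCS Level II.
        family = LEVEL_II_FAMILIES.get(code[0], "other family")
        return f"HCPCS Level II ({family})"
    # Anything else, e.g. codes that end in a letter, is left unclassified.
    return "other"


for example in ["99214", "70553", "J2778", "E1390", "0001F"]:
    print(example, "->", classify_code(example))
```

Applied with `df["HCPCS_Cd"].map(classify_code)`, a label like this makes it easy to split spending into drug, equipment, imaging, and visit buckets before looking at individual codes.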

Submitted charges vs. allowed amount vs. payment

One of the most striking features of the data is the gap between what providers charge and what Medicare pays. Providers set their own “chargemaster” or submitted charge—a list price that has no formal ceiling. Medicare then applies its fee schedule to determine the allowed amount, and pays roughly 80 percent of the allowed amount (beneficiaries pay the remaining 20 percent as coinsurance).

For common physician services the markup ratio (submitted charge divided by allowed amount) often runs 2:1 to 4:1. For hospital outpatient departments and certain specialties it can exceed 10:1. A facility billing $1,500 for an MRI that Medicare allows at $250 is not unusual. The submitted charge has no practical effect on Medicare payment—it only matters for patients without insurance who may be billed the full chargemaster rate—but its magnitude signals how providers have priced themselves for commercial payers who negotiate off the chargemaster.
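
The MRI example can be worked through in a few lines. This is a simplified sketch: it assumes the beneficiary's annual deductible has already been met and ignores any other payment adjustments, and the function names are illustrative rather than anything CMS publishes.

```python
def split_allowed_amount(allowed: float, medicare_share: float = 0.80):
    """Split an allowed amount into Medicare's payment and the coinsurance.

    Assumes the Part B deductible is already met (a simplification).
    """
    medicare_payment = round(allowed * medicare_share, 2)
    coinsurance = round(allowed - medicare_payment, 2)
    return medicare_payment, coinsurance


def markup_ratio(submitted: float, allowed: float) -> float:
    """Submitted charge divided by allowed amount."""
    if allowed <= 0:
        raise ValueError("allowed amount must be positive")
    return submitted / allowed


# The MRI example from the text: $1,500 charged, $250 allowed.
submitted, allowed = 1500.00, 250.00
payment, coinsurance = split_allowed_amount(allowed)

print(f"Medicare pays:       ${payment:,.2f}")
print(f"Beneficiary pays:    ${coinsurance:,.2f}")
print(f"Markup ratio:        {markup_ratio(submitted, allowed):.1f}:1")
print(f"Charge never paid:   ${submitted - allowed:,.2f}")
```

The last line is the part of the sticker price that neither Medicare nor the beneficiary owes.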

The data makes this visible at the individual provider level for the first time in any systematic way. Researchers have shown that markup ratios vary enormously by specialty, geography, and whether the provider is affiliated with a large health system.

Standardized payments: geographic apples-to-apples

Medicare payment rates vary by geography because the fee schedule incorporates a Geographic Practice Cost Index (GPCI) that adjusts for local wages, office rents, and malpractice costs. A cardiologist in Manhattan is paid more per echocardiogram than one in rural Mississippi—not because Medicare values the service differently, but because the inputs cost more in New York.

To compare utilization and spending patterns across regions, CMS provides the standardized payment amount: the payment CMS would have made if every provider were in the same geographic locality. This removes the wage-index distortion and lets you ask whether physicians in one region are performing more services per beneficiary than in another, independent of price. Population-level research on practice variation—the Dartmouth Atlas tradition—relies on exactly this kind of standardization.
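
The idea behind standardization can be illustrated with the physician fee schedule formula, in which each relative value unit (RVU) component is multiplied by its GPCI and the sum is multiplied by a conversion factor. The sketch below uses made-up RVUs, GPCIs, and conversion factor, and CMS's actual standardization method involves more adjustments than this. It only shows what neutralizing the geographic index means.

```python
from dataclasses import dataclass


@dataclass
class ServiceRVUs:
    work: float
    practice_expense: float
    malpractice: float


@dataclass
class LocalityGPCI:
    # A value of 1.0 means "no geographic adjustment" for that component.
    work: float = 1.0
    practice_expense: float = 1.0
    malpractice: float = 1.0


def fee_schedule_payment(rvus: ServiceRVUs, gpci: LocalityGPCI,
                         conversion_factor: float) -> float:
    """Sum of GPCI-adjusted RVUs times the conversion factor."""
    adjusted_rvus = (
        rvus.work * gpci.work
        + rvus.practice_expense * gpci.practice_expense
        + rvus.malpractice * gpci.malpractice
    )
    return adjusted_rvus * conversion_factor


# All numbers below are hypothetical and chosen for easy arithmetic.
service = ServiceRVUs(work=1.0, practice_expense=1.5, malpractice=0.1)
high_cost_area = LocalityGPCI(work=1.05, practice_expense=1.20, malpractice=1.50)
CONVERSION_FACTOR = 30.0

actual = fee_schedule_payment(service, high_cost_area, CONVERSION_FACTOR)
standardized = fee_schedule_payment(service, LocalityGPCI(), CONVERSION_FACTOR)

print(f"Paid in the high-cost locality: ${actual:.2f}")
print(f"Geography-neutral amount:       ${standardized:.2f}")
```

Two providers performing the same service get the same geography-neutral amount, so any remaining difference in their standardized totals reflects volume and service mix rather than local prices.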

The $6.5 billion revelation of 2014

When CMS released the first modern iteration of provider-level Part B data in April 2014, ProPublica and the New York Times were among the first to analyze it. The resulting coverage identified a remarkable concentration: a small number of individual physicians were receiving tens of millions of dollars per year from Medicare.

The analysis found that the top 1 percent of Medicare providers collectively received more than $6.5 billion in payments in 2012. Radiation oncologists appeared prominently: intensity-modulated radiation therapy (IMRT) involves a technical component (the machine and facility) and a professional component (the physician's supervision and planning), and both components can be billed separately under different HCPCS codes. A single treatment course can generate dozens of billable fractions.

Ophthalmologist Salomon Melgen of Florida received approximately $21 million in Medicare Part B payments in 2012—the highest of any individual physician that year. Melgen billed extensively for Lucentis (ranibizumab, HCPCS J2778), an anti-vascular endothelial growth factor (anti-VEGF) injection used to treat wet age-related macular degeneration and diabetic macular edema. He was later convicted of Medicare fraud in 2017, with prosecutors demonstrating that he had billed for services not rendered and re-used single-dose vials across multiple patients. The Part B data had flagged his billing pattern as an extreme outlier years before his prosecution.

Ophthalmology and the anti-VEGF price controversy

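The anti-VEGF drug story is one of the most consequential healthcare pricing debates the Part B data illuminates. Three drugs treat the same conditions: aflibercept (Eylea, HCPCS J0178), ranibizumab (Lucentis, J2778), and bevacizumab (Avastin, J9035, used off-label in the eye).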

Clinical trials (CATT, IVAN) found no significant difference in visual outcomes between Avastin and Lucentis at one year. Yet Medicare Part B payments show that a large share of ophthalmologists choose the higher-priced drugs. Because Part B reimburses drugs at average sales price plus 6 percent (ASP+6%), a physician earns a larger absolute margin on a $1,850 injection than on a $60 one—creating a financial incentive that runs opposite to cost-effective prescribing. The Part B data quantifies this incentive at the individual provider level.
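
The size of that incentive is simple arithmetic. The sketch below treats the two per-injection prices quoted above as the average sales price, which is an assumption made for illustration, and the helper name is hypothetical.

```python
def asp_add_on(asp: float, add_on_rate: float = 0.06) -> float:
    """Dollar value of the percentage add-on paid on top of ASP."""
    return round(asp * add_on_rate, 2)


# Per-injection prices quoted in the text, treated here as ASP (an assumption).
higher_priced = 1850.00
lower_priced = 60.00

high_margin = asp_add_on(higher_priced)
low_margin = asp_add_on(lower_priced)

print(f"Add-on for the ${higher_priced:,.0f} injection: ${high_margin:,.2f}")
print(f"Add-on for the ${lower_priced:,.0f} injection:   ${low_margin:,.2f}")
print(f"Difference per injection:          ${high_margin - low_margin:,.2f}")
```

Multiplied across the thousands of injections a busy retina practice performs in a year, the per-injection difference becomes a material sum.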

Telemedicine and the COVID-era billing surge

The 2020 and 2021 Part B files captured something unprecedented: an explosion of telehealth billing that Medicare had not previously allowed at scale. CMS issued emergency waivers in March 2020 permitting audio-only visits, dropping the requirement that patients be in rural areas, and temporarily allowing billing of telehealth at the same rates as in-person visits.

The billing data showed an enormous shift in E&M codes to telehealth modifiers (modifier GT, modifier 95) and the appearance of new providers with implausibly high telehealth volumes. HHS OIG investigations subsequently identified a wave of telehealth fraud schemes: companies recruiting Medicare beneficiaries for unnecessary telehealth consultations, then billing for DME orders (back braces, knee braces, orthotic devices) that the telehealth physician never clinically evaluated. The Part B data was the earliest systematic signal that something had gone wrong.

High-utilization specialties

Certain specialties reliably generate the largest Part B payments due to the nature and cost of services:

Suppression rules

CMS suppresses any provider-HCPCS combination where the provider served fewer than 11 distinct beneficiaries for that service during the year. This threshold is intended to protect patient privacy by preventing re-identification of individuals who received rare services. The practical effect is that very small practices and highly specialized services are systematically absent from the data, which can cause undercounting in rural and subspecialty analyses.
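
The effect of the rule is easy to see on a toy table. The rows below are invented for illustration. Only the threshold of 11 beneficiaries comes from the rule described above.

```python
import pandas as pd

SUPPRESSION_THRESHOLD = 11  # rows with fewer distinct beneficiaries are withheld

# Hypothetical, unsuppressed provider-service rows.
raw = pd.DataFrame({
    "Rndrng_NPI": ["1000000001", "1000000001", "1000000002", "1000000003"],
    "HCPCS_Cd":   ["99214",      "67028",      "99214",      "67028"],
    "Tot_Benes":  [120,          8,            10,           45],
    "Tot_Srvcs":  [300,          15,           12,           210],
})

# What would survive into the public file.
published = raw[raw["Tot_Benes"] >= SUPPRESSION_THRESHOLD]

hidden_services = raw["Tot_Srvcs"].sum() - published["Tot_Srvcs"].sum()
missing_npis = set(raw["Rndrng_NPI"]) - set(published["Rndrng_NPI"])

print(published)
print("Services missing from the public file:", hidden_services)
print("Providers that vanish entirely:", sorted(missing_npis))
```

The first provider loses one of its two rows, and the second provider disappears altogether, which is the pattern that makes low-volume practices look smaller than they are.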

Downloading and parsing the data

CMS publishes annual files at data.cms.gov/provider-summary-by-type-of-service. Each year's provider-service file is a ZIP-compressed CSV, typically 400–700 MB uncompressed. CMS also exposes an API, but for bulk analysis downloading the full file is more reliable.

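import io
import zipfile

import pandas as pd
import requests

# CMS Medicare Part B PUF: Provider and Service (most recent year)
# https://data.cms.gov/provider-summary-by-type-of-service
# The direct-download path changes with each release; confirm it on the page above.
URL = (
    "https://data.cms.gov/sites/default/files/2024-04/"
    "MUP_PHY_R24P04_0_1_2023_Prov_Svc.zip"
)

# The whole archive is held in memory, so streaming adds nothing here;
# the timeout keeps a stalled connection from hanging indefinitely.
r = requests.get(URL, timeout=300)
r.raise_for_status()

with zipfile.ZipFile(io.BytesIO(r.content)) as zf:
    # Use the first CSV member of the archive
    name = next(n for n in zf.namelist() if n.lower().endswith(".csv"))
    with zf.open(name) as fh:
        # dtype=str keeps NPIs and HCPCS codes as text
        df = pd.read_csv(fh, dtype=str, low_memory=False)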

print(df.shape)        # (rows, columns)
print(df.columns.tolist())
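
Loading every row as strings can take several gigabytes of memory. If only a handful of procedure codes matter, one alternative is to read the CSV in chunks and keep just the matching rows. The helper below is a sketch: `load_filtered` is a hypothetical name, and the tiny in-memory CSV stands in for the real file.

```python
import io

import pandas as pd


def load_filtered(source, hcpcs_codes, usecols=None, chunksize=500_000):
    """Read a provider-service CSV in chunks, keeping rows for the given codes.

    `source` is a path or file-like object. If `usecols` is given it must
    include "HCPCS_Cd", because the filter is applied on that column.
    """
    wanted = set(hcpcs_codes)
    pieces = []
    reader = pd.read_csv(source, dtype=str, usecols=usecols, chunksize=chunksize)
    for chunk in reader:
        pieces.append(chunk[chunk["HCPCS_Cd"].isin(wanted)])
    if not pieces:
        return pd.DataFrame()
    return pd.concat(pieces, ignore_index=True)


# Tiny stand-in for the real file, just to show the call.
demo_csv = (
    "Rndrng_NPI,HCPCS_Cd,Tot_Srvcs\n"
    "1000000001,J2778,10\n"
    "1000000002,99214,5\n"
    "1000000003,J0178,7\n"
)
subset = load_filtered(io.StringIO(demo_csv), ["J0178", "J2778"], chunksize=2)
print(subset)
```

Against the real download, `source` would be the extracted CSV path, and `usecols` can be limited to the identifier and payment columns to shrink memory use further.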

Filtering to ophthalmology and anti-VEGF billers

The following code filters the provider-service file to ophthalmology providers and aggregates their anti-VEGF injection billing by NPI and drug code, producing a ranked list of the top billers.

import pandas as pd

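# Assumes df was loaded by the download step above.
# Every column was read as text, so convert the numeric fields before
# aggregating; values that cannot be parsed become NaN.
# Column names follow the field table above; check df.columns for your release.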
numeric_cols = [
    "Tot_Srvcs", "Tot_Benes", "Tot_Sbmtd_Chrg",
    "Tot_Mdcr_Alowd_Amt", "Tot_Mdcr_Pymt_Amt",
    "Tot_Mdcr_Stdzd_Amt",
]
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Filter by specialty name: Rndrng_Prvdr_Type holds the specialty
# description text, not a numeric specialty code
ophtho = df[df["Rndrng_Prvdr_Type"].isin(["Ophthalmology", "Optometry"])].copy()

# Anti-VEGF injection HCPCS codes
# J0178 = Aflibercept (Eylea)  J2778 = Ranibizumab (Lucentis)
# J9035 = Bevacizumab (Avastin, off-label)
anti_vegf_codes = ["J0178", "J2778", "J9035"]
avf = ophtho[ophtho["HCPCS_Cd"].isin(anti_vegf_codes)].copy()

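# Aggregate by NPI + HCPCS.
# dropna=False keeps providers whose name field is blank; by default
# groupby silently drops rows with a missing grouping key.
agg = (
    avf.groupby(
        ["Rndrng_NPI", "Rndrng_Prvdr_Last_Org_Name", "HCPCS_Cd"],
        dropna=False,
    )
    .agg(
        total_services=("Tot_Srvcs", "sum"),
        # Summing distinct-beneficiary counts across rows (e.g. facility and
        # office rows for the same code) can count one patient twice.
        total_beneficiaries=("Tot_Benes", "sum"),
        total_payment=("Tot_Mdcr_Pymt_Amt", "sum"),
        total_allowed=("Tot_Mdcr_Alowd_Amt", "sum"),
    )
    .reset_index()
    .sort_values("total_payment", ascending=False)
)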

# Top 20 billers
top20 = agg.head(20)
print(top20[["Rndrng_NPI", "Rndrng_Prvdr_Last_Org_Name",
             "HCPCS_Cd", "total_payment", "total_services"]].to_string())

Calculating markup ratios by specialty

To quantify the gap between chargemaster prices and Medicare reimbursement across specialties:

# Markup ratio: submitted charges vs. allowed amount
df["markup_ratio"] = df["Tot_Sbmtd_Chrg"] / df["Tot_Mdcr_Alowd_Amt"].replace(0, float("nan"))

# Specialties with highest median markup
markup_by_specialty = (
    df.groupby("Rndrng_Prvdr_Type")["markup_ratio"]
    .median()
    .sort_values(ascending=False)
    .head(15)
    .reset_index()
    .rename(columns={"Rndrng_Prvdr_Type": "specialty",
                      "markup_ratio": "median_markup"})
)
print(markup_by_specialty.to_string(index=False))
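
The median of row-level ratios treats a provider billing a code 11 times the same as one billing it 10,000 times. A dollar-weighted alternative divides total charges by total allowed amounts within each specialty. The sketch below compares the two on invented numbers and assumes the same column names used above.

```python
import pandas as pd

# Hypothetical rows: two small-volume lines and one very large one.
toy = pd.DataFrame({
    "Rndrng_Prvdr_Type":  ["Cardiology", "Cardiology", "Cardiology"],
    "Tot_Sbmtd_Chrg":     [300.0, 400.0, 90000.0],
    "Tot_Mdcr_Alowd_Amt": [100.0, 100.0, 45000.0],
})

# Median of per-row ratios (the approach used above).
row_ratio = toy["Tot_Sbmtd_Chrg"] / toy["Tot_Mdcr_Alowd_Amt"]
median_markup = row_ratio.groupby(toy["Rndrng_Prvdr_Type"]).median()

# Dollar-weighted ratio: total charges over total allowed amounts.
totals = toy.groupby("Rndrng_Prvdr_Type")[
    ["Tot_Sbmtd_Chrg", "Tot_Mdcr_Alowd_Amt"]
].sum()
aggregate_markup = totals["Tot_Sbmtd_Chrg"] / totals["Tot_Mdcr_Alowd_Amt"]

print(pd.DataFrame({
    "median_markup": median_markup,
    "aggregate_markup": aggregate_markup,
}))
```

Neither number is more correct. The median describes the typical billing line, while the aggregate ratio describes where the dollars actually sit, and reporting both guards against a few large rows or many tiny ones driving the conclusion.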

Joining with Open Payments and OIG exclusions

The Part B data becomes most powerful when joined to other federal datasets on the provider's NPI. The example below joins it to CMS Open Payments:

import pandas as pd

# 1. Load Part B PUF (already downloaded)
partb = pd.read_csv("partb_puf.csv", dtype=str, low_memory=False)

# 2. Load CMS Open Payments General Payments
# https://openpaymentsdata.cms.gov/datasets
openpay = pd.read_csv("OP_DTL_GNRL_PGYR2023.csv", dtype=str, low_memory=False)

# Normalize NPI for join
partb["Rndrng_NPI"] = partb["Rndrng_NPI"].str.strip()
openpay["Covered_Recipient_NPI"] = openpay["Covered_Recipient_NPI"].str.strip()

# Aggregate Open Payments by NPI
openpay["Total_Amount_of_Payment_USDollars"] = pd.to_numeric(
    openpay["Total_Amount_of_Payment_USDollars"], errors="coerce"
)
industry_pay = (
    openpay.groupby("Covered_Recipient_NPI")["Total_Amount_of_Payment_USDollars"]
    .sum()
    .reset_index()
    .rename(columns={
        "Covered_Recipient_NPI": "Rndrng_NPI",
        "Total_Amount_of_Payment_USDollars": "industry_payments_usd",
    })
)

# Aggregate Part B payments by NPI
partb["Tot_Mdcr_Pymt_Amt"] = pd.to_numeric(
    partb["Tot_Mdcr_Pymt_Amt"], errors="coerce"
)
partb_npi = (
    partb.groupby("Rndrng_NPI")["Tot_Mdcr_Pymt_Amt"]
    .sum()
    .reset_index()
    .rename(columns={"Tot_Mdcr_Pymt_Amt": "medicare_payment_usd"})
)

# Join
merged = partb_npi.merge(industry_pay, on="Rndrng_NPI", how="left")
merged["industry_payments_usd"] = merged["industry_payments_usd"].fillna(0)
# Treat a zero Medicare total as missing so the ratio is NaN rather than infinite
merged["pay_ratio"] = (
    merged["industry_payments_usd"]
    / merged["medicare_payment_usd"].replace(0, float("nan"))
)

# Top providers by Medicare payment, with industry payments shown
top = merged.sort_values("medicare_payment_usd", ascending=False).head(50)
print(top.to_string(index=False))

The join above surfaces providers who received large Medicare Part B payments alongside significant industry payments—a combination worth scrutiny, particularly for specialties where specific high-cost products dominate billing.
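
The same pattern extends to the HHS OIG exclusion list. The sketch below uses small invented tables in place of the real files. The exclusion-list column names (NPI, EXCLTYPE, EXCLDATE) and the all-zero placeholder for unknown NPIs are assumptions about that file's layout and should be checked against the downloaded copy.

```python
import pandas as pd

# Stand-in for the per-NPI Part B totals built above (values are invented).
partb_npi = pd.DataFrame({
    "Rndrng_NPI": ["1000000001", "1000000002", "1000000003"],
    "medicare_payment_usd": [250000.0, 1200000.0, 80000.0],
})

# Stand-in for the exclusion list. Column names are assumed, not confirmed.
leie = pd.DataFrame({
    "NPI": ["1000000002", "0000000000"],
    "EXCLTYPE": ["1128a1", "1128b4"],
    "EXCLDATE": ["20240115", "20190620"],
})

# Keep only rows with a usable 10-digit NPI, one row per NPI.
has_npi = leie["NPI"].str.fullmatch(r"\d{10}", na=False) & (leie["NPI"] != "0000000000")
leie_valid = leie[has_npi].drop_duplicates("NPI")

flagged = partb_npi.merge(
    leie_valid, left_on="Rndrng_NPI", right_on="NPI",
    how="left", indicator=True,
)
flagged["excluded"] = flagged["_merge"] == "both"

print(flagged[["Rndrng_NPI", "medicare_payment_usd", "excluded", "EXCLDATE"]])
```

Exclusion records without an NPI cannot be matched this way and would need name and address matching, which is considerably less reliable.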

What the data cannot tell you

The Part B PUF is a billing record, not a clinical record. High payment totals are not inherently evidence of fraud or inappropriate care—a high-volume retinal surgeon treating hundreds of patients with genuine wet macular degeneration will legitimately appear at the top of the anti-VEGF ranking. The data identifies statistical outliers; clinical context and additional evidence determine whether an outlier represents excellent productivity, aggressive but defensible practice patterns, or actual fraud.

Similarly, the data reflects claims submitted and paid, not outcomes. A provider billing heavily for diagnostic tests might be practicing thorough, high-quality medicine or might be running a volume-maximizing practice with poor care coordination. The Part B data is a starting point for investigation, not a verdict.
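
One common way to surface statistical outliers for that kind of follow-up is a robust z-score computed within each specialty. The sketch below uses the median and the median absolute deviation on log-scaled payments, with the conventional 0.6745 scaling and 3.5 cutoff. The function name, column names, and payment figures are illustrative, and the method is one reasonable choice rather than anything CMS or investigators are known to use.

```python
import numpy as np
import pandas as pd


def flag_outliers(frame, value_col, group_col, threshold=3.5):
    """Add a within-group robust z-score and an outlier flag (a sketch)."""
    out = frame.copy()
    # Payments are heavily right-skewed, so work on a log scale.
    logged = np.log10(out[value_col].where(out[value_col] > 0))
    median = logged.groupby(out[group_col]).transform("median")
    abs_dev = (logged - median).abs()
    mad = abs_dev.groupby(out[group_col]).transform("median")
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    out["robust_z"] = 0.6745 * (logged - median) / mad.replace(0, np.nan)
    out["is_outlier"] = out["robust_z"].abs() > threshold
    return out


# Invented per-provider totals for a single specialty.
providers = pd.DataFrame({
    "specialty": ["Ophthalmology"] * 6,
    "medicare_payment_usd": [200_000, 250_000, 300_000,
                             350_000, 400_000, 21_000_000],
})

scored = flag_outliers(providers, "medicare_payment_usd", "specialty")
print(scored[["medicare_payment_usd", "robust_z", "is_outlier"]])
```

A flag from a rule like this says only that a provider's totals are unusual relative to peers, and the interpretation still depends on the clinical context described above.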

Scale and significance

Medicare Part B is one of the largest single payers in the world. Its provider-level payment data represents a level of healthcare financial transparency that most countries have not achieved and most industries would actively resist. The 2014 release that followed the collapse of the AMA injunction demonstrated that public disclosure does not harm physicians as a class—it identifies the extreme outliers while leaving the normal distribution of practice patterns unaffected.

For researchers, journalists, compliance officers, and policy analysts, the Medicare Part B PUF is foundational. It is the closest thing American healthcare has to a universal billing ledger, updated annually, freely downloadable, and indexed to the provider identifiers that connect it to the rest of the CMS data ecosystem.


Related: CMS Open Payments: Financial Conflicts in Medicine, Mapped — industry payments to physicians joined with Part B billing reveal the drug preference patterns that follow the money.

Related: Medicare Part D Prescribing Data: Every Drug, Every Prescriber — the prescription-side complement to Part B's procedure-side view of Medicare spending.

Related: HHS OIG Exclusions: The Federal Healthcare Fraud Blacklist — providers excluded from Medicare after fraud convictions and license actions, joinable to Part B billing records.