USCIS H-1B Data: The Federal Record of Who Sponsors Skilled Foreign Workers

11 min read · AI Analytics
Tags: USCIS, H-1B, Immigration, Skilled Visas, Federal Data

Every spring a federal lottery decides which companies may hire which foreign engineers, researchers, and analysts for the year ahead—demand routinely runs several times the number of visas the law allows. USCIS records the result employer by employer in the H-1B Employer Data Hub: roughly 764,000 rows from 2009 through 2023, one per sponsoring company per fiscal year, each carrying the count of petitions approved and denied. It is the most granular public answer to a politically charged question—who actually sponsors skilled foreign workers in the United States, and how the government's willingness to approve them has moved with the administration in power.

This article covers what the H-1B Employer Data Hub is and how the visa is framed in statute; the specialty-occupation test that defines who qualifies; the annual numerical cap, the advanced-degree exemption, and the lottery that demand routinely triggers; the cap-exempt universities, affiliated nonprofit research organizations, and government research entities that sit outside the numbers game; the structure of the data and the difference between initial and continuing petitions; which employers sponsor the most workers—the large IT-services firms and the major technology companies—and how approval and denial rates shifted across administrations; the pairing with the Department of Labor's Labor Condition Application and prevailing-wage data that documents the wage side of the program; a Python workflow that aggregates approvals by employer and by fiscal year and computes approval rates over time; and the caveats—petition counts are not headcounts, employer names are messy, and a denial is not the end of the story—that every analyst must internalize before drawing conclusions.

What the dataset is

U.S. Citizenship and Immigration Services (USCIS) is the agency within the Department of Homeland Security that adjudicates immigration benefits, including the petitions through which US employers ask to employ foreign workers in the H-1B nonimmigrant category. For most of the program's history the public knew the H-1B story only in aggregate: a national cap, a count of registrations, a press release. The H-1B Employer Data Hub, which USCIS launched to make the program transparent at the level of the individual sponsor, changed that. The Hub publishes, for each fiscal year, the petitions USCIS acted on broken out by employer, with the outcome recorded as counts of approvals and denials. As drawn here from the Hub's downloadable files, the record comprises roughly 764,000 employer-year rows spanning fiscal years 2009 through 2023.

In our database this record is stored as the table uscis_h1b, with the grain of one row per employer per fiscal year: a company that sponsored H-1B workers in five different years contributes five rows, one per year, each summarizing that year's petitions for that company. The columns identify the sponsoring employer, its location and industry, and the outcome counts split four ways—initial versus continuing, approval versus denial:

fiscal_year            -- the federal fiscal year of the adjudication
employer               -- name of the sponsoring (petitioning) employer
tax_id                 -- last four digits of the employer's tax ID
state                  -- employer / worksite state
city                   -- employer / worksite city
zip                    -- employer / worksite ZIP code
naics                  -- North American Industry Classification System code
initial_approval       -- new-employment petitions approved
initial_denial         -- new-employment petitions denied
continuing_approval    -- extension / amendment petitions approved
continuing_denial      -- extension / amendment petitions denied

The four outcome columns are the substantive payload, and the initial-versus-continuing split is the one analysts most often miss. An initial petition is for new employment, typically a worker who has not previously held H-1B status, and it is the category that contains the cap-subject filings selected through the lottery. A continuing petition extends or amends the status of a worker who already holds it, and it is generally not counted against the cap; in USCIS's own definitions for the Hub, a change of employer for an existing H-1B worker is also classed as continuing rather than initial. Conflating the two badly distorts any reading of the data: a large IT-services firm with a long-tenured H-1B workforce shows enormous continuing-approval counts that have nothing to do with how many new lottery slots it won. The tax_id field carries only the last four digits of the employer's tax identification number rather than a precise employer identifier, and the employer name is free text. The caveats section returns to both facts, because they make company-level aggregation harder than the clean column layout suggests.
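To make the initial-versus-continuing distinction concrete, here is a minimal sketch that computes, per employer, the share of approvals that were for new employment. It assumes the column names of the uscis_h1b table listed above; the two employers and all counts are invented for illustration.

```python
import pandas as pd

# Toy rows shaped like the uscis_h1b table (employers and counts are invented).
rows = pd.DataFrame({
    "fiscal_year": [2022, 2023, 2023],
    "employer": ["ACME IT SERVICES", "ACME IT SERVICES", "EXAMPLE UNIVERSITY"],
    "initial_approval": [100, 80, 40],
    "initial_denial": [10, 5, 1],
    "continuing_approval": [900, 950, 60],
    "continuing_denial": [30, 20, 2],
})


def initial_share(df):
    """Per-employer share of approvals that were for new employment."""
    totals = df.groupby("employer")[["initial_approval", "continuing_approval"]].sum()
    all_approvals = totals["initial_approval"] + totals["continuing_approval"]
    totals["initial_share"] = totals["initial_approval"] / all_approvals
    return totals


mix = initial_share(rows)
print(mix)
```

A low initial share flags a sponsor whose approval total is dominated by extensions of an existing workforce, which is the pattern the paragraph above warns against reading as lottery success.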

The H-1B visa and the statutory frame

The H-1B is a nonimmigrant visa category created by the Immigration Act of 1990, which restructured the employment-based provisions of the Immigration and Nationality Act. It lets a US employer temporarily employ a foreign worker in a specialty occupation—a job that requires the theoretical and practical application of a body of highly specialized knowledge and, as a normal minimum for entry, a bachelor's degree or higher in the specific specialty (or its equivalent). The category exists to let US employers fill positions for which they assert they need particular professional expertise, and over three decades it has become the central skilled-immigration channel for the country's technology and professional sectors. It is the principal route by which a foreign software engineer, data scientist, financial analyst, physician, or university researcher comes to work in the United States in a degree-requiring role.

The H-1B is a temporary, employer-sponsored visa that also permits dual intent. It is initially granted for up to three years and may be extended, generally to a maximum of six years, with longer extensions available for workers far enough along in the employment-based green card process. “Dual intent” means an H-1B holder may simultaneously pursue permanent residence without jeopardizing the temporary visa, which is why the H-1B is, in practice, the most common on-ramp to an employment-based green card. The visa is tied to the sponsoring employer: the worker may not simply switch jobs, but must have a new employer file a new petition. This employer-tethering is the reason the data is organized by employer and the reason the program is so consequential to the firms that depend on it: the right to employ a particular skilled worker is, quite literally, granted to the company, not the person.

Two agencies share the program. The Department of Labor (DOL) handles the first step—the employer must file a Labor Condition Application attesting to wage and working conditions, discussed at length below—and USCIS handles the second step, adjudicating the H-1B petition itself on Form I-129. The Data Hub captures the USCIS half: the petition and its outcome. Because the adjudication is discretionary in part—USCIS evaluates whether the job truly qualifies as a specialty occupation, whether the worker is qualified, and whether the employer-employee relationship is genuine—the approval-versus-denial split is not a mechanical formality. It is the visible trace of how strictly the agency is reading the statute at a given moment, which is precisely why the approval rate moves with administrations.

The cap, the advanced-degree exemption, and the lottery

The defining feature of the H-1B program is that it is numerically capped. Congress limits the number of new H-1B workers admitted each fiscal year. The regular cap is set by statute at 65,000 per year, with an additional 20,000 set aside for workers who hold a master's degree or higher from a US institution (the advanced-degree exemption), for a combined ceiling of 85,000 new cap-subject approvals a year. The cap applies to initial, cap-subject petitions; it does not apply to extensions of existing workers, which is the statutory basis for the initial-versus-continuing split in the data.

For most of the program's recent history, demand has routinely and dramatically exceeded the cap. When more cap-subject petitions are filed than there are visas available, USCIS conducts a random lottery to select which will be processed. Beginning with the fiscal year 2021 cap season, USCIS moved the process to an electronic registration system, in which employers first submit a short registration for each prospective worker and only the registrations selected in the lottery proceed to a full petition. The change lowered the cost of entering and, in turn, exposed the program to gaming through duplicate and speculative registrations, which the agency has since worked to curb with a beneficiary-centric selection process. The lottery has a profound, often-overlooked consequence for the data: because selection is random among cap-subject filings, the count of initial approvals for any employer in a capped category is governed as much by chance and by how many registrations the employer submitted as by the firm's underlying hiring strategy. The Data Hub records who won; it does not record who entered and lost.
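The point that an employer's selections track its registration volume can be illustrated with a toy simulation. This is a deliberately simplified sketch, not the actual USCIS procedure: it draws from a single pool, ignores the separate advanced-degree allocation and the beneficiary-centric rules, and uses hypothetical firm names and counts.

```python
import random


def run_lottery(registrations, slots, seed=0):
    """Pick `slots` registrations uniformly at random, without replacement.

    `registrations` maps a (hypothetical) employer name to the number of
    registrations it submitted. Returns the number selected per employer.
    """
    # One pool entry per registration, labeled with the employer that filed it.
    pool = [emp for emp, n in registrations.items() for _ in range(n)]
    rng = random.Random(seed)
    winners = rng.sample(pool, min(slots, len(pool)))
    selected = {emp: 0 for emp in registrations}
    for emp in winners:
        selected[emp] += 1
    return selected


# Invented volumes: 3,330 registrations competing for 1,000 slots.
registrations = {"FIRM_A": 3000, "FIRM_B": 300, "FIRM_C": 30}
selected = run_lottery(registrations, slots=1000)
for emp, n in registrations.items():
    print(f"{emp}: entered {n:>5,}, selected {selected[emp]:>4,}")
```

Rerunning with different seeds changes each firm's count while the total stays fixed, which is why a single year's initial approvals for one employer should be read as a noisy draw rather than a precise measure of demand.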

Not every H-1B petition is subject to the cap. The statute exempts petitions filed by, or on behalf of workers who will be employed at, institutions of higher education, nonprofit research organizations affiliated with universities, and government research organizations. These cap-exempt employers can file H-1B petitions year-round without competing in the lottery, which is why universities, academic medical centers, and affiliated research institutes appear steadily in the data with stable approval patterns rather than the spiky, lottery-driven counts of the cap-subject technology and IT-services firms. The cap-exempt carve-out is also a meaningful analytic dividing line: a researcher reading the Hub to understand the lottery should separate cap-exempt sponsors out, because their numbers answer a different question entirely—the academic and public-research demand for skilled foreign labor, unmediated by the cap.
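The column layout described earlier carries no cap-exempt flag, so separating these sponsors takes a heuristic. The sketch below flags likely cap-exempt rows by keywords in the employer name. The keyword list is an assumption made for illustration, not a USCIS classification, and it will both miss exempt research nonprofits and catch for-profit firms with "University" in their names, so the flagged set needs manual review.

```python
import pandas as pd

# Hypothetical keyword list: an assumption, not an official classification.
EXEMPT_HINTS = ("UNIVERSITY", "COLLEGE", "INSTITUTE OF TECHNOLOGY", "MEDICAL SCHOOL")


def flag_likely_cap_exempt(df, name_col="employer"):
    """Return a copy with a boolean column marking names that hint at cap exemption."""
    names = df[name_col].fillna("").astype(str).str.upper()
    out = df.copy()
    out["likely_cap_exempt"] = names.apply(
        lambda s: any(hint in s for hint in EXEMPT_HINTS)
    )
    return out


# Invented sample rows, including a missing employer name.
sample = pd.DataFrame({
    "employer": ["Example State University", "Acme IT Services Inc", None],
    "initial_approval": [40, 80, 0],
})
flagged = flag_likely_cap_exempt(sample)
print(flagged)
```

Filtering on `likely_cap_exempt` before computing lottery-related metrics keeps the year-round academic filings from diluting the cap-subject picture.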

Who sponsors: IT services and the technology sector

The single most-asked question of this dataset is also the one it answers most directly: which employers sponsor the most H-1B workers? Aggregating approvals by employer across years produces a leaderboard that has been remarkably stable in its broad composition, and it falls into two recognizable groups.

The first and historically dominant group by petition volume is the large IT-services and consulting firms—the global outsourcing and systems-integration companies, many headquartered in or with deep roots in India, that place technology staff on contracts at client sites across the United States. Their business model is labor-intensive and project-based, which generates very high petition volumes, heavy use of continuing petitions to extend staff across multi-year engagements, and a worksite that is frequently a client's office rather than the petitioner's own. The second group is the major technology companies—the large platform, semiconductor, and software firms—which sponsor in high volume but with a different profile: more often hiring the worker into their own workforce for a permanent role, more advanced-degree filings, and a pipeline that more frequently leads to an employment-based green card. Banks, financial-services firms, large consultancies, universities, and academic medical centers fill out the rest of the upper ranks. The distinction between the two groups matters for interpretation: a high raw approval count can reflect a body-shop model built on volume and extensions just as easily as a deliberate strategy of recruiting scarce specialized talent, and the data alone does not separate them without attention to the initial-versus-continuing split and the worksite.

Approval and denial rates across administrations

The most analytically charged feature of the Data Hub is that it lets an observer watch the approval and denial rates move over time—and those rates are not a constant of nature. Because USCIS exercises judgment in deciding whether a job is a genuine specialty occupation, whether the offered wage and duties fit, and whether the employer-employee relationship is bona fide, the denial rate is a sensitive instrument for the agency's posture. For much of the program's history, initial-petition approval rates sat very high, with denials a small single-digit share.

That changed markedly in the second half of the 2010s. A series of policy memoranda and adjudication practices—heightened scrutiny of the specialty-occupation definition, a stricter reading of the rules governing workers placed at third-party (client) worksites, a surge in Requests for Evidence (RFEs) demanding additional documentation, and the rescission of a longstanding policy of deference to prior approvals—drove denial rates for initial petitions sharply upward, with the steepest increases falling on the IT-services firms whose third-party-placement model was the explicit target. Litigation and subsequent policy reversals later relaxed several of those practices, and approval rates recovered. The Data Hub makes this arc legible: the same employers, filing for the same kinds of jobs, saw their denial rates rise and then fall with the change in adjudication policy rather than with any change in the underlying labor market. This is the clearest illustration in the dataset of a general truth about regulatory data—an approval rate measures the regulator's behavior at least as much as the regulated party's—and it is why year-over-year comparisons must always be read against the policy backdrop of the fiscal year in question.

Pairing with Department of Labor LCA and prevailing-wage data

The USCIS Data Hub describes the petition and its outcome, but it is largely silent on wages—and the wage side of the H-1B program is where much of the public-policy argument lives. To see the wages, the Hub must be paired with a separate federal dataset: the Department of Labor's Labor Condition Application (LCA) disclosure data. Before an employer may file an H-1B petition with USCIS, it must first file an LCA (Form ETA-9035) with DOL, attesting to several conditions: that it will pay the H-1B worker at least the prevailing wage for the occupation in the area of employment (or the actual wage paid to similarly employed workers, whichever is higher), that employing the worker will not adversely affect the working conditions of US workers, and that there is no strike or lockout. DOL certifies the LCA largely on the employer's attestation, and it publishes the disclosure data—employer, job title, worksite, the prevailing wage, and the offered wage—for every LCA.

Read together, the two datasets describe the program's employer-and-wage anatomy. The LCA data shows what an employer attested it would pay and at what worksite; the USCIS Hub shows whether the corresponding petitions were actually approved. Joining them—by employer name and geography, since neither carries a clean shared key—lets an analyst ask the questions that neither answers alone: do the highest-volume sponsors pay near the bottom of the prevailing-wage scale, or well above it? How does the offered-versus-prevailing wage gap differ between the IT-services group and the major technology companies? Did the wave of denials in the late 2010s fall hardest on the lowest-wage filings? The pairing is imperfect—the LCA is an attestation, not a guarantee of what is ultimately paid, and the join on employer name is fuzzy—but it is the only way the public record connects a sponsorship to a wage, and it is the foundation of nearly every serious empirical study of whether the program complements or undercuts the US skilled-labor market.
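A minimal sketch of that join, assuming nothing more than a normalized employer name and a state on both sides. The LCA column names used here (`lca_employer`, `lca_state`, `offered_wage`, `prevailing_wage`) are placeholders rather than DOL's actual headers, and every row is invented; real work would add fuzzy matching and manual review on top of this exact-key pass.

```python
import re

import pandas as pd


def join_key(name):
    """Upper-case, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", str(name).upper())
    return re.sub(r"\s+", " ", cleaned).strip()


# Toy frames; the LCA column names are placeholders, not DOL's headers.
hub = pd.DataFrame({
    "employer": ["Acme IT Services, Inc.", "Example University"],
    "state": ["NJ", "MA"],
    "initial_approval": [80, 40],
})
lca = pd.DataFrame({
    "lca_employer": ["ACME IT SERVICES INC", "ACME IT SERVICES INC", "Other Corp"],
    "lca_state": ["NJ", "NJ", "TX"],
    "offered_wage": [95000, 105000, 120000],
    "prevailing_wage": [90000, 98000, 110000],
})

hub["key"] = hub["employer"].map(join_key)
lca["key"] = lca["lca_employer"].map(join_key)

# Summarize LCAs per employer and state first so Hub rows are not duplicated.
lca_summary = (
    lca.groupby(["key", "lca_state"], as_index=False)
       .agg(median_offered=("offered_wage", "median"),
            median_prevailing=("prevailing_wage", "median"),
            lca_count=("offered_wage", "size"))
       .rename(columns={"lca_state": "state"})
)

# indicator=True adds a _merge column showing which Hub rows found a match.
joined = hub.merge(lca_summary, how="left", on=["key", "state"], indicator=True)
print(joined[["employer", "state", "median_offered", "lca_count", "_merge"]])
```

The `_merge` column is the useful diagnostic: the share of Hub rows left as `left_only` shows how much of the sponsorship record the name match failed to connect to a wage.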

Analytical uses

A national, employer-resolved, year-stamped record of skilled-visa sponsorship supports a distinctive set of analyses that aggregate program statistics cannot.

Ranking and profiling sponsors is the most immediate use: aggregating approvals by employer to identify the largest sponsors, then profiling each by its initial-versus-continuing mix, its geographic footprint, and its industry code to distinguish the volume-driven IT-services model from the talent-acquisition model of the technology firms. Tracking approval and denial rates over time—overall, by employer, and by employer group—turns the Hub into a longitudinal record of how strictly the agency has adjudicated, and it is the basis for any credible claim about whether a given administration tightened or loosened the program.

Geographic and industry concentration exploits the state, city, and NAICS fields to map where sponsored employment clusters—the technology corridors, the financial centers, the university towns—and which industries beyond software depend on the visa. Finally, the LCA-joined wage analysis brings the Department of Labor data to bear: linking sponsorship to attested prevailing and offered wages to study whether the program is recruiting scarce, high-paid expertise or supplying mid-wage labor, the central empirical question in the policy debate. Used together, these analyses turn a count of petitions into a portrait of how the United States imports skilled labor, who benefits, and how the rules of access shift with the politics of the moment.

Python workflow: approvals by employer and approval rates over time

The script below loads one or more fiscal-year files from the USCIS H-1B Employer Data Hub, normalizes the four outcome columns into total approvals and denials per employer per year, and computes two of the core metrics: the top sponsoring employers by total approvals, and the approval rate by fiscal year. No API key is required for public data. Because the Hub's per-year file names and exact column names change between releases, the script isolates the download URL pattern in one helper and discovers the working approval, denial, and employer column names at runtime rather than hard-coding them; any production use should be validated against the current Data Hub download page, should separate cap-exempt sponsors from cap-subject ones, and should keep the initial and continuing counts distinct when the lottery is the object of study.

import io

import pandas as pd
import requests

# USCIS H-1B Employer Data Hub -- no API key required for public data.
#
# USCIS publishes the Employer Data Hub as downloadable per-fiscal-year
# files (CSV/XLSX) and as a searchable web hub. Each row is one employer
# in one fiscal year with counts of:
#   Initial Approval, Initial Denial, Continuing Approval, Continuing Denial
# plus employer name, city, state, ZIP, NAICS, and tax-ID (last four digits).
# The per-year file names change between releases, so isolate the URL
# pattern here and confirm it against the current Data Hub download page
# rather than hard-coding it across the whole pipeline.
HUB = "https://www.uscis.gov/sites/default/files/document/data"


def year_url(fy):
    # USCIS publishes one downloadable export per fiscal year, e.g.
    # .../data/h1b_datahubexport-2023.csv -- confirm the exact name on the
    # current Data Hub download page, which USCIS revises between releases.
    return f"{HUB}/h1b_datahubexport-{fy}.csv"


def load_year(fy, url=None):
    url = url or year_url(fy)
    r = requests.get(url, timeout=300)
    r.raise_for_status()
    text = r.content.decode("utf-8", errors="replace")
    df = pd.read_csv(io.StringIO(text))
    df.columns = [c.strip() for c in df.columns]
    df["fiscal_year"] = fy
    return df


def _col(cols, *needles):
    # Return the first column whose name contains all of the needles.
    for c in cols:
        if all(n.lower() in c.lower() for n in needles):
            return c
    return None


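def normalize(df):
    df = df.copy()  # leave the caller's frame untouched
    cols = list(df.columns)
    found = {
        "initial approval": _col(cols, "initial", "approval"),
        "initial denial": _col(cols, "initial", "denial"),
        "continuing approval": _col(cols, "continuing", "approval"),
        "continuing denial": _col(cols, "continuing", "denial"),
        "employer": _col(cols, "employer") or _col(cols, "petitioner"),
    }
    missing = [k for k, v in found.items() if v is None]
    if missing:
        # Fail loudly if a release renames a column beyond what _col can match.
        raise KeyError(f"columns not found: {missing}; file has {cols}")
    counts = [v for k, v in found.items() if k != "employer"]
    for c in counts:
        # Strip thousands separators defensively before converting to numbers.
        text = df[c].astype(str).str.replace(",", "", regex=False)
        df[c] = pd.to_numeric(text, errors="coerce").fillna(0)
    ia, idn, ca, cd = counts
    df["approvals"] = df[ia] + df[ca]
    df["denials"] = df[idn] + df[cd]
    df["employer"] = df[found["employer"]].astype(str).str.upper().str.strip()
    return df[["fiscal_year", "employer", "approvals", "denials"]]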


def analyze(frames):
    df = pd.concat([normalize(f) for f in frames], ignore_index=True)

    # --- Top sponsoring employers (all years) ----------------------------
    by_emp = (df.groupby("employer")[["approvals", "denials"]]
                .sum().sort_values("approvals", ascending=False))
    print("Top 10 H-1B sponsors by approvals:")
    for emp, row in by_emp.head(10).iterrows():
        print(f"  {emp[:40]:40s} {int(row.approvals):>9,}")

    # --- Approval rate by fiscal year ------------------------------------
    by_fy = df.groupby("fiscal_year")[["approvals", "denials"]].sum()
    by_fy["rate"] = by_fy["approvals"] / (by_fy["approvals"] + by_fy["denials"])
    print("\nApproval rate by fiscal year:")
    for fy, row in by_fy.iterrows():
        print(f"  FY{fy}: {row.rate:.1%} "
              f"({int(row.approvals):,} approved / "
              f"{int(row.denials):,} denied)")
    return by_emp, by_fy


# Pull one fiscal-year file from the live Data Hub and run the aggregation.
# Add more years (e.g. range(2017, 2024)) to build the cross-year picture.
if __name__ == "__main__":
    analyze([load_year(2023)])
    # frames = [load_year(fy) for fy in range(2017, 2024)]
    # analyze(frames)

Two practical notes apply. First, the employer aggregation in the script collapses on an upper-cased, trimmed employer name, which is a deliberately coarse first pass: the same company appears in the raw data under many spellings, suffixes, and former names, so any serious sponsor leaderboard must run an entity-resolution step—normalizing legal suffixes, reconciling subsidiaries to parents, and reviewing the top of the list by hand—before the ranking can be trusted. Second, the approval-rate metric mixes initial and continuing petitions; for a question about the lottery and new hiring, restrict the numerator and denominator to the initial columns, and for a question about the agency's adjudication posture, compute initial and continuing rates separately, since extensions of existing workers are approved at a very different rate than new-employment petitions.
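The second note can be sketched as follows: a small function that computes approval rates per fiscal year separately for initial and continuing petitions. It assumes the four outcome column names of the uscis_h1b table; a raw Hub export would need its headers mapped to these names first. The counts are invented.

```python
import pandas as pd


def rates_by_year(df):
    """Approval rate per fiscal year, computed separately for each petition type."""
    sums = df.groupby("fiscal_year")[
        ["initial_approval", "initial_denial",
         "continuing_approval", "continuing_denial"]
    ].sum()
    out = pd.DataFrame(index=sums.index)
    for kind in ("initial", "continuing"):
        decided = sums[f"{kind}_approval"] + sums[f"{kind}_denial"]
        # Years with no decisions of this type yield NaN instead of a misleading 0.
        out[f"{kind}_rate"] = sums[f"{kind}_approval"] / decided.where(decided > 0)
    return out


# Invented counts for two fiscal years.
toy = pd.DataFrame({
    "fiscal_year": [2018, 2018, 2022],
    "initial_approval": [60, 20, 90],
    "initial_denial": [15, 5, 10],
    "continuing_approval": [180, 0, 95],
    "continuing_denial": [20, 0, 5],
})
rates = rates_by_year(toy)
print(rates)
```

Keeping the two rates in separate columns makes it harder to quote a blended figure by accident when the question is specifically about new-employment adjudications.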

Limitations and analytical caveats

The H-1B Employer Data Hub is the most granular public record of skilled-visa sponsorship in the United States, but it carries structural limitations that an analyst must internalize before drawing conclusions from it.

Petition counts are not headcounts. Each row counts petitions adjudicated, not unique workers employed. A single worker can generate several petitions over the years—the initial filing, then extensions, then an amendment for a new worksite—each one a separate count. Summing approvals over time therefore counts the same person multiple times and badly overstates how many distinct workers an employer sponsors. The data answers “how many petitions did this employer get approved this year?” not “how many H-1B workers does this employer have?”—and the initial-versus-continuing split is the only lever the data gives you to approximate the difference.

Employer names are messy and the tax-ID field is coarse. The petitioner name is free text entered on the petition, so the same firm appears under many variants, parent companies and subsidiaries file under different names, and the tax_id field carries only the last four digits of the tax identification number rather than a precise employer key. Without an entity-resolution step, a naive aggregation will fragment a single large sponsor across several name variants, understating its true volume, or, conversely, merge unrelated firms with similar names. Every published H-1B leaderboard rests on a name-matching judgment, and reasonable analysts produce somewhat different rankings depending on how aggressively they consolidate.
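A first pass at the name-normalization step might look like the sketch below. The suffix list is illustrative rather than exhaustive, the employers are invented, and the approach is intentionally aggressive: stripping suffixes can merge legally distinct entities, which is exactly the consolidation judgment described above.

```python
import re

import pandas as pd

# Illustrative suffix list; extend it after inspecting real name variants.
LEGAL_SUFFIXES = {"INC", "INCORPORATED", "LLC", "LLP", "LTD", "LIMITED",
                  "CORP", "CORPORATION", "CO", "COMPANY"}


def canonical_name(raw):
    """Upper-case, strip punctuation, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^A-Z0-9& ]", " ", str(raw).upper()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


# Invented name variants for two hypothetical firms.
variants = pd.DataFrame({
    "employer": ["Acme Services, Inc.", "ACME SERVICES INC",
                 "Acme Services LLC", "Acme Holdings Co., Ltd."],
    "approvals": [10, 20, 5, 7],
})
variants["canonical"] = variants["employer"].map(canonical_name)
totals = variants.groupby("canonical")["approvals"].sum()
print(totals)
```

Pairing the canonical name with the four-digit tax_id and the state as a secondary check is one way to catch cases where two unrelated firms collapse to the same string.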

A denial is not the end, and an approval is not a hire. A denied petition can be refiled, appealed, or followed by a fresh attempt in a later year, so a high denial count in one year does not mean the worker never obtained status. In the other direction, an approved petition does not guarantee that the worker actually started, stayed, or remained with the employer: approvals can go unused, and the worker may later change employers. The Hub records adjudication outcomes, not employment outcomes. Relatedly, because the lottery is random among cap-subject filings, an employer's initial-approval count reflects how many registrations it submitted and the luck of the draw, not a clean signal of intended demand. The data shows who won, never who entered and lost.

The Hub is silent on wages, occupation detail, and the worker. The USCIS data carries no wage, no detailed job title, and nothing about the worker's nationality, education, or pay—those live in the Department of Labor LCA disclosures and other sources, joinable only by fuzzy employer-and-geography matching. An analysis that reads the Hub alone can describe who sponsors and how often, but it cannot, on its own, speak to whether the program pays fairly, recruits genuinely scarce skills, or affects US workers. Those are the questions the public most cares about, and they require the join the dataset does not make for you.

Held with these caveats in mind, the uscis_h1b table is a uniquely valuable resource: an employer-resolved, year-stamped, outcome-coded record of who asked to hire skilled foreign workers and whether the federal government said yes—the sponsorship half of a program whose wage half lives next door in the Labor Department's data and whose politics live in the rise and fall of the denial rate it lays bare.

Related writing

BLS Occupational Employment and Wage Statistics: The Federal Database Behind Median Salary Data for Every US Occupation — The prevailing-wage attestations behind every H-1B petition are benchmarked against occupational wage data, so the BLS survey is the reference point for judging whether a sponsor's offered wage sits at the top or the bottom of the scale for the job and the metro area.

SEC EDGAR Company Registry: The Federal Index That Resolves Every Public Company — Resolving the messy free-text employer names in the H-1B data to real corporate entities—reconciling subsidiaries to parents and consolidating name variants—is exactly the entity-resolution problem the EDGAR company index is built to solve.

IRS Exempt Organizations Business Master File: The Federal Record of 1.3 Million Tax-Exempt Nonprofits — The cap-exempt universities and affiliated nonprofit research organizations that sponsor H-1B workers outside the lottery are the same tax-exempt entities the IRS master file enumerates, making it the tool for separating cap-exempt sponsors from the cap-subject firms.