FAA Airmen Certification Database: The Federal Record of Every US Pilot and Mechanic
The FAA Airmen Certification Database is the federal registry of every person the government has certified to work in American aviation: not just pilots, but the mechanics who sign off on airframes, the dispatchers who release airline flights, the instructors who train the next generation, and the riggers who pack parachutes. We hold a snapshot of the public version, the Releasable Airmen file, as the table faa_airmen. The snapshot covers 880,963 certificated airmen, each with a unique FAA-assigned identifier, certificate types and levels, ratings, and medical class.
What the database is, and the file the public actually gets
In aviation law the word is “airman,” a gender-neutral term of art that long predates the modern usage debate, and it covers anyone who holds an FAA certificate to perform a function aboard or in support of civil aircraft. The Federal Aviation Administration issues those certificates and keeps the master record of them at the Civil Aviation Registry in Oklahoma City, the same facility that registers aircraft. Where the aircraft registry answers “what is this airplane and who owns it,” the airmen registry answers “who is qualified to fly, fix, dispatch, or rig it.” Together they are the two halves of the FAA's identity system for civil aviation, and both are keyed to permanent unique identifiers rather than to names.
There is a critical distinction between the database the FAA holds internally and the file the public can download, and conflating the two is the most common error in interpreting this dataset. The internal record is a full personally identifiable file: it contains the airman's date of birth, full mailing and street address, the complete history of certificate actions, medical examination details, and enforcement history. That record is protected under the Privacy Act and is not public. What the public receives is the Releasable Airmen Download, a deliberately thinned extract published at the FAA registry site. It carries, per airman, the unique FAA identifier, name, city/state/country of residence, the certificates held with their type and level, the ratings on each certificate, and the medical class and medical expiration date. It does not carry the street address (suppressed since 2008), the date of birth, or the airman's enforcement and accident history. The releasable file is the only version of this dataset that can be redistributed, and it is the version faa_airmen mirrors.
The download itself is a ZIP archive of fixed-schema CSV files rather than a single table. Pilots and non-pilots are split into separate files, and within each group a “basic” file carries one row per airman (the identity and medical fields) while a “certificate” file carries one row per certificate, because a single airman routinely holds several certificates at once. A working commercial pilot might appear in the certificate file three times — once as a commercial pilot, once as a flight instructor, once as a ground instructor — all joined back to a single row in the basic file by the shared unique identifier. Any analysis has to decide deliberately whether it is counting airmen or counting certificates, because the two numbers are very different.
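The difference between counting airmen and counting certificates is easy to see on a toy example. The sketch below uses made-up rows and placeholder identifiers and labels, not the real file layout; it only shows that the row count of a certificate file and the count of distinct identifiers answer different questions.

```python
from collections import Counter

# Made-up certificate rows: (unique_id, certificate label). The IDs and
# labels are placeholders, not values taken from the FAA file.
cert_rows = [
    ("A0000001", "commercial pilot"),
    ("A0000001", "flight instructor"),
    ("A0000001", "ground instructor"),
    ("A0000002", "private pilot"),
    ("A0000003", "private pilot"),
    ("A0000003", "flight instructor"),
]

# Counting certificates: one per row of the certificate file.
certificate_count = len(cert_rows)

# Counting airmen: one per distinct identifier.
airman_count = len({uid for uid, _ in cert_rows})

# How many certificates each airman holds.
certs_per_airman = Counter(uid for uid, _ in cert_rows)

print("certificates:", certificate_count)
print("airmen:", airman_count)
print("most certificates held:", certs_per_airman.most_common(1))
```

Any per-state or per-grade total should state which of the two counts it reports.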
Certificate types and the pilot pipeline
The certificates in the database fall into two broad families: the pilot certificates that authorize a person to fly, and the non-pilot certificates that authorize the supporting functions. The pilot certificates form a ladder, and understanding that ladder is the key to reading the dataset, because the same airman climbs it over a career and the database records each rung as a distinct level on the certificate.
At the bottom is the Student Pilot certificate. Since a 2016 rule change it no longer expires and is issued separately from the medical certificate, which changed the population dynamics of the file. A student pilot may fly solo under an instructor's endorsement but may not carry passengers. The student certificate is the front door to aviation, and the count of active students is the leading indicator of how many new pilots are entering the system.
Above the student sit the certificates that confer real privileges. The Sport Pilot and Recreational Pilot certificates are limited grades, and both are far less common than the certificate most people picture, the Private Pilot. Sport pilot in particular was created in 2004 to lower the barrier to entry, permitting flight in light-sport aircraft without a traditional medical certificate. The private pilot certificate is the workhorse of general aviation: it allows carrying passengers and flying for personal and business travel, but not for hire. A large share of certificated pilots hold a private certificate as their highest grade and never fly professionally; the exact share is something to read from the file rather than assume.
The professional grades sit above the private. The Commercial Pilot certificate permits a pilot to be paid to fly — banner towing, aerial survey, charter, agricultural application, flight instruction for compensation. At the top of the ladder is the Airline Transport Pilot (ATP), the highest certificate the FAA issues and the one required to act as a captain (and, since the 2013 rule that followed the Colgan Air 3407 crash, generally as a first officer too) in scheduled airline service. The ATP carries demanding minimums — most prominently 1,500 hours of flight time — and the share of all pilots who hold it is a meaningful proxy for the depth of the professional and airline pipeline. The full progression a career pilot follows is Student to Private to Commercial to ATP, and because the database records the level on each certificate, you can read the entire pipeline directly out of the file by tabulating levels.
The non-pilot certificates are just as important to aviation safety, even though most of their holders never touch the controls in flight. The largest non-pilot group is the Mechanic, certificated with Airframe and/or Powerplant ratings; a mechanic holding both is universally called an A&P. An A&P mechanic is legally empowered to perform and approve maintenance on civil aircraft, and the count of certificated mechanics is a direct measure of the aviation-maintenance workforce that the airlines and repair stations draw from. Alongside mechanics the database holds Repairmen (certificated for specific tasks at a specific facility), Flight and Ground Instructors (who train and endorse other airmen), Dispatchers (who share legal operational control of airline flights with the captain), Flight Engineers (a fading certificate from the era of three-person cockpits), and Parachute Riggers (certificated to pack and maintain emergency and sport parachutes). Each is a distinct certificate type code in the file, which is what lets the dataset double as a census of these specialized workforces.
Ratings and type ratings
A certificate says what a person is authorized to do; a rating refines it to say on what they may do it. Ratings are recorded against each certificate, and they are where the database stops being a simple roster and becomes a detailed picture of what each airman can actually fly.
The foundational ratings are the category and class ratings, which describe the kind of aircraft. The standard pilot ratings are airplane single-engine land (ASEL), airplane single-engine sea (ASES, for floatplanes), airplane multi-engine land (AMEL), and airplane multi-engine sea (AMES). Beyond airplanes, separate category ratings cover rotorcraft (helicopters and gyroplanes), gliders, and lighter-than-air craft. A pilot accumulates these ratings over time, and the set of ratings on a certificate is a compact summary of the airman's breadth: a pilot rated ASEL, AMEL, and rotorcraft can fly a far wider range of machines than one rated ASEL alone.
Layered on top of the category and class ratings is the instrument rating, which authorizes flight in instrument meteorological conditions — in cloud and low visibility, navigating solely by reference to instruments and air traffic control. The instrument rating is the single most consequential add-on a private pilot can earn, because it transforms a fair-weather certificate into one usable for serious travel, and the presence or absence of an instrument rating across the pilot population is a meaningful measure of how capable that population really is.
The most specific qualification is the type rating. Large or turbojet aircraft — airliners and business jets above defined size and complexity thresholds — require a rating specific to that exact aircraft type. An airline captain does not simply hold an ATP; the certificate lists a type rating for each airliner model the pilot is qualified in, such as a Boeing 737 or an Airbus A320 type. Type ratings are recorded against the certificate in the database, which means the file can, in principle, reveal how many pilots are typed in a given airframe — a fact of direct interest to airlines planning fleet transitions and to anyone studying where pilot supply is concentrated.
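As a rough illustration of how ratings can be tabulated, the sketch below unions the ratings from each airman's certificate rows and counts how many distinct airmen hold a given rating. The rows, the rating codes (`INST`, `B-737`, `A-320`), and the list-per-row shape are assumptions made for the example; the real file spreads ratings across several columns, and its encoding should be checked before any code value is relied on.

```python
from collections import defaultdict

# Illustrative rows: (unique_id, rating codes on one certificate).
# The codes are placeholders, not the FAA's actual encoding.
rows = [
    ("A0000001", ["AMEL", "INST", "B-737"]),
    ("A0000001", ["ASEL"]),
    ("A0000002", ["ASEL", "INST"]),
    ("A0000003", ["AMEL", "B-737", "A-320"]),
]

# Airplane class ratings named in the text above.
CLASS_RATINGS = {"ASEL", "ASES", "AMEL", "AMES"}


def ratings_by_airman(rows):
    """Union the ratings from all of an airman's certificate rows."""
    out = defaultdict(set)
    for uid, codes in rows:
        out[uid].update(codes)
    return dict(out)


def airmen_with(rating, per_airman):
    """Count distinct airmen who hold the given rating code."""
    return sum(1 for codes in per_airman.values() if rating in codes)


per_airman = ratings_by_airman(rows)
typed_737 = airmen_with("B-737", per_airman)
instrument_rated = airmen_with("INST", per_airman)

# Breadth: number of distinct airplane class ratings per airman.
breadth = {uid: len(codes & CLASS_RATINGS) for uid, codes in per_airman.items()}

print("typed in B-737:", typed_737)
print("instrument rated:", instrument_rated)
print("class-rating breadth:", breadth)
```

Collapsing to one set per airman before counting keeps a pilot with two certificates from being counted twice for the same rating.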
The medical certificate
Flying privileges depend not only on a pilot certificate but, for most operations, on a valid medical certificate, and the releasable file records the airman's medical class and expiration date. The FAA issues three classes of medical certificate, graded by the privileges they support and the rigor of the examination.
A first-class medical is required to exercise ATP privileges, for example to act as an airline captain, and carries the shortest validity and the most demanding examination, including periodic electrocardiograms for older pilots. A second-class medical is required for commercial operations short of airline captain. A third-class medical, the least rigorous, suffices for private, recreational, and student flying. Validity periods shorten as the class becomes more demanding, and for first- and third-class medicals they are shorter again for airmen aged 40 and over, so the medical-expiration field in the file is genuinely informative about whether a given airman is currently able to exercise the privileges of their certificate.
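To make the validity rules concrete, here is a minimal sketch of the expiration arithmetic as summarized from 14 CFR 61.23(d): a medical lapses at the end of the calendar month a fixed number of months after the examination, with the number depending on class and on age at the exam. The function name and the table layout are illustrative. The sketch is simplified in that it ignores how a higher-class medical steps down to lower-class privileges after its own period ends. Because the releasable file has no date of birth, the age has to be supplied by the caller.

```python
import calendar
from datetime import date

# Months of validity for the privileges of each class, keyed by age band
# on the exam date (summary of 14 CFR 61.23(d); simplified).
VALIDITY_MONTHS = {
    1: {"under_40": 12, "40_plus": 6},
    2: {"under_40": 12, "40_plus": 12},
    3: {"under_40": 60, "40_plus": 24},
}


def medical_expiry(exam: date, medical_class: int, age_at_exam: int) -> date:
    """Return the last day of the calendar month in which the medical lapses."""
    band = "under_40" if age_at_exam < 40 else "40_plus"
    months = VALIDITY_MONTHS[medical_class][band]
    # Work in whole months since year 0 so December rolls over correctly.
    total = exam.year * 12 + (exam.month - 1) + months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, last_day)


print(medical_expiry(date(2024, 3, 15), 1, 45))
print(medical_expiry(date(2024, 3, 15), 3, 30))
```

The file already carries an expiration date, so a calculation like this is mainly useful as a cross-check on how that field behaves.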
Two reforms reshaped the medical picture and, with it, what the file's medical fields mean. The examination process moved online through MedXPress, the FAA system through which an airman files the medical application before visiting an Aviation Medical Examiner, streamlining the paperwork but not changing the underlying classes. Far more consequential was BasicMed, established in 2017, which lets many private pilots fly without holding a current FAA medical certificate at all — instead relying on a one-time qualifying medical, a state-licensed physician's periodic examination, and an online course, within limits on aircraft size, speed, altitude, and passenger count. BasicMed matters for data interpretation because a pilot flying under BasicMed may show an expired or absent FAA medical class in the file while remaining fully legal to fly. The medical fields therefore tell you about the airman's FAA medical certificate specifically, not about their total legal eligibility to fly.
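One way to keep the BasicMed caveat in view is to classify each airman's medical fields into a small set of labels rather than a yes/no "can fly" flag. The sketch below is only that: the parameter names, the label strings, and the assumption that a BasicMed indicator is available as a boolean are all hypothetical, and none of the labels is a determination of legal eligibility.

```python
from datetime import date
from typing import Optional


def medical_signal(
    med_class: Optional[str],
    med_expiry: Optional[date],
    basicmed: bool,
    as_of: date,
) -> str:
    """Label what the medical fields do and do not tell us about an airman."""
    if med_class and med_expiry and med_expiry >= as_of:
        return "faa_medical_current"
    if basicmed:
        # Expired or absent FAA medical, but a BasicMed indicator is present:
        # the airman may still be legal to fly within BasicMed limits.
        return "possibly_basicmed"
    if med_class:
        return "faa_medical_expired"
    return "no_medical_on_file"


today = date(2025, 1, 1)
print(medical_signal("3", date(2026, 5, 31), False, today))
print(medical_signal("3", date(2020, 5, 31), True, today))
print(medical_signal("3", date(2020, 5, 31), False, today))
print(medical_signal(None, None, False, today))
```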
Privacy: the 2008 address suppression and the airman opt-out
The shape of the releasable file is the product of a long privacy history, and that history explains both what the file contains and why its coverage is imperfect. For decades the FAA released airman records that included home addresses, and that practice drew sustained objection from pilots who saw a federal database publishing their residence to anyone who asked — a concern sharpened by identity-theft worries and by the simple fact that a certificate is a professional credential, not a consent to publication of one's home.
In 2008 the FAA changed the policy and suppressed street addresses from the public releasable file. Since then the geographic detail in the public dataset stops at city, state, and country — which is precisely why faa_airmen can support state-level and city-level geography but cannot place an airman at an address. That single change is the reason analyses of this dataset operate at the state or metro level rather than the household level.
Beyond address suppression, individual airmen may opt out of the releasable file entirely. The legal scaffolding traces to the Pilot Records Improvement Act (PRIA) and the surrounding framework governing how airman records may be released; an airman who exercises the opt-out is withheld from the public download altogether, even though the FAA still holds the full internal record. This is the most important coverage fact about the dataset: the releasable file is not a complete census of certificated airmen, but a census of certificated airmen who have not opted out. The opted-out airmen are invisible in the public file, so every count derived from it is a lower bound on the true certificated population, and the size of that gap is not directly observable from the file alone.
What you can do with it
Even thinned for privacy, the releasable file is one of the richest open workforce datasets the federal government publishes, because it enumerates an entire set of licensed professions at the individual level with geography and qualification attached.
Pilot supply by state. Because every airman carries a state of residence, the file supports a direct map of where pilots live, and normalizing the counts by population yields certificated pilots per 100,000 residents — a measure that reveals the enormous variation between aviation-dense states (Alaska, where flying is basic transportation, stands far above the rest) and the national norm. The same per-capita lens applied to specific grades shows where the professional pilot population concentrates, which bears directly on regional pilot-shortage debates.
The ATP pipeline over time. Tabulating certificate levels yields the share of pilots at each rung of the ladder, and tracking the count of ATPs and commercial pilots against students and privates over successive releases is a leading indicator of the airline pipeline. When airlines warn of a captain shortage, the structure of that warning is visible in the ratio of ATPs to the feeder grades beneath them.
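A minimal sketch of that comparison across releases follows. The counts are made up, the release labels are placeholders, and the level codes are the ones assumed elsewhere in this article; the point is only the shape of the calculation, an ATP share and a ratio of professional grades to feeder grades computed per release.

```python
# Made-up highest-grade counts for two releases. These are not FAA figures.
releases = {
    "2023-12": {"ATP": 20, "COM": 10, "PVT": 30, "STU": 40},
    "2024-12": {"ATP": 22, "COM": 10, "PVT": 30, "STU": 48},
}


def pipeline_ratios(levels):
    """Return the ATP share and the professional-to-feeder ratio."""
    total = sum(levels.values())
    professional = levels.get("ATP", 0) + levels.get("COM", 0)
    feeder = levels.get("PVT", 0) + levels.get("STU", 0)
    return {
        "atp_share": levels.get("ATP", 0) / total if total else 0.0,
        "pro_to_feeder": professional / feeder if feeder else float("nan"),
    }


# One set of ratios per release, in chronological order.
trend = {name: pipeline_ratios(levels) for name, levels in sorted(releases.items())}

for name, r in trend.items():
    print(name, f"atp_share={r['atp_share']:.3f}", f"pro_to_feeder={r['pro_to_feeder']:.3f}")
```

In this toy data the ATP share is flat while the professional-to-feeder ratio falls, because the student count grew; the two measures can move differently and are worth reporting together.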
Instructor density. Flight instructors are the bottleneck of pilot production — new pilots cannot be trained faster than instructors can train them — so the geographic density of certificated flight instructors, and its trend, is a structural constraint on how quickly the pilot population can grow. The file lets you count instructors per capita or per training-active region directly.
The mechanic and maintenance workforce. The A&P and repairman populations are a census of the people legally allowed to maintain aircraft, and the aviation-maintenance labor shortage is as real as the pilot shortage. State-level counts of certificated mechanics, and their age-driven attrition, inform workforce planning for airlines, repair stations, and the technical schools that feed them.
Joining to safety and enforcement data. The unique FAA identifier and the airman's name give the file its join keys into other federal aviation datasets, wherever those datasets carry a matching identifier or name. Linked to the NTSB aviation accident database, the certificate and rating fields let researchers ask whether accident pilots were appropriately rated for the flight they were attempting: whether, for instance, an accident in instrument conditions involved a pilot without an instrument rating. Linked to FAA enforcement records, the file supports analysis of certificate actions across the airman population. In both cases the airmen file supplies the qualification context that turns a bare accident or enforcement record into an answerable safety question.
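The instrument-rating question can be sketched as a lookup from accident records into a per-airman set of ratings. Everything in the example is hypothetical: the accident records, the `unique_id` link field, the `IMC`/`VMC` condition labels, and the `INST` rating code are stand-ins, and the sketch assumes a linking key has already been established, which in practice is the hard part. Accident pilots missing from the airmen data are reported separately rather than silently dropped.

```python
# Hypothetical per-airman rating sets derived from the airmen file.
airmen_ratings = {
    "A0000001": {"ASEL", "INST"},
    "A0000002": {"ASEL"},
}

# Hypothetical accident records, already linked to an airman identifier.
accidents = [
    {"event_id": "EV1", "unique_id": "A0000001", "conditions": "IMC"},
    {"event_id": "EV2", "unique_id": "A0000002", "conditions": "IMC"},
    {"event_id": "EV3", "unique_id": "A0000002", "conditions": "VMC"},
    {"event_id": "EV4", "unique_id": "A0000009", "conditions": "IMC"},
]


def imc_without_instrument(accidents, ratings):
    """Split IMC accidents into flagged (no instrument rating) and unmatched."""
    flagged, unmatched = [], []
    for acc in accidents:
        if acc["conditions"] != "IMC":
            continue
        held = ratings.get(acc["unique_id"])
        if held is None:
            # Airman not found, for example because of an opt-out.
            unmatched.append(acc["event_id"])
        elif "INST" not in held:
            flagged.append(acc["event_id"])
    return flagged, unmatched


flagged, unmatched = imc_without_instrument(accidents, airmen_ratings)
print("IMC accidents without an instrument rating:", flagged)
print("IMC accidents with no airman match:", unmatched)
```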
A worked Python example
The workflow below downloads the FAA Releasable Airmen archive from the registry, unpacks the fixed-schema CSV files, joins the per-certificate file to the per-airman basic file on the unique identifier, and then computes two headline figures: certificated pilots per 100,000 population by state, and the ATP share of all certificated pilots. It reads every column as a string first, because the releasable files use blank fields and short codes throughout, and it normalizes the identifier column name because the exact header varies slightly between releases.
import requests, zipfile, io, csv
import pandas as pd
from collections import Counter
# ---------------------------------------------------------------------------
# FAA Airmen Certification Database -- the Releasable Airmen Download
#
# Source: https://registry.faa.gov/database/ (Releasable Airmen Download)
# A periodically refreshed ZIP archive. It contains several fixed-schema
# CSV files keyed to a single unique FAA-assigned airman ID:
#
#   PILOT_BASIC.csv     one row per airman: ID, name, city/state/country,
#                       medical class, medical date/expiry, basic-med flag
#   PILOT_CERT.csv      one or more rows per airman: certificate type,
#                       level, expiry, and that certificate's ratings/types
#   NONPILOT_BASIC.csv  mechanics, repairmen, dispatchers, riggers, etc.
#   NONPILOT_CERT.csv   the non-pilot certificate/rating detail
#
# Street addresses were suppressed from the public file in 2008, and airmen
# may opt out of the releasable file entirely under the Pilot Records
# Improvement Act (PRIA). Both facts reduce coverage -- see the caveats.
# ---------------------------------------------------------------------------
ZIP_URL = "https://registry.faa.gov/database/CS122024.zip" # example release
print("Downloading FAA Releasable Airmen archive...")
resp = requests.get(ZIP_URL, timeout=300)
resp.raise_for_status()
zf = zipfile.ZipFile(io.BytesIO(resp.content))
print("Files in archive:", zf.namelist())
def load(name):
    # The releasable files are comma-delimited with a header row. Read every
    # column as a string first; many fields are codes or are left blank.
    with zf.open(name) as fh:
        text = io.TextIOWrapper(fh, encoding="latin-1")
        return pd.DataFrame(list(csv.DictReader(text)))
basic = load("PILOT_BASIC.csv")
cert = load("PILOT_CERT.csv")
basic.columns = [c.strip().upper() for c in basic.columns]
cert.columns = [c.strip().upper() for c in cert.columns]
# A single unique FAA ID joins the two files. Column naming varies slightly
# between releases (UNIQUE ID / UNIQUE-ID); normalise it.
def id_col(df):
    for cand in ("UNIQUE ID", "UNIQUE-ID", "UNIQUE_ID", "ID"):
        if cand in df.columns:
            return cand
    raise KeyError("no airman id column found: " + str(list(df.columns)))
bid, cid = id_col(basic), id_col(cert)
print(f"Airmen (basic rows): {len(basic):,}")
print(f"Certificate rows: {len(cert):,}")
# ---------------------------------------------------------------------------
# 1. Certificate-type mix. The TYPE column on PILOT_CERT.csv carries a
#    single-letter code (P = pilot certificate, F = flight instructor, and
#    so on); confirm the full mapping against the FAA's published layout
#    document before relying on any letter. The certificate LEVEL column
#    carries the pilot grade. The level codes assumed below are
#    ATP = airline transport pilot, COM = commercial, PVT = private,
#    STU = student, SPT = sport, REC = recreational;
#    a release may encode them differently, so check the printed mix.
# ---------------------------------------------------------------------------
lvl = "LEVEL" if "LEVEL" in cert.columns else "CERT LEVEL"
# Keep pilot certificates only; strip padding so " P" still matches "P".
pilot_levels = cert[cert["TYPE"].str.strip().eq("P")] if "TYPE" in cert.columns else cert
level_mix = Counter(pilot_levels[lvl].fillna("?"))
print("\nPilot certificate level mix:")
for code, n in level_mix.most_common():
    print(f" {code:<5} {n:>8,}")
# ---------------------------------------------------------------------------
# 2. Certificated pilots per 100,000 population, by state.
# Restrict to pilot certificates, collapse to one highest-grade row per
# airman, count by state of residence, and divide by state population.
# ---------------------------------------------------------------------------
GRADE_RANK = {"ATP": 5, "COM": 4, "PVT": 3, "REC": 2, "SPT": 1, "STU": 0}
p = pilot_levels[[cid, lvl]].copy()
p["RANK"] = p[lvl].map(GRADE_RANK).fillna(-1)
# one row per airman = their highest pilot grade
top = p.sort_values("RANK").drop_duplicates(cid, keep="last")
top = top.merge(basic[[bid, "STATE"]], left_on=cid, right_on=bid, how="left")
by_state = top[top["STATE"].str.len().eq(2)].groupby("STATE").size()
# 2024 Census state population estimates (abbreviated sample; load the full
# table from the Census API for a complete run).
POP = {
    "CA": 39_431_263, "TX": 31_290_831, "FL": 23_372_215,
    "NY": 19_867_248, "AK": 740_133, "WY": 587_618,
    "ND": 796_568, "CO": 5_957_493, "WA": 7_958_180,
}
rows = []
for st, n in by_state.items():
    if st in POP:
        rows.append((st, n, 100_000 * n / POP[st]))
per100k = pd.DataFrame(rows, columns=["STATE", "PILOTS", "PER_100K"])
per100k = per100k.sort_values("PER_100K", ascending=False)
print("\nCertificated pilots per 100k population (sample states):")
for _, r in per100k.iterrows():
    print(f" {r.STATE} {int(r.PILOTS):>7,} {r.PER_100K:6.1f} / 100k")
# ---------------------------------------------------------------------------
# 3. ATP share -- the fraction of all certificated pilots who hold the
# top professional grade. A proxy for the depth of the airline pipeline.
# ---------------------------------------------------------------------------
total_pilots = len(top)
atp = int((top[lvl] == "ATP").sum())
print(f"\nCertificated pilots (one grade each): {total_pilots:,}")
print(f"Airline Transport Pilots: {atp:,}")
print(f"ATP share: {100 * atp / total_pilots:.1f}%")
A few implementation notes. The archive name encodes the release date, so the URL must point at a current release rather than a fixed file; the registry download page lists the latest. The join between the certificate file and the basic file is one-to-many — many certificate rows per airman — so the example first collapses each airman to a single highest-grade row using a grade-rank map before counting, which is what prevents a pilot who also holds a flight-instructor certificate from being counted twice in the per-state totals. The certificate-type and certificate-level codes are short strings drawn from the FAA's published data dictionary, and because their exact spelling and the column names drift between releases, it is worth printing the column list and the level distribution once before relying on specific code values. The population denominators here are a small illustrative sample; a full run loads the complete state population table from the Census Bureau API and divides through it.
Caveats and limitations
The releasable file is invaluable, but it carries several structural limitations that any serious analysis must keep in front of it.
Opt-outs reduce coverage. As stressed above, airmen who exercise the PRIA opt-out are absent from the public file entirely. The dataset is therefore a census of non-opted-out airmen, not of all certificated airmen, and every count drawn from it understates the true population by an amount that the file itself does not reveal. Comparisons across states or over time implicitly assume opt-out rates are stable and uniform, which is an assumption, not a fact.
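The effect of that assumption can be made explicit with a small sensitivity calculation. If a fraction r of airmen have opted out, an observed count n implies a true population of n / (1 - r). The sketch below applies that to two made-up states with identical observed counts and different assumed opt-out rates; the rates are hypothetical, since the file gives no way to measure them.

```python
def implied_population(observed, opt_out_rate):
    """True population implied by an observed count and an opt-out rate."""
    if not 0.0 <= opt_out_rate < 1.0:
        raise ValueError("opt_out_rate must be in [0, 1)")
    return observed / (1.0 - opt_out_rate)


# Identical observed counts, different ASSUMED opt-out rates (hypothetical).
observed = {"state_x": 9_000, "state_y": 9_000}
assumed_rates = {"state_x": 0.10, "state_y": 0.25}

implied = {
    state: implied_population(count, assumed_rates[state])
    for state, count in observed.items()
}

for state, value in implied.items():
    print(state, round(value))
```

Two states that look identical in the public file would differ by about 2,000 airmen under these assumed rates, which is why rankings built on raw counts deserve a stated opt-out assumption.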
There are no flight hours. The file records what certificates and ratings an airman holds, not how much they fly. A pilot with thousands of hours and a pilot who passed a checkride years ago and has barely flown since appear identical if they hold the same certificate. The database measures qualification, not activity, and it cannot distinguish an active professional from a long-dormant certificate holder.
Currency is not reflected. Holding a certificate is not the same as being legally current to exercise its privileges. Currency — recent flight reviews, recent takeoffs and landings, recent instrument approaches — is tracked in a pilot's personal logbook, not in any federal database, and it is not in this file. An airman can appear in good standing here while being legally unable to carry passengers for lack of a current flight review. The medical-expiration field is the only currency-like signal in the file, and even it is confounded by BasicMed, under which an expired FAA medical does not mean the pilot cannot fly.
Address suppression bounds the geography. Because street addresses were removed in 2008, the finest geography the public file supports is city and state. The dataset cannot place an airman at an address, cannot be geocoded below the city level, and cannot support household- or neighborhood-level analysis. State and metro aggregates are the appropriate unit of analysis, and the residence on file is the airman's mailing city/state, which may differ from where they actually fly or work.
Related writing
NTSB Aviation Accident Database: the federal record behind every US aircraft accident investigation — the natural join partner for the airmen file, where the certificate and rating fields here supply the pilot-qualification context that turns a raw accident record into an answerable safety question.
FMCSA Motor Carrier Census: the federal database behind 2 million registered trucking companies — the surface-transportation analogue, another DOT credentialing-and-registration registry keyed to unique identifiers that doubles as a workforce and market-structure census.
NHTSA vehicle safety complaints: the database behind auto defect investigations and recalls — a third pillar of federal transportation-safety data, showing how the same model of public records and mandatory reporting operates for passenger vehicles rather than aircraft and airmen.