
FDA Tobacco Product Problem Reports: The Federal Record of Vaping and Tobacco Hazards

11 min read · AI Analytics
FDA · Tobacco · Vaping · Product Safety · Federal Data

A lithium-ion battery overheats in a vaper's pocket and the device vents flame; a tin of moist snuff arrives webbed with mold; an e-liquid bottle leaks, or triggers a reaction no label warned of; a cigarette is found studded with a foreign object. When a consumer, a clinician, or a manufacturer decides someone in Washington should know, the complaint can land in one federal file—the FDA's Tobacco Product Problem Reports. It is a modest dataset, roughly 1,300 reports, but it is the tobacco-specific channel of the country's product-safety surveillance, and it is where the wave of exploding vape batteries first showed up in the official record.

This article covers what the problem-reports dataset is and the grain of a single report. It then turns to the law: the 2009 Tobacco Control Act, which gave the FDA authority over tobacco at all and created the Center for Tobacco Products, and the 2016 deeming rule, which pulled e-cigarettes, cigars, hookah, and more under that authority and made this surveillance necessary. From there it explains the logic of post-market, passive surveillance and why it differs from pre-market review; the taxonomy of products and problems, from contamination and foreign objects to device malfunctions and unexpected health reactions; the vaping battery-and-e-liquid wave that dominates the device side of the file; and how the dataset sits alongside the FDA's drug and device adverse-event systems as a companion channel. It closes with a Python workflow that pulls reports from the openFDA tobacco endpoint, tallies them by product type and problem, and tracks e-cigarette device problems over time, followed by the caveats (underreporting, the absence of causation, and self-reported, uneven-quality narratives) that every analyst must internalize before drawing conclusions.

What the dataset is

The Tobacco Product Problem Reports are the FDA's record of complaints about tobacco and nicotine products. A problem report is a structured account—submitted by a member of the public, a health professional, or a manufacturer—that something went wrong with a tobacco product: that it was contaminated or moldy, that it contained a foreign object, that a device malfunctioned, that it caught fire or overheated or exploded, or that using it produced an unexpected health reaction. Each report identifies the product as best the reporter can, describes the problem, and notes whether anyone was harmed. The reports are collected through the FDA's Safety Reporting Portal and curated by the Center for Tobacco Products (CTP), and they are published openly through the openFDA tobacco/problem endpoint, where the file runs to roughly 1,300 reports.

In our database this record is stored as the table fda_tobacco_problems, with the grain of one row per report: a single complaint about a single product event is a single row, keyed by a report identifier. The columns capture when the complaint was filed, what kind of product was involved, what the problem was, and any harm:

report_id                   -- unique identifier for the problem report
date_submitted              -- date the report was submitted to the FDA
tobacco_products            -- product category label(s): cigarette, the
                               long "Electronic cigarette ... vaping product"
                               label, cigar, smokeless, hookah, and more
reported_product_problems   -- device/quality problems: contamination, mold,
                               foreign object, leak, overheating, fire, explosion
reported_health_problems    -- unexpected reactions: burns, respiratory, nausea,
                               mouth/throat irritation, and other harms
number_tobacco_products     -- count of distinct product categories on the report
number_health_problems      -- count of distinct health problems on the report
number_product_problems     -- count of distinct product problems on the report
nonuser_affected            -- whether a bystander, not the user, was harmed

The report_id is the primary key, the stable handle for a single complaint. The tobacco_products field is the first analytic axis: it records the kind of product the report is about, and because the universe of regulated products is broad it can take many values, from conventional cigarettes and smokeless tobacco to the e-cigarettes, cigars, and hookah brought in by the deeming rule. The two problem fields are the substantive payload, and the dataset draws a deliberate distinction between them: reported_product_problems describes something wrong with the article itself (contamination, mold, a foreign object, a leaking cartridge, a battery that overheats, a device that catches fire), while reported_health_problems describes what the product did to a person, the unexpected reactions and injuries. A single report can carry both, and the number_* counts let an analyst weight a report by how many distinct products or problems it raised. The nonuser_affected flag matters more than its size suggests: a vape battery that explodes can injure a bystander, and the data lets that harm be isolated from harm to the user.
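The row grain described above can be sketched as a small typed record. This is an illustration rather than the table's actual loader: the ProblemReport class and parse_report helper are hypothetical names, the sample record is made up, and the code assumes that multi-valued fields arrive as lists of strings and that nonuser_affected arrives as a "Yes"/"No" string.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProblemReport:
    """One row of fda_tobacco_problems: a single complaint about a single event."""
    report_id: str
    date_submitted: str
    tobacco_products: List[str] = field(default_factory=list)
    reported_product_problems: List[str] = field(default_factory=list)
    reported_health_problems: List[str] = field(default_factory=list)
    nonuser_affected: bool = False

    @property
    def has_defect_and_harm(self) -> bool:
        # True when the report names both a defect in the article itself
        # and something the product did to a person.
        return bool(self.reported_product_problems) and bool(self.reported_health_problems)


def parse_report(raw: dict) -> ProblemReport:
    # Assumption: list-valued fields and a "Yes"/"No" bystander flag.
    return ProblemReport(
        report_id=str(raw["report_id"]),
        date_submitted=raw.get("date_submitted", ""),
        tobacco_products=list(raw.get("tobacco_products", [])),
        reported_product_problems=list(raw.get("reported_product_problems", [])),
        reported_health_problems=list(raw.get("reported_health_problems", [])),
        nonuser_affected=str(raw.get("nonuser_affected", "")).strip().lower() == "yes",
    )


# Made-up record for illustration only; not a real report.
sample = {
    "report_id": "demo-001",
    "date_submitted": "02/10/2019",
    "tobacco_products": ["E-cigarette"],
    "reported_product_problems": ["Overheating", "Fire"],
    "reported_health_problems": ["Burn"],
    "nonuser_affected": "Yes",
}
report = parse_report(sample)
print(report.report_id, report.has_defect_and_harm, report.nonuser_affected)
```

Parsing the bystander flag into a boolean up front keeps later filters simple, at the cost of collapsing a missing value into False; an analysis that cares about missingness would keep the raw string.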

The Tobacco Control Act and the Center for Tobacco Products

For most of the FDA's history it had no authority over tobacco at all. A cigarette—the single most lethal consumer product in the country—was, in regulatory terms, outside the agency's reach, a striking gap given that the FDA polices the safety of drugs, devices, food, and cosmetics. That changed with the Family Smoking Prevention and Tobacco Control Act of 2009, which for the first time gave the FDA explicit statutory authority to regulate the manufacture, marketing, and distribution of tobacco products to protect public health. The Act is the legal foundation for everything described here; without it there would be no federal tobacco product-safety channel to populate.

The 2009 Act did several consequential things at once. It created the Center for Tobacco Products within the FDA, a dedicated center funded not by general appropriations but by user fees levied on tobacco manufacturers and importers. It gave the agency power to set product standards, to require disclosure of ingredients and constituents, to restrict marketing—particularly marketing that reaches young people—and to require larger, more prominent health warnings. It established a pre-market review pathway so that, going forward, new tobacco products would have to clear the FDA before reaching the market. And, importantly for this dataset, it built the legal scaffolding for post-market oversight: the authority to keep watching a product after it is for sale, to gather reports of problems, and to act on what those reports reveal. The problem-reports file is one expression of that post-market authority—the channel through which the harms of products already on shelves flow back to the regulator.

The deeming rule and the rise of e-cigarettes

The 2009 Act applied immediately to cigarettes, smokeless tobacco, and roll-your-own tobacco, but it gave the FDA the power to extend—to “deem”—its authority to other categories of tobacco products. For years that authority sat unexercised while a new product category exploded: the e-cigarette. Electronic nicotine delivery systems, sold in forms ranging from cigarette-shaped disposables to refillable tank devices to high-powered mods to the pod systems that came to dominate, grew from a niche import into a mass-market phenomenon, almost entirely outside federal oversight. There was no pre-market review of these devices, no ingredient disclosure, and—most relevant here—no formal channel for collecting reports when one overheated, leaked, or harmed a user.

The 2016 deeming rule closed that gap. The FDA exercised its deeming authority to extend the Tobacco Control Act to e-cigarettes and vaping products, cigars, hookah (waterpipe) tobacco, pipe tobacco, and other previously unregulated categories, bringing them under the same regime of registration, ingredient reporting, marketing restrictions, and pre-market review that already governed cigarettes. The deeming rule is the proximate reason this surveillance system matters the way it does. Once vaping devices were tobacco products in the eyes of the law, the FDA's post-market machinery turned toward them, and the problem-reports file became the place where the distinctive hazards of an electronic, battery-powered nicotine product—hazards a combustible cigarette simply does not have—could be recorded. The timing also coincided with rising public alarm about youth vaping and, later, with the 2019 outbreak of severe lung injury associated with vaping (often abbreviated EVALI), which sharpened attention on every channel that might surface a vaping harm.

Post-market, passive surveillance: what the system is for

To read this dataset correctly, it helps to understand what kind of surveillance it is. Regulatory oversight of a product has two broad phases. Pre-market review happens before a product is sold—the FDA evaluates an application and decides whether the product may go to market. Post-market surveillance happens after, and it watches the product in real-world use, because some problems only emerge at scale, over time, in the hands of millions of ordinary users in conditions no pre-market study could fully anticipate. The problem reports are a post-market instrument: they catch what surfaces once the product is out in the world.

More specifically, this is passive surveillance. The FDA does not go out and sample products at random or run a representative survey of users; it waits for reports to come in. A consumer, a clinician, or a manufacturer has to decide that a problem is worth reporting and then take the trouble to submit it through the Safety Reporting Portal. This design is the system's great strength and its great limitation in a single stroke. The strength is breadth and speed: anyone, anywhere, can flag a hazard the moment they encounter it, and an emerging pattern—a particular device model that keeps overheating—can surface in the data before any formal study would have detected it. The limitation, developed at length in the caveats, is that passive systems capture only the reports people choose to make, which is a small and non-random fraction of all the problems that actually occur. The same logic governs the FDA's drug and device adverse-event systems; the tobacco problem reports are the tobacco-specific member of that family, built on the same passive-reporting premise.

The taxonomy of products and problems

The analytic value of the file comes from the two axes it crosses: what kind of product, and what kind of problem. On the product axis, the reports span the full deemed universe—conventional cigarettes, smokeless tobacco (chewing tobacco, snuff, snus), cigars and little cigars, hookah and waterpipe tobacco, pipe and roll-your-own tobacco, and the broad and fast-growing category of electronic cigarettes and vaping products, including the e-liquids that go in them. The mix is not uniform across these categories, and a central question the data answers is how that mix has shifted: combustible products generate one profile of complaints, while electronic devices generate an entirely different one rooted in their batteries and electronics.

On the problem axis the dataset distinguishes product problems from health problems. The product problems are defects in the article itself. For combustible and smokeless products these cluster around contamination: mold (the classic complaint about moist smokeless tobacco stored too long or in too much humidity), off odors, and foreign objects found in the product, from insects to bits of plastic or metal. For electronic products the product problems are mechanical and electrical: leaking cartridges and pods, devices that fail to work, and, in the category that defines the vaping era of this dataset, batteries that overheat, catch fire, or explode. The health problems are the harms to people: burns and injuries from a device fire, respiratory symptoms, nausea and vomiting, mouth and throat irritation, headaches, and the broad set of unexpected reactions a reporter attributes to using the product. Because a report can carry both kinds at once (a battery that exploded and the burn it caused), the two fields together let an analyst trace a hazard from the defect to the harm, which is exactly the chain a regulator wants to see.
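Crossing the two axes amounts to a product-by-problem table. The sketch below builds one with pandas from a handful of made-up rows; the short labels are placeholders for the longer category strings in the real file, and the list-valued columns are an assumption about how the fields are stored.

```python
import pandas as pd

# Made-up rows for illustration only; labels are shortened placeholders.
reports = pd.DataFrame({
    "report_id": ["r1", "r2", "r3", "r4"],
    "tobacco_products": [
        ["E-cigarette"], ["Smokeless"], ["E-cigarette"], ["Cigarette", "E-cigarette"],
    ],
    "reported_product_problems": [
        ["Overheating", "Fire"], ["Mold"], ["Leak"], ["Foreign object"],
    ],
})

# One row per (report, product, problem) combination.
long = (
    reports
    .explode("tobacco_products", ignore_index=True)
    .explode("reported_product_problems", ignore_index=True)
)

# Rows are product types, columns are product problems, cells are counts.
matrix = pd.crosstab(long["tobacco_products"], long["reported_product_problems"])
print(matrix)
```

Exploding both columns counts a problem once for every product named on the same report, so a report listing two products contributes to two rows. That is a reasonable choice for profiling the mix, but the cell totals will exceed the number of reports.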

The vaping battery and e-liquid wave

If one phenomenon explains why this dataset exists in its current form, it is the wave of problems specific to electronic nicotine devices, and within that, the battery hazard. A vape is, fundamentally, a high-energy lithium-ion battery driving a heating element. Lithium-ion cells store a great deal of energy in a small space, and when they fail—through a manufacturing defect, physical damage, a short circuit, overcharging, or the use of an incompatible or counterfeit charger—they can undergo thermal runaway: a self-reinforcing overheating that ends in venting flame or a violent explosion. In a phone or a laptop this is rare and dangerous enough; in a device held in the mouth or carried in a trouser pocket it is uniquely menacing. The problem reports captured a stream of these events—devices that ignited while charging, while in a pocket, or while in use—with burn injuries to the face, hands, and legs, and occasionally injuries to bystanders.

Alongside the battery events sit the e-liquid problems: leaking or defective cartridges and pods, and the toxic-exposure concern that liquid nicotine raises, particularly the risk of nicotine poisoning when concentrated e-liquid is swallowed, especially by a young child who mistakes a brightly colored bottle for something to drink. As e-cigarettes grew from a curiosity into a mass product, their share of the problem-reports file grew with them, and the device-malfunction problems they brought, which have essentially no analogue in the combustible-tobacco world, reshaped what the dataset is mostly about. Tracking that shift over time, which the Python workflow below does explicitly, is one of the most informative things the data supports: it shows a surveillance system reorienting in real time around a new and different class of hazard. The reports do not, on their own, prove how often these events occur or which models are most dangerous, because passive data cannot do that, but they were among the earliest signals that a battery-powered nicotine product carried a fire and explosion risk that needed a regulatory answer.

How it fits with the FDA's other safety systems

The tobacco problem reports do not stand alone; they are the tobacco-specific channel in a larger constellation of FDA post-market safety systems, and they are best understood by contrast with their siblings. The closest analogues are the agency's adverse-event systems for the products it has regulated far longer. The FAERS system collects adverse-event and medication-error reports for drugs and therapeutic biologics; the device adverse-event and recall systems do the same for medical devices. Each is a passive, post-market reporting database keyed to a product, each surfaces signals that warrant further investigation, and each shares the same structural caveats this dataset carries. The tobacco problem reports extend that same model to a class of products the FDA could not touch before 2009 and did not fully reach until 2016.

The boundary between them is the product, and it matters for analysis. A nicotine replacement therapy (a nicotine patch or gum approved as a smoking-cessation drug) is regulated as a drug, and its adverse events flow to FAERS, not here. A tobacco product (an e-cigarette sold for recreational use, a cigarette, a cigar) flows to this tobacco channel. The line can be subtle, and a complete picture of nicotine-related harm sometimes requires looking at more than one system. There is also a link out to the broader injury-surveillance world: severe vaping device burns and explosions also appear in emergency-department injury surveillance and in consumer-product hazard reporting, so the tobacco problem reports are one node in a wider network of channels that, taken together, describe the real-world safety of these products more completely than any one of them does alone.

Analytical uses

A national, product-typed, problem-coded record of tobacco and vaping complaints supports a distinctive set of analyses, provided the passive-surveillance caveats are respected throughout.

Product-mix and problem-mix profiling is the most immediate use. Tallying reports by product type and by problem reveals which categories and which hazards dominate the file—whether the smokeless complaints are mostly mold and foreign objects, whether the e-cigarette complaints are mostly battery and device malfunctions, and how the two compare. Trend analysis over time exploits the submission date to track how the mix has shifted, the clearest example being the rise of electronic-device problems as vaping grew—a story the raw counts tell vividly even though they cannot be read as incidence rates.
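A trend view of the product mix can be sketched as the e-cigarette share of all dated reports per submission year. The rows below are made up, the MM/DD/YYYY date format is an assumption about the feed, and the substring match on "e-cigarette" stands in for whatever label the real category string uses.

```python
import pandas as pd

# Made-up rows for illustration only.
reports = pd.DataFrame({
    "date_submitted": [
        "03/14/2016", "07/02/2016", "01/20/2018",
        "05/05/2018", "11/30/2018", "not a date",
    ],
    "tobacco_products": [
        ["Cigarette"], ["Smokeless"], ["E-cigarette"],
        ["E-cigarette"], ["Cigarette"], ["E-cigarette"],
    ],
})

# Unparseable dates become NaT and are excluded from the trend.
parsed = pd.to_datetime(reports["date_submitted"], format="%m/%d/%Y", errors="coerce")
reports["year"] = parsed.dt.year
reports["is_ecig"] = reports["tobacco_products"].apply(
    lambda labels: any("e-cigarette" in label.lower() for label in labels)
)

dated = reports.dropna(subset=["year"]).assign(year=lambda d: d["year"].astype(int))
by_year = dated.groupby("year")["is_ecig"].agg(reports="size", ecig_reports="sum")
by_year["ecig_share"] = by_year["ecig_reports"] / by_year["reports"]
print(by_year)
```

The share is a share of reports, not of events or of users, so it describes how the file's composition moved rather than how the underlying hazard rate changed.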

Hazard-signal detection is the regulatory purpose: scanning the problem narratives and codes for clusters—a recurring device behavior, a repeated kind of injury, a pattern of bystander harm—that warrant a closer look, following the chain from a product defect to the health problem it caused. Finally, the device-burn and explosion subset is its own study: isolating the battery, fire, and explosion reports and characterizing the injuries, the circumstances (charging, in-pocket, in-use), and the share that harmed someone other than the user gives a qualitative picture of a hazard that conventional tobacco never posed—the kind of evidence that informs design standards and consumer warnings even when the underlying counts are too sparse and too biased to support a rate.
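One simple way to follow the chain from defect to harm is to count which product problems and health problems appear together on the same report, and to measure the bystander share within a hazard subset. The sketch below does both on made-up records; the labels and the "Yes"/"No" encoding of nonuser_affected are assumptions.

```python
from collections import Counter
from itertools import product

# Made-up records for illustration only.
reports = [
    {"reported_product_problems": ["Overheating", "Explosion"],
     "reported_health_problems": ["Burn"], "nonuser_affected": "No"},
    {"reported_product_problems": ["Explosion"],
     "reported_health_problems": ["Burn"], "nonuser_affected": "Yes"},
    {"reported_product_problems": ["Mold"],
     "reported_health_problems": ["Nausea"], "nonuser_affected": "No"},
    {"reported_product_problems": ["Leak"],
     "reported_health_problems": [], "nonuser_affected": "No"},
]

# Count each (defect, harm) pair once per report that carries both.
pairs = Counter()
for r in reports:
    defects = set(r.get("reported_product_problems", []))
    harms = set(r.get("reported_health_problems", []))
    pairs.update(product(defects, harms))

# Share of explosion reports in which someone other than the user was harmed.
explosions = [r for r in reports if "Explosion" in r["reported_product_problems"]]
bystander_share = (
    sum(r["nonuser_affected"] == "Yes" for r in explosions) / len(explosions)
    if explosions else 0.0
)

for (defect, harm), n in pairs.most_common():
    print(f"{defect} -> {harm}: {n}")
print(f"Bystander share among explosion reports: {bystander_share:.0%}")
```

Co-occurrence on a report is an association recorded by the reporter, not a causal finding, so a frequent pair is a prompt to read the underlying reports rather than a result in itself.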

Python workflow: problem reports from the openFDA tobacco endpoint

The script below pulls tobacco product problem reports from the openFDA tobacco/problem.json endpoint and computes the core views: a server-side tally of reports by product type and by reported problem, and a focused pull of the e-cigarette and vape reports that it tracks over time and screens for the battery-fire-explosion subset. No API key is required for light use, and a free key raises the limits. The dataset is small—on the order of 1,300 reports—so a single sequence of pages drains it without bumping the skip ceiling, and the count parameter sizes each axis cheaply before any rows are downloaded. Because these are free-text fields, the count calls use openFDA's .exact keyword suffix; without it the API rejects the aggregation. And because the openFDA tobacco schema evolves and field names vary, any production use should be validated against the current openFDA tobacco field reference.

import requests
import pandas as pd
from collections import Counter

# openFDA Tobacco Problem Reports API
# Endpoint: https://api.fda.gov/tobacco/problem.json
# No API key required for <= 240 requests/min and 1,000/day.
# Register at https://open.fda.gov/apis/authentication/ for higher
# limits. This dataset comes from the FDA Safety Reporting Portal
# and the Center for Tobacco Products post-market surveillance feed.
#
# This script:
#   1. Tallies reports by tobacco product type (server-side count)
#   2. Tallies reports by reported problem (server-side count)
#   3. Pulls e-cigarette / vape reports and tracks them over time
BASE = "https://api.fda.gov/tobacco/problem.json"


def count_field(field, search=None):
    # The openFDA count parameter returns aggregate term frequencies
    # without transferring individual records -- the cheap way to
    # size the data before downloading rows. Use the .exact keyword
    # suffix: bare text fields are not aggregatable and return a 500.
    params = {"count": field + ".exact"}
    if search:
        params["search"] = search
    r = requests.get(BASE, params=params, timeout=60)
    r.raise_for_status()
    rows = r.json().get("results", [])
    return {row["term"]: row["count"] for row in rows}


def fetch_records(search=None, page_size=1000):
    # Page through reports for an optional search expression.
    # openFDA caps skip at 25,000; this dataset is far smaller,
    # so a single sequence of pages drains it.
    out, skip = [], 0
    while True:
        params = {"limit": page_size, "skip": skip}
        if search:
            params["search"] = search
        r = requests.get(BASE, params=params, timeout=60)
        if r.status_code == 404:           # openFDA returns 404 on empty pages
            break
        r.raise_for_status()
        batch = r.json().get("results", [])
        if not batch:
            break
        out.extend(batch)
        if len(batch) < page_size:
            break
        skip += page_size
    return out


# --- 1 & 2: what products and problems dominate the file? ----------
# tobacco_products holds full category labels (e.g. the long
# "Electronic cigarette, electronic nicotine or vaping product (...)"
# string), so the terms below are descriptive, not short codes.
by_product = count_field("tobacco_products")
print("Reports by tobacco product type")
print("-" * 44)
for term, n in Counter(by_product).most_common(12):
    print(f"{term[:30]:<32} {n:>6,}")

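# Keep the two problem families apart: any label that occurs in both
# fields would be silently overwritten by a plain dict.update().
by_problem = Counter()
for kind, field in (("health", "reported_health_problems"),
                    ("product", "reported_product_problems")):
    for term, n in count_field(field).items():
        by_problem[f"{kind}: {term}"] = n
print("\nReports by reported problem (top 15)")
print("-" * 44)
for term, n in by_problem.most_common(15):
    print(f"{term[:30]:<32} {n:>6,}")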

# --- 3: track e-cigarette / vape device problems over time ---------
vape = fetch_records(search='tobacco_products:"Electronic cigarette"')
df = pd.DataFrame(vape)
print(f"\nE-cigarette / vape reports pulled: {len(df):,}")

if not df.empty and "date_submitted" in df:
    df["date_submitted"] = pd.to_datetime(df["date_submitted"], errors="coerce")
    by_year = df["date_submitted"].dt.year.value_counts().sort_index()
    print("\nE-cigarette reports by year submitted")
    print("-" * 30)
    for yr, n in by_year.items():
        share = n / max(len(df), 1)
        print(f"{int(yr)}: {n:>5,}  ({share:.1%})")

    # Flag the battery / fire / explosion subset -- the hazard that
    # drove the vaping wave into the surveillance record.
    blob = df.astype(str).agg(" ".join, axis=1).str.lower()
    fire = blob.str.contains("battery|overheat|explo|fire|burn", regex=True)
    print(f"\nReports mentioning battery/fire/explosion: {int(fire.sum()):,} "
          f"({fire.mean():.1%} of vape reports)")

Two practical notes apply. First, the battery-fire-explosion screen in the script is a deliberately coarse keyword pass over the concatenated report text—it is meant to size the hazard subset quickly, not to adjudicate it. A rigorous analysis should work from the structured product-problem codes where they exist, read the narratives for the ambiguous cases, and never report the keyword share as if it were a precise count, because free-text screening both misses events described in other words and catches incidental mentions. Second, every number this script produces is a count of reports, not a count of events in the world. Because the data is passive and self-reported, a rising line means more reports were submitted, which is a tangle of real incidence, growing product use, and shifting public awareness—not incidence on its own. The counts are signals to investigate, not measurements to publish; treat them accordingly and pair any trend with the denominators (product sales, user population) that this file does not contain.
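The gap between a keyword pass and a screen on the structured product-problem field is easy to see on a toy example. In the sketch below, keyword_hit and structured_hit are hypothetical helper names, the records are made up, and the hazard terms are illustrative substrings rather than an official code list.

```python
import re

# Illustrative substrings; not an official list of problem codes.
HAZARD_TERMS = ("battery", "overheat", "explo", "fire")


def keyword_hit(report: dict) -> bool:
    # Coarse pass: search every value on the report, as the script does.
    blob = " ".join(str(v) for v in report.values()).lower()
    return bool(re.search(r"battery|overheat|explo|fire|burn", blob))


def structured_hit(report: dict) -> bool:
    # Narrower pass: only the structured product-problem labels count.
    problems = [p.lower() for p in report.get("reported_product_problems", [])]
    return any(term in p for p in problems for term in HAZARD_TERMS)


# Made-up records for illustration only.
reports = [
    {"report_id": "r1", "reported_product_problems": ["Overheating"],
     "reported_health_problems": ["Burn"]},
    {"report_id": "r2", "reported_product_problems": ["Leak"],
     "reported_health_problems": ["Heartburn"]},
    {"report_id": "r3", "reported_product_problems": ["Mold"],
     "reported_health_problems": ["Nausea"]},
]

keyword_flags = [keyword_hit(r) for r in reports]
structured_flags = [structured_hit(r) for r in reports]
disagreements = [
    r["report_id"]
    for r, k, s in zip(reports, keyword_flags, structured_flags)
    if k != s
]
print("Reports to review by hand:", disagreements)
```

Here the keyword pass flags the second record only because "burn" appears inside "Heartburn", which is the kind of incidental match the first note warns about. Listing the disagreements gives a short queue for manual review instead of a number to report.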

Limitations and analytical caveats

The tobacco problem reports are a valuable early-warning channel, but they carry the structural limitations of every passive-surveillance system, and an analyst must internalize them before drawing any conclusion from the file.

Underreporting is pervasive and uneven. Only a small fraction of real product problems are ever reported, because reporting depends on someone noticing a problem, knowing the channel exists, and choosing to submit. A dramatic event—a device that exploded and sent someone to the emergency room—is far more likely to be reported than a minor one, and media attention, regulatory campaigns, and high-profile incidents can all cause reporting to spike independently of any change in actual incidence. The roughly 1,300 reports in the file are therefore not a census of tobacco product problems; they are the visible tip of a much larger and unmeasured iceberg, biased toward the severe and the salient.

A report does not establish causation. Each report is one person's account that a product was associated with a problem; it is not a finding that the product caused the problem. A reporter may misattribute a symptom, mistake one product for another, or describe an event whose true cause lies elsewhere. The data surfaces associations and possible signals; confirming that a particular product causes a particular harm requires investigation that this file cannot supply on its own. Counting reports and treating the total as a measure of how dangerous a product is conflates a reporting artifact with a causal fact.

The reports are self-reported and of uneven quality. Because anyone can submit, the product identification can be vague (“a vape pen,” with no brand or model), the problem description can be thin or, conversely, a long narrative that resists coding, and the same event can in principle be reported more than once. The structured fields impose order on a messy stream of human accounts, and the coding necessarily compresses detail; the full texture of an event lives in the narrative, not the codes. Cross-product comparisons are especially fragile, since a fast-growing, heavily publicized category like vaping will draw disproportionate reporting attention relative to a long-established one.
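Because the same event can in principle be reported more than once, a quick check is to flag reports whose content matches another report's on everything except the identifier. The sketch below is one hedged way to do that with pandas: the rows are made up, the fingerprint helper is a hypothetical name, and a match only marks a candidate for review, since two distinct events can share a date and the same coded problems.

```python
import pandas as pd

# Made-up rows for illustration only.
reports = pd.DataFrame({
    "report_id": ["a1", "a2", "a3", "a4"],
    "date_submitted": ["02/10/2019", "02/10/2019", "02/11/2019", "06/01/2019"],
    "tobacco_products": [["E-cigarette"], ["E-cigarette"], ["E-cigarette"], ["Smokeless"]],
    "reported_product_problems": [
        ["Fire", "Overheating"], ["Overheating", "Fire"], ["Leak"], ["Mold"],
    ],
    "reported_health_problems": [["Burn"], ["Burn"], [], ["Nausea"]],
})

LIST_COLS = ["tobacco_products", "reported_product_problems", "reported_health_problems"]


def fingerprint(row: dict) -> str:
    # Sort each list so that label order does not hide a match.
    parts = [row["date_submitted"]]
    parts += [";".join(sorted(row[col])) for col in LIST_COLS]
    return "|".join(parts)


reports["fingerprint"] = [fingerprint(row) for row in reports.to_dict("records")]
reports["possible_duplicate"] = reports.duplicated("fingerprint", keep=False)
print(reports[["report_id", "possible_duplicate"]])
```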

Held with these caveats in mind, the fda_tobacco_problems table is a uniquely useful resource: the tobacco-specific channel of federal product-safety surveillance, the place where the hazards of products the FDA could not even touch before 2009—and did not fully reach until the 2016 deeming rule—flow back to the regulator, and the file in which a battery-powered nicotine device's capacity to overheat, ignite, and explode first became part of the public record. It does not measure how often these harms occur; it shows that they do, and which products and problems the people closest to them thought worth reporting.

Related writing

FDA FAERS: The Federal Adverse Event Reporting Database Behind Drug Safety Surveillance — FAERS is the drug-and-biologic sibling of the tobacco problem reports, built on the same passive, post-market reporting premise and the same caveats, and it is where nicotine products regulated as cessation drugs send their adverse events instead of to the tobacco channel.

FDA Medical Device Recalls: The CDRH Recall Database Explained — The device-safety machinery shows what happens after a hazard signal hardens into an actionable defect, and the lithium-battery overheating that dominates the vaping side of the tobacco file is the same class of failure that drives recalls across electronic medical devices.

FDA Device Establishment Registration: The Federal Map of the Medical-Device Supply Chain — Registration data maps the manufacturers and importers behind regulated products, the same kind of supply-chain mapping that turns a vague tobacco problem report into a traceable device model and the firm responsible for it.