CDC PLACES: The Federal Model of Health for Every US Census Tract

No national health survey is large enough to tell you how common diabetes is on your block. The samples that measure American health—a few hundred thousand phone interviews a year—are spread so thin that a single census tract of a few thousand people may contain zero respondents. CDC PLACES answers the question anyway, not by measuring but by modeling: it projects survey responses onto demographic and geographic structure to produce an estimate of roughly forty health measures for every county, place, ZIP-code tabulation area, and—at the finest grain—every census tract in the United States. The tract file runs to roughly 3.05 million tract-by-measure rows, the data behind almost all neighborhood-level health-equity work in the country.

This article covers what PLACES is and the small-area-estimation problem it exists to solve; the partnership between CDC and the Robert Wood Johnson Foundation and the project's lineage in the 500 Cities Project; and the multilevel regression and poststratification (MRP) method that turns a thin national survey into a tract-by-tract map. It then describes the four families of measures (chronic-disease outcomes, health-risk behaviors, preventive-service use, and health status and disability), the four geographies PLACES publishes and why the census tract is the one that matters for equity, and the single most important caveat: these are modeled estimates, not direct measurements or a case count. The final sections show how the tract estimates pair with social-determinants and demographic data to drive targeting and intervention; give a Python workflow that pulls one measure for a county's tracts from the CDC Socrata API, ranks them, and summarizes the spread; and set out the analytical caveats that every analyst must internalize: model smoothing, confidence-interval width, the survey-mode and reference-year lag, and the difference between a prevalence and a count.

What the dataset is

PLACES—Population Level Analysis and Community Estimates—is a CDC project that publishes model-based estimates of common health measures for small geographic areas across the entire United States. The premise is simple and the consequence is large. The country's flagship health-behavior survey, the Behavioral Risk Factor Surveillance System (BRFSS), interviews on the order of four hundred thousand adults a year—an enormous survey by ordinary standards, but spread across more than seventy thousand census tracts it cannot directly measure prevalence at the neighborhood level. PLACES bridges that gap by using the survey, together with the demographic composition of each small area, to estimate what the survey would have found had it been able to interview enough people in every tract. The result is a complete, gap-free national grid of health estimates at a resolution the underlying survey could never reach on its own.

In our database the finest-grained version of this is stored as the table cdc_places_tract, with the grain of one row per census tract per measure: a single tract contributes one row for diabetes, another for current smoking, another for routine checkups, and so on across the full measure set, which is why the file runs to roughly 3.05 million rows. Each row identifies the tract, names the measure, and carries the modeled prevalence estimate, its confidence interval, and the population denominator:

countyfips / stateabbr      -- 5-digit state+county code, 2-letter state
locationid (tract FIPS)     -- 11-digit census tract identifier
locationname                -- the tract's display name
measureid                   -- short measure code (DIABETES, CSMOKING, ...)
measure                     -- the human-readable measure description
category                    -- Health Outcomes, Prevention, Health Risk
                               Behaviors, Health Status, Disability
data_value                  -- the MODEL-BASED prevalence estimate (a percent)
data_value_type             -- crude or age-adjusted prevalence
low_confidence_limit        -- lower bound of the 95% confidence interval
high_confidence_limit       -- upper bound of the 95% confidence interval
totalpopulation             -- total population of the tract (denominator)
totalpop18plus              -- adult population (denominator for adult measures)
geolocation                 -- tract centroid point (lat/long)

The data_value is the load-bearing column, and it must be read for exactly what it is: a model-based prevalence estimate, almost always expressed as a percentage of the relevant population (the adult population for adult measures). It is not a count of people and not a direct survey measurement for that tract. The low_confidence_limit and high_confidence_limit bracket the estimate with a 95% interval, and the width of that bracket is itself information: a wide interval is the model telling you it is uncertain about this particular tract. The data_value_type distinguishes crude prevalence (the share of that tract's population, as it actually is) from age-adjusted prevalence (standardized to a reference age distribution so that two tracts with very different age structures can be compared without the comparison being driven by age alone). Choosing the wrong one is a common analytic error: crude rates answer “what is the burden here?” while age-adjusted rates answer “is health here worse than elsewhere, holding age constant?” The totalpopulation denominator (or totalpop18plus for adult measures) is what lets an analyst convert a modeled prevalence back into an approximate number of affected people. That figure is an estimate built on an estimate and should be handled with great care.
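The prevalence-to-count conversion described above can be sketched in a few lines. This is a minimal illustration, not part of the PLACES release: the function name approx_affected and the input values are invented, and the sketch assumes an adult measure, so the denominator is totalpop18plus. It carries the confidence limits through so the result is reported as a range rather than a single figure.

```python
def approx_affected(prevalence_pct, low_pct, high_pct, adult_pop):
    """Turn a modeled prevalence (percent) into an approximate head-count range.

    Only the model's confidence interval is propagated. Uncertainty in the
    population denominator itself is ignored, so the true range is wider.
    """
    def to_count(pct):
        return round(adult_pop * pct / 100.0)

    return {
        "point": to_count(prevalence_pct),
        "low": to_count(low_pct),
        "high": to_count(high_pct),
    }


# Illustrative values only (not taken from a real tract).
est = approx_affected(prevalence_pct=12.0, low_pct=10.0, high_pct=14.5,
                      adult_pop=3200)
print(f"about {est['point']} adults (roughly {est['low']} to {est['high']})")
```

Reporting the low and high figures alongside the point value keeps the precise-looking count from being read as a measured number.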

The small-area problem and the RWJF partnership

To understand why PLACES exists, you have to understand the small-area-estimation problem it solves. Public-health surveillance in the United States is built on national and state surveys—BRFSS for behaviors and chronic conditions, others for nutrition, mental health, and disability. These are powerful at the level they were designed for: a national or state prevalence of smoking or obesity, estimated from a representative sample, is statistically sound. But policy and intervention happen locally—a city health department wants to know which neighborhoods have the highest diabetes burden, a hospital wants to site a clinic where chronic disease is concentrated, a researcher wants to pair health with the social conditions of a specific tract. At that resolution the direct survey estimate collapses: there are simply not enough respondents in any one small area to produce a reliable number, and most tracts have none at all.

PLACES is the institutional answer, and it did not emerge from CDC alone. It is a collaboration between CDC and the Robert Wood Johnson Foundation (RWJF), the large philanthropic funder whose mission centers on building a national culture of health and on health equity. The partnership matters because the project is as much an equity instrument as a surveillance one: RWJF's interest is precisely in the neighborhood-level disparities that only small-area estimates can reveal, and the foundation's support helped extend the work from a handful of cities to the whole country. PLACES is the direct successor to the 500 Cities Project, an earlier CDC–RWJF effort that produced city- and tract-level estimates for the five hundred largest US cities. PLACES generalized that method—same modeling approach, same kind of output—from five hundred cities to the entire nation, so that every county, place, ZIP-code tabulation area, and census tract in the country, not just the large cities, receives estimates. The lineage is worth keeping in mind because the 500 Cities estimates and the PLACES estimates are methodologically continuous; PLACES is the national-scale evolution of an idea that was first proven on the largest cities.

How the estimates are made: MRP

The engine that turns a thin national survey into a complete tract-by-tract map is a statistical technique called multilevel regression and poststratification, universally abbreviated MRP. It is worth walking through in plain terms, because the method is the dataset: every value in cdc_places_tract is an output of this procedure, and understanding it is the difference between using the estimates well and misusing them.

MRP has two halves, and the name describes both. The multilevel regression half fits a model that predicts the health outcome—say, whether an adult has diabetes—from individual characteristics that are known to be associated with it (most importantly age and sex, along with race and ethnicity) and from the area the person lives in. “Multilevel” means the model has structure at more than one level: it learns both individual-level patterns (older people are more likely to have diabetes) and area-level patterns (some places have higher prevalence even after accounting for who lives there), and it partially pools information—borrowing strength from demographically similar areas to estimate places where the survey itself is thin. This partial pooling is the source of the method's power and, as the caveats will stress, the source of its central limitation: small areas with little or no direct survey data are pulled toward what the model expects given their demographic composition.
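Partial pooling can be illustrated with a deliberately simplified shrinkage formula. This is not the PLACES model, which is a full multilevel regression; it is a toy sketch in which the function name partially_pooled and the prior_strength constant are assumptions chosen for illustration. It shows only the qualitative behavior described above: the fewer local respondents an area has, the closer its estimate sits to what the model expects.

```python
def partially_pooled(local_rate, n_local, expected_rate, prior_strength=50):
    """Blend a local survey rate with a model expectation.

    prior_strength is a hypothetical tuning constant: the number of local
    respondents at which the local data and the expectation get equal weight.
    """
    weight = n_local / (n_local + prior_strength)
    return weight * local_rate + (1 - weight) * expected_rate


expected = 11.0   # percent predicted from demographics alone (invented)
observed = 20.0   # percent seen among local respondents (invented)
for n in (0, 5, 50, 500):
    blended = partially_pooled(observed, n, expected)
    print(f"{n:>3} respondents -> {blended:.1f}%")
```

With no respondents the estimate equals the expectation, and it moves toward the observed rate only as local data accumulates, which is why thinly surveyed tracts look like their demographics.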

The poststratification half is where geography enters. From the census, PLACES knows the exact demographic composition of every census tract—how many residents fall into each age-by-sex-by-race cell. Poststratification takes the fitted model's prediction for each demographic cell and reweights those predictions by the actual count of people in each cell in a given tract, then sums them. In effect, the model says “here is the predicted diabetes rate for a person of each age, sex, and race,” and poststratification says “this tract is made up of these people in these proportions, so its overall rate is the population-weighted blend.” A tract that is older and of a demographic composition the model associates with higher diabetes will receive a higher estimate; a younger tract a lower one. The estimate for a tract is therefore driven by two things: the local survey signal where it exists, and—dominantly, where it does not—the tract's demographic makeup run through a model trained on the whole country. This is why PLACES estimates are excellent for spatial patterns and comparisons and poor for asserting an exact individual-tract count: they encode the relationship between demography and health far more than they encode any direct observation of the specific neighborhood.
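The poststratification step is a population-weighted average, which a short sketch makes concrete. The cell rates and tract compositions below are invented, and the cells use only age and sex for brevity; per the description above, the real procedure also uses race and ethnicity and area-level effects. The function name poststratify is likewise an assumption.

```python
# Hypothetical model output: predicted diabetes rate (percent) per cell.
cell_rate = {
    ("18-44", "F"): 3.0,
    ("18-44", "M"): 4.0,
    ("45-64", "F"): 12.0,
    ("45-64", "M"): 14.0,
    ("65+", "F"): 22.0,
    ("65+", "M"): 25.0,
}


def poststratify(cell_counts, cell_rate):
    """Weight each cell's predicted rate by the tract's head count in that cell."""
    total = sum(cell_counts.values())
    if total == 0:
        raise ValueError("tract has no population in any cell")
    weighted = sum(cell_rate[cell] * n for cell, n in cell_counts.items())
    return weighted / total


# Two invented tracts with the same total population but different age mixes.
younger_tract = {("18-44", "F"): 900, ("18-44", "M"): 900,
                 ("45-64", "F"): 300, ("45-64", "M"): 300,
                 ("65+", "F"): 50, ("65+", "M"): 50}
older_tract = {("18-44", "F"): 300, ("18-44", "M"): 300,
               ("45-64", "F"): 500, ("45-64", "M"): 500,
               ("65+", "F"): 450, ("65+", "M"): 450}

print(f"younger tract: {poststratify(younger_tract, cell_rate):.2f}%")
print(f"older tract:   {poststratify(older_tract, cell_rate):.2f}%")
```

The same cell-level predictions yield different tract estimates purely because the two tracts are composed differently, which is the sense in which the estimates are demographically driven.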

The measures: outcomes, behaviors, prevention, status

PLACES publishes on the order of forty measures, grouped into a handful of categories that together sketch a community's health profile. The categories map directly to the category column in the table and are the natural way to think about what the dataset can and cannot tell you.

Chronic-disease outcomes are the headline measures and the reason most analysts come to PLACES. They include the prevalence of diagnosed diabetes, obesity, coronary heart disease, chronic obstructive pulmonary disease (COPD), high blood pressure, high cholesterol, asthma, stroke, cancer, chronic kidney disease, and depression. These are the conditions whose burden defines population health and whose neighborhood concentration drives health inequity, and the ability to map them tract by tract is exactly what was impossible before small-area estimation. Health-risk behaviors are the upstream measures, the modifiable behaviors that produce the chronic diseases: current cigarette smoking, physical inactivity (no leisure-time physical activity), binge drinking, and short sleep duration. Because these are the levers a public-health program can actually pull, the behavior measures are where targeting often begins.

Preventive-service use measures the other side of the ledger—not what makes people sick but whether they are getting the care that prevents or catches disease early: routine checkups, dental visits, cholesterol screening, blood-pressure medication adherence, and the recommended cancer screenings (cervical, breast, and colorectal), along with vaccinations such as influenza. A neighborhood with high chronic-disease burden and low preventive-service use is a different—and more urgent—problem than one with high burden but good access, and the prevention measures are what make that distinction visible. Finally, health status and disability measures capture self-reported overall and mental health (frequent physical or mental distress, fair-or-poor self-rated health) and the prevalence of disabilities—mobility, cognition, hearing, vision, independent living, and self-care. Read together, the four families let an analyst move from outcomes to their behavioral drivers to the preventive care that could interrupt them to the lived experience of health in a place—the full arc of a community health profile at the grain of a single neighborhood.

Geographies: county, place, ZCTA, and the census tract

PLACES publishes the same measures at four geographic levels, and the choice of level is one of the most consequential an analyst makes. The four are the county, the incorporated and census-designated place (cities and towns), the ZIP-code tabulation area (ZCTA), and the census tract. Each is a separate release with the same measure set; the table this article describes is the tract-level release, the finest grain and the largest file.

The census tract is the one that matters for equity work, and it is worth being precise about what it is. A census tract is a small, relatively permanent statistical subdivision of a county defined by the Census Bureau, designed to contain roughly four thousand residents and drawn to be reasonably homogeneous in population characteristics. There are more than eighty thousand of them nationwide. The tract is the standard unit of neighborhood-level analysis in the United States precisely because it is small enough to capture a coherent neighborhood and stable enough to join to the rich body of tract-level census and American Community Survey data—income, poverty, education, race and ethnicity, housing, employment. That joinability is what makes the PLACES tract file so valuable: a health estimate at the tract level can be set directly alongside the social and economic conditions of the same tract, which is the foundational move of health-equity analysis. The ZCTA level, by contrast, is convenient because so much administrative and commercial data is keyed by ZIP code, but ZCTAs are approximations of postal delivery areas, not true neighborhoods, and they do not nest cleanly inside counties or align with census geography—so for rigorous equity work the tract is almost always the better unit, with the ZCTA reserved for cases where the other data is only available by ZIP.
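Because joins depend on the tract identifier being intact, it can help to validate and split it before merging. The eleven-digit tract code is the two-digit state code, the three-digit county code, and a six-digit tract code, so its first five digits are the county FIPS. The sketch below is a hypothetical helper (the name split_tract_fips is an assumption) that also repairs the common case where a leading zero was lost because the code was read as a number.

```python
def split_tract_fips(tract_fips):
    """Split an 11-digit census tract code into state, county, and tract parts."""
    code = str(tract_fips).strip()
    if len(code) == 10:
        # A leading zero is typically lost when the code was parsed as a number.
        code = "0" + code
    if len(code) != 11 or not code.isdigit():
        raise ValueError(f"not an 11-digit tract FIPS: {tract_fips!r}")
    return {
        "state": code[:2],    # 2-digit state FIPS
        "county": code[:5],   # 5-digit state+county FIPS
        "tract": code[5:],    # 6-digit tract code within the county
    }


print(split_tract_fips("17031010100"))
print(split_tract_fips(6037101110))   # integer input that lost its leading zero
```

Storing the code as a string from the start avoids the repair step entirely; the helper is only a guard for data that has already passed through a numeric column.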

The central caveat: these are modeled estimates

Everything else in this article is secondary to one point, and it deserves its own section because misunderstanding it is the most common and most damaging error made with PLACES data. The values in cdc_places_tract are model-based estimates, not direct measurements and not a case count. No one walked through the tract and counted the people with diabetes. The number is what a statistical model predicts the prevalence to be, given the tract's demographic composition and the national relationship between demography and the outcome, anchored by whatever sparse survey signal exists for the area.

The practical consequences follow directly from how MRP works. Because the model partially pools—smoothing each tract toward demographically similar areas—the estimates are excellent for the questions that depend on spatial pattern and relative comparison and weak for the questions that depend on an exact individual value. PLACES will reliably tell you that the south-side tracts of a city have substantially higher diabetes prevalence than the north-side ones, that a band of neighborhoods forms a high-burden cluster, that smoking and physical inactivity co-vary across a county's tracts. It will not reliably tell you that tract 1234.56 has exactly 14.2% diabetes and the tract next to it exactly 13.8%—a difference well inside the confidence intervals and well inside the smoothing the model applied. Two specific failure modes recur. First, because the estimates are demographically driven, a tract that happens to be unusual—a demographically typical neighborhood that is, for local reasons, much healthier or much sicker than its composition predicts—will be pulled toward the model's expectation and its true exceptionality understated. Second, converting a modeled prevalence into a count of affected people by multiplying through the population denominator compounds two uncertainties and produces a figure far less certain than its precise-looking digits suggest. The disciplined use of PLACES treats every value as an estimate with the confidence interval attached, reasons in terms of patterns and comparisons rather than exact counts, and never reports a tract estimate as though it were a measured fact.
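A quick way to apply this discipline is to check whether two tracts' confidence intervals overlap before treating their point estimates as different. The sketch below uses the 14.2% and 13.8% figures from the example above with invented interval bounds. Interval overlap is a conservative rule of thumb rather than a formal significance test, and the function name is an assumption.

```python
def intervals_overlap(a, b):
    """Return True when two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Point estimates from the text; the interval bounds are invented.
tract_a = {"estimate": 14.2, "ci": (12.1, 16.5)}
tract_b = {"estimate": 13.8, "ci": (11.9, 15.9)}

if intervals_overlap(tract_a["ci"], tract_b["ci"]):
    print("Intervals overlap: do not rank these two tracts against each other.")
else:
    print("Intervals are separate: the difference is more likely to be real.")
```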

Pairing with social determinants and demographics

The reason the tract file is the centerpiece of neighborhood health-equity work is that the census tract is the universal join key between health and the social conditions that shape it. A modeled health prevalence is interesting on its own; set beside the social and economic profile of the same tract, it becomes the raw material of equity analysis. Three joins do most of the work.

The first is to census and American Community Survey data. Every PLACES tract estimate can be joined by its tract FIPS code to the ACS measures for the same tract—median household income, the poverty rate, educational attainment, racial and ethnic composition, housing cost burden, vehicle access, and employment. This join is what lets an analyst quantify the relationship that small-area estimation was built to expose: how strongly chronic-disease burden, smoking, or low preventive-service use tracks with poverty, race, and education across a city's tracts. The second is to composite social-determinants and deprivation indices—measures such as the CDC/ATSDR Social Vulnerability Index or the Area Deprivation Index, both keyed to census geography—which bundle many social conditions into a single tract-level score. Pairing a PLACES outcome with a vulnerability index turns a scatter of correlations into a clean statement: the most socially vulnerable tracts carry the highest modeled disease burden and the lowest preventive-service use, the empirical signature of health inequity.
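The tract-level join can be sketched with pandas. Everything here is illustrative: the tract codes, prevalence values, and poverty rates are invented, and the ACS-side column names geoid and poverty_rate are assumptions rather than real ACS field names. The sketch keeps tract codes as strings, asks merge to verify the one-row-per-tract assumption, and reports tracts that failed to match before computing a correlation.

```python
import pandas as pd

# Invented PLACES rows for one measure (one row per tract).
places = pd.DataFrame({
    "locationid": ["17031010100", "17031010201", "17031010202", "17031010300"],
    "data_value": [9.8, 13.5, 15.1, 11.2],
})
# Invented ACS extract; column names are hypothetical.
acs = pd.DataFrame({
    "geoid": ["17031010100", "17031010201", "17031010202", "17031010400"],
    "poverty_rate": [8.0, 21.5, 27.0, 12.0],
})

merged = places.merge(
    acs,
    left_on="locationid",
    right_on="geoid",
    how="left",
    validate="one_to_one",   # fail loudly if either side repeats a tract
    indicator=True,          # adds a _merge column recording match status
)

unmatched = merged[merged["_merge"] == "left_only"]
matched = merged[merged["_merge"] == "both"]
print(f"{len(unmatched)} PLACES tract(s) had no ACS match")
print("correlation:", round(matched["data_value"].corr(matched["poverty_rate"]), 3))
```

Unmatched tracts are worth inspecting rather than silently dropping, since they often signal a mismatch in tract vintage between the two sources.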

The third use is operational rather than analytical: targeting interventions. Because the estimates blanket every tract, a health department can rank all the tracts in its jurisdiction by a chosen measure—or by a combination of high burden and high social vulnerability—and direct a screening program, a community health worker, a mobile clinic, or a tobacco-cessation campaign to the neighborhoods where the need is greatest. This is the form of analysis PLACES was explicitly designed to enable, and it is where the modeled-estimate caveat is most forgiving: prioritization depends on getting the ranking roughly right, which is exactly the spatial-pattern question the estimates answer well, rather than on the exact value in any one tract, which is the question they answer poorly.
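A combined burden-and-vulnerability ranking can be sketched as follows. The tract ids and values are invented, the vulnerability column stands in for whichever index an analyst chooses, and the equal weighting of the two percentile ranks is an assumption made for illustration, not a recommended scoring rule.

```python
import pandas as pd

# Invented tracts; ids are placeholders, not real FIPS codes.
tracts = pd.DataFrame({
    "locationid": ["A", "B", "C", "D", "E"],
    "diabetes_pct": [9.0, 15.0, 12.0, 16.0, 10.0],
    "vulnerability": [0.20, 0.85, 0.40, 0.90, 0.75],
})

# Percentile ranks put the two measures on a common 0-1 scale.
tracts["burden_rank"] = tracts["diabetes_pct"].rank(pct=True)
tracts["vuln_rank"] = tracts["vulnerability"].rank(pct=True)

# Equal weighting is an illustrative choice.
tracts["priority"] = (tracts["burden_rank"] + tracts["vuln_rank"]) / 2

shortlist = tracts.sort_values("priority", ascending=False).head(2)
print(shortlist[["locationid", "diabetes_pct", "vulnerability", "priority"]])
```

Working in percentile ranks rather than raw values matches the point made above: prioritization leans on relative position across tracts, not on the exact estimate in any one of them.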

Python workflow: one measure across a county's tracts

PLACES is published on the CDC Open Data portal at data.cdc.gov, which runs on the Socrata platform and exposes every dataset through a queryable REST API. The script below pulls a single measure (diabetes, by default) for all the census tracts in one county, using a SoQL $where filter on the county FIPS code and the measure id. It then computes three things: the spread of the modeled prevalence across the county's tracts (median and range), the ten highest-burden tracts with their confidence intervals, and the median and maximum confidence-interval width as a diagnostic of where the model is least certain. No API key is required for public reads. Because CDC re-issues the dataset resource id with each annual PLACES release, the resource id should be confirmed against the current listing on data.cdc.gov before running at scale. Requirements: requests and pandas.

import pandas as pd
import requests

# CDC PLACES is published on the CDC Open Data portal (data.cdc.gov),
# which runs on Socrata. No API key is required for public reads; an
# app token raises the throttling ceiling but is optional.
#
# The census-tract release is one dataset; each row is a single
# tract-by-measure estimate. We pull one measure for all tracts in a
# county (FIPS), rank them, and summarize the spread of the modeled
# prevalence. The dataset resource id below is the tract-level file;
# confirm the current id on data.cdc.gov, since CDC re-issues the
# resource id with each annual release.
BASE = "https://data.cdc.gov/resource"
RESOURCE = "cwsq-ngmh"          # PLACES: census tract data (annual release)


def fetch_measure(county_fips, measure_id, limit=50000):
    # SoQL query: one measure, one county, data-value rows only.
    # county_fips is the 5-digit state+county code (e.g. "17031" = Cook County, IL).
    params = {
        "$where": (
            f"countyfips = '{county_fips}' "
            f"and measureid = '{measure_id}' "
            "and data_value is not null"
        ),
        "$select": ("locationname, measureid, data_value, "
                    "low_confidence_limit, high_confidence_limit, "
                    "totalpopulation, totalpop18plus"),
        "$limit": limit,
    }
    r = requests.get(f"{BASE}/{RESOURCE}.json", params=params, timeout=120)
    r.raise_for_status()
    return pd.DataFrame(r.json())


def analyze(county_fips, measure_id):
    df = fetch_measure(county_fips, measure_id)
    if df.empty:
        print(f"No estimates for {measure_id} in county {county_fips}.")
        return

    # Socrata returns everything as strings; coerce the numerics.
    for c in ("data_value", "low_confidence_limit",
              "high_confidence_limit", "totalpopulation"):
        df[c] = pd.to_numeric(df[c], errors="coerce")
    df = df.dropna(subset=["data_value"])

    # --- 1. Spread across tracts -----------------------------------------
    lo, hi = df["data_value"].min(), df["data_value"].max()
    med = df["data_value"].median()
    print(f"{measure_id} in {county_fips}: {len(df):,} tracts")
    print(f"  prevalence median {med:.1f}% (range {lo:.1f}% - {hi:.1f}%)")

    # --- 2. Highest-burden tracts ----------------------------------------
    top = df.sort_values("data_value", ascending=False).head(10)
    print("  highest-burden tracts:")
    for _, row in top.iterrows():
        ci = f"({row.low_confidence_limit:.1f}-{row.high_confidence_limit:.1f})"
        print(f"    {row.locationname[:24]:<24} {row.data_value:>5.1f}% {ci}")

    # --- 3. Confidence-interval width: are estimates well-identified? -----
    # Wide intervals flag tracts where the model is least certain.
    df["ci_width"] = df["high_confidence_limit"] - df["low_confidence_limit"]
    print(f"  median CI width: {df['ci_width'].median():.1f} pts "
          f"(max {df['ci_width'].max():.1f})")
    return df


# Example: diabetes among adults (DIABETES) across Cook County, IL tracts.
analyze("17031", "DIABETES")
# analyze("06037", "CSMOKING")   # current smoking, Los Angeles County, CA

Two notes on using this in earnest. First, the confidence-interval-width diagnostic in the script is not decoration—it is the honest way to read a small-area file. The tracts with the widest intervals are the ones where the model had the least to go on, and an analyst who ranks tracts by point estimate without ever looking at interval width will treat the model's most uncertain guesses as though they were as solid as its best-identified ones. Sorting or filtering on ci_width surfaces exactly the tracts whose ranking should be held loosely. Second, for anything beyond a single county—a state-wide or national equity analysis, or one that joins PLACES to ACS and a vulnerability index across tens of thousands of tracts—the per-county API pull becomes slow, and the better path is to download the full tract-level release as a bulk CSV from data.cdc.gov and work from it locally, joining on the eleven-digit tract FIPS code. The API is ideal for exploration and for keeping a single jurisdiction current; the bulk file is the right tool for national-scale work.
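For the bulk-file route, a chunked read keeps memory use bounded while filtering down to one measure. This is a sketch under assumptions: the function name load_measure is invented, and the column names follow the lowercase schema listed earlier in this article, so they should be checked against the header of the actual downloaded CSV. A small in-memory sample stands in for the real file so the example runs on its own.

```python
import io

import pandas as pd


def load_measure(csv_source, measure_id, chunksize=100_000):
    """Stream a large tract-level CSV and keep only rows for one measure."""
    kept = []
    reader = pd.read_csv(
        csv_source,
        dtype={"locationid": str, "countyfips": str},  # preserve leading zeros
        chunksize=chunksize,
    )
    for chunk in reader:
        kept.append(chunk[chunk["measureid"] == measure_id])
    return pd.concat(kept, ignore_index=True)


# Tiny stand-in for the bulk download; values are invented.
sample = io.StringIO(
    "countyfips,locationid,measureid,data_value\n"
    "06037,06037101110,DIABETES,11.4\n"
    "06037,06037101110,CSMOKING,9.7\n"
    "17031,17031010100,DIABETES,10.2\n"
)
diabetes = load_measure(sample, "DIABETES", chunksize=2)
print(diabetes)
```

Reading the FIPS columns as strings matters for the later join: a tract code parsed as an integer loses its leading zero for states with codes below 10.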

Limitations and analytical caveats

PLACES is the most complete neighborhood-level health resource the United States has, and it is precisely because it is so usable that its limits must be held firmly in mind. Several go beyond the central modeled-estimate caveat already stressed.

Model smoothing limits the discovery of local anomalies. The same partial pooling that makes the estimates stable also makes them conservative: a tract that is genuinely exceptional relative to its demographic profile is pulled toward what the model expects, so PLACES is a poor instrument for discovering a neighborhood whose health departs sharply from its composition. It will faithfully reproduce the broad demographic gradient of disease; it is less able to flag the surprising outlier. Analysts who need to identify locally anomalous neighborhoods (a tract sicker than its income and age predict, perhaps because of a specific environmental exposure) should treat PLACES as a screening layer to be confirmed with direct local data, not as the final word.

Confidence-interval width varies enormously, and a point estimate alone is misleading. Because the survey signal is unevenly distributed, some tracts are far better identified than others, and the confidence intervals reflect that. A ranking of tracts that ignores interval width will routinely place a tract whose true value could plausibly fall anywhere across a ten-point range above a tract whose value is pinned to within two—a distinction the point estimates conceal. Every serious use of the data should carry the interval alongside the estimate and should be skeptical of fine-grained rankings among tracts whose intervals overlap.

There is a reference-year lag, and the underlying survey shapes the estimate. A PLACES release is built on BRFSS data from an earlier year (or years) combined with a particular census or ACS reference population, so the estimates describe a period that trails the release date, and a single annual snapshot is not a real-time picture of current health. Moreover, every estimate inherits the properties of the survey beneath it: BRFSS measures are self-reported (diagnosed conditions and reported behaviors, not clinical measurements), so they reflect access to diagnosis as well as true prevalence. A community with poor access to care may show a lower diagnosed-diabetes rate not because it is healthier but because its disease is undiagnosed. The estimate is only ever as good as the survey question and the population that answered it.

A prevalence is not a count, and tract boundaries change. The estimates are percentages of a population; turning them into a number of affected people requires multiplying by a denominator that is itself an estimate, and the resulting figure carries the uncertainty of both. Census tract boundaries are also redrawn with each decennial census, so a longitudinal comparison of the same tract across releases that span a re-tracting must account for the fact that “the same” tract FIPS code may cover a different piece of ground. This is another reason PLACES is built for cross-sectional spatial comparison far more than for tract-level time series.

Held with these caveats in mind, cdc_places_tract is a uniquely valuable resource: a complete, gap-free, neighborhood-resolution map of American health—roughly 3.05 million modeled tract-by-measure estimates that make it possible, for the first time at national scale, to set the burden of chronic disease beside the social conditions of the very neighborhoods that bear it, so long as every value is read for what it is—a careful statistical estimate of a pattern, not a count of people.

Related writing

CDC Nutrition, Physical Activity, and Obesity: The Federal Surveillance Record of American Health Behavior — The state-level surveillance companion to PLACES's tract-level model: where NPAO tracks the directly measured behavior trends behind obesity across states and years, PLACES projects the same family of behaviors down to the neighborhood, and the two read best together as the macro and micro views of the same chronic-disease story.

CDC Injury Mortality: The Federal Record of How Americans Die from Firearms, Overdoses, and Crashes — A counterpoint in CDC's data ecosystem built on counted deaths rather than modeled prevalence, showing how the agency pairs hard mortality records with the small-area behavioral and chronic-disease estimates that anticipate them.

SAMHSA Treatment Data: The Federal Database Behind Substance Abuse and Mental Health Program Statistics — Where PLACES models the prevalence of depression, frequent mental distress, and binge drinking at the neighborhood level, SAMHSA records the treatment system that responds to behavioral-health need, and joining estimated need to delivered care is a natural next analytic step.