NOAA Climate Data: The National Centers for Environmental Information Behind 130 Years of Temperature Records and Climate Normals
Behind every headline about the warmest year on record, every utility load forecast built on 30-year temperature averages, and every insurance actuarial table tied to hurricane recurrence intervals sits the same federal institution: the National Centers for Environmental Information, NOAA's archive for atmospheric, coastal, geophysical, and oceanic data. With more than 150 petabytes under management and more than 25 billion online data requests served each year, NCEI is one of the largest civil science data operations in the world—and most of its holdings are freely accessible to anyone with an internet connection.
What NCEI is and how it was formed
The National Centers for Environmental Information came into existence on 1 July 2015, when NOAA consolidated three previously separate archive programs into a single organization. The predecessor agencies were the National Climatic Data Center (NCDC), headquartered in Asheville, North Carolina; the National Geophysical Data Center (NGDC), based in Boulder, Colorado; and the National Oceanographic Data Center (NODC), operated out of Silver Spring, Maryland. Each had decades of independent history. NCDC dated to 1951 and held the authoritative US surface temperature record. NGDC maintained archives of earthquake, geomagnetic, and solar data. NODC curated ocean temperature profiles, salinity measurements, and biological oceanographic data going back to the Challenger expedition of the 1870s.
The merger was administrative rather than physical: staff and server infrastructure remained distributed across the three original sites, and most data systems kept their own URLs and access protocols. The practical effect was a unified governance structure and a single brand for a collection of archives that had always had overlapping scientific missions. NCEI's headquarters is in Asheville, inheriting that designation from the old NCDC, and the Asheville facility remains the center of the atmospheric climate archive.
The 150-petabyte figure encompasses data holdings across all three legacy centers: atmospheric observations from surface stations, radiosondes, and satellites; ocean temperature and salinity profiles from floats, ships, and buoys; seismic waveforms and geomagnetic field measurements; paleoclimate proxies including ice cores, tree rings, and coral records; and derived products like gridded analyses, climate normals, and reanalysis datasets. The atmospheric surface station archive—the piece most visible to the public—is a fraction of the whole but is by far the most heavily accessed portion of the collection.
The Global Historical Climatology Network
The backbone of NCEI's surface temperature record is the Global Historical Climatology Network, commonly abbreviated GHCN. There are two principal versions: GHCN-Daily and GHCN-Monthly, which serve different analytical purposes and have different station populations.
GHCN-Daily aggregates daily observations from approximately 120,000 stations worldwide. The core variables are maximum temperature (TMAX), minimum temperature (TMIN), precipitation (PRCP), snowfall (SNOW), and snow depth (SNWD). Not every station reports every variable, and station records range from a few years to more than a century. The US portion is the densest, drawing from the National Weather Service Cooperative Observer Program, first-order airport stations, and automated surface observation systems. Many non-US records come from national meteorological services that exchange data under World Meteorological Organization agreements.
GHCN-Monthly is a smaller, more carefully curated dataset of approximately 7,280 stations with monthly averages. Some European stations in GHCN-Monthly have continuous records extending to 1763, making them among the longest instrumental climate records in the world. US stations are generally available from 1895. The monthly dataset is the primary input to large-scale surface temperature analyses because its homogenization has been more thoroughly validated than the daily version.
Quality control and homogenization are where the scientific complexity of the GHCN record lies. Raw station records contain artifacts that are not climatic: time-of-observation changes (when a thermometer is read in the morning versus the afternoon affects the recorded daily maximum and minimum), station relocations (moving a thermometer from a rooftop to a field changes its microclimate), equipment replacements (switching from liquid-in-glass to electronic sensors), and urbanization (growing city heat islands warming nearby stations). NCEI applies automated and manual procedures to detect and correct these artifacts—a process called homogenization. The adjustments are published alongside the data, allowing researchers to examine what was changed and why.
GHCN is not NCEI's private product. Berkeley Earth, an independent research organization, and NASA's Goddard Institute for Space Studies both use GHCN station data as input to their own global surface temperature analyses. The three groups (NOAA, NASA GISS, and Berkeley Earth) apply different homogenization methods and station selection criteria to overlapping station networks, yet they produce very similar long-term temperature trend estimates. That agreement is the strongest evidence that the warming signal is real and not an artifact of any single organization's processing choices.
US Climate Normals
Climate Normals are 30-year averages of meteorological variables computed for individual stations, updated each decade. The World Meteorological Organization established the 30-year averaging convention to provide a stable reference period long enough to smooth out interannual variability while short enough to reflect relatively contemporary conditions. The current standard is the 1991–2020 Normals, published by NCEI in May 2021. The previous standard was 1981–2010.
NCEI publishes Normals for more than 15,000 US stations. The variables covered include monthly and annual temperature (mean, maximum, minimum), precipitation, snowfall, snow depth, heating degree days, and cooling degree days. Heating degree days (HDD) count how many degrees below 65°F the daily mean temperature falls; cooling degree days (CDD) count how far above 65°F. Both metrics translate temperature into energy demand: a winter with 4,500 HDD requires roughly three times the heating energy of one with 1,500 HDD.
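The degree-day arithmetic can be sketched in a few lines of Python. This is a minimal illustration of the definition only: the function name `degree_days` and the sample temperatures are invented for the example, and NCEI's published Normals involve additional processing that is not reproduced here.

```python
def degree_days(daily_mean_f, base_f=65.0):
    """Return (HDD, CDD) totals for a sequence of daily mean temperatures in deg F."""
    # Each day contributes only the part of its departure on the relevant side of the base
    hdd = sum(max(base_f - t, 0.0) for t in daily_mean_f)
    cdd = sum(max(t - base_f, 0.0) for t in daily_mean_f)
    return hdd, cdd


# Hypothetical week of daily mean temperatures (deg F), invented for illustration
week = [30.0, 42.5, 65.0, 71.0, 58.0, 80.0, 65.5]
hdd, cdd = degree_days(week)
print(f"HDD = {hdd:.1f}, CDD = {cdd:.1f}")
```

A day at exactly 65°F contributes to neither total, and a single day never contributes to both.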
The aphorism “climate is what you expect; weather is what you get” captures the function of Normals precisely. They are the expected value against which any observation is measured. When a meteorologist says a city is running five degrees above normal for the month, the reference is the 30-year Normal for that station and month.
The applications of Normals extend well beyond weather reporting. Electric utilities use Normals for long-run load forecasting: a new building's annual energy budget is modeled using the HDD/CDD Normals for its location, not historical extremes. Farmers use precipitation and temperature Normals for planting decisions: the last spring frost date and first fall frost date are Normal-derived statistics that determine the length of the growing season. Property and casualty insurers reference precipitation and wind Normals in actuarial models. Building energy codes in many jurisdictions incorporate HDD/CDD Normals to set insulation and HVAC efficiency requirements calibrated to local climate.
Because Normals shift each decade as a new decade of data enters the 30-year window, the 1991–2020 Normals are warmer than the 1981–2010 Normals at almost every US station. This has practical consequences: weather that would have been described as above-normal under the old Normals may now be classified as near-normal under the new ones, effectively normalizing warming at the margins of the distribution.
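The effect of the moving window can be seen with a toy series. The sketch below assumes a synthetic station whose July mean rises by a steady 0.05°F per year; the series, the helper name `normal`, and the observed value are all invented for illustration and are not NCEI data.

```python
def normal(values_by_year, start, end):
    """Mean of the values for years start..end inclusive (30 years when end - start == 29)."""
    window = [values_by_year[year] for year in range(start, end + 1)]
    return sum(window) / len(window)


# Synthetic July mean temperatures (deg F) with a steady 0.05 deg F/yr trend
july = {year: 75.0 + 0.05 * (year - 1981) for year in range(1981, 2021)}

old_normal = normal(july, 1981, 2010)
new_normal = normal(july, 1991, 2020)

observed = 77.0  # hypothetical July mean for some later year
print(f"Departure vs 1981-2010 Normal: {observed - old_normal:+.2f} F")
print(f"Departure vs 1991-2020 Normal: {observed - new_normal:+.2f} F")
```

With this trend the newer Normal is half a degree warmer than the older one, so the same observation reads half a degree closer to normal after the update.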
The global surface temperature record
NOAA's global surface temperature product is called NOAAGlobalTemp. It combines GHCN-Monthly land surface temperatures with the Extended Reconstructed Sea Surface Temperature dataset (ERSST) to produce a globally complete monthly temperature anomaly record extending back to 1854. Anomalies are expressed relative to the 1901–2000 average.
For the contiguous United States specifically, NCEI also maintains nClimGrid-Daily and nClimDiv. nClimDiv is the workhorse dataset for US climate monitoring: it provides temperature and precipitation values for 344 climate divisions (subnational regions defined by approximate climate homogeneity) at monthly resolution back to 1895. State and national averages are computed from nClimDiv and are the figures cited in NCEI's monthly and annual State of the Climate reports.
The numbers from these datasets have become central reference points in climate science communication. 2023 was the warmest year in the global instrumental record by a substantial margin, approximately 1.45°C above the pre-industrial baseline estimated for 1850–1900. The El Niño years of 2016 and 2023 consistently rank at the top of the global anomaly list, but the underlying trend is visible even in non-El Niño years: the ten warmest years in the NOAAGlobalTemp record have all occurred since 2005. For the contiguous United States, the average annual temperature has risen approximately 3.2°F (1.8°C) since reliable records began in 1901.
Climate extremes and the billion-dollar disaster database
NCEI maintains several products specifically designed to track the frequency and intensity of climate extremes.
The U.S. Climate Extremes Index (CEI) is a composite indicator that measures the fraction of the contiguous United States experiencing much-above-normal conditions across five components: extreme one-day precipitation, extreme monthly maximum temperature, extreme monthly minimum temperature, drought, and the absence of extreme cold (which is a warming signal rather than a warming harm). The CEI provides a single number summarizing how unusual national-scale climate conditions are in a given period.
The Climate at a Glance (CAAG) tool at ncei.noaa.gov/access/monitoring/climate-at-a-glance/ is the most widely used NCEI interface for non-specialist users. It generates time-series charts of temperature and precipitation anomalies for any US state, climate division, or the national average, for any period from 1895 to the present. A researcher asking whether a specific state has warmed significantly since 1970, or whether a recent decade was the wettest on record for a region, can answer both questions in under a minute with CAAG without writing any code.
The Billion-Dollar Weather and Climate Disasters database is NCEI's most politically visible product. It tracks every US weather and climate disaster since 1980 with CPI-adjusted losses exceeding $1 billion in 2023 dollars, based on insurance industry loss data from Munich Re, Swiss Re, and the Insurance Information Institute, supplemented by FEMA and USDA estimates. Losses are re-adjusted for inflation at each update to allow fair comparison across years.
The trend is clear. During the 1980s, the United States averaged approximately three billion-dollar events per year. The 1990s averaged roughly six. The 2010s averaged approximately fifteen. The period 2017–2023 saw an average of more than twenty events per year. In 2023 alone, NCEI recorded 28 separate billion-dollar events, with combined losses of approximately $94 billion. Single catastrophic years stand out: 2017 produced more than $300 billion in losses from Harvey, Irma, Maria, the California wildfires, and a series of tornado and flooding events—the costliest year in the database's history before adjustment. Across all event types, tropical cyclones account for the largest share of cumulative losses, followed by severe convective storms and flooding.
A related monitoring tool is the record count ratio. On any given day, NCEI tracks the number of daily high temperature records set at climate stations versus the number of daily low temperature records. In a stable climate, these should occur at roughly equal rates. A persistent ratio of two new highs for every new low is a statistically significant warming signal at the national scale, distinct from any particular warm or cold event. That 2:1 ratio has characterized US climate records in most recent decades, and the ratio has widened in the warmest years.
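The counting logic behind such a ratio can be sketched for a single station and calendar day. This is an illustrative simplification, not NCEI's procedure: the national tallies aggregate records over many stations and all days of the year, and the function name `count_records` and the temperature series are hypothetical.

```python
def count_records(series):
    """Count new record highs and lows in a chronological series.

    The first value only initializes the running extremes; ties do not count.
    """
    highs = lows = 0
    run_max = run_min = series[0]
    for value in series[1:]:
        if value > run_max:
            highs += 1
            run_max = value
        elif value < run_min:
            lows += 1
            run_min = value
    return highs, lows


# Hypothetical July 4 daily maximum temperatures (deg F) at one station, oldest year first
july4_tmax = [84, 86, 81, 85, 88, 87, 79, 90, 89, 92]
highs, lows = count_records(july4_tmax)
ratio = highs / lows if lows else float("inf")
print(f"{highs} record highs, {lows} record lows, ratio {ratio:.1f}")
```

In a series with no trend, new highs and new lows are equally likely at each step, which is why a ratio that stays well above one across many stations points to warming rather than chance.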
The NCEI CDO API
NCEI's primary programmatic access path for climate station data is the Climate Data Online (CDO) REST API, available at www.ncdc.noaa.gov/cdo-web/webservices/v2. A free token is required; registration takes under a minute at the CDO web portal and the token is issued immediately by email.
The API is organized around seven endpoints:
- /datasets: lists the available datasets. Key values: GHCND (Global Historical Climatology Network Daily), GSOM (Global Summary of the Month), GSOY (Global Summary of the Year), NORMAL_MLY (monthly Climate Normals), NORMAL_DLY (daily Normals).
- /stations: finds stations matching filters. The most important filters are datasetid (which dataset the station appears in), locationid (FIPS state or county code), and datatypeid (restricts to stations that actually report a given variable).
- /data: the main data endpoint. Required parameters: datasetid, stationid, startdate, enddate. Optional: datatypeid (restrict to specific variables like TMAX, TMIN, PRCP), units (standard or metric), limit (max 1,000 per request).
- /datatypes, /locationcategories, /locations, /datacategories: metadata endpoints for discovering what variables, geographies, and category groupings are available.
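A station search using the three filters above might be assembled as follows. This is a sketch that only builds the request URL with the standard library. The helper name `stations_url` is hypothetical, `FIPS:36` is assumed here to be the location ID for New York State, and actually sending the request still requires the token header.

```python
from urllib.parse import urlencode

CDO_BASE = "https://www.ncdc.noaa.gov/cdo-web/api/v2"


def stations_url(datasetid, locationid, datatypeid, limit=1000):
    """Build a /stations query URL; the caller adds the token header when sending it."""
    query = urlencode({
        "datasetid": datasetid,    # dataset the station must appear in
        "locationid": locationid,  # FIPS state or county code
        "datatypeid": datatypeid,  # variable the station must report
        "limit": limit,            # at most 1,000 records per request
    })
    return f"{CDO_BASE}/stations?{query}"


# Stations in the assumed New York State location that report daily maximum temperature
url = stations_url("GHCND", "FIPS:36", "TMAX")
print(url)
```

`urlencode` percent-encodes the colon in the location ID, which is the standard treatment for query strings.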
The free tier permits 10,000 requests per day and returns up to 1,000 records per request. For long station records (a century of monthly data is 1,200 records), chunking by decade or year keeps each request within the per-request limit. For bulk downloads of the full GHCN-Daily archive, which is very large, NCEI provides FTP access at ftp.ncdc.noaa.gov/pub/data/ghcn/daily/, where each station's complete history is available as a single flat file.
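For readers who take the bulk route, the per-station flat files can be read with plain string slicing. The sketch below assumes the fixed-width layout described in the archive's readme: an 11-character station ID, a 4-digit year, a 2-digit month, a 4-character element code, then 31 daily groups of a 5-character value plus three flag characters, with -9999 marking missing days. Check the readme before relying on it. The function name `parse_dly_line` is hypothetical and the sample record is synthetic.

```python
def parse_dly_line(line):
    """Split one fixed-width GHCN-Daily record into its header fields and 31 daily values."""
    station = line[0:11]
    year = int(line[11:15])
    month = int(line[15:17])
    element = line[17:21]
    values = []
    for day in range(31):
        start = 21 + day * 8  # each day occupies 8 characters: value (5) + 3 flags
        raw = int(line[start:start + 5])
        values.append(None if raw == -9999 else raw)
    return station, year, month, element, values


# Synthetic record built for illustration: two reported days, the rest missing
days = [289, 301] + [-9999] * 29
sample = "USW00094728" + "2023" + "07" + "TMAX" + "".join(f"{v:5d}   " for v in days)

station, year, month, element, values = parse_dly_line(sample)
# Temperature elements are stored in tenths of a degree C
print(station, year, month, element, values[0] / 10, values[1] / 10)
```

Flags are ignored here for brevity; a real reader would also keep the quality flag so that values failing NCEI's checks can be excluded.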
The Python example below fetches monthly mean temperature for Central Park, New York (GHCND:USW00094728—one of the longest continuous urban station records in the US, beginning in 1869) from 1900 to 2023, computes the anomaly from a 1901–1960 baseline, calculates a ten-year rolling mean, and plots the result as a classic “warming stripes”-style bar chart with the rolling trend overlaid.
import requests
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt

# NCEI Climate Data Online (CDO) REST API
# Docs: https://www.ncdc.noaa.gov/cdo-web/webservices/v2
# Register for a free token at: https://www.ncdc.noaa.gov/cdo-web/token
CDO_BASE = "https://www.ncdc.noaa.gov/cdo-web/api/v2"
TOKEN = "your_token_here"  # replace with your registered CDO token
HEADERS = {"token": TOKEN}

# Central Park, New York -- one of the longest continuous US records
# Station ID: GHCND:USW00094728
STATION = "GHCND:USW00094728"
START_YEAR = 1900
END_YEAR = 2023

# The CDO API returns at most 1,000 results per request and caps requests per day.
# For monthly data (GSOM dataset) 1900-2023 is 124 * 12 = 1,488 records.
# We chunk by decade to stay within the per-request limit.
print(f"Fetching NCEI monthly mean temperature for Central Park {START_YEAR}-{END_YEAR} ...")
all_rows = []
for year_start in range(START_YEAR, END_YEAR + 1, 10):
    year_end = min(year_start + 9, END_YEAR)
    params = {
        "datasetid": "GSOM",   # Global Summary of the Month
        "stationid": STATION,
        "datatypeid": "TAVG",  # monthly average temperature (deg C with units=metric)
        "startdate": f"{year_start}-01-01",
        "enddate": f"{year_end}-12-31",
        "limit": 1000,
        "units": "metric",
    }
    resp = requests.get(f"{CDO_BASE}/data", headers=HEADERS,
                        params=params, timeout=60)
    resp.raise_for_status()
    results = resp.json().get("results", [])
    all_rows.extend(results)
    print(f"  {year_start}-{year_end}: {len(results)} records")

df = pd.DataFrame(all_rows)
if df.empty:
    print("No data returned -- check token and station ID.")
    raise SystemExit(1)

df["date"] = pd.to_datetime(df["date"])
df["year"] = df["date"].dt.year
df["tavg_c"] = pd.to_numeric(df["value"], errors="coerce")
# Drop missing values
df = df.dropna(subset=["tavg_c"])

# Annual mean from monthly values. Keep only years with all 12 months so that
# a year missing some winter or summer months does not bias the annual mean.
by_year = df.groupby("year")["tavg_c"].agg(["mean", "count"])
annual = (by_year.loc[by_year["count"] == 12, "mean"]
          .reset_index(name="tavg_annual_c"))

# Baseline 1901-1960: an early-record reference period chosen for this example
# (NOAA's global product uses 1901-2000 instead)
in_baseline = (annual["year"] >= 1901) & (annual["year"] <= 1960)
baseline = annual.loc[in_baseline, "tavg_annual_c"].mean()
print(f"Baseline mean 1901-1960: {baseline:.2f} C")
annual["anomaly_c"] = annual["tavg_annual_c"] - baseline

# 10-year rolling mean of anomaly (centered, so the first and last few years are NaN)
annual["rolling_10yr"] = annual["anomaly_c"].rolling(window=10, center=True).mean()

# Record warm years (top 10); nlargest already returns them in descending order
top10 = annual.nlargest(10, "anomaly_c")[["year", "anomaly_c"]]
print("Top 10 warmest years (anomaly vs 1901-1960 baseline):")
for _, row in top10.iterrows():
    print(f"  {int(row['year'])}  {row['anomaly_c']:+.2f} C")

# Plot: red bars for warm anomalies, blue for cool, rolling mean overlaid
fig, ax = plt.subplots(figsize=(12, 5))
ax.bar(annual["year"], annual["anomaly_c"],
       color=["#cc3300" if v >= 0 else "#0b4a8f" for v in annual["anomaly_c"]],
       width=0.9, alpha=0.6, label="Annual anomaly")
ax.plot(annual["year"], annual["rolling_10yr"],
        color="#1a1a1a", linewidth=2, label="10-year rolling mean")
ax.axhline(0, color="#888", linewidth=0.8, linestyle="--")
ax.set_title("Central Park NYC temperature anomaly vs 1901-1960 baseline (NCEI GSOM)")
ax.set_xlabel("Year")
ax.set_ylabel("Anomaly (deg C)")
ax.legend(fontsize=9)
fig.tight_layout()
fig.savefig("central_park_temperature_anomaly.png", dpi=150)
print("Plot saved: central_park_temperature_anomaly.png")
Related NOAA datasets
The surface temperature record is the most prominent component of NCEI's holdings, but several other datasets in the same archive are scientifically and operationally significant in their own right.
HURDAT2 (the Atlantic Hurricane Database) is maintained by NOAA's National Hurricane Center and covers every known Atlantic tropical cyclone from 1851 to the present. Each storm is represented as a best track: a sequence of six-hourly position fixes with intensity estimates (maximum sustained wind speed and minimum central pressure). Objective intensity estimates are available only from the satellite era onward; for earlier storms the best track relies on ship logs, newspaper accounts, and reconnaissance aircraft data. HURDAT2 is the primary dataset behind Atlantic hurricane climatology research, recurrence interval estimates for coastal engineering, and wind speed return period analyses for insurance underwriting.
NEXRAD Level-II and Level-III radar data form one of the largest environmental data archives in the world by volume. NCEI archives base-scan reflectivity, velocity, and spectrum width data from all 160+ WSR-88D radar sites across the US, Puerto Rico, and US territories, beginning from the mid-1990s when the network was commissioned. Level-II is the raw base-scan data; Level-III is derived products including precipitation estimates, storm relative motion, and severe weather signatures. The archive supports retrospective analysis of past storm events, verification of nowcasting algorithms, and machine learning research into storm-mode classification and intensity estimation.
NOAA Tides and Currents (tidesandcurrents.noaa.gov) maintains a network of more than 100 long-term tide gauge stations around US coasts and territories. Many stations have continuous records extending 50 to 100 years, and a few go back further. The tide gauge record is the primary observational basis for US sea level trend estimates. The national average rate of sea level rise is approximately 3.6 mm per year, but this figure masks substantial geographic variation. Stations on the Atlantic coast, particularly in the mid-Atlantic, record higher rates (in some cases 5–6 mm per year) because land subsidence from glacial isostatic adjustment and groundwater withdrawal adds to the global ocean signal. Gulf Coast stations near river deltas show even higher relative sea level rise due to sediment compaction.
The Cooperative Observer Program (COOP) is the volunteer observer network behind much of the GHCN-Daily US record. Approximately 11,000 observers—farmers, park rangers, teachers, and weather enthusiasts—report daily temperature and precipitation to NCEI using standardized equipment and protocols. Many COOP stations have been in continuous operation for more than a century at the same location, making them irreplaceable for long-term trend detection. A station with 100 continuous years of record at a fixed location, using consistent measurement methods, is scientifically more valuable than a modern automated station installed five years ago, even if the older equipment is less precise. COOP is the human infrastructure that connects the instrumental climate record to the present day.
Limitations and analytical considerations
Station coverage is geographically uneven in ways that affect global analyses. The US, Western Europe, and Australia are densely covered; large portions of Africa, central Asia, and the high Arctic have sparse or intermittent records. Gridded global products like NOAAGlobalTemp interpolate across data-sparse regions, introducing uncertainty that is largest at high latitudes—precisely where warming has been fastest. Uncertainty bounds in the global mean temperature are dominated by these polar coverage gaps.
The CDO API is well-suited to station-level queries but is not designed for bulk extraction of gridded products or radar archives. Researchers working with nClimGrid or NEXRAD data should use NCEI's thematic portals (the Climate Data Portal for gridded products, the NEXRAD archive browser for radar) rather than the CDO API. Attempting to reconstruct national-scale gridded products by querying thousands of stations via the CDO API will exhaust the daily request limit and produce an unevenly sampled result.
The billion-dollar disaster database is CPI-adjusted but not normalized for increases in population, wealth, or insured values. A hurricane striking a coastal city with 2024 population and property values will produce larger losses, even in inflation-adjusted dollars, than the same storm striking the same location in 1960. Research that seeks to detect changes in physical hazard intensity (as opposed to changes in loss magnitude) must normalize for exposure growth before interpreting the trend.
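The gap between the two adjustments can be made concrete with a small calculation. This is a minimal sketch that assumes a simple multiplicative scaling. The function names, the index values, and the loss figure are all hypothetical, and it is not a description of how NCEI or any published normalization study computes its figures.

```python
def cpi_adjust(loss, cpi_then, cpi_now):
    """Express a historical loss in today's dollars (inflation only)."""
    return loss * cpi_now / cpi_then


def exposure_normalize(loss, cpi_then, cpi_now, exposure_then, exposure_now):
    """Inflation-adjust, then scale by growth in real (inflation-adjusted) exposure."""
    return cpi_adjust(loss, cpi_then, cpi_now) * exposure_now / exposure_then


# Hypothetical event: a $2.0 billion loss in a year when the price index stood at 100
loss = 2.0
cpi_then, cpi_now = 100.0, 250.0
# Hypothetical index of real property value in the affected area (already net of inflation)
exposure_then, exposure_now = 1.0, 3.0

adjusted = cpi_adjust(loss, cpi_then, cpi_now)
normalized = exposure_normalize(loss, cpi_then, cpi_now, exposure_then, exposure_now)
print(f"CPI-adjusted loss:        ${adjusted:.1f} billion")
print(f"Exposure-normalized loss: ${normalized:.1f} billion")
```

The exposure index has to be expressed in real terms; using nominal property values would count inflation twice.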
Climate Normals are updated decadally, which means they are always somewhat out of date relative to present conditions. The 1991–2020 Normals will remain the official reference until 2031, when the 2001–2030 Normals are published. In a rapidly changing climate, a 30-year average computed with its most recent data point a decade ago may not accurately represent the expected value for current or near-future conditions—a limitation particularly relevant for engineering applications with 50- or 100-year design horizons.