
SEC EDGAR Company Registry: The Federal Index That Resolves Every Public Company

10 min read · AI Analytics
SEC · EDGAR · CIK · Public Companies · Federal Data

Every other SEC dataset — insider trades on Form 4, institutional holdings on Form 13F, fund portfolios on N-PORT, private placements on Form D, the structured financial facts behind every annual report, the 8-Ks announcing every material event — identifies the company it concerns by a single number: its Central Index Key. The SEC EDGAR company registry is the master index that turns that number into an entity. The table described here holds 28,392 companies, each row carrying the CIK, the current name, the ticker, the industry code, the state of incorporation, the fiscal year end, the exchange, the addresses, every former name the entity has used, and whether it is still filing — the lookup that makes the rest of the SEC corpus joinable.

What It Is

The EDGAR company registry is the canonical roster of every entity that has ever been an SEC filer. EDGAR — the Electronic Data Gathering, Analysis, and Retrieval system — is the SEC's filing infrastructure, and before any entity can submit a single document to it, the SEC assigns that entity a Central Index Key. The CIK is the primary key of the entire EDGAR universe. The company registry is simply the table of those keys and the descriptive attributes attached to each: who the filer is, what industry it operates in, where it is incorporated, and the window of time over which it has been active.

It is important to be precise about what an “entity” is here, because the registry is broader than the phrase “public company” suggests. A CIK is assigned to anything that files. That includes operating companies with listed stock, but it also includes registrants that have never had a publicly traded share: companies that filed a registration statement and then withdrew, special-purpose vehicles that issue asset-backed securities, foreign private issuers, investment-company trusts that umbrella dozens of mutual fund series, and the individual people who file insider-transaction forms. The 28,392 companies in this table are the corporate-entity subset of EDGAR — the issuers and operating registrants — rather than the full filer population, which also contains hundreds of thousands of natural-person reporting owners. When this article says “company,” it means an entity in that corporate subset.

The registry exists because EDGAR was built filing-first. A document is the atomic unit; the company is a layer of metadata wrapped around the documents a given CIK has submitted. That design has a consequence that shapes everything downstream: the registry is the only place the SEC consolidates per-entity identity. There is no separate “list of public companies” the agency maintains independently of who happens to be filing. The company registry, derived from the CIK assignment process and the headers of the filings themselves, is that list.

The CIK Is the Universal Join Key

The single most important fact about this dataset is that the CIK is the join key for the entire SEC. Every structured SEC dataset is keyed, directly or indirectly, on CIK, and the company registry is the dimension table that gives those facts a name, an industry, and a history. Without the registry, the rest of the SEC corpus is a collection of fact tables referencing opaque integers; with it, every one of those integers resolves to a company.

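Consider what the CIK ties together across the corpus: insider trades on Form 4, institutional holdings on Form 13F, fund portfolios on N-PORT, private placements on Form D, the structured financial facts behind annual reports, and the material events announced on 8-Ks.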

The practical upshot is that the company registry is the spine of any cross-filing analysis. A question like “show me every SEC disclosure touching this company — its insider trades, the funds that hold it, its private raises, its material events — on one timeline” is answerable precisely because all of those datasets share the CIK, and the registry is what lets you start from a human concept (a company, a ticker, an industry) and fan out into the fact tables. Start from the wrong key — a company name, a ticker — and the joins break in exactly the ways the next sections describe.
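As a minimal sketch of that fan-out, the snippet below builds two tiny in-memory tables and joins them on CIK. The fact table insider_trades, its column names, and every row are hypothetical sample data chosen for illustration; only the shape of the join is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sec_companies (cik INTEGER PRIMARY KEY, name TEXT, ticker TEXT);
    -- Hypothetical fact table keyed on issuer CIK (illustrative only).
    CREATE TABLE insider_trades (issuer_cik INTEGER, trade_date TEXT, shares INTEGER);
""")
conn.executemany(
    "INSERT INTO sec_companies VALUES (?, ?, ?)",
    [(1001, "Example Corp", "EXM"), (1002, "Sample Holdings", None)],
)
conn.executemany(
    "INSERT INTO insider_trades VALUES (?, ?, ?)",
    [(1001, "2024-03-01", 500), (1001, "2024-01-15", 200), (1002, "2024-02-10", 50)],
)


def trades_for_ticker(conn, ticker):
    """Start from a human-facing ticker, but let the CIK carry the join."""
    sql = """
        SELECT c.cik, c.name, t.trade_date, t.shares
        FROM sec_companies AS c
        JOIN insider_trades AS t ON t.issuer_cik = c.cik
        WHERE c.ticker = ?
        ORDER BY t.trade_date
    """
    return conn.execute(sql, (ticker,)).fetchall()


rows = trades_for_ticker(conn, "EXM")
for row in rows:
    print(row)
```

The ticker appears only in the WHERE clause as a lookup input; the relationship between the two tables is expressed entirely through the CIK.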

The Fields

Each row of the registry describes one entity. The schema clusters into identity, industry, legal domicile, and lifecycle.

-- sec_companies: 28,392 companies
-- One row per entity registered as an EDGAR filer, keyed by CIK.

cik                      INTEGER  -- Central Index Key: the permanent, unique filer ID
ticker                   TEXT     -- primary exchange ticker, where one exists (often NULL)
name                     TEXT     -- current legal/registered name of the entity
sic_code                 INTEGER  -- 4-digit Standard Industrial Classification code
sic_description          TEXT     -- human-readable SIC industry label
state_of_incorporation   TEXT     -- 2-char code: DE, NV, MD, ... or a country for foreign filers
fiscal_year_end          TEXT     -- fiscal year-end as MMDD, e.g. "1231" or "0630"
exchange                 TEXT     -- listing venue: Nasdaq, NYSE, CBOE, OTC, or NULL
business_address         TEXT     -- principal executive office address
mailing_address          TEXT     -- mailing address (often identical to business address)
former_names             TEXT     -- prior names with the dates each was in effect (JSON/array)
is_active                BOOLEAN  -- TRUE if the entity is currently filing
first_filing_date        DATE     -- date of the earliest filing on record for this CIK
last_filing_date         DATE     -- date of the most recent filing on record for this CIK

Why CIK Is Stable and Tickers Are Not

The contrast between the cik field and the ticker field is the central lesson of this dataset. A CIK is permanent and unique: once the SEC assigns it, it never changes and is never reissued to a different entity. A company can rename itself, move its incorporation, change exchanges, get acquired, or delist entirely, and its CIK stays fixed through all of it. That permanence is exactly what makes the CIK a reliable join key — every historical filing the entity ever made remains addressable by the same number.

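Tickers have none of these properties. A ticker is an exchange symbol, not an SEC identifier, and it is messy in several distinct ways: many entities have no ticker at all (the field is often NULL), a symbol can change while the company stays the same, a delisted company's symbol can be reassigned to a different company, and share-class suffixes are written with different separators (BRK-B versus BRK.B) depending on the source.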

The rule that falls out of this: identify a company by its CIK whenever you intend to join, compare across time, or build anything durable. Use the ticker only as a human-facing label or as the input to a lookup — never as the key itself.
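One practical wrinkle is that the same CIK shows up in several spellings: a bare integer, a zero-padded ten-digit string, or a string with a CIK prefix as in the submissions file names. The helper below is a small sketch of normalizing all of these to one integer key before joining; the function names normalize_cik and cik10 are illustrative, not part of any SEC tooling.

```python
import re


def normalize_cik(value):
    """Return the CIK as an int from 320193, "0000320193", or "CIK0000320193"."""
    if isinstance(value, int):
        cik = value
    else:
        match = re.fullmatch(r"(?:CIK)?(\d{1,10})", str(value).strip(),
                             flags=re.IGNORECASE)
        if match is None:
            raise ValueError(f"not a recognizable CIK: {value!r}")
        cik = int(match.group(1))
    if cik <= 0:
        raise ValueError(f"CIK must be positive: {value!r}")
    return cik


def cik10(value):
    """Return the zero-padded ten-digit string form used in EDGAR URLs."""
    return f"{normalize_cik(value):010d}"


print(normalize_cik("CIK0000320193"), cik10(320193))
```

Storing the integer and padding only at the URL boundary keeps joins cheap and avoids mismatches between padded and unpadded strings.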

SIC Industry Codes and Their Quirks

The sic_code field is a four-digit Standard Industrial Classification code, with sic_description carrying the matching label. SIC is the industry taxonomy the SEC uses to classify filers: 7372 is “Services-Prepackaged Software,” 2834 is “Pharmaceutical Preparations,” 6022 is “State Commercial Banks,” 1311 is “Crude Petroleum & Natural Gas.” The first two digits identify the major group, which falls within a broad division (manufacturing, finance, services, mining, and so on), and the remaining digits narrow it. For the SEC's purposes the SIC code is the primary lens for grouping companies into peer sets, and it is the field that makes industry-level analysis of the whole corpus possible.
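To make the hierarchy concrete, here is a short sketch that maps a four-digit code to its broad division using the standard ranges of two-digit major groups. The division labels are abbreviated, the function name sic_division is illustrative, and major groups outside the listed ranges are deliberately left unmapped.

```python
# SIC divisions are defined as ranges of two-digit major groups.
SIC_DIVISIONS = [
    (1, 9, "Agriculture, Forestry, and Fishing"),
    (10, 14, "Mining"),
    (15, 17, "Construction"),
    (20, 39, "Manufacturing"),
    (40, 49, "Transportation, Communications, and Utilities"),
    (50, 51, "Wholesale Trade"),
    (52, 59, "Retail Trade"),
    (60, 67, "Finance, Insurance, and Real Estate"),
    (70, 89, "Services"),
    (91, 97, "Public Administration"),
]


def sic_division(sic_code):
    """Return the broad division for a 4-digit SIC code, or None if unmapped."""
    if sic_code in (None, ""):
        return None
    major_group = int(sic_code) // 100  # first two digits of the 4-digit code
    for low, high, label in SIC_DIVISIONS:
        if low <= major_group <= high:
            return label
    return None  # e.g. major group 99 is not mapped in this sketch


for code in (7372, 2834, 6022, 1311):
    print(code, sic_division(code))
```

The function accepts strings as well as integers because some sources deliver the code as text.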

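SIC carries real quirks that any analyst should hold in mind: the four-digit code is coarse, the taxonomy predates large parts of the modern economy, and a company that has pivoted since registration can carry a code that no longer describes its business.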

State of Incorporation and Delaware Dominance

The state_of_incorporation field records the legal domicile of the entity — the jurisdiction whose corporate law governs it — as a two-character code. For US entities this is a state code; for foreign private issuers it is a country code. The field is distinct from where the company operates or where its headquarters sit (those are in the address fields); incorporation is a purely legal attribute.

The dominant fact in this column is Delaware. A large majority of US public companies are incorporated in Delaware (DE) regardless of where they actually do business, a concentration with no parallel in any other corporate attribute. The reasons are well known: Delaware's Court of Chancery is a specialized business court with centuries of accumulated case law, the Delaware General Corporation Law is flexible and frequently updated, and the predictability of Delaware outcomes is itself valuable to companies and their investors. The result is that incorporation state in the registry is heavily skewed — Delaware first by a wide margin, then a long tail of Nevada, Maryland (favored by REITs and funds for its statutory provisions), and the company's actual home state.

For analysis, the incorporation field supports questions the headquarters address cannot: measuring the Delaware share of a given industry or cohort, identifying the minority of firms that deliberately incorporate elsewhere, and flagging foreign-domiciled issuers (a country code in this field is itself a marker of a foreign private issuer, with the different disclosure regime that implies).
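A sketch of those measurements over a list of registry rows might look like the following. It assumes US domiciles appear as the usual two-letter codes for the 50 states plus DC and treats any other non-empty code as non-US, which would also catch territories; the sample rows, including the placeholder code ZZ, are made up.

```python
from collections import Counter

# Assumption: two-letter codes for the 50 states plus DC count as US.
US_CODES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY DC".split()
)


def incorporation_profile(rows):
    """Summarize domicile over rows that have a non-empty incorporation code."""
    codes = [(r.get("state_of_incorporation") or "").strip().upper() for r in rows]
    known = [c for c in codes if c]
    counts = Counter(known)
    total = len(known)
    if total == 0:
        return {"delaware_share": 0.0, "non_us_share": 0.0, "top": []}
    non_us = sum(n for code, n in counts.items() if code not in US_CODES)
    return {
        "delaware_share": counts["DE"] / total,
        "non_us_share": non_us / total,
        "top": counts.most_common(3),
    }


sample = [
    {"state_of_incorporation": "DE"},
    {"state_of_incorporation": "DE"},
    {"state_of_incorporation": "DE"},
    {"state_of_incorporation": "NV"},
    {"state_of_incorporation": "MD"},
    {"state_of_incorporation": "ZZ"},  # placeholder for a non-US code
    {"state_of_incorporation": None},  # missing values are excluded
]
profile = incorporation_profile(sample)
print(profile)
```

Running the same function per SIC division or per filing cohort gives the Delaware share of that slice.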

Fiscal Year End

The fiscal_year_end field stores the entity's fiscal year-end as a four-character MMDD string. The overwhelmingly common value is 1231, a calendar-year fiscal year, but a meaningful minority of companies close their books on other dates: retailers often end in late January or early February to fall after the holiday selling season, and many technology and education companies use a June 30 (0630) or September 30 (0930) year end.

This field is small but load-bearing for any work that touches financial statements. Comparing two companies' annual results is only valid once you know whether their fiscal years line up; a June-fiscal-year company's “fiscal 2026” covers a different twelve months than a December-fiscal-year company's. The fiscal year end is also what determines when an annual report (10-K) and the related Section 16 annual forms are due, so it anchors the filing calendar for the entity. When the registry is joined to the financial-facts data, the fiscal year end is the field that lets you align periods correctly rather than naively comparing report labels.
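The alignment problem can be sketched in a few lines. The helper below turns an MMDD value into the twelve-month window for a given fiscal year and measures how many days two windows share. It assumes a fiscal year is labeled by the calendar year in which it ends, which is common but not universal, and it ignores 52/53-week calendars; both function names are illustrative.

```python
import calendar
from datetime import date, timedelta


def fiscal_period(fiscal_year_end, fiscal_year):
    """Return (start, end) for the fiscal year ending in calendar year `fiscal_year`."""
    if len(fiscal_year_end) != 4 or not fiscal_year_end.isdigit():
        raise ValueError(f"expected MMDD, got {fiscal_year_end!r}")
    month, day = int(fiscal_year_end[:2]), int(fiscal_year_end[2:])

    def end_in(year):
        # Clamp the day so "0229" still works in non-leap years.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, min(day, last_day))

    end = end_in(fiscal_year)
    start = end_in(fiscal_year - 1) + timedelta(days=1)
    return start, end


def overlap_days(period_a, period_b):
    """Number of calendar days two (start, end) periods have in common."""
    start = max(period_a[0], period_b[0])
    end = min(period_a[1], period_b[1])
    return max(0, (end - start).days + 1)


june = fiscal_period("0630", 2026)
december = fiscal_period("1231", 2026)
print(june, december, overlap_days(june, december))
```

Under these assumptions, a June filer's and a December filer's "fiscal 2026" share only about half a year, which is why report labels alone are not enough to line up periods.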

Former Names: Tracking Renames and Mergers

The former_names field is the registry's memory. While name holds the entity's current registered name, former_names preserves every prior name the same CIK has filed under, typically with the date range each name was in effect. Because the CIK never changes, this field is how you follow an entity through a rename without losing the thread.

This is more useful than it first appears, because corporate name changes are common and carry information: a rename can mark a rebrand, a merger, or the reuse of a dormant shell by a new business.

The broader point is that former_names is what makes name-based history coherent on top of an ID-based system. The CIK guarantees continuity; the former-names list is what lets a human reading the data see the continuity.
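As an illustration of reading that continuity back out, the sketch below answers "what was this CIK called on a given date?" from a former-names list. It assumes each entry carries name, from, and to keys with ISO-formatted dates, mirroring the shape described above; the company names and dates are invented.

```python
def name_as_of(current_name, former_names, as_of):
    """Return the name in effect on `as_of` (YYYY-MM-DD), or None if unknown."""
    latest_end = ""
    for entry in former_names:
        start = (entry.get("from") or "")[:10]
        end = (entry.get("to") or "")[:10]
        latest_end = max(latest_end, end)
        # ISO dates compare correctly as plain strings.
        if start and end and start <= as_of <= end:
            return entry["name"]
    # After the last recorded rename the current name applies; dates that
    # fall before or between the recorded ranges are reported as unknown.
    return current_name if as_of > latest_end else None


former = [
    {"name": "Old Widget Co",
     "from": "2001-05-01T00:00:00.000Z", "to": "2010-03-14T00:00:00.000Z"},
    {"name": "Widget Holdings Inc",
     "from": "2010-03-15T00:00:00.000Z", "to": "2019-08-01T00:00:00.000Z"},
]
for day in ("2005-06-30", "2015-01-01", "2022-01-01", "1999-01-01"):
    print(day, name_as_of("Widget Group Inc", former, day))
```

Returning None for dates outside the recorded ranges is a deliberate choice here: guessing a name for a period the data does not cover would hide the gap.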

Active Versus Inactive Filers

The is_active flag, read together with first_filing_date and last_filing_date, describes the entity's lifecycle on EDGAR. An active filer is one still submitting filings; an inactive CIK belongs to an entity that has stopped because it was acquired, went private, was liquidated, deregistered, or simply went dark. The first and last filing dates bound the window of activity: the earliest document on record for the CIK and the most recent.

Inactive CIKs are not noise to be discarded; they are the historical record. Roughly speaking, the registry accumulates filers and rarely loses them — a delisted company's CIK and all its filings remain permanently addressable. That permanence is the reason the SEC corpus supports longitudinal analysis at all. But it does mean that the registry as a whole is a superset of currently public companies. Filtering on is_active = TRUE narrows to the live universe; leaving it unfiltered gives you the full population, including decades of entities that no longer exist. Which one is correct depends entirely on the question: a point-in-time market view wants active filers, while a study of corporate mortality or shell reuse needs the inactive ones precisely because that is where the history lives.
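When a flag is not available, or its definition is unclear, activity can be approximated from the last filing date. The sketch below uses a recency window to split rows into live and historical sets. The 365-day threshold and the function names are assumptions for illustration; the registry's own is_active flag may be defined differently.

```python
from datetime import date


def looks_active(last_filing_date, as_of, window_days=365):
    """Heuristic: active if the most recent filing is within `window_days`."""
    if last_filing_date is None:
        return False
    return (as_of - last_filing_date).days <= window_days


def split_by_activity(rows, as_of, window_days=365):
    """Partition registry rows into (active, inactive) lists."""
    active, inactive = [], []
    for row in rows:
        if looks_active(row.get("last_filing_date"), as_of, window_days):
            active.append(row)
        else:
            inactive.append(row)
    return active, inactive


sample = [  # made-up rows
    {"cik": 1001, "last_filing_date": date(2024, 11, 5)},
    {"cik": 1002, "last_filing_date": date(2016, 3, 31)},
    {"cik": 1003, "last_filing_date": None},
]
live, historical = split_by_activity(sample, as_of=date(2025, 1, 15))
print([r["cik"] for r in live], [r["cik"] for r in historical])
```

Passing the reference date explicitly keeps the split reproducible, so a point-in-time study can be rerun later with the same result.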

The Ticker-to-CIK Lookup Problem

Because the CIK is the join key but humans and market-data systems speak in tickers, the most common first step in any SEC analysis is resolving a ticker to a CIK — and it is harder than it looks, for all the reasons tickers are messy. The SEC provides two official mechanisms.

The company_tickers.json file. The SEC publishes a canonical mapping at https://www.sec.gov/files/company_tickers.json: a single JSON document associating each currently listed ticker with its CIK and company title. This is the authoritative bulk lookup table, the right tool when you need to resolve many symbols at once or build a local ticker-to-CIK index. Its key limitation follows directly from the nature of tickers: it reflects the current mapping. A delisted company's old ticker may be absent, or worse, may now point to whatever company holds that symbol today. The file answers “what CIK does this ticker mean right now,” not “what CIK did this ticker mean in 2012.”

EDGAR full-text and company search. For names rather than tickers, or for entities the ticker file omits, EDGAR's search interfaces resolve a name or partial name to candidate CIKs. Full-text search spans filing contents, and the company-search endpoint matches on entity name — including, usefully, former names — so an entity that has since renamed can still be found by what it used to be called. Name search is fuzzier than the ticker file (multiple entities can share similar names, and disambiguation falls to the analyst), but it reaches the parts of the registry the ticker map cannot, including the large NULL-ticker population.

The robust pattern in practice is layered: try the ticker file first for a clean current symbol; fall back to name search (which sees former names) when the ticker file misses; and, once a CIK is in hand, treat it as the durable identifier for everything that follows. The lookup is a one-time cost paid at the boundary between the human-readable world and the CIK-keyed world of the actual data.
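The layered pattern can be sketched offline with two local indexes: a ticker map and a name index that includes former names. Everything here is a simplification under stated assumptions: name matching is exact rather than fuzzy, the helper names are hypothetical, and all sample entries other than Apple's CIK are invented.

```python
def normalize_ticker(symbol):
    """Uppercase and use a hyphen for share-class suffixes (BRK.B -> BRK-B)."""
    return symbol.strip().upper().replace(".", "-")


def build_name_index(companies):
    """Map each lowercase current or former name to the set of CIKs using it."""
    index = {}
    for company in companies:
        for name in [company["name"], *company.get("former_names", [])]:
            index.setdefault(name.strip().lower(), set()).add(company["cik"])
    return index


def resolve(query, ticker_map, name_index):
    """Try the ticker map first, then fall back to current and former names."""
    cik = ticker_map.get(normalize_ticker(query))
    if cik is not None:
        return cik, "ticker"
    candidates = name_index.get(query.strip().lower(), set())
    if len(candidates) == 1:
        return next(iter(candidates)), "name"
    if len(candidates) > 1:
        return None, "ambiguous"  # disambiguation falls to the analyst
    return None, "unresolved"


ticker_map = {"AAPL": 320193, "EXM-B": 1001}
name_index = build_name_index([
    {"cik": 1001, "name": "Example Corp", "former_names": ["Old Example Inc"]},
    {"cik": 1002, "name": "Sample Holdings", "former_names": ["Shared Name Co"]},
    {"cik": 1003, "name": "Shared Name Co", "former_names": []},
])
for q in ("aapl", "exm.b", "Old Example Inc", "Shared Name Co", "Nobody LLC"):
    print(q, resolve(q, ticker_map, name_index))
```

The second element of the result records which layer produced the match, which makes it easy to audit how many resolutions depended on the fuzzier name path.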

What You Can Do With It

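The registry's value is almost entirely as connective tissue: it is the table that makes the rest of the SEC usable. Concretely, it supports resolving tickers and names to CIKs, attaching a name and industry to the CIK-keyed fact tables, building SIC-based peer sets, measuring incorporation patterns, aligning fiscal periods across companies, following an entity through renames, and separating live filers from the historical population.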

Python: Resolving Tickers to CIKs and Pulling Filing History

The script below demonstrates the full entity-resolution workflow against the live SEC. It downloads the canonical company_tickers.json to build a ticker-to-CIK map, resolves a set of tickers to their zero-padded ten-digit CIKs, and then pulls each company's submission record from the EDGAR submissions API — extracting the identity fields, the SIC code, the fiscal year end, the former-name history with date ranges, and a summary of the forms the entity recently filed. The submissions API is the per-CIK companion to the registry: the registry tells you the entity exists and what it is, and the submissions record gives you its full filing history.

import json
import time
import requests

# ---------------------------------------------------------------------------
# SEC EDGAR Company Resolution: ticker -> CIK -> filing history
# Sources:
#   Ticker map:    https://www.sec.gov/files/company_tickers.json
#   Submissions:   https://data.sec.gov/submissions/CIK{cik:010d}.json
#
# Strategy:
#   1. Download the canonical company_tickers.json and build a ticker -> CIK map.
#   2. Resolve one or more tickers to their zero-padded 10-digit CIKs.
#   3. Pull each company's submission record: identity fields, former names,
#      SIC code, fiscal year end, and the full recent filing history.
#   4. Summarize the filer: what it is, when it started, and which forms it files.
# ---------------------------------------------------------------------------

HEADERS  = {"User-Agent": "research@example.com (edgar entity-resolution project)"}
DATA_API = "https://data.sec.gov"
WWW      = "https://www.sec.gov"


# -- 1. Build the ticker -> CIK lookup table ---------------------------------

def load_ticker_map():
    """Return {TICKER: cik_int} from the canonical SEC company_tickers.json."""
    url  = WWW + "/files/company_tickers.json"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()  # fail loudly on 403/429 rather than parsing an error page
    data = resp.json()
    # The file is keyed by arbitrary integer string indices, not by ticker.
    out = {}
    for row in data.values():
        # row = {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."}
        out[row["ticker"].upper()] = int(row["cik_str"])
    return out


def ticker_to_cik10(ticker, ticker_map):
    """Resolve a ticker to a zero-padded 10-digit CIK string, or None."""
    cik = ticker_map.get(ticker.upper())
    return str(cik).zfill(10) if cik is not None else None


# -- 2. Pull and summarize a company's submission record ---------------------

def company_profile(cik10):
    """Fetch the EDGAR submissions record and extract identity + history."""
    url  = DATA_API + "/submissions/CIK" + cik10 + ".json"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()  # fail loudly on 403/404/429 rather than parsing an error page
    data = resp.json()

    # Former names carry the date range each name was in effect.
    former = []
    for fn in data.get("formerNames", []):
        former.append({
            "name": fn.get("name", ""),
            "from": (fn.get("from") or "")[:10],
            "to":   (fn.get("to")   or "")[:10],
        })

    recent = data.get("filings", {}).get("recent", {})
    forms  = recent.get("form", [])
    dates  = recent.get("filingDate", [])

    form_counts = {}
    for f in forms:
        form_counts[f] = form_counts.get(f, 0) + 1

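    tickers   = data.get("tickers", []) or []
    exchanges = data.get("exchanges", []) or []

    # Heuristic (an assumption, not an SEC definition): treat the filer as
    # active if its most recent filing falls within the last 365 days.
    # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    cutoff = time.strftime("%Y-%m-%d", time.gmtime(time.time() - 365 * 86400))

    return {
        "cik":               int(data.get("cik", int(cik10))),
        "name":              data.get("name", ""),
        "tickers":           tickers,
        "exchange":          exchanges[0] if exchanges else None,
        "sic_code":          data.get("sic", ""),
        "sic_description":   data.get("sicDescription", ""),
        "state_of_incorp":   data.get("stateOfIncorporation", ""),
        "fiscal_year_end":   data.get("fiscalYearEnd", ""),
        "is_active":         bool(dates) and max(dates) >= cutoff,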
        "first_filing_date": min(dates) if dates else None,
        "last_filing_date":  max(dates) if dates else None,
        "former_names":      former,
        "form_counts":       form_counts,
    }


# -- Main --------------------------------------------------------------------

TICKERS = ["AAPL", "BRK-B", "META"]  # resolve any set of exchange symbols

print("Loading company_tickers.json ...")
ticker_map = load_ticker_map()
print("  Mapped " + str(len(ticker_map)) + " tickers to CIKs")

for tkr in TICKERS:
    cik10 = ticker_to_cik10(tkr, ticker_map)
    if cik10 is None:
        print("\n" + tkr + ": no CIK found in ticker map (delisted or foreign?)")
        continue

    prof = company_profile(cik10)
    time.sleep(0.2)  # respect SEC fair-access rate limits

    print("\n" + tkr + "  ->  CIK " + cik10)
    print("  Name:        " + prof["name"])
    print("  SIC:         " + str(prof["sic_code"]) + "  " + prof["sic_description"])
    print("  Incorp:      " + (prof["state_of_incorp"] or "?")
          + "   FY end: " + (prof["fiscal_year_end"] or "?")
          + "   Exch: " + (prof["exchange"] or "?"))
    print("  Filing span: " + str(prof["first_filing_date"])
          + "  ->  " + str(prof["last_filing_date"])
          + "   active: " + str(prof["is_active"]))

    if prof["former_names"]:
        print("  Former names:")
        for fn in prof["former_names"]:
            span = fn["from"] + " to " + (fn["to"] or "present")
            print("    " + fn["name"] + "  (" + span + ")")

    top_forms = sorted(prof["form_counts"].items(),
                       key=lambda kv: kv[1], reverse=True)[:6]
    print("  Top recent forms: "
          + ", ".join(f + " x" + str(n) for f, n in top_forms))

A few implementation notes. The company_tickers.json file is keyed by arbitrary integer indices rather than by ticker, so the loader iterates its values and re-keys on the symbol; note also that the SEC writes some tickers with a hyphen for share-class suffixes (for example BRK-B), and external data sources may use a dot instead, so a real resolver normalizes the separator before lookup. The submissions API returns the most recent filings inline and pages older history into a separate files array, so the first/last filing dates computed here reflect the inline window unless those extra pages are walked. The CIK must be zero-padded to ten digits for the submissions URL. And, as with all EDGAR access, the SEC's fair-access policy requires a descriptive User-Agent header with a contact email and a request rate no higher than ten per second.
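The note about paged history suggests a cheap improvement: the entries in the files array describe the date range each older page covers, so the true filing span can be estimated without downloading the pages. The sketch below assumes each entry carries name, filingFrom, and filingTo keys, which matches the submissions responses as I understand them but should be checked against a live record; the sample dictionary is synthetic.

```python
def filing_span(submissions):
    """Return (first_date, last_date, page_names) using inline and paged metadata."""
    filings = submissions.get("filings", {})
    dates = list(filings.get("recent", {}).get("filingDate", []))
    pages = filings.get("files", []) or []
    for page in pages:
        # Each page entry is assumed to summarize the range it covers.
        for key in ("filingFrom", "filingTo"):
            if page.get(key):
                dates.append(page[key])
    if not dates:
        return None, None, []
    return min(dates), max(dates), [p.get("name") for p in pages]


sample = {  # synthetic record shaped like a submissions response
    "filings": {
        "recent": {"filingDate": ["2025-01-30", "2024-11-01", "2019-02-14"]},
        "files": [
            {"name": "CIK0000001001-submissions-001.json",
             "filingFrom": "1996-04-02", "filingTo": "2019-02-01"},
        ],
    }
}
first, last, page_names = filing_span(sample)
print(first, last, page_names)
```

The returned page names are the files to fetch from the same submissions path if the full older history is needed, not just its date bounds.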

Caveats

Ticker reuse breaks naive historical joins. A ticker is stable only over the window a given company holds it. Resolving a historical ticker through today's company_tickers.json can silently return the wrong CIK if the symbol has been reassigned, and joining old market data on ticker rather than CIK will mismatch entities across a delisting boundary. The only safe key for anything spanning time is the CIK; the ticker is an input to resolution, never the join key.
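A point-in-time resolver makes the failure mode concrete. The sketch below assumes a hypothetical local history table of ticker assignments with validity dates, which the SEC ticker file does not provide and which would have to come from another source; the symbol, CIKs, and dates are invented.

```python
# Hypothetical history: (ticker, cik, valid_from, valid_to); None means still held.
TICKER_HISTORY = [
    ("ABC", 1111, "2005-01-01", "2013-06-30"),  # original holder, later delisted
    ("ABC", 2222, "2016-02-01", None),          # symbol reassigned to a new company
]


def cik_for_ticker_as_of(ticker, as_of, history=TICKER_HISTORY):
    """Return the CIK that held `ticker` on `as_of` (YYYY-MM-DD), or None."""
    symbol = ticker.strip().upper()
    matches = [
        cik
        for tkr, cik, start, end in history
        if tkr == symbol and start <= as_of and (end is None or as_of <= end)
    ]
    if len(matches) > 1:
        raise ValueError(f"overlapping assignments for {symbol} on {as_of}")
    return matches[0] if matches else None


for day in ("2012-05-01", "2014-09-01", "2020-01-01"):
    print(day, cik_for_ticker_as_of("abc", day))
```

A lookup against only the current mapping would return 2222 for all three dates, silently attaching the 2012 data to the wrong entity.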

SIC is coarse and dated. A four-digit SIC code places a company in a rough industry neighborhood, not a precise business line, and the taxonomy predates large parts of the modern economy. Peer sets built purely on SIC will lump together firms with materially different economics and will misclassify companies that have pivoted since registration. For fine-grained sector work, SIC is a starting filter, not a final answer.

Foreign private issuers behave differently. A country code in state_of_incorporation marks a foreign private issuer, which files under a different disclosure regime (annual reports on Form 20-F rather than 10-K, different periodic obligations, and frequently no Section 16 insider reporting). Such entities are present in the registry and join on CIK like any other, but the set of downstream datasets that actually contains rows for them differs, and treating them as if they filed the full domestic form set will produce gaps that are features of the regime, not errors in the data.

Inactive CIKs inflate the population. The registry permanently retains every entity that has ever filed, so the raw 28,392-row count is a superset of currently public companies. Analyses that need the live market must filter on is_active; analyses that ignore the flag will mix in decades of acquired, deregistered, and dormant entities. Conversely, discarding inactive CIKs throws away exactly the history that makes shell-reuse and corporate-mortality analysis possible — so the right treatment depends on the question, and there is no default that is correct for both.


Related writing

For the insider-transaction dataset keyed on the same issuer CIK — every officer, director, and ten-percent owner trade in company stock — see SEC Form 4 Insider Trading: The Federal Database Behind Corporate Insider Stock Transactions.

For the private-markets dataset that resolves issuers through the same registry — exempt offerings and the companies behind them — see SEC Form D: The Private Placement Database Behind $2 Trillion in Annual Exempt Offerings.

For the fund-holdings dataset filed under registrant CIKs that join back to this registry — position-level fund portfolios across the registered fund universe — see SEC N-PORT Mutual Fund Holdings: The Federal Database Behind Every Fund Portfolio Position.