SEC EDGAR Company Registry: The Federal Index That Resolves Every Public Company
Every other SEC dataset — insider trades on Form 4, institutional holdings on Form 13F, fund portfolios on N-PORT, private placements on Form D, the structured financial facts behind every annual report, the 8-Ks announcing every material event — identifies the company it concerns by a single number: its Central Index Key. The SEC EDGAR company registry is the master index that turns that number into an entity. The table described here holds 28,392 companies, each row carrying the CIK, the current name, the ticker, the industry code, the state of incorporation, the fiscal year end, the exchange, the addresses, every former name the entity has used, and whether it is still filing — the lookup that makes the rest of the SEC corpus joinable.
What It Is
The EDGAR company registry is the canonical roster of every entity that has ever been an SEC filer. EDGAR — the Electronic Data Gathering, Analysis, and Retrieval system — is the SEC's filing infrastructure, and before any entity can submit a single document to it, the SEC assigns that entity a Central Index Key. The CIK is the primary key of the entire EDGAR universe. The company registry is simply the table of those keys and the descriptive attributes attached to each: who the filer is, what industry it operates in, where it is incorporated, and the window of time over which it has been active.
It is important to be precise about what an “entity” is here, because the registry is broader than the phrase “public company” suggests. A CIK is assigned to anything that files. That includes operating companies with listed stock, but it also includes registrants that have never had a publicly traded share: companies that filed a registration statement and then withdrew, special-purpose vehicles that issue asset-backed securities, foreign private issuers, investment-company trusts that umbrella dozens of mutual fund series, and the individual people who file insider-transaction forms. The 28,392 companies in this table are the corporate-entity subset of EDGAR — the issuers and operating registrants — rather than the full filer population, which also contains hundreds of thousands of natural-person reporting owners. When this article says “company,” it means an entity in that corporate subset.
The registry exists because EDGAR was built filing-first. A document is the atomic unit; the company is a layer of metadata wrapped around the documents a given CIK has submitted. That design has a consequence that shapes everything downstream: the registry is the only place the SEC consolidates per-entity identity. There is no separate “list of public companies” the agency maintains independently of who happens to be filing. The company registry, derived from the CIK assignment process and the headers of the filings themselves, is that list.
The CIK Is the Universal Join Key
The single most important fact about this dataset is that the CIK is the join key for the entire SEC. Every structured SEC dataset is keyed, directly or indirectly, on CIK, and the company registry is the dimension table that gives those facts a name, an industry, and a history. Without the registry, the rest of the SEC corpus is a collection of fact tables referencing opaque integers; with it, every one of those integers resolves to a company.
Consider what the CIK ties together across the corpus:
- Form 4 insider transactions. Each Form 4 carries an issuer CIK and a separate reporting-person CIK. The issuer CIK joins directly to the company registry, letting you attach every insider trade to the company's name, industry, and incorporation state.
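- Form 13F institutional holdings. Each 13F is filed under the reporting manager's CIK, which joins directly to the registry. The securities inside the filing are identified by CUSIP rather than CIK, so tying a holding back to its issuer's registry row takes a CUSIP-to-CIK mapping from outside the filing. Once that mapping exists, the registry turns managers and issuers alike into named entities for sector and exposure rollups.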
- N-PORT fund holdings. Every NPORT-P filing is submitted under a registrant CIK — the fund trust — which joins to the registry to identify the fund complex behind a portfolio.
- Form D exempt offerings. Each private-placement notice is filed under the issuer's CIK, so the registry connects a private capital raise to the same entity record used for that company's public filings, if it has any.
- Financial facts (XBRL). The SEC's company-facts API exposes every numeric financial concept — revenue, assets, net income — keyed by CIK. The registry supplies the SIC code that turns those facts into industry-comparable peer sets.
- 8-K current reports and Schedule 13D. Material-event disclosures and activist-stake filings are both filed under the subject company's CIK, so a single CIK assembles a company's entire event and ownership history from otherwise separate datasets.
The practical upshot is that the company registry is the spine of any cross-filing analysis. A question like “show me every SEC disclosure touching this company — its insider trades, the funds that hold it, its private raises, its material events — on one timeline” is answerable precisely because all of those datasets share the CIK, and the registry is what lets you start from a human concept (a company, a ticker, an industry) and fan out into the fact tables. Start from the wrong key — a company name, a ticker — and the joins break in exactly the ways the next sections describe.
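A minimal sketch of that fan-out, using an in-memory SQLite database. Only the sec_companies columns come from the schema described in this article; the form4_trades table, its columns, and the sample trade rows are hypothetical stand-ins for a real fact table.

```python
import sqlite3

# In-memory stand-ins for the registry and one fact table.
# form4_trades and its rows are hypothetical illustration data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sec_companies (cik INTEGER PRIMARY KEY, name TEXT, sic_code INTEGER);
    CREATE TABLE form4_trades (issuer_cik INTEGER, trade_date TEXT, shares INTEGER);
    INSERT INTO sec_companies VALUES (320193, 'Apple Inc.', 3571);
    INSERT INTO form4_trades VALUES (320193, '2024-01-15', 1000);
    INSERT INTO form4_trades VALUES (999999, '2024-02-01', 500);
""")


def trades_with_company(conn):
    """Attach registry attributes to each trade, joining on CIK.

    A LEFT JOIN keeps trades whose issuer CIK is missing from the registry,
    so gaps show up as NULL names instead of silently vanishing.
    """
    sql = """
        SELECT t.issuer_cik, c.name, c.sic_code, t.trade_date, t.shares
        FROM form4_trades AS t
        LEFT JOIN sec_companies AS c ON c.cik = t.issuer_cik
        ORDER BY t.trade_date
    """
    return conn.execute(sql).fetchall()


rows = trades_with_company(conn)
for row in rows:
    print(row)
```

The same join shape applies to any other CIK-keyed fact table; only the fact-table name and its CIK column change.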
The Fields
Each row of the registry describes one entity. The schema clusters into identity, industry, legal domicile, and lifecycle.
-- sec_companies: 28,392 companies
-- One row per entity registered as an EDGAR filer, keyed by CIK.
cik                     INTEGER  -- Central Index Key: the permanent, unique filer ID
ticker                  TEXT     -- primary exchange ticker, where one exists (often NULL)
name                    TEXT     -- current legal/registered name of the entity
sic_code                INTEGER  -- 4-digit Standard Industrial Classification code
sic_description         TEXT     -- human-readable SIC industry label
state_of_incorporation  TEXT     -- 2-char code: DE, NV, MD, ... or a country for foreign filers
fiscal_year_end         TEXT     -- fiscal year-end as MMDD, e.g. "1231" or "0630"
exchange                TEXT     -- listing venue: Nasdaq, NYSE, CBOE, OTC, or NULL
business_address        TEXT     -- principal executive office address
mailing_address         TEXT     -- mailing address (often identical to business address)
former_names            TEXT     -- prior names with the dates each was in effect (JSON/array)
is_active               BOOLEAN  -- TRUE if the entity is currently filing
first_filing_date       DATE     -- date of the earliest filing on record for this CIK
last_filing_date        DATE     -- date of the most recent filing on record for this CIK
Why CIK Is Stable and Tickers Are Not
The contrast between the cik field and the ticker field is the central lesson of this dataset. A CIK is permanent and unique: once the SEC assigns it, it never changes and is never reissued to a different entity. A company can rename itself, move its incorporation, change exchanges, get acquired, or delist entirely, and its CIK stays fixed through all of it. That permanence is exactly what makes the CIK a reliable join key — every historical filing the entity ever made remains addressable by the same number.
Tickers have none of these properties. A ticker is an exchange symbol, not an SEC identifier, and it is messy in several distinct ways:
- Tickers are reused. When a company delists, its symbol is freed and can be reassigned to a completely unrelated company years later. The symbol that pointed to one issuer in 2009 may point to a different issuer today. A ticker is therefore not a stable identifier of an entity over time — only over the window during which a given company held it.
- Tickers change. Companies change symbols on rebranding, on merger, or simply by choice. The ticker in the registry is the current primary symbol; it does not preserve the symbol history the way former_names preserves name history.
- One entity can have several tickers. A company with multiple share classes (a Class A and a Class B common stock, each listed) has multiple tickers but a single CIK. The registry's ticker field carries the primary symbol; the multi-class reality lives in the underlying EDGAR data.
- Many entities have no ticker at all. A large fraction of the 28,392 companies have a NULL ticker, because they have no publicly traded equity: bond-only issuers, withdrawn registrants, holding shells, and securitization vehicles all file with the SEC without ever trading on an exchange. Treating the registry as a list of tradable tickers badly undercounts the entities it actually contains.
The rule that falls out of this: identify a company by its CIK whenever you intend to join, compare across time, or build anything durable. Use the ticker only as a human-facing label or as the input to a lookup — never as the key itself.
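To make the reuse problem concrete, here is a small sketch of date-aware ticker resolution. The registry itself does not carry symbol history, so the TICKER_HISTORY table, its layout, and every value in it are hypothetical; a real version would need a symbol-history source from outside the registry.

```python
from datetime import date

# Hypothetical symbol-history rows: (ticker, cik, held_from, held_to).
# held_to=None means the company still holds the symbol. All values invented.
TICKER_HISTORY = [
    ("ABC", 1111, date(2005, 1, 1), date(2010, 6, 30)),
    ("ABC", 2222, date(2015, 3, 1), None),
]


def cik_for_ticker_on(ticker, as_of, history=TICKER_HISTORY):
    """Return the CIK that held `ticker` on `as_of`, or None if nobody did."""
    symbol = ticker.upper()
    for sym, cik, start, end in history:
        if sym == symbol and start <= as_of and (end is None or as_of <= end):
            return cik
    return None


print(cik_for_ticker_on("ABC", date(2009, 1, 1)))  # first holder of the symbol
print(cik_for_ticker_on("ABC", date(2020, 1, 1)))  # a different, later holder
print(cik_for_ticker_on("ABC", date(2012, 1, 1)))  # symbol unassigned in the gap
```

The same symbol yields two different CIKs depending on the date, which is why the ticker works as an input to a lookup but not as a key.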
SIC Industry Codes and Their Quirks
The sic_code field is a four-digit Standard Industrial Classification code, with sic_description carrying the matching label. SIC is the industry taxonomy the SEC uses to classify filers: 7372 is “Services-Prepackaged Software,” 2834 is “Pharmaceutical Preparations,” 6022 is “State Commercial Banks,” 1311 is “Crude Petroleum & Natural Gas.” The first two digits identify the major group, which places the entity in a broad division (manufacturing, finance, services, mining, and so on), and the remaining digits narrow it. For the SEC's purposes the SIC code is the primary lens for grouping companies into peer sets, and it is the field that makes industry-level analysis of the whole corpus possible.
SIC carries real quirks that any analyst should hold in mind:
- It is old and frozen. The SIC system dates to the 1930s and was effectively frozen decades ago; the federal government formally replaced it with NAICS for statistical purposes in 1997. The SEC nonetheless still uses SIC. The consequence is that the taxonomy predates entire industries: there is no clean SIC code for cloud computing, social media, or much of the modern internet economy, so such companies are mapped into the nearest legacy bucket (often a generic software or business-services code).
- It is coarse. A single four-digit code can cover firms with very different economics. “Services-Prepackaged Software” (7372) sweeps together enterprise software giants, tiny app developers, and gaming companies. SIC tells you the rough neighborhood, not the specific business.
- It is self-reported and sometimes stale. A company's SIC code reflects what it registered as, and a firm that has pivoted its business may carry a code describing what it used to do. The code is not re-audited as the company evolves.
- Special codes exist. Investment vehicles, blank-check companies, and certain financial structures carry codes (the 6000-series financial range is dense with them) that signal a fund or shell rather than an operating business — useful for filtering, but easy to misread as ordinary industry classifications.
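As a rough illustration of how a code maps to its broad division, the sketch below looks up the two-digit major group in a range table. The ranges follow the division structure of the 1987 SIC manual; the function name is hypothetical, and SEC-specific special codes are not treated separately here.

```python
# Major-group (first two digits) ranges for the SIC divisions.
SIC_DIVISIONS = [
    (1, 9, "Agriculture, Forestry, and Fishing"),
    (10, 14, "Mining"),
    (15, 17, "Construction"),
    (20, 39, "Manufacturing"),
    (40, 49, "Transportation, Communications, Electric, Gas, and Sanitary Services"),
    (50, 51, "Wholesale Trade"),
    (52, 59, "Retail Trade"),
    (60, 67, "Finance, Insurance, and Real Estate"),
    (70, 89, "Services"),
    (91, 97, "Public Administration"),
    (99, 99, "Nonclassifiable Establishments"),
]


def sic_division(sic_code):
    """Return the division name for a 4-digit SIC code, or None if unmapped."""
    if sic_code in (None, ""):
        return None
    major_group = int(sic_code) // 100  # e.g. 7372 -> 73
    for low, high, label in SIC_DIVISIONS:
        if low <= major_group <= high:
            return label
    return None


for code in (7372, 2834, 6022, 1311):
    print(code, sic_division(code))
```

Grouping on the division is deliberately coarser than grouping on the full code, so it suits a first-pass sector rollup more than peer-set construction.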
State of Incorporation and Delaware Dominance
The state_of_incorporation field records the legal domicile of the entity — the jurisdiction whose corporate law governs it — as a two-character code. For US entities this is a state code; for foreign private issuers it is a country code. The field is distinct from where the company operates or where its headquarters sit (those are in the address fields); incorporation is a purely legal attribute.
The dominant fact in this column is Delaware. A large majority of US public companies are incorporated in Delaware (DE) regardless of where they actually do business, a concentration with no parallel in any other corporate attribute. The reasons are well known: Delaware's Court of Chancery is a specialized business court with centuries of accumulated case law, the Delaware General Corporation Law is flexible and frequently updated, and the predictability of Delaware outcomes is itself valuable to companies and their investors. The result is that incorporation state in the registry is heavily skewed — Delaware first by a wide margin, then a long tail of Nevada, Maryland (favored by REITs and funds for its statutory provisions), and the company's actual home state.
For analysis, the incorporation field supports questions the headquarters address cannot: measuring the Delaware share of a given industry or cohort, identifying the minority of firms that deliberately incorporate elsewhere, and flagging foreign-domiciled issuers (a country code in this field is itself a marker of a foreign private issuer, with the different disclosure regime that implies).
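Measuring the Delaware share is a simple aggregation. The sketch below assumes the registry rows are available as dictionaries with a state_of_incorporation key; the function name and the sample rows are invented for illustration.

```python
from collections import Counter


def incorporation_shares(rows):
    """Return {code: share} over rows that have a state_of_incorporation.

    Rows with a missing or empty code are left out of the denominator.
    """
    codes = [r["state_of_incorporation"] for r in rows
             if r.get("state_of_incorporation")]
    counts = Counter(codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: n / total for code, n in counts.most_common()}


# Invented sample rows, not real registry data.
sample = [
    {"cik": 1, "state_of_incorporation": "DE"},
    {"cik": 2, "state_of_incorporation": "DE"},
    {"cik": 3, "state_of_incorporation": "DE"},
    {"cik": 4, "state_of_incorporation": "NV"},
    {"cik": 5, "state_of_incorporation": None},
]
shares = incorporation_shares(sample)
print(shares)
```

Passing in only the rows for one SIC code or one cohort gives the within-industry share with no change to the function.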
Fiscal Year End
The fiscal_year_end field stores the entity's fiscal year-end as a four-character MMDD string. The overwhelmingly common value is 1231, a calendar-year fiscal year, but a meaningful minority of companies close their books on other dates: retailers often end in late January or early February to fall after the holiday selling season, and many technology and education companies use a June 30 (0630) or September 30 (0930) year end.
This field is small but load-bearing for any work that touches financial statements. Comparing two companies' annual results is only valid once you know whether their fiscal years line up; a June-fiscal-year company's “fiscal 2026” covers a different twelve months than a December-fiscal-year company's. The fiscal year end is also what determines when an annual report (10-K) and the related Section 16 annual forms are due, so it anchors the filing calendar for the entity. When the registry is joined to the financial-facts data, the fiscal year end is the field that lets you align periods correctly rather than naively comparing report labels.
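The period alignment can be sketched as a small date calculation. This version assumes a fiscal year is labeled by the calendar year in which it ends, which is common but not universal (some companies with January year-ends label by the starting year), and it treats the MMDD value as an exact date; the function names are hypothetical.

```python
import calendar
from datetime import date, timedelta


def _year_end(fiscal_year_end, year):
    """Turn an MMDD string into a date in `year`, clamping to the month's length."""
    month, day = int(fiscal_year_end[:2]), int(fiscal_year_end[2:])
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day, last_day))


def fiscal_year_window(fiscal_year_end, fiscal_year):
    """Return (start, end) for a fiscal year labeled by the year it ends in."""
    end = _year_end(fiscal_year_end, fiscal_year)
    start = _year_end(fiscal_year_end, fiscal_year - 1) + timedelta(days=1)
    return start, end


print(fiscal_year_window("0630", 2026))
print(fiscal_year_window("1231", 2026))
```

Under that labeling assumption, a 0630 filer's fiscal 2026 and a 1231 filer's fiscal 2026 overlap by only six months, which is the misalignment the paragraph above warns about.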
Former Names: Tracking Renames and Mergers
The former_names field is the registry's memory. While name holds the entity's current registered name, former_names preserves every prior name the same CIK has filed under, typically with the date range each name was in effect. Because the CIK never changes, this field is how you follow an entity through a rename without losing the thread.
This is more useful than it first appears, because corporate name changes are common and carry information:
- Rebrands. When a company changes its name, as when Facebook, Inc. became Meta Platforms, Inc., the CIK stays put and the old name moves into former_names. A search for the old name still resolves to the entity through this field.
- Reverse mergers and shell reuse. A frequent and analytically important pattern: a dormant public shell with an existing CIK is acquired by a private operating company, which then renames the shell to its own name and continues filing under the shell's CIK. The former_names field exposes this directly: a current operating company whose former name was an unrelated, often defunct-sounding shell is the fingerprint of a reverse merger. Pairing that with a first_filing_date that long predates the operating business is a strong signal.
- Post-acquisition continuation. An entity that survives a merger and takes on the combined company's name leaves its prior identity in former_names, letting you reconstruct lineage that a name-only view would erase.
The broader point is that former_names is what makes name-based history coherent on top of an ID-based system. The CIK guarantees continuity; the former-names list is what lets a human reading the data see the continuity.
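One way to use the list is to ask what a CIK was called on a given date. The sketch below assumes former_names has been parsed into a list of dictionaries with name, from, and to keys holding ISO dates; the function name is hypothetical and the date values in the sample are illustrative rather than taken from the registry.

```python
from datetime import date


def name_on(as_of, current_name, former_names):
    """Return the name a CIK filed under on `as_of`.

    Falls back to the current name when no former-name window covers the date,
    which also means dates before the earliest window return the current name.
    """
    for fn in former_names:
        start = date.fromisoformat(fn["from"][:10]) if fn.get("from") else date.min
        end = date.fromisoformat(fn["to"][:10]) if fn.get("to") else date.max
        if start <= as_of <= end:
            return fn["name"]
    return current_name


# Illustrative window; the exact dates are assumptions for the example.
former = [{"name": "Facebook Inc", "from": "2005-01-01", "to": "2021-10-28"}]
print(name_on(date(2015, 1, 1), "Meta Platforms, Inc.", former))
print(name_on(date(2023, 1, 1), "Meta Platforms, Inc.", former))
```

Labeling historical filings with the name in effect at the time keeps a timeline readable while the join itself still runs on the CIK.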
Active Versus Inactive Filers
The is_active flag, read together with first_filing_date and last_filing_date, describes the entity's lifecycle on EDGAR. An active filer is one still submitting filings; an inactive CIK belongs to an entity that has stopped because it was acquired, went private, was liquidated, deregistered, or simply went dark. The first and last filing dates bound the window of activity: the earliest document on record for the CIK and the most recent.
Inactive CIKs are not noise to be discarded; they are the historical record. Roughly speaking, the registry accumulates filers and rarely loses them — a delisted company's CIK and all its filings remain permanently addressable. That permanence is the reason the SEC corpus supports longitudinal analysis at all. But it does mean that the registry as a whole is a superset of currently public companies. Filtering on is_active = TRUE narrows to the live universe; leaving it unfiltered gives you the full population, including decades of entities that no longer exist. Which one is correct depends entirely on the question: a point-in-time market view wants active filers, while a study of corporate mortality or shell reuse needs the inactive ones precisely because that is where the history lives.
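How the registry derives is_active is not spelled out here. One plausible approximation, shown purely as an assumption, is a recency test on last_filing_date; the function name and the 548-day (roughly 18-month) threshold are hypothetical choices, not the registry's rule.

```python
from datetime import date, timedelta


def looks_active(last_filing_date, today, max_gap_days=548):
    """Guess whether a filer is still active from its most recent filing date.

    max_gap_days is an arbitrary illustration value (about 18 months), long
    enough that an annual filer running somewhat late still counts as active.
    """
    if last_filing_date is None:
        return False
    return (today - last_filing_date) <= timedelta(days=max_gap_days)


today = date(2024, 6, 1)
print(looks_active(date(2024, 1, 1), today))  # filed within the window
print(looks_active(date(2015, 1, 1), today))  # silent for years
```

Passing `today` in explicitly, rather than reading the clock inside the function, lets the same test be rerun as of any historical date for a point-in-time view.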
The Ticker-to-CIK Lookup Problem
Because the CIK is the join key but humans and market-data systems speak in tickers, the most common first step in any SEC analysis is resolving a ticker to a CIK — and it is harder than it looks, for all the reasons tickers are messy. The SEC provides two official mechanisms.
The company_tickers.json file. The SEC publishes a canonical mapping at https://www.sec.gov/files/company_tickers.json: a single JSON document associating each currently listed ticker with its CIK and company title. This is the authoritative bulk lookup table, the right tool when you need to resolve many symbols at once or build a local ticker-to-CIK index. Its key limitation follows directly from the nature of tickers: it reflects the current mapping. A delisted company's old ticker may be absent, or worse, may now point to whatever company holds that symbol today. The file answers “what CIK does this ticker mean right now,” not “what CIK did this ticker mean in 2012.”
EDGAR full-text and company search. For names rather than tickers, or for entities the ticker file omits, EDGAR's search interfaces resolve a name or partial name to candidate CIKs. Full-text search spans filing contents, and the company-search endpoint matches on entity name — including, usefully, former names — so an entity that has since renamed can still be found by what it used to be called. Name search is fuzzier than the ticker file (multiple entities can share similar names, and disambiguation falls to the analyst), but it reaches the parts of the registry the ticker map cannot, including the large NULL-ticker population.
The robust pattern in practice is layered: try the ticker file first for a clean current symbol; fall back to name search (which sees former names) when the ticker file misses; and, once a CIK is in hand, treat it as the durable identifier for everything that follows. The lookup is a one-time cost paid at the boundary between the human-readable world and the CIK-keyed world of the actual data.
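The layered pattern can be sketched offline against data already loaded into memory. Here ticker_map is assumed to be a {TICKER: cik} dictionary and registry_rows a list of dictionaries with cik, name, and parsed former_names; the function name and the substring matching are illustrative choices, and a real resolver would want stricter name matching.

```python
def resolve_cik(query, ticker_map, registry_rows):
    """Layered lookup: exact ticker first, then current and former names.

    Returns a list of candidate CIKs, since a name search can match several
    entities and disambiguation is left to the caller.
    """
    key = query.strip().upper()
    if key in ticker_map:
        return [ticker_map[key]]
    matches = []
    for row in registry_rows:
        names = [row["name"]] + [fn["name"] for fn in row.get("former_names", [])]
        if any(key in name.upper() for name in names):
            matches.append(row["cik"])
    return matches


# Small sample standing in for the ticker file and the registry.
ticker_map = {"META": 1326801}
registry_rows = [
    {"cik": 1326801, "name": "Meta Platforms, Inc.",
     "former_names": [{"name": "Facebook Inc"}]},
    {"cik": 320193, "name": "Apple Inc.", "former_names": []},
]
print(resolve_cik("meta", ticker_map, registry_rows))
print(resolve_cik("Facebook", ticker_map, registry_rows))
```

Returning a list rather than a single value keeps the ambiguity of name search visible instead of hiding it behind an arbitrary first match.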
What You Can Do With It
The registry's value is almost entirely as connective tissue: it is the table that makes the rest of the SEC usable. Concretely, it lets you:
- Build an entity resolver. A service that accepts a ticker, a name, or a former name and returns the canonical CIK is the precondition for joining any SEC dataset to any external data. The registry, plus the ticker file, is the lookup behind such a resolver.
- Map an industry by SIC. Grouping the registry on sic_code yields the full population of filers in any industry (every software company, every pharmaceutical preparation maker, every state commercial bank), which becomes the peer set for any comparative analysis built on the financial-facts data.
- Track the Delaware incorporation share. Aggregating state_of_incorporation across the registry, or within an industry or cohort, quantifies Delaware's dominance and surfaces the firms that deliberately domicile elsewhere.
- Detect renamed and merged shells. Scanning former_names for operating companies whose prior names look like dormant shells, especially where first_filing_date long predates the current business, surfaces reverse mergers and shell reuse that no single filing reveals on its own.
- Drive a cross-filing company timeline. Using the CIK as the spine, assemble a single chronological view of an entity from every SEC dataset at once: its insider trades, the funds reporting it, its private raises, its material events. The registry is what makes that one CIK resolve to a company a reader recognizes.
Python: Resolving Tickers to CIKs and Pulling Filing History
The script below demonstrates the full entity-resolution workflow against the live SEC. It downloads the canonical company_tickers.json to build a ticker-to-CIK map, resolves a set of tickers to their zero-padded ten-digit CIKs, and then pulls each company's submission record from the EDGAR submissions API — extracting the identity fields, the SIC code, the fiscal year end, the former-name history with date ranges, and a summary of the forms the entity recently filed. The submissions API is the per-CIK companion to the registry: the registry tells you the entity exists and what it is, and the submissions record gives you its full filing history.
import time

import requests

# ---------------------------------------------------------------------------
# SEC EDGAR Company Resolution: ticker -> CIK -> filing history
# Sources:
#   Ticker map:  https://www.sec.gov/files/company_tickers.json
#   Submissions: https://data.sec.gov/submissions/CIK{cik:010d}.json
#
# Strategy:
#   1. Download the canonical company_tickers.json and build a ticker -> CIK map.
#   2. Resolve one or more tickers to their zero-padded 10-digit CIKs.
#   3. Pull each company's submission record: identity fields, former names,
#      SIC code, fiscal year end, and the inline window of recent filings.
#   4. Summarize the filer: what it is, when it started, and which forms it files.
# ---------------------------------------------------------------------------

HEADERS = {"User-Agent": "research@example.com (edgar entity-resolution project)"}
DATA_API = "https://data.sec.gov"
WWW = "https://www.sec.gov"


def get_json(url):
    """GET a URL with the descriptive User-Agent and return the parsed JSON."""
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()  # fail loudly on 403/404 rather than parsing an error page
    return resp.json()


# -- 1. Build the ticker -> CIK lookup table ---------------------------------

def load_ticker_map():
    """Return {TICKER: cik_int} from the canonical SEC company_tickers.json."""
    data = get_json(WWW + "/files/company_tickers.json")
    # The file is keyed by arbitrary integer string indices, not by ticker.
    out = {}
    for row in data.values():
        # row = {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."}
        out[row["ticker"].upper()] = int(row["cik_str"])
    return out


def ticker_to_cik10(ticker, ticker_map):
    """Resolve a ticker to a zero-padded 10-digit CIK string, or None."""
    cik = ticker_map.get(ticker.upper())
    return str(cik).zfill(10) if cik is not None else None


# -- 2. Pull and summarize a company's submission record ---------------------

def company_profile(cik10):
    """Fetch the EDGAR submissions record and extract identity + history."""
    data = get_json(DATA_API + "/submissions/CIK" + cik10 + ".json")

    # Former names carry the date range each name was in effect.
    former = []
    for fn in data.get("formerNames", []):
        former.append({
            "name": fn.get("name", ""),
            "from": (fn.get("from") or "")[:10],
            "to": (fn.get("to") or "")[:10],
        })

    # Only the inline "recent" window is read here; older pages are not walked.
    recent = data.get("filings", {}).get("recent", {})
    forms = recent.get("form", [])
    dates = recent.get("filingDate", [])

    form_counts = {}
    for f in forms:
        form_counts[f] = form_counts.get(f, 0) + 1

    tickers = data.get("tickers", []) or []
    exchanges = data.get("exchanges", []) or []

    return {
        "cik": int(data.get("cik", int(cik10))),
        "name": data.get("name", ""),
        "tickers": tickers,
        "exchange": exchanges[0] if exchanges else None,
        "sic_code": data.get("sic", ""),
        "sic_description": data.get("sicDescription", ""),
        "state_of_incorp": data.get("stateOfIncorporation", ""),
        "fiscal_year_end": data.get("fiscalYearEnd", ""),
        # Crude proxy: True whenever the inline window holds any filing at all,
        # no matter how old the most recent one is.
        "is_active": len(forms) > 0,
        "first_filing_date": min(dates) if dates else None,
        "last_filing_date": max(dates) if dates else None,
        "former_names": former,
        "form_counts": form_counts,
    }


# -- Main --------------------------------------------------------------------

TICKERS = ["AAPL", "BRK-B", "META"]  # resolve any set of exchange symbols


def main():
    print("Loading company_tickers.json ...")
    ticker_map = load_ticker_map()
    print("  Mapped " + str(len(ticker_map)) + " tickers to CIKs")

    for tkr in TICKERS:
        cik10 = ticker_to_cik10(tkr, ticker_map)
        if cik10 is None:
            print("\n" + tkr + ": no CIK found in ticker map (delisted or foreign?)")
            continue

        prof = company_profile(cik10)
        time.sleep(0.2)  # respect SEC fair-access rate limits

        print("\n" + tkr + " -> CIK " + cik10)
        print("  Name: " + prof["name"])
        print("  SIC: " + str(prof["sic_code"]) + " " + prof["sic_description"])
        print("  Incorp: " + (prof["state_of_incorp"] or "?")
              + "  FY end: " + (prof["fiscal_year_end"] or "?")
              + "  Exch: " + (prof["exchange"] or "?"))
        print("  Filing span: " + str(prof["first_filing_date"])
              + " -> " + str(prof["last_filing_date"])
              + "  active: " + str(prof["is_active"]))

        if prof["former_names"]:
            print("  Former names:")
            for fn in prof["former_names"]:
                span = fn["from"] + " to " + (fn["to"] or "present")
                print("    " + fn["name"] + " (" + span + ")")

        top_forms = sorted(prof["form_counts"].items(),
                           key=lambda kv: kv[1], reverse=True)[:6]
        print("  Top recent forms: "
              + ", ".join(f + " x" + str(n) for f, n in top_forms))


if __name__ == "__main__":
    main()
A few implementation notes. The company_tickers.json file is keyed by arbitrary integer indices rather than by ticker, so the loader iterates its values and re-keys on the symbol; note also that the SEC writes some tickers with a hyphen for share-class suffixes (for example BRK-B), and external data sources may use a dot instead, so a real resolver normalizes the separator before lookup. The submissions API returns the most recent filings inline and pages older history into a separate files array, so the first/last filing dates computed here reflect the inline window unless those extra pages are walked. The CIK must be zero-padded to ten digits for the submissions URL. And, as with all EDGAR access, the SEC's fair-access policy requires a descriptive User-Agent header with a contact email and a request rate no higher than ten per second.
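Two of these notes lend themselves to a short sketch. The helper names are hypothetical, and two assumptions are baked in: input symbols are plain US symbols with no exchange suffix (a dotted suffix such as ".L" would be mangled), and each entry in the submissions files array names a JSON document, served under the same submissions path, whose top level holds the same columnar arrays as the inline window.

```python
def normalize_ticker(symbol):
    """Upper-case a symbol and convert dot or slash class separators to hyphens."""
    return symbol.strip().upper().replace(".", "-").replace("/", "-")


def all_filing_dates(submission, fetch_json):
    """Collect filingDate values from the inline window plus any paged files.

    `fetch_json` is a caller-supplied function that takes a URL and returns
    parsed JSON, so headers and rate limiting stay under the caller's control.
    """
    filings = submission.get("filings", {})
    dates = list(filings.get("recent", {}).get("filingDate", []))
    for page in filings.get("files", []):
        # Assumed layout: page["name"] is a file name under /submissions/.
        extra = fetch_json("https://data.sec.gov/submissions/" + page["name"])
        dates.extend(extra.get("filingDate", []))
    return dates


print(normalize_ticker("brk.b"))
```

With the full date list in hand, min() and max() give first and last filing dates that cover the whole history instead of only the inline window.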
Caveats
Ticker reuse breaks naive historical joins. A ticker is stable only over the window a given company holds it. Resolving a historical ticker through today's company_tickers.json can silently return the wrong CIK if the symbol has been reassigned, and joining old market data on ticker rather than CIK will mismatch entities across a delisting boundary. The only safe key for anything spanning time is the CIK; the ticker is an input to resolution, never the join key.
SIC is coarse and dated. A four-digit SIC code places a company in a rough industry neighborhood, not a precise business line, and the taxonomy predates large parts of the modern economy. Peer sets built purely on SIC will lump together firms with materially different economics and will misclassify companies that have pivoted since registration. For fine-grained sector work, SIC is a starting filter, not a final answer.
Foreign private issuers behave differently. A country code in state_of_incorporation marks a foreign private issuer, which files under a different disclosure regime (annual reports on Form 20-F rather than 10-K, different periodic obligations, and frequently no Section 16 insider reporting). Such entities are present in the registry and join on CIK like any other, but the set of downstream datasets that actually contains rows for them differs, and treating them as if they filed the full domestic form set will produce gaps that are features of the regime, not errors in the data.
Inactive CIKs inflate the population. The registry permanently retains every entity that has ever filed, so the raw 28,392-row count is a superset of currently public companies. Analyses that need the live market must filter on is_active; analyses that ignore the flag will mix in decades of acquired, deregistered, and dormant entities. Conversely, discarding inactive CIKs throws away exactly the history that makes shell-reuse and corporate-mortality analysis possible — so the right treatment depends on the question, and there is no default that is correct for both.
Related writing
For the insider-transaction dataset keyed on the same issuer CIK — every officer, director, and ten-percent owner trade in company stock — see SEC Form 4 Insider Trading: The Federal Database Behind Corporate Insider Stock Transactions.
For the private-markets dataset that resolves issuers through the same registry — exempt offerings and the companies behind them — see SEC Form D: The Private Placement Database Behind $2 Trillion in Annual Exempt Offerings.
For the fund-holdings dataset filed under registrant CIKs that join back to this registry — position-level fund portfolios across the registered fund universe — see SEC N-PORT Mutual Fund Holdings: The Federal Database Behind Every Fund Portfolio Position.