Technical writing

OFAC SDN integration in the Federal Regulatory Data Hub: conditional GET, entity normalization, and sub-second screening

· AI Analytics
Regulatory dataComplianceOFACData engineering

The OFAC SDN list in context

The OFAC Specially Designated Nationals and Blocked Persons (SDN) list is the United States' primary sanctions enforcement tool. Financial institutions, fintechs, and any company processing transactions or counterparties with US nexus is legally required to screen against it. Missing a hit — transacting with an SDN — can result in civil penalties exceeding $1M per violation and criminal charges for knowing violations.

OFAC updates the list approximately 10–15 times per month, sometimes multiple times in a single day when urgent designations are made. The Federal Regulatory Data Hub treats the SDN list as a high-priority source: it is checked for updates every hour, not once daily like most other sources.

The source: OFAC's XML publication

OFAC publishes the SDN list in several formats. The Hub uses the consolidated XML (sdn_advanced.xml) rather than the flat CSV, because the XML contains structured sub-records for aliases, addresses, IDs, and program classifications that the CSV merges into comma-separated fields, making reliable parsing difficult.

# OFAC SDN source URL (publicly accessible, no auth required)
SDN_XML_URL = "https://www.treasury.gov/ofac/downloads/sdn_advanced.xml"
SDN_XML_SHA = "https://www.treasury.gov/ofac/downloads/sdn_advanced.xml.sha256"

# Also available: the consolidated sanctions list (SDN + non-SDN programs)
CONSOLIDATED_XML_URL = "https://www.treasury.gov/ofac/downloads/consolidated/consolidated.xml"

The XML file is approximately 20–25 MB compressed. OFAC sets Last-Modifiedand ETag headers, which the Hub uses for conditional GET to avoid downloading the full file when it hasn't changed.

Conditional GET: delta detection without a diff

The hourly ingest worker checks for updates using If-None-Match:

// Cloudflare Worker — runs every hour via cron trigger
async function checkOfacSdn(env: Env): Promise<void> {
  const stored = await env.KV.get("ofac_sdn_etag");
  const shaStored = await env.KV.get("ofac_sdn_sha256");

  // Step 1: Check the SHA256 file first (tiny, 64 bytes)
  // If it hasn't changed, skip the full XML download entirely
  const shaResp = await fetch(SDN_XML_SHA, {
    headers: stored ? { "If-None-Match": shaStored ?? "" } : {},
  });

  if (shaResp.status === 304) {
    console.log("OFAC SDN: no change (SHA256 304)");
    return;
  }

  const newSha = await shaResp.text();
  if (newSha.trim() === shaStored) {
    console.log("OFAC SDN: no change (SHA256 match)");
    return;
  }

  // Step 2: Download and parse the full XML
  const xmlResp = await fetch(SDN_XML_URL, {
    headers: stored ? { "If-None-Match": stored } : {},
  });

  if (xmlResp.status === 304) {
    console.log("OFAC SDN: no change (ETag 304)");
    return;
  }

  const etag = xmlResp.headers.get("ETag") ?? "";
  const xml = await xmlResp.text();

  // Step 3: Parse and upsert
  const entries = parseOfacXml(xml);
  await upsertSdnEntries(env.DB, entries);

  await env.KV.put("ofac_sdn_etag", etag);
  await env.KV.put("ofac_sdn_sha256", newSha.trim());
  await env.KV.put("ofac_sdn_last_updated", new Date().toISOString());

  console.log(`OFAC SDN: updated ${entries.length} entries`);
}

The two-step SHA256-then-ETag check saves bandwidth. OFAC occasionally re-publishes the XML with updated metadata (e.g., a corrected date in the header) that changes the ETag but not the actual entry data. The SHA256 check catches this and avoids a spurious re-parse.

XML structure and parsing

The OFAC SDN XML uses a custom schema. A representative entry:

<sdnEntry>
  <uid>36</uid>
  <lastName>IRAN AIRCRAFT MANUFACTURING INDUSTRIAL COMPANY</lastName>
  <sdnType>Entity</sdnType>
  <programList>
    <program>IRAN</program>
    <program>NPWMD</program>
  </programList>
  <akaList>
    <aka>
      <uid>1234</uid>
      <type>a.k.a.</type>
      <category>strong</category>
      <lastName>IAMIC</lastName>
    </aka>
    <aka>
      <uid>1235</uid>
      <type>a.k.a.</type>
      <category>weak</category>
      <lastName>IRAN AIRCRAFT MFG</lastName>
    </aka>
  </akaList>
  <addressList>
    <address>
      <uid>5678</uid>
      <city>Tehran</city>
      <country>Iran</country>
    </address>
  </addressList>
  <idList>
    <id>
      <uid>9012</uid>
      <idType>Business Registration Document</idType>
      <idNumber>12345678</idNumber>
      <idCountry>Iran</idCountry>
    </id>
  </idList>
  <nationalityList>
    <nationality>
      <uid>3456</uid>
      <country>Iran</country>
    </nationality>
  </nationalityList>
</sdnEntry>

The parser extracts the canonical entry and then “explodes” each alias into a separate searchable row. An entity with 12 aliases becomes 13 rows in the sdn_aliases table — the canonical name plus one row per alias. This is the data model that makes name-matching fast: all screening queries run against the flat alias table rather than joining back to an alias sub-table at query time.

Name normalization: the preprocessing pipeline

Raw OFAC names include transliterations, abbreviations, punctuation variants, and mixed-language entries. Before storage, every name passes through a normalization pipeline:

def normalize_sdn_name(raw: str) -> str:
    """
    Normalize an OFAC SDN name for fuzzy matching.
    Applied to both stored names and query names at search time.
    """
    import unicodedata
    import re

    # Step 1: Unicode normalize to NFKD and strip combining characters
    # Handles accented chars: 'ÁLVAREZ' → 'ALVAREZ'
    nfkd = unicodedata.normalize("NFKD", raw)
    ascii_only = "".join(c for c in nfkd if not unicodedata.combining(c))

    # Step 2: Uppercase
    s = ascii_only.upper()

    # Step 3: Strip legal suffixes that don't distinguish entities
    # (preserve them in the original; strip for matching only)
    LEGAL_SUFFIXES = [
        r"CO", r"CORP", r"INC", r"LTD", r"LLC",
        r"COMPANY", r"COMPANIES", r"GROUP", r"GRUPO",
        r"HOLDINGS", r"INTERNATIONAL", r"SA", r"SARL",
        r"GMBH", r"AG", r"BV", r"NV", r"OOO",
        r"JSC", r"CJSC", r"OJSC",  # Russian entity types
    ]
    for suffix in LEGAL_SUFFIXES:
        s = re.sub(suffix, "", s)

    # Step 4: Collapse whitespace, strip punctuation
    s = re.sub(r"[^ws]", " ", s)  # punctuation → space
    s = re.sub(r"s+", " ", s).strip()

    return s

The normalization is applied at ingest time and stored as a separate normalized_name column alongside the raw name. At screening time, the query name goes through the same pipeline before lookup. This symmetry ensures that “Al-Qaida”, “Al Qaeda”, and “AL QAEDA” all normalize to the same form.

The D1 schema

-- Primary entry table
CREATE TABLE sdn_entries (
    uid           INTEGER PRIMARY KEY,  -- OFAC's uid
    sdn_type      TEXT NOT NULL,        -- 'Individual' | 'Entity' | 'Vessel' | 'Aircraft'
    canonical_name TEXT NOT NULL,
    normalized_name TEXT NOT NULL,
    programs      TEXT NOT NULL,        -- JSON array: ["IRAN", "NPWMD"]
    last_updated  TEXT NOT NULL,        -- ISO 8601
    is_active     INTEGER NOT NULL DEFAULT 1
);

-- Alias explosion — one row per alias per entry
CREATE TABLE sdn_aliases (
    alias_uid     INTEGER PRIMARY KEY,
    entry_uid     INTEGER NOT NULL REFERENCES sdn_entries(uid),
    alias_type    TEXT,               -- 'a.k.a.' | 'f.k.a.' | 'n.k.a.'
    alias_category TEXT,             -- 'strong' | 'weak'
    raw_name      TEXT NOT NULL,
    normalized_name TEXT NOT NULL
);

-- FTS5 index over both canonical and alias names
CREATE VIRTUAL TABLE sdn_fts USING fts5(
    normalized_name,
    entry_uid UNINDEXED,
    alias_uid UNINDEXED,
    tokenize='unicode61'
);

-- Addresses, IDs, nationalities (for structured filtering)
CREATE TABLE sdn_addresses (
    uid           INTEGER PRIMARY KEY,
    entry_uid     INTEGER NOT NULL REFERENCES sdn_entries(uid),
    city          TEXT,
    country       TEXT,
    postal_code   TEXT
);

CREATE TABLE sdn_ids (
    uid           INTEGER PRIMARY KEY,
    entry_uid     INTEGER NOT NULL REFERENCES sdn_entries(uid),
    id_type       TEXT,     -- 'Passport' | 'National ID No.' | 'Business Registration...'
    id_number     TEXT,
    id_country    TEXT
);

-- Indexes for screening queries
CREATE INDEX idx_sdn_aliases_normalized ON sdn_aliases(normalized_name);
CREATE INDEX idx_sdn_entries_normalized ON sdn_entries(normalized_name);
CREATE INDEX idx_sdn_ids_number ON sdn_ids(id_number);

The screening query: three-pass matching

The compliance screening endpoint runs three passes against the SDN data, returning hits at the configured confidence threshold:

// GET /v1/screen?name=Iran+Aircraft+Manufacturing&threshold=0.85
async function screenSdn(
  name: string,
  threshold: number,
  db: D1Database
): Promise<SdnHit[]> {

  const normalized = normalizeSdnName(name);

  // Pass 1: Exact normalized name match (fastest, highest confidence)
  const exactHits = await db.prepare('
    SELECT e.uid, e.canonical_name, e.programs, 1.0 AS score, 'exact' AS method
    FROM sdn_entries e
    WHERE e.normalized_name = ?
      AND e.is_active = 1
    UNION ALL
    SELECT e.uid, e.canonical_name, e.programs, 0.95 AS score, 'alias_exact' AS method
    FROM sdn_aliases a
    JOIN sdn_entries e ON e.uid = a.entry_uid
    WHERE a.normalized_name = ?
      AND e.is_active = 1
  ').bind(normalized, normalized).all<SdnRow>();

  if (exactHits.results.length > 0) return exactHits.results;

  // Pass 2: FTS5 full-text search with BM25 ranking
  const ftsHits = await db.prepare('
    SELECT e.uid, e.canonical_name, e.programs,
           (1.0 - (rank / -10.0)) AS score,
           'fts' AS method
    FROM sdn_fts
    JOIN sdn_entries e ON e.uid = sdn_fts.entry_uid
    WHERE sdn_fts MATCH ?
      AND e.is_active = 1
    ORDER BY rank
    LIMIT 20
  ').bind(normalized).all<SdnRow>();

  // Pass 3: Jaro-Winkler fuzzy match for remaining candidates from Pass 2
  const fuzzyHits = ftsHits.results.filter(row => {
    const jwScore = jaroWinkler(normalized, normalizeSdnName(row.canonical_name));
    return jwScore >= threshold;
  }).map(row => ({
    ...row,
    score: jaroWinkler(normalized, normalizeSdnName(row.canonical_name)),
    method: 'fuzzy',
  }));

  return fuzzyHits;
}

The three-pass design keeps latency low: exact matches return in under 2ms because they hit a b-tree index. FTS5 BM25 search takes 5–15ms for a 12K-entry table. Jaro-Winkler runs only on the FTS5 candidates (≤20 rows), not the full table. End-to-end screening against OFAC SDN: p50 8ms, p99 28ms.

Alias category handling

OFAC distinguishes “strong” aliases (reliable alternative names used regularly) from “weak” aliases (phonetic transliterations, partial name variants). The screening endpoint surfaces this distinction in the response:

{
  "hits": [
    {
      "entry_uid": 36,
      "canonical_name": "IRAN AIRCRAFT MANUFACTURING INDUSTRIAL COMPANY",
      "programs": ["IRAN", "NPWMD"],
      "match_name": "IAMIC",
      "match_type": "alias_exact",
      "alias_category": "strong",
      "score": 0.95,
      "sdn_type": "Entity"
    }
  ],
  "query_name": "IAMIC",
  "query_normalized": "IAMIC",
  "threshold": 0.85,
  "screening_time_ms": 4,
  "list_last_updated": "2025-06-27T14:22:00Z"
}

Compliance teams can use alias_category to triage: a “strong” alias match at score ≥ 0.95 is equivalent to a canonical name match. A “weak” alias match below 0.90 warrants manual review.

The consolidated list: non-SDN programs

The OFAC SDN list is the most prominent but not the only OFAC sanctions list. The consolidated list includes:

  • Non-SDN Foreign Sanctions Evaders (FSE) — entities that evade sanctions imposed on third countries
  • Non-SDN Palestinian Legislative Council (PLC) — Hamas-affiliated members
  • Non-SDN Menu-Based Sanctions (NS-MBS) — sectoral sanctions on Russian financial/energy/defense/intel sectors
  • Sectoral Sanctions Identifications (SSI) — Russian, North Korean, and other sectoral targets

The Federal Regulatory Data Hub ingests all programs from the consolidated XML into the same sdn_entries and sdn_aliases tables, using the programs JSON array to distinguish which sanctions authorities apply. A single screening call checks all programs simultaneously — the compliance screening risk score article covers how program membership factors into the 0–100 score.

Update statistics

Based on 12 months of ingest monitoring (July 2024 – June 2025):

MetricValue
OFAC SDN updates per month (average)12.4
Maximum updates in one day4 (immediate geopolitical designations)
Time from OFAC publish to Hub index updatep50 23 min, p99 58 min (hourly cron)
Total SDN entries (active, June 2025)12,247
Total SDN aliases (active)44,891
Average aliases per SDN entry3.7
Maximum aliases for one entry218 (large conglomerate with many subsidiary names)
Screening query p50 latency (OFAC SDN only)8ms
Screening query p99 latency (all 30+ lists)120ms

Related articles:

For how the OFAC SDN screening result is combined with 30+ other enforcement lists into a 0–100 compliance risk score: Compliance screening across 30+ federal enforcement lists: how the risk score works →

For how the Federal Regulatory Data Hub ingests all 197 datasets — including the ETL patterns, retry budgets, and schema drift handling that apply to OFAC: Federal dataset ingest: keeping 197 federal datasets fresh at the edge →

For how entity names in the SDN list are resolved against SEC CIK, SAM UEI, and LEI identifiers across the entity bridge: Entity ID normalization in the Federal Regulatory Data Hub: resolving CIK, UEI, LEI, DUNS, and NPI →

For the query routing layer that dispatches screening requests across the 8 D1 shards, including the financial_crimes shard where OFAC data lives: The Federal Regulatory Data Hub query layer: routing 35M records at the Cloudflare edge →

For the 47-feature engineering pipeline that turns political calendar data, OFAC designation timelines, and network telemetry into the Voidly shutdown forecast model inputs: Feature engineering for the 7-day internet shutdown forecast: political calendar, sanctions, and network telemetry →

For how Voidly aligns OFAC sanctions designation bursts with measured internet shutdown events — diplomatic_isolation_score, country case studies, and the forecasting model integration: Sanctions timelines and internet shutdowns: how Voidly correlates OFAC designation bursts with censorship events →