
The Federal Regulatory API: REST, MCP, and JSON-LD for 197 Federal Datasets

8 min read · AI Analytics
Tags: Regulatory data, API design, MCP, Cloudflare

The Federal Regulatory Data Hub at api.ai-analytics.org indexes 197 federal datasets across 45 agencies — SEC, FDA, OFAC, DOJ, EPA, CFPB, IRS, FEMA, CDC, NHTSA, FAA, CFTC, CMS, MSHA, OSHA and more. The data lives in Cloudflare D1 (SQLite at the edge, described in detail in the D1 infrastructure post). This post covers the API layer on top of it: the REST endpoints, the MCP server for agent workflows, and the JSON-LD surface for structured data indexing.

Design principles

Three principles shaped every API decision:

  • No authentication. Regulatory data is public-domain under CC0 1.0. Requiring an API key would create a barrier for journalists, academics, and civil-society organizations who are the primary consumers of this data. The API is open with rate limits applied at the IP level (Cloudflare rate limiting rules, no account required).
  • Edge-native. All requests are served by Cloudflare Workers — no origin server, no compute region. Response latency for cached requests is under 10ms globally. D1 reads add 30–80ms depending on Worker location. The entire serving infrastructure is serverless.
  • Machine-readable first. Every endpoint returns JSON. The entity timeline endpoint also returns JSON-LD for Schema.org compatibility. The /today.md and /llms.txt surfaces are generated daily for LLM agent consumption.

The entity endpoint: cross-agency resolution in one GET

The primary endpoint is /entity/:id, which accepts any of six identifier types — CIK (SEC), ticker, UEI (SAM.gov), LEI, DUNS, or NPI (healthcare) — and returns a unified timeline of every regulatory event across all 197 datasets for that entity.

# Example: ExxonMobil via ticker
curl https://api.ai-analytics.org/entity/XOM

# Response structure
{
  "entity": {
    "canonical_id": "xom",
    "name": "Exxon Mobil Corporation",
    "identifiers": {
      "cik": "0000034088",
      "ticker": "XOM",
      "lei": "549300WR5W1O3YK9A883",
      "uei": "Q9THQZQVFR43"
    }
  },
  "timeline": [
    {
      "date": "2025-11-04",
      "source": "SEC EDGAR",
      "dataset": "sec-edgar-filings",
      "type": "10-Q",
      "description": "Quarterly report filed for period ending 2025-09-30",
      "url": "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000034088"
    },
    {
      "date": "2025-09-22",
      "source": "EPA ECHO",
      "dataset": "epa-enforcement-cases",
      "type": "enforcement_case",
      "description": "Civil enforcement case opened under Clean Air Act",
      "penalty_amount_usd": 847000
    },
    ...  // all agency events, unified timeline, descending date
  ],
  "compliance_score": {
    "score": 23,
    "tier": "elevated",
    "active_flags": ["epa_enforcement"],
    "computed_at": "2026-02-01T06:00:00Z"
  }
}
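
A minimal Python sketch of consuming this response, assuming the field names shown in the sample above. The helper names (entity_url, events_by_source, total_penalties_usd) are illustrative and not part of the API, and penalty_amount_usd is assumed to be optional because only the EPA event in the sample carries it. The sample dict is a trimmed copy of the response, so the snippet runs without network access.

```python
from collections import Counter
from urllib.parse import quote

BASE_URL = "https://api.ai-analytics.org"


def entity_url(identifier: str) -> str:
    """Build the /entity/:id URL, percent-encoding the identifier."""
    return f"{BASE_URL}/entity/{quote(identifier, safe='')}"


def events_by_source(response: dict) -> Counter:
    """Count timeline events per source agency."""
    return Counter(event["source"] for event in response.get("timeline", []))


def total_penalties_usd(response: dict) -> int:
    """Sum penalty_amount_usd over the events that carry it (assumed optional)."""
    return sum(e.get("penalty_amount_usd", 0) for e in response.get("timeline", []))


# Trimmed copy of the sample response above.
sample = {
    "entity": {"canonical_id": "xom", "name": "Exxon Mobil Corporation"},
    "timeline": [
        {"date": "2025-11-04", "source": "SEC EDGAR", "type": "10-Q"},
        {"date": "2025-09-22", "source": "EPA ECHO", "type": "enforcement_case",
         "penalty_amount_usd": 847000},
    ],
}

print(entity_url("XOM"))
print(dict(events_by_source(sample)))
print(total_penalties_usd(sample))
```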

The entity resolution that powers this endpoint is described in detail in the cross-agency entity graph post. The short version: a three-pass ID resolution pipeline (exact ID match → fuzzy name match → TF-IDF similarity for free-text datasets) builds a canonical entity record that joins identifiers across all 197 datasets.
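
The three passes can be sketched in a few lines of Python. This is an illustration of the general technique only, not the production pipeline: the CANONICAL records, the function names, and both thresholds (0.85 and 0.3) are assumptions made for the example, and difflib stands in for whatever fuzzy matcher is actually used.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical canonical records; the real schema is not shown in this post.
CANONICAL = [
    {"canonical_id": "xom", "name": "Exxon Mobil Corporation",
     "ids": {"cik": "0000034088", "ticker": "XOM"}},
    {"canonical_id": "mmm", "name": "3M Company", "ids": {"ticker": "MMM"}},
]


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_id_match(value: str):
    """Pass 1: the value equals a known identifier."""
    for rec in CANONICAL:
        if value in rec["ids"].values():
            return rec
    return None


def fuzzy_name_match(name: str, threshold: float = 0.85):
    """Pass 2: character-level similarity against canonical names."""
    scored = [(SequenceMatcher(None, name.lower(), r["name"].lower()).ratio(), r)
              for r in CANONICAL]
    score, rec = max(scored, key=lambda pair: pair[0])
    return rec if score >= threshold else None


def tfidf_match(text: str, threshold: float = 0.3):
    """Pass 3: cosine similarity of TF-IDF vectors, for free-text input."""
    docs = [tokens(r["name"]) for r in CANONICAL]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

    def vec(toks):
        tf = Counter(t for t in toks if t in idf)
        return {t: count * idf[t] for t, count in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    query = vec(tokens(text))
    scored = [(cosine(query, vec(d)), r) for d, r in zip(docs, CANONICAL)]
    score, rec = max(scored, key=lambda pair: pair[0])
    return rec if score >= threshold else None


def resolve(value: str):
    """Try each pass in order and stop at the first hit."""
    return exact_id_match(value) or fuzzy_name_match(value) or tfidf_match(value)


for query in ["0000034088", "Exxon Mobil Corporaton",
              "PFAS contamination at a former 3M manufacturing site"]:
    hit = resolve(query)
    print(query, "->", hit["canonical_id"] if hit else None)
```

Ordering the passes from cheapest and most precise to most permissive means the expensive similarity search only runs for inputs that no identifier or near-identical name can explain.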

The compliance screening endpoint

GET /screen/:id returns a 0–100 compliance risk score and the active flag list across 30+ enforcement lists:

curl https://api.ai-analytics.org/screen/XOM

# Response
{
  "entity": "ExxonMobil",
  "score": 23,
  "tier": "elevated",
  "flags": [
    {
      "list": "epa-enforcement-cases",
      "severity": "medium",
      "description": "Open enforcement case — Clean Air Act violation",
      "opened": "2025-09-22",
      "resolved": null
    }
  ],
  "lists_checked": 34,
  "lists_clean": 33,
  "computed_at": "2026-02-01T06:00:00Z"
}

# Lists screened include:
# OFAC SDN, OFAC Non-SDN, FinCEN, SAM exclusions, OIG exclusions,
# SEC enforcement, CFPB enforcement, FDIC problem institutions,
# OCC enforcement, FINRA, CFTC, PCAOB, DOJ criminal referrals,
# EPA enforcement, MSHA violations, OSHA citations,
# NHTSA recalls, FDA warning letters, UFLPA, BIS Entity List,
# Trade.gov CSL, CISA KEV, and 12 more

The scoring methodology — list severity weights, recency decay, and entity resolution — is covered in the compliance screening write-up.
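
To make "severity weights" and "recency decay" concrete, here is a hedged Python sketch of one way such a score could be assembled. The weights, the one-year half-life, and the rule that open flags count at full weight are all assumptions chosen for illustration; they are not the published methodology and will not reproduce the score of 23 in the sample above.

```python
from datetime import date

# Hypothetical values; the real weights and decay are in the screening write-up.
SEVERITY_WEIGHT = {"low": 5.0, "medium": 15.0, "high": 40.0}
HALF_LIFE_DAYS = 365.0


def flag_contribution(severity: str, resolved, as_of: date) -> float:
    """Open flags count at full weight; resolved flags decay exponentially."""
    weight = SEVERITY_WEIGHT[severity]
    if resolved is None:
        return weight
    age_days = max((as_of - resolved).days, 0)
    return weight * 0.5 ** (age_days / HALF_LIFE_DAYS)


def risk_score(flags: list, as_of: date) -> int:
    """Sum the contributions and clamp to the 0-100 range."""
    total = sum(flag_contribution(f["severity"], f["resolved"], as_of) for f in flags)
    return min(100, round(total))


flags = [
    {"list": "epa-enforcement-cases", "severity": "medium", "resolved": None},
    {"list": "example-resolved-list", "severity": "high", "resolved": date(2025, 2, 1)},
]
print(risk_score(flags, as_of=date(2026, 2, 1)))
```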

The search endpoint

GET /search?q=:query&datasets=:list runs full-text search across the FTS5 virtual tables we build alongside each dataset:

# Search DOJ press releases for "price fixing"
curl "https://api.ai-analytics.org/search?q=price+fixing&datasets=doj-press-releases"

# Search across all free-text datasets
curl "https://api.ai-analytics.org/search?q=PFAS+contamination"

# Response
{
  "query": "PFAS contamination",
  "results": [
    {
      "dataset": "epa-enforcement-cases",
      "id": "EPA-HQ-OGC-2024-0412",
      "snippet": "...PFAS contamination at former manufacturing site...",
      "date": "2024-08-15",
      "entity": "3M Company"
    },
    ...
  ],
  "total": 847,
  "page": 1
}
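
For callers building these URLs in code rather than by hand, a small Python sketch follows. The search_url helper is illustrative; joining multiple datasets with commas is an assumption, since the post only shows a single dataset in the datasets parameter.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

BASE_URL = "https://api.ai-analytics.org"


def search_url(query: str, datasets: Optional[Sequence[str]] = None) -> str:
    """Build a /search URL; spaces in the query become '+' as in the curl examples."""
    params = {"q": query}
    if datasets:
        # Assumption: multiple datasets are passed as a comma-separated list.
        params["datasets"] = ",".join(datasets)
    return f"{BASE_URL}/search?{urlencode(params, safe=',')}"


print(search_url("price fixing", ["doj-press-releases"]))
print(search_url("PFAS contamination"))
```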

The MCP server: 38+ tools for Claude and GPT

The MCP server at api.ai-analytics.org/mcp exposes 38+ tools over JSON-RPC / Streamable HTTP, compatible with Claude's tool use API and OpenAI's function calling.

Tool categories:

  • Entity tools: get_entity, search_entities, resolve_identifier, get_entity_timeline
  • Compliance tools: screen_entity, list_active_flags, get_ofac_matches, get_sam_exclusions
  • Dataset tools: One tool per major dataset — search_sec_filings, get_fda_warning_letters, get_epa_enforcement, get_osha_citations, etc.
  • Temporal tools: get_today_regulatory_summary, get_enforcement_trends, compare_entity_periods

# Example MCP tool call (JSON-RPC)
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "screen_entity",
    "arguments": {
      "identifier": "XOM",
      "identifier_type": "ticker",
      "include_details": true
    }
  }
}

# Example: Claude Code agent workflow
# .claude/mcp.json
{
  "mcpServers": {
    "federal-regulatory": {
      "command": "npx",
      "args": ["mcp-remote", "https://api.ai-analytics.org/mcp"]
    }
  }
}
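
For clients that speak JSON-RPC directly, the request envelope above can be built and the result unpacked with a few lines of Python. This is a sketch: tool_call and tool_text are illustrative helper names, the transport (HTTP POST, session handling) is left out, and the fake_response payload is invented to show the shape of an MCP tools/call result, not captured from the server.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC ids only need to be unique per session


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request like the example above."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def tool_text(response: dict) -> str:
    """Join the text parts of a tools/call result; raise on a JSON-RPC error."""
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    parts = response["result"].get("content", [])
    return "\n".join(p["text"] for p in parts if p.get("type") == "text")


request = tool_call("screen_entity", {
    "identifier": "XOM",
    "identifier_type": "ticker",
    "include_details": True,
})
print(json.dumps(request, indent=2))

# Hypothetical response, for illustration only.
fake_response = {
    "jsonrpc": "2.0",
    "id": request["id"],
    "result": {"content": [{"type": "text", "text": "{\"score\": 23}"}]},
}
print(tool_text(fake_response))
```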

JSON-LD: structured data for the entity timeline

The /entity/:id.jsonld endpoint returns a full Schema.org Organization graph in which every regulatory event is represented as an action:

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Exxon Mobil Corporation",
  "ticker": "XOM",
  "legalName": "Exxon Mobil Corporation",
  "subjectOf": [
    {
      "@type": "GovernmentAction",
      "name": "EPA enforcement case — Clean Air Act",
      "agent": {
        "@type": "GovernmentOrganization",
        "name": "EPA"
      },
      "startDate": "2025-09-22",
      "object": { "@type": "Organization", "name": "Exxon Mobil Corporation" }
    },
    ...
  ]
}
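
On the consuming side, the graph can be flattened back into rows with plain dictionary access. The Python sketch below assumes the property names in the sample above (subjectOf, agent, startDate); regulatory_actions is an illustrative helper, and the second entry in the doc is made up so that the sort order is visible.

```python
def regulatory_actions(doc: dict) -> list[tuple[str, str, str]]:
    """Return (startDate, agency, name) rows from subjectOf, newest first."""
    items = doc.get("subjectOf", [])
    if isinstance(items, dict):  # JSON-LD allows a single object in place of a list
        items = [items]
    rows = [
        (item.get("startDate", ""),
         item.get("agent", {}).get("name", ""),
         item.get("name", ""))
        for item in items
    ]
    return sorted(rows, reverse=True)  # ISO dates sort correctly as strings


# Trimmed copy of the sample above plus one hypothetical older entry.
doc = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Exxon Mobil Corporation",
    "subjectOf": [
        {"@type": "GovernmentAction", "name": "Hypothetical older action",
         "agent": {"@type": "GovernmentOrganization", "name": "OSHA"},
         "startDate": "2024-03-01"},
        {"@type": "GovernmentAction", "name": "EPA enforcement case",
         "agent": {"@type": "GovernmentOrganization", "name": "EPA"},
         "startDate": "2025-09-22"},
    ],
}

for start, agency, name in regulatory_actions(doc):
    print(start, agency, name)
```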

The /today.md and /llms.txt surfaces

Two surfaces are generated daily by a scheduled Worker and stored in Cloudflare KV:

  • api.ai-analytics.org/today.md: A Markdown summary of the day's federal regulatory activity, including new OFAC additions, SEC enforcement actions filed, FDA recalls published, OSHA citations issued, and EPA cases opened. Updated daily at 07:00 UTC from the day's ingest. Designed for LLM agent consumption: plain text, dense with named entities and dates.
  • api.ai-analytics.org/llms.txt: Per the llmstxt.org spec, an API-level directory for LLM agents, with dataset descriptions, endpoint documentation, and example queries. It is distinct from the site-level ai-analytics.org/llms.txt: this one describes the API rather than the site.

Rate limits and caching

The API has no authentication requirement, so abuse protection is purely IP-based:

  • Entity and screen endpoints: 120 requests/minute per IP
  • Search endpoint: 60 requests/minute per IP (FTS5 queries are heavier)
  • Today.md and coverage: no rate limit (static KV reads)

Entity timeline responses are cached at the Cloudflare edge with a 6-hour TTL (refreshed on the next daily ingest). Compliance scores are cached for 1 hour. Search results are not cached (queries are too varied; D1 handles them directly).
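
Batch clients can stay under these limits by pacing themselves. The Python sketch below is a client-side sliding-window limiter, offered as one possible approach: the class name is illustrative, and it assumes the server counts requests over a rolling 60-second window, which the post does not specify.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests in any window_seconds span (client side)."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next request fits in the window."""
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self._sent.append(self.clock())


# 120/minute for entity and screen endpoints, 60/minute for search.
entity_limiter = SlidingWindowLimiter(max_requests=120)
search_limiter = SlidingWindowLimiter(max_requests=60)

delay = entity_limiter.wait_time()
if delay > 0:
    time.sleep(delay)
entity_limiter.record()  # call right before issuing the HTTP request
```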


For the D1 database architecture powering this API: Building the Federal Regulatory Data Hub on Cloudflare D1: 35M records at the edge →

For the cron ingest pipeline that keeps the data behind this API current: Federal dataset ingest: keeping 197 federal datasets fresh at the edge →

For how the cross-agency entity bridge resolves identifiers across 197 datasets: Building the cross-agency regulatory entity graph: 35M records, one join →

For how the compliance risk score is computed across 30+ enforcement lists: Compliance screening across 30+ federal enforcement lists: how the risk score works →

For how the Worker query layer routes requests across the 8 D1 shards serving this API: The Federal Regulatory Data Hub query layer: routing 35M records at the Cloudflare edge →

For the MCP server layer that exposes these endpoints as 38+ tools for Claude and GPT agent workflows: Building a regulatory data MCP server: 38 tools for screening, entity lookup, and sanctions intelligence →