
The Federal Regulatory Data Hub MCP server: 38+ tools for AI agent workflows

AI Analytics

The Federal Regulatory Data Hub exposes its 197 federal datasets through three access surfaces: a REST API for developers who know what they want, downloadable snapshots for offline analysis, and an MCP server for AI agent workflows. The MCP server is the surface designed for agents that don't know in advance what they need — Claude or GPT can call search_entity('Huawei Technologies') and the server handles query routing across all 197 datasets, entity resolution across six identifier namespaces, and response formatting without the agent needing to know the schema.

The server at api.ai-analytics.org/mcp exposes 38+ tools as JSON-RPC 2.0 messages over MCP's Streamable HTTP transport, and its tool definitions (name, description, JSON Schema) map directly onto Claude's tool use API and OpenAI's function calling. The data is released under CC0 (public domain), and no authentication is required.
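On the wire, every tool invocation is a JSON-RPC 2.0 `tools/call` message. The sketch below only builds that message for screen_entity. It assumes the standard MCP Streamable HTTP conventions (a POST whose Accept header lists both `application/json` and `text/event-stream`), and the helper name `buildToolCall` is hypothetical. A real client would first complete MCP's initialize handshake, and echo back any session ID the server assigns, before POSTing a body like this to https://api.ai-analytics.org/mcp.

```typescript
// JSON-RPC 2.0 request envelope for an MCP tools/call.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: headers and body for one tools/call POST.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): { headers: Record<string, string>; body: string } {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
  return {
    headers: {
      "Content-Type": "application/json",
      // The server may answer with a single JSON response or an SSE stream.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(request),
  };
}

const screenCall = buildToolCall(1, "screen_entity", { query: "Huawei Technologies" });
console.log(screenCall.body);
// {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"screen_entity","arguments":{"query":"Huawei Technologies"}}}
```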

Why MCP over REST for AI agents

A REST API requires the agent to know the URL structure, query parameter names, and how to interpret the response schema. That knowledge has to live in the system prompt or be hardcoded into agent scaffolding. When the API changes, the prompt has to change too.

MCP tools carry their semantics with them. Each tool ships a description and a JSON Schema for its inputs, which the model reads at tool-selection time: what query types the tool accepts, what it returns, and when to prefer it over other tools. The model decides which tool to call and with which arguments based on the user's intent, not on pre-programmed routing logic. A new tool added to the MCP server becomes available to connected agents as soon as their client refreshes the tool list, with no prompt update.
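Because a tool's name, description, and input schema travel together in the `tools/list` response, handing them to a model is a mechanical translation. This is a minimal sketch that assumes the standard MCP tool shape (`name`, `description`, `inputSchema`). The targets are the Anthropic Messages API tool format (`input_schema`) and the OpenAI Chat Completions function format (`parameters`). The function names and the trimmed screen_entity descriptor are illustrative.

```typescript
// A tool as advertised in an MCP tools/list response.
interface McpTool {
  name: string;
  description?: string;
  inputSchema: { type: "object"; properties?: Record<string, unknown>; required?: string[] };
}

// Anthropic Messages API tool definition.
interface ClaudeTool {
  name: string;
  description: string;
  input_schema: McpTool["inputSchema"];
}

// OpenAI Chat Completions function-tool definition.
interface OpenAiTool {
  type: "function";
  function: { name: string; description: string; parameters: McpTool["inputSchema"] };
}

function toClaudeTool(tool: McpTool): ClaudeTool {
  return { name: tool.name, description: tool.description ?? "", input_schema: tool.inputSchema };
}

function toOpenAiTool(tool: McpTool): OpenAiTool {
  return {
    type: "function",
    function: { name: tool.name, description: tool.description ?? "", parameters: tool.inputSchema },
  };
}

// Trimmed, illustrative descriptor for screen_entity.
const screenTool: McpTool = {
  name: "screen_entity",
  description: "Screen an entity name or identifier against all 30+ federal enforcement lists.",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string", description: "Entity name or identifier" } },
    required: ["query"],
  },
};

console.log(toClaudeTool(screenTool).name); // screen_entity
console.log(toOpenAiTool(screenTool).type); // function
```

The schema object passes through untouched, so the translation needs no per-tool code.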

The practical difference: a REST-based agent answering "What regulatory issues does this vendor have?" has to be told to call GET /screen/:id then GET /entity/:id/timeline. An MCP-connected Claude agent reads the available tool descriptions, recognizes that screen_entity returns risk scores and enforcement matches, and chains to get_enforcement_history on its own — no additional prompting needed.

Server architecture

The MCP server is a Cloudflare Worker built on the @modelcontextprotocol/sdk package. It shares the same D1 database cluster as the REST API — eight shards covering 35M+ records — and the same cross-agency entity bridge.

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

const server = new McpServer({
  name: 'federal-regulatory-data-hub',
  version: '1.0.0',
});

Each tool is registered with server.tool(name, description, schema, handler). The Worker itself serves HTTP/SSE, which covers browser-based agents and Claude.ai's direct integration; local Claude Code and Claude Desktop sessions, which expect stdio, reach the same entry point through the mcp-remote bridge.

Tool categories

Entity tools (8 tools)

These tools operate on the canonical entity layer — the cross-agency bridge that maps six identifier namespaces (CIK, UEI, LEI, DUNS, NPI, ticker) into a single record.

  • search_entity — free-text or identifier search across the entity bridge
  • get_entity — fetch the full EntityMasterRecord for a resolved entity
  • get_entity_aliases — all former names and DBAs linked to an entity
  • get_entity_source_ids — all dataset-level identifiers for an entity (CIK, UEI, LEI, etc.)
  • list_entity_merges — audit log of entity consolidation decisions
  • get_entity_timeline — all regulatory events for an entity, descending date
  • get_entity_confidence — entity resolution confidence scores per source dataset
  • resolve_entity_id — convert any supported identifier to the canonical entity ID

Screening tools (6 tools)

  • screen_entity — fan-out screen across 30+ enforcement lists, returns 0-100 risk score
  • screen_batch — screen up to 50 entities in a single call
  • get_risk_score — fetch a cached compliance score with flag breakdown
  • get_enforcement_history — all enforcement actions for an entity, paginated
  • check_sanctions_lists — check OFAC SDN, non-SDN, FinCEN, and BIS Entity List only
  • get_debarment_status — check SAM.gov exclusions, OIG exclusions, and CMS exclusions
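Since screen_batch takes at most 50 entities per call, a client screening a longer vendor list has to split it first. This is a minimal sketch. The post does not show screen_batch's input schema, so the `entities` argument name is an assumption, and so are the helper names.

```typescript
// screen_batch accepts at most 50 entities per call.
const SCREEN_BATCH_LIMIT = 50;

// Split a list into consecutive batches no larger than `size`.
function chunkForScreening<T>(items: T[], size: number = SCREEN_BATCH_LIMIT): T[][] {
  if (!Number.isInteger(size) || size < 1) throw new RangeError("size must be a positive integer");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// One tools/call argument object per batch.
// ASSUMPTION: the screen_batch argument is named `entities`; check the tool's inputSchema.
function screenBatchArgs(vendors: string[]): Array<{ entities: string[] }> {
  return chunkForScreening(vendors).map((batch) => ({ entities: batch }));
}

const vendorList = Array.from({ length: 120 }, (_, i) => `Vendor ${i + 1}`);
const batchArgs = screenBatchArgs(vendorList);
console.log(batchArgs.map((b) => b.entities.length)); // [ 50, 50, 20 ]
```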

Dataset-specific tools (15 tools)

One tool per major dataset for cases where an agent needs raw dataset access rather than the unified entity view.

  • get_ofac_entries, get_sam_exclusions, get_sec_filings
  • get_fda_warnings, get_epa_violations, get_fdic_actions
  • get_cms_exclusions, get_doj_settlements, get_cisa_alerts
  • get_nist_vulnerabilities, get_osha_violations, get_msha_citations
  • get_nhtsa_recalls, get_usaspending_contracts, get_irs_exempt_orgs

Compliance workflow tools (5 tools)

  • get_compliance_report — generates a PDF-quality structured report for an entity
  • get_counterparty_risks — risk summary for a list of vendor or partner entities
  • get_due_diligence_package — full due diligence bundle: entity, timeline, score, enforcement
  • monitor_entity — register an entity for change alerts (requires API key for persistence)
  • get_change_alerts — fetch pending alerts for monitored entities

Search and discovery tools (4 tools)

  • full_text_search — FTS5 search across all free-text dataset fields
  • search_by_category — filter enforcement actions by agency, type, and date range
  • get_recent_enforcement_actions — all enforcement actions from the past N days
  • get_weekly_digest — pre-computed weekly summary of regulatory activity

A representative tool implementation

screen_entity is the most commonly called tool in agent compliance workflows. Its Zod schema doubles as documentation. The SDK converts it to the JSON Schema that clients receive from tools/list, so the tool description and the .describe() annotation on each field are what Claude reads when deciding whether to call the tool and with which arguments.

server.tool(
  'screen_entity',
  'Screen an entity name or identifier against all 30+ federal enforcement lists. ' +
  'Returns a 0-100 risk score, matched entries across lists, and confidence-weighted ' +
  'explanations.',
  {
    query: z.string().describe(
      'Entity name, CIK, UEI, LEI, EIN, or NPI'
    ),
    include_aliases: z.boolean().optional().default(true)
      .describe(
        'Whether to expand the search to include known aliases and former names'
      ),
    min_confidence: z.number().min(0).max(1).optional().default(0.7)
      .describe(
        'Minimum entity_confidence threshold for matches (0.0-1.0)'
      ),
    lists: z.array(z.string()).optional()
      .describe(
        'Specific lists to check (e.g. ["ofac_sdn", "sam_debarment"]). ' +
        'Omit to check all lists.'
      ),
  },
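  async ({ query, include_aliases, min_confidence, lists }) => {
    // screenEntity: the hub's shared screening routine (not shown in this post).
    const result = await screenEntity(query, {
      include_aliases,
      min_confidence,
      lists,
    });
    // MCP tool results are content blocks; the result is serialized as one text block.
    return { content: [{ type: 'text', text: JSON.stringify(result) }] };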
  }
);
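The handler returns the screening result as a JSON string inside a single text content block, so a client that wants structured data has to parse that block back. This is a minimal sketch. The `CallToolResult` slice follows MCP's content-block format. The `risk_score` and `flags` fields are taken from the worked example later in this post and are otherwise an assumption about screenEntity's output.

```typescript
// Minimal slice of an MCP tools/call result: a list of content blocks.
interface TextContent { type: "text"; text: string }
interface CallToolResult { content: Array<TextContent | { type: string }>; isError?: boolean }

// ASSUMPTION: fields inferred from the worked example (risk_score, flags[].list).
interface ScreenResult {
  risk_score: number;
  flags: Array<{ list: string }>;
}

// Find the first text block and decode the JSON the handler serialized.
function parseScreenResult(result: CallToolResult): ScreenResult {
  if (result.isError) throw new Error("screen_entity reported an error");
  const block = result.content.find((c): c is TextContent => c.type === "text");
  if (!block) throw new Error("no text content in tool result");
  return JSON.parse(block.text) as ScreenResult;
}

const sampleResult: CallToolResult = {
  content: [{ type: "text", text: '{"risk_score":42,"flags":[{"list":"sam_exclusions"}]}' }],
};
const screened = parseScreenResult(sampleResult);
console.log(screened.risk_score, screened.flags[0].list); // 42 sam_exclusions
```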

Cross-agency entity resolution inside tools

When an agent calls get_entity('Huawei Technologies'), the MCP server runs the three-pass entity resolution pipeline before touching any dataset table:

  1. Exact ID match — check the identifier index for CIK, UEI, LEI, DUNS, NPI, or ticker
  2. Alias table lookup — check all former names and DBAs against the alias table
  3. Jaro-Winkler fuzzy match — scored name similarity with a 0.85 threshold
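The three passes above can be sketched as a cascade that stops at the first hit. This is a minimal in-memory illustration, not the hub's D1 implementation. The `EntityIndex` shape, the `normalizeName` rules (ASCII only), and `resolveEntity` are hypothetical. The Jaro-Winkler function uses the common defaults (prefix scale 0.1, prefix capped at four characters), which the post does not specify. Only the 0.85 threshold comes from the text.

```typescript
// Jaro similarity: characters matching within a window, penalized by transpositions.
function jaro(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const matchedA: boolean[] = new Array(a.length).fill(false);
  const matchedB: boolean[] = new Array(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const start = Math.max(0, i - range);
    const end = Math.min(i + range + 1, b.length);
    for (let j = start; j < end; j++) {
      if (matchedB[j] || a[i] !== b[j]) continue;
      matchedA[i] = matchedB[j] = true;
      matches++;
      break;
    }
  }
  if (matches === 0) return 0;
  // Count matched characters that appear in a different order.
  let outOfOrder = 0;
  for (let i = 0, k = 0; i < a.length; i++) {
    if (!matchedA[i]) continue;
    while (!matchedB[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const t = outOfOrder / 2;
  return (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
}

// Jaro-Winkler: boost pairs that share a common prefix (up to 4 chars).
function jaroWinkler(a: string, b: string, scale = 0.1): number {
  const j = jaro(a, b);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a[prefix] === b[prefix]) prefix++;
  return j + prefix * scale * (1 - j);
}

// Hypothetical in-memory stand-ins for the identifier and alias indexes.
interface EntityIndex {
  byIdentifier: Map<string, string>; // CIK/UEI/LEI/DUNS/NPI/ticker -> canonical_id
  byAlias: Map<string, string>; // normalized former name or DBA -> canonical_id
  names: Array<{ canonicalId: string; name: string }>;
}

type Resolution = { canonicalId: string; pass: "exact_id" | "alias" | "fuzzy"; score: number };

// Assumed normalization: uppercase, punctuation to spaces, collapse whitespace.
function normalizeName(s: string): string {
  return s.toUpperCase().replace(/[^A-Z0-9 ]/g, " ").replace(/\s+/g, " ").trim();
}

function resolveEntity(query: string, index: EntityIndex, threshold = 0.85): Resolution | null {
  // Pass 1: exact identifier match.
  const byId = index.byIdentifier.get(query.trim());
  if (byId !== undefined) return { canonicalId: byId, pass: "exact_id", score: 1 };
  // Pass 2: alias table lookup on the normalized name.
  const key = normalizeName(query);
  const byAlias = index.byAlias.get(key);
  if (byAlias !== undefined) return { canonicalId: byAlias, pass: "alias", score: 1 };
  // Pass 3: best Jaro-Winkler score at or above the threshold.
  let best: Resolution | null = null;
  for (const entry of index.names) {
    const score = jaroWinkler(key, normalizeName(entry.name));
    if (score >= threshold && (best === null || score > best.score)) {
      best = { canonicalId: entry.canonicalId, pass: "fuzzy", score };
    }
  }
  return best;
}

const demoIndex: EntityIndex = {
  byIdentifier: new Map([["0001234567", "acme-logistics-llc"]]), // made-up CIK
  byAlias: new Map([["ACME FREIGHT", "acme-logistics-llc"]]),
  names: [{ canonicalId: "acme-logistics-llc", name: "Acme Logistics LLC" }],
};
console.log(resolveEntity("Acme Freight", demoIndex)?.pass); // alias
console.log(resolveEntity("Acme Logistic LLC", demoIndex)?.pass); // fuzzy
```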

The result is a unified EntityMasterRecord that includes the entity's canonical ID, all known identifiers across six namespaces, and the matched source IDs for every dataset in the hub. The agent gets one structured response instead of having to separately query SEC EDGAR (CIK lookup), OFAC (name search), SAM.gov (UEI lookup), and FDA (establishment registration search).

// EntityMasterRecord returned by get_entity
{
  "canonical_id": "huawei-technologies-co-ltd",
  "name": "Huawei Technologies Co., Ltd.",
  "confidence": 0.97,
  "source_ids": {
    "bis_entity_list": "HW-2019-0001",
    "ofac_sdn": null,
    "sam_exclusions": null,
    "usaspending": "HWTECH-DUNS-123456",
    "sec_edgar": null
  },
  "aliases": [
    "Huawei Device Co., Ltd.",
    "HiSilicon Technologies Co., Ltd."
  ],
  "identifiers": {
    "duns": "123456789",
    "lei": null,
    "uei": null
  },
  "active_flags": ["bis_entity_list"],
  "compliance_score": 71,
  "last_updated": "2026-02-04T06:00:00Z"
}
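For a TypeScript client, the record above maps onto an interface like the one below. The field types are inferred from this single example (for instance, that unmatched datasets appear as `null` rather than being omitted), so treat them as assumptions. `matchedDatasets` is a hypothetical helper that lists which source datasets resolved to the entity.

```typescript
// Field types inferred from the example record; unmatched datasets appear as null.
interface EntityMasterRecord {
  canonical_id: string;
  name: string;
  confidence: number; // entity resolution confidence, 0.0-1.0
  source_ids: Record<string, string | null>; // dataset -> source ID
  aliases: string[];
  identifiers: Record<string, string | null>; // namespace (duns, lei, uei, ...) -> value
  active_flags: string[];
  compliance_score: number;
  last_updated: string; // ISO 8601 timestamp
}

// Hypothetical helper: datasets in which the entity has a matched source ID.
function matchedDatasets(rec: EntityMasterRecord): string[] {
  return Object.entries(rec.source_ids)
    .filter(([, id]) => id !== null)
    .map(([dataset]) => dataset);
}

const huaweiRecord: EntityMasterRecord = {
  canonical_id: "huawei-technologies-co-ltd",
  name: "Huawei Technologies Co., Ltd.",
  confidence: 0.97,
  source_ids: {
    bis_entity_list: "HW-2019-0001",
    ofac_sdn: null,
    sam_exclusions: null,
    usaspending: "HWTECH-DUNS-123456",
    sec_edgar: null,
  },
  aliases: ["Huawei Device Co., Ltd.", "HiSilicon Technologies Co., Ltd."],
  identifiers: { duns: "123456789", lei: null, uei: null },
  active_flags: ["bis_entity_list"],
  compliance_score: 71,
  last_updated: "2026-02-04T06:00:00Z",
};

console.log(matchedDatasets(huaweiRecord)); // [ 'bis_entity_list', 'usaspending' ]
```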

Tool descriptions as prompt engineering

The description field of each tool is consumed by the model's tool selection mechanism at inference time. A vague description like "searches for entities" gives the model no basis for choosing between tools. Tool descriptions in this server are written to be actionable: they specify what the tool returns, what query types it accepts, and when to prefer it over similar tools.

For example, screen_entity's description reads: "Screen an entity name or identifier against all 30+ federal enforcement lists. Returns a 0-100 risk score, matched entries across lists, and confidence-weighted explanations." The get_ofac_entries description reads: "Fetch raw OFAC SDN and non-SDN entries matching a name or identifier. Use this when you need the raw OFAC data rather than a cross-list risk score."

When a user asks "Is this vendor on any sanctions lists?", Claude picks screen_entity — the description “all 30+ federal enforcement lists” matches the intent better than get_ofac_entries's narrower scope. When a user asks "Show me the exact OFAC SDN entry for this company", Claude picks get_ofac_entries because the user is asking for raw data, not a cross-list risk assessment.

MCP transport options

Clients can reach the server in two ways, and both end at the same Cloudflare Worker entry point:

  • HTTP/SSE at api.ai-analytics.org/mcp: for Claude.ai direct integration and browser-based agents. Requests are POSTs with JSON-RPC 2.0 bodies; large result sets stream back over Server-Sent Events. This is the transport used when you add the server in Claude.ai's integration settings.
  • stdio for local Claude Code / Claude Desktop: via npx mcp-remote https://api.ai-analytics.org/mcp, which bridges the HTTP transport to the stdio protocol the local clients expect. There is no server binary to install; the bridge handles protocol translation.

The Worker dispatches on the request method and Accept header and attaches the transport object to the server with McpServer.connect(). Because stdio clients arrive through the bridge as ordinary HTTP requests, both paths share the same tool registry and D1 bindings.
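The dispatch decision can be pictured as a small routing function. This is a hypothetical sketch, not the Worker's code. It follows the MCP Streamable HTTP rules: JSON-RPC messages arrive as POSTs whose Accept header lists both `application/json` and `text/event-stream`, and a GET that accepts `text/event-stream` opens a server-to-client stream. The `Route` labels, the `preferStream` flag, and the 406/405 status choices are illustrative.

```typescript
type Route =
  | { kind: "json" } // answer the POST with a single JSON-RPC response
  | { kind: "sse" } // answer the POST with a Server-Sent Events stream
  | { kind: "listen" } // GET: open a server-to-client SSE stream
  | { kind: "reject"; status: number };

// Hypothetical routing decision for a Streamable HTTP MCP endpoint.
// `preferStream` would be set, for example, when a result set is large.
function routeMcpRequest(method: string, accept: string | null, preferStream: boolean): Route {
  const types = (accept ?? "").split(",").map((t) => t.split(";")[0].trim().toLowerCase());
  const wantsSse = types.includes("text/event-stream");
  const wantsJson = types.includes("application/json");
  switch (method.toUpperCase()) {
    case "POST":
      // POST clients are expected to accept both content types.
      if (!wantsJson || !wantsSse) return { kind: "reject", status: 406 };
      return preferStream ? { kind: "sse" } : { kind: "json" };
    case "GET":
      return wantsSse ? { kind: "listen" } : { kind: "reject", status: 406 };
    default:
      // This sketch does not support DELETE-based session termination.
      return { kind: "reject", status: 405 };
  }
}

console.log(routeMcpRequest("POST", "application/json, text/event-stream", false)); // { kind: 'json' }
console.log(routeMcpRequest("GET", "text/event-stream", false)); // { kind: 'listen' }
```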

Rate limits and tool-level quotas

Each tool has its own rate limit calibrated to its computational cost. Heavy tools that fan out across multiple D1 shards have lower limits than single-lookup tools.

Tool                    Rate limit   Reason
screen_entity           100/min      Cross-agency fan-out, 8 D1 queries
full_text_search        30/min       FTS5 + fuzzy ranking
get_compliance_report   10/min       Generates full PDF-quality report
get_entity              500/min      Single D1 lookup

All anonymous requests (those sent without an API key header) share a global limit of 1,000 tool calls per hour across all tools, enforced over a rolling one-hour window rather than a fixed hourly reset. Agent workflows that need higher throughput can request a free API key, which raises the limit to 10,000 calls per hour and tracks per-tool quotas separately.
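A rolling window means each call counts against the limit for exactly one hour after it was made, instead of against a counter that resets on the hour. Below is a minimal in-memory sketch of that policy as a sliding-window log. The clock is passed in so it can be tested. The class name and API are hypothetical, and the post does not describe how the hub itself enforces the limit.

```typescript
// Sliding-window log: at most `limit` calls in any `windowMs` interval.
class RollingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly accepted: number[] = []; // timestamps (ms) of accepted calls, oldest first

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Forget calls made a full window or more before `now`.
  private prune(now: number): void {
    while (this.accepted.length > 0 && this.accepted[0] <= now - this.windowMs) {
      this.accepted.shift();
    }
  }

  // Record and allow the call if the window ending at `now` has room.
  tryAcquire(now: number): boolean {
    this.prune(now);
    if (this.accepted.length >= this.limit) return false;
    this.accepted.push(now);
    return true;
  }

  // Milliseconds until the next call would be allowed (0 if allowed now).
  retryAfterMs(now: number): number {
    this.prune(now);
    if (this.accepted.length < this.limit) return 0;
    return this.accepted[0] + this.windowMs - now;
  }
}

const HOUR_MS = 60 * 60 * 1000;
const anonymousLimiter = new RollingWindowLimiter(1000, HOUR_MS); // anonymous budget from the text
for (let t = 0; t < 1000; t++) anonymousLimiter.tryAcquire(t);
console.log(anonymousLimiter.tryAcquire(1000)); // false
console.log(anonymousLimiter.retryAfterMs(1000)); // 3599000
```

A sliding-window log is exact but stores one timestamp per accepted call. A fixed-window counter is cheaper, but it lets a client spend up to twice its budget across a window boundary, which is the burst a rolling window avoids.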

A worked agent workflow

Consider a Claude agent answering: “What regulatory issues does this vendor have?” The agent has the vendor's name as a string — nothing more. Here is the tool call sequence it produces:

// Step 1: find the entity and resolve identifiers
search_entity({ query: "Acme Logistics LLC" })
// Returns: canonical_id, name, confidence, known identifiers

// Step 2: screen against all enforcement lists
screen_entity({
  query: "acme-logistics-llc",  // canonical_id from step 1
  include_aliases: true,
  min_confidence: 0.7,
})
// Returns: risk_score: 42, flags: [{ list: "sam_exclusions", ... }]

// Step 3: get the full enforcement history
get_enforcement_history({
  entity_id: "acme-logistics-llc",
  limit: 20,
})
// Returns: 3 SAM exclusion records, 1 OSHA citation, 1 EPA notice of violation

The agent assembles the three responses (entity metadata, risk score with flag breakdown, and ordered enforcement history) into a structured summary. No system-prompt instructions were needed to produce this chain; the tool descriptions alone were enough for Claude to determine the right sequence.
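Claude discovers this chain from the tool descriptions. The same three calls can also be replayed deterministically, for example as a regression check on the server with no model in the loop. This is a hypothetical sketch. `callTool` stands in for whatever client method issues `tools/call`, and it is assumed to return the already-parsed JSON payload. The response field names come from the comments in the sequence above, and the array shape of get_enforcement_history's result is an assumption.

```typescript
// Stand-in for a client's tools/call; assumed to return the parsed JSON payload.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

interface VendorReport {
  canonicalId: string;
  riskScore: number;
  flaggedLists: string[];
  enforcementCount: number;
}

// Replays search -> screen -> history for one vendor name.
async function replayVendorCheck(callTool: CallTool, vendorName: string): Promise<VendorReport> {
  const entity = (await callTool("search_entity", { query: vendorName })) as { canonical_id: string };
  const screen = (await callTool("screen_entity", {
    query: entity.canonical_id,
    include_aliases: true,
    min_confidence: 0.7,
  })) as { risk_score: number; flags: Array<{ list: string }> };
  // ASSUMPTION: get_enforcement_history returns an array of action records.
  const history = (await callTool("get_enforcement_history", {
    entity_id: entity.canonical_id,
    limit: 20,
  })) as unknown[];
  return {
    canonicalId: entity.canonical_id,
    riskScore: screen.risk_score,
    flaggedLists: screen.flags.map((f) => f.list),
    enforcementCount: history.length,
  };
}

// Fake server mirroring the example outputs above (3 SAM + 1 OSHA + 1 EPA records).
const fakeCallTool: CallTool = async (name) => {
  if (name === "search_entity") return { canonical_id: "acme-logistics-llc" };
  if (name === "screen_entity") return { risk_score: 42, flags: [{ list: "sam_exclusions" }] };
  return [{}, {}, {}, {}, {}];
};

replayVendorCheck(fakeCallTool, "Acme Logistics LLC").then((report) => console.log(report));
```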

Claude Desktop integration

To run regulatory compliance queries directly in Claude Desktop, add the following entry to claude_desktop_config.json (on macOS, typically ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "federal-regulatory": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://api.ai-analytics.org/mcp"
      ]
    }
  }
}

Restart Claude Desktop after saving. All 38+ tools then become available in every conversation, and Claude can call them on its own whenever a question involves compliance screening, sanctions checks, federal enforcement history, or entity due diligence; no slash command or explicit invocation is needed.

For Claude Code, put an equivalent mcpServers entry in a .mcp.json file at your project root, or register the server for all projects with claude mcp add --scope user. The same mcp-remote bridge works for both clients.


For the REST API design that this MCP server wraps: The Federal Regulatory Data Hub REST API: no-auth CC0 endpoints, cross-agency entity resolution, and Cloudflare edge caching →

For how the entity bridge powers cross-agency queries inside each tool: Building the cross-agency regulatory entity graph: 35M records, one join →

For how the compliance risk score is computed when screen_entity returns a 0-100 score: Compliance screening across 30+ federal enforcement lists: how the risk score works →

For how entity subscriptions let AI agents monitor specific entities for regulatory changes: Entity subscriptions in the Federal Regulatory Data Hub: per-entity change monitoring across 30+ enforcement lists →