
The Federal Regulatory Data Hub MCP server: 38+ tools for AI agent workflows

February 5, 2026 · 10 min read · AI Analytics

Regulatory · MCP · Infrastructure · AI

The Federal Regulatory Data Hub exposes its 208 federal datasets through three access surfaces: a REST API for developers who know what they want, downloadable snapshots for offline analysis, and an MCP server for AI agent workflows. The MCP server is the surface designed for agents that don't know in advance what they need — Claude or GPT can call search_entity('Huawei Technologies') and the server handles query routing across all 208 datasets, entity resolution across six identifier namespaces, and response formatting without the agent needing to know the schema.

The server at api.ai-analytics.org/mcp exposes 38+ tools over JSON-RPC / Streamable HTTP, compatible with Claude's tool use API and OpenAI's function calling. The dataset is CC0 public domain; no authentication is required.
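To make the wire format concrete, here is a minimal sketch of the JSON-RPC 2.0 envelope an MCP client sends to invoke a tool. The `tools/call` method and the `name` / `arguments` parameters come from the MCP specification; the `buildToolCall` helper is hypothetical, and the `query` argument name is assumed from the worked workflow later in this article. A real session starts with an `initialize` handshake and would normally go through an MCP client library rather than hand-built requests.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in the envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// The search_entity call mentioned above, serialized as a POST body.
const request = buildToolCall(1, "search_entity", {
  query: "Huawei Technologies",
});
const body = JSON.stringify(request);
console.log(body);
```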

Why MCP over REST for AI agents

A REST API requires the agent to know the URL structure, query parameter names, and how to interpret the response schema. That knowledge has to live in the system prompt or be hardcoded into agent scaffolding. When the API changes, the prompt has to change too.

MCP tools carry their semantics with them. Each tool has a JSON Schema description that the AI model reads at tool-selection time — what query types the tool accepts, what it returns, when to prefer it over other tools. The model decides which tool to call and with which arguments based on the user's intent, not pre-programmed routing logic. Adding a new tool to the MCP server makes it immediately available to any connected agent without a prompt update.

The practical difference: a REST-based agent answering "What regulatory issues does this vendor have?" has to be told to call GET /screen/:id then GET /entity/:id/timeline. An MCP-connected Claude agent reads the available tool descriptions, recognizes that screen_entity returns risk scores and enforcement matches, and chains to get_enforcement_history on its own — no additional prompting needed.
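For a sense of what the model actually reads, the sketch below shows the rough shape of one entry in a `tools/list` response. The field names (`name`, `description`, `inputSchema`) follow the MCP specification. The descriptor contents are illustrative only: the description text is invented for this example, and the `entity_id` and `limit` parameters are taken from the worked workflow later in this article rather than from the server's published schema.

```typescript
// Shape of one tool entry as listed to a connected client.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

// Hypothetical descriptor for get_enforcement_history (illustrative values).
const getEnforcementHistory: ToolDescriptor = {
  name: "get_enforcement_history",
  description:
    "Return all enforcement actions for a resolved entity, paginated. " +
    "Use after screen_entity when the user needs the underlying records.",
  inputSchema: {
    type: "object",
    properties: {
      entity_id: { type: "string", description: "Canonical entity ID" },
      limit: { type: "number", description: "Maximum records to return" },
    },
    required: ["entity_id"],
  },
};

console.log(Object.keys(getEnforcementHistory.inputSchema.properties));
```

The description carries the "when to use this" guidance, while the schema carries the argument contract; together they stand in for the routing instructions a REST-based agent would need in its prompt.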

Server architecture

The MCP server is a Cloudflare Worker built on the @modelcontextprotocol/sdk package. It shares the same D1 database cluster as the REST API — eight shards covering 50M+ records — and the same cross-agency entity bridge.

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

const server = new McpServer({
  name: 'federal-regulatory-data-hub',
  version: '1.0.0',
});

Each tool is registered with server.tool(name, description, schema, handler). The Worker serves the HTTP transport directly, which covers browser-based agents and Claude.ai direct integration. Local Claude Code and Claude Desktop clients, which speak stdio, reach the same endpoint through the mcp-remote bridge described under "MCP transport options".

Tool categories

Entity tools (8 tools)

These tools operate on the canonical entity layer — the cross-agency bridge that maps six identifier namespaces (CIK, UEI, LEI, DUNS, NPI, ticker) into a single record.

search_entity — free-text or identifier search across the entity bridge
get_entity — fetch the full EntityMasterRecord for a resolved entity
get_entity_aliases — all former names and DBAs linked to an entity
get_entity_source_ids — all dataset-level identifiers for an entity (CIK, UEI, LEI, etc.)
list_entity_merges — audit log of entity consolidation decisions
get_entity_timeline — all regulatory events for an entity, descending date
get_entity_confidence — entity resolution confidence scores per source dataset
resolve_entity_id — convert any supported identifier to the canonical entity ID

Screening tools (6 tools)

screen_entity — fan-out screen across 30+ enforcement lists, returns 0-100 risk score
screen_batch — screen up to 50 entities in a single call
get_risk_score — fetch a cached compliance score with flag breakdown
get_enforcement_history — all enforcement actions for an entity, paginated
check_sanctions_lists — check OFAC SDN, non-SDN, FinCEN, and BIS Entity List only
get_debarment_status — check SAM.gov exclusions, OIG exclusions, and CMS exclusions
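Because screen_batch accepts at most 50 entities per call, a client with a longer vendor list has to split it first. The helper below is a minimal sketch of that step; `toBatches` is a hypothetical client-side function, and since the argument name screen_batch expects is not documented here, the call itself is not shown.

```typescript
const MAX_BATCH = 50; // per-call cap stated for screen_batch

// Split a list into consecutive chunks of at most `size` items.
function toBatches<T>(items: T[], size: number = MAX_BATCH): T[][] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 120 placeholder vendor names split into batches of 50, 50, and 20.
const vendors = Array.from({ length: 120 }, (_, i) => `vendor-${i + 1}`);
const batches = toBatches(vendors);
console.log(batches.map((b) => b.length)); // [ 50, 50, 20 ]
```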

Dataset-specific tools (15 tools)

One tool per major dataset for cases where an agent needs raw dataset access rather than the unified entity view.

get_ofac_entries, get_sam_exclusions, get_sec_filings
get_fda_warnings, get_epa_violations, get_fdic_actions
get_cms_exclusions, get_doj_settlements, get_cisa_alerts
get_nist_vulnerabilities, get_osha_violations, get_msha_citations
get_nhtsa_recalls, get_usaspending_contracts, get_irs_exempt_orgs

Compliance workflow tools (5 tools)

get_compliance_report — generates a PDF-quality structured report for an entity
get_counterparty_risks — risk summary for a list of vendor or partner entities
get_due_diligence_package — full due diligence bundle: entity, timeline, score, enforcement
monitor_entity — register an entity for change alerts (requires API key for persistence)
get_change_alerts — fetch pending alerts for monitored entities

Search and discovery tools (4 tools)

full_text_search — FTS5 search across all free-text dataset fields
search_by_category — filter enforcement actions by agency, type, and date range
get_recent_enforcement_actions — all enforcement actions from the past N days
get_weekly_digest — pre-computed weekly summary of regulatory activity

A representative tool implementation

screen_entity is the most commonly called tool in agent compliance workflows. Its Zod schema doubles as documentation — the description and .describe() annotations on each field are what Claude reads when deciding whether to call this tool and with which arguments.

server.tool(
  'screen_entity',
  'Screen an entity name or identifier against all 30+ federal enforcement lists. ' +
  'Returns a 0-100 risk score, matched entries across lists, and confidence-weighted ' +
  'explanations.',
  {
    query: z.string().describe(
      'Entity name, CIK, UEI, LEI, EIN, or NPI'
    ),
    include_aliases: z.boolean().optional().default(true)
      .describe(
        'Whether to expand the search to include known aliases and former names'
      ),
    min_confidence: z.number().min(0).max(1).optional().default(0.7)
      .describe(
        'Minimum entity_confidence threshold for matches (0.0-1.0)'
      ),
    lists: z.array(z.string()).optional()
      .describe(
        'Specific lists to check (e.g. ["ofac_sdn", "sam_debarment"]). ' +
        'Omit to check all lists.'
      ),
  },
  async ({ query, include_aliases, min_confidence, lists }) => {
    const result = await screenEntity(query, {
      include_aliases,
      min_confidence,
      lists,
    });
    return { content: [{ type: 'text', text: JSON.stringify(result) }] };
  }
);
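On the receiving side, the handler's return value arrives as a content array whose first text item holds the JSON-serialized result. The sketch below shows one way a client might unwrap it. `parseToolResult` and the `ScreenResult` type are hypothetical; the `risk_score` and `flags` field names are taken from the worked workflow later in this article, and `isError` is the optional error marker defined by the MCP specification.

```typescript
interface TextContent {
  type: "text";
  text: string;
}

interface ToolResult {
  content: Array<TextContent | { type: string }>;
  isError?: boolean;
}

// Find the first text item and parse it as JSON.
function parseToolResult<T>(result: ToolResult): T {
  if (result.isError) throw new Error("tool call reported an error");
  const first = result.content.find(
    (c): c is TextContent =>
      c.type === "text" && typeof (c as TextContent).text === "string",
  );
  if (!first) throw new Error("no text content in tool result");
  return JSON.parse(first.text) as T;
}

// Assumed result shape, for illustration only.
interface ScreenResult {
  risk_score: number;
  flags: Array<{ list: string }>;
}

const sample: ToolResult = {
  content: [
    {
      type: "text",
      text: JSON.stringify({ risk_score: 42, flags: [{ list: "sam_exclusions" }] }),
    },
  ],
};

const parsed = parseToolResult<ScreenResult>(sample);
console.log(parsed.risk_score, parsed.flags[0].list);
```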

Cross-agency entity resolution inside tools

When an agent calls get_entity('Huawei Technologies'), the MCP server runs the three-pass entity resolution pipeline before touching any dataset table:

1. Exact ID match: check the identifier index for CIK, UEI, LEI, DUNS, NPI, or ticker
2. Alias table lookup: check all former names and DBAs against the alias table
3. Jaro-Winkler fuzzy match: scored name similarity with a 0.85 threshold

The result is a unified EntityMasterRecord that includes the entity's canonical ID, all known identifiers across six namespaces, and the matched source IDs for every dataset in the hub. The agent gets one structured response instead of having to separately query SEC EDGAR (CIK lookup), OFAC (name search), SAM.gov (UEI lookup), and FDA (establishment registration search).

// EntityMasterRecord returned by get_entity
{
  "canonical_id": "huawei-technologies-co-ltd",
  "name": "Huawei Technologies Co., Ltd.",
  "confidence": 0.97,
  "source_ids": {
    "bis_entity_list": "HW-2019-0001",
    "ofac_sdn": null,
    "sam_exclusions": null,
    "usaspending": "HWTECH-DUNS-123456",
    "sec_edgar": null
  },
  "aliases": [
    "Huawei Device Co., Ltd.",
    "HiSilicon Technologies Co., Ltd."
  ],
  "identifiers": {
    "duns": "123456789",
    "lei": null,
    "uei": null
  },
  "active_flags": ["bis_entity_list"],
  "compliance_score": 71,
  "last_updated": "2026-02-04T06:00:00Z"
}
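The three-pass resolution described above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the server's implementation: the `EntityIndex` structure, the `resolveEntity` function, the name normalization, and the confidence of 1 assigned to exact and alias matches are all hypothetical. The Jaro-Winkler function is the standard formulation (prefix scaling 0.1, prefix capped at 4 characters), and the 0.85 threshold is the one stated in the article.

```typescript
// Jaro similarity between two strings, in [0, 1].
function jaro(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  const window = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aHit: boolean[] = new Array(a.length).fill(false);
  const bHit: boolean[] = new Array(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const hi = Math.min(b.length - 1, i + window);
    for (let j = Math.max(0, i - window); j <= hi; j++) {
      if (!bHit[j] && a[i] === b[j]) {
        aHit[i] = bHit[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  let outOfOrder = 0; // matched characters that differ in order (2 per transposition)
  for (let i = 0, k = 0; i < a.length; i++) {
    if (!aHit[i]) continue;
    while (!bHit[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const t = outOfOrder / 2;
  return (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
}

// Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
function jaroWinkler(a: string, b: string, scaling = 0.1): number {
  const base = jaro(a, b);
  let prefix = 0;
  while (prefix < Math.min(4, a.length, b.length) && a[prefix] === b[prefix]) prefix++;
  return base + prefix * scaling * (1 - base);
}

const normalize = (s: string): string => s.trim().toLowerCase().replace(/\s+/g, " ");

interface Resolution {
  canonicalId: string;
  pass: "id" | "alias" | "fuzzy";
  confidence: number;
}

// Hypothetical in-memory stand-in for the identifier index and alias table.
interface EntityIndex {
  ids: Map<string, string>; // identifier value -> canonical ID
  aliases: Map<string, string>; // normalized alias -> canonical ID
  names: Array<{ canonicalId: string; name: string }>;
}

function resolveEntity(query: string, index: EntityIndex, threshold = 0.85): Resolution | null {
  // Pass 1: exact identifier match.
  const byId = index.ids.get(query.trim());
  if (byId) return { canonicalId: byId, pass: "id", confidence: 1 };

  // Pass 2: alias table lookup on the normalized name.
  const q = normalize(query);
  const byAlias = index.aliases.get(q);
  if (byAlias) return { canonicalId: byAlias, pass: "alias", confidence: 1 };

  // Pass 3: best Jaro-Winkler score at or above the threshold.
  let best: Resolution | null = null;
  for (const { canonicalId, name } of index.names) {
    const score = jaroWinkler(q, normalize(name));
    if (score >= threshold && (best === null || score > best.confidence)) {
      best = { canonicalId, pass: "fuzzy", confidence: score };
    }
  }
  return best;
}

// Made-up index contents, for illustration only.
const index: EntityIndex = {
  ids: new Map([["0001234567", "acme-logistics-llc"]]),
  aliases: new Map([["acme freight", "acme-logistics-llc"]]),
  names: [{ canonicalId: "acme-logistics-llc", name: "Acme Logistics LLC" }],
};

console.log(resolveEntity("Acme Logistcs LLC", index));
```

Ordering the passes from cheapest and most certain to most expensive and least certain means the fuzzy scan only runs for queries that neither the identifier index nor the alias table can answer.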

Tool descriptions as prompt engineering

The description field of each tool is consumed by the model's tool selection mechanism at inference time. A vague description like "searches for entities" gives the model no basis for choosing between tools. Tool descriptions in this server are written to be actionable: they specify what the tool returns, what query types it accepts, and when to prefer it over similar tools.

For example, screen_entity's description reads: "Screen an entity name or identifier against all 30+ federal enforcement lists. Returns a 0-100 risk score, matched entries across lists, and confidence-weighted explanations." The get_ofac_entries description reads: "Fetch raw OFAC SDN and non-SDN entries matching a name or identifier. Use this when you need the raw OFAC data rather than a cross-list risk score."

When a user asks "Is this vendor on any sanctions lists?", Claude picks screen_entity — the description “all 30+ federal enforcement lists” matches the intent better than get_ofac_entries's narrower scope. When a user asks "Show me the exact OFAC SDN entry for this company", Claude picks get_ofac_entries because the user is asking for raw data, not a cross-list risk assessment.

MCP transport options

The server is reachable over two transports, both terminating at the same Cloudflare Worker entry point:

HTTP/SSE at api.ai-analytics.org/mcp: for Claude.ai direct integration and browser-based agents. Requests are POST with JSON-RPC 2.0 bodies; large result sets stream over Server-Sent Events. This is the transport used when you add the server to Claude.ai's integration settings.
stdio for local Claude Code / Claude Desktop: via npx mcp-remote https://api.ai-analytics.org/mcp, which bridges the HTTP transport to the stdio protocol the local clients expect. No server binary to install; the bridge handles protocol translation.

The Worker dispatches based on the Accept header and request method, then attaches the appropriate transport object to the server with server.connect(transport). Both transports share the same tool registry and D1 bindings.
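The article does not spell out the dispatch rules, so the following is only a guess at what a decision based on method and Accept header could look like. The `pickDispatch` function, the `Dispatch` labels, and the specific rules (POST preferring a streamed response when the client accepts `text/event-stream`, GET opening an event stream) are assumptions made for illustration.

```typescript
type Dispatch =
  | "json-rpc"
  | "json-rpc-streamed"
  | "event-stream"
  | "not-acceptable"
  | "method-not-allowed";

// Decide how to answer a request from its method and Accept header (assumed rules).
function pickDispatch(method: string, accept: string | null): Dispatch {
  // Reduce "type/subtype;q=0.9, ..." to a list of bare media types.
  const types = (accept || "")
    .toLowerCase()
    .split(",")
    .map((t) => t.split(";")[0].trim());
  const wantsSse = types.indexOf("text/event-stream") !== -1;
  const wantsJson =
    types.indexOf("application/json") !== -1 || types.indexOf("*/*") !== -1;

  if (method === "POST") {
    if (wantsSse) return "json-rpc-streamed";
    return wantsJson ? "json-rpc" : "not-acceptable";
  }
  if (method === "GET") return wantsSse ? "event-stream" : "not-acceptable";
  return "method-not-allowed";
}

console.log(pickDispatch("POST", "application/json, text/event-stream"));
```

Keeping the decision in a pure function like this makes it straightforward to unit test separately from the Worker's fetch handler.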

Rate limits and tool-level quotas

Each tool has its own rate limit calibrated to its computational cost. Heavy tools that fan out across multiple D1 shards have lower limits than single-lookup tools.

Tool	Rate limit	Reason
screen_entity	100/min	Cross-agency fan-out, 8 D1 queries
full_text_search	30/min	FTS5 + fuzzy ranking
get_compliance_report	10/min	Generates full PDF-quality report
get_entity	500/min	Single D1 lookup

All anonymous requests (those sent without an API key header) share a global limit of 1,000 tool calls per hour across all tools, enforced over a rolling one-hour window. Agent workflows that need higher throughput can request a free API key, which raises the limit to 10,000 calls per hour and tracks per-tool quotas separately.
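As a rough illustration of how per-tool and global limits can be combined, here is a minimal rolling-window limiter. It is a sketch, not the server's implementation: the `RollingWindowLimiter` class and `allowCall` function are hypothetical, the per-tool limits are assumed to apply per client key, and the state is kept in process memory. A Cloudflare Worker would need shared storage for these counters, which the article does not describe.

```typescript
// Sliding-window log: remembers call timestamps per key inside the window.
class RollingWindowLimiter {
  private hits: Map<string, number[]> = new Map();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the call if `key` is under its limit at `now`.
  tryAcquire(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) || []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }
}

// Per-minute limits from the table above; anonymous hourly cap from the text.
const perToolPerMinute: Record<string, number> = {
  screen_entity: 100,
  full_text_search: 30,
  get_compliance_report: 10,
  get_entity: 500,
};
const anonymousHourly = new RollingWindowLimiter(1000, 60 * 60 * 1000);
const toolLimiters = new Map<string, RollingWindowLimiter>();

// Simplification: a call rejected by the hourly cap still uses a per-tool slot.
function allowCall(tool: string, clientKey: string, now: number = Date.now()): boolean {
  const perMinute = perToolPerMinute[tool];
  if (perMinute !== undefined) {
    let limiter = toolLimiters.get(tool);
    if (!limiter) {
      limiter = new RollingWindowLimiter(perMinute, 60 * 1000);
      toolLimiters.set(tool, limiter);
    }
    if (!limiter.tryAcquire(clientKey, now)) return false;
  }
  return anonymousHourly.tryAcquire(clientKey, now);
}
```

With a rolling window there is no fixed reset instant: each call stops counting against the limit exactly one window length after it was made.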

A worked agent workflow

Consider a Claude agent answering: “What regulatory issues does this vendor have?” The agent has the vendor's name as a string — nothing more. Here is the tool call sequence it produces:

// Step 1: find the entity and resolve identifiers
search_entity({ query: "Acme Logistics LLC" })
// Returns: canonical_id, name, confidence, known identifiers

// Step 2: screen against all enforcement lists
screen_entity({
  query: "acme-logistics-llc",  // canonical_id from step 1
  include_aliases: true,
  min_confidence: 0.7,
})
// Returns: risk_score: 42, flags: [{ list: "sam_exclusions", ... }]

// Step 3: get the full enforcement history
get_enforcement_history({
  entity_id: "acme-logistics-llc",
  limit: 20,
})
// Returns: 3 SAM exclusion records, 1 OSHA citation, 1 EPA notice of violation

The agent assembles the three responses — entity metadata, risk score with flag breakdown, and ordered enforcement history — into a structured summary. No prompt engineering was needed to produce this chain; the tool descriptions alone were sufficient for Claude to determine the right sequence.
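The same three-step chain can also be written out as ordinary client code, which is useful for testing or for scaffolding that does not delegate tool selection to a model. This is a sketch under assumptions: `CallTool` stands in for whatever MCP client function actually sends the request, `reviewVendor` and `VendorSummary` are hypothetical, the `records` field on the enforcement history response is a guess, and `stubCallTool` returns canned data mirroring the example values above.

```typescript
// Stand-in for an MCP client's tool-calling function (hypothetical signature).
type CallTool = (name: string, args: Record<string, unknown>) => Promise<any>;

interface VendorSummary {
  canonicalId: string;
  riskScore: number;
  flaggedLists: string[];
  enforcementCount: number;
}

async function reviewVendor(callTool: CallTool, vendorName: string): Promise<VendorSummary> {
  // Step 1: resolve the name to a canonical entity.
  const entity = await callTool("search_entity", { query: vendorName });

  // Step 2: screen the resolved entity against all enforcement lists.
  const screen = await callTool("screen_entity", {
    query: entity.canonical_id,
    include_aliases: true,
    min_confidence: 0.7,
  });

  // Step 3: fetch the enforcement history (a `records` array is assumed).
  const history = await callTool("get_enforcement_history", {
    entity_id: entity.canonical_id,
    limit: 20,
  });

  return {
    canonicalId: entity.canonical_id,
    riskScore: screen.risk_score,
    flaggedLists: screen.flags.map((f: { list: string }) => f.list),
    enforcementCount: history.records.length,
  };
}

// Canned responses for illustration; a real client would call the server.
const stubCallTool: CallTool = async (name) => {
  if (name === "search_entity") return { canonical_id: "acme-logistics-llc" };
  if (name === "screen_entity") return { risk_score: 42, flags: [{ list: "sam_exclusions" }] };
  if (name === "get_enforcement_history") return { records: [1, 2, 3, 4, 5] };
  throw new Error("unexpected tool: " + name);
};

reviewVendor(stubCallTool, "Acme Logistics LLC").then((summary) => console.log(summary));
```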

Claude Desktop integration

To run regulatory compliance queries directly in Claude Desktop, add the following entry to claude_desktop_config.json (typically at ~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "federal-regulatory": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://api.ai-analytics.org/mcp"
      ]
    }
  }
}

Restart Claude Desktop after saving. All 38+ tools become available in any conversation. Claude will automatically invoke them when the user's question involves compliance screening, sanctions checks, federal enforcement history, or entity due diligence — no slash command or explicit invocation needed.

For Claude Code, add an equivalent mcpServers entry to a .mcp.json file in your project root, or register the server at user scope with the claude mcp add command for global availability. The same mcp-remote bridge works for both clients.

For the REST API design that this MCP server wraps: The Federal Regulatory Data Hub REST API: no-auth CC0 endpoints, cross-agency entity resolution, and Cloudflare edge caching →

For how the entity bridge powers cross-agency queries inside each tool: Building the cross-agency regulatory entity graph: 50M+ records, one join →

For how the compliance risk score is computed when screen_entity returns a 0-100 score: Compliance screening across 30+ federal enforcement lists: how the risk score works →

For how entity subscriptions let AI agents monitor specific entities for regulatory changes: Entity subscriptions in the Federal Regulatory Data Hub: per-entity change monitoring across 30+ enforcement lists →