The Federal Regulatory Data Hub MCP server: 38+ tools for AI agent workflows
The Federal Regulatory Data Hub exposes its 197 federal datasets through three access surfaces: a REST API for developers who know what they want, downloadable snapshots for offline analysis, and an MCP server for AI agent workflows. The MCP server is the surface designed for agents that don't know in advance what they need — Claude or GPT can call search_entity('Huawei Technologies') and the server handles query routing across all 197 datasets, entity resolution across six identifier namespaces, and response formatting without the agent needing to know the schema.
The server at api.ai-analytics.org/mcp exposes 38+ tools over JSON-RPC / Streamable HTTP, compatible with Claude's tool use API and OpenAI's function calling. The dataset is CC0 public domain; no authentication is required.
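Every tool invocation on this endpoint is a JSON-RPC 2.0 request that uses the MCP tools/call method. The sketch below builds that request body for a search_entity call. buildToolCall is a hypothetical helper. A real client would first complete the MCP initialize handshake before sending it; the official SDK clients handle that step.

```typescript
// JSON-RPC 2.0 request shape for MCP's tools/call method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string; // which registered tool to invoke
    arguments: Record<string, unknown>; // must satisfy the tool's input schema
  };
}

// Hypothetical helper: builds the body a client would POST to the /mcp endpoint.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

const exampleBody = buildToolCall(1, "search_entity", { query: "Huawei Technologies" });
console.log(JSON.stringify(exampleBody));
// {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"search_entity","arguments":{"query":"Huawei Technologies"}}}
```

Under the Streamable HTTP transport, the client sends this body as a POST with Content-Type: application/json. It also sends an Accept header that lists both application/json and text/event-stream.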
Why MCP over REST for AI agents
A REST API requires the agent to know the URL structure, query parameter names, and how to interpret the response schema. That knowledge has to live in the system prompt or be hardcoded into agent scaffolding. When the API changes, the prompt has to change too.
MCP tools carry their semantics with them. Each tool has a JSON Schema description that the AI model reads at tool-selection time — what query types the tool accepts, what it returns, when to prefer it over other tools. The model decides which tool to call and with which arguments based on the user's intent, not pre-programmed routing logic. Adding a new tool to the MCP server makes it immediately available to any connected agent without a prompt update.
The practical difference: a REST-based agent answering "What regulatory issues does this vendor have?" has to be told to call GET /screen/:id then GET /entity/:id/timeline. An MCP-connected Claude agent reads the available tool descriptions, recognizes that screen_entity returns risk scores and enforcement matches, and chains to get_enforcement_history on its own — no additional prompting needed.
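A dependency-free miniature shows why a new tool needs no prompt update. The tools/list response is built from the registry at request time, so the model sees exactly what is registered, descriptions included. ToolRegistry and its methods are hypothetical illustrations of the idea, not the @modelcontextprotocol/sdk internals. Input schemas are plain JSON Schema objects.

```typescript
// A JSON Schema object describing a tool's arguments.
type JsonSchema = Record<string, unknown>;

interface ToolDef {
  name: string;
  description: string; // what the model reads when choosing a tool
  inputSchema: JsonSchema;
  handler: (args: Record<string, unknown>) => Promise<unknown>;
}

type ToolListing = Omit<ToolDef, "handler">;

// Hypothetical miniature registry; not the @modelcontextprotocol/sdk internals.
class ToolRegistry {
  private tools = new Map<string, ToolDef>();

  register(def: ToolDef): void {
    this.tools.set(def.name, def);
  }

  // Built from the live registry on every tools/list request, so a newly
  // registered tool appears without any client-side or prompt change.
  // Handlers stay on the server; only metadata is sent to the model.
  list(): ToolListing[] {
    return Array.from(this.tools.values()).map((t) => ({
      name: t.name,
      description: t.description,
      inputSchema: t.inputSchema,
    }));
  }

  async call(toolName: string, args: Record<string, unknown>): Promise<unknown> {
    const tool = this.tools.get(toolName);
    if (!tool) throw new Error(`Unknown tool: ${toolName}`);
    return tool.handler(args);
  }
}

const registry = new ToolRegistry();
registry.register({
  name: "get_entity",
  description: "Fetch the full EntityMasterRecord for a resolved entity.",
  inputSchema: {
    type: "object",
    properties: { entity_id: { type: "string" } },
    required: ["entity_id"],
  },
  handler: async (args) => ({ canonical_id: args.entity_id }), // stub handler
});
console.log(registry.list().map((t) => t.name).join(", ")); // get_entity
```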
Server architecture
The MCP server is a Cloudflare Worker built on the @modelcontextprotocol/sdk package. It shares the same D1 database cluster as the REST API — eight shards covering 35M+ records — and the same cross-agency entity bridge.
```typescript
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

const server = new McpServer({
  name: 'federal-regulatory-data-hub',
  version: '1.0.0',
});
```

Each tool is registered with server.tool(name, description, schema, handler). The Worker serves every client from the same HTTP entry point. Browser-based agents and Claude.ai's direct integration connect over HTTP/SSE. Local Claude Code and Claude Desktop sessions expect stdio, so they reach the Worker through the mcp-remote bridge.
Tool categories
Entity tools (8 tools)
These tools operate on the canonical entity layer — the cross-agency bridge that maps six identifier namespaces (CIK, UEI, LEI, DUNS, NPI, ticker) into a single record.
- search_entity: free-text or identifier search across the entity bridge
- get_entity: fetch the full EntityMasterRecord for a resolved entity
- get_entity_aliases: all former names and DBAs linked to an entity
- get_entity_source_ids: all dataset-level identifiers for an entity (CIK, UEI, LEI, etc.)
- list_entity_merges: audit log of entity consolidation decisions
- get_entity_timeline: all regulatory events for an entity, in descending date order
- get_entity_confidence: entity resolution confidence scores per source dataset
- resolve_entity_id: convert any supported identifier to the canonical entity ID
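Before resolve_entity_id can look up an identifier, it has to work out which namespaces the string could belong to. The sketch below covers that first step:

- The LEI check uses the ISO 7064 MOD 97-10 scheme from ISO 17442.
- The NPI check uses the Luhn algorithm over the prefix 80840 plus the NPI.
- The UEI and ticker patterns are simplified assumptions.
- candidateNamespaces is a hypothetical name.

The function returns every candidate rather than one guess because the formats overlap.

```typescript
// Namespaces named in the entity bridge.
type IdNamespace = "cik" | "uei" | "lei" | "duns" | "npi" | "ticker";

// ISO 7064 MOD 97-10, as used by LEIs (ISO 17442): digits keep their value,
// letters map to 10..35, and a valid 20-character code leaves remainder 1.
function leiChecksumOk(lei: string): boolean {
  let rem = 0;
  for (let i = 0; i < lei.length; i++) {
    const v = parseInt(lei[i], 36); // '7' -> 7, 'G' -> 16
    rem = (v < 10 ? rem * 10 + v : rem * 100 + v) % 97;
  }
  return rem === 1;
}

// NPI check digit: Luhn algorithm applied to "80840" + the 10-digit NPI.
function npiChecksumOk(npi: string): boolean {
  const digits = "80840" + npi;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]); // walk right to left
    if (i % 2 === 1) {
      d *= 2; // double every second digit from the right
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Hypothetical classifier: every namespace a raw string could belong to.
// Formats overlap (a 10-digit string can be a CIK or an NPI), so callers
// still need an index lookup to pick the right one.
function candidateNamespaces(raw: string): IdNamespace[] {
  const id = raw.trim().toUpperCase();
  const out: IdNamespace[] = [];
  if (/^[A-Z0-9]{18}[0-9]{2}$/.test(id) && leiChecksumOk(id)) out.push("lei");
  if (/^[A-Z0-9]{12}$/.test(id)) out.push("uei"); // simplified: length + charset only
  if (/^\d{9}$/.test(id)) out.push("duns");
  if (/^[12]\d{9}$/.test(id) && npiChecksumOk(id)) out.push("npi"); // 10 digits, leading 1 or 2
  if (/^\d{1,10}$/.test(id)) out.push("cik");
  if (/^[A-Z]{1,5}(\.[A-Z])?$/.test(id)) out.push("ticker"); // simplified
  return out;
}

console.log(candidateNamespaces("1234567893").join(", ")); // npi, cik
```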
Screening tools (6 tools)
- screen_entity: fan-out screen across 30+ enforcement lists; returns a 0-100 risk score
- screen_batch: screen up to 50 entities in a single call
- get_risk_score: fetch a cached compliance score with flag breakdown
- get_enforcement_history: all enforcement actions for an entity, paginated
- check_sanctions_lists: check only OFAC SDN, non-SDN, FinCEN, and the BIS Entity List
- get_debarment_status: check SAM.gov exclusions, OIG exclusions, and CMS exclusions
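Since screen_batch accepts at most 50 entities per call, a client screening a longer vendor list has to split it first. The minimal sketch below assumes an injected callTool function and an argument named entities for screen_batch. Both are hypothetical, because the section does not give that tool's schema.

```typescript
const SCREEN_BATCH_MAX = 50; // per-call limit stated for screen_batch

// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Hypothetical tool-call signature; a real MCP client would supply this.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

// Screen any number of names with one screen_batch call per chunk of 50.
async function screenAll(names: string[], callTool: CallTool): Promise<unknown[]> {
  const results: unknown[] = [];
  for (const batch of chunk(names, SCREEN_BATCH_MAX)) {
    // Sequential calls keep the request rate predictable.
    results.push(await callTool("screen_batch", { entities: batch }));
  }
  return results;
}
```

Issuing the chunks one after another keeps the client well behaved under the per-tool limits. Running them concurrently would finish sooner but sends a burst of requests at once.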
Dataset-specific tools (15 tools)
One tool per major dataset for cases where an agent needs raw dataset access rather than the unified entity view.
get_ofac_entries, get_sam_exclusions, get_sec_filings, get_fda_warnings, get_epa_violations, get_fdic_actions, get_cms_exclusions, get_doj_settlements, get_cisa_alerts, get_nist_vulnerabilities, get_osha_violations, get_msha_citations, get_nhtsa_recalls, get_usaspending_contracts, get_irs_exempt_orgs
Compliance workflow tools (5 tools)
- get_compliance_report: generates a PDF-quality structured report for an entity
- get_counterparty_risks: risk summary for a list of vendor or partner entities
- get_due_diligence_package: full due diligence bundle (entity, timeline, score, enforcement)
- monitor_entity: register an entity for change alerts (requires an API key for persistence)
- get_change_alerts: fetch pending alerts for monitored entities
Search and discovery tools (4 tools)
- full_text_search: FTS5 search across all free-text dataset fields
- search_by_category: filter enforcement actions by agency, type, and date range
- get_recent_enforcement_actions: all enforcement actions from the past N days
- get_weekly_digest: pre-computed weekly summary of regulatory activity
A representative tool implementation
screen_entity is the most commonly called tool in agent compliance workflows. Its Zod schema doubles as documentation — the description and .describe() annotations on each field are what Claude reads when deciding whether to call this tool and with which arguments.
```typescript
server.tool(
  'screen_entity',
  'Screen an entity name or identifier against all 30+ federal enforcement lists. ' +
    'Returns a 0-100 risk score, matched entries across lists, and confidence-weighted ' +
    'explanations.',
  {
    query: z.string().describe(
      'Entity name, CIK, UEI, LEI, EIN, or NPI'
    ),
    include_aliases: z.boolean().optional().default(true)
      .describe(
        'Whether to expand the search to include known aliases and former names'
      ),
    min_confidence: z.number().min(0).max(1).optional().default(0.7)
      .describe(
        'Minimum entity_confidence threshold for matches (0.0-1.0)'
      ),
    lists: z.array(z.string()).optional()
      .describe(
        'Specific lists to check (e.g. ["ofac_sdn", "sam_debarment"]). ' +
          'Omit to check all lists.'
      ),
  },
  async ({ query, include_aliases, min_confidence, lists }) => {
    // screenEntity is the hub's internal screening routine (not shown in this
    // excerpt); it fans out across the enforcement lists and returns a result object.
    const result = await screenEntity(query, {
      include_aliases,
      min_confidence,
      lists,
    });
    // MCP tool results are returned as an array of content parts.
    return { content: [{ type: 'text', text: JSON.stringify(result) }] };
  }
);
```

Cross-agency entity resolution inside tools
When an agent calls get_entity('Huawei Technologies'), the MCP server runs the three-pass entity resolution pipeline before touching any dataset table:
- Exact ID match — check the identifier index for CIK, UEI, LEI, DUNS, NPI, or ticker
- Alias table lookup — check all former names and DBAs against the alias table
- Jaro-Winkler fuzzy match — scored name similarity with a 0.85 threshold
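The three passes can be sketched as a cascade that returns at the first pass producing a match. The Jaro-Winkler function follows the common formulation: the match window is half the longer string minus one, and the prefix bonus is 0.1 per shared leading character, capped at four. The 0.85 threshold is from the text. The in-memory index shapes, the name normalization, and the function names are assumptions, and the production pipeline queries D1 tables rather than JavaScript maps.

```typescript
// Jaro similarity: characters matching within a sliding window, penalized by transpositions.
function jaro(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aHit: boolean[] = new Array(a.length).fill(false);
  const bHit: boolean[] = new Array(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const hi = Math.min(b.length - 1, i + range);
    for (let j = Math.max(0, i - range); j <= hi; j++) {
      if (bHit[j] || a[i] !== b[j]) continue;
      aHit[i] = bHit[j] = true;
      matches++;
      break;
    }
  }
  if (matches === 0) return 0;
  // Matched characters that appear in a different order count as half-transpositions.
  let k = 0;
  let halfTranspositions = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aHit[i]) continue;
    while (!bHit[k]) k++;
    if (a[i] !== b[k]) halfTranspositions++;
    k++;
  }
  const t = halfTranspositions / 2;
  return (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
}

// Winkler boost: reward a shared prefix of up to 4 characters (scale 0.1).
function jaroWinkler(a: string, b: string): number {
  const j = jaro(a, b);
  let prefix = 0;
  while (prefix < 4 && prefix < a.length && prefix < b.length && a[prefix] === b[prefix]) prefix++;
  return j + prefix * 0.1 * (1 - j);
}

// Assumed in-memory stand-ins for the identifier index and alias table.
interface ResolverIndex {
  byId: Map<string, string>; // CIK/UEI/LEI/DUNS/NPI/ticker value -> canonical_id
  byAlias: Map<string, string>; // normalized former name or DBA -> canonical_id
  names: { name: string; canonicalId: string }[]; // candidates for the fuzzy pass
}

interface Resolution {
  canonicalId: string;
  pass: "exact_id" | "alias" | "fuzzy";
  score: number;
}

// Assumed normalization: uppercase, drop punctuation, collapse whitespace.
const normalizeName = (s: string): string =>
  s.toUpperCase().replace(/[^A-Z0-9 ]/g, "").replace(/\s+/g, " ").trim();

function resolveEntity(query: string, idx: ResolverIndex, threshold = 0.85): Resolution | null {
  // Pass 1: exact identifier match.
  const byId = idx.byId.get(query.trim());
  if (byId !== undefined) return { canonicalId: byId, pass: "exact_id", score: 1 };
  // Pass 2: alias table lookup.
  const key = normalizeName(query);
  const byAlias = idx.byAlias.get(key);
  if (byAlias !== undefined) return { canonicalId: byAlias, pass: "alias", score: 1 };
  // Pass 3: best Jaro-Winkler score at or above the threshold.
  let best: Resolution | null = null;
  for (const cand of idx.names) {
    const score = jaroWinkler(key, normalizeName(cand.name));
    if (score >= threshold && (best === null || score > best.score)) {
      best = { canonicalId: cand.canonicalId, pass: "fuzzy", score };
    }
  }
  return best;
}

console.log(jaroWinkler("MARTHA", "MARHTA").toFixed(3)); // 0.961
```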
The result is a unified EntityMasterRecord that includes the entity's canonical ID, all known identifiers across six namespaces, and the matched source IDs for every dataset in the hub. The agent gets one structured response instead of having to separately query SEC EDGAR (CIK lookup), OFAC (name search), SAM.gov (UEI lookup), and FDA (establishment registration search).
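A TypeScript client could describe this record with an interface like the one below. The field names are copied from the sample record that follows this paragraph. The types are inferred from its values rather than taken from a published schema, and matchedDatasets is a hypothetical helper.

```typescript
// Shape inferred from the sample EntityMasterRecord; not an official schema.
interface EntityMasterRecord {
  canonical_id: string;
  name: string;
  confidence: number; // 0.0-1.0 resolution confidence
  source_ids: Record<string, string | null>; // dataset -> source ID, null if unmatched
  aliases: string[];
  identifiers: Partial<Record<"cik" | "uei" | "lei" | "duns" | "npi" | "ticker", string | null>>;
  active_flags: string[];
  compliance_score: number; // 0-100
  last_updated: string; // ISO 8601 timestamp
}

// Hypothetical helper: datasets in which the entity actually has a record.
function matchedDatasets(rec: EntityMasterRecord): string[] {
  return Object.keys(rec.source_ids)
    .filter((ds) => rec.source_ids[ds] !== null)
    .sort();
}
```

For the sample record, matchedDatasets returns ["bis_entity_list", "usaspending"], the two datasets with non-null source IDs.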
EntityMasterRecord returned by get_entity:

```json
{
  "canonical_id": "huawei-technologies-co-ltd",
  "name": "Huawei Technologies Co., Ltd.",
  "confidence": 0.97,
  "source_ids": {
    "bis_entity_list": "HW-2019-0001",
    "ofac_sdn": null,
    "sam_exclusions": null,
    "usaspending": "HWTECH-DUNS-123456",
    "sec_edgar": null
  },
  "aliases": [
    "Huawei Device Co., Ltd.",
    "HiSilicon Technologies Co., Ltd."
  ],
  "identifiers": {
    "duns": "123456789",
    "lei": null,
    "uei": null
  },
  "active_flags": ["bis_entity_list"],
  "compliance_score": 71,
  "last_updated": "2026-02-04T06:00:00Z"
}
```

Tool descriptions as prompt engineering
The description field of each tool is consumed by the model's tool selection mechanism at inference time. A vague description like "searches for entities" gives the model no basis for choosing between tools. Tool descriptions in this server are written to be actionable: they specify what the tool returns, what query types it accepts, and when to prefer it over similar tools.
For example, screen_entity's description reads: "Screen an entity name or identifier against all 30+ federal enforcement lists. Returns a 0-100 risk score, matched entries across lists, and confidence-weighted explanations." The get_ofac_entries description reads: "Fetch raw OFAC SDN and non-SDN entries matching a name or identifier. Use this when you need the raw OFAC data rather than a cross-list risk score."
When a user asks "Is this vendor on any sanctions lists?", Claude picks screen_entity — the description “all 30+ federal enforcement lists” matches the intent better than get_ofac_entries's narrower scope. When a user asks "Show me the exact OFAC SDN entry for this company", Claude picks get_ofac_entries because the user is asking for raw data, not a cross-list risk assessment.
MCP transport options
The server supports two transports handled by the same Cloudflare Worker entry point:
- HTTP/SSE at api.ai-analytics.org/mcp: for Claude.ai direct integration and browser-based agents. Requests are POST with JSON-RPC 2.0 bodies; large result sets stream over Server-Sent Events. This is the transport used when you add the server to Claude.ai's integration settings.
- stdio for local Claude Code / Claude Desktop, via npx mcp-remote https://api.ai-analytics.org/mcp, which bridges the HTTP transport to the stdio protocol the local clients expect. There is no server binary to install; the bridge handles protocol translation.
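The Worker dispatches on the Accept header and request method, and attaches the McpServer instance to the matching transport with server.connect(). stdio clients reach the Worker through the mcp-remote bridge as ordinary HTTP requests, so both transports share the same tool registry and D1 bindings.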
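The dispatch step can be sketched as a pure routing function that a Worker's fetch handler would call before passing the request to the SDK transport. This is a hypothetical simplification based on the Streamable HTTP conventions: POST carries JSON-RPC messages, and GET with Accept: text/event-stream opens a server-to-client stream. The function name, route labels, and status-code choices are illustrative.

```typescript
type Route =
  | { kind: "jsonrpc"; streaming: boolean } // POST carrying a JSON-RPC message
  | { kind: "sse-stream" } // GET opening a server-to-client SSE stream
  | { kind: "reject"; status: number };

// Hypothetical router for the /mcp entry point.
function routeMcpRequest(method: string, accept: string | null): Route {
  const wantsSse = (accept ?? "").includes("text/event-stream");
  switch (method.toUpperCase()) {
    case "POST":
      // The response may be streamed as SSE only if the client accepts it.
      return { kind: "jsonrpc", streaming: wantsSse };
    case "GET":
      return wantsSse ? { kind: "sse-stream" } : { kind: "reject", status: 406 };
    default:
      return { kind: "reject", status: 405 }; // method not allowed
  }
}
```

Inside a Worker, the handler would call routeMcpRequest(request.method, request.headers.get("Accept")) and branch on the returned kind. Keeping the decision in a pure function makes it testable without a running Worker.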
Rate limits and tool-level quotas
Each tool has its own rate limit calibrated to its computational cost. Heavy tools that fan out across multiple D1 shards have lower limits than single-lookup tools.
| Tool | Rate limit | Reason |
|---|---|---|
| screen_entity | 100/min | Cross-agency fan-out, 8 D1 queries |
| full_text_search | 30/min | FTS5 + fuzzy ranking |
| get_compliance_report | 10/min | Generates a full PDF-quality report |
| get_entity | 500/min | Single D1 lookup |
All anonymous requests (those sent without an API key header) share a global limit of 1,000 tool calls per hour across all tools. The limit is enforced over a rolling one-hour window: capacity frees up as older calls age out rather than resetting at a fixed time. Agent workflows that need higher throughput can request a free API key. The key raises the limit to 10,000 calls per hour and tracks per-tool quotas separately.
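With a rolling window, each call counts against the quota for one hour after it is made, instead of a counter that resets on the hour. The sliding-window-log sketch below rests on two assumptions. First, state lives in an in-memory Map, which a deployed Worker could not rely on across isolates. Second, the caller chooses the key: a client identifier, or a single shared key if the anonymous limit is truly global. RollingWindowLimiter is a hypothetical name; the limits are the ones quoted above.

```typescript
// Sliding-window-log limiter: at most `limit` accepted calls in any `windowMs` span.
class RollingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly log = new Map<string, number[]>(); // key -> accepted-call timestamps

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records the call and returns true if it fits; otherwise returns false.
  tryAcquire(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop calls that have aged out of the window; there is no fixed-time reset.
    const recent = (this.log.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.log.set(key, recent);
    return allowed;
  }
}

const HOUR_MS = 60 * 60 * 1000;
const MINUTE_MS = 60 * 1000;
// Limits quoted in the section; which keys to use is up to the caller.
const anonymousHourly = new RollingWindowLimiter(1000, HOUR_MS);
const screenEntityPerMinute = new RollingWindowLimiter(100, MINUTE_MS);

console.log(anonymousHourly.tryAcquire("anonymous"), screenEntityPerMinute.tryAcquire("anonymous")); // true true
```

A request would need to pass both the hourly limiter and the relevant per-tool limiter. Storing one timestamp per accepted call is exact, but memory grows with the limit. Fixed-bucket approximations use constant space at the cost of some precision.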
A worked agent workflow
Consider a Claude agent answering: “What regulatory issues does this vendor have?” The agent has the vendor's name as a string — nothing more. Here is the tool call sequence it produces:
```
// Step 1: find the entity and resolve identifiers
search_entity({ query: "Acme Logistics LLC" })
// Returns: canonical_id, name, confidence, known identifiers

// Step 2: screen against all enforcement lists
screen_entity({
  query: "acme-logistics-llc", // canonical_id from step 1
  include_aliases: true,
  min_confidence: 0.7,
})
// Returns: risk_score: 42, flags: [{ list: "sam_exclusions", ... }]

// Step 3: get the full enforcement history
get_enforcement_history({
  entity_id: "acme-logistics-llc",
  limit: 20,
})
// Returns: 3 SAM exclusion records, 1 OSHA citation, 1 EPA notice of violation
```

The agent assembles the three responses (entity metadata, risk score with flag breakdown, and ordered enforcement history) into a structured summary. No prompt engineering was needed to produce this chain; the tool descriptions alone were sufficient for Claude to determine the right sequence.
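The same chain can be written as client code, which is handy for exercising the tools without a model in the loop. The sketch uses a hypothetical injected callTool, and its response types are trimmed to the fields the trace mentions; the real responses carry more. Unlike the trace, it runs steps 2 and 3 concurrently, since both depend only on the canonical_id from step 1.

```typescript
// Response shapes limited to the fields shown in the trace (assumed, not exhaustive).
interface SearchHit { canonical_id: string; name: string; confidence: number }
interface ScreenResult { risk_score: number; flags: Array<{ list: string }> }

type ToolCaller = (tool: string, args: Record<string, unknown>) => Promise<unknown>;

interface VendorReport {
  entity: SearchHit;
  screen: ScreenResult;
  history: unknown; // enforcement history payload, left untyped here
}

// Step 1 resolves the name; steps 2 and 3 reuse the canonical_id it returns.
async function vendorRegulatoryIssues(vendorName: string, callTool: ToolCaller): Promise<VendorReport> {
  const entity = (await callTool("search_entity", { query: vendorName })) as SearchHit;
  const id = entity.canonical_id;
  const [screen, history] = await Promise.all([
    callTool("screen_entity", { query: id, include_aliases: true, min_confidence: 0.7 }) as Promise<ScreenResult>,
    callTool("get_enforcement_history", { entity_id: id, limit: 20 }),
  ]);
  return { entity, screen, history };
}
```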
Claude Desktop integration
To run regulatory compliance queries directly in Claude Desktop, add the following entry to claude_desktop_config.json (typically at ~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
```json
{
  "mcpServers": {
    "federal-regulatory": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://api.ai-analytics.org/mcp"
      ]
    }
  }
}
```

Restart Claude Desktop after saving. All 38+ tools become available in any conversation. Claude will invoke them on its own when the user's question involves compliance screening, sanctions checks, federal enforcement history, or entity due diligence; no slash command or explicit invocation is needed.
For Claude Code, add the same mcpServers entry to a .mcp.json file in your project root. To make the server available across all projects, register it with the claude mcp add command at user scope. The same mcp-remote bridge works for both clients.
For the REST API design that this MCP server wraps: The Federal Regulatory Data Hub REST API: no-auth CC0 endpoints, cross-agency entity resolution, and Cloudflare edge caching →
For how the entity bridge powers cross-agency queries inside each tool: Building the cross-agency regulatory entity graph: 35M records, one join →
For how the compliance risk score is computed when screen_entity returns a 0-100 score: Compliance screening across 30+ federal enforcement lists: how the risk score works →
For how entity subscriptions let AI agents monitor specific entities for regulatory changes: Entity subscriptions in the Federal Regulatory Data Hub: per-entity change monitoring across 30+ enforcement lists →