Developer landing · CC BY 4.0
Six years of measurement, 200 countries, 2.2B+ data points, cross-verified against OONI, CensoredPlanet, and IODA. Pick the access surface that fits your workflow.
REST API
Documented JSON endpoints. Auth optional for read; rate-limited.
voidly.ai/api-docs
CSV bulk downloads. global-censorship-index + ooni-censorship-historical (1.66M+ downloads).
huggingface.co/emperor-mew
83 tools for Claude, GPT, and agent frameworks to query the dataset in natural language.
voidly-ai/mcp-server
Map view, active blocking events, country drilldown, ML-powered alerts, 7-day forecast.
voidly.ai
Same query (recent verified incidents in Iran since 2026-01-01) in four ecosystems.
# curl: recent verified incidents in Iran
curl -s 'https://voidly.ai/api/v1/incidents?country=IR&since=2026-01-01' \
  -H 'Accept: application/json' | jq '.[] | {date,domain,classification}'

# Python (httpx)
import httpx

r = httpx.get(
    "https://voidly.ai/api/v1/incidents",
    params={"country": "IR", "since": "2026-01-01"},
    timeout=30,
)
r.raise_for_status()  # surface HTTP errors (for example, rate limiting) early
for incident in r.json():
    print(incident["date"], incident["domain"], incident["classification"])

// JavaScript (fetch)
const url = new URL('https://voidly.ai/api/v1/incidents');
url.searchParams.set('country', 'IR');
url.searchParams.set('since', '2026-01-01');
const res = await fetch(url, { headers: { Accept: 'application/json' } });
const incidents = await res.json();
for (const i of incidents) console.log(i.date, i.domain, i.classification);

# R (httr2)
library(httr2)
library(jsonlite)
resp <- request("https://voidly.ai/api/v1/incidents") |>
  req_url_query(country = "IR", since = "2026-01-01") |>
  req_perform()
incidents <- fromJSON(resp_body_string(resp))
print(head(incidents))

Sample response:
[
  {
    "date": "2026-04-12T03:21:14Z",
    "country": "IR",
    "domain": "signal.org",
    "classification": "TLS_INTERFERENCE",
    "confidence": "verified",
    "sources": ["voidly", "ooni"],
    "asn": 58224,
    "block_type": "sni_inspection",
    "evidence_url": "https://voidly.ai/incidents/2026-04-12-IR-signal"
  }
  // ...
]

How we processed 200M+ OONI measurements into the flat ML-ready CSV now downloaded 1.66M+ times on HuggingFace: probe version schema drift, test_keys normalization, and what we left out. 2024.
How the desktop probe that generates this dataset works: Tauri 2, Cloudflare boringtun WireGuard, tun-rs TUN device, X25519-Dalek on-device key generation, and the operator-safety constraints behind the design. 2025.
How the control_failure and anomaly_score fields are generated — three-node distributed control network (US-East, EU-West, AP-East), CDN split-horizon handling, block-page hash library of ~2,300 fingerprints, and the mapping from ControlComparison fields to classifier inputs. 2025.
How Voidly keeps 37+ probe nodes healthy: heartbeat state machine (DEGRADED → OFFLINE thresholds), measurement quality scoring, ASN coverage SLOs per country, flapping detection that caps the confidence_tier at CORROBORATED, and the classify_offline_cause() algorithm for distinguishing probe failure from ISP-level censorship events. 2025.
How the blockpage_match and blockpage_fp_id fields in the dataset are produced — four matching strategies (exact hash, structural normalization, SimHash, TLS certificate fingerprinting), the match pipeline, per-country library counts, and false positive mitigation for CDN error pages and captive portals. 2025.
The practical guide to bulk data access — Parquet partitioning by country and month, daily incremental append cadence, git-lfs versioning for reproducible research, filter recipes in Python/pandas/DuckDB/R, confidence tier guidance for different consumer types, and the citation format for CC BY 4.0 attribution. 2025.
How the ML classifier assigns DNS tampering, TLS interference, HTTP blocking, BGP withdrawal, and throttling labels to raw probe measurements — per-class binary models, country-specific calibration, and the confidence scores in the dataset. 2024.
How the classifier's training corpus is built from 200M+ OONI measurements: weak supervision label functions, 47-feature schema, time-based splits to prevent leakage, SMOTE for class imbalance, and per-country Platt calibration. 2024.
How the classifier runs as a live API in the real-time measurement pipeline — ONNX Runtime serving, per-country calibration at inference time, champion/challenger deployment, and the p50/p99 latency breakdown across all pipeline stages. 2025.
What the confidence_tier field in the dataset means — how a measurement moves from Anomaly to Corroborated to Verified Incident, and which tier is appropriate for your use case. 2025.
Complete field-by-field guide to the CC BY 4.0 dataset — probe identity, DNS/TCP/TLS/HTTP layers, control comparison deltas, ML classifier output (interference_type, prob_*, tier), cross-source corroboration scores, BGP signals, and derived aggregates. Includes filtering recipes for journalists, ML researchers, and infra teams. 2025.
How Voidly correlates three independent measurement projects — data format normalization, 4-hour sliding window alignment, and independence-weighted confidence scoring. 2025.
Architecture of the 7-day forecast: political calendar features, BGP telemetry, ARIMA + XGBoost ensemble, per-country calibration, and reliability scoring. 2025.
How the bgp_withdrawal and bgp_outage_score fields are generated — IODA prefix withdrawal detection, 90-day per-country baselines, and why BGP silence differs from BGP withdrawal. 2025.
What drives measurement cadence in the dataset — category-code priority tiers (NEWS/SMG probed every 5 minutes), anomaly-driven urgent injection (priority=10 for 30 minutes after detection), ±15% jitter for anti-detection, per-domain ASN distribution, and per-country task budgets that explain why CN and IR have higher measurement density. 2024.
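As a rough illustration of the ±15% jitter mentioned above, the sketch below spreads a 5-minute base interval by a random factor. The uniform distribution, the function names, and the seeding are assumptions made for the example; the source does not say how the jitter is drawn.

```python
import random

BASE_INTERVAL_S = 300   # from the text: NEWS/SMG probed every 5 minutes
JITTER_FRACTION = 0.15  # from the text: +/-15% jitter


def jittered_interval(base_s: float, rng: random.Random) -> float:
    """Scale base_s by a uniform factor in [0.85, 1.15] (distribution is assumed)."""
    return base_s * rng.uniform(1.0 - JITTER_FRACTION, 1.0 + JITTER_FRACTION)


def next_run_times(start_s: float, count: int,
                   base_s: float = BASE_INTERVAL_S, seed=None) -> list:
    """Return `count` successive run times, each one jittered interval after the last."""
    rng = random.Random(seed)
    t = start_s
    times = []
    for _ in range(count):
        t += jittered_interval(base_s, rng)
        times.append(t)
    return times
```

With these constants, every gap between consecutive runs falls between 255 and 345 seconds, so probes never land on a fixed 5-minute grid.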
Why events appear in the API within minutes of probe detection: inline scoring, parallel OONI/IODA corroboration, and the two-window alert-fatigue guard. Understanding this helps set the right polling cadence for real-time consumers. 2025.
How to query the full dataset from Claude Code, Claude Desktop, or any OpenAI function-calling agent — 83 tools across 6 categories (incident lookup, measurement queries, country coverage, domain test-list, BGP/network, dataset metadata), JSON-RPC over Streamable HTTP, and rate-limit reference. 2025.
Core endpoints, cursor-based pagination, filtering by country / confidence tier / interference type / date range, streaming NDJSON export, RFC 7807 error format, rate limits, and code samples in curl, Python, and TypeScript. Pair with the MCP server for agent workflows. 2025.
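For the streaming NDJSON export mentioned above, a consumer only needs to parse one JSON object per line. The sketch below shows that parsing step on an in-memory stand-in; the helper name `iter_ndjson` is hypothetical, the second record is made up for illustration, and the export endpoint itself is not shown because the source does not give its path.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of an NDJSON stream."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


# Stand-in for a streamed response body. Field names mirror the sample
# response; the second record is invented for the example.
sample = (
    '{"date": "2026-04-12T03:21:14Z", "country": "IR", "domain": "signal.org"}\n'
    '\n'
    '{"date": "2026-04-12T04:02:00Z", "country": "IR", "domain": "example.org"}\n'
)
records = list(iter_ndjson(sample.splitlines()))
```

Because the helper accepts any iterable of strings, it can be fed from a line iterator on a streamed HTTP response (for example, httpx's `Response.iter_lines()`), so records are handled as they arrive instead of after the full download.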
How raw probe measurements deduplicate into the incident_id field you see in the dataset — the four-tuple clustering key, 6-hour gap rule, incident lifecycle (ANOMALY → RESOLVED), the 12-hour re-open window, and retroactive CensoredPlanet alignment. Explains why incident counts differ from measurement counts by 200–800×. 2025.
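The 6-hour gap rule can be sketched as follows. This is a simplified illustration, not Voidly's pipeline: the four-tuple is assumed to be (country, asn, domain, classification) because the source does not name its fields, a gap of exactly 6 hours is assumed to stay in the same incident, and the lifecycle states, the 12-hour re-open window, and CensoredPlanet alignment are left out.

```python
from collections import defaultdict
from datetime import datetime, timedelta

GAP = timedelta(hours=6)  # from the text: 6-hour gap rule


def cluster_incidents(measurements):
    """Group measurements into incidents by key and time gap.

    Each measurement is a dict with a 'ts' datetime plus the key fields.
    ASSUMPTION: the clustering key is (country, asn, domain, classification).
    Returns a list of (key, [measurements]) tuples.
    """
    by_key = defaultdict(list)
    for m in measurements:
        key = (m["country"], m["asn"], m["domain"], m["classification"])
        by_key[key].append(m)

    incidents = []
    for key, items in by_key.items():
        items.sort(key=lambda m: m["ts"])
        current = [items[0]]
        for m in items[1:]:
            if m["ts"] - current[-1]["ts"] > GAP:
                # Gap exceeded: close the current incident and start a new one.
                incidents.append((key, current))
                current = [m]
            else:
                current.append(m)
        incidents.append((key, current))
    return incidents
```

Under this rule a probe that reports the same block every few minutes for a day produces hundreds of measurements but a single incident, which is the kind of ratio the text describes.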
How the censorship_score field in the country summary API is computed: exponential recency decay (30-day half-life), ASN diversity weighting, domain category weights, cross-source corroboration multipliers, Gaussian temporal smoothing, and bootstrap confidence bands. Covers per-country calibration for coverage disparity. 2025.
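To make the 30-day half-life concrete, the sketch below uses the standard half-life form, weight = 0.5 ** (age_days / 30). Only the half-life comes from the text; the function names and the weighted-mean aggregation are assumptions for illustration, and the other components (ASN diversity, category weights, corroboration multipliers, smoothing, confidence bands) are omitted.

```python
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 30.0  # from the text: 30-day half-life


def recency_weight(observed_at: datetime, now: datetime) -> float:
    """Weight that halves for every HALF_LIFE_DAYS of age (future timestamps clamp to 1)."""
    age_days = max((now - observed_at).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def decayed_mean(samples, now: datetime) -> float:
    """Hypothetical aggregation: recency-weighted mean of (timestamp, value) pairs."""
    weights = [recency_weight(ts, now) for ts, _ in samples]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * value for w, (_, value) in zip(weights, samples)) / total


now = datetime(2026, 4, 12, tzinfo=timezone.utc)
w_30 = recency_weight(now - timedelta(days=30), now)  # one half-life old
w_60 = recency_weight(now - timedelta(days=60), now)  # two half-lives old
```

An observation from 30 days ago counts half as much as one from today, and one from 60 days ago a quarter as much, so a country's score drifts down once blocking stops without dropping to zero immediately.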
How censorship-aware routing works in practice: ML-driven path selection across 142 entry-node IPs, traffic morphing, DPI evasion. 2024.
The full verification flow: vantage selection, scan cadence, anomaly classification, and cross-source corroboration against OONI, CensoredPlanet, and IODA.
Voidly measurement data is published under CC BY 4.0. Use, redistribute, and remix with attribution. Suggested format:
Source: Voidly / AI Analytics — voidly.ai (CC BY 4.0)
Citing in a paper? Use the formatted citations on /voidly (APA + BibTeX, click to copy).
The Federal Regulatory Data Hub indexes 256 federal datasets across 45 agencies — SEC, FDA, OFAC, DOJ, EPA, CFPB, IRS, FEMA, CDC, NHTSA, FAA, CMS, MSHA, OSHA and more. 37M+ canonical records, daily refresh, licensed CC0 1.0 (public domain — no attribution required). Accessible via REST, MCP, and JSON-LD.
→ Federal Data Hub landing page · Live dataset counts · MCP server (38+ tools)