Voidly · Global censorship index · CC BY 4.0
Make internet censorship measurable, verifiable, and citable.
Voidly publishes near-real-time evidence of internet blocking, throttling, and shutdown events across 200 countries. Built for journalists chasing a story before access disappears, researchers who need a defensible dataset, and human-rights organizations that have to document harm to act on it.
Snapshot. Live counters at voidly.ai — updated continuously.
How we measure
Probes run from 37+ vantage points spanning every continent. Every five minutes each probe checks an 80-domain list across DNS, TLS, HTTP, and BGP layers. Anomalies are scored by an ML classifier, then cross-referenced against three independent measurement projects before being promoted to a verified incident.
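For a sense of scale, the cadence described above can be turned into a back-of-envelope daily volume. The sketch below is only arithmetic on the figures quoted in this section. It assumes exactly 37 probes, that every probe completes every five-minute cycle, and that each cycle covers the full 80-domain list; real volumes will differ wherever probes are offline or runs are skipped.

```python
# Figures are taken from the paragraph above. The "+" in "37+" is ignored,
# and every probe is assumed to complete every cycle (an idealisation).
PROBES = 37
DOMAINS_PER_RUN = 80
INTERVAL_MINUTES = 5


def runs_per_day(interval_minutes: int = INTERVAL_MINUTES) -> int:
    """Number of probe cycles that fit in 24 hours at the given interval."""
    return (24 * 60) // interval_minutes


def domain_checks_per_day(
    probes: int = PROBES,
    domains: int = DOMAINS_PER_RUN,
    interval_minutes: int = INTERVAL_MINUTES,
) -> int:
    """Probe-domain checks per day, before multiplying by protocol layers."""
    return probes * domains * runs_per_day(interval_minutes)


if __name__ == "__main__":
    print(runs_per_day())           # 288
    print(domain_checks_per_day())  # 852480
```

Each domain check spans several protocol layers, so the count of individual layer-level observations is higher than this figure.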
Data flow
Probe nodes (37+ across 200 countries)
│
│ every 5 minutes · 80 domains
▼
Measurement collection (HTTPS, TLS, DNS, BGP)
│
▼
Cross-reference layer ── OONI · CensoredPlanet · IODA
│
▼
ML anomaly classifier (incident type, confidence)
│
├──▶ Public dataset ──▶ voidly.ai · HuggingFace · API
├──▶ 7-day shutdown forecast
└──▶ Real-time alerts (researchers · journalists)
What gets measured
- DNS tampering: Resolver returns the wrong IP, or refuses to answer.
- TLS interference: Handshake interrupted, certificates altered, SNI inspection.
- HTTP blocking: Block pages, content rewrites, throttled-to-zero responses.
- BGP withdrawal: Networks disappear from the global routing table.
- Throttling patterns: Bandwidth deliberately collapsed for specific services.
- Full shutdowns: National or regional connectivity dropped entirely.
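To make the first category concrete, here is a minimal sketch of a first-pass DNS check that compares a probe's answer with a control answer for the same domain. The function name, the label strings, and the decision order are hypothetical and are not taken from the Voidly codebase. A bare IP mismatch is only a weak signal, because CDNs legitimately return different addresses per region, which is why the documentation below describes a CDN-aware control comparison.

```python
import ipaddress
from typing import Iterable, Optional


def classify_dns(
    probe_ips: Iterable[str],
    control_ips: Iterable[str],
    rcode: Optional[str] = "NOERROR",
) -> str:
    """Return a coarse label for one DNS observation (illustrative labels)."""
    if rcode is None:
        return "no_response"        # the query timed out
    if rcode != "NOERROR":
        return "resolver_error"     # e.g. REFUSED, SERVFAIL, NXDOMAIN

    probe = {ipaddress.ip_address(ip) for ip in probe_ips}
    control = {ipaddress.ip_address(ip) for ip in control_ips}

    if not probe:
        return "empty_answer"
    # A private or loopback address for a public domain is a common
    # signature of an injected answer.
    if any(ip.is_private or ip.is_loopback for ip in probe):
        return "suspicious_answer"
    if probe & control:
        return "consistent"         # at least one address in common
    return "mismatch"               # weak signal: could be CDN geo-routing


if __name__ == "__main__":
    print(classify_dns(["10.10.34.34"], ["93.184.216.34"]))  # suspicious_answer
    print(classify_dns([], [], rcode="REFUSED"))             # resolver_error
```

A "mismatch" label on its own should not be read as tampering; it is the kind of observation that needs the TLS and HTTP layers, or a second source, to confirm.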
Access the data
- Live dashboard: voidly.ai
Map view, active blocking events, country drilldown, ML-powered alerts, 7-day forecast.
- REST API: api-docs
Documented JSON endpoints. Bulk download. CC BY 4.0 — attribute, then use.
- HuggingFace datasets: emperor-mew
Snapshots in CSV. global-censorship-index and ooni-censorship-historical (1.66M+ downloads).
- MCP server: github.com/voidly-ai/mcp-server
83 tools for Claude / GPT / agent frameworks to query the dataset directly.
- Desktop Probe: voidly-probe-app
macOS / Linux / Windows. Tauri 2 + boringtun. Run a probe from your own network; keys never leave the device.
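The REST API is described on this page as offering cursor-based pagination on endpoints such as /incidents. A minimal sketch of a client-side loop follows. The parameter names (country, tier, cursor), the response keys (data, next_cursor), and the fetch_page callable are assumptions made for illustration, so check the API docs for the real shapes. The transport is injected as a function, which lets the loop run here against a stub instead of the network.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Params = Dict[str, str]
Page = Dict[str, Any]


def iter_incidents(
    fetch_page: Callable[[str, Params], Page],
    country: str,
    tier: Optional[str] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield incidents page by page until the server stops returning a cursor."""
    params: Params = {"country": country}
    if tier is not None:
        params["tier"] = tier
    while True:
        page = fetch_page("/incidents", dict(params))
        yield from page.get("data", [])
        cursor = page.get("next_cursor")   # hypothetical key name
        if not cursor:
            return
        params["cursor"] = cursor


def fake_fetch(path: str, params: Params) -> Page:
    """Stub transport with two canned pages; stands in for a real HTTP call."""
    pages = {
        None: {"data": [{"id": "a"}, {"id": "b"}], "next_cursor": "p2"},
        "p2": {"data": [{"id": "c"}], "next_cursor": None},
    }
    return pages[params.get("cursor")]


if __name__ == "__main__":
    for incident in iter_incidents(fake_fetch, country="IR", tier="verified"):
        print(incident["id"])
```

In real use, fake_fetch would be replaced by a function that issues the HTTP request and returns the decoded JSON body.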
Technical stack
- Probe runtime: Rust (Tauri 2 desktop) · Python (server-side)
- VPN transport: boringtun 0.7 (Cloudflare userspace WireGuard)
- TUN device: tun-rs (utun / tun / Wintun)
- Key generation: X25519-Dalek (on-device only)
- Anomaly detection: TensorFlow / scikit-learn ensemble
- Storage: TimescaleDB (events) · S3 (raw measurements)
- Cross-source merge: OONI ↔ CensoredPlanet ↔ IODA reconciler
- License: CC BY 4.0 (data) · MIT (open code)
Technical documentation
- The Voidly Probe: Tauri + boringtun network measurement at the operator's edge
How the desktop probe works: Tauri 2, Cloudflare boringtun WireGuard, tun-rs TUN device, X25519-Dalek on-device key generation stored in OS keychain, and the operator-safety constraints that shaped every technical decision. 2025.
- How Voidly measures HTTP and HTTPS censorship: DNS through TLS to body comparison
The full probe test lifecycle at every protocol layer: DNS resolution with CDN-aware control comparison, TCP RST injection detection, TLS handshake with cert chain fingerprinting, HTTP body fingerprinting against the 2,300-entry block page library, response timing anomaly detection, and how each layer maps to interference types. 2025.
- The Voidly control server: how we tell censorship from a bad network
How distributed control servers distinguish censorship from network errors and CDN split-horizon DNS — DNS, TCP reachability, TLS certificate comparison, HTTP block page fingerprinting (~2,300 known block-page hashes), and the ControlComparison struct that feeds the anomaly classifier. 2025.
- Voidly probe health monitoring: how we detect and replace failing probe nodes
How Voidly monitors 37+ probe nodes: 60-second heartbeat on a separate HTTPS transport, DEGRADED/OFFLINE state machine, measurement quality scoring, ASN coverage SLOs (≥2 ASNs standard; ≥4 for high-risk countries), flapping detection (confidence capped at CORROBORATED), and the classify_offline_cause() algorithm that distinguishes probe failure from ISP-level censorship. 2025.
- Voidly's block page fingerprint library: detecting censorship signatures across 2,300+ known pages
How the 2,300-entry block page library is built and maintained: four detection strategies (exact SHA-256, structural normalization, SimHash, TLS cert fingerprinting), per-country composition (Turkey 47, Iran 312, Russia 189, China 8), false positive mitigation for CDN error pages and captive portals, and how lf_http_blockpage_hash feeds the anomaly classifier. 2025.
- Voidly's URL test list: how we curate the domains that reveal internet censorship
How we select and maintain the domains each probe tests: Citizen Lab's global list as the foundation, 12 OONI category codes, per-country supplemental lists for 37 high-risk countries, the measurement budget problem, and why the test list is as much a political document as a technical one. 2025.
- Voidly probe vantage selection: ASN diversity, operator safety, and reaching hard-to-measure countries
Why ASN diversity matters more than geographic spread, how we recruit and vet operators without collecting identity, the per-country safety tiers that shape probe behavior in high-risk countries, and three approaches for measuring where most people connect on mobile-only networks. 2025.
- The Voidly anomaly classifier: five interference classes, gradient boosted trees, and why we optimize for recall
How the ML classifier distinguishes DNS tampering, TLS interference, HTTP blocking, BGP withdrawal, and throttling — per-class binary models, country-specific calibration, and the recall-vs-precision tradeoff that makes cross-source corroboration the real quality gate. 2024.
- Voidly's ML training pipeline: building a labeled censorship dataset from OONI measurements
How the labeled training dataset is constructed: Snorkel-style weak supervision with 5 label functions, 47-feature schema, SMOTE for class imbalance, time-based train/val/test splits to prevent leakage, per-country Platt scaling calibration, and the weekly incremental retraining pipeline. 2024.
- Voidly's real-time inference API: classifying censorship measurements at 50ms
How the classifier runs as a live inference API — ONNX Runtime serving, 5ms feature extraction with in-memory control cache, three regional nodes routed by Cloudflare Workers, per-country calibration at inference time, champion/challenger shadow mode, and the full p50/p99 latency budget. 2025.
- From anomaly to verified incident: the Voidly confidence tier system
How a measurement moves through three confidence tiers — Anomaly, Corroborated, Verified Incident — and what each tier means for journalists, ML researchers, and infrastructure teams using the dataset. 2025.
- Voidly's country-level censorship score: aggregating 2.2B probe measurements into the global index
How per-measurement ML classifier outputs aggregate into per-country censorship scores: 30-day exponential recency decay, ASN diversity weighting (1/√K per-ASN cap), domain category weights (news_media 2.0×, social_media 1.8×), cross-source corroboration multipliers, 90-day rolling windows, and bootstrap confidence bands. 2025.
- The Voidly measurement dataset: field-by-field schema reference
Complete schema reference for the CC BY 4.0 measurement dataset: every field in probe identity, DNS/TCP/TLS/HTTP layers, control comparison, ML classifier output, BGP signals, and cross-source corroboration — with filtering recipes for journalists, ML researchers, and infrastructure teams. 2025.
- Building the OONI historical corpus: 1.66M downloads, schema normalization, and the decisions behind the dataset
How we processed 200M+ OONI measurements into a flat ML-ready CSV — probe version schema drift, test_keys normalization across 20 measurement types, and what we left out. 2024.
- BGP routing signals and internet shutdown detection: how Voidly uses IODA data
How Voidly uses BGP prefix withdrawal patterns and IODA data to detect internet shutdowns before any probe can send a packet — per-country baseline calculation, BGP silence vs. withdrawal, independence weighting in the composite score. 2025.
- Voidly's real-time event pipeline: from measurement anomaly to journalist alert in under 8 minutes
How the collector goes from a probe anomaly to a published verified incident — and an alert in a journalist's inbox — in under 8 minutes: event queue, parallel OONI/IODA corroboration, confidence ladder, the two-window alert-fatigue guard, and the nightly CensoredPlanet retroactive pass. 2025.
- The Voidly measurement scheduler: how we decide which domains to probe and when
How Voidly schedules 80-domain probe runs across 37+ nodes: OONI category-code priority table (NEWS/SMG=8, HUMR=7, POLR=6), anomaly-driven priority boosts (max +3), ±15% jitter with random skip in high-risk countries, per-domain ASN distribution, HighPrioritySignal urgent injection on anomaly detection, and per-country task budgets (CN:68, RU:72, IR:74, global avg:49). 2024.
- The Voidly open datasets on HuggingFace: structure, daily snapshots, and filter recipes
How to access the global-censorship-index and ooni-censorship-historical datasets — Parquet partitioning by country and month, daily incremental append cadence, git-lfs versioning for reproducibility, and filter recipes in Python, pandas, DuckDB, and R for journalism, ML, and infrastructure monitoring. 2025.
- Incident clustering and deduplication: how Voidly avoids counting the same censorship event twice
How thousands of probe measurements are deduplicated into discrete incidents: the four-tuple clustering key, the 6-hour gap rule, incident lifecycle (ANOMALY → CORROBORATED → VERIFIED → RESOLVED), the 12-hour re-open window, retroactive CensoredPlanet alignment, flapping detection, and the incident_id field in the dataset. 2025.
- Seven-day internet shutdown forecasting: how Voidly predicts connectivity outages
Architecture of the 7-day forecast: political calendar features, BGP telemetry, ARIMA + XGBoost ensemble, per-country calibration, and Brier score validation. 2025.
- Cross-source censorship verification: reconciling OONI, CensoredPlanet, and IODA
Data format normalization, time-window alignment, confidence scoring, and how we handle disagreements between sources. 2025.
- The Voidly MCP server: 83 censorship query tools for Claude and GPT
How the Voidly MCP server exposes 83 tools for querying the censorship dataset from Claude and GPT agents — incident lookup, country summaries, BGP events, shutdown forecasts, and example workflows for journalists, researchers, and human rights organizations. 2025.
- The Voidly REST API: querying the global censorship index in real time
Core endpoints (/incidents, /measurements, /countries, /domains, /bgp/events, /forecast), cursor-based pagination, filtering by country, tier, interference type and date range, streaming NDJSON export, and code samples in curl, Python, and TypeScript. 2025.
- Building a distributed VPN with intelligent routing
ML-driven path selection across 142 entry-node IPs, traffic morphing for DPI evasion. 2024.
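The measurement scheduler entry in the list above quotes concrete numbers: category priorities (NEWS/SMG=8, HUMR=7, POLR=6), anomaly boosts capped at +3, ±15% jitter, and per-country task budgets. The sketch below shows one way those pieces could fit together. How the boost combines with the base priority, the default priority for unlisted categories, and the idea that jitter applies to the five-minute interval are all assumptions, and every function name is hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

# Values quoted in the scheduler entry above.
BASE_PRIORITY = {"NEWS": 8, "SMG": 8, "HUMR": 7, "POLR": 6}
MAX_BOOST = 3
JITTER = 0.15
BASE_INTERVAL_S = 300      # five-minute cadence from "How we measure"

DEFAULT_PRIORITY = 5       # assumption: not stated in the source


def task_priority(category: str, anomaly_boost: int = 0) -> int:
    """Base category priority plus an anomaly boost clamped to 0..MAX_BOOST."""
    base = BASE_PRIORITY.get(category, DEFAULT_PRIORITY)
    return base + max(0, min(anomaly_boost, MAX_BOOST))


def next_delay(rng: random.Random) -> float:
    """Seconds until the next run, jittered by up to ±15% (assumed target)."""
    return BASE_INTERVAL_S * (1 + rng.uniform(-JITTER, JITTER))


def pick_tasks(
    domains: List[Tuple[str, str]],
    budget: int,
    boosts: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Keep the `budget` highest-priority domains from (name, category) pairs."""
    boosts = boosts or {}
    ranked = sorted(
        domains,
        key=lambda d: task_priority(d[1], boosts.get(d[0], 0)),
        reverse=True,
    )
    return [name for name, _ in ranked[:budget]]


if __name__ == "__main__":
    domains = [("a.example", "POLR"), ("b.example", "NEWS"), ("c.example", "HUMR")]
    print(pick_tasks(domains, budget=2))                      # ['b.example', 'c.example']
    print(pick_tasks(domains, budget=2, boosts={"a.example": 3}))  # ['a.example', 'b.example']
```

Jitter keeps probe traffic from forming a fixed five-minute beat that would be easy to fingerprint, which is presumably why the source pairs it with random skips in high-risk countries.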
Cite this dataset
Use either format. Replace the access date with the day you pulled the data.
AI Analytics. (2026). Voidly — The Global Censorship Index [Dataset]. https://voidly.ai (CC BY 4.0).
@dataset{voidly_2026,
author = {{AI Analytics}},
title = {Voidly --- The Global Censorship Index},
year = {2026},
url = {https://voidly.ai},
note = {Accessed YYYY-MM-DD},
license = {CC BY 4.0}
}
Operated by AI Analytics LLC. Warrant canary: 0 warrants received as of last publication.