
From Anomaly to Verified Incident: the Voidly Confidence Tier System

January 20, 2025 · 7 min read · AI Analytics

Censorship · Voidly · Methodology · Data quality

Every Voidly measurement record in the public dataset carries a confidence tier: Anomaly, Corroborated, or Verified Incident. These three labels are not cosmetic — they encode structurally different levels of certainty, and they determine whether a finding is citable by journalists and researchers. This post explains how the tier system works, why we designed it the way we did, and what each tier means for how you should use the data.

The problem of false positives in censorship detection

Network measurements are noisy by nature. DNS resolvers time out. TLS handshakes fail because of CDN misconfigurations. TCP connections reset because of routing flaps. If you declare every anomalous measurement a censorship event, your false positive rate is unacceptably high — particularly in countries with poor infrastructure quality, where noise and censorship look similar at the protocol level.

The naive solution is to raise the anomaly classifier's threshold until the false positive rate is low enough to trust. The problem: doing that also raises the false negative rate, and false negatives — missed censorship events — are worse than false positives in a monitoring system. A missed shutdown in Myanmar is a worse error than a false alarm in Germany.

The tier system resolves this by separating two concerns:

The classifier is tuned for high recall — it catches almost everything, at the cost of some false positives.
The tier system applies increasingly strict independent corroboration requirements to filter those false positives out before anything reaches the published dataset at the Verified tier.

Tier 1: Anomaly

A measurement is tagged Anomaly when the Voidly anomaly classifier fires — when the per-class confidence score exceeds the minimum threshold (0.40 composite) on at least one interference type — and no corroborating evidence has yet been found.
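
As a minimal sketch of the firing rule above, assuming the classifier emits one score per interference class as a plain dictionary. The helper names and the `dns_tampering` class name are hypothetical illustrations, not the production classifier:

```python
ANOMALY_THRESHOLD = 0.40  # minimum composite threshold quoted in the text


def fired_classes(scores: dict[str, float],
                  threshold: float = ANOMALY_THRESHOLD) -> list[str]:
    """Return the interference classes whose score exceeds the threshold."""
    return sorted(name for name, score in scores.items() if score > threshold)


def is_anomaly(scores: dict[str, float]) -> bool:
    """A measurement is tagged Anomaly if at least one class fires."""
    return bool(fired_classes(scores))


# Hypothetical per-class scores for a single measurement.
example = {"dns_tampering": 0.62, "bgp_withdrawal": 0.18, "throttling": 0.41}
print(fired_classes(example))
print(is_anomaly(example))
```

The comparison is strict here because the text says the score must exceed the threshold; a score of exactly 0.40 does not fire in this sketch.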

“Anomaly” means: something looks wrong from this vantage point. It might be censorship. It might be infrastructure failure. It might be transient network noise. The label is honest about what the data actually shows.

Anomaly-tier events are included in the public dataset with the explicit caveat that they represent unverified single-source observations. Researchers who want to study the raw signal, noise included, can use the Anomaly tier data. Journalists who want citable findings should use the Verified Incident tier.

Tier 2: Corroborated

An Anomaly is promoted to Corroborated when at least one of the following conditions is met within the 4-hour time window:

Multiple independent Voidly probes. Two or more probes on different ASNs in the same country both flag the same domain in the same window. Same-ASN probes are explicitly not independent (they share the same network path).
At least one external source agrees. OONI, CensoredPlanet, or IODA flags the same target in the same country in the same time window. Cross-source agreement, even from a single external source, is treated as stronger corroboration than additional same-network Voidly probes.
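
The two promotion conditions can be sketched as a single predicate. This is an illustration under stated assumptions, not the production reconciler: the `Flag` record and `is_corroborated` function are hypothetical, and the flags passed in are assumed to be already matched to the same domain, country, and time window.

```python
from dataclasses import dataclass
from typing import Optional

EXTERNAL_SOURCES = {"ooni", "censoredplanet", "ioda"}


@dataclass(frozen=True)
class Flag:
    """One source flagging the target within the shared window."""
    source: str                # "voidly" or an external source name
    asn: Optional[int] = None  # probe ASN for Voidly flags, None otherwise


def is_corroborated(flags: list[Flag]) -> bool:
    """Return True if either promotion condition holds."""
    # Collecting ASNs in a set collapses same-ASN probes into one entry,
    # so two probes on the same network never count as independent.
    voidly_asns = {f.asn for f in flags
                   if f.source == "voidly" and f.asn is not None}
    has_external = any(f.source in EXTERNAL_SOURCES for f in flags)
    return len(voidly_asns) >= 2 or has_external
```

For example, two Voidly probes on the same ASN do not promote the event, while the same two probes on different ASNs, or one probe plus an OONI flag, do.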

Corroborated events are published in the public dataset without caveats. The 7-day shutdown forecast model is trained on Corroborated and Verified tier events.

Tier 3: Verified Incident

Corroborated events that meet two additional requirements become Verified Incidents:

Sustained pattern. The anomaly persists across 4 or more consecutive 5-minute measurement windows (20 minutes minimum). Transient anomalies that self-resolve within a single window are classified as network noise, not incidents.
Cross-source confirmation. At least one external source (OONI, CensoredPlanet, or IODA) must confirm the event. Multiple Voidly probes alone can reach Corroborated but cannot reach Verified without external agreement. This protects against systematic errors in the Voidly classifier that could affect all probes simultaneously.
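
A minimal sketch of the two Verified requirements, assuming each 5-minute window has already been reduced to a boolean (anomalous or not). The function names and input shapes are hypothetical:

```python
WINDOW_MINUTES = 5
MIN_CONSECUTIVE_WINDOWS = 4  # 4 x 5 minutes = 20 minutes minimum


def longest_run(window_flags: list[bool]) -> int:
    """Length of the longest streak of consecutive anomalous windows."""
    best = current = 0
    for anomalous in window_flags:
        current = current + 1 if anomalous else 0
        best = max(best, current)
    return best


def is_verified(corroborated: bool,
                window_flags: list[bool],
                external_sources: set[str]) -> bool:
    """Corroborated, sustained, and confirmed by at least one external source."""
    sustained = longest_run(window_flags) >= MIN_CONSECUTIVE_WINDOWS
    return corroborated and sustained and bool(external_sources)


# Three anomalous windows, a gap, then one more: never four in a row.
print(is_verified(True, [True, True, True, False, True], {"ioda"}))
```

Keeping the external-source check separate from the `corroborated` flag mirrors the rule that Voidly-only corroboration can never reach Verified on its own.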

Verified Incidents are citable. They are what media organizations and researchers mean when they cite “Voidly measurements.” They appear in the incident counter (1,574+ verified incidents on the dashboard), in the country-level timelines, and as training labels for the shutdown forecasting model.

Why external confirmation is required for Verified tier

A natural question: if we have 37+ probes across 200 countries, why do we need external sources to reach Verified? Couldn't consistent agreement across a dozen Voidly probes on different ASNs be enough?

Two reasons it can't:

Shared infrastructure failure. All Voidly probes connect to the same collector endpoint. A routing problem between the collector and a country's network could make every probe in that country appear to see a block when the real cause is a connectivity issue between the probes and the collector. OONI, CensoredPlanet, and IODA use entirely different infrastructure and vantage points — a shared failure mode can't affect all four simultaneously.
Classifier systematic errors. If the anomaly classifier has a systematic false positive for a specific domain in a specific country (e.g., a CDN that consistently returns a different IP for that country), every Voidly probe will fire on that domain. External sources using different detection methodologies will not fire on the same false positive.

The independence weight

Not all sources are equally independent of each other. CensoredPlanet and OONI both use active measurement (sending HTTP/DNS/TLS probes from known vantage points), so their errors can be correlated. IODA instead relies on passive BGP and DNS inference from traffic patterns. Active and passive measurement are genuinely independent methodologies, which is why pairs that include IODA carry the highest weights.

We assign an independence weight to each corroborating source pair based on methodological distance:

# Methodological independence weights, keyed by unordered source pair
from functools import reduce
from itertools import combinations

INDEPENDENCE_WEIGHTS = {
    frozenset({'voidly', 'ooni'}):           0.80,  # similar active measurement methods
    frozenset({'voidly', 'censoredplanet'}): 0.75,  # similar but different resolver set
    frozenset({'voidly', 'ioda'}):           0.95,  # passive vs active: high independence
    frozenset({'ooni', 'censoredplanet'}):   0.70,  # similar methods, correlated errors
    frozenset({'ooni', 'ioda'}):             0.90,
    frozenset({'censoredplanet', 'ioda'}):   0.90,
}
DEFAULT_WEIGHT = 0.80  # fallback for any pair not listed above

def corroboration_score(sources: list[str]) -> float:
    """Compute combined corroboration score from agreeing sources."""
    if len(sources) == 1:
        return 0.60   # single source: Anomaly

    # Look up a weight for every unordered pair of agreeing sources.
    # frozenset keys make the lookup independent of argument order.
    weights = [
        INDEPENDENCE_WEIGHTS.get(frozenset(pair), DEFAULT_WEIGHT)
        for pair in combinations(sources, 2)
    ]

    # Combine: P(at_least_one_correct) = 1 - prod(1 - w)
    combined = 1.0 - reduce(lambda acc, w: acc * (1 - w), weights, 1.0)
    return round(combined, 3)

# Examples:
# ['voidly', 'ooni'] → 0.8 (corroborated)
# ['voidly', 'ooni', 'ioda'] → 0.999 (high-confidence verified)
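
Worked through by hand for the three-source example, using the listed weights for the three pairs involved (0.80 for Voidly and OONI, 0.95 for Voidly and IODA, 0.90 for OONI and IODA):

```latex
\begin{aligned}
\text{combined} &= 1 - (1 - 0.80)(1 - 0.95)(1 - 0.90) \\
                &= 1 - 0.20 \times 0.05 \times 0.10 \\
                &= 1 - 0.001 \\
                &= 0.999
\end{aligned}
```

Each additional pair multiplies the residual term by a factor below one, so the score approaches 1 quickly as agreeing sources are added.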

What the tiers mean for data consumers

Journalists and media organizations: Use Verified Incident tier only. These are events with external confirmation and a sustained pattern — suitable for attribution to specific ISPs and time windows. Cite as: “Voidly measurements verified this event” rather than “Voidly detected this event.”

ML researchers: The Corroborated tier is appropriate for training data — it includes enough signal to be useful without the noise of raw Anomaly data. The Verified tier is appropriate for evaluation sets where you want high-confidence ground truth.

Network security researchers: The Anomaly tier is the right place to look if you want to study measurement noise, CDN behaviors, or probe-level measurement variability. Filter to a specific country or ASN and compare Anomaly-to-Verified promotion rates to understand how much noise exists in different network environments.
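
One way to compute such promotion rates, as a rough sketch over records loaded into plain dictionaries. The `tier` field is named in the schema reference; the `country` field name and the tier string values used here are assumptions for illustration:

```python
from collections import Counter


def promotion_rates(records: list[dict]) -> dict[str, float]:
    """Share of each country's events that reached the Verified tier."""
    totals: Counter = Counter()
    verified: Counter = Counter()
    for rec in records:
        totals[rec["country"]] += 1
        if rec["tier"] == "verified_incident":  # assumed tier label
            verified[rec["country"]] += 1
    return {country: verified[country] / totals[country] for country in totals}


# Hypothetical sample: six events across two countries.
sample = [
    {"country": "MM", "tier": "verified_incident"},
    {"country": "MM", "tier": "anomaly"},
    {"country": "DE", "tier": "anomaly"},
    {"country": "DE", "tier": "anomaly"},
    {"country": "DE", "tier": "corroborated"},
    {"country": "DE", "tier": "anomaly"},
]
print(promotion_rates(sample))
```

A low rate in a given country or ASN suggests that most anomalies there are noise rather than confirmed interference, though it can also reflect thin external coverage.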

Infrastructure reliability monitoring: The BGP withdrawal and throttling classes at the Anomaly tier can serve as early warning indicators for ISP outages before they reach Corroborated status. The 5-minute measurement cadence means an event typically appears in the Anomaly tier 5–10 minutes before it would reach Corroborated.

For the full schema of the tier and classifier_confidence fields in the published dataset: The Voidly measurement dataset: field-by-field schema reference →

For how the anomaly classifier that feeds this tier system produces its confidence scores: The Voidly anomaly classifier: five interference classes and why we optimize for recall →

For how the cross-source reconciler aligns Voidly, OONI, CensoredPlanet, and IODA in the same time window: Cross-source censorship verification: reconciling OONI, CensoredPlanet, and IODA →

For the 7-day forecast model that uses Verified Incidents as training labels: Seven-day internet shutdown forecasting: how Voidly predicts connectivity outages →

For how Verified Incident publication triggers journalist alerts in under 8 minutes: Voidly's real-time event pipeline: from measurement anomaly to journalist alert →

For how raw measurements cluster into the discrete incidents that progress through these tiers: Incident clustering and deduplication: how Voidly avoids counting the same censorship event twice →

For querying incidents filtered by confidence_tier via REST API: The Voidly REST API: querying the global censorship index in real time →

For bulk access to the dataset — Parquet filter recipes by confidence tier for journalism, ML training, and infrastructure monitoring: The Voidly open datasets on HuggingFace: structure, daily snapshots, and filter recipes →

For real-time delivery of tier-change events — SSE streaming endpoint, Last-Event-ID replay, and a comparison of SSE vs. HMAC webhooks: The Voidly SSE streaming API: real-time censorship event delivery →

For the full incident state machine — transition thresholds, timing data from 847 incidents, publication timing by tier, and how lifecycle state encodes into HuggingFace dataset fields: Censorship incident lifecycle in Voidly: from anomaly detection to verified incident to resolution →