
The Voidly Anomaly Classifier: Five Interference Classes, Gradient Boosted Trees, and Why We Optimize for Recall

December 10, 2024 · 9 min read · AI Analytics

Censorship · Voidly · ML · Infrastructure

Every Voidly probe run produces four raw measurements per domain: a DNS lookup, a TLS handshake attempt, an HTTP fetch, and a BGP reachability check. Across 37+ nodes and an 80-domain test list, that's roughly 11,800 raw measurements every 5 minutes (37 × 80 × 4 = 11,840). Most of them are clean. Some are noisy for entirely legitimate reasons: CDN timeouts, transient routing issues, misconfigured servers. A small fraction represent actual interference.

The anomaly classifier's job is to separate interference from noise, label it with the right type, and produce a confidence score that the cross-source reconciler can use to decide whether to promote an observation to the verified-incident tier. This post covers how that classifier works.

Why five classes instead of binary blocked/not-blocked

The naive approach, binary classification of “this domain is blocked” vs. “this domain is accessible,” loses too much information. The type of interference matters for two reasons:

Different actors use different techniques. DNS-level blocks are typically ISP-implemented (cheap to deploy at scale). TLS interference (SNI-based reset) requires DPI hardware and is more expensive — a signal that the block is deliberately targeted. BGP withdrawal is a national-scale event. Knowing the type narrows down who is responsible.
Some interference types coexist. A target might be blocked at the DNS layer and also have its TLS traffic reset (belt and suspenders). A binary model would label this as one event; per-class labeling surfaces both, which matters for cross-source corroboration (OONI might catch the DNS block while CensoredPlanet catches the TLS reset).

The five classes the classifier distinguishes:

DNS tampering. The resolver returns an IP that doesn't belong to the domain's known ASN, returns NXDOMAIN for a domain that exists, times out selectively (other queries to the same resolver succeed), or returns a redirect to an ISP landing page.
TLS interference. The TCP handshake completes but the TLS handshake doesn't: reset after ClientHello, alert on SNI extension, or substituted certificate with a mismatched CN or unexpected issuer chain.
HTTP blocking. The connection succeeds but the response is a block page (fingerprinted against a corpus of 800+ known block-page signatures), a 451 Unavailable For Legal Reasons, or a transparent redirect to a government notice.
BGP withdrawal. The origin AS for the domain's IP prefix is no longer reachable from the probe's vantage point — routing to the destination has been severed at the infrastructure level.
Throttling. All protocol-level checks pass but the measured bandwidth to the target is more than 3 standard deviations below the probe's per-ISP bandwidth baseline for that time window.

Feature engineering

Each class has its own feature set. Features are extracted from the raw probe measurement and joined with per-ISP and per-country baseline windows computed over the trailing 7 days.
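
The `*_z` features in the listings that follow all standardize a raw value against a baseline window. A minimal sketch of that step, assuming the baseline is available as a plain list of trailing samples; the helper names, the sample data, and the choice to return 0.0 for a zero-spread baseline are illustrative assumptions, not the production code:

```python
from statistics import mean, stdev


def baseline_stats(samples: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of a trailing baseline window."""
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    return mean(samples), stdev(samples)


def z_score(value: float, baseline_mean: float, baseline_std: float) -> float:
    """Standardized deviation from the baseline.

    Returns 0.0 when the baseline has no spread (an assumption made here
    to avoid dividing by zero).
    """
    if baseline_std == 0:
        return 0.0
    return (value - baseline_mean) / baseline_std


# Hypothetical trailing-window DNS response times (ms) for one ISP.
baseline_dns_ms = [40.0, 42.0, 38.0, 41.0, 39.0]
mu, sigma = baseline_stats(baseline_dns_ms)

# A 120 ms response is far outside this baseline, so the z-score is large.
response_time_z = z_score(120.0, mu, sigma)
```

The same helper would serve `handshake_time_z`, `bandwidth_z`, and `latency_z` with their own baseline windows.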

# DNS features
dns_features = {
    'ip_in_expected_asn': check_ip_asn(returned_ip, domain_known_asns),
    'is_nxdomain': response_code == 'NXDOMAIN',
    'is_refused': response_code == 'REFUSED',
    'response_time_z': (response_ms - baseline_dns_ms) / baseline_dns_std,
    'ttl_anomaly': abs(returned_ttl - expected_ttl) > 3600,
    'ip_matches_sinkhole': returned_ip in KNOWN_SINKHOLES,
    'redirect_to_block_page': is_known_block_ip(returned_ip),
}

# TLS features
tls_features = {
    'handshake_completed': tls_ok,
    'cert_hash_expected': cert_hash in known_cert_hashes(domain),
    'cert_issuer_trusted': issuer in TRUSTED_CA_ROOTS,
    'sni_alert_type': tls_alert_code,  # e.g., 112 = unrecognized_name
    'reset_after_client_hello': tcp_reset_at_stage == 'CLIENT_HELLO',
    'handshake_time_z': (handshake_ms - baseline_tls_ms) / baseline_tls_std,
}

# HTTP features
http_features = {
    'status_code': response_code,
    'is_451': response_code == 451,
    'body_fingerprint_match': max_similarity(body, BLOCK_PAGE_CORPUS),
    'redirect_count': len(redirect_chain),
    'final_url_domain': extract_domain(final_url),
    'content_type_mismatch': expected_content_type != actual_content_type,
}

# BGP features
bgp_features = {
    'origin_as_reachable': as_path_exists(target_prefix),
    'path_length_delta': current_path_len - baseline_path_len,
    'unique_collectors_visible': sum(c.sees_prefix for c in ROUTE_COLLECTORS),
}

# Throttling features
throttle_features = {
    'bandwidth_z': (measured_bw - baseline_bw) / baseline_bw_std,
    'latency_z': (measured_latency - baseline_latency) / baseline_latency_std,
    'other_domains_ok': check_neighboring_domains(probe_id, timestamp),
}
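
The `body_fingerprint_match` feature depends on a `max_similarity` helper that this post does not show. As a rough stand-in, the sketch below uses the standard library's `difflib.SequenceMatcher`; the real fingerprint matcher may work quite differently, and the two corpus entries are made-up placeholders rather than real signatures:

```python
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return " ".join(text.lower().split())


def max_similarity(body: str, corpus: list[str]) -> float:
    """Highest similarity ratio (0.0 to 1.0) between body and any corpus entry."""
    if not corpus:
        return 0.0
    normalized_body = _normalize(body)
    return max(
        SequenceMatcher(None, normalized_body, _normalize(signature)).ratio()
        for signature in corpus
    )


# Placeholder signatures for illustration only.
BLOCK_PAGE_CORPUS = [
    "access to this website has been blocked by order of the authority",
    "this site is not available in your country",
]

score = max_similarity(
    "Access to this website has been BLOCKED   by order of the authority",
    BLOCK_PAGE_CORPUS,
)
```

A continuous score (rather than a boolean match) leaves the decision boundary to the tree model.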

Five per-class binary classifiers

Rather than a single multi-class model, we train five independent binary classifiers — one per interference type. This allows interference types to coexist (a domain can be DNS-tampered and TLS-intercepted) and lets us tune the threshold for each class independently.
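
To illustrate why independent binary classifiers permit coexisting labels, here is a small sketch that applies a separate threshold to each class probability. The threshold values are hypothetical; the post does not publish the production operating thresholds.

```python
# Hypothetical per-class decision thresholds (not the production values).
THRESHOLDS = {
    "dns_tamper": 0.35,
    "tls_interfere": 0.40,
    "http_block": 0.50,
    "bgp_withdraw": 0.30,
    "throttle": 0.45,
}


def fired_classes(
    probabilities: dict[str, float],
    thresholds: dict[str, float] = THRESHOLDS,
) -> set[str]:
    """Return every class whose probability meets its own threshold.

    Each class is judged independently, so zero, one, or several
    classes can fire for the same measurement.
    """
    return {
        name for name, p in probabilities.items() if p >= thresholds[name]
    }


labels = fired_classes({
    "dns_tamper": 0.80,
    "tls_interfere": 0.70,
    "http_block": 0.10,
    "bgp_withdraw": 0.02,
    "throttle": 0.30,
})
# labels contains both "dns_tamper" and "tls_interfere"
```

A single softmax-style multi-class model would force these probabilities to compete, which is what the per-class design avoids.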

Each classifier is a gradient boosted tree ensemble (XGBoost, 100 estimators, max depth 5). Training data comes from labeled historical measurements: confirmed interference events from the OONI corpus and CensoredPlanet, augmented with probe measurements from countries where a block was subsequently confirmed by official government statements.

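import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

def train_classifier(class_name: str, features: pd.DataFrame, labels: pd.Series):
    """Train one binary classifier for a single interference class."""
    # Hold out a stratified validation split for the eval metric.
    # The 80/20 ratio is an assumption; the production split is not described here.
    X_train, X_val, y_train, y_val = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=42,
    )
    # Weight positives by the negative-to-positive ratio to handle class imbalance.
    pos_count = int((y_train == 1).sum())
    neg_count = int((y_train == 0).sum())
    model = xgb.XGBClassifier(
        n_estimators=100,
        max_depth=5,
        learning_rate=0.1,
        scale_pos_weight=neg_count / max(pos_count, 1),
        eval_metric='aucpr',  # area under precision-recall curve
        random_state=42,
    )
    model.fit(
        X_train, y_train,
        eval_set=[(X_val, y_val)],
        verbose=False,
    )
    return model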

# One model per class
classifiers = {
    'dns_tamper': train_classifier('dns_tamper', dns_features, dns_labels),
    'tls_interfere': train_classifier('tls_interfere', tls_features, tls_labels),
    'http_block': train_classifier('http_block', http_features, http_labels),
    'bgp_withdraw': train_classifier('bgp_withdraw', bgp_features, bgp_labels),
    'throttle': train_classifier('throttle', throttle_features, throttle_labels),
}

Why we optimize for recall over precision

The classifiers are tuned to maximize recall (true positive rate) rather than precision. Current operating points:

DNS tampering: 96% recall, 74% precision
TLS interference: 94% recall, 81% precision
HTTP blocking: 91% recall, 88% precision (most precise class — block pages are distinctive)
BGP withdrawal: 98% recall, 85% precision
Throttling: 89% recall, 51% precision (hardest class — most false positives)

The reason for this tradeoff: false negatives (missed censorship events) are worse than false positives in our pipeline. A missed event never gets investigated. A false positive gets surfaced as “Observed” and is filtered by the cross-source reconciler — it only reaches “Verified incident” status if OONI, CensoredPlanet, or IODA independently flags the same target in the same time window. A spurious classifier output that no other source corroborates stays at “Observed” indefinitely.

This means the cross-source verification layer is not just a quality gate — it's a design dependency. The classifier is deliberately imprecise, trusting the reconciler to wash out the noise.
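
One way to arrive at a recall-first operating point is to pick the highest threshold that still meets a recall target on validation data, then read off the precision that results. The sketch below shows that idea on toy data; the function names and the sample scores are illustrative, and this is not necessarily how the production thresholds were chosen.

```python
import math


def threshold_for_recall(
    scores: list[float], labels: list[int], target_recall: float
) -> float:
    """Highest threshold whose recall on (scores, labels) meets target_recall."""
    positives = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positives:
        raise ValueError("no positive examples")
    # Number of positives that must score at or above the threshold.
    needed = max(1, math.ceil(target_recall * len(positives)))
    return positives[needed - 1]


def precision_recall(
    scores: list[float], labels: list[int], threshold: float
) -> tuple[float, float]:
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy validation set: classifier scores and ground-truth labels.
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

t_75 = threshold_for_recall(scores, labels, 0.75)   # 0.60
t_100 = threshold_for_recall(scores, labels, 1.0)   # 0.40
p_75, r_75 = precision_recall(scores, labels, t_75)     # (0.75, 0.75)
p_100, r_100 = precision_recall(scores, labels, t_100)  # recall 1.0, lower precision
```

Raising the recall target lowers the threshold and admits more false positives, which is the tradeoff the reconciler is expected to absorb.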

Confidence scoring

Each classifier outputs a probability between 0 and 1. These are combined into a single measurement-level confidence score:

def compute_confidence(probabilities: dict[str, float]) -> float:
    """
    Combine per-class probabilities into a single confidence score.
    Returns 0.0 (no interference detected) to 1.0 (high-confidence interference).
    """
    CLASS_WEIGHTS = {
        'dns_tamper': 0.25,
        'tls_interfere': 0.25,
        'http_block': 0.30,    # slightly higher — block pages are definitive
        'bgp_withdraw': 0.15,   # lower — BGP events are coarse-grained
        'throttle': 0.05,       # lowest — high false-positive rate
    }
    raw = sum(probabilities[c] * w for c, w in CLASS_WEIGHTS.items())

    # Boost if multiple classes fire simultaneously
    classes_above_threshold = sum(p > 0.5 for p in probabilities.values())
    if classes_above_threshold >= 2:
        raw = min(1.0, raw * 1.2)

    return round(raw, 3)
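
To trace the arithmetic, the sketch below restates the function compactly (so the block runs on its own) and evaluates it on two hypothetical probability vectors. The input values are invented for illustration.

```python
CLASS_WEIGHTS = {
    "dns_tamper": 0.25,
    "tls_interfere": 0.25,
    "http_block": 0.30,
    "bgp_withdraw": 0.15,
    "throttle": 0.05,
}


def compute_confidence(probabilities: dict[str, float]) -> float:
    """Weighted sum of class probabilities, boosted 1.2x when 2+ classes exceed 0.5."""
    raw = sum(probabilities[c] * w for c, w in CLASS_WEIGHTS.items())
    if sum(p > 0.5 for p in probabilities.values()) >= 2:
        raw = min(1.0, raw * 1.2)
    return round(raw, 3)


# Two classes firing together:
# 0.9*0.25 + 0.8*0.25 + 0.1*0.30 + 0.05*0.15 + 0.05*0.05 = 0.465
# two probabilities exceed 0.5, so 0.465 * 1.2 = 0.558
dns_and_tls = compute_confidence({
    "dns_tamper": 0.9,
    "tls_interfere": 0.8,
    "http_block": 0.1,
    "bgp_withdraw": 0.05,
    "throttle": 0.05,
})

# One class firing alone contributes at most its own weight (no boost applies).
dns_only = compute_confidence({
    "dns_tamper": 1.0,
    "tls_interfere": 0.0,
    "http_block": 0.0,
    "bgp_withdraw": 0.0,
    "throttle": 0.0,
})
```

Here `dns_and_tls` comes out at 0.558 and `dns_only` at 0.25, which shows how much of the score comes from several classes agreeing.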

Measurements with confidence ≥ 0.40 are surfaced as “Observed” and enter the cross-source reconciliation queue. Measurements with confidence ≥ 0.75 that are corroborated by at least one external source are promoted to “Corroborated.” Reaching “Verified incident” additionally requires a sustained pattern across multiple measurement windows.
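
The tier rules above can be sketched as a small function. The 0.40 and 0.75 cutoffs come from the text; `min_sustained_windows` is a hypothetical parameter, since the post does not say how many windows count as a sustained pattern.

```python
from typing import Optional

OBSERVED_MIN = 0.40      # from the text
CORROBORATED_MIN = 0.75  # from the text


def assign_tier(
    confidence: float,
    external_sources: int,
    sustained_windows: int,
    min_sustained_windows: int = 3,  # assumption: the real value is not stated
) -> Optional[str]:
    """Map a measurement's confidence and corroboration state to a tier.

    Returns None when the measurement is not surfaced at all.
    """
    if confidence < OBSERVED_MIN:
        return None
    if confidence >= CORROBORATED_MIN and external_sources >= 1:
        if sustained_windows >= min_sustained_windows:
            return "Verified incident"
        return "Corroborated"
    return "Observed"
```

For example, `assign_tier(0.8, external_sources=0, sustained_windows=5)` stays at "Observed": high classifier confidence alone never promotes a measurement without an external source.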

Country-specific calibration

The raw probability from each classifier is well-calibrated globally but poorly calibrated for specific countries. DNS timeout rates in Pakistan (high baseline) look like interference in a model trained on German probes. To correct for this, each classifier has a per-country probability adjustment layer trained on country-specific labeled data:

# Per-country, per-class calibration using isotonic regression
from sklearn.isotonic import IsotonicRegression

# raw_probs is assumed to hold one row per labeled measurement and class,
# with columns: country, class_name, raw_prob, label.
calibrators = {}
for (country, class_name), group in raw_probs.groupby(['country', 'class_name']):
    if country not in COUNTRIES_WITH_LABELED_DATA:
        continue
    ir = IsotonicRegression(out_of_bounds='clip')
    ir.fit(group['raw_prob'], group['label'])
    # Key by (country, class_name) so the lookup below finds the calibrator.
    calibrators[(country, class_name)] = ir

def calibrated_probability(raw_prob: float, country: str, class_name: str) -> float:
    key = (country, class_name)
    if key in calibrators:
        return float(calibrators[key].predict([raw_prob])[0])
    return raw_prob  # fall back to global calibration for thin countries

Countries with fewer than 500 labeled measurements use global calibration with a conservative confidence discount (confidence × 0.85) to reduce the false-positive rate in data-sparse regions.
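
A minimal sketch of that fallback, assuming the 0.85 factor is applied to the composite confidence score (the text does not say whether it applies there or to each class probability). The function name is illustrative.

```python
MIN_LABELED_MEASUREMENTS = 500  # from the text
SPARSE_COUNTRY_DISCOUNT = 0.85  # from the text


def adjust_for_sparse_country(confidence: float, labeled_count: int) -> float:
    """Discount confidence for countries without enough labeled data."""
    if labeled_count < MIN_LABELED_MEASUREMENTS:
        return round(confidence * SPARSE_COUNTRY_DISCOUNT, 3)
    return confidence


# A borderline score in a data-sparse country: 0.46 * 0.85 = 0.391
sparse = adjust_for_sparse_country(0.46, labeled_count=120)

# The same score in a well-labeled country is left alone.
dense = adjust_for_sparse_country(0.46, labeled_count=5000)
```

With a 0.40 cutoff for surfacing, the discount moves the borderline score in the sparse country below the line while the well-labeled country's score still clears it.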

Throttling: the difficult class

Throttling detection has a structurally higher false-positive rate and lower precision than the other four classes. Several factors make it harder:

No hard signal. DNS, TLS, HTTP, and BGP blocks produce a binary failure. Throttling is a gradient — the question is “how slow is too slow?” The answer varies by ISP, time of day, and the content delivery network the target domain uses.
Legitimate congestion looks identical. A probe on a heavily used shared connection (common in residential deployments) may see low bandwidth for reasons unrelated to censorship. We partially mitigate this by requiring that other domains on the same probe don't show similar degradation: if five domains are slow, it's congestion; if one domain is slow while others are fine, it's more likely targeted throttling.
Baseline drift. ISPs legitimately change their capacity provisioning. A 7-day bandwidth baseline can be stale if the ISP upgraded or downgraded their link. We detect baseline drift with a Kolmogorov-Smirnov test against the 30-day trailing distribution and reset the baseline when drift is detected.
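
The drift check can be illustrated with a two-sample Kolmogorov-Smirnov statistic written against the standard library. This is a sketch under assumptions: it uses the large-sample critical value approximation at a hypothetical significance level of 0.05, and the bandwidth samples are invented. A production system would more likely call a statistics library.

```python
import math
from bisect import bisect_right


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def baseline_drifted(
    recent: list[float], trailing: list[float], alpha: float = 0.05
) -> bool:
    """True when the recent window differs from the trailing distribution.

    Uses the asymptotic critical value c(alpha) * sqrt((n + m) / (n * m)),
    with c(alpha) = sqrt(-ln(alpha / 2) / 2).
    """
    n, m = len(recent), len(trailing)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(recent, trailing) > critical


# Hypothetical daily bandwidth samples (Mbps).
trailing_30d = [float(v) for v in range(45, 55)] * 3
stable_7d = [46.0, 48.0, 50.0, 52.0, 54.0, 47.0, 49.0]
upgraded_7d = [98.0, 101.0, 99.0, 102.0, 100.0, 97.0, 103.0]

no_drift = baseline_drifted(stable_7d, trailing_30d)
drift = baseline_drifted(upgraded_7d, trailing_30d)
```

In this toy data the stable week is not flagged, while the week after a simulated link upgrade is, which is the case where the baseline would be reset.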

As a result, throttling events surface frequently as “Observed” but reach “Verified incident” at a much lower rate than the other four classes. We don't penalize the throttling classifier for this — it's a structural property of the signal, not a model deficiency.
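
The neighboring-domain check from the congestion point above could look roughly like this. The -3 cutoff mirrors the 3-standard-deviation rule in the class definition; `max_slow_neighbors` and the domain names are hypothetical.

```python
def looks_targeted(
    bandwidth_z: dict[str, float],
    target: str,
    z_cutoff: float = -3.0,
    max_slow_neighbors: int = 0,  # assumption: tolerance is not stated in the post
) -> bool:
    """True when the target is slow but other domains on the same probe are not.

    bandwidth_z maps each domain measured by one probe in one window
    to its bandwidth z-score against the per-ISP baseline.
    """
    if bandwidth_z[target] >= z_cutoff:
        return False  # target itself is not anomalously slow
    slow_neighbors = sum(
        1
        for domain, z in bandwidth_z.items()
        if domain != target and z < z_cutoff
    )
    return slow_neighbors <= max_slow_neighbors


# Only the target is slow: consistent with targeted throttling.
targeted = looks_targeted(
    {"target.example": -4.2, "a.example": -0.3, "b.example": 0.5},
    "target.example",
)

# Everything is slow: more likely plain congestion.
congested = looks_targeted(
    {"target.example": -4.2, "a.example": -3.8, "b.example": -5.0},
    "target.example",
)
```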

What the classifier output feeds

The classifier output — per-class probabilities, composite confidence score, and the dominant interference type — is attached to every measurement record sent to the collector. The cross-source reconciler then:

Groups measurements by target domain, country, and 4-hour time window
Checks whether OONI, CensoredPlanet, or IODA flag the same target in the same window
Applies independence weights (two probes on the same ISP are not independent)
Computes a composite corroboration score that, if it crosses the verified-incident threshold, publishes the event to the public dataset

The classifier's recall bias means the reconciler's input is “noisy but complete.” The reconciler's independence weighting turns that noisy input into a high-precision output. Neither component works well without the other.

For how the probe application collects the raw measurements this classifier processes: The Voidly Probe: Tauri + boringtun network measurement at the operator's edge →

For the control server comparison that generates the features this classifier consumes: The Voidly control server: how we tell censorship from a bad network →

For how the labeled training dataset is built from OONI measurements — label functions, feature extraction, and the continuous retraining pipeline: Voidly's ML training pipeline: building a labeled censorship dataset from OONI measurements →

For how the OONI archive that provides labeled training data for this classifier was processed: Building the OONI historical corpus: 1.66M downloads, schema normalization, and the decisions behind the dataset →

For how the classifier output is reconciled across OONI, CensoredPlanet, and IODA: Cross-source censorship verification: reconciling OONI, CensoredPlanet, and IODA →

For how the classifier confidence score moves through the three-tier promotion system: From anomaly to verified incident: the Voidly confidence tier system →

For how per-measurement prob_* outputs aggregate into country-level censorship scores with recency decay and ASN diversity weighting: Voidly's country-level censorship score: aggregating 2.2B probe measurements into the global index →

For the block page fingerprint library behind the HTTP_BLOCK classification path: Voidly's block page fingerprint library: detecting censorship signatures across 2,300+ known pages →

For how the training set is grown beyond the bootstrap labels — uncertainty sampling, inter-annotator agreement, and weekly retrains: Voidly's active learning loop: growing the anomaly training set with human-in-the-loop annotation →

For how this classifier is evaluated offline before promotion — AUC-PR rationale, F2 scoring, Platt calibration ECE, and per-country case studies: Offline evaluation for the Voidly anomaly classifier: AUC-PR, F2, ECE calibration, and country case studies →

For how the OONI measurements used to build this classifier's training set are ingested, aligned, and labeled with Snorkel label functions: Building Voidly's classifier training dataset from OONI: ingestion, alignment, and label generation →