
Voidly probe commissioning: how a new operator joins the censorship measurement network

8 min read · AI Analytics
Censorship · Voidly · Methodology · Infrastructure

Every Voidly probe represents a real person in a real network, running software that could draw attention in exactly the environments where censorship is worst. The commissioning process has to balance two constraints that pull in opposite directions: we need enough information about a new probe to trust its measurements, and we need to protect the operator by collecting as little identifying information as possible.

This post walks through the complete operator onboarding flow — from the moment an operator downloads the Tauri app to the moment their measurements start appearing in the public dataset. It covers key generation, registration, the 48-hour warmup period, quality scoring at promotion to ACTIVE status, and the failure modes we watch for.

Operator vetting

Voidly does not perform KYC (know-your-customer) checks. We ask operators to agree to an acceptable use policy and to declare their country and ASN. We do not collect names, emails, or any other identifier that could be linked to a real person if our records were seized or leaked. The vetting is deliberately lightweight: the risk to an operator if their participation becomes known is much higher than the risk to us of admitting a bad-faith operator, and bad-faith measurements are caught by the quality scoring system rather than by vetting.

Operators apply via a short form that asks for a declared country code, a declared residential or mobile ASN (not data-center), and an acknowledgment of the acceptable use terms. No email address is collected; the operator's only credential after registration is a locally generated key pair and the derived probe_id.
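The three form fields can be checked without touching anything identifying. The following is a minimal sketch, assuming a hypothetical `OperatorApplication` record; the field names and validation rules are illustrative and are not taken from Voidly's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorApplication:
    """Hypothetical record holding only the fields the form asks for."""
    declared_country: str   # two-letter country code, e.g. "DE"
    declared_asn: int       # AS number of the residential or mobile network
    accepted_aup: bool      # acceptable use policy acknowledgment


def validation_errors(app: OperatorApplication) -> list[str]:
    """Return human-readable problems; an empty list means the form is acceptable."""
    errors = []
    code = app.declared_country
    if not (len(code) == 2 and code.isascii() and code.isalpha() and code.isupper()):
        errors.append("declared_country must be a two-letter uppercase code")
    # AS numbers are 32-bit unsigned integers; 0 is reserved
    if not (0 < app.declared_asn < 2**32):
        errors.append("declared_asn must be a valid AS number")
    if not app.accepted_aup:
        errors.append("acceptable use policy must be acknowledged")
    return errors
```

Note that the record has no slot for a name or email, so there is nothing identifying to leak even if validation logs are retained.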

Key generation and registration

On first launch, the Tauri app generates an X25519 keypair using x25519-dalek. The private key is written to the OS keychain (Keychain on macOS, Secret Service on Linux, Windows Credential Manager on Windows) and never transmitted. The public key becomes the probe's permanent cryptographic identity.

// Tauri app: probe identity initialization
use rand::rngs::OsRng;
use sha2::{Digest, Sha256};
use x25519_dalek::{PublicKey, StaticSecret};

pub struct ProbeIdentity {
    pub probe_id: String,     // hex(SHA-256(public_key_bytes))
    pub public_key: PublicKey,
    // private key stored in OS keychain, not in this struct
}

// Returns the identity together with the secret. The caller moves the
// secret into the OS keychain; it is never serialized or transmitted.
pub fn generate_probe_identity() -> (ProbeIdentity, StaticSecret) {
    let secret = StaticSecret::random_from_rng(OsRng);
    let public_key = PublicKey::from(&secret);
    // `digest` is provided by the `Digest` trait, which must be in scope
    let probe_id = hex::encode(Sha256::digest(public_key.as_bytes()));
    (ProbeIdentity { probe_id, public_key }, secret)
}

The probe_id is a stable identifier derived from the public key: it never changes unless the operator reinstalls the app and generates a new keypair. This lets us track measurement quality history per probe over time without ever knowing who the operator is.
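Because the probe_id is a pure function of the public key, any party holding the key can recompute it. A minimal Python sketch of that derivation follows; the function name `derive_probe_id` is an assumption for illustration, not a documented Voidly API.

```python
import hashlib


def derive_probe_id(public_key_hex: str) -> str:
    """Recompute probe_id as hex(SHA-256(public_key_bytes))."""
    public_key = bytes.fromhex(public_key_hex)
    if len(public_key) != 32:
        raise ValueError("X25519 public keys are exactly 32 bytes")
    return hashlib.sha256(public_key).hexdigest()


# Illustrative all-zero key; a real key comes from the app's keypair
example_id = derive_probe_id("00" * 32)
```

The result is always a 64-character hex string, and the same key always maps to the same probe_id, which is what makes per-probe quality history possible without any other identifier.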

Registration is a single HTTPS POST to the Voidly backend with the public key hex, declared ASN, and declared country:

POST /v1/probes/register
{
  "public_key": "a3f8...",   // 32-byte X25519 public key, hex-encoded
  "declared_asn": 12345,
  "declared_country": "DE",
  "app_version": "2.1.4"
}

// Response
{
  "probe_id": "e8f3a...",    // SHA-256 of public key
  "status": "PENDING_WARMUP",
  "warmup_start": "2025-05-03T09:14:22Z",
  "warmup_end": "2025-05-05T09:14:22Z",
  "ingest_endpoint": "https://ingest.voidly.ai",
  "ingest_public_key": "b7c2..."   // backend's X25519 public key for QUIC session
}
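The warmup fields in the response follow directly from the registration time. A minimal sketch of how they could be computed is shown here; the helper name `warmup_window` is hypothetical, and only the 48-hour duration, the status value, and the timestamp format come from the example above.

```python
from datetime import datetime, timedelta, timezone

WARMUP_DURATION = timedelta(hours=48)


def warmup_window(registered_at: datetime) -> dict:
    """Build the warmup-related fields of a registration response."""
    if registered_at.tzinfo is None:
        raise ValueError("registered_at must be timezone-aware")
    # Normalize to UTC and drop sub-second precision to match the wire format
    start = registered_at.astimezone(timezone.utc).replace(microsecond=0)
    end = start + WARMUP_DURATION
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "status": "PENDING_WARMUP",
        "warmup_start": start.strftime(fmt),
        "warmup_end": end.strftime(fmt),
    }


window = warmup_window(datetime(2025, 5, 3, 9, 14, 22, tzinfo=timezone.utc))
```

With the registration time from the example response, `window["warmup_end"]` is `2025-05-05T09:14:22Z`, exactly 48 hours after `warmup_start`.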

ASN and country verification

After registration, the backend cross-references the declared ASN against the IP address of the registration request. We use CAIDA AS-Rank to classify the ASN type (residential, mobile, transit, data-center) and MaxMind GeoIP for country verification. We do not block registration if the declared values disagree with what we observe — operators legitimately use VPNs during registration — but we log the discrepancy and weight it during warmup evaluation.

async def verify_probe_registration(
    declared_asn: int,
    declared_country: str,
    registration_ip: str,
) -> VerificationResult:
    observed_asn = await geoip.asn_lookup(registration_ip)
    observed_country = await geoip.country_lookup(registration_ip)
    asn_type = await caida_as_rank.get_type(declared_asn)

    return VerificationResult(
        asn_match=declared_asn == observed_asn,
        country_match=declared_country == observed_country,
        asn_type=asn_type,
        # Flag data-center ASNs for manual review
        requires_review=(asn_type == 'data_center'),
    )

Data-center ASN registrations go into a manual review queue. We accept data-center probes in specific cases — for example, an operator in a country where residential internet access is so surveilled that a data-center proxy is the only safe option — but we require explicit approval because data-center measurements have different characteristics than residential measurements and need to be weighted accordingly in the classifier.
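One way to read the policy above: mismatches are recorded but never block, while a data-center ASN diverts the probe to review. The sketch below illustrates that routing under stated assumptions: the `MANUAL_REVIEW` state name and the `next_step` helper are hypothetical, and `VerificationResult` is redefined locally only to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerificationResult:
    asn_match: bool
    country_match: bool
    asn_type: str
    requires_review: bool


def next_step(result: VerificationResult) -> tuple[str, list[str]]:
    """Return (state, notes). Mismatches become notes to be weighed during
    warmup evaluation; they do not reject the registration."""
    notes = []
    if not result.asn_match:
        notes.append("declared ASN differs from observed ASN")
    if not result.country_match:
        notes.append("declared country differs from observed country")
    # Only the ASN type decides whether a human looks at the registration
    state = "MANUAL_REVIEW" if result.requires_review else "PENDING_WARMUP"
    return state, notes
```

A residential probe registering over a VPN would therefore still enter `PENDING_WARMUP`, carrying two notes, while a data-center probe with matching declarations would wait for approval with none.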

The 48-hour warmup period

During warmup, the probe runs its full measurement schedule — same 80-domain test list, same cadence — but all measurements are tagged with warmup=true and excluded from the published dataset. They are used internally for three purposes:

  • Baseline establishment. We compute per-probe baselines for control measurement success rates, DNS response distributions, and TLS certificate fingerprints. These baselines become the reference for anomaly detection on this specific probe.
  • Calibration validation. We compare the new probe's measurements against the median of existing probes in the same ASN (or nearby ASNs if the probe is the first in its ASN). A probe that reports every domain as blocked throughout warmup is either on an already heavily censored network or misconfigured; we flag it for investigation rather than promoting it to ACTIVE.
  • Connectivity pattern validation. We verify that the probe can maintain the QUIC/443 connection with acceptable jitter, complete measurements within the time budget, and upload results reliably. A probe that drops 40% of upload attempts during warmup will not meet the measurement quality threshold.
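The warmup tag does two jobs: it keeps early measurements out of the public dataset, and it marks the data from which baselines are built. A minimal sketch of both follows, using a hypothetical pared-down `Measurement` record; the real record has more fields, and the helper names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    """Hypothetical, pared-down measurement record."""
    domain: str
    control_reached: bool
    warmup: bool


def publishable(measurements: list[Measurement]) -> list[Measurement]:
    """Keep only measurements that may enter the public dataset."""
    return [m for m in measurements if not m.warmup]


def control_success_baseline(measurements: list[Measurement]) -> Optional[float]:
    """Share of warmup measurements whose control request succeeded."""
    warmup = [m for m in measurements if m.warmup]
    if not warmup:
        return None  # no baseline until warmup data exists
    return sum(m.control_reached for m in warmup) / len(warmup)


sample = [
    Measurement("a.example", True, True),
    Measurement("b.example", False, True),
    Measurement("a.example", True, False),
]
```

For `sample`, only the third measurement is publishable, and the control success baseline computed from the two warmup measurements is 0.5.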

Quality scoring at promotion

At the end of the 48-hour warmup window, the quality scoring function runs on the accumulated warmup measurements. The same compute_quality_score() function used for ongoing probe health monitoring applies here, with a stricter promotion threshold than the ongoing health threshold.

def compute_quality_score(probe_id: str, window_hours: int = 48) -> float:
    measurements = get_measurements(probe_id, window_hours)

    # A probe that uploaded nothing scores zero; returning early also
    # avoids dividing by len(measurements) == 0 below
    if not measurements:
        return 0.0

    # Capped at 1.0 so retries or duplicate uploads cannot inflate the score
    measurement_rate = min(1.0, len(measurements) / expected_measurements(window_hours))
    error_rate = sum(1 for m in measurements if m.error) / len(measurements)
    control_reachability = sum(1 for m in measurements if m.control_reached) / len(measurements)

    # DNS response distribution: we expect a mix of results, not all-blocked
    unique_dns_responses = len(set(m.dns_response for m in measurements))
    dns_diversity = min(1.0, unique_dns_responses / 10)

    # Weights sum to 1.0, so the score stays within [0, 1]
    quality = (
        0.35 * measurement_rate +
        0.30 * (1 - error_rate) +
        0.25 * control_reachability +
        0.10 * dns_diversity
    )
    return quality

PROMOTION_THRESHOLD = 0.72   # vs 0.55 for ongoing health
DEGRADED_THRESHOLD = 0.45
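To make the weighting concrete, the same formula can be applied to hand-picked sub-scores. The values below are illustrative only, not real probe data, and `weighted_quality` is a hypothetical helper that takes precomputed sub-scores instead of querying measurements.

```python
PROMOTION_THRESHOLD = 0.72


def weighted_quality(measurement_rate, error_rate, control_reachability, dns_diversity):
    """Same weights as compute_quality_score(), applied to precomputed sub-scores."""
    return (
        0.35 * measurement_rate
        + 0.30 * (1 - error_rate)
        + 0.25 * control_reachability
        + 0.10 * dns_diversity
    )


# A probe that ran almost continuously with few errors:
# 0.315 + 0.270 + 0.2375 + 0.050 = 0.8725
healthy = weighted_quality(0.90, 0.10, 0.95, 0.50)

# A probe that slept often and failed many uploads:
# 0.1925 + 0.195 + 0.150 + 0.030 = 0.5675
flaky = weighted_quality(0.55, 0.35, 0.60, 0.30)

print(round(healthy, 4), healthy >= PROMOTION_THRESHOLD)
print(round(flaky, 4), flaky >= PROMOTION_THRESHOLD)
```

The second probe sits above the 0.55 ongoing-health threshold but below the promotion threshold, which shows how the stricter bar at promotion filters out probes that would merely be tolerated once established.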

A probe scoring below 0.72 at the end of warmup is not promoted to ACTIVE. It remains in PENDING_WARMUP and the operator receives a notification (via the Tauri app UI, which polls for status) explaining which sub-score failed. Common failure reasons:

| Failure mode | Likely cause | Resolution |
|---|---|---|
| Low measurement rate (<0.6) | App not running continuously; sleep/wake cycle interrupting schedule | Enable background service mode in OS settings |
| Low control reachability (<0.5) | Firewall blocking outbound to control server IPs; corporate proxy | Allowlist control server ranges; domain fronting auto-enabled |
| Low DNS diversity | Resolver returning NXDOMAIN for everything (ISP filtering), or caching single response | Probe uses secondary resolver; flag as possible high-censorship environment |
| High error rate (>0.3) | QUIC port blocked; upload failures; measurement timeout misconfiguration | Check QUIC/443 connectivity; fallback to HTTPS long-polling |
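The notification shown to the operator needs to name the sub-scores that failed. A minimal sketch of that mapping follows, using the numeric cutoffs listed above. The function name is hypothetical, and because no cutoff is given for DNS diversity, the caller must supply `dns_diversity_floor` as an assumption of their own.

```python
def failing_subscores(measurement_rate, error_rate, control_reachability,
                      dns_diversity, dns_diversity_floor):
    """Return the failure modes that apply to a set of sub-scores.

    dns_diversity_floor has no documented value, so it is a required
    argument rather than a built-in constant.
    """
    failures = []
    if measurement_rate < 0.6:
        failures.append("low measurement rate")
    if control_reachability < 0.5:
        failures.append("low control reachability")
    if dns_diversity < dns_diversity_floor:
        failures.append("low DNS diversity")
    if error_rate > 0.3:
        failures.append("high error rate")
    return failures
```

A probe can fail promotion on its composite score without tripping any single cutoff, so an empty list here does not by itself mean the probe is promotable.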

Probes in high-censorship environments (Iran, China, Russia, Belarus) routinely have lower DNS diversity and higher error rates during warmup, because the environment being measured is genuinely blocking many domains. For these environments, the promotion threshold is adjusted per-country: Iran probes are promoted at 0.60 rather than 0.72, with a calibration flag noting that measurement conditions are consistent with known local filtering.
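A per-country adjustment like this can be expressed as a lookup with a default. The sketch below is an assumption about structure, not the production code: only the Iran value (0.60) and the default (0.72) come from the text, and the names are hypothetical.

```python
DEFAULT_PROMOTION_THRESHOLD = 0.72

# Only the Iran value is stated in the text; entries for other countries
# would be added the same way once their values are known.
COUNTRY_PROMOTION_THRESHOLDS = {"IR": 0.60}


def promotion_threshold(country_code: str) -> float:
    """Look up the promotion threshold for a declared country."""
    return COUNTRY_PROMOTION_THRESHOLDS.get(country_code.upper(), DEFAULT_PROMOTION_THRESHOLD)


def promotion_decision(country_code: str, quality: float) -> dict:
    threshold = promotion_threshold(country_code)
    return {
        "promote": quality >= threshold,
        "threshold": threshold,
        # Records that a relaxed, country-specific threshold was applied
        "calibration_flag": threshold < DEFAULT_PROMOTION_THRESHOLD,
    }
```

Under these assumptions, a warmup score of 0.65 promotes a probe declared in Iran with the calibration flag set, while the same score in Germany leaves the probe in PENDING_WARMUP.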

Calibration deviation detection

Beyond the quality score, we run a deviation check comparing the new probe's warmup measurements to the median of existing probes in the same country × ASN type bucket. A probe that systematically reports different outcomes from its peers — either unusually high blocking rates or unusually low blocking rates — triggers a review.

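from statistics import mean, median
from typing import List


def check_calibration_deviation(
    probe_id: str,
    country_code: str,
    warmup_measurements: List[Measurement],
) -> CalibrationResult:
    peer_measurements = get_peer_measurements(country_code, window_hours=48)

    # Per-domain absolute deviation from peer consensus (0 = full agreement)
    domain_deviation = {}
    for domain in TEST_DOMAINS:
        probe_values = [m.p_blocked for m in warmup_measurements if m.domain == domain]
        peer_values = [m.p_blocked for m in peer_measurements if m.domain == domain]
        # mean() and median() raise StatisticsError on empty input, so skip
        # domains that either side has not measured
        if not probe_values or not peer_values:
            continue
        domain_deviation[domain] = abs(mean(probe_values) - median(peer_values))

    # No comparable domains means no deviation can be computed. Routing this
    # case to review is an assumption; the policy for it is not specified here.
    if not domain_deviation:
        return CalibrationResult.REVIEW_REQUIRED

    max_deviation = max(domain_deviation.values())
    mean_deviation = mean(domain_deviation.values())

    if mean_deviation > 0.25 or max_deviation > 0.60:
        return CalibrationResult.REVIEW_REQUIRED
    return CalibrationResult.OK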

A new probe in Germany that reports bbc.com as blocked at p=0.95 while all other German probes show p=0.02 is almost certainly misconfigured or behind an unusual corporate proxy. We do not auto-reject — we flag for manual review — because there are legitimate edge cases (a probe on a corporate network with content filtering, or a probe on an ISP with unusual routing) that produce real measurement signal.
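The bbc.com case can be worked through numerically. The sketch below is a simplified, self-contained variant of the deviation check that operates on plain dictionaries; the function name, the input shape, and the sample values are assumptions chosen for illustration.

```python
from statistics import mean, median


def deviation_summary(probe_p_blocked: dict, peer_p_blocked: dict) -> tuple:
    """Compare per-domain blocking probabilities.

    Both arguments map domain -> list of p_blocked values.
    Returns (mean_deviation, max_deviation, needs_review).
    """
    deviations = [
        abs(mean(probe_p_blocked[d]) - median(peer_p_blocked[d]))
        for d in probe_p_blocked
        if probe_p_blocked[d] and peer_p_blocked.get(d)
    ]
    if not deviations:
        raise ValueError("no domains measured by both the probe and its peers")
    mean_dev, max_dev = mean(deviations), max(deviations)
    return mean_dev, max_dev, (mean_dev > 0.25 or max_dev > 0.60)


# One outlier domain, three domains in agreement with peers
probe = {"bbc.com": [0.95], "a.example": [0.02], "b.example": [0.02], "c.example": [0.02]}
peers = {d: [0.02, 0.02, 0.03] for d in probe}
mean_dev, max_dev, needs_review = deviation_summary(probe, peers)
```

Here the mean deviation is 0.93 / 4 = 0.2325, which stays under the 0.25 limit, but the single-domain deviation of 0.93 exceeds 0.60. The maximum rule is what catches a probe that disagrees sharply on one domain while agreeing everywhere else.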

Promotion and first publication

A probe that passes both the quality scoring and calibration deviation checks is promoted to ACTIVE status. Results from the next measurement run are published with warmup=false and enter the dataset normally. For the first 7 days after promotion, the probe's measurements carry a 0.8× confidence weight in the composite confidence score (rather than the 1.0× weight for established probes) while the quality history accumulates.
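The post-promotion weighting reduces to a single time comparison. A minimal sketch, assuming a hypothetical helper `probe_confidence_weight` and treating "the first 7 days" as strictly less than 7 × 24 hours after promotion:

```python
from datetime import datetime, timedelta, timezone

NEW_PROBE_WINDOW = timedelta(days=7)


def probe_confidence_weight(promoted_at: datetime, measured_at: datetime) -> float:
    """0.8 during the first 7 days after promotion, 1.0 afterwards."""
    if measured_at < promoted_at:
        raise ValueError("measurement predates promotion")
    return 0.8 if measured_at - promoted_at < NEW_PROBE_WINDOW else 1.0


promoted = datetime(2025, 5, 5, 9, 14, 22, tzinfo=timezone.utc)
early = probe_confidence_weight(promoted, promoted + timedelta(days=3))
later = probe_confidence_weight(promoted, promoted + timedelta(days=7))
```

How the boundary instant is handled is a design choice; this sketch gives a measurement taken exactly 7 days after promotion the full 1.0 weight.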

For country × ASN combinations where Voidly has no existing probe, the new probe is treated as a "pioneer" and flagged in the measurement metadata: pioneer_asn=true. Pioneer measurements are included in the dataset but require 30 days of consistent quality history before being included in CORROBORATED tier confidence calculations, because cross-source validation within the same ASN is not yet possible.
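The pioneer rule can be sketched as a flag plus a gate. The helper names below are hypothetical, and "consistent quality history" is reduced to a simple day count for illustration; the actual consistency criterion is not specified in this post.

```python
PIONEER_HISTORY_DAYS = 30


def measurement_metadata(existing_probes_in_bucket: int) -> dict:
    """Flag the first probe in a country x ASN combination as a pioneer."""
    return {"pioneer_asn": existing_probes_in_bucket == 0}


def passes_pioneer_gate(pioneer_asn: bool, consistent_quality_days: int) -> bool:
    """Whether the pioneer rule allows CORROBORATED-tier inclusion.

    Non-pioneer probes are unaffected by this particular rule; any other
    tier requirements are outside the scope of this sketch.
    """
    if not pioneer_asn:
        return True
    return consistent_quality_days >= PIONEER_HISTORY_DAYS
```

A pioneer with 12 days of history would still be published in the dataset with `pioneer_asn=true`, but would not yet count toward CORROBORATED confidence.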

