Voidly measurement protocol stack: composing DNS, TCP, TLS, and HTTP layers into a ProbeResult

AI Analytics
Voidly · Network measurement · Protocol stack · Probe infrastructure

Censorship detection requires measuring a domain at multiple protocol layers in the same pass, because each layer's result is ambiguous on its own. A DNS failure can explain why a domain is unreachable, but by itself it cannot distinguish NXDOMAIN injection from a legitimate DNS error. A TCP connection failure to the resolved IP can indicate IP-level blocking, but not if the resolved IP itself was altered. A TLS certificate mismatch can indicate SSL interception, but not if the origin serves the wrong certificate to every client. HTTP body differences are the strongest indicator of content filtering, but only if all lower layers succeeded first.

Voidly conducts all four measurements in a single pass and combines them into a ProbeResult struct that carries the outcome of each layer, their relative timings, and the comparison against a control measurement taken from a Voidly-controlled vantage point outside the probe's network. This article documents the layer execution model, the failure propagation semantics, and the partial-result handling for measurements where lower layers fail before higher layers can execute.

ProbeResult structure

// src/measurement/probe_result.rs

use std::net::IpAddr;

#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct ProbeResult {
    pub measurement_id:   String,   // ulid
    pub ts:               String,   // ISO 8601
    pub probe_id:         String,   // anonymized
    pub country_code:     String,
    pub asn:              u32,
    pub domain:           String,
    pub test_type:        String,   // always "web_connectivity" for full-stack runs

    /// DNS layer (always attempted)
    pub dns: DnsResult,

    /// TCP layer (attempted iff dns.resolved_ips is non-empty)
    pub tcp: Option<TcpResult>,

    /// TLS layer (attempted iff tcp.connected is true)
    pub tls: Option<TlsResult>,

    /// HTTP layer (attempted iff tls.handshake_ok is true, or tcp.connected for HTTP-only)
    pub http: Option<HttpResult>,

    /// Control measurement from Voidly vantage (fetched in parallel with probe measurement)
    pub control: ControlResult,

    /// Classifier output (populated by ingestion pipeline, not by probe)
    pub censor_prob:          Option<f32>,
    pub classifier_version:   Option<String>,
}

#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct DnsResult {
    pub resolver_ip:      String,
    pub resolved_ips:     Vec<IpAddr>,
    pub nxdomain:         bool,
    pub timeout:          bool,
    pub rtt_ms:           f32,
    pub failure:          Option<String>,   // error string if DNS failed
}

#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct TcpResult {
    pub target_ip:    IpAddr,   // first resolved IP used
    pub target_port:  u16,
    pub connected:    bool,
    pub rtt_ms:       f32,
    pub failure:      Option<String>,
}

#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct TlsResult {
    pub handshake_ok:        bool,     // gates the HTTP layer in the runner
    pub cert_valid:          bool,
    pub cert_subject_cn:     String,
    pub cert_issuer:         String,
    pub cert_not_after:      String,
    pub negotiated_protocol: String,   // "TLSv1.3" | "TLSv1.2"
    pub handshake_rtt_ms:    f32,
    pub failure:             Option<String>,
}

#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct HttpResult {
    pub status_code:     u16,
    pub body_length:     u32,
    pub body_sha256:     String,   // hex
    pub headers:         Vec<(String, String)>,
    pub response_rtt_ms: f32,
    pub failure:         Option<String>,
}

#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct ControlResult {
    pub resolver_ips:    Vec<IpAddr>,       // IPs the control resolver returned for the domain
    pub http_status:     Option<u16>,
    pub body_sha256:     Option<String>,    // hex, None if the control fetch failed
    pub body_length:     Option<u32>,
    pub rtt_ms:          f32,
    pub failure:         Option<String>,
}

Layer execution model

The four measurement layers execute sequentially within the probe, with the control measurement running in parallel on a separate async task. Sequential execution is deliberate: the output of each layer is the input to the next (the DNS-resolved IP is the TCP connect target; the IP and port that TCP connected to are reused for the TLS handshake), and the relative timing between layers is itself a measurement signal.

// src/measurement/runner.rs

pub async fn measure_domain(
    domain: &str,
    config: &ResolvedConfig,
    probe_meta: &ProbeMeta,
) -> ProbeResult {
    let measurement_id = ulid::Ulid::new().to_string();
    let ts = chrono::Utc::now().to_rfc3339();

    // Control measurement: runs in parallel with the probe measurement.
    // Uses Voidly's control resolver (8.8.8.8 + a known-clean vantage HTTP endpoint).
    let control_task = tokio::spawn(fetch_control(domain.to_string(), config.http_timeout_ms));

    // Layer 1: DNS
    let dns = measure_dns(domain, config.dns_timeout_ms).await;

    // Layer 2: TCP (only if DNS returned at least one IP)
    let tcp = if !dns.resolved_ips.is_empty() && !dns.nxdomain {
        let target_ip = dns.resolved_ips[0];
        let port = if domain_is_https(domain) { 443u16 } else { 80u16 };
        Some(measure_tcp(target_ip, port, config.http_timeout_ms).await)
    } else {
        None  // DNS failed: skip TCP, TLS, HTTP
    };

    // Layer 3: TLS (only if TCP connected and domain uses HTTPS)
    let tls = match &tcp {
        Some(t) if t.connected && domain_is_https(domain) => {
            Some(measure_tls(domain, t.target_ip, t.target_port, config.tls_timeout_ms).await)
        }
        _ => None,
    };

    // Layer 4: HTTP (only if TLS succeeded, or TCP connected for plain HTTP)
    let http = match (&tcp, &tls) {
        (Some(t), Some(s)) if t.connected && s.handshake_ok => {
            Some(measure_http(domain, t.target_ip, config.http_timeout_ms, true).await)
        }
        (Some(t), None) if t.connected && !domain_is_https(domain) => {
            Some(measure_http(domain, t.target_ip, config.http_timeout_ms, false).await)
        }
        _ => None,
    };

    let control = control_task.await.unwrap_or_else(|_| ControlResult::error("task_panic"));

    ProbeResult {
        measurement_id,
        ts,
        probe_id:           probe_meta.anonymized_id.clone(),
        country_code:       probe_meta.country_code.clone(),
        asn:                probe_meta.asn,
        domain:             domain.to_string(),
        test_type:          "web_connectivity".to_string(),
        dns, tcp, tls, http, control,
        censor_prob:        None,  // populated by ingestion pipeline
        classifier_version: None,
    }
}
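
The article does not specify which timing features are derived from a ProbeResult, so the following is only a minimal sketch of how the per-layer RTT fields might be summarized. The `LayerTimings` type and both of its methods are hypothetical names introduced for illustration; they are not part of the structs above.

```rust
/// Per-layer round-trip times in milliseconds.
/// `None` means the layer was not attempted (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LayerTimings {
    pub dns_ms: f32,
    pub tcp_ms: Option<f32>,
    pub tls_ms: Option<f32>,
    pub http_ms: Option<f32>,
}

impl LayerTimings {
    /// Total time across every layer that actually ran.
    pub fn total_ms(&self) -> f32 {
        self.dns_ms
            + self.tcp_ms.unwrap_or(0.0)
            + self.tls_ms.unwrap_or(0.0)
            + self.http_ms.unwrap_or(0.0)
    }

    /// TLS handshake time relative to TCP connect time.
    /// Returns `None` unless both layers ran and the TCP time is positive.
    pub fn tls_to_tcp_ratio(&self) -> Option<f32> {
        match (self.tcp_ms, self.tls_ms) {
            (Some(tcp), Some(tls)) if tcp > 0.0 => Some(tls / tcp),
            _ => None,
        }
    }
}
```

Keeping skipped layers as `None` rather than zero preserves the same "not attempted" distinction that the Option fields of ProbeResult carry.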

Failure propagation semantics

When a lower layer fails, higher layers are recorded as None rather than as a failed TcpResult / TlsResult / HttpResult. This is a deliberate schema design choice: None means “not attempted because a prerequisite failed”, while Some(result) with result.failure = Some(error) means “attempted but failed”. The classifier feature extraction pipeline uses this distinction to set the appropriate feature flags:

| State | tcp | tls | http | Likely censorship type |
|---|---|---|---|---|
| DNS NXDOMAIN + control resolves | None | None | None | DNS injection / NXDOMAIN blocking |
| DNS resolves to different IP than control | attempted | attempted | attempted | DNS redirect / blockpage IP |
| DNS resolves, TCP refused/timeout | Some(failed) | None | None | IP-level blocking |
| TCP connects, TLS cert mismatch | Some(ok) | Some(invalid cert) | None | SSL interception / MITM |
| TLS ok, HTTP body differs from control | Some(ok) | Some(ok) | Some(diff body) | Content filtering / blockpage |
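
The rows of the table above can be read as a first-match rule list. The sketch below shows one way to encode it, assuming a simplified view of a measurement: `LayerState`, `Observation`, `LikelyCause`, and `likely_cause` are hypothetical names for illustration, and the real classifier outputs a probability (censor_prob) rather than a single label.

```rust
/// Tri-state for a layer: `None` maps to NotAttempted, `Some` to Failed or Ok.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LayerState { NotAttempted, Failed, Ok }

/// Collapse an optional success flag into the tri-state.
pub fn layer_state(succeeded: Option<bool>) -> LayerState {
    match succeeded {
        None => LayerState::NotAttempted,
        Some(false) => LayerState::Failed,
        Some(true) => LayerState::Ok,
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LikelyCause {
    DnsInjection, DnsRedirect, IpBlocking, TlsInterception, ContentFiltering, NoAnomaly,
}

/// Simplified, hypothetical summary of one probe/control pair.
#[derive(Debug, Clone, Copy)]
pub struct Observation {
    pub nxdomain: bool,
    pub control_resolves: bool,
    pub ips_match_control: bool,
    pub tcp: LayerState,
    pub tls: LayerState,
    pub http: LayerState,
    pub body_matches_control: bool,
}

/// First-match walk over the table rows, top to bottom.
pub fn likely_cause(o: &Observation) -> LikelyCause {
    if o.nxdomain && o.control_resolves {
        return LikelyCause::DnsInjection;
    }
    if !o.nxdomain && !o.ips_match_control {
        return LikelyCause::DnsRedirect;
    }
    if o.tcp == LayerState::Failed {
        return LikelyCause::IpBlocking;
    }
    if o.tcp == LayerState::Ok && o.tls == LayerState::Failed {
        return LikelyCause::TlsInterception;
    }
    if o.http == LayerState::Ok && !o.body_matches_control {
        return LikelyCause::ContentFiltering;
    }
    LikelyCause::NoAnomaly
}
```

Treating any IP mismatch as a redirect is a simplification of this sketch: geographically distributed hosting can return different IPs to different resolvers without any interference.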

The “DNS resolves to different IP than control” case is the most complex: the probe successfully traverses all four layers, but the content it retrieves may be a blockpage served from the injected IP. This case is detected by the body_sha256 comparison between the probe's HttpResult and the ControlResult.
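
A minimal sketch of that comparison follows, assuming the probe and control IP lists and the hex digests have already been extracted from the ProbeResult. The function names are hypothetical, and requiring both a disjoint IP set and a differing body is an assumption of this sketch rather than the documented rule.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// True when both sides resolved something and share no address.
pub fn ips_disjoint(probe: &[IpAddr], control: &[IpAddr]) -> bool {
    if probe.is_empty() || control.is_empty() {
        return false; // nothing to compare
    }
    let control_set: HashSet<IpAddr> = control.iter().copied().collect();
    !probe.iter().any(|ip| control_set.contains(ip))
}

/// Compare hex SHA-256 digests case-insensitively.
/// Returns `None` when the control has no body hash to compare against.
pub fn body_differs(probe_sha256: &str, control_sha256: Option<&str>) -> Option<bool> {
    control_sha256.map(|c| !c.eq_ignore_ascii_case(probe_sha256))
}

/// Flags the redirect case: different IPs and a different body.
pub fn looks_like_dns_redirect(
    probe_ips: &[IpAddr],
    control_ips: &[IpAddr],
    probe_sha256: &str,
    control_sha256: Option<&str>,
) -> bool {
    ips_disjoint(probe_ips, control_ips)
        && body_differs(probe_sha256, control_sha256) == Some(true)
}
```

Returning `None` from `body_differs` when the control fetch failed keeps "cannot compare" separate from "bodies match", in the same spirit as the None versus Some(failed) distinction in the schema.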

Control comparison protocol

The control measurement uses Google's public DNS resolver (8.8.8.8) for DNS resolution and a Voidly-operated HTTP proxy in a low-censorship jurisdiction for the HTTP fetch. Control vantage points are spread across five geographic locations, with one selected per domain, to reduce the risk that the control itself sits behind a network that censors some subset of domains:

// src/measurement/control.rs

const CONTROL_VANTAGE_ENDPOINTS: &[&str] = &[
    "https://ctrl-eu1.voidly.net/fetch",   // Frankfurt
    "https://ctrl-us1.voidly.net/fetch",   // Virginia
    "https://ctrl-ap1.voidly.net/fetch",   // Singapore
    "https://ctrl-sa1.voidly.net/fetch",   // São Paulo
    "https://ctrl-au1.voidly.net/fetch",   // Sydney
];

pub async fn fetch_control(domain: String, timeout_ms: u32) -> ControlResult {
    // Select vantage deterministically from domain hash to ensure the same
    // vantage is used for repeated measurements of the same domain, making
    // body_sha256 comparison stable across measurement cycles.
    // Reduce in u64 before narrowing so the index is identical on 32-bit and 64-bit targets.
    let idx = (domain_hash_u64(&domain) % CONTROL_VANTAGE_ENDPOINTS.len() as u64) as usize;
    let endpoint = CONTROL_VANTAGE_ENDPOINTS[idx];

    let url = format!("{}?domain={}", endpoint, urlencoding::encode(&domain));
    // ... HTTP request to control proxy ...
}

Deterministic per-domain vantage selection keeps body hash comparisons stable: repeated measurements of a domain are always compared against the same vantage, so content that varies by vantage location does not show up as a spurious difference between measurement cycles. Content that varies on each render, such as dynamic advertising, is handled separately: the control proxy pre-normalizes it before hashing.
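
The body of domain_hash_u64 is not shown in this article. The sketch below assumes 64-bit FNV-1a over the lowercased domain, chosen here only because its output is fixed by its definition; the standard library's DefaultHasher does not guarantee a stable algorithm across Rust releases. `vantage_index` is a hypothetical helper name.

```rust
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a over the lowercased domain (assumed algorithm, see above).
pub fn domain_hash_u64(domain: &str) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for byte in domain.to_ascii_lowercase().bytes() {
        hash ^= u64::from(byte); // XOR first, then multiply: the "1a" variant
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Map a domain to a vantage slot; the same domain always gets the same slot.
pub fn vantage_index(domain: &str, vantage_count: usize) -> usize {
    assert!(vantage_count > 0, "at least one vantage is required");
    (domain_hash_u64(domain) % vantage_count as u64) as usize
}
```

Lowercasing before hashing is a choice of this sketch so that `Example.com` and `example.com` land on the same vantage. Note that any modulo-based scheme remaps most domains when the number of vantages changes, which would reset the comparison baseline for those domains.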

Related writing

Voidly DNS measurement covers the DNS layer implementation in detail: the resolver selection logic, NXDOMAIN injection detection, and TTL-based consistency checks.

Voidly blockpage fingerprints describes how HTTP bodies that differ from the control are classified into known blockpage templates from specific ISPs and regulatory authorities.