
How Voidly measures HTTP and HTTPS censorship: the full protocol lifecycle from DNS through TLS to body comparison

November 1, 2025 · AI Analytics

Tags: Censorship, Voidly, Methodology, Infrastructure

Why layer-by-layer measurement matters

Censors don't operate at a single network layer. A government can block content via DNS poisoning, TCP RST injection, TLS SNI filtering, HTTP response substitution, or upstream BGP routing changes — and often uses different techniques for different domains or different ISPs. A measurement system that only checks “can I load this URL?” conflates all these failure modes into a single boolean and loses the forensic information needed to attribute censorship with confidence.

Voidly probes perform independent checks at each protocol layer and record the outcome of each. A probe test for a single domain produces up to seven observable signals: DNS resolution result, TCP connection status, TLS handshake result (with certificate details), HTTP status code, response body (first 4096 bytes), control comparison delta, and response timing. Each signal either corroborates or contradicts the others — and that multi-layer consistency check is what lets the anomaly classifier assign precise interference types rather than just flagging “something went wrong.”

The MeasurementTask struct

Each scheduled probe measurement begins as a MeasurementTask dispatched by the measurement scheduler. The task specifies which protocols to exercise for this domain:

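use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
pub struct MeasurementTask {
    pub domain: String,
    pub url: Option<String>,         // full URL if non-root path matters
    pub protocols: Vec<Protocol>,    // serialized as ["dns", "tcp", "tls", "https"]
    pub priority: u8,                // 0–10
    pub expected_duration_ms: u32,
    pub jitter_ms: u32,              // ±jitter for timing randomization
}

// Protocol must derive Serialize/Deserialize too, because MeasurementTask does;
// rename_all produces the lowercase names shown in the comment above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum Protocol {
    Dns,    // DNS resolution only
    Tcp,    // TCP connect (no TLS)
    Tls,    // TLS handshake (no HTTP body)
    Http,   // Plain HTTP GET
    Https,  // HTTPS GET (TLS + HTTP)
}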

Most domains in the test list use [Dns, Tcp, Tls, Https]. Plain Http is added if the domain historically served HTTP (pre-2020) or has a documented HTTP-layer blocking pattern where the block page differs between HTTP and HTTPS. TCP-only is used for specific port-blocking detection on non-HTTP services.
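The selection rules in the paragraph above can be sketched as a small function. This is a hypothetical illustration: the `DomainProfile` struct and its flag names are assumptions standing in for whatever test-list metadata Voidly actually keeps; only the rules themselves come from the text.

```rust
// Hypothetical sketch of per-domain protocol selection; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    Dns,
    Tcp,
    Tls,
    Http,
    Https,
}

/// Assumed test-list metadata for one domain.
pub struct DomainProfile {
    pub served_http_pre_2020: bool,          // historically served plain HTTP
    pub http_block_differs_from_https: bool, // documented HTTP-layer block page
    pub non_http_port_check: bool,           // port-blocking test on a non-HTTP service
}

pub fn select_protocols(p: &DomainProfile) -> Vec<Protocol> {
    if p.non_http_port_check {
        // Port-blocking detection on non-HTTP services: TCP connect only.
        return vec![Protocol::Tcp];
    }
    // Default set used by most domains.
    let mut set = vec![Protocol::Dns, Protocol::Tcp, Protocol::Tls, Protocol::Https];
    // Plain HTTP is added for historical HTTP sites or documented HTTP-layer blocking.
    if p.served_http_pre_2020 || p.http_block_differs_from_https {
        set.push(Protocol::Http);
    }
    set
}
```

Keeping the default set first and appending `Http` means the HTTPS path is always exercised, with plain HTTP as an extra comparison point.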

Layer 1: DNS resolution

The probe resolves the target domain using the system resolver (the ISP's resolver, not 8.8.8.8) and a known-unfiltered control resolver simultaneously. The comparison is the first censorship signal:

use std::net::IpAddr;

// Response codes the probe distinguishes
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DnsRcode {
    NoError,
    NxDomain,
    ServFail,
    Refused,
}

pub struct DnsResult {
    // Answers from the probe's ISP resolver
    pub probe_addrs: Vec<IpAddr>,
    pub probe_ttl: u32,
    pub probe_rcode: DnsRcode,       // NOERROR, NXDOMAIN, SERVFAIL, REFUSED
    pub probe_resolver: IpAddr,

    // Answers from the control resolver
    pub control_addrs: Vec<IpAddr>,
    pub control_rcode: DnsRcode,

    // Derived fields
    pub addr_match: bool,            // addrs overlap, or share an AS / known CDN AS set (CDN-aware)
    pub is_bogon: bool,              // probe resolved to RFC 1918, loopback, or 0.0.0.0?
    pub nxdomain_probe_not_control: bool, // probe got NXDOMAIN, control got NOERROR
}

pub enum DnsTamperingSignal {
    None,
    BogonInjection,        // resolved to 127.0.0.1, 0.0.0.0, or RFC1918
    NxdomainInjection,     // NXDOMAIN where control sees NOERROR
    WrongAddress,          // non-bogon addr that differs from control (CDN-aware)
    ServfailRefused,       // SERVFAIL or REFUSED not seen at control
}
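One way the DnsResult fields could map onto a DnsTamperingSignal is sketched below. It restates the two enums so the block compiles on its own; the order in which the checks run (rcode checks first, then bogons, then address mismatch) is an assumption, and `addr_match` is taken as already computed by the CDN-aware comparison.

```rust
use std::net::IpAddr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DnsRcode { NoError, NxDomain, ServFail, Refused }

#[derive(Debug, PartialEq, Eq)]
pub enum DnsTamperingSignal { None, BogonInjection, NxdomainInjection, WrongAddress, ServfailRefused }

/// Bogon test matching the enum comment: loopback, unspecified (0.0.0.0), or RFC 1918.
pub fn is_bogon(addr: &IpAddr) -> bool {
    match addr {
        IpAddr::V4(v4) => v4.is_loopback() || v4.is_unspecified() || v4.is_private(),
        IpAddr::V6(v6) => v6.is_loopback() || v6.is_unspecified(),
    }
}

/// Sketch: derive a tampering signal; check precedence is an assumption.
pub fn classify_dns(
    probe_rcode: DnsRcode,
    control_rcode: DnsRcode,
    probe_addrs: &[IpAddr],
    addr_match: bool, // CDN-aware result, computed elsewhere
) -> DnsTamperingSignal {
    if probe_rcode == DnsRcode::NxDomain && control_rcode == DnsRcode::NoError {
        return DnsTamperingSignal::NxdomainInjection;
    }
    let probe_failed = matches!(probe_rcode, DnsRcode::ServFail | DnsRcode::Refused);
    let control_failed = matches!(control_rcode, DnsRcode::ServFail | DnsRcode::Refused);
    if probe_failed && !control_failed {
        return DnsTamperingSignal::ServfailRefused;
    }
    if probe_addrs.iter().any(is_bogon) {
        return DnsTamperingSignal::BogonInjection;
    }
    if !probe_addrs.is_empty() && !addr_match {
        return DnsTamperingSignal::WrongAddress;
    }
    DnsTamperingSignal::None
}
```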

The addr_match check is CDN-aware. Major CDN providers (Cloudflare, Akamai, Fastly, Google) return different IPs for the same domain depending on the resolver's location. If the probe and control addresses differ but belong to the same AS (or to the same known CDN's set of ASes), addr_match is set to true to avoid false positives. The CDN AS allow-list covers approximately 180 ASNs, including all major content delivery networks and large cloud providers.
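A minimal sketch of the CDN-aware comparison, assuming two lookup tables that are hypothetical here: `asn_of` maps an IP to its origin ASN (a real system would use a routing-table snapshot), and `cdn_of` maps an allow-listed ASN to a provider name, so that two ASNs operated by the same CDN count as a match.

```rust
use std::collections::{HashMap, HashSet};
use std::net::IpAddr;

fn asns(addrs: &[IpAddr], asn_of: &HashMap<IpAddr, u32>) -> HashSet<u32> {
    addrs.iter().filter_map(|a| asn_of.get(a).copied()).collect()
}

fn providers(asns: &HashSet<u32>, cdn_of: &HashMap<u32, &'static str>) -> HashSet<&'static str> {
    asns.iter().filter_map(|a| cdn_of.get(a).copied()).collect()
}

/// Sketch of addr_match: exact overlap, then same AS, then same CDN provider.
pub fn addr_match(
    probe: &[IpAddr],
    control: &[IpAddr],
    asn_of: &HashMap<IpAddr, u32>,
    cdn_of: &HashMap<u32, &'static str>,
) -> bool {
    // 1. Any probe address also returned by the control resolver.
    if probe.iter().any(|a| control.contains(a)) {
        return true;
    }
    // 2. Different addresses, same origin AS.
    let (p_asns, c_asns) = (asns(probe, asn_of), asns(control, asn_of));
    if !p_asns.is_disjoint(&c_asns) {
        return true;
    }
    // 3. Different ASes operated by the same allow-listed CDN.
    !providers(&p_asns, cdn_of).is_disjoint(&providers(&c_asns, cdn_of))
}
```

Requiring the *same* provider in step 3, rather than any two allow-listed ASNs, keeps a Cloudflare answer on one side and an Akamai answer on the other from silently matching.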

Layer 2: TCP connection

With the resolved IP address, the probe attempts a TCP connection to port 80 (HTTP) and/or 443 (HTTPS) with a 5-second timeout:

use std::net::SocketAddr;

pub struct TcpResult {
    pub target_addr: SocketAddr,
    pub connected: bool,
    pub time_to_connect_ms: Option<u32>,
    pub error: Option<TcpError>,
    pub reset_received: bool,   // RST packet during or after handshake
}

pub enum TcpError {
    ConnectionRefused,  // ECONNREFUSED: endpoint closed
    TimedOut,           // no SYN-ACK within 5 seconds
    NetworkUnreachable, // ENETUNREACH
    RstInjection,       // RST during SYN or after connection (middlebox)
}
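The connect step with a 5-second timeout can be sketched with the standard library's `TcpStream::connect_timeout`. This is an illustration, not the probe's code: at the socket API level an injected RST and a genuine one both surface as `ConnectionRefused`, so telling `RstInjection` apart needs packet-level timing (as the next paragraphs describe). The `TcpOutcome` enum and its `Other` variant are assumptions made for this sketch.

```rust
use std::io::ErrorKind;
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum TcpOutcome {
    Connected { time_to_connect_ms: u32 },
    Refused,       // ECONNREFUSED: an RST answered the SYN (real or injected)
    TimedOut,      // no SYN-ACK before the deadline
    Other(String), // any other OS error, kept as text in this sketch
}

pub fn tcp_probe(target: SocketAddr, timeout: Duration) -> TcpOutcome {
    let start = Instant::now();
    match TcpStream::connect_timeout(&target, timeout) {
        // The stream is dropped (and the connection closed) at the end of this arm.
        Ok(_stream) => TcpOutcome::Connected {
            time_to_connect_ms: start.elapsed().as_millis() as u32,
        },
        Err(e) if e.kind() == ErrorKind::ConnectionRefused => TcpOutcome::Refused,
        Err(e) if e.kind() == ErrorKind::TimedOut => TcpOutcome::TimedOut,
        Err(e) => TcpOutcome::Other(e.to_string()),
    }
}
```

A call such as `tcp_probe(addr, Duration::from_secs(5))` mirrors the 5-second timeout in the text.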

RstInjection deserves special attention: a TCP RST received in response to the SYN, or immediately after the handshake completes, is a strong censorship signal. Servers normally close established connections with FIN, and a port that the control can reach should not answer the probe with RST. When the probe sees an RST from an IP that the control resolver confirms is a real server address, the most likely explanation is a middlebox between the probe and the server synthesizing RST packets. This is a common censorship mechanism in Russia, Iran, and Kazakhstan.

RST timing is also analyzed. An RST arriving 10–30 ms after the SYN, when the round trip to the actual server takes longer than that, strongly suggests injection: the real server has not had time to respond. Voidly records time_to_connect_ms and compares it against the expected RTT to the destination AS, flagging anomalously fast RSTs.
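The timing test reduces to comparing the RST arrival time with the expected round trip. A minimal sketch, assuming `expected_rtt_ms` comes from historical RTT to the destination AS; the 0.5 fraction is an illustrative threshold, not a documented Voidly constant.

```rust
/// Illustrative threshold: an RST earlier than this fraction of the expected RTT
/// is treated as too fast to have come from the real server.
pub const EARLY_RST_FRACTION: f64 = 0.5;

/// Sketch: flag an RST that could not have made a full round trip to the server.
/// `ip_confirmed_by_control` means the control resolver vouches for the address.
pub fn rst_looks_injected(rst_after_syn_ms: u32, expected_rtt_ms: u32, ip_confirmed_by_control: bool) -> bool {
    // A genuine RST from the server needs at least one round trip to reach the probe.
    ip_confirmed_by_control
        && (rst_after_syn_ms as f64) < EARLY_RST_FRACTION * expected_rtt_ms as f64
}
```

For example, an RST 20 ms after the SYN to a server with a 180 ms expected RTT is flagged, while the same 20 ms against a 30 ms RTT is not.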

Layer 3: TLS handshake

Once TCP is established, the probe performs a TLS handshake and captures the full certificate chain, the negotiated cipher suite, and any handshake errors:

pub struct TlsResult {
    pub handshake_complete: bool,
    pub negotiated_version: Option<TlsVersion>,   // TLS 1.2, 1.3
    pub cipher_suite: Option<u16>,
    pub sni_sent: String,                          // the SNI value in ClientHello

    pub cert_chain: Vec<CertInfo>,
    pub cert_valid: bool,       // chain validates to a trusted root
    pub cert_matches_sni: bool, // leaf cert CN or SAN matches the SNI

    pub handshake_error: Option<TlsError>,
    pub alert_received: Option<TlsAlert>, // fatal alert from server
}

pub struct CertInfo {
    pub subject_cn: String,
    pub issuer: String,
    pub sha256_fingerprint: [u8; 32],
    pub not_before: i64,
    pub not_after: i64,
    pub san_domains: Vec<String>,
}

pub enum TlsError {
    HandshakeTimeout,
    CertVerificationFailed,   // could be MITM or expired cert
    AlertHandshakeFailure,    // server rejected ClientHello (SNI filtering)
    AlertUnrecognizedName,    // explicit SNI not found response
    ConnectionResetDuring,    // RST during TLS handshake
}

TLS-layer censorship takes two forms:

SNI filtering: The middlebox reads the unencrypted Server Name Indication (SNI) field in the TLS ClientHello and drops the connection or injects an RST. The probe sees AlertHandshakeFailure or ConnectionResetDuring, while the control completes the handshake with the same server successfully.
TLS MITM: The middlebox terminates the TLS connection and presents its own certificate. The probe sees a cert chain that doesn't originate from the expected CA — cert_valid is false (self-signed or unrecognized CA), and the fingerprint doesn't match the control server's cert. This is the pattern used by certain ISPs in Russia for SNI-based filtering that also needs to inspect the HTTP content.
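The two TLS patterns above can be told apart with a handful of TlsResult fields plus the control's outcome. The sketch below restates `TlsError` locally; `TlsVerdict` and the decision order (control check, then MITM, then SNI errors) are assumptions for illustration.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum TlsError {
    HandshakeTimeout,
    CertVerificationFailed,
    AlertHandshakeFailure,
    AlertUnrecognizedName,
    ConnectionResetDuring,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TlsVerdict { NoInterference, SniFiltering, Mitm, Inconclusive }

pub fn classify_tls(
    handshake_complete: bool,
    error: Option<&TlsError>,
    cert_valid: bool,
    leaf_fp: Option<[u8; 32]>,
    control_complete: bool,
    control_leaf_fp: Option<[u8; 32]>,
) -> TlsVerdict {
    // If the control also failed, the server is the likelier culprit.
    if !control_complete {
        return TlsVerdict::Inconclusive;
    }
    // MITM: a cert was presented, it does not validate, and it differs from the control's.
    if let Some(fp) = leaf_fp {
        if !cert_valid && Some(fp) != control_leaf_fp {
            return TlsVerdict::Mitm;
        }
    }
    // SNI filtering: handshake killed by a failure alert or an RST.
    if !handshake_complete {
        return match error {
            Some(TlsError::AlertHandshakeFailure) | Some(TlsError::ConnectionResetDuring) => {
                TlsVerdict::SniFiltering
            }
            _ => TlsVerdict::Inconclusive,
        };
    }
    TlsVerdict::NoInterference
}
```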

Certificate fingerprinting: the probe records the SHA-256 fingerprint of the leaf certificate. If the fingerprint matches the block page fingerprint library's TLS category (currently 340 known censorship-cert fingerprints), it sets blockpage_match = true even before the HTTP body is checked.
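The fingerprint check is a set lookup on the leaf certificate's SHA-256. A minimal sketch, assuming the TLS category of the library is shipped as hex strings; `CensorCertLibrary` and `parse_fp` are hypothetical names, and computing the SHA-256 itself is left to the TLS stack (std has no SHA-256).

```rust
use std::collections::HashSet;

/// Parse a 64-character hex SHA-256 fingerprint into 32 bytes.
pub fn parse_fp(hex: &str) -> Option<[u8; 32]> {
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, pair) in hex.as_bytes().chunks(2).enumerate() {
        let s = std::str::from_utf8(pair).ok()?;
        out[i] = u8::from_str_radix(s, 16).ok()?;
    }
    Some(out)
}

/// Hypothetical in-memory TLS category of the block page fingerprint library.
pub struct CensorCertLibrary {
    fps: HashSet<[u8; 32]>,
}

impl CensorCertLibrary {
    /// Build from hex entries; malformed entries are skipped.
    pub fn from_hex<'a>(entries: impl IntoIterator<Item = &'a str>) -> Self {
        CensorCertLibrary { fps: entries.into_iter().filter_map(parse_fp).collect() }
    }

    /// True if the leaf certificate is a known censorship certificate,
    /// i.e. the probe can set blockpage_match before reading any HTTP body.
    pub fn matches(&self, leaf_sha256: &[u8; 32]) -> bool {
        self.fps.contains(leaf_sha256)
    }
}
```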

Layer 4: HTTP/HTTPS request and response

With the TLS handshake complete (or directly for plain HTTP), the probe sends an HTTP GET request and captures the response:

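use std::collections::HashMap;

// The four strategies of the block page fingerprint cascade
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockpageMethod {
    ExactHash,   // SHA-256 exact match
    Structural,  // structural normalization
    SimHash,     // locality-sensitive hashing
    TlsCert,     // TLS certificate fingerprint
}

pub struct HttpResult {
    pub status_code: Option<u16>,
    pub headers: HashMap<String, String>,
    pub body_first_4096: Vec<u8>,           // first 4096 bytes of response body
    pub body_sha256: [u8; 32],              // hash of complete response body
    pub body_length: Option<u64>,           // Content-Length or chunked total
    pub time_to_first_byte_ms: u32,
    pub total_time_ms: u32,

    // Block page detection
    pub blockpage_match: bool,
    pub blockpage_fp_id: Option<String>,
    pub blockpage_method: Option<BlockpageMethod>,

    // Control comparison
    pub body_matches_control: bool,
    pub status_matches_control: bool,
    pub header_diff: Vec<String>,          // headers present in control but not probe
}

// Populated from the control servers' responses to the same URL
pub struct ControlComparison {
    pub control_status: u16,
    pub control_body_sha256: [u8; 32],
    pub control_headers: HashMap<String, String>,
    pub control_tls_fingerprint: [u8; 32],
    pub tcp_connected: bool,               // control completed the TCP handshake
    pub tls_complete: bool,                // control completed the TLS handshake
}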
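HttpResult keeps only the first 4096 bytes of the body but hashes and measures the whole thing, which implies a streaming read. A minimal sketch of that loop over any `std::io::Read`; `capture_body` is a hypothetical name, and the SHA-256 update that would sit inside the loop is omitted because std has no SHA-256.

```rust
use std::io::{self, Read};

pub const BODY_PREFIX_LEN: usize = 4096;

/// Read a whole response body, keeping the first 4096 bytes and counting all bytes.
/// Returns (prefix, total_length). A SHA-256 hasher would be fed each chunk here too.
pub fn capture_body<R: Read>(mut body: R) -> io::Result<(Vec<u8>, u64)> {
    let mut prefix = Vec::with_capacity(BODY_PREFIX_LEN);
    let mut total: u64 = 0;
    let mut buf = [0u8; 8192];
    loop {
        let n = match body.read(&mut buf) {
            Ok(0) => break, // end of body
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e), // e.g. connection reset mid-transfer
        };
        total += n as u64;
        // Copy only as much as still fits in the 4096-byte prefix.
        let room = BODY_PREFIX_LEN - prefix.len();
        prefix.extend_from_slice(&buf[..n.min(room)]);
    }
    Ok((prefix, total))
}
```

Returning the error on a mid-transfer failure, rather than a short body, keeps truncation visible to the timing features that look for it.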

The blockpage_match field is populated by running the four-strategy fingerprint cascade from the block page library — SHA-256 exact hash first, then structural normalization, then SimHash locality-sensitive hashing, then TLS cert fingerprint (already checked in the TLS layer). If any match fires, blockpage_fp_id records which fingerprint matched and blockpage_method records how.
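The cascade order described above (exact, structural, SimHash, TLS cert; first hit wins) can be sketched as follows. This is an illustration, not the block page library's code: std's `DefaultHasher` stands in for SHA-256 (it is neither cryptographic nor stable across Rust releases, so such fingerprints must not be persisted), the normalization (lowercase, drop digits, collapse whitespace) and the 64-bit SimHash over whitespace tokens are assumptions, and `tls_match` is a hypothetical hand-off of the TLS-layer result.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockpageMethod { ExactHash, Structural, SimHash, TlsCert }

fn hash64<T: Hash + ?Sized>(v: &T) -> u64 {
    let mut h = DefaultHasher::new();
    v.hash(&mut h);
    h.finish()
}

/// Assumed structural normalization: lowercase, drop digits, collapse whitespace.
fn normalize(body: &str) -> String {
    body.to_lowercase()
        .split_whitespace()
        .map(|w| w.chars().filter(|c| !c.is_ascii_digit()).collect::<String>())
        .filter(|w| !w.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}

/// 64-bit SimHash: each bit is the majority vote of that bit across token hashes.
fn simhash(body: &str) -> u64 {
    let mut acc = [0i32; 64];
    for tok in normalize(body).split(' ') {
        let h = hash64(tok);
        for (bit, slot) in acc.iter_mut().enumerate() {
            *slot += if (h >> bit) & 1 == 1 { 1 } else { -1 };
        }
    }
    acc.iter().enumerate().fold(0u64, |out, (bit, &v)| if v > 0 { out | (1u64 << bit) } else { out })
}

pub struct Fingerprint { exact: u64, structural: u64, sim: u64 }

impl Fingerprint {
    pub fn of(body: &str) -> Self {
        Fingerprint { exact: hash64(body), structural: hash64(&normalize(body)), sim: simhash(body) }
    }
}

/// Returns (library index, method) for the first strategy that fires.
pub fn cascade(body: &str, library: &[Fingerprint], tls_match: Option<usize>, max_hamming: u32)
    -> Option<(usize, BlockpageMethod)> {
    let fp = Fingerprint::of(body);
    if let Some(i) = library.iter().position(|l| l.exact == fp.exact) {
        return Some((i, BlockpageMethod::ExactHash));
    }
    if let Some(i) = library.iter().position(|l| l.structural == fp.structural) {
        return Some((i, BlockpageMethod::Structural));
    }
    if let Some(i) = library.iter().position(|l| (l.sim ^ fp.sim).count_ones() <= max_hamming) {
        return Some((i, BlockpageMethod::SimHash));
    }
    // Last resort: reuse the TLS-layer certificate match, if any.
    tls_match.map(|i| (i, BlockpageMethod::TlsCert))
}
```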

The control comparison

Every measurement is compared against requests for the same URL made at the same time from Voidly's control servers: one in US-East, one in EU-West, and one in AP-East, all hosted on infrastructure outside the censorship jurisdictions being measured. The comparison:

// After the probe completes its layered test, the collector fetches
// the same URL from all three control servers and takes the consensus result.
//
// Consensus: majority vote on status_code and body_sha256.
// If all three controls agree: high confidence in the control value.
// If two agree: use the majority, flag low_control_confidence.
// If all differ: flag inconclusive_control (rare — typically server-side randomness).

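// Per-layer differences between the probe and the control consensus
#[derive(Debug)]
pub struct ControlDelta {
    pub dns_differs: bool,
    pub tcp_failed_probe_not_control: bool,
    pub tls_failed_probe_not_control: bool,
    pub status_differs: bool,
    pub body_differs: bool,
    pub body_blockpage: bool,
    pub control_failure: bool,
}

pub fn compare_to_control(
    probe: &MeasurementResult,
    control: &ControlComparison,
) -> ControlDelta {
    ControlDelta {
        // Control-resolver answers live in DnsResult, not in ControlComparison.
        dns_differs: probe.dns.probe_addrs != probe.dns.control_addrs
                     && !probe.dns.addr_match,  // CDN-aware check
        tcp_failed_probe_not_control: !probe.tcp.connected
                                      && control.tcp_connected,
        tls_failed_probe_not_control: !probe.tls.handshake_complete
                                      && control.tls_complete,
        status_differs: probe.http.status_code != Some(control.control_status),
        body_differs: probe.http.body_sha256 != control.control_body_sha256,
        body_blockpage: probe.http.blockpage_match,
        control_failure: !control.tcp_connected,  // control also failed
    }
}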

The control_failure flag is critical: if the control server also can't reach the target, the probe's failure is likely a server-side issue (downtime, misconfiguration) rather than censorship. The anomaly classifier heavily discounts measurements where control_failure == true.
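The consensus rule from the code comments above (majority vote on status_code and body_sha256 across three controls) can be sketched as below. Voting on the (status, body hash) pair jointly, rather than on each field separately, is an assumption of this sketch, as are the type names; controls that fail to respond at all are not modeled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ControlObservation {
    pub status: u16,
    pub body_sha256: [u8; 32],
}

#[derive(Debug, PartialEq, Eq)]
pub enum ControlConsensus {
    High(ControlObservation), // all three controls agree
    Low(ControlObservation),  // two agree: low_control_confidence
    Inconclusive,             // all differ: inconclusive_control
}

pub fn control_consensus(obs: [ControlObservation; 3]) -> ControlConsensus {
    let [a, b, c] = obs;
    if a == b && b == c {
        ControlConsensus::High(a)
    } else if a == b || a == c {
        ControlConsensus::Low(a)
    } else if b == c {
        ControlConsensus::Low(b)
    } else {
        ControlConsensus::Inconclusive
    }
}
```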

From protocol signals to interference type

Each measurement result yields 47 features extracted from these protocol layers. The anomaly classifier runs five binary models, one per interference type in the table below, and outputs the highest-scoring class:

Interference type	Primary signals	Secondary signals
`dns_tampering`	bogon IP, NXDOMAIN not at control, wrong address	body differs, status differs (redirect to block page)
`tls_interference`	TLS handshake failure, MITM cert, cert mismatch	RST during TLS, SNI mismatch with expected CN
`http_blocking`	blockpage_match, status 200 with known block body	redirect to ISP block page URL (3xx)
`bgp_withdrawal`	TCP timeout, ICMP unreachable, all IPs unreachable	IODA BGP signal corroborates
`throttling`	time_to_first_byte elevated vs. control	body incomplete (connection closed mid-transfer)
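The step from five binary model scores to one label, including the heavy discount applied when control_failure is set, can be sketched as an argmax with an abstain threshold. Both `min_prob` and `control_failure_discount` are hypothetical parameters; the text does not give their values or say that a threshold exists.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InterferenceType { DnsTampering, TlsInterference, HttpBlocking, BgpWithdrawal, Throttling }

/// Sketch: pick the highest-scoring of five one-vs-rest probabilities,
/// discount it if the control also failed, and abstain below `min_prob`.
pub fn pick_interference(
    scores: &[(InterferenceType, f32); 5],
    control_failure: bool,
    min_prob: f32,                 // hypothetical abstain threshold
    control_failure_discount: f32, // hypothetical multiplier in [0, 1]
) -> Option<(InterferenceType, f32)> {
    let (best, mut p) = scores
        .iter()
        .copied()
        .fold(scores[0], |acc, s| if s.1 > acc.1 { s } else { acc });
    if control_failure {
        p *= control_failure_discount;
    }
    if p >= min_prob { Some((best, p)) } else { None }
}
```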

Timing as a signal

Response timing is an underused censorship signal. The probe records timestamps at each layer transition, and three abnormal timing patterns point to throttling or deep packet inspection overhead:

DNS latency spike: DNS response time 10× the historical baseline for that ASN suggests the resolver is contacting a censorship decision system before returning the (potentially poisoned) response.
TCP connection anomaly: Connection established significantly faster than the expected RTT to the server's AS (suggests local RST injection) or significantly slower (throttling or firewall state inspection).
Body truncation after a normal TTFB: time_to_first_byte_ms is normal but the connection drops before the body completes. This is the fingerprint of TCP RST injection after the HTTP request is parsed but before the full response arrives, and it is common in Russia for certain streaming services.

// Timing features used by the classifier
pub struct TimingFeatures {
    // Absolute measurements
    pub dns_latency_ms: u32,
    pub tcp_connect_ms: u32,
    pub tls_handshake_ms: u32,
    pub ttfb_ms: u32,
    pub body_transfer_ms: u32,

    // Normalized against 90-day historical baseline for this (asn, domain) pair
    pub dns_latency_zscore: f32,
    pub tcp_connect_zscore: f32,
    pub ttfb_zscore: f32,

    // Derived
    pub connection_faster_than_expected_rtt: bool,
    pub body_truncated: bool,
    pub rst_during_body: bool,
}
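The z-score fields compare one measurement against the 90-day baseline for the same (asn, domain) pair. A minimal sketch, assuming the baseline is available as a slice of past latencies in milliseconds and that "10× the historical baseline" means 10× the baseline mean (a median would be an equally plausible reading).

```rust
/// Mean and population standard deviation of a baseline sample.
fn mean_std(xs: &[f64]) -> Option<(f64, f64)> {
    if xs.is_empty() {
        return None;
    }
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some((mean, var.sqrt()))
}

/// z = (x - mean) / std against the baseline; None if the baseline is
/// empty or has zero spread (the z-score is undefined then).
pub fn zscore(x_ms: u32, baseline_ms: &[f64]) -> Option<f32> {
    let (mean, std) = mean_std(baseline_ms)?;
    if std == 0.0 {
        return None;
    }
    Some(((x_ms as f64 - mean) / std) as f32)
}

/// The "DNS latency 10x the historical baseline" rule, using the baseline mean.
pub fn dns_latency_spike(x_ms: u32, baseline_ms: &[f64]) -> bool {
    mean_std(baseline_ms).map_or(false, |(mean, _)| x_ms as f64 >= 10.0 * mean)
}
```

With a baseline of [10, 20, 30] ms the mean is 20 ms, so a 40 ms lookup scores z ≈ 2.45 and only a lookup of 200 ms or more trips the 10× rule.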

The full MeasurementResult record

All layers combine into a single MeasurementResult that is the unit of storage in the dataset and the input to the anomaly classifier:

use chrono::{DateTime, Utc};
use uuid::Uuid;

pub struct MeasurementResult {
    // Identity
    pub measurement_id: Uuid,
    pub probe_id: String,       // anonymized
    pub country_code: String,
    pub asn: String,
    pub measurement_start: DateTime<Utc>,

    // What was tested
    pub domain: String,
    pub url: String,
    pub category_code: String,
    pub protocols: Vec<Protocol>,

    // Per-layer results
    pub dns: DnsResult,
    pub tcp: TcpResult,
    pub tls: TlsResult,
    pub http: HttpResult,
    pub timing: TimingFeatures,

    // Control comparison
    pub control: ControlComparison,
    pub control_delta: ControlDelta,
    pub control_failure: bool,

    // Classifier output (populated post-measurement)
    pub interference_type: Option<InterferenceType>,
    pub interference_prob: f32,
    pub confidence_tier: ConfidenceTier,

    // Cross-source corroboration (populated async)
    pub ooni_corroborated: bool,
    pub cp_corroborated: bool,
    pub ioda_corroborated: bool,
    pub corroboration_score: f32,

    // Block page detection
    pub blockpage_match: bool,
    pub blockpage_fp_id: Option<String>,
}

Protocol coverage statistics

For the Voidly test list of 80 domains, measured from 37+ probe nodes, the protocol-set breakdown as of October 2025 is:

Protocol set	% of measurements	Typical use case
DNS + TCP + TLS + HTTPS	71%	Standard HTTPS-only sites (Twitter, BBC, Wikipedia)
DNS + TCP + TLS + HTTPS + HTTP	19%	Sites with documented HTTP blocking history
DNS + TCP + HTTPS (no standalone TLS step)	6%	Sites where the TLS certificate is not a useful signal (shared Cloudflare infrastructure)
DNS only	4%	Fast cadence domain availability checks