How Voidly measures HTTP and HTTPS censorship: the full protocol lifecycle from DNS through TLS to body comparison
Why layer-by-layer measurement matters
Censors don't operate at a single network layer. A government can block content via DNS poisoning, TCP RST injection, TLS SNI filtering, HTTP response substitution, or upstream BGP routing changes — and often uses different techniques for different domains or different ISPs. A measurement system that only checks “can I load this URL?” conflates all these failure modes into a single boolean and loses the forensic information needed to attribute censorship with confidence.
Voidly probes perform independent checks at each protocol layer and record the outcome of each. A probe test for a single domain produces up to seven observable signals: DNS resolution result, TCP connection status, TLS handshake result (with certificate details), HTTP status code, response body (first 4096 bytes), control comparison delta, and response timing. Each signal either corroborates or contradicts the others — and that multi-layer consistency check is what lets the anomaly classifier assign precise interference types rather than just flagging “something went wrong.”
The MeasurementTask struct
Each scheduled probe measurement begins as a MeasurementTask dispatched by the measurement scheduler. The task specifies which protocols to exercise for this domain:
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
pub struct MeasurementTask {
    pub domain: String,
    pub url: Option<String>,  // full URL if non-root path matters
    pub protocols: Vec<Protocol>,  // ["dns", "tcp", "tls", "https"]
    pub priority: u8,  // 0–10
    pub expected_duration_ms: u32,
    pub jitter_ms: u32,  // ±jitter for timing randomization
}

// Protocol is embedded in MeasurementTask, so it needs the same derives.
// rename_all keeps the serialized form in line with the lowercase names above.
#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum Protocol {
    Dns,  // DNS resolution only
    Tcp,  // TCP connect (no TLS)
    Tls,  // TLS handshake (no HTTP body)
    Http,  // Plain HTTP GET
    Https,  // HTTPS GET (TLS + HTTP)
}

Most domains in the test list use [Dns, Tcp, Tls, Https]. Plain Http is added if the domain historically served HTTP (pre-2020) or has a documented HTTP-layer blocking pattern where the block page differs between HTTP and HTTPS. TCP-only is used for specific port-blocking detection on non-HTTP services.
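The selection rule in the paragraph above can be sketched as a small function. This is only an illustration: `DomainProfile`, its field names, and `select_protocols` are hypothetical and do not come from the Voidly scheduler.

```rust
/// Protocol layers a probe can exercise (mirrors the enum in the article).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    Dns,
    Tcp,
    Tls,
    Http,
    Https,
}

/// Hypothetical per-domain metadata; these field names are illustrative only.
pub struct DomainProfile {
    pub served_http_before_2020: bool,
    pub http_block_page_differs: bool,
    pub non_http_service: bool,
}

/// Pick the protocol set for one domain, following the rules in the text.
pub fn select_protocols(profile: &DomainProfile) -> Vec<Protocol> {
    // Non-HTTP services only get a TCP connect for port-blocking detection.
    if profile.non_http_service {
        return vec![Protocol::Tcp];
    }
    // Default set used by most domains in the test list.
    let mut set = vec![Protocol::Dns, Protocol::Tcp, Protocol::Tls, Protocol::Https];
    // Plain HTTP is added only when there is a documented reason to test it.
    if profile.served_http_before_2020 || profile.http_block_page_differs {
        set.push(Protocol::Http);
    }
    set
}
```

Keeping the rule in one pure function makes it easy to audit why a given domain received a given protocol set.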
Layer 1: DNS resolution
The probe resolves the target domain using the system resolver (the ISP's resolver, not 8.8.8.8) and a known-unfiltered control resolver simultaneously. The comparison is the first censorship signal:
pub struct DnsResult {
    // Answers from the probe's ISP resolver
    pub probe_addrs: Vec<IpAddr>,
    pub probe_ttl: u32,
    pub probe_rcode: DnsRcode,  // NOERROR, NXDOMAIN, SERVFAIL, REFUSED
    pub probe_resolver: IpAddr,
    // Answers from the control resolver
    pub control_addrs: Vec<IpAddr>,
    pub control_rcode: DnsRcode,
    // Derived fields
    pub addr_match: bool,  // any probe addr in control addr set?
    pub is_bogon: bool,  // probe resolved to RFC1918/loopback/null?
    pub nxdomain_probe_not_control: bool,  // probe got NXDOMAIN, control got NOERROR
}

pub enum DnsTamperingSignal {
    None,
    BogonInjection,  // resolved to 127.0.0.1, 0.0.0.0, or RFC1918
    NxdomainInjection,  // NXDOMAIN where control sees NOERROR
    WrongAddress,  // non-bogon addr that differs from control (CDN-aware)
    ServfailRefused,  // SERVFAIL or REFUSED not seen at control
}

The addr_match check is CDN-aware. Major CDN providers (Cloudflare, Akamai, Fastly, Google) return different IPs for the same domain based on the resolver's geolocation. If the probe and control addrs differ but both resolve to the same AS (or a known CDN's AS set), addr_match is set to true to avoid false positives. The CDN AS allow-list covers approximately 180 ASNs including all major content delivery networks and large cloud providers.
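A minimal sketch of the CDN-aware comparison described above, under stated assumptions: the `AsnLookup` map stands in for whatever IP-to-ASN database the probe actually uses, and the CDN allow-list is treated as one flat set rather than being grouped per provider. The function name and signature are hypothetical.

```rust
use std::collections::{HashMap, HashSet};
use std::net::IpAddr;

/// Hypothetical IP-to-ASN table; a real probe would query a routing database.
pub type AsnLookup = HashMap<IpAddr, u32>;

/// Returns true when probe and control answers should be treated as matching.
pub fn cdn_aware_addr_match(
    probe_addrs: &[IpAddr],
    control_addrs: &[IpAddr],
    asn_of: &AsnLookup,
    cdn_asns: &HashSet<u32>,
) -> bool {
    // Case 1: at least one probe address also appears in the control answer.
    if probe_addrs.iter().any(|a| control_addrs.contains(a)) {
        return true;
    }
    // Collect the ASNs announced for the control answer.
    let control_asns: HashSet<u32> = control_addrs
        .iter()
        .filter_map(|a| asn_of.get(a).copied())
        .collect();
    // Simplification: any allow-listed ASN counts as "the same CDN".
    let control_on_cdn = control_asns.iter().any(|asn| cdn_asns.contains(asn));
    // Case 2: a probe address sits in the same AS as a control address, or
    // both sides sit inside the CDN allow-list.
    probe_addrs
        .iter()
        .filter_map(|a| asn_of.get(a).copied())
        .any(|asn| control_asns.contains(&asn) || (control_on_cdn && cdn_asns.contains(&asn)))
}
```

Addresses with no known ASN never match in this sketch, which errs on the side of flagging a difference for the later layers to confirm or refute.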
Layer 2: TCP connection
With the resolved IP address, the probe attempts a TCP connection to port 80 (HTTP) and/or 443 (HTTPS) with a 5-second timeout:
pub struct TcpResult {
    pub target_addr: SocketAddr,
    pub connected: bool,
    pub time_to_connect_ms: Option<u32>,
    pub error: Option<TcpError>,
    pub reset_received: bool,  // RST packet during or after handshake
}

pub enum TcpError {
    ConnectionRefused,  // ECONNREFUSED: endpoint closed
    TimedOut,  // no SYN-ACK within 5 seconds
    NetworkUnreachable,  // ENETUNREACH
    RstInjection,  // RST during SYN or after connection (middlebox)
}

RstInjection deserves special attention: a TCP RST received during or immediately after the SYN is a strong censorship signal. Legitimate servers normally close established connections with FIN; a server-originated RST in reply to a SYN usually means the port is closed, which surfaces as ConnectionRefused. When the probe sees RST from an IP that the control resolver confirms is a real server IP, it indicates that a middlebox between the probe and the server is synthesizing RST packets, a common censorship mechanism in Russia, Iran, and Kazakhstan.
RST timing is also analyzed. An RST arriving 10–30ms after SYN (faster than the round-trip to the actual server) strongly suggests injection: the real server hasn't had time to respond. Voidly records time_to_connect_ms and compares it against the expected RTT to the destination AS, flagging anomalously fast RSTs.
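The timing comparison can be sketched as follows. The 0.5 fraction is an illustrative assumption, not Voidly's actual threshold, and `rst_is_anomalously_fast` is a hypothetical helper name.

```rust
/// Assumed margin: an RST arriving in under half the expected RTT is suspicious.
/// This value is illustrative only.
const FAST_RST_FRACTION: f64 = 0.5;

/// Flags an RST that arrived too quickly to have come from the real server.
///
/// `rst_elapsed_ms` is the time from SYN to RST; `expected_rtt_ms` is the
/// baseline round-trip time to the destination AS.
pub fn rst_is_anomalously_fast(rst_elapsed_ms: u32, expected_rtt_ms: u32) -> bool {
    // Without a baseline there is nothing to compare against.
    if expected_rtt_ms == 0 {
        return false;
    }
    f64::from(rst_elapsed_ms) < f64::from(expected_rtt_ms) * FAST_RST_FRACTION
}
```

For example, an RST seen 12 ms after the SYN toward a server with a 180 ms expected RTT would be flagged, while one seen after 170 ms would not.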
Layer 3: TLS handshake
Once TCP is established, the probe performs a TLS handshake and captures the full certificate chain, the negotiated cipher suite, and any handshake errors:
pub struct TlsResult {
    pub handshake_complete: bool,
    pub negotiated_version: Option<TlsVersion>,  // TLS 1.2, 1.3
    pub cipher_suite: Option<u16>,
    pub sni_sent: String,  // the SNI value in ClientHello
    pub cert_chain: Vec<CertInfo>,
    pub cert_valid: bool,  // chain validates to a trusted root
    pub cert_matches_sni: bool,  // leaf cert CN or SAN matches the SNI
    pub handshake_error: Option<TlsError>,
    pub alert_received: Option<TlsAlert>,  // fatal alert from server
}

pub struct CertInfo {
    pub subject_cn: String,
    pub issuer: String,
    pub sha256_fingerprint: [u8; 32],
    pub not_before: i64,
    pub not_after: i64,
    pub san_domains: Vec<String>,
}

pub enum TlsError {
    HandshakeTimeout,
    CertVerificationFailed,  // could be MITM or expired cert
    AlertHandshakeFailure,  // server rejected ClientHello (SNI filtering)
    AlertUnrecognizedName,  // explicit SNI not found response
    ConnectionResetDuring,  // RST during TLS handshake
}

TLS-layer censorship takes two forms:
- SNI filtering: The middlebox reads the unencrypted Server Name Indication (SNI) field in the TLS ClientHello and drops or RSTs the connection. The probe sees AlertHandshakeFailure or ConnectionResetDuring while the control server completes the handshake successfully.
- TLS MITM: The middlebox terminates the TLS connection and presents its own certificate. The probe sees a cert chain that doesn't originate from the expected CA: cert_valid is false (self-signed or unrecognized CA), and the fingerprint doesn't match the control server's cert. This is the pattern used by certain ISPs in Russia for SNI-based filtering that also needs to inspect the HTTP content.
Certificate fingerprinting: the probe records the SHA-256 fingerprint of the leaf certificate. If the fingerprint matches the block page fingerprint library's TLS category (currently 340 known censorship-cert fingerprints), it sets blockpage_match = true even before the HTTP body is checked.
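The TLS-layer rules above can be combined into one decision function, sketched here under assumptions: `TlsObservation`, `TlsVerdict`, and `classify_tls` are hypothetical names, and the ordering of the checks is one reasonable reading of the text rather than Voidly's actual classifier logic.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TlsError {
    HandshakeTimeout,
    CertVerificationFailed,
    AlertHandshakeFailure,
    AlertUnrecognizedName,
    ConnectionResetDuring,
}

/// Hypothetical verdicts for the TLS layer.
#[derive(Debug, PartialEq, Eq)]
pub enum TlsVerdict {
    Clean,
    SniFiltering,
    Mitm,
    KnownCensorshipCert,
    Inconclusive,
}

/// Hypothetical flattened view of the probe and control TLS results.
pub struct TlsObservation {
    pub handshake_error: Option<TlsError>,
    pub cert_valid: bool,
    pub leaf_fingerprint: Option<[u8; 32]>,
    pub control_fingerprint: Option<[u8; 32]>,
    pub control_handshake_complete: bool,
}

pub fn classify_tls(obs: &TlsObservation, known_bad: &HashSet<[u8; 32]>) -> TlsVerdict {
    // A leaf cert already in the fingerprint library is the strongest signal.
    if let Some(fp) = obs.leaf_fingerprint {
        if known_bad.contains(&fp) {
            return TlsVerdict::KnownCensorshipCert;
        }
    }
    // MITM: untrusted chain whose fingerprint differs from the control's.
    let differs = match (obs.leaf_fingerprint, obs.control_fingerprint) {
        (Some(probe), Some(control)) => probe != control,
        _ => false,
    };
    if !obs.cert_valid && differs {
        return TlsVerdict::Mitm;
    }
    match obs.handshake_error {
        // SNI filtering: probe fails where the control handshake succeeds.
        Some(TlsError::AlertHandshakeFailure) | Some(TlsError::ConnectionResetDuring)
            if obs.control_handshake_complete =>
        {
            TlsVerdict::SniFiltering
        }
        Some(_) => TlsVerdict::Inconclusive,
        None => TlsVerdict::Clean,
    }
}
```

Checking the fingerprint library first means a known censorship certificate is reported as such even when the chain would also qualify as a generic MITM.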
Layer 4: HTTP/HTTPS request and response
With the TLS handshake complete (or directly for plain HTTP), the probe sends an HTTP GET request and captures the response:
pub struct HttpResult {
    pub status_code: Option<u16>,
    pub headers: HashMap<String, String>,
    pub body_first_4096: Vec<u8>,  // first 4096 bytes of response body
    pub body_sha256: [u8; 32],  // hash of complete response body
    pub body_length: Option<u64>,  // Content-Length or chunked total
    pub time_to_first_byte_ms: u32,
    pub total_time_ms: u32,
    // Block page detection
    pub blockpage_match: bool,
    pub blockpage_fp_id: Option<String>,
    pub blockpage_method: Option<BlockpageMethod>,
    // Control comparison
    pub body_matches_control: bool,
    pub status_matches_control: bool,
    pub header_diff: Vec<String>,  // headers present in control but not probe
}

// Populated against the control server's response to the same URL
pub struct ControlComparison {
    pub control_status: u16,
    pub control_body_sha256: [u8; 32],
    pub control_headers: HashMap<String, String>,
    pub control_tls_fingerprint: [u8; 32],
}

The blockpage_match field is populated by running the four-strategy fingerprint cascade from the block page library: SHA-256 exact hash first, then structural normalization, then SimHash locality-sensitive hashing, then TLS cert fingerprint (already checked in the TLS layer). If any match fires, blockpage_fp_id records which fingerprint matched and blockpage_method records how.
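The ordering of the cascade can be sketched as a first-match-wins lookup. This sketch assumes the digests are computed elsewhere; `PageDigests`, `FingerprintLibrary`, `match_page`, and the SimHash distance limit are hypothetical, and the variant names of `BlockpageMethod` are guesses at what the real enum contains.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockpageMethod {
    ExactHash,
    Structural,
    SimHash,
    TlsCert,
}

/// Hypothetical precomputed digests for one response.
pub struct PageDigests {
    pub body_sha256: [u8; 32],
    pub structural_hash: [u8; 32], // hash of the normalized page structure
    pub simhash: u64,
    pub tls_leaf_sha256: Option<[u8; 32]>,
}

/// Hypothetical in-memory view of the fingerprint library.
pub struct FingerprintLibrary {
    pub exact: HashMap<[u8; 32], String>,
    pub structural: HashMap<[u8; 32], String>,
    pub simhashes: Vec<(u64, String)>,
    pub tls_certs: HashMap<[u8; 32], String>,
    pub simhash_max_distance: u32, // assumed Hamming-distance cutoff
}

impl FingerprintLibrary {
    /// Runs the strategies in order and returns the first match.
    pub fn match_page(&self, d: &PageDigests) -> Option<(String, BlockpageMethod)> {
        if let Some(id) = self.exact.get(&d.body_sha256) {
            return Some((id.clone(), BlockpageMethod::ExactHash));
        }
        if let Some(id) = self.structural.get(&d.structural_hash) {
            return Some((id.clone(), BlockpageMethod::Structural));
        }
        // SimHash: near-duplicates differ in only a few bits.
        let near = self
            .simhashes
            .iter()
            .find(|(h, _)| (*h ^ d.simhash).count_ones() <= self.simhash_max_distance);
        if let Some((_, id)) = near {
            return Some((id.clone(), BlockpageMethod::SimHash));
        }
        let cert = d.tls_leaf_sha256?;
        self.tls_certs
            .get(&cert)
            .map(|id| (id.clone(), BlockpageMethod::TlsCert))
    }
}
```

Running the cheapest and most precise strategy first keeps the common case fast and makes the recorded method reflect the strongest evidence available.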
The control comparison
Every measurement is compared against a simultaneous request from the nearest control server. The control servers — one in US-East, one in EU-West, one in AP-East — are hosted on infrastructure that is known to be outside any censorship jurisdiction. The comparison:
// After the probe completes its layered test, the collector fetches
// the same URL from all three control servers and takes the consensus result.
//
// Consensus: majority vote on status_code and body_sha256.
// If all three controls agree: high confidence in the control value.
// If two agree: use the majority, flag low_control_confidence.
// If all differ: flag inconclusive_control (rare; typically server-side randomness).
//
// Note: tcp_connected and tls_complete are read from the control record below
// but are not listed in the ControlComparison struct shown earlier; they are
// assumed to be populated alongside the fields shown there.
pub fn compare_to_control(
    probe: &MeasurementResult,
    control: &ControlComparison,
) -> ControlDelta {
    ControlDelta {
        // The control resolver's answers live on DnsResult, not ControlComparison.
        dns_differs: probe.dns.probe_addrs != probe.dns.control_addrs
            && !probe.dns.addr_match,  // CDN-aware check
        tcp_failed_probe_not_control: !probe.tcp.connected
            && control.tcp_connected,
        tls_failed_probe_not_control: !probe.tls.handshake_complete
            && control.tls_complete,
        status_differs: probe.http.status_code != Some(control.control_status),
        body_differs: probe.http.body_sha256 != control.control_body_sha256,
        body_blockpage: probe.http.blockpage_match,
        control_failure: !control.tcp_connected,  // control also failed
    }
}

The control_failure flag is critical: if the control server also can't reach the target, the probe's failure is likely a server-side issue (downtime, misconfiguration) rather than censorship. The anomaly classifier heavily discounts measurements where control_failure == true.
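The three-way consensus described in the comments above can be sketched as a majority vote. As a simplification, this sketch votes on the (status, body hash) pair as one unit rather than on each field separately; `ControlObservation`, `Consensus`, and `control_consensus` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical summary of one control server's response.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ControlObservation {
    pub status_code: u16,
    pub body_sha256: [u8; 32],
}

#[derive(Debug, PartialEq, Eq)]
pub enum Consensus {
    /// Every control agreed: high confidence.
    Unanimous(ControlObservation),
    /// A strict majority agreed: use it, but flag low_control_confidence.
    Majority(ControlObservation),
    /// No strict majority: flag inconclusive_control.
    Inconclusive,
}

pub fn control_consensus(observations: &[ControlObservation]) -> Consensus {
    // Count how many controls returned each distinct observation.
    let mut counts: HashMap<&ControlObservation, usize> = HashMap::new();
    for obs in observations {
        *counts.entry(obs).or_insert(0) += 1;
    }
    // Take the most common observation and compare its count to the total.
    match counts.into_iter().max_by_key(|(_, n)| *n) {
        Some((obs, n)) if n == observations.len() => Consensus::Unanimous(obs.clone()),
        Some((obs, n)) if n * 2 > observations.len() => Consensus::Majority(obs.clone()),
        _ => Consensus::Inconclusive,
    }
}
```

With three controls, two matching responses yield `Majority`, and three different responses (or an empty input) yield `Inconclusive`.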
From protocol signals to interference type
The measurement result contains 47 features extracted from these protocol layers. The anomaly classifier runs five binary models and outputs the winning class:
| Interference type | Primary signals | Secondary signals |
|---|---|---|
| dns_tampering | bogon IP, NXDOMAIN not at control, wrong address | body differs, status differs (redirect to block page) |
| tls_interference | TLS handshake failure, MITM cert, cert mismatch | RST during TLS, SNI mismatch with expected CN |
| http_blocking | blockpage_match, status 200 with known block body | redirect to ISP block page URL (3xx) |
| bgp_withdrawal | TCP timeout, ICMP unreachable, all IPs unreachable | IODA BGP signal corroborates |
| throttling | time_to_first_byte elevated vs. control | body incomplete (connection closed mid-transfer) |
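Selecting a winning class from five per-class scores can be sketched as an argmax with a floor. This assumes each binary model emits a probability in [0, 1]; the `min_prob` cutoff and the `winning_class` helper are hypothetical and are not taken from the Voidly classifier.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InterferenceType {
    DnsTampering,
    TlsInterference,
    HttpBlocking,
    BgpWithdrawal,
    Throttling,
}

/// Returns the highest-scoring class, or None if no score reaches `min_prob`.
pub fn winning_class(
    scores: &[(InterferenceType, f32)],
    min_prob: f32,
) -> Option<(InterferenceType, f32)> {
    scores
        .iter()
        .copied()
        // Ignore NaN scores so they can never win the comparison.
        .filter(|(_, p)| !p.is_nan())
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(Ordering::Equal))
        // A weak best score means "no interference detected".
        .filter(|(_, p)| *p >= min_prob)
}
```

Returning `None` below the cutoff maps naturally onto the `Option<InterferenceType>` field of the measurement record.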
Timing as a signal
Response timing is an underused censorship signal. The probe records timestamps at each layer transition. The following abnormal timing patterns indicate throttling or deep packet inspection overhead:
- DNS latency spike: DNS response time 10× the historical baseline for that ASN suggests the resolver is contacting a censorship decision system before returning the (potentially poisoned) response.
- TCP connection anomaly: Connection established significantly faster than the expected RTT to the server's AS (suggests local RST injection) or significantly slower (throttling or firewall state inspection).
- Body truncation after a normal TTFB: time_to_first_byte_ms is normal but the connection drops before the body completes. This is the fingerprint of TCP RST injection after the HTTP request is parsed but before the full response arrives. Common in Russia for certain streaming services.
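Two of the timing checks above reduce to small calculations, sketched here under assumptions: the baseline mean and standard deviation are supplied by the caller, and both helper names are hypothetical.

```rust
/// Standard score of an observed latency against a historical baseline.
/// Returns 0.0 when the baseline has no spread, to avoid dividing by zero.
pub fn latency_zscore(observed_ms: u32, baseline_mean_ms: f32, baseline_std_ms: f32) -> f32 {
    if baseline_std_ms <= 0.0 {
        return 0.0;
    }
    (observed_ms as f32 - baseline_mean_ms) / baseline_std_ms
}

/// True when the server announced a body length and fewer bytes arrived.
/// Responses without a declared length cannot be judged this way.
pub fn body_truncated(declared_length: Option<u64>, bytes_received: u64) -> bool {
    matches!(declared_length, Some(expected) if bytes_received < expected)
}
```

A z-score expresses "how unusual is this latency for this network" on one scale, so a single threshold can be reused across ASNs with very different typical latencies.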
// Timing features used by the classifier
pub struct TimingFeatures {
    // Absolute measurements
    pub dns_latency_ms: u32,
    pub tcp_connect_ms: u32,
    pub tls_handshake_ms: u32,
    pub ttfb_ms: u32,
    pub body_transfer_ms: u32,
    // Normalized against 90-day historical baseline for this (asn, domain) pair
    pub dns_latency_zscore: f32,
    pub tcp_connect_zscore: f32,
    pub ttfb_zscore: f32,
    // Derived
    pub connection_faster_than_expected_rtt: bool,
    pub body_truncated: bool,
    pub rst_during_body: bool,
}

The full MeasurementResult record
All layers combine into a single MeasurementResult that is the unit of storage in the dataset and the input to the anomaly classifier:
pub struct MeasurementResult {
    // Identity
    pub measurement_id: Uuid,
    pub probe_id: String,  // anonymized
    pub country_code: String,
    pub asn: String,
    pub measurement_start: DateTime<Utc>,
    // What was tested
    pub domain: String,
    pub url: String,
    pub category_code: String,
    pub protocols: Vec<Protocol>,
    // Per-layer results
    pub dns: DnsResult,
    pub tcp: TcpResult,
    pub tls: TlsResult,
    pub http: HttpResult,
    pub timing: TimingFeatures,
    // Control comparison
    pub control: ControlComparison,
    pub control_delta: ControlDelta,
    pub control_failure: bool,
    // Classifier output (populated post-measurement)
    pub interference_type: Option<InterferenceType>,
    pub interference_prob: f32,
    pub confidence_tier: ConfidenceTier,
    // Cross-source corroboration (populated async)
    pub ooni_corroborated: bool,
    pub cp_corroborated: bool,
    pub ioda_corroborated: bool,
    pub corroboration_score: f32,
    // Block page detection
    pub blockpage_match: bool,
    pub blockpage_fp_id: Option<String>,
}

Protocol coverage statistics
For the Voidly test list of 80 domains, measured from 37+ probe nodes, the protocol breakdown of measurements as of October 2025 is:
| Protocol set | % of measurements | Typical use case |
|---|---|---|
| DNS + TCP + TLS + HTTPS | 71% | Standard HTTPS-only sites (Twitter, BBC, Wikipedia) |
| DNS + TCP + TLS + HTTPS + HTTP | 19% | Sites with documented HTTP blocking history |
| DNS + TCP + HTTPS (no standalone TLS check) | 6% | Sites where the TLS cert is not useful (Cloudflare shared) |
| DNS only | 4% | Fast cadence domain availability checks |
Related technical articles:
For how the distributed control server network generates the ControlComparison struct and distinguishes censorship from CDN split-horizon DNS: The Voidly control server: how we tell censorship from a bad network →
For how the probe's measurement results are consumed and classified by the anomaly classifier: The Voidly anomaly classifier: five interference classes, gradient boosted trees, and why we optimize for recall →
For how the block page fingerprint library matches the body_sha256 and blockpage_fp_id fields: Voidly's block page fingerprint library: detecting censorship signatures across 2,300+ known pages →
For how the probe application that runs these tests is implemented (Tauri 2, boringtun, tun-rs): The Voidly Probe: Tauri + boringtun network measurement at the operator's edge →
For the full field-by-field schema of the MeasurementResult as it appears in the CC BY 4.0 dataset: The Voidly measurement dataset: field-by-field schema reference →
For the Rust async engine that drives these layered protocol checks — tokio concurrency, per-layer timeouts, and Ed25519 measurement signing: The Voidly probe test runner: concurrent measurement orchestration with Rust and tokio →
For the TCP layer beneath this HTTP measurement — RST injection classification, null-routing detection, connect-time delta for transparent proxies, and the 15ms injection threshold: Voidly's TCP measurement layer: RST injection detection, null-routing, and connection timing analysis →