
Voidly middlebox detection: transparent proxies, TCP injection points, and TSPU vendor signatures

AI Analytics
Voidly · Network measurement · Middlebox detection · DPI · Censorship

Most censorship measurement platforms treat the network path between probe and destination as a black box: they observe the outcome (blocked or accessible) without characterizing the mechanism. Knowing which type of middlebox is interposed is operationally important because different middlebox architectures require different circumvention approaches, produce different observable signatures in measurement data, and implicate different vendors whose equipment can be linked to procurement records and supply chain analysis.

Voidly's middlebox detection module runs as an optional secondary measurement alongside the standard web-connectivity stack. It uses three detection techniques: an HTTP echo test for transparent proxies, TCP RST injection timing analysis for in-path DPI appliances, and a signature library of 47 fingerprints. 41 of those fingerprints are attributed to a specific system or vendor, including TSPU, Sandvine, Huawei Hi-SEC, and Cisco IronPort; the remaining 6 are not yet attributed.

HTTP echo test for transparent proxies

A transparent HTTP proxy modifies request or response headers without the client's knowledge. Voidly detects such proxies by sending an HTTP/1.1 request with a randomly generated X-Voidly-Echo-{nonce} header to a Voidly-controlled echo server. It then checks whether the server received the header verbatim and whether the response was modified in transit:

// src/middlebox/echo_test.rs

use std::time::Duration;

use rand::Rng;
use serde_json::{Map, Value};

#[derive(Debug, Default)]
pub struct EchoTestResult {
    pub echo_header_received:  bool,   // server confirmed it received our custom header
    pub via_header_present:    bool,   // proxy added Via: header to request
    pub x_forwarded_for:       Option<String>,  // XFF header added by proxy
    pub response_modified:     bool,   // response body differs from expected echo
    pub injected_headers:      Vec<(String, String)>,  // headers not in our original request
    pub proxy_ip:              Option<String>,  // IP that actually connected to the echo server
    pub probe_ip:              Option<String>,  // our actual egress IP (from echo response)
    pub middlebox_detected:    bool,
}

pub async fn run_echo_test(
    echo_server_url: &str,
    timeout_ms: u32,
) -> EchoTestResult {
    let nonce: u64 = rand::thread_rng().gen();
    let echo_header = format!("X-Voidly-Echo-{nonce}");
    let echo_value  = format!("voidly-{}", rand::thread_rng().gen::<u32>());
    // HTTP header names are case-insensitive and may arrive lowercased,
    // so every lookup below uses lowercase keys.
    let echo_key = echo_header.to_ascii_lowercase();

    // echo_server_url is deliberately a plain http:// URL (http://echo.voidly.net):
    // this test targets proxies that rewrite cleartext HTTP, not TLS interception.
    let Ok(client) = reqwest::Client::builder()
        .timeout(Duration::from_millis(u64::from(timeout_ms)))
        .build()
    else {
        return EchoTestResult::default();
    };

    let response = client.get(echo_server_url)
        .header(echo_header.as_str(), echo_value.as_str())
        .header("User-Agent", "VoidlyProbe/2.1")
        .send()
        .await;

    // A failed request is not treated as evidence of a middlebox.
    let Ok(resp) = response else {
        return EchoTestResult::default();
    };

    // The echo server replies with JSON of the form (needs reqwest's "json" feature):
    // { "received_headers": {...}, "connecting_ip": "x.x.x.x", "egress_ip": "y.y.y.y" }
    let body: Value = resp.json().await.unwrap_or_default();

    // Copy the received headers into a map keyed by lowercase header name.
    let received: Map<String, Value> = body["received_headers"]
        .as_object()
        .map(|m| m.iter().map(|(k, v)| (k.to_ascii_lowercase(), v.clone())).collect())
        .unwrap_or_default();

    let echo_received = received.get(&echo_key)
        .and_then(Value::as_str)
        .map_or(false, |v| v == echo_value);

    let via_present = received.contains_key("via");
    let xff = received.get("x-forwarded-for")
        .and_then(Value::as_str)
        .map(str::to_string);

    let proxy_ip = body["connecting_ip"].as_str().map(str::to_string);
    let probe_ip = body["egress_ip"].as_str().map(str::to_string);

    // Headers the probe itself sends; reqwest adds `accept: */*` by default.
    let expected = ["user-agent", "host", "accept", "accept-encoding", echo_key.as_str()];
    let injected: Vec<(String, String)> = received.iter()
        .filter(|(k, _)| !expected.contains(&k.as_str()))
        .map(|(k, v)| (k.clone(), v.as_str().unwrap_or("").to_string()))
        .collect();

    // Compare IPs only when the echo server reported both of them.
    let ip_mismatch = matches!((&proxy_ip, &probe_ip), (Some(a), Some(b)) if a != b);

    let middlebox_detected = via_present
        || xff.is_some()
        || !injected.is_empty()
        || ip_mismatch;

    EchoTestResult {
        echo_header_received: echo_received,
        via_header_present:   via_present,
        x_forwarded_for:      xff,
        response_modified:    false,   // populated by caller comparing body hash
        injected_headers:     injected,
        proxy_ip, probe_ip,
        middlebox_detected,
    }
}

The echo test uses plain HTTP (not HTTPS) intentionally. A transparent HTTPS proxy would require intercepting TLS, which is detected by the TLS certificate validity check in the main measurement stack. The echo test targets the complementary case: HTTP-only transparent proxies that do not intercept TLS but do modify cleartext HTTP traffic.
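The probe code above relies on the echo server reporting the headers it received and the address that connected to it. The sketch below shows one way the server side could build that JSON from a raw HTTP/1.1 request using only the standard library. The function names `echo_json` and `json_escape` are hypothetical, and the `egress_ip` field is omitted because the article does not say how the server determines it.

```rust
use std::net::SocketAddr;

/// Escape a string for use inside a JSON string literal.
pub fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical echo-server handler: parse the header block of a raw HTTP/1.1
/// request and build the JSON body the probe expects. Header names are reported
/// exactly as they arrived, so case rewriting by a proxy stays visible.
/// Repeated header names produce repeated JSON keys; a real server would merge them.
pub fn echo_json(raw_request: &str, peer: SocketAddr) -> String {
    let mut fields = Vec::new();
    // Skip the request line and stop at the blank line that ends the headers.
    for line in raw_request.split("\r\n").skip(1) {
        if line.is_empty() {
            break;
        }
        if let Some((name, value)) = line.split_once(':') {
            fields.push(format!(
                "\"{}\":\"{}\"",
                json_escape(name.trim()),
                json_escape(value.trim())
            ));
        }
    }
    format!(
        "{{\"received_headers\":{{{}}},\"connecting_ip\":\"{}\"}}",
        fields.join(","),
        peer.ip()
    )
}
```

Because `connecting_ip` comes from the TCP peer address rather than from any header, a transparent proxy that opens its own upstream connection shows up as a different address from the probe's egress IP.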

TCP RST injection timing analysis

In-path DPI appliances frequently inject TCP RST packets to terminate connections to blocked destinations. These injected RSTs have a distinctive timing signature: they arrive milliseconds after the request but before the server has had time to respond, and the RST's TTL is typically different from the TTL of legitimate server packets (because the DPI appliance is at a different hop count from the probe than the destination server is):

// src/middlebox/rst_analysis.rs

#[derive(Debug, Clone)]
pub struct RstEvent {
    pub arrived_ms_after_request: f32,
    pub ip_ttl:                   u8,
    pub tcp_window:               u16,
    pub tcp_options:              Vec<u8>,
}

#[derive(Debug, Clone)]
pub struct RstAnalysisResult {
    pub rst_received:       bool,
    pub likely_injected:    bool,
    pub confidence:         f32,   // 0.0 – 1.0
    pub rst_event:          Option<RstEvent>,
    pub server_ttl_sample:  Option<u8>,   // TTL of server ACK packet before RST
}

/// Heuristics for identifying injected vs. legitimate RSTs.
/// Returns confidence score: 0 = definitely server-sent, 1 = definitely injected.
pub fn analyze_rst(rst: &RstEvent, server_ttl: Option<u8>) -> f32 {
    let mut score = 0.0_f32;

    // Injected RSTs arrive very quickly (< 50ms) because the DPI appliance
    // is in-path but doesn't need to wait for the server.
    if rst.arrived_ms_after_request < 50.0 {
        score += 0.40;
    } else if rst.arrived_ms_after_request < 150.0 {
        score += 0.15;
    }

    // Injected RSTs have a TTL inconsistent with the server's TTL.
    // Common initial TTLs: 64 (Linux), 128 (Windows), 255 (Cisco IOS). The observed
    // TTL is the initial value minus the hop count, so a large gap suggests a different sender.
    // TSPU appliances inject with TTL 64 regardless of server OS.
    if let Some(srv_ttl) = server_ttl {
        let ttl_diff = rst.ip_ttl.abs_diff(srv_ttl);
        if ttl_diff > 10 {
            score += 0.30;
        }
    }

    // Injected RSTs often have TCP window = 0 (the appliance doesn't track window state).
    if rst.tcp_window == 0 {
        score += 0.20;
    }

    // Injected RSTs typically lack TCP options (no SACK, no timestamps)
    // while legitimate server RSTs retain the session's negotiated options.
    if rst.tcp_options.is_empty() {
        score += 0.10;
    }

    score.min(1.0)
}
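`analyze_rst` assumes that the TTL, TCP window, and TCP options have already been extracted from the captured RST. Below is a minimal sketch of that extraction from a raw IPv4 packet, such as one delivered by a raw socket or by a pcap capture with the link-layer header stripped. The names `RstFields` and `parse_ipv4_rst` are hypothetical, and IPv6 is not handled.

```rust
/// Header fields needed to build an `RstEvent` (arrival timing is measured separately).
#[derive(Debug, PartialEq)]
pub struct RstFields {
    pub ip_ttl: u8,
    pub tcp_window: u16,
    pub tcp_options: Vec<u8>,
}

const TCP_FLAG_RST: u8 = 0x04;

/// Parse a raw IPv4 packet. Returns the TTL, TCP window, and TCP option bytes
/// only if the packet is a well-formed TCP segment with the RST flag set.
pub fn parse_ipv4_rst(pkt: &[u8]) -> Option<RstFields> {
    // IPv4: version is the high nibble of byte 0; IHL (header length) is the low nibble, in 32-bit words.
    if pkt.len() < 20 || pkt[0] >> 4 != 4 {
        return None;
    }
    let ihl = usize::from(pkt[0] & 0x0f) * 4;
    if ihl < 20 || pkt[9] != 6 {
        return None; // malformed header, or not TCP (IP protocol 6)
    }
    let ttl = pkt[8];
    let tcp = pkt.get(ihl..)?;
    if tcp.len() < 20 {
        return None;
    }
    // TCP data offset (header length) is the high nibble of byte 12, in 32-bit words.
    let data_off = usize::from(tcp[12] >> 4) * 4;
    if data_off < 20 || tcp.len() < data_off {
        return None;
    }
    if tcp[13] & TCP_FLAG_RST == 0 {
        return None;
    }
    Some(RstFields {
        ip_ttl: ttl,
        tcp_window: u16::from_be_bytes([tcp[14], tcp[15]]),
        tcp_options: tcp[20..data_off].to_vec(), // empty when data offset is 5 words
    })
}
```

The `arrived_ms_after_request` field still has to be computed from the capture timestamp relative to when the request was sent. This sketch does not cover that step.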

TSPU vendor signature library

The Technical Means of Counteracting Threats (TSPU) deep packet inspection system deployed by Russian ISPs has been documented extensively through technical analysis of blockpage content, RST packet signatures, and BGP routing anomalies. Voidly's signature library contains 47 fingerprints: 12 for TSPU and 35 for other vendors and unattributed deployments:

| Vendor | Fingerprints | Countries confirmed | Primary detection method |
|---|---|---|---|
| TSPU (Echelon) | 12 | Russia | RST TTL=64, TCP window=0, blockpage X-header |
| Sandvine (PTS) | 9 | Belarus, Ethiopia, Uganda, Egypt | HTTP 302 to blockpage domain, Via: Sandvine header |
| Huawei Hi-SEC | 8 | Iran, Pakistan, Cuba | RST timing < 20 ms, SPKI fingerprint |
| GFW (China) | 7 | China | Dual-direction RST, forged DNS answers with known poison IPs |
| Cisco IronPort | 5 | Saudi Arabia, UAE | HTTP 403 with X-Squid-Error, Via: Cisco header |
| Other / unidentified | 6 | 26 countries | Mixed; awaiting vendor confirmation |

Each vendor signature is encoded as a MiddleboxSignature struct with weighted components. The scoring model is identical to the OSINT attribution pipeline described in the censorship attribution article, using RST timing, blockpage body hash, injection IP WHOIS, and TLS certificate SPKI as independent evidence axes.
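The article does not list the fields of `MiddleboxSignature`, so the code below is only a sketch of how weighted, independent evidence axes could be combined into a single score. The field names, the use of SHA-256 hex strings, and the scoring rule (matched weight divided by the total weight of the axes the signature defines) are all assumptions for illustration. They are not Voidly's actual model.

```rust
/// Evidence extracted from one measurement. Every field is optional because
/// not every measurement exercises every axis.
#[derive(Debug, Default)]
pub struct Observation {
    pub rst_ms_after_request: Option<f32>,
    pub blockpage_body_sha256: Option<String>,
    pub injector_whois_org: Option<String>,
    pub tls_spki_sha256: Option<String>,
}

/// Hypothetical signature encoding: each axis holds an expected value and a
/// weight, or None when the signature does not use that axis.
#[derive(Debug)]
pub struct MiddleboxSignature {
    pub vendor: &'static str,
    pub max_rst_ms: Option<(f32, f32)>, // (threshold in ms, weight)
    pub blockpage_hashes: Option<(Vec<&'static str>, f32)>,
    pub whois_org: Option<(&'static str, f32)>,
    pub spki_hashes: Option<(Vec<&'static str>, f32)>,
}

impl MiddleboxSignature {
    /// Weighted fraction of this signature's axes that the observation matches.
    /// Axes the signature does not define are ignored. Missing evidence on a
    /// defined axis earns no weight but does not subtract any.
    pub fn score(&self, obs: &Observation) -> f32 {
        let mut total = 0.0_f32;
        let mut matched = 0.0_f32;

        if let Some((max_ms, w)) = self.max_rst_ms {
            total += w;
            if obs.rst_ms_after_request.is_some_and(|ms| ms < max_ms) {
                matched += w;
            }
        }
        if let Some((hashes, w)) = &self.blockpage_hashes {
            total += w;
            if obs.blockpage_body_sha256.as_deref().is_some_and(|h| hashes.iter().any(|k| *k == h)) {
                matched += w;
            }
        }
        if let Some((org, w)) = self.whois_org {
            total += w;
            if obs.injector_whois_org.as_deref().is_some_and(|o| o.eq_ignore_ascii_case(org)) {
                matched += w;
            }
        }
        if let Some((hashes, w)) = &self.spki_hashes {
            total += w;
            if obs.tls_spki_sha256.as_deref().is_some_and(|h| hashes.iter().any(|k| *k == h)) {
                matched += w;
            }
        }
        if total > 0.0 { matched / total } else { 0.0 }
    }
}
```

Normalising by the weight of the defined axes keeps scores comparable between signatures that use different numbers of axes, such as an RST-only fingerprint and one that also pins a blockpage hash.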

Middlebox event correlation

Detected middleboxes are stored in a middlebox_events TimescaleDB hypertable, partitioned by event time. A nightly job correlates middlebox detection events with censorship anomaly onset events to compute the “middlebox lead time”. This is the number of hours between the first appearance of a middlebox signature in probe measurements on an ASN and the first detected censorship anomaly on that same ASN:

-- TimescaleDB: middlebox events hypertable
CREATE TABLE middlebox_events (
  ts              TIMESTAMPTZ NOT NULL,
  country_code    TEXT NOT NULL,
  asn             INTEGER NOT NULL,
  probe_id        TEXT NOT NULL,
  detection_method TEXT NOT NULL,  -- 'echo_test' | 'rst_injection' | 'vendor_signature'
  vendor_name     TEXT,            -- NULL if unknown
  confidence      FLOAT4 NOT NULL,
  evidence_json   JSONB NOT NULL,
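  -- detection_method is part of the key so an echo-test and an RST detection
  -- from the same probe at the same timestamp do not collide
  PRIMARY KEY (ts, asn, probe_id, detection_method)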
);
SELECT create_hypertable('middlebox_events', 'ts');

-- Correlation query: median hours between middlebox first-seen and anomaly onset
WITH first_seen AS (
  SELECT country_code, asn, MIN(ts) AS first_middlebox_ts
  FROM middlebox_events
  WHERE confidence > 0.7
  GROUP BY country_code, asn
),
anomaly_onset AS (
  SELECT country_code, asn, MIN(ts) AS first_anomaly_ts
  FROM censorship_anomalies
  WHERE anomaly_type = 'sustained_block'
  GROUP BY country_code, asn
)
SELECT
  fs.country_code,
  PERCENTILE_CONT(0.5) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (ao.first_anomaly_ts - fs.first_middlebox_ts)) / 3600
  ) AS median_lead_hours
FROM first_seen fs
JOIN anomaly_onset ao USING (country_code, asn)
WHERE ao.first_anomaly_ts > fs.first_middlebox_ts
GROUP BY fs.country_code
ORDER BY median_lead_hours;
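For a single country, the query's arithmetic proceeds in three steps. It takes the earliest high-confidence middlebox timestamp and the earliest sustained-block anomaly for each ASN. It then keeps only the ASNs where the middlebox appeared first. Finally, it takes the median of the gaps in hours. The Rust sketch below reproduces that arithmetic, using Unix-second timestamps and the linear interpolation that PostgreSQL's `PERCENTILE_CONT` applies. The type and function names are hypothetical.

```rust
/// Hypothetical per-ASN summary: earliest timestamps, in Unix seconds.
pub struct AsnFirstSeen {
    pub first_middlebox_ts: i64,
    pub first_anomaly_ts: i64,
}

/// Continuous percentile with linear interpolation between the two nearest
/// ranks, as PERCENTILE_CONT computes it. Returns None for empty input.
pub fn percentile_cont(mut values: Vec<f64>, p: f64) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    values.sort_by(|a, b| a.total_cmp(b));
    let pos = p * (values.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    Some(values[lo] + (values[hi] - values[lo]) * (pos - lo as f64))
}

/// Median lead time in hours. Only ASNs where the middlebox was seen strictly
/// before the anomaly are counted, mirroring the query's WHERE clause.
pub fn median_lead_hours(rows: &[AsnFirstSeen]) -> Option<f64> {
    let leads: Vec<f64> = rows
        .iter()
        .filter(|r| r.first_anomaly_ts > r.first_middlebox_ts)
        .map(|r| (r.first_anomaly_ts - r.first_middlebox_ts) as f64 / 3600.0)
        .collect();
    percentile_cont(leads, 0.5)
}
```

The strict `>` filter excludes ASNs where blocking was observed before any middlebox signature. As a result, the statistic describes lead time only in cases where a lead exists.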

Across 31 countries where Voidly has both middlebox event data and confirmed censorship anomalies, the median lead time between middlebox detection and censorship anomaly onset is 18 hours. This lag is consistent with the hypothesis that middlebox deployment precedes activation: an ISP installs the DPI appliance, which the echo test can detect, before it enables the blocking rules. The lead time is now used as an early-warning feature in the shutdown forecasting model.

Related writing

Voidly TLS measurement covers the TLS handshake layer that produces the certificate validity signals used by MITM middlebox detection.

Censorship cross-source verification describes how middlebox vendor signatures are correlated with OONI, Censored Planet, and BGP route announcements to reduce false-positive middlebox attributions.