
Voidly AS path analysis: using BGP topology to locate censorship enforcement points

July 22, 2025 · 8 min read · AI Analytics

Censorship · Voidly · BGP · Infrastructure

Knowing that a domain is blocked is a coarser fact than knowing where in the network the block is applied. A probe that detects interference has traversed a sequence of autonomous systems between its vantage point and the target; the enforcement could live at any one of those hops. Whether the blocking device sits at the edge ISP serving the probe, at a national transit carrier, or at an internet exchange point determines the regulatory scope of the block and the technical approach a circumvention tool needs to take. This article describes how Voidly builds an AS-level topology from three external data sources, applies Gao-Rexford relationship inference to classify each AS link, locates likely censorship choke points, and computes per-country diversity scores that feed the measurement scheduler and the anomaly classifier.

Why AS path matters for censorship measurement

Censorship enforcement is not uniformly distributed across a network path. A national firewall — like China's GFW or Iran's National Information Network infrastructure — sits at a small number of AS hops where international traffic must transit before reaching domestic users. A probe measurement that routes entirely within the country's domestic network never crosses those international transit hops, so it measures a different enforcement regime from a probe whose path exits to a regional hub and re-enters. Two probes in the same country, producing conflicting results for the same domain, may both be correct — they are simply at different positions relative to the enforcement infrastructure.

A subtler problem is probe-control colocation. Voidly uses a geographically distributed control server to establish a baseline of “unblocked” behavior. If a probe and the control server share an upstream AS, the AS path between them is short and may bypass the enforcement infrastructure entirely, producing a false negative. Tracking the AS path between each probe and its assigned control server is therefore a prerequisite for interpreting the measurement result correctly. AS path diversity in probe placement also directly governs recall: a country where all probes share a single upstream transit AS has a structural blind spot for any block applied at or above that AS.

Data sources for AS topology

Voidly draws from three independent sources to build its AS-level topology. Each contributes a different dimension of the graph:

CAIDA AS-Rank provides tier classification (T1 transit-free, T2 regional transit, T3 stub/edge) and inferred customer-provider relationships derived from BGP data. The dataset is published monthly as a set of CSV files covering approximately 73,000 active ASNs. Each ASN record includes its AS rank (a single integer capturing position in the global transit hierarchy), customer cone size (the count of downstream ASNs that depend on it for transit), and inferred relationship type with each neighboring AS. The Voidly ingest job fetches the latest AS-Rank release on the first of each month and diffs it against the previous snapshot to capture acquisitions, mergers, and relationship reclassifications.
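The monthly snapshot diff can be sketched as a comparison of two relationship maps. This is a minimal illustration rather than the actual ingest job: the `diff_relationships` name and the `(asn_a, asn_b) -> relationship` map shape are assumptions, and the ASNs are documentation-range values.

```python
# Hypothetical sketch: each snapshot is assumed to be a dict that maps an
# (asn_a, asn_b) pair to a relationship label parsed from the AS-Rank files.
Snapshot = dict[tuple[int, int], str]

def diff_relationships(prev: Snapshot, curr: Snapshot):
    """Return (added, removed, changed) links between two monthly snapshots."""
    added = {k: curr[k] for k in curr.keys() - prev.keys()}
    removed = {k: prev[k] for k in prev.keys() - curr.keys()}
    changed = {
        k: (prev[k], curr[k])
        for k in prev.keys() & curr.keys()
        if prev[k] != curr[k]
    }
    return added, removed, changed

june = {(64496, 64500): "customer-provider", (64497, 64500): "peer-to-peer"}
july = {(64496, 64500): "customer-provider", (64497, 64500): "customer-provider",
        (64498, 64501): "customer-provider"}
added, removed, changed = diff_relationships(june, july)
```

The `changed` bucket is the one that matters most for reclassifications: it records both the old and the new label so the change can be audited.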

RIPE NCC RIS route collector AS path data provides the raw material for topology inference. The same MRT update files described in the BGP data ingestion pipeline contain AS_PATH attributes for every announced prefix. These path sequences — ordered lists of ASNs from the announcing origin to the observing route collector — encode the customer-provider and peer relationships as implicit constraints under the valley-free routing model. Full RIB snapshots from rrc00 through rrc26 are parsed once daily to refresh the AS adjacency graph.

PeeringDB contributes IXP membership and peering relationship data. PeeringDB exposes a public REST API at https://www.peeringdb.com/api/ that returns network records including IXP membership, traffic policy (open/selective/restrictive peering), and peering LAN prefixes. An ASN that appears in a PeeringDB IXP record is flagged in the topology as IXP-adjacent, which is significant because IXPs are known deployment points for national DPI infrastructure in several countries. RouteViews full-table dumps supplement the RIS data for ASNs not well-observed by any single RIS collector.
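As an illustration of the IXP-adjacent flag, the sketch below derives it from a PeeringDB `netixlan` response. It is not Voidly's ingest code: the function names are hypothetical, the response is assumed to be a JSON object with a `data` list whose rows carry `asn` and `ix_id`, and anonymous requests to the API may be rate-limited.

```python
import json
import urllib.request

PEERINGDB_API = "https://www.peeringdb.com/api/"

def fetch_netixlan(asn: int, timeout: float = 10.0) -> dict:
    """Fetch exchange-port records for one ASN (performs a network call)."""
    url = f"{PEERINGDB_API}netixlan?asn={asn}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

def ixp_ids(payload: dict) -> set[int]:
    """Collect the exchange ids listed in a netixlan-style response."""
    return {row["ix_id"] for row in payload.get("data", []) if "ix_id" in row}

def is_ixp_adjacent(payload: dict) -> bool:
    """An ASN with at least one exchange port is treated as IXP-adjacent."""
    return bool(ixp_ids(payload))

# Offline example with a hand-written payload (assumed shape, not real data).
sample = {"meta": {}, "data": [{"asn": 64496, "ix_id": 26},
                               {"asn": 64496, "ix_id": 31}]}
```

Keeping the parsing separate from the fetch makes the flag testable without network access.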

AS relationship classification

Voidly applies the Gao-Rexford inference algorithm to classify each observed AS adjacency into one of three relationship types (customer-provider, peer-to-peer, or sibling), with UNKNOWN reserved for links that cannot be classified. The algorithm exploits the valley-free routing property: in a correctly operating BGP network, a route learned from a customer is re-advertised to all neighbors (customers, peers, and providers), while a route learned from a provider or peer is re-advertised only to customers. By examining which ASNs appear in transit positions in observed AS paths across many prefixes, the algorithm statistically infers the relationship for each AS pair.
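The valley-free property itself is easy to state in code. The checker below is a minimal sketch, not Voidly's inference implementation: `Rel` and `is_valley_free` are hypothetical names, and each link is labeled in the direction the announcement travelled. A valid path climbs zero or more customer-to-provider links, crosses at most one peering link, then descends zero or more provider-to-customer links.

```python
from enum import Enum

class Rel(str, Enum):
    C2P = "c2p"   # step goes from a customer up to its provider
    P2P = "p2p"   # step crosses a peering link
    P2C = "p2c"   # step goes from a provider down to its customer

def is_valley_free(links: list[Rel]) -> bool:
    """True if links form uphill*, at most one peer link, then downhill*."""
    descending = False  # set once a peer or downhill link has been crossed
    for rel in links:
        if rel is Rel.C2P:
            if descending:
                return False  # climbing again after the peak is a valley
        elif rel is Rel.P2P:
            if descending:
                return False  # second peer link, or peering after descent
            descending = True
        else:  # Rel.P2C
            descending = True
    return True
```

Inference works in the opposite direction: given many observed paths that are assumed to be valley-free, it searches for the link labeling that violates the property least often.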

from enum import Enum
from dataclasses import dataclass

class AsRelationship(str, Enum):
    CUSTOMER_PROVIDER = 'CUSTOMER_PROVIDER'   # left AS is customer of right AS
    PEER_TO_PEER      = 'PEER_TO_PEER'
    SIBLING           = 'SIBLING'             # same organization, both directions
    UNKNOWN           = 'UNKNOWN'

@dataclass
class AsLink:
    left_asn:     int
    right_asn:    int
    relationship: AsRelationship
    confidence:   float     # 0.0–1.0, based on observation count
    ixp_adjacent: bool      # either AS is a PeeringDB IXP member

@dataclass
class AsPathRecord:
    probe_id:      str
    target_prefix: str
    path:          list[int]          # ordered probe→target
    link_types:    list[AsRelationship]
    censorship_exposure: int          # count of ISP-controlled hops
    recorded_at:   str

The censorship_exposure field is the number of AS hops along the path that are classified as T2 or T3 ISPs (not pure transit or CDN nodes). Pure transit providers (T1 ASNs with large customer cones and no known retail user base) are excluded from the count because they have neither the legal obligation nor the practical ability to apply country-specific censorship. Edge ISPs and regional providers do. A path with high censorship_exposure traverses more possible enforcement points, making it harder to attribute observed interference to a specific hop without additional probes.
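A minimal sketch of that count, assuming a hypothetical `tier_by_asn` lookup built from the AS-Rank tiers and a path with prepending already removed. Skipping unclassified ASNs is an assumption made here, not documented behavior.

```python
ENFORCEMENT_TIERS = {"T2", "T3"}  # regional and edge ISPs; T1 transit is excluded

def censorship_exposure(path: list[int], tier_by_asn: dict[int, str]) -> int:
    """Count hops on the path that are classified as T2 or T3."""
    return sum(1 for asn in path if tier_by_asn.get(asn) in ENFORCEMENT_TIERS)

# Documentation-range ASNs used purely for illustration.
tiers = {64496: "T3", 64497: "T2", 64498: "T1", 64499: "T2"}
exposure = censorship_exposure([64496, 64497, 64498, 64499], tiers)
```

In this example the T1 hop is skipped, so three of the four hops count as possible enforcement points.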

Censorship choke point identification

Not every AS hop is equally likely to host enforcement infrastructure. Voidly maintains a ChokepointClassification enum with four values:

IXP_LEVEL: DPI or filtering hardware deployed at a national internet exchange point. Examples include Iran's IRGC-affiliated infrastructure at TICT (Tehran Internet Exchange) and Russia's Roskomnadzor TSPU (Technical Means of Countering Threats) mandate, which requires DPI boxes at IXPs and large ISPs. IXP-level choke points affect all traffic transiting the exchange regardless of ISP, making them among the most expansive enforcement mechanisms.
TRANSIT_AS: An upstream carrier enforcing government blocks before delivering traffic to downstream ISPs. Pakistan's PTCL (AS17557) and Bangladesh's BTTB (AS17494) have both been documented routing traffic through filtering infrastructure at the transit layer, meaning ISPs that purchase upstream connectivity from them inherit the filtering even without installing their own enforcement hardware.
EDGE_ISP: Selective blocking applied by a specific ISP at its customer-facing edge, without an upstream mandate. This pattern is common in countries with informal pressure on specific carriers, or where a state-owned ISP applies blocks that private competitors do not.
UNKNOWN: Interference detected but AS path analysis cannot localize the enforcement hop, typically because the probe has insufficient path visibility or the blocking is applied symmetrically across all observed paths.

Choke point classification joins AS path data to probe measurements. A SQL query pairs the AS paths of blocked and unblocked probes for the same target and country; the application layer then walks each pair of path arrays and finds the leftmost hop where they diverge:

-- Pair the AS paths of blocked and unblocked probes for the same target and country
WITH blocked AS (
    SELECT path, censorship_exposure
    FROM   as_path_records
    -- Match on the probe that took the measurement
    -- (assumes measurements carries a probe_id column)
    WHERE  probe_id IN (SELECT probe_id FROM measurements WHERE confidence_tier >= 2
                        AND test_url = $1 AND vantage_country = $2)
),
unblocked AS (
    SELECT path
    FROM   as_path_records
    WHERE  probe_id IN (SELECT probe_id FROM measurements WHERE confidence_tier < 2
                        AND test_url = $1 AND vantage_country = $2)
)
-- The divergence point between each pair of paths is the likely enforcement AS
SELECT
    b.path AS blocked_path,
    u.path AS unblocked_path,
    -- common_prefix_length is computed in the application layer from these two arrays
    b.censorship_exposure
FROM blocked b
CROSS JOIN unblocked u
LIMIT 200;
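The application-layer step that the query leaves open could look like the sketch below. The function names are hypothetical and the majority vote across pairs is an assumption about how divergence points might be aggregated; paths are ordered probe to target, as in AsPathRecord.

```python
from collections import Counter
from typing import Iterable, Optional

def common_prefix_length(a: list[int], b: list[int]) -> int:
    """Number of leading hops that two AS paths share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def divergence_hop(blocked: list[int], unblocked: list[int]) -> Optional[int]:
    """First ASN on the blocked path that falls outside the shared prefix."""
    n = common_prefix_length(blocked, unblocked)
    return blocked[n] if n < len(blocked) else None

def likely_enforcement_asn(
    pairs: Iterable[tuple[list[int], list[int]]],
) -> Optional[int]:
    """Majority vote over the divergence hops of all (blocked, unblocked) pairs."""
    votes = Counter()
    for blocked, unblocked in pairs:
        hop = divergence_hop(blocked, unblocked)
        if hop is not None:
            votes[hop] += 1
    return votes.most_common(1)[0][0] if votes else None

pairs = [
    ([64496, 64500, 64510], [64496, 64501, 64510]),
    ([64496, 64500, 64511], [64496, 64502, 64511]),
    ([64496, 64503, 64510], [64496, 64501, 64510]),
]
suspect = likely_enforcement_asn(pairs)
```

Here two of the three pairs diverge at AS64500, so it becomes the candidate enforcement hop. A real classifier would also weigh link types and IXP adjacency before assigning a ChokepointClassification value.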

Probe placement diversity metric

The as_path_diversity_score is computed per country as the count of distinct upstream transit ASNs (T1 and T2 ASNs appearing at positions 2–4 in probe-to-target paths) that are covered by at least one active probe. Countries where all active probes share the same single upstream transit AS have a score of 1 and are flagged as coverage-risk: any censorship applied at or above that transit AS is invisible to the entire probe fleet.

Coverage targets differ by country tier:

Tier 1 countries (active blocking regime, high political salience): minimum 3 distinct upstream transit ASNs
Tier 2 countries (moderate risk, periodic incidents): minimum 2 distinct upstream transit ASNs
Tier 3 countries (low baseline blocking): no minimum enforced, opportunistic coverage

The measurement scheduler uses as_path_diversity_score when deciding which countries to prioritize for probe recruitment. A country with score 1 that also falls in Tier 1 receives the highest recruitment priority: the system surfaces it in the probe operator dashboard with a specific ask for vantages on underrepresented upstream ASNs, identified by querying the AS adjacency graph for transit ASNs that serve the country but have no current Voidly probe in their customer cone.
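The score and the tier targets can be sketched in a few lines. This is an illustration under stated assumptions: "positions 2–4" is read as 1-indexed (list indices 1 to 3), `transit_asns` is a hypothetical set of T1 and T2 ASNs, and the function names are not Voidly's.

```python
# Minimum distinct upstream transit ASNs per country tier, from the list above.
MIN_TRANSIT_ASNS = {1: 3, 2: 2, 3: 0}

def as_path_diversity_score(paths: list[list[int]], transit_asns: set[int]) -> int:
    """Count distinct transit ASNs seen at positions 2-4 of any probe path."""
    seen: set[int] = set()
    for path in paths:
        seen.update(asn for asn in path[1:4] if asn in transit_asns)
    return len(seen)

def meets_coverage_target(score: int, country_tier: int) -> bool:
    """Compare a country's score with the minimum for its tier."""
    return score >= MIN_TRANSIT_ASNS[country_tier]

# Two probes at different edge ISPs that share one upstream transit AS.
paths = [[64496, 64500, 64510], [64497, 64500, 64511]]
score = as_path_diversity_score(paths, transit_asns={64500, 64501})
```

Both probes sit behind AS64500, so the score is 1 and a Tier 1 country in this state would miss its target of 3.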

AS path length as a censorship propagation signal

AS path length between a probe and its measurement target encodes geographic and topological distance. Short paths (length 2–3) indicate the probe is topologically close to the target, likely in the same country or region. Censorship visible at short path lengths is most likely applied at the edge ISP, because there are few or no intermediate AS hops where transit-level enforcement could operate. Long paths (length 5–8) indicate the probe routes through multiple transit providers; enforcement could be applied at any one of those hops.

The as_path_length_delta feature is the difference between the observed AS path length to the blocked domain and the expected path length to an unblocked control target in the same AS neighborhood. A positive delta — the path to the blocked domain is longer than expected — suggests routing-level interference: the network is routing around the destination via longer paths, consistent with BGP withdrawal or traffic engineering applied to avoid the target prefix. A negative delta (shorter path) can indicate a BGP hijack where traffic is being intercepted at a closer AS. The feature is computed from MRT data using bgpkit-parser:

from collections import defaultdict
from statistics import median_high

# Adjust this import to the installed binding if needed
# (the PyPI package pybgpkit, for example, is imported as `bgpkit`).
from bgpkit_parser import BgpkitParser

def compute_path_lengths(mrt_url: str) -> dict[str, int]:
    """
    Parse a full RIB snapshot and return the median AS path length per prefix.
    One length is kept per (prefix, peer ASN) so that a peer observed through
    several sessions is not over-weighted.
    """
    parser = BgpkitParser(mrt_url)
    # prefix -> {peer ASN -> de-prepended path length}
    prefix_paths: dict[str, dict[int, int]] = defaultdict(dict)
    for elem in parser:
        if elem.type == 'A' and elem.as_path:
            # Some bindings expose AS_PATH as a space-separated string, others as a list
            raw = elem.as_path
            hops = raw.split() if isinstance(raw, str) else list(raw)
            # Collapse consecutive duplicates (AS prepending) before measuring length
            deduped = [a for i, a in enumerate(hops) if i == 0 or a != hops[i - 1]]
            prefix_paths[elem.prefix][elem.peer_asn] = len(deduped)
    # Median across observing peers; median_high always returns an observed length
    return {
        prefix: median_high(by_peer.values())
        for prefix, by_peer in prefix_paths.items()
    }

def as_path_length_delta(
    blocked_prefix: str,
    control_prefix: str,
    path_lengths: dict[str, int],
) -> float | None:
    """Blocked-minus-control median path length, or None if either prefix is unseen."""
    blocked_len  = path_lengths.get(blocked_prefix)
    control_len  = path_lengths.get(control_prefix)
    if blocked_len is None or control_len is None:
        return None
    return float(blocked_len - control_len)

Path length is computed using the median across all RIS peers that observed the prefix, not the minimum. The minimum path length is dominated by well-connected peers near the origin AS and would undercount the effective path length seen by Voidly's probes, which are at consumer ISPs with limited peering diversity.

Integration with the anomaly classifier

Two features derived from AS path analysis feed the anomaly classifier alongside the DNS, HTTP, and TLS interference signals:

as_path_diversity_probe_country: the as_path_diversity_score for the probe's country at measurement time. A low score tells the classifier to weight other signals more heavily, since path diversity is insufficient to localize the enforcement point.
as_path_length_delta: the difference in median path length between the target domain's prefix and the control resolver's prefix, drawn from the most recent full RIB snapshot.

SHAP importance values for these two features are 0.03 and 0.02 respectively in the current classifier version, small but non-zero. Their practical value is concentrated in two specific interference types: BGP withdrawal (where as_path_length_delta is often the first measurable signal, before DNS or HTTP interference is detectable) and traffic throttling (where path length changes precede measurable latency increases). For the dominant interference types (DNS injection, TLS interception, and HTTP block pages), the AS path features add little independent signal because those mechanisms are applied at the application layer and do not alter BGP path topology.

The probe-control AS path colocation check limits the impact of colocation-induced false negatives. If a probe and its assigned control server share an upstream transit AS (detectable by finding the control server's IP in the same AS-level neighborhood as the probe), the classifier down-weights the measurement's contribution to the country-level interference score. The measurement is retained in the dataset but tagged with control_path_overlap = true, and its anomaly score is not used for threshold triggers or alert generation.
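A minimal sketch of the tagging logic, assuming each measurement already carries the set of upstream transit ASNs for the probe and for its control server. The `Measurement` class and both helper functions are hypothetical; only the control_path_overlap tag comes from the pipeline described here.

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    probe_upstreams: frozenset[int]     # transit ASNs above the probe
    control_upstreams: frozenset[int]   # transit ASNs above the control server
    anomaly_score: float
    control_path_overlap: bool = False

def tag_overlap(m: Measurement) -> Measurement:
    """Set control_path_overlap when probe and control share any upstream transit AS."""
    m.control_path_overlap = bool(m.probe_upstreams & m.control_upstreams)
    return m

def alert_scores(measurements: list[Measurement]) -> list[float]:
    """Scores eligible for thresholds and alerts; overlapping rows are kept but skipped."""
    return [m.anomaly_score for m in measurements if not m.control_path_overlap]

rows = [
    tag_overlap(Measurement(frozenset({64500}), frozenset({64500, 64501}), 0.9)),
    tag_overlap(Measurement(frozenset({64502}), frozenset({64501}), 0.4)),
]
eligible = alert_scores(rows)
```

The first row shares AS64500 with its control server, so it stays in `rows` for later analysis but contributes nothing to alerting.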

Limitations

Three limitations constrain the AS path analysis pipeline and are tracked explicitly in measurement metadata:

BGP table snapshot staleness. Full RIB snapshots are published every 8 hours; incremental update files every 5 minutes. The path length computation uses the most recent full RIB, which can be up to 8 hours old at the time a probe measurement is analyzed. During active routing events (a BGP prefix withdrawal in progress, a route leak, a BGP session reset), the actual AS paths seen by probes may differ significantly from the snapshot. The bgp_snapshot_age_seconds field in the measurement record captures this staleness so downstream consumers can filter measurements taken during high-BGP-churn periods.
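One way to derive the staleness value is sketched below. It assumes that full RIB dumps are stamped at 00:00, 08:00, and 16:00 UTC and are usable immediately, which ignores publication and download lag; the helper names are hypothetical and timestamps must be timezone-aware.

```python
from datetime import datetime, timezone

RIB_INTERVAL_HOURS = 8  # full RIB cadence stated above

def latest_rib_time(measured_at: datetime) -> datetime:
    """Most recent 8-hour boundary (UTC) at or before the measurement time."""
    utc = measured_at.astimezone(timezone.utc)
    boundary_hour = utc.hour - utc.hour % RIB_INTERVAL_HOURS
    return utc.replace(hour=boundary_hour, minute=0, second=0, microsecond=0)

def bgp_snapshot_age_seconds(measured_at: datetime) -> int:
    """Seconds between the measurement and the RIB snapshot it is analyzed against."""
    utc = measured_at.astimezone(timezone.utc)
    return int((utc - latest_rib_time(utc)).total_seconds())

measured = datetime(2025, 7, 22, 13, 30, tzinfo=timezone.utc)
age = bgp_snapshot_age_seconds(measured)  # snapshot boundary is 08:00 UTC
```

A consumer that wants only fresh topology could then drop rows whose age exceeds a threshold of its own choosing.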

CAIDA AS-Rank update cadence. AS-Rank is refreshed monthly. Recent AS acquisitions, peering agreement changes, and new AS registrations are not immediately reflected in the relationship classification. An ASN acquired by a state-owned operator mid-month continues to carry its prior relationship classification until the next monthly ingest. This is particularly relevant in markets where government telecommunications companies have been actively acquiring private carriers.

IPv4-only AS path topology. Voidly currently computes AS path features exclusively from IPv4 routing tables. IPv6 routing topology differs meaningfully from IPv4: many ISPs have separate IPv6 transit agreements, and the AS path lengths and peer relationships for IPv6 prefixes can diverge from their IPv4 counterparts for the same origin AS. Measurements taken over IPv6 probes receive the IPv4-derived as_path_length_delta for now, tagged with as_path_ipv6_mismatch = true. Full IPv6 AS topology ingestion is on the roadmap for a future pipeline version.

For how Voidly uses per-ASN vantages to distinguish nationwide orders from selective ISP enforcement: Voidly's ASN-level blocking analysis: how censorship propagates across autonomous systems →

For how BGP prefix withdrawal patterns signal national internet shutdowns: BGP routing signals and internet shutdown detection: how Voidly uses IODA data →

For how Voidly ingests BGP routing data from RIPE NCC RIS, RouteViews, and bgp.tools: Voidly BGP data ingestion: parsing MRT dumps, detecting prefix withdrawals, and computing country outage scores →

For how AS-level data feeds into the country-level censorship aggregate: Voidly's country-level censorship score: aggregating 2.2B probe measurements into the global index →