
Geoblocking vs. censorship: how Voidly distinguishes licensing restrictions, CDN geofencing, and GDPR blocks from government-ordered blocking

September 29, 2025 · 9 min read · AI Analytics

Censorship · Voidly · Methodology · Infrastructure

When a probe in Turkey can't reach BBC iPlayer, that is not censorship — it is geoblocking. BBC iPlayer restricts access to UK residents under its Royal Charter obligations, and it actively blocks non-UK IPs at the application layer. If Voidly counted every geoblocked domain as a censorship incident, the Turkey country score would be inflated with false positives, and the noise would undermine the signal for actual government-ordered blocking. This post covers how Voidly distinguishes geoblocking from censorship at every layer of the measurement pipeline.

What geoblocking looks like at the wire level

Geoblocking takes several forms, each with a distinct signature:

HTTP 451 Unavailable For Legal Reasons. This status code (RFC 7725) was defined for resources that are withheld because of a legal demand, which covers both geoblocking and legal takedowns. A properly implemented geoblocker returns 451 with a Link header (rel="blocked-by") identifying the entity that implements the block. In practice, only a minority of geoblocking services use 451; most use 403 or a redirect to a “not available in your region” page. Voidly checks for 451 explicitly and classifies it as geoblock_legal rather than censorship.

Streaming service “not available” page. Netflix, BBC iPlayer, Spotify, and most major streaming services return an HTTP 200 response with a custom block page whose body contains well-known fingerprints: “not available in your country”, “this content is only available in”, or service-specific language. The 2,300-entry block page fingerprint library includes entries for geoblocking pages, tagged with geoblock_commercial rather than censorship_*. A hit against a geoblock fingerprint suppresses the censorship classification for that measurement.

GDPR access restriction. Some US-based news sites (particularly local newspapers) chose to block EU visitors wholesale rather than implement GDPR-compliant consent flows. These blocks are legal compliance decisions, not censorship. Their block pages typically contain the phrase “we do not offer our website to users in your country” and are served to EU vantage points. Voidly handles these with a GDPR geoblock fingerprint category, which fires only when the vantage country is in the EU/EEA and the domain is on the GDPR-restricted list.
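
The three wire-level signatures above can be sketched as a single lookup. This is a minimal illustration, not Voidly's implementation: the `HttpObservation` type, the phrase tuples, the partial `EU_EEA` set, and the `geoblock_gdpr` label are all assumptions made for the example (only `geoblock_legal` and `geoblock_commercial` are named in this post).

```python
from dataclasses import dataclass

# Hypothetical stand-ins for entries in the fingerprint library.
GEOBLOCK_PHRASES = (
    "not available in your country",
    "this content is only available in",
)
GDPR_PHRASE = "we do not offer our website to users in your country"
# Deliberately partial list of EU/EEA country codes, for illustration only.
EU_EEA = frozenset({"AT", "BE", "DE", "ES", "FR", "IE", "IT", "NL", "NO", "PL", "SE"})


@dataclass(frozen=True)
class HttpObservation:
    status: int
    body: str
    vantage_country: str
    domain: str


def classify_wire_signature(obs, gdpr_restricted_domains):
    """Return a geoblock label, or None if no geoblock signature matched."""
    if obs.status == 451:
        return "geoblock_legal"
    body = obs.body.lower()
    # The GDPR category fires only for EU/EEA vantage points and listed domains.
    if (
        GDPR_PHRASE in body
        and obs.vantage_country in EU_EEA
        and obs.domain in gdpr_restricted_domains
    ):
        return "geoblock_gdpr"  # assumed label name
    if any(phrase in body for phrase in GEOBLOCK_PHRASES):
        return "geoblock_commercial"
    return None
```

A `None` result means only that none of these signatures fired; the measurement still goes through the rest of the pipeline.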

Multi-country probe comparison

The most reliable geoblocking signal comes not from the response content but from the measurement pattern across multiple vantage countries. A government censorship event blocks a domain for all vantage points in that country. A commercial geoblock blocks a domain for vantage points in a specific set of countries (determined by the content provider's licensing agreements) and leaves it accessible in others.

from collections import defaultdict

def classify_geographic_pattern(
    domain: str,
    domain_measurements: list[Measurement],
    window_hours: int = 4,
) -> GeographicPattern:
    """
    Classify whether a blocking pattern looks like a country-specific
    government block or a multi-country commercial geoblock.
    """
    recent = [m for m in domain_measurements
              if m.hours_ago <= window_hours and m.is_anomalous]

    by_country = defaultdict(list)
    for m in recent:
        by_country[m.vantage_country].append(m)

    blocked_countries = {cc for cc, ms in by_country.items()
                         if mean_confidence(ms) >= 0.6}
    accessible_countries = {cc for cc, ms in by_country.items()
                            if mean_confidence(ms) < 0.3}

    if len(blocked_countries) == 1:
        return GeographicPattern.SINGLE_COUNTRY  # classic censorship
    if len(blocked_countries) >= 5 and len(accessible_countries) >= 3:
        # Blocked in many countries, accessible in others → likely commercial geoblock
        return GeographicPattern.MULTI_COUNTRY_SELECTIVE
    if len(blocked_countries) >= 10:
        # Blocked nearly everywhere → domain might be offline
        return GeographicPattern.GLOBAL_OUTAGE
    return GeographicPattern.AMBIGUOUS

A SINGLE_COUNTRY pattern is the strongest indicator of government censorship: the domain is accessible everywhere in the world except one country. A MULTI_COUNTRY_SELECTIVE pattern (blocked in, say, France, Germany, Italy, Spain, and the Netherlands, but accessible in the US, UK, and Japan) is the signature of GDPR compliance blocking by a US media outlet, or of a streaming service's licensing territory restrictions.

The comparison requires enough probe coverage across multiple countries within the same 4-hour measurement window. For the roughly 20 high-risk countries where censorship most commonly occurs, Voidly has at least 2 probe vantage points per country, enabling this comparison. For smaller countries with limited probe coverage, geoblocking classification falls back to block page fingerprinting alone.
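
The classifier in this section relies on a few supporting names that the post does not define. The sketch below shows one plausible shape for them, plus a coverage check for the fallback described above. The `confidence` and `probe_id` fields, the `mean_confidence` implementation, and the `countries_with_coverage` helper are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum, auto
from statistics import fmean


class GeographicPattern(Enum):
    SINGLE_COUNTRY = auto()
    MULTI_COUNTRY_SELECTIVE = auto()
    GLOBAL_OUTAGE = auto()
    AMBIGUOUS = auto()


@dataclass(frozen=True)
class Measurement:
    vantage_country: str
    hours_ago: float
    is_anomalous: bool
    confidence: float  # assumed: per-measurement blocking confidence, 0.0 to 1.0
    probe_id: str      # assumed: identifies the vantage point


def mean_confidence(measurements):
    """Average blocking confidence over a non-empty group of measurements."""
    return fmean(m.confidence for m in measurements)


def countries_with_coverage(measurements, window_hours=4, min_probes=2):
    """Countries with at least min_probes distinct probes inside the window."""
    probes = defaultdict(set)
    for m in measurements:
        if m.hours_ago <= window_hours:
            probes[m.vantage_country].add(m.probe_id)
    return {cc for cc, ids in probes.items() if len(ids) >= min_probes}
```

A country missing from the `countries_with_coverage` result would be a candidate for the fingerprint-only fallback.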

Domain category weighting

Domain categories are the most efficient filter: a domain in the streaming_media or ecommerce category is unlikely to be the subject of government censorship (in most countries), while a domain in news_media, political_opposition, or human_rights has a high prior probability of censorship when blocked.

GEOBLOCK_PRIOR = {
    'streaming_media': 0.82,       # most inaccessibility is commercial geoblock
    'ecommerce': 0.61,             # Shopify-hosted stores with licensing restrictions
    'sports': 0.74,                # broadcast rights frequently geoblocked
    'gaming': 0.55,                # regional release gates
    'news_media': 0.08,            # low prior for geoblock; high prior for censorship
    'political_opposition': 0.03,
    'human_rights': 0.04,
    'circumvention': 0.06,
    'social_media': 0.12,
    'education': 0.21,
    'default': 0.18,
}

Each prior is the starting value of the measurement's p_geoblock score, which the other signals then raise or lower before the anomaly classification is finalized. A streaming domain blocked in a single country therefore starts from a high p_geoblock, which may keep it below the Corroborated tier threshold even if the probe-level evidence looks like censorship.

The CDN split-horizon problem

CDN split-horizon DNS is the hardest false positive case. A CDN-hosted domain legitimately returns different DNS responses from different geographic vantage points: a probe in Singapore might get an Akamai edge IP in Singapore, while a probe in Germany gets a Frankfurt edge IP. To the DNS comparison logic, the control server in the US gets a Virginia edge IP while the probe in Singapore gets a Singapore IP — the IPs differ, which looks like DNS tampering.

Voidly handles CDN split-horizon with three techniques:

CDN IP range classification. We maintain an ASN-to-CDN mapping for the 15 largest CDN operators: Akamai, Cloudflare, Fastly, CloudFront, Limelight, Edgio, and others. When both the control DNS response and the probe DNS response resolve to IPs belonging to the same CDN operator (same ASN or known CDN ASN group), the DNS comparison is classified as split_horizon_cdn, not tampered.

CDN_ASN_GROUPS = {
    'akamai': {20940, 16625, 18717, 35994, 43639, 2, 32787},
    'cloudflare': {13335, 209242},
    'fastly': {54113, 394536},
    'cloudfront': {16509},
    'google': {15169, 396982, 36492, 36040},
}

def is_cdn_split_horizon(control_asn: int, probe_asn: int) -> bool:
    """Return True when both ASNs belong to the same CDN operator's group."""
    return any(
        control_asn in asns and probe_asn in asns
        for asns in CDN_ASN_GROUPS.values()
    )

HTTP-layer validation. Even when DNS returns a different IP than the control server, if the HTTP response body matches the control body (same SHA-256 hash) or has a similarity above 0.95 (SimHash), the measurement is classified as accessible despite the DNS mismatch. CDN split-horizon is a DNS layer phenomenon; the actual content served is identical.
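
A minimal sketch of the body comparison, assuming a 64-bit SimHash over lowercase word tokens. The token scheme, the hash width, and the function names are illustrative choices, not details confirmed by the pipeline.

```python
import hashlib
import re


def simhash64(text):
    """64-bit SimHash: each token votes +1 or -1 on every bit position."""
    weights = [0] * 64
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << bit
    return fingerprint


def simhash_similarity(a, b):
    """1.0 for identical fingerprints, falling by 1/64 per differing bit."""
    distance = bin(simhash64(a) ^ simhash64(b)).count("1")
    return 1.0 - distance / 64


def bodies_match(control_body, probe_body, threshold=0.95):
    """Exact SHA-256 match first, then near-duplicate check via SimHash."""
    if hashlib.sha256(control_body).digest() == hashlib.sha256(probe_body).digest():
        return True
    control_text = control_body.decode("utf-8", errors="replace")
    probe_text = probe_body.decode("utf-8", errors="replace")
    return simhash_similarity(control_text, probe_text) > threshold
```

With 64 bits, a similarity above 0.95 corresponds to at most three differing bits, so small per-region differences such as a timestamp or a localized footer can pass while a substituted block page should not.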

Historical IP set baseline. For every domain in the test list, Voidly maintains a 30-day rolling set of observed IPs across all probes. A DNS response that returns an IP not previously seen for this domain gets higher scrutiny; a DNS response that returns an IP within the historical set (or a sibling IP in the same /24 prefix) is more likely legitimate CDN routing than tampering.
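
The sibling-prefix check can be sketched with the standard `ipaddress` module. The function name and the plain set of strings are assumptions; maintaining the 30-day rolling window is left out of the example.

```python
import ipaddress


def in_historical_set(ip, historical_ips):
    """True if ip was seen before, or shares an IPv4 /24 with a seen address."""
    if ip in historical_ips:
        return True
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        return False  # the /24 sibling rule is applied to IPv4 only in this sketch
    prefix = ipaddress.ip_network(f"{ip}/24", strict=False)
    for seen in historical_ips:
        seen_addr = ipaddress.ip_address(seen)
        if seen_addr.version == 4 and seen_addr in prefix:
            return True
    return False
```

A `False` result does not mean tampering; it only marks the response for the higher scrutiny described above.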

Domain unavailability vs. censorship

A fourth scenario that can look like censorship: the domain itself has gone offline. If a news site's server crashes, all probes globally will fail to connect. The geographic pattern classifier catches most of these: a GLOBAL_OUTAGE pattern (blocked in nearly every country) routes to the unavailability queue rather than the censorship queue. But domains with limited probe coverage may only show failure in a subset of countries, which can look like a selective block.

def check_global_unavailability(
    domain: str,
    current_failure_rate: float,   # fraction of probes failing in last 4h
    baseline_failure_rate: float,  # 30-day average failure rate for this domain
) -> bool:
    """
    Returns True if the current failure rate looks like global unavailability
    rather than geographic censorship.
    """
    # Sudden jump in failure rate globally is more consistent with outage than censorship
    relative_increase = current_failure_rate / max(baseline_failure_rate, 0.01)
    if relative_increase > 5.0 and current_failure_rate > 0.60:
        # Also check: is the failure pattern geographically clustered or dispersed?
        # (If dispersed: likely outage. If clustered to one country: censorship.)
        return True  # Route to unavailability queue; recheck in 2 hours
    return False

Voidly maintains a per-domain 30-day baseline failure rate that accounts for domains that are chronically unreliable independent of censorship — low-budget news sites in fragile countries that go offline frequently due to hosting costs, DDoS attacks, or infrastructure failures. A domain with a 30-day baseline failure rate of 40% is not experiencing censorship when half its probes fail; it is behaving normally.
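
The comment inside `check_global_unavailability` mentions a geographic dispersion check without showing it. One possible sketch follows; the `failure_dispersion` metric and the `min_dispersion` threshold are hypothetical, while the 5× and 0.60 thresholds are the ones from the function above.

```python
def failure_dispersion(failures_by_country):
    """Share of failing probes outside the single worst-hit country."""
    total = sum(failures_by_country.values())
    if total == 0:
        return 0.0
    return 1.0 - max(failures_by_country.values()) / total


def looks_like_outage(current_rate, baseline_rate, failures_by_country,
                      min_dispersion=0.5):
    """Outage if failures jumped sharply AND are spread across countries."""
    relative_increase = current_rate / max(baseline_rate, 0.01)
    if relative_increase > 5.0 and current_rate > 0.60:
        return failure_dispersion(failures_by_country) >= min_dispersion
    return False


# Chronically unreliable domain: 40% baseline, 50% now. Relative increase is
# 1.25, so this is normal behaviour and neither an outage nor censorship.
chronic = looks_like_outage(0.50, 0.40, {"US": 5, "DE": 5})

# Sudden failures spread evenly over three countries: consistent with an outage.
outage = looks_like_outage(0.90, 0.05, {"US": 3, "DE": 3, "JP": 3})

# Sudden failures concentrated in one country: not routed to the outage queue.
clustered = looks_like_outage(0.90, 0.05, {"TR": 9, "US": 1})
```

The clustered case matters mostly for domains with skewed probe coverage, where a single country can dominate the global failure rate.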

Combining the signals

The geoblocking classifier combines all these signals into a per-measurement p_geoblock score that competes with the censorship classification in the anomaly classifier's output:

def compute_geoblock_score(
    m: Measurement,
    geo_pattern: GeographicPattern,
    has_geoblock_fingerprint: bool,
    is_cdn_split_horizon: bool,
    domain_category: str,
) -> float:
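    # Fall back to the 'default' prior for categories missing from the table.
    score = GEOBLOCK_PRIOR.get(domain_category, GEOBLOCK_PRIOR['default'])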

    if m.http_status == 451:
        score = min(1.0, score + 0.40)
    if has_geoblock_fingerprint:
        score = min(1.0, score + 0.35)
    if geo_pattern == GeographicPattern.MULTI_COUNTRY_SELECTIVE:
        score = min(1.0, score + 0.25)
    if is_cdn_split_horizon:
        score = min(1.0, score + 0.30)
    if geo_pattern == GeographicPattern.GLOBAL_OUTAGE:
        score = min(1.0, score + 0.45)
    if geo_pattern == GeographicPattern.SINGLE_COUNTRY:
        score = max(0.0, score - 0.20)  # single-country block is less likely geoblock

    return score

A p_geoblock score above 0.70 suppresses the censorship classification entirely: the measurement is tagged geoblock and does not contribute to the country's censorship score or incident tracking. A score between 0.40 and 0.70 sets a geoblock_possible flag on the measurement, which halves its weight in the confidence score (a 0.5× multiplier) but does not suppress it. Below 0.40, the geoblock hypothesis is rejected and normal censorship classification proceeds.
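
The three bands can be written as a small lookup. This is a sketch: the function name and the `censorship_candidate` label are hypothetical, and the treatment of the exact boundary values (0.40 and 0.70 both fall in the middle band) is one reading of the thresholds above.

```python
def geoblock_disposition(p_geoblock):
    """Return (tag, weight multiplier applied in the confidence score)."""
    if p_geoblock > 0.70:
        return ("geoblock", 0.0)           # suppressed entirely
    if p_geoblock >= 0.40:
        return ("geoblock_possible", 0.5)  # kept, at half weight
    return ("censorship_candidate", 1.0)   # geoblock hypothesis rejected


# Streaming domain blocked in one country, no other signals:
# prior 0.82 minus the 0.20 single-country adjustment gives about 0.62.
streaming_single = geoblock_disposition(0.82 - 0.20)

# News domain blocked in one country: max(0.0, 0.08 - 0.20) gives 0.0.
news_single = geoblock_disposition(max(0.0, 0.08 - 0.20))
```

Under these rules the streaming case lands in the middle band rather than being suppressed, so a single-country block on a streaming domain still contributes to the censorship score at reduced weight.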

The field is exposed in the Voidly dataset schema as p_geoblock (0.0–1.0) alongside geoblock_reason (one of: http_451, fingerprint_match, multi_country_selective, cdn_split_horizon, domain_category, global_outage), allowing analysts to audit suppressed measurements and override the classification for their specific research context.
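
As an illustration of that kind of audit, the sketch below pulls out suppressed measurements whose suppression rests on a chosen set of reasons. It assumes records are plain dicts carrying the two schema fields; the `domain` key, the function name, and the default reason set are hypothetical.

```python
def suppressed_for_review(records, reasons=frozenset({"domain_category"}),
                          threshold=0.70):
    """Suppressed measurements (p_geoblock above threshold) with a given reason."""
    return [
        r for r in records
        if r["p_geoblock"] > threshold and r["geoblock_reason"] in reasons
    ]


sample = [
    {"domain": "a.example", "p_geoblock": 0.82, "geoblock_reason": "domain_category"},
    {"domain": "b.example", "p_geoblock": 0.95, "geoblock_reason": "http_451"},
    {"domain": "c.example", "p_geoblock": 0.55, "geoblock_reason": "domain_category"},
]
to_review = suppressed_for_review(sample)
```

An analyst studying streaming censorship, for instance, might review measurements suppressed on the category prior alone, since that signal says nothing about the response actually observed.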

For how the block page fingerprint library identifies government censorship signatures vs. commercial block pages: Voidly's block page fingerprint library: detecting censorship signatures across 2,300+ known pages →

For how cross-source corroboration from OONI, CensoredPlanet, and IODA filters residual geoblocking false positives: Cross-source censorship verification: reconciling OONI, CensoredPlanet, and IODA →

For the control server methodology that distinguishes CDN split-horizon from genuine DNS tampering: The Voidly control server: how we tell censorship from a bad network →

For how geoblocking suppression affects the country-level censorship index: Voidly's country-level censorship score: aggregating 2.2B probe measurements into the global index →

For how probes detect DNS injection — the interference type that geoblocking filtering is designed NOT to suppress: How Voidly detects DNS injection: forged responses, injection rates by country, and pipeline integration →