
Voidly's URL test list: how we curate the domains that reveal internet censorship

May 12, 2025 · 8 min read · AI Analytics

Censorship · Voidly · Methodology · Data engineering

The test list is where internet censorship measurement begins and ends. Every anomaly Voidly detects, every verified incident in the public dataset, every country-level timeline — all of it traces back to a curated list of URLs that our probes actively test. If a blocked domain isn't on the list, the block goes undetected. No classifier, no probe architecture, no cross-source reconciler can surface a block that we never probed for. The test list is the foundation, and its curation decisions have direct consequences for what censorship the world can see and what remains invisible.

Starting from Citizen Lab's global test list

We don't start from scratch. Citizen Lab maintains a global test list at github.com/citizenlab/test-lists that has become the de facto standard for internet censorship measurement. The global list contains roughly 2,000 URLs representing a cross-section of content categories that governments have historically targeted. OONI, CensoredPlanet, and ICLab all probe subsets of it. We do too, and we contribute back when we identify gaps.

Citizen Lab's list tags each URL with a category code from a legend that OONI also uses. The legend defines around 30 codes; the ones most relevant here are NEWS (news media), GRP (social networking), COMT (communication tools), HUMR (human rights issues), POLR (political criticism), LGBT (LGBTQ+ resources), SRCH (search engines), PORN (pornography), ALDR (alcohol and drugs), GAME (gaming), MMED (media sharing), FILE (file sharing), HOST (hosting and blogging platforms), and COMM (e-commerce). Country scope comes from the file an entry lives in: global.csv holds globally relevant URLs (treated as country ZZ in the scoring code below), and per-country files named by two-letter ISO code, such as ru.csv, hold country-specific entries. Each row also records the date added, a source, and optional notes from the contributor.

The format is a simple CSV. A representative slice looks like this:

# url,category_code,category_description,date_added,source,notes
https://www.hrw.org,HUMR,Human Rights,2014-04-02,citizenlab,
https://www.amnesty.org,HUMR,Human Rights,2014-04-02,citizenlab,
https://twitter.com,GRP,Social Networking,2014-04-02,citizenlab,
https://www.facebook.com,GRP,Social Networking,2014-04-02,citizenlab,
https://www.bbc.com,NEWS,News Media,2014-04-02,citizenlab,
https://www.rferl.org,NEWS,News Media,2014-04-09,citizenlab,Radio Free Europe/Radio Liberty
https://ilga.org,LGBT,LGBT,2014-04-02,citizenlab,
https://www.grindr.com,LGBT,LGBT,2016-08-11,citizenlab,
# Country-specific entries live in per-country files (e.g. lists/ru.csv), not global.csv
https://meduza.io,NEWS,News Media,2021-04-19,citizenlab,RU — blocked by Roskomnadzor
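A minimal sketch of loading rows in this format with Python's standard csv module. It assumes the six-column layout shown above; ListEntry and parse_test_list are hypothetical names, and skipping lines that start with '#' is only there so a commented slice like the one above parses too.

```python
import csv
from dataclasses import dataclass

# Column order used by the test-list CSV files.
FIELDS = ["url", "category_code", "category_description", "date_added", "source", "notes"]


@dataclass
class ListEntry:  # hypothetical container, one per CSV row
    url: str
    category_code: str
    category_description: str
    date_added: str
    source: str
    notes: str


def parse_test_list(text: str) -> list[ListEntry]:
    """Parse test-list CSV text, skipping '#' comment lines and the header row."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.lstrip().startswith("#")]
    entries = []
    for row in csv.reader(lines):
        if row and row[0] == "url":  # uncommented header row
            continue
        row = (row + [""] * len(FIELDS))[: len(FIELDS)]  # pad or trim to six columns
        entries.append(ListEntry(*row))
    return entries
```

Using csv.reader rather than splitting on commas matters because contributor notes can contain quoted commas.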

The measurement budget problem

A probe can't test every URL on every run. The global list alone has ~2,000 entries, and per-country lists add hundreds more. Testing a single URL takes roughly 3 seconds once you account for DNS resolution, the TCP handshake, the TLS handshake for HTTPS targets, and the HTTP GET request. That's a conservative estimate on a functioning connection; on a degraded network, individual probes regularly take 8 to 12 seconds.

Voidly probes run on volunteer-operated infrastructure. We target 90-second probe cycles to keep the battery and bandwidth burden manageable for operators. At 3 seconds per URL, that leaves roughly 30 URLs per run: about 1.5% of the global list, and an even smaller share once per-country lists are added. The selection can't be uniformly random. A uniform draw gives every URL the same chance of being tested regardless of category, so each category is sampled in proportion to how many entries it has, yet categories are not equally likely to be censored. Probing gaming sites at the same rate as human rights organizations wastes measurement budget on low-risk targets.
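The budget arithmetic above, as a small sketch. It assumes URLs are tested one after another within a cycle, which is what the 90 s / 3 s = 30 figure implies; the function names are hypothetical.

```python
def urls_per_cycle(cycle_seconds: float, seconds_per_url: float) -> int:
    """How many URLs fit in one probe cycle when tested sequentially."""
    if seconds_per_url <= 0:
        raise ValueError("seconds_per_url must be positive")
    return int(cycle_seconds // seconds_per_url)


def coverage_fraction(cycle_seconds: float, seconds_per_url: float, list_size: int) -> float:
    """Fraction of a list with `list_size` URLs that a single cycle can test."""
    return urls_per_cycle(cycle_seconds, seconds_per_url) / list_size


print(urls_per_cycle(90, 3))           # 30 on a healthy connection
print(urls_per_cycle(90, 10))          # 9 on a degraded network at ~10 s per URL
print(coverage_fraction(90, 3, 2000))  # 0.015, i.e. about 1.5% of the global list
```

The degraded-network line shows why the budget is tight: the same cycle covers less than a third as many URLs when each test slows to around 10 seconds.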

We address this with a priority scoring function that combines three signals: traffic rank (via the Tranco top 1M list), category weight, and a country relevance boost for entries that match the probe's country or are globally scoped:

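TRANCO_RANK_SCORE: dict[str, float] = {}  # URL -> 0–1 score from Tranco traffic rank, loaded elsewhere
CATEGORY_WEIGHTS = {'HUMR': 3.0, 'POLR': 3.0, 'LGBT': 3.0, 'GRP': 2.0, 'NEWS': 2.0}  # GRP = social networking

def url_priority(url: str, category: str, country: str, probe_country: str) -> float:
    base = TRANCO_RANK_SCORE.get(url, 0.5)            # unranked URLs get a neutral 0.5
    cat_weight = CATEGORY_WEIGHTS.get(category, 1.0)  # categories not listed run at 1x
    country_boost = 2.0 if country in (probe_country, 'ZZ') else 0.8  # local or global scope
    return base * cat_weight * country_boost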

The CATEGORY_WEIGHTS dict assigns HUMR, POLR, and LGBT a 3× multiplier because these categories show the highest historically confirmed block rates across our verified incident dataset. Social networking and news outlets carry 2×. The remaining categories run at 1×. The country boost gives a URL scoped to the probe's country, or scoped globally, 2.5× the priority of a URL scoped to some other country in the same category (2.0 / 0.8). The Tranco score lifts high-traffic domains: a blocked Facebook matters more than a blocked personal blog, and high-traffic domains are more likely to be independently confirmed by external sources if we do detect a block.
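The post doesn't say how priority scores become a 30-URL batch. Taking the top 30 every cycle would test the same URLs forever and never reach the rest, so this sketch swaps in weighted random sampling without replacement (the Efraimidis-Spirakis method): each candidate draws a key u^(1/w) from its priority w, and the largest keys win. Higher-priority URLs are picked more often, but nothing is starved entirely. The weight table is a subset of the multipliers above and the Tranco scores are invented placeholders.

```python
import heapq
import random
from typing import Optional

# Subset of the multipliers described above; Tranco scores are placeholders.
CATEGORY_WEIGHTS = {"HUMR": 3.0, "POLR": 3.0, "LGBT": 3.0, "NEWS": 2.0}
TRANCO_RANK_SCORE = {"https://www.bbc.com": 0.9}


def url_priority(url: str, category: str, country: str, probe_country: str) -> float:
    base = TRANCO_RANK_SCORE.get(url, 0.5)
    cat_weight = CATEGORY_WEIGHTS.get(category, 1.0)
    country_boost = 2.0 if country in (probe_country, "ZZ") else 0.8
    return base * cat_weight * country_boost


def select_batch(candidates, probe_country: str, budget: int = 30,
                 rng: Optional[random.Random] = None) -> list[str]:
    """Weighted sampling without replacement (Efraimidis-Spirakis A-Res).

    Each (url, category, country) candidate with priority w > 0 gets the key
    u ** (1 / w) for u uniform in [0, 1); the `budget` largest keys are kept.
    """
    if rng is None:
        rng = random.Random()
    keyed = []
    for url, category, country in candidates:
        w = url_priority(url, category, country, probe_country)
        if w > 0:  # zero-priority URLs can never be drawn
            keyed.append((rng.random() ** (1.0 / w), url))
    return [url for _, url in heapq.nlargest(budget, keyed)]
```

For example, select_batch(candidates, "BY", rng=random.Random(7)) returns up to 30 URLs for a Belarusian probe, with a reproducible draw for a fixed seed. Over many cycles, a URL's selection rate rises with its priority instead of being all-or-nothing.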

Per-country supplemental lists

The global list is built for global coverage, which means it misses local context by design. A Belarusian opposition news site won't appear on a list that needs to be meaningful in 200 countries simultaneously. The global list knew about the BBC in 2014; it took years for Meduza to appear. In high-risk countries, the most important URLs to probe are often the ones that aren't famous enough to be on anyone's global radar until after they've already been blocked.

We maintain supplemental lists for 37 countries classified as high-risk based on Freedom House's Freedom on the Net scores, RSF's World Press Freedom Index, and OHCHR country-specific reporting. Supplemental entries come from those organizations directly, from local journalist networks, and from tips submitted by in-country researchers. In authoritarian contexts, POLR and NEWS carry higher baseline risk than our global category weights reflect, so supplemental list entries in those categories receive an additional 1.5× weight on top of the standard category multiplier.
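A sketch of how the extra supplemental weight could combine with the category multiplier. It assumes "on top of" means the two multiply; BASE_WEIGHTS is a subset of the global weights described earlier, and the function name is hypothetical.

```python
# Subset of the global category multipliers described earlier in the post.
BASE_WEIGHTS = {"HUMR": 3.0, "POLR": 3.0, "LGBT": 3.0, "NEWS": 2.0}
SUPPLEMENTAL_BOOST = 1.5
BOOSTED_CATEGORIES = {"POLR", "NEWS"}


def effective_category_weight(category: str, supplemental: bool) -> float:
    """Category multiplier, with the extra 1.5x for POLR/NEWS supplemental entries."""
    weight = BASE_WEIGHTS.get(category, 1.0)
    if supplemental and category in BOOSTED_CATEGORIES:
        weight *= SUPPLEMENTAL_BOOST  # assumed to stack multiplicatively
    return weight


print(effective_category_weight("NEWS", supplemental=True))   # 3.0
print(effective_category_weight("POLR", supplemental=True))   # 4.5
print(effective_category_weight("HUMR", supplemental=True))   # 3.0, no extra boost
print(effective_category_weight("NEWS", supplemental=False))  # 2.0
```

Under this reading, a supplemental NEWS entry ends up level with a global HUMR entry, and supplemental POLR becomes the highest-weighted category.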

We also monitor newly registered domains, including those under high-risk country TLDs (.by, .ru, .kz, .cn, .ir), for names that match patterns associated with independent media or civil society organizations. New registrations that attract significant traffic within 60 days of registration become candidates for supplemental list inclusion. That's how we caught zerkalo.io early: it launched as an emergency replacement for a blocked outlet and hit significant traffic within a week.
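The 60-day candidacy rule can be sketched as a simple filter. The NewDomain record is hypothetical: it assumes some upstream process already decided whether a name matches a media or civil-society pattern and recorded the first day its traffic crossed a "significant" threshold, which the post doesn't quantify.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

TRAFFIC_WINDOW = timedelta(days=60)


@dataclass
class NewDomain:
    """Hypothetical record joining registration data with traffic observations."""
    name: str
    registered: date
    # First day traffic crossed an (unspecified) significance threshold; None if never.
    significant_traffic_on: Optional[date]
    # Whether the name matched a media/civil-society pattern (classifier not shown).
    matches_pattern: bool


def is_supplemental_candidate(d: NewDomain) -> bool:
    """Pattern match plus significant traffic within 60 days of registration."""
    if not d.matches_pattern or d.significant_traffic_on is None:
        return False
    return d.significant_traffic_on - d.registered <= TRAFFIC_WINDOW
```

A candidate flagged this way still goes through human review before it joins a supplemental list; the filter only narrows the queue.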

A supplemental list entry looks like this:

# supplemental/BY.yaml  — Belarus
entries:
  - url: https://zerkalo.io
    category: NEWS
    added: 2021-08-11
    note: "Major independent Belarusian outlet, blocked since Aug 2021 crackdown"
  - url: https://nexta.tv
    category: POLR
    added: 2020-08-10
    note: "Opposition Telegram-native outlet, actively targeted"
  - url: https://spring96.org
    category: HUMR
    added: 2021-09-01
    note: "Viasna Human Rights Center"
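Before a supplemental file ships, a lightweight check can catch malformed entries. This sketch assumes the YAML has already been parsed into dicts (for example with PyYAML's yaml.safe_load, which turns an unquoted 2021-08-11 into a datetime.date); validate_entry and the KNOWN_CATEGORIES subset are hypothetical.

```python
from datetime import date

REQUIRED_KEYS = {"url", "category", "added", "note"}
# Illustrative subset of Citizen Lab category codes, not the full legend.
KNOWN_CATEGORIES = {"NEWS", "POLR", "HUMR", "LGBT", "GRP", "COMT", "SRCH",
                    "MMED", "FILE", "HOST", "ANON"}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems with one parsed supplemental-list entry."""
    problems = []
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    url = str(entry.get("url", ""))
    if not url.startswith(("http://", "https://")):
        problems.append(f"url must be absolute http(s): {url!r}")
    if entry.get("category") not in KNOWN_CATEGORIES:
        problems.append(f"unknown category: {entry.get('category')!r}")
    added = entry.get("added")
    if not isinstance(added, date):  # YAML loaders may already yield a date
        try:
            date.fromisoformat(str(added))
        except ValueError:
            problems.append(f"added is not an ISO date: {added!r}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a reviewer fix a whole file in one pass.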

Quarterly curation and emergency additions

The core list goes through a structured review every quarter. The review covers three kinds of changes: removing dead domains (DNS NXDOMAIN for 90+ days), updating URLs for outlets that changed TLD or domain, and retiring domains that have been unblocked for 12 or more consecutive months. Unblocking is rarer than blocking, and when it happens it's often the result of a change in government or a successful legal challenge. Still, keeping long-unblocked domains on the active list wastes probe budget and dilutes country-level block rate statistics.
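The three quarterly review rules can be expressed as one decision function. This is a sketch: DomainStatus is a hypothetical per-domain summary, and "12 consecutive months" is approximated as 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

DEAD_AFTER = timedelta(days=90)
RETIRE_AFTER = timedelta(days=365)  # approximation of 12 consecutive months


@dataclass
class DomainStatus:
    """Hypothetical per-domain summary fed into the quarterly review."""
    url: str
    nxdomain_since: Optional[date] = None   # start of the current NXDOMAIN streak
    unblocked_since: Optional[date] = None  # start of the current unblocked streak after a block
    moved_to: Optional[str] = None          # new URL if the outlet changed domain or TLD


def review_action(s: DomainStatus, today: date) -> str:
    # A move is checked first: an old domain often goes NXDOMAIN after the outlet
    # relocates, and in that case it should be updated rather than dropped.
    if s.moved_to:
        return f"update -> {s.moved_to}"
    if s.nxdomain_since and today - s.nxdomain_since >= DEAD_AFTER:
        return "remove"
    if s.unblocked_since and today - s.unblocked_since >= RETIRE_AFTER:
        return "retire"
    return "keep"
```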

Quarterly cadence works for stable conditions. It doesn't work when a coup happens on a Tuesday. We maintain an emergency addition pipeline that can push a list update to all probes within 6 hours of a triggering event. The pipeline is gated on editorial review — we don't add domains automatically from a news feed — but the review process is compressed to a single approver rather than the full curation committee.
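The gating rule can be sketched as an approval check plus a turnaround check. COMMITTEE_QUORUM is a hypothetical number (the post doesn't give the committee's size), and the 6-hour window comes from the text above. Note that zero approvals never publishes, matching the rule that domains are not added automatically.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

EMERGENCY_SLA = timedelta(hours=6)
COMMITTEE_QUORUM = 3  # hypothetical: the post doesn't state the committee size


@dataclass
class ListChange:
    domains: list[str]
    emergency: bool
    approvers: set[str] = field(default_factory=set)


def can_publish(change: ListChange) -> bool:
    """Emergency changes need one approver; quarterly changes need the committee."""
    required = 1 if change.emergency else COMMITTEE_QUORUM
    return len(change.approvers) >= required


def within_sla(triggered_at: datetime, pushed_at: datetime) -> bool:
    """True if an emergency update reached probes within the 6-hour target."""
    return pushed_at - triggered_at <= EMERGENCY_SLA
```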

Two historical examples illustrate the cadence. When the Myanmar military staged its coup in February 2021, we identified 14 domains associated with independent Burmese media and civil society, reviewed them, and pushed the list update within 4 hours of the first reports of network interference. When the Mahsa Amini protests began in Iran in September 2022, we added 11 domains within 12 hours of the protests becoming visible in our anomaly feed — by which point we were already seeing confirmed blocks on Instagram and WhatsApp. In both cases, having the relevant domains on-list before the censorship peaked meant we captured the full escalation arc rather than discovering the interference after the fact.

Every version of the list is tagged in git with a UTC timestamp and a human-readable description of the triggering event. The git history is the audit trail: you can reconstruct exactly which domains were being probed on any given day, which matters when journalists ask why we were or weren't measuring a specific domain during a specific event.
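Answering "which list was active on day X" reduces to finding the latest tag at or before that moment. A sketch, assuming a hypothetical tag-name format and timezone-aware timestamps; in practice the event description could live in an annotated tag's message (git tag -a NAME -m "DESCRIPTION").

```python
import bisect
from datetime import datetime, timezone
from typing import Optional


def tag_name(ts: datetime, slug: str) -> str:
    """Hypothetical tag format: list-YYYYMMDDTHHMMSSZ-<slug>. Expects an aware datetime."""
    return f"list-{ts.astimezone(timezone.utc):%Y%m%dT%H%M%SZ}-{slug}"


def active_tag(tags: list[tuple[datetime, str]], at: datetime) -> Optional[str]:
    """Return the most recent tag created at or before `at`, or None if none exists."""
    ordered = sorted(tags)
    times = [t for t, _ in ordered]
    i = bisect.bisect_right(times, at)
    return ordered[i - 1][1] if i else None
```

For example, tag_name(datetime(2025, 1, 1, 4, 12, tzinfo=timezone.utc), "example-event") yields "list-20250101T041200Z-example-event". Given the tag for a date, the list contents at that point can be read back from the repository at that tag.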

The test list as a political document

What's on the test list reveals what the internet freedom measurement community considers worth monitoring. That's a political judgment, not a neutral technical one. The category codes encode implicit theories about what kinds of content governments censor and what kinds of censorship matter enough to track. The decision to weight HUMR and POLR at 3× and GAME at 1× reflects a value judgment about whose speech is most at risk.

What's not on the list is equally consequential. Some governments have pressured organizations — through legal action, through diplomatic channels, or through direct threats to in-country researchers — to remove their domains from published test lists. A domain that isn't probed for can't generate a verified incident. The absence of a domain from the list is therefore itself a data point, though a hard one to study systematically.

We publish Voidly's supplemental lists under CC BY 4.0. Every country-specific list is reviewed by at least one regional civil society partner before publication — typically an organization with direct knowledge of the country's media environment — both to catch errors and to ensure that our curation decisions reflect the expertise of people with real stakes in the outcome.

The coverage problem is structural and worth being honest about: even the most comprehensive curated test list captures only a fraction of what's blocked in countries like China or Iran. Both use pervasive keyword and IP-level filtering that blocks millions of URLs by pattern rather than by specific domain. Our measurement approach — probing a discrete list of URLs — is well-suited to detecting targeted censorship of specific outlets and platforms, but it systematically understates the total censorship surface in countries that block by IP range or deep-packet inspection. The test list is the right tool for detecting that bbc.com is blocked in a given country. It is the wrong tool for estimating how much of the open internet is inaccessible there.

Related reading:
Voidly probe vantage selection: ASN diversity, operator safety, and reaching hard-to-measure countries
How Voidly's probe measures each URL in this list
How measurements are classified into interference types
How Voidly reconciles results across OONI, CensoredPlanet, and IODA