
Voidly's alert delivery system: PGP-encrypted email, webhooks, and RSS for censorship incidents

December 28, 2024 · 9 min read · AI Analytics

Tags: Censorship, Voidly, Infrastructure, Real-time systems

Detecting a censorship incident and delivering an alert about it are two separate engineering problems. The detection pipeline — probe measurements, anomaly scoring, cross-source corroboration, confidence-tier crossing — has been covered in earlier posts. This one covers what happens after: how the alert reaches a journalist in Tehran with a slow VPN connection, a researcher's TimescaleDB instance pulling a nightly export, and an NGO's Slack channel monitoring election-period blocking in real time.

The core constraint is that different subscribers need different delivery contracts. A journalist chasing a story needs a push notification within 2 minutes of VERIFIED confidence. A compliance team watching a specific domain needs a webhook they can replay. An academic researcher wants RSS sorted by country. Getting all three right from a single internal event stream required building four delivery paths that each make their own reliability guarantees.

Subscriber model

Every alert recipient is a Subscriber record with a filter specification and a delivery configuration. Filters are evaluated against each incident event before routing.

-- Subscriber table
CREATE TABLE alert_subscribers (
  subscriber_id  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  api_key_hash   TEXT NOT NULL,           -- bcrypt of the API key
  delivery_type  TEXT NOT NULL,           -- 'webhook' | 'email' | 'rss_only'
  endpoint       TEXT,                    -- webhook URL or email address
  pgp_key_id     TEXT,                    -- for encrypted email delivery

  -- Filter criteria (NULL = match all)
  country_codes  TEXT[],                  -- e.g. {IR, CN, RU}
  interference_types TEXT[],             -- {dns_tamper, http_blocking, ...}
  domain_categories TEXT[],              -- {news_media, social_media, ...}
  min_confidence TEXT NOT NULL           -- 'ANOMALY' | 'CORROBORATED' | 'VERIFIED'
    DEFAULT 'CORROBORATED',
  domains        TEXT[],                  -- specific domain whitelist

  -- Delivery metadata
  hmac_secret    BYTEA,                   -- for webhook signing
  rate_limit_per_hour INT NOT NULL DEFAULT 20,
  active         BOOLEAN NOT NULL DEFAULT true,
  created_at     TIMESTAMPTZ NOT NULL DEFAULT now()
);

The filter evaluation runs in the alert router before any delivery attempt, keeping delivery workers simple. The router evaluates country_codes, interference_types, domain_categories, and domains with AND logic across filter groups: a subscriber asking for country_codes = {IR} and interference_types = {dns_tamper} only receives alerts that match both predicates.

def matches_filter(subscriber: Subscriber, incident: IncidentEvent) -> bool:
    if subscriber.country_codes and incident.country_code not in subscriber.country_codes:
        return False
    if subscriber.interference_types and incident.interference_type not in subscriber.interference_types:
        return False
    if subscriber.domain_categories and incident.domain_category not in subscriber.domain_categories:
        return False
    if subscriber.domains and incident.domain not in subscriber.domains:
        return False

    # Confidence tier ordering
    tier_order = {'ANOMALY': 0, 'CORROBORATED': 1, 'VERIFIED': 2}
    if tier_order[incident.confidence_tier] < tier_order[subscriber.min_confidence]:
        return False

    return True

Webhook delivery

Webhook delivery is the highest-priority path. After tier threshold crossing in the real-time pipeline, matched webhook subscribers receive a signed HTTP POST within 30 seconds. The payload schema is stable across all event types: only event_type and the data envelope change.

# Webhook payload (application/json)
{
  "event_type": "incident_verified",   # or incident_corroborated, incident_resolved
  "incident_id": "b3a7f...",
  "idempotency_key": "sha256:abc123",  # hash of (incident_id + event_type + confidence_tier)
  "timestamp": "2024-12-28T14:23:11Z",
  "data": {
    "incident_id": "b3a7f...",
    "country_code": "IR",
    "domain": "bbc.com",
    "interference_type": "dns_tamper",
    "confidence_tier": "VERIFIED",
    "p_blocked": 0.96,
    "started_at": "2024-12-28T12:01:00Z",
    "verified_at": "2024-12-28T14:22:50Z",
    "measurement_count": 34,
    "probe_count": 6,
    "asn_count": 4,
    "sources": ["voidly", "ooni"],
    "api_url": "https://voidly.ai/api/v1/incidents/b3a7f..."
  }
}
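
The idempotency_key comment above describes a hash over three fields. A minimal sketch of how such a key could be derived follows; the "|" separator and UTF-8 encoding are assumptions for illustration, since the post does not specify how the fields are concatenated.

```python
import hashlib


def idempotency_key(incident_id: str, event_type: str, confidence_tier: str) -> str:
    """Derive a stable key from the three fields named in the payload comment.

    Assumption: fields are joined with "|" and encoded as UTF-8 before hashing.
    The real concatenation rule is not documented in this post.
    """
    material = "|".join((incident_id, event_type, confidence_tier)).encode("utf-8")
    return "sha256:" + hashlib.sha256(material).hexdigest()
```

Because the key depends only on the incident and its tier transition, every retry of the same event produces the same value, which is what lets a receiver discard duplicates.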

Every POST includes an X-Voidly-Signature header computed as HMAC-SHA256 over the raw request body using the subscriber-specific secret:

import hmac, hashlib

def sign_payload(body: bytes, secret: bytes) -> str:
    mac = hmac.new(secret, body, hashlib.sha256)
    return f"sha256={mac.hexdigest()}"

# Header: X-Voidly-Signature: sha256=abc123...

Subscribers verify the signature before processing. The signature covers the raw body bytes before JSON parsing, so any in-transit modification is detectable. We include an X-Voidly-Delivery-Id UUID per POST for debugging duplicate deliveries; the idempotency_key in the body is for subscriber-side deduplication and remains stable across retries.
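
On the receiving side, verification and deduplication might look like the sketch below. The function names and the in-memory set are hypothetical; a real receiver would persist seen keys in durable storage.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, header_value: str, secret: bytes) -> bool:
    """Recompute the HMAC over the raw bytes and compare in constant time."""
    expected = "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))


def should_process(idempotency_key: str, seen_keys: set) -> bool:
    """Return True the first time a key is seen, False for redelivered events.

    Assumption: an in-memory set stands in for whatever store the receiver uses.
    """
    if idempotency_key in seen_keys:
        return False
    seen_keys.add(idempotency_key)
    return True
```

The signature check has to run on the bytes exactly as received; parsing and re-serializing the JSON first can reorder keys or change whitespace and make a valid signature fail.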

Retry schedule

Delivery workers treat any non-2xx HTTP response or TCP error as a failure and re-queue the event with an increasing backoff. After four failed attempts, the event moves to a dead-letter queue and the subscriber receives an email notification.

Attempt	Delay	Cumulative wait
1 (initial)	—	0s
2	30s	30s
3	5 min	5m 30s
4	20 min	25m 30s
dead-letter	—	25m 30s + DLQ email

The 30s first retry handles transient HTTP 502/503 during subscriber deploys without generating noise. The 20-minute final attempt catches scenarios where a subscriber is doing a longer maintenance window. We do not retry beyond four attempts; at that point the subscriber's endpoint is likely broken in a way that further retries will not fix, and delivery resumes with the next incident once the endpoint recovers.
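
The schedule in the table can be expressed as a small lookup. This is a sketch with hypothetical names, not the delivery worker's actual code; only the delay values come from the table.

```python
from itertools import accumulate
from typing import List, Optional

# Delays before attempts 2, 3 and 4, taken from the retry table.
RETRY_DELAYS_S = [30, 5 * 60, 20 * 60]


def cumulative_waits(delays: List[int]) -> List[int]:
    """Seconds elapsed since the initial attempt, at each attempt."""
    return [0, *accumulate(delays)]


def next_delay(failed_attempts: int) -> Optional[int]:
    """Delay before the next attempt, or None when the event is dead-lettered."""
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    index = failed_attempts - 1
    return RETRY_DELAYS_S[index] if index < len(RETRY_DELAYS_S) else None
```

cumulative_waits(RETRY_DELAYS_S) gives 0, 30, 330 and 1530 seconds, which matches the 0s, 30s, 5m 30s and 25m 30s column in the table.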

PGP-encrypted email

Journalists operating in high-risk environments often cannot use webhook receivers. The email delivery path is designed for subscribers who explicitly request it and have uploaded a PGP public key. All VERIFIED tier alerts are delivered as PGP/MIME encrypted messages (RFC 3156). CORROBORATED alerts are delivered as plaintext with a disclaimer, because encrypting time-sensitive pre-verification notifications to a key that may not be immediately accessible defeats the purpose.

From: alerts@voidly.ai
To: <recipient>
Subject: [VERIFIED] DNS tampering — Iran, bbc.com — Voidly incident b3a7f
Content-Type: multipart/encrypted; protocol="application/pgp-encrypted"

--boundary
Content-Type: application/pgp-encrypted
Version: 1

--boundary
Content-Type: application/octet-stream

-----BEGIN PGP MESSAGE-----
(encrypted MIME body containing incident JSON + readable summary)
-----END PGP MESSAGE-----

The email subject line is designed to be useful even in a client that doesn't render the encrypted body: it includes the confidence tier, interference type, country, domain, and incident ID so the recipient knows whether to decrypt immediately or defer. We deliberately avoid including measurement details in the subject to prevent metadata leakage if the email is intercepted.
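
The two-part PGP/MIME layout shown above can be assembled with Python's standard email package. The sketch below assumes the ciphertext has already been produced and ASCII-armored elsewhere; build_pgp_mime and its parameters are hypothetical names, and the encryption step itself is out of scope.

```python
from email import encoders
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def build_pgp_mime(armored_ciphertext: str, sender: str,
                   recipient: str, subject: str) -> MIMEMultipart:
    """Wrap an already-encrypted, ASCII-armored body in an RFC 3156 container."""
    msg = MIMEMultipart("encrypted", protocol="application/pgp-encrypted")
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject

    # Part 1: the control part, which carries only the version line.
    control = MIMEApplication("Version: 1\n", _subtype="pgp-encrypted",
                              _encoder=encoders.encode_7or8bit)
    # Part 2: the armored ciphertext. 7bit encoding keeps the armor readable
    # instead of base64-wrapping it a second time.
    payload = MIMEApplication(armored_ciphertext, _subtype="octet-stream",
                              _encoder=encoders.encode_7or8bit)
    msg.attach(control)
    msg.attach(payload)
    return msg
```

Only the body is protected by this structure. The From, To and Subject headers travel in the clear, which is why the subject line is kept to the fields listed above.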

PGP key management uses a local keyring populated from subscriber-uploaded armored public keys. We do not use a keyserver network for lookups — requiring an outbound keyserver query before sending an alert creates a dependency that can fail at exactly the wrong moment. Keys are validated at upload time and cached for 90 days before requiring subscriber re-upload.

RSS and Atom feeds

RSS feeds are generated per-country and per-confidence-tier. A researcher watching Iran can subscribe to https://voidly.ai/feed/ir/verified.xml and receive only VERIFIED incidents for that country. The global feed at /feed/global/corroborated.xml covers all 200 countries at CORROBORATED and above.

Feed generation runs on a 60-second cadence triggered by the event queue consumer. Each new event writes to the in-memory feed buffer, which is flushed to Cloudflare KV (TTL 300s) and served as static XML with appropriate cache headers. Feed items contain the full incident JSON in a CDATA block inside <description> alongside the human-readable summary:

<item>
  <title>[VERIFIED] Iran — DNS tampering — bbc.com</title>
  <link>https://voidly.ai/incidents/b3a7f</link>
  <pubDate>Sat, 28 Dec 2024 14:23:11 +0000</pubDate>
  <guid isPermaLink="true">https://voidly.ai/incidents/b3a7f</guid>
  <category>dns_tamper</category>
  <category>IR</category>
  <description><![CDATA[
    34 measurements across 6 probes in 4 ASNs confirm DNS tampering for bbc.com
    in Iran. Corroborated by OONI. Started 2024-12-28T12:01:00Z.
    {"incident_id":"b3a7f",...}
  ]]></description>
</item>

The JSON embed lets API clients scrape the feed instead of polling the REST API, which is useful for researchers who already have RSS infrastructure. We also publish Atom 1.0 versions of all feeds at the same paths with an .atom extension; Atom is better suited for machine consumption because it has a mandatory <updated> element and a well-defined namespace for extensions.
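
A feed consumer could pull the embedded JSON out of an item roughly as follows. The sketch assumes the JSON object sits on its own line inside <description>, as in the item above; the sample payload is abbreviated and hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample item; the JSON here is a shortened stand-in for the full incident.
SAMPLE_ITEM = """<item>
  <title>[VERIFIED] Iran - DNS tampering - bbc.com</title>
  <description><![CDATA[
    34 measurements across 6 probes in 4 ASNs confirm DNS tampering for bbc.com
    in Iran.
    {"incident_id": "b3a7f", "country_code": "IR", "confidence_tier": "VERIFIED"}
  ]]></description>
</item>"""


def extract_incident_json(item_xml: str) -> dict:
    """Return the JSON object embedded in an item's <description>."""
    item = ET.fromstring(item_xml)
    description = item.findtext("description", default="")
    # Assumption: the JSON occupies a single line; scan from the end to find it.
    for line in reversed(description.splitlines()):
        candidate = line.strip()
        if candidate.startswith("{"):
            return json.loads(candidate)
    raise ValueError("no JSON payload found in <description>")
```

The XML parser returns CDATA content as ordinary text, so no special handling is needed beyond locating the JSON line.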

Alert deduplication

A single censorship incident can generate multiple events as its confidence tier advances: it starts as ANOMALY, may cross CORROBORATED, and eventually reaches VERIFIED. A subscriber filtered to CORROBORATED-and-above should receive at most two alerts per incident (one for CORROBORATED, one for VERIFIED) — not one per measurement that crossed a threshold.

CREATE TABLE alert_delivery_log (
  log_id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  subscriber_id  UUID NOT NULL REFERENCES alert_subscribers(subscriber_id),
  incident_id    TEXT NOT NULL,
  event_type     TEXT NOT NULL,   -- 'incident_corroborated' | 'incident_verified' | ...
  delivered_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
  delivery_type  TEXT NOT NULL,
  UNIQUE (subscriber_id, incident_id, event_type)  -- deduplicate per tier
);

Before routing an event to a subscriber, the router checks the delivery log using the (subscriber_id, incident_id, event_type) unique index. An alert is only sent if no matching log entry exists. This prevents re-alerting when retried events re-enter the queue and guarantees at-most-once delivery per (incident, event_type, subscriber) triple.
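
One way to make that check race-free is to let the unique constraint decide: insert first and treat a rejected insert as "already delivered". The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL so that it runs anywhere; in PostgreSQL the equivalent statement would end in ON CONFLICT DO NOTHING. The function names are hypothetical, and this is not necessarily how the router implements the check.

```python
import sqlite3


def open_log() -> sqlite3.Connection:
    """Create an in-memory table with the same unique constraint as the delivery log."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE alert_delivery_log (
               subscriber_id TEXT NOT NULL,
               incident_id   TEXT NOT NULL,
               event_type    TEXT NOT NULL,
               UNIQUE (subscriber_id, incident_id, event_type)
           )"""
    )
    return conn


def claim_delivery(conn: sqlite3.Connection, subscriber_id: str,
                   incident_id: str, event_type: str) -> bool:
    """Return True only for the first claim of a (subscriber, incident, event) triple."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO alert_delivery_log "
        "(subscriber_id, incident_id, event_type) VALUES (?, ?, ?)",
        (subscriber_id, incident_id, event_type),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted and 0 when the constraint ignored it.
    return cur.rowcount == 1
```

With this pattern, two router replicas handling the same event cannot both see "no entry" and both send, because only one insert succeeds.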

The 12-hour RESOLVED_PENDING state described in the incident resolution article creates a subtle edge case: if an incident transitions RESOLVED → RESOLVED_PENDING → VERIFIED within 12 hours, the event_type is incident_reopened rather than incident_verified, so the deduplication index allows a second alert. This is intentional: a journalist who already filed a story on the "resolved" incident needs to know that the resolution did not hold.

Rate limiting

Alert fatigue is the primary reason monitoring tools get turned off. We enforce rate limits at the subscriber level to prevent a noisy period (election crackdown, large-scale BGP withdrawal) from flooding delivery channels.

Limit scope	Default	Override
Per subscriber (all countries)	20 alerts/hour	Up to 200/hr on request
Per subscriber × country	5 alerts/hour	No override
Per subscriber × incident_id	1 per tier transition	Fixed (dedup log)
BGP withdrawal global	1 alert/30 min per country	No override

Rate limits use a sliding window counter in Redis. When a subscriber hits the hourly limit, queued alerts for that subscriber are held for up to 60 minutes before being retried; they are not dropped. A subscriber who receives 20 alerts in a burst therefore receives the remaining alerts as earlier deliveries age out of the window, and each alert carries the timestamp at which the incident was detected, not the time the alert was delivered.
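
The sliding-window behaviour can be illustrated without Redis. The sketch below keeps a log of send timestamps in memory, which is a simplified stand-in for the Redis counter described above; the class name and its interface are hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` sends within any trailing `window_s` seconds."""

    def __init__(self, limit: int, window_s: float = 3600.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._sent = deque()  # timestamps of sends still inside the window

    def try_acquire(self, now: float) -> bool:
        # Drop sends that have aged out of the trailing window.
        while self._sent and self._sent[0] <= now - self.window_s:
            self._sent.popleft()
        if len(self._sent) >= self.limit:
            return False  # caller holds the alert and retries later
        self._sent.append(now)
        return True
```

A held alert becomes deliverable as soon as the oldest send in the window expires, so capacity returns gradually rather than all at once on a clock boundary.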

BGP withdrawal escalation

BGP withdrawal events that reach VERIFIED confidence get a separate escalation path: in addition to normal delivery, they trigger a Slack message to an on-call channel monitored by the Voidly team and available to subscribers who opt in. BGP withdrawal at the VERIFIED tier means an entire country has disappeared from the routing table, which is almost always a major news event. The escalation has a 2-minute cool-down per country to prevent routing table oscillation from generating a flood.

def should_escalate_bgp(incident: IncidentEvent, redis_client) -> bool:
    if incident.interference_type != 'bgp_withdrawal':
        return False
    if incident.confidence_tier != 'VERIFIED':
        return False
    # 2-minute cool-down per country. SET with NX and EX is a single atomic
    # command, so two router replicas cannot both pass a separate GET check
    # and escalate the same country twice.
    key = f"bgp_escalation:{incident.country_code}"
    return bool(redis_client.set(key, "1", nx=True, ex=120))

Latency budget

The end-to-end latency from tier threshold crossing to webhook delivery has the following component budget at p50 and p99:

Stage	p50	p99
Threshold crossing → event queue	<100ms	<500ms
Event queue → router (Kafka consumer)	1.2s	4.8s
Router filter evaluation	8ms	22ms
Dedup log check (PostgreSQL)	3ms	11ms
Webhook HTTP POST (subscriber)	280ms	2.1s
Total (webhook path)	~1.6s	~7.5s
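
The totals can be reproduced by summing the rows. The stage keys below are shorthand labels introduced for this sketch, and the first stage uses its upper bounds (100 ms and 500 ms).

```python
# (p50_ms, p99_ms) per stage, copied from the table above.
BUDGET_MS = {
    "threshold_to_queue": (100, 500),
    "queue_to_router": (1200, 4800),
    "filter_eval": (8, 22),
    "dedup_check": (3, 11),
    "webhook_post": (280, 2100),
}

p50_total_ms = sum(p50 for p50, _ in BUDGET_MS.values())
p99_total_ms = sum(p99 for _, p99 in BUDGET_MS.values())

print(f"p50 total: {p50_total_ms / 1000:.2f}s")
print(f"p99 total: {p99_total_ms / 1000:.2f}s")
```

The sums come to 1591 ms and 7433 ms, in line with the rounded ~1.6s and ~7.5s totals. Percentiles do not add exactly, so the summed p99 is best read as a planning budget rather than a measured end-to-end p99.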

The component we cannot control is the webhook POST latency to the subscriber's endpoint. We set a 10-second timeout on each POST attempt; endpoints that regularly exceed 5 seconds get a warning in the subscriber dashboard. Kafka consumer lag (event queue → router) is the largest component in the budget and the one we can act on; we autoscale alert router replicas when lag exceeds 5 minutes.

Email delivery adds 60–120 seconds of SMTP queue latency on top of the webhook budget, making the total probe-to-inbox time roughly 8–10 minutes for VERIFIED incidents when accounting for the earlier pipeline stages. RSS feeds update within 60 seconds of the event being processed by the router.