
Voidly's Server-Sent Events streaming API: real-time censorship incident subscriptions

January 13, 2025 · 8 min read · AI Analytics

Censorship · Voidly · API design · Infrastructure

The Voidly REST API is good for point-in-time queries — fetching active incidents, historical measurements, country summaries, domain histories. But journalists and monitoring systems that need to react to censorship events as they happen need something different: a connection that stays open and pushes events as they are verified. Voidly exposes this through Server-Sent Events at GET /v1/stream.

Why SSE instead of WebSockets

SSE is unidirectional — server to client — which matches the use case exactly. There is nothing for the client to send: subscribers want to receive a stream of verified censorship events, not engage in a bidirectional conversation. WebSockets add bidirectional overhead and require more complex connection management on both sides for no benefit in this context.

SSE works over plain HTTP/1.1 and HTTP/2, reuses existing Authorization headers without any additional handshake, and has native browser support via the EventSource API. For reconnection after a dropped connection, SSE's Last-Event-ID header provides reliable event replay without any client-side bookkeeping beyond storing the last received event ID. The browser EventSource implementation sends this header automatically on every reconnect, so you do not need to implement reconnection logic in a browser context.

Endpoint and filtering parameters

The stream endpoint accepts a bearer token in the standard Authorization header and requires Accept: text/event-stream:

GET /v1/stream
Authorization: Bearer <api_key>
Accept: text/event-stream

All filtering is done via query parameters on the initial request, and the server applies these filters for the lifetime of the connection. Because the filters are part of the URL, reconnecting to the same URL reapplies them; there is nothing extra to re-send. Supported parameters:

country_code — ISO 3166-1 alpha-2, comma-separated for multiple countries (e.g. CN,IR,RU). Omit to receive events from all countries.
confidence_tier — anomaly, corroborated, or verified. Default: verified. Setting this to anomaly includes all three tiers; setting it to corroborated includes corroborated and verified.
interference_type — one of dns_tamper, tls_interference, http_blocking, bgp_withdrawal, or throttling. Comma-separated for OR logic. Omit to receive all interference types.
domain_category — OONI category code: NEWS, SMG (social media), HUMR (human rights), POLR (political), and others from the OONI test list taxonomy. Comma-separated for OR logic.
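Because the filters live entirely in the query string, a client can build the URL once and reuse it on every reconnect. A minimal sketch, assuming the base URL used by the client examples later in this article; build_stream_url, TIERS, and INTERFERENCE_TYPES are hypothetical names, not part of any Voidly SDK:

```python
from urllib.parse import urlencode

# Allowed values copied from the parameter list above.
TIERS = ("anomaly", "corroborated", "verified")
INTERFERENCE_TYPES = {"dns_tamper", "tls_interference", "http_blocking",
                      "bgp_withdrawal", "throttling"}


def build_stream_url(countries=(), tier="verified", interference_types=(),
                     categories=(), base="https://api.voidly.ai/v1/stream"):
    """Build a /v1/stream URL; omitted filters mean 'all'."""
    if tier not in TIERS:
        raise ValueError(f"unknown confidence_tier: {tier!r}")
    unknown = set(interference_types) - INTERFERENCE_TYPES
    if unknown:
        raise ValueError(f"unknown interference_type: {sorted(unknown)}")
    params = {"confidence_tier": tier}
    # Only include non-empty filters, since omitting one means "all values".
    if countries:
        params["country_code"] = ",".join(c.upper() for c in countries)
    if interference_types:
        params["interference_type"] = ",".join(interference_types)
    if categories:
        params["domain_category"] = ",".join(categories)
    # safe="," keeps the comma-separated lists readable instead of %2C.
    return f"{base}?{urlencode(params, safe=',')}"


print(build_stream_url(["cn", "ir"], interference_types=["dns_tamper"]))
```

Validating values on the client side is optional; it simply surfaces a typo before the connection is opened rather than after.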

A connection filtered to country_code=CN,IR and confidence_tier=verified will only push events for incidents in China or Iran that have reached the verified confidence tier. Incidents that are created at the anomaly tier and later promoted to verified trigger an incident_updated event at the point of promotion, not an incident_created event retroactively.
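The tier rules above amount to a minimum-threshold check. This sketch models the behaviour described in this section rather than Voidly's server code; TIER_RANK, is_delivered, and event_on_change are hypothetical names:

```python
# Higher rank means higher confidence.
TIER_RANK = {"anomaly": 0, "corroborated": 1, "verified": 2}


def is_delivered(subscribed_tier: str, incident_tier: str) -> bool:
    """confidence_tier acts as a minimum: higher tiers are always included."""
    return TIER_RANK[incident_tier] >= TIER_RANK[subscribed_tier]


def event_on_change(subscribed_tier: str, old_tier, new_tier: str):
    """Event a subscriber would see when an incident appears or changes tier.

    old_tier is None for a brand-new incident. Returns None if the incident
    is still below the subscriber's threshold.
    """
    if not is_delivered(subscribed_tier, new_tier):
        return None
    if old_tier is None:
        return "incident_created"
    # Crossing into (or moving within) the subscribed range is reported
    # as an update, never as a late creation.
    return "incident_updated"
```

For a verified subscriber, an incident first seen at anomaly produces nothing until it is promoted, at which point event_on_change("verified", "anomaly", "verified") yields incident_updated.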

Event types and SSE wire format

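The SSE wire format uses the standard fields id, event, and data. Each event's data field is a single-line JSON object, and events are separated by blank lines. Voidly emits four event types: the three incident events are shown below alongside the keepalive comment, and the fourth, rate_limited, is described under rate limits: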

# New incident created
id: 8f3a1c2d
event: incident_created
data: {"incident_id":"inc_8f3a1c2d","country_code":"IR","domain":"bbc.com","interference_type":"dns_tamper","confidence_tier":"corroborated","p_blocked":0.87,"started_at":"2025-01-13T14:22:11Z"}

# Incident promoted to verified
id: 8f3a1c2e
event: incident_updated
data: {"incident_id":"inc_8f3a1c2d","confidence_tier":"verified","corroborating_sources":["ooni","censoredplanet"],"updated_at":"2025-01-13T14:47:33Z"}

# Incident resolved
id: 8f3a1c2f
event: incident_resolved
data: {"incident_id":"inc_8f3a1c2d","resolved_at":"2025-01-13T22:15:00Z","duration_hours":7.88}

# Keepalive (every 60s)
: keepalive 2025-01-13T15:00:00Z
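To make the wire format concrete, here is a minimal parser for a decoded chunk of the stream. It follows the SSE rules that fields accumulate until a blank line dispatches the event and that lines starting with : are comments. parse_sse is a hypothetical helper for illustration, not a complete implementation of the specification (it ignores the retry: field, for instance):

```python
import json


def parse_sse(text: str):
    """Parse decoded SSE text into (event_id, event_type, payload) tuples."""
    events = []
    event_id, event_type, data = None, "message", []
    for line in text.splitlines():
        if not line:  # a blank line dispatches the buffered event
            if data:
                payload = json.loads("\n".join(data))
                events.append((event_id, event_type, payload))
            event_id, event_type, data = None, "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, such as the keepalive
        field, _, value = line.partition(":")
        # The spec strips exactly one leading space from the value.
        value = value[1:] if value.startswith(" ") else value
        if field == "id":
            event_id = value
        elif field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
        # Any other field name (e.g. a "# ..." annotation line) is ignored.
    return events
```

Run against the sample above, the keepalive produces no event, and an event without an id: line (such as the synthetic rate_limited event) comes back with event_id set to None.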

The incident_created event fires when a new incident is first recorded at or above the subscribed confidence tier. incident_updated fires on tier promotions (anomaly to corroborated, corroborated to verified) and also when the corroborating_sources list grows, for example when IODA confirms an event that OONI has already verified. incident_resolved fires once 2 hours pass after the incident's last matching measurement without a new one arriving. The duration_hours field in the resolved event is resolved_at - started_at, expressed in fractional hours.
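The duration_hours arithmetic can be checked against the sample events above. A minimal sketch; rounding to two decimals is an assumption based on the 7.88 in the sample payload:

```python
from datetime import datetime


def duration_hours(started_at: str, resolved_at: str) -> float:
    """resolved_at - started_at in fractional hours, rounded to 2 places."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so replace it with an explicit UTC offset for older versions.
    start = datetime.fromisoformat(started_at.replace("Z", "+00:00"))
    end = datetime.fromisoformat(resolved_at.replace("Z", "+00:00"))
    return round((end - start).total_seconds() / 3600, 2)


print(duration_hours("2025-01-13T14:22:11Z", "2025-01-13T22:15:00Z"))  # 7.88
```

The gap is 7 hours 52 minutes 49 seconds, or 28,369 seconds, which is 7.88 hours after rounding.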

The keepalive line is an SSE comment (a line beginning with :) and is not delivered to EventSource message handlers. Its purpose is to prevent load balancers and proxies from closing idle connections before any events arrive. Keepalives are sent every 60 seconds. If your connection receives no events and no keepalives for more than 90 seconds, treat the connection as stale and reconnect. Because EventSource hides comments from application code, this check applies to clients that parse the raw stream.
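A client that parses the raw stream can apply the 90-second rule with a small watchdog. A sketch, assuming a monotonic clock; StaleWatchdog is a hypothetical name. Something still has to call is_stale() while a read is blocked, such as a timer thread; alternatively, the same rule can be enforced with a 90-second read timeout on the HTTP client:

```python
import time


class StaleWatchdog:
    """Flags a stream as stale after too long without any received line."""

    def __init__(self, stale_after: float = 90.0, clock=time.monotonic):
        self.stale_after = stale_after
        self.clock = clock  # injectable so the logic can be tested
        self.last_activity = clock()

    def touch(self) -> None:
        """Call for every line received, including ':' keepalive comments."""
        self.last_activity = self.clock()

    def is_stale(self) -> bool:
        """True once silence exceeds stale_after seconds."""
        return self.clock() - self.last_activity > self.stale_after
```

Counting comment lines as activity is the point of the design: a quiet but healthy connection still sees a keepalive every 60 seconds, so only a dead one crosses the 90-second threshold.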

Reconnection and event replay

If a client disconnects and reconnects with the Last-Event-ID header set, the server replays all retained events with IDs greater than the supplied value. Replay comes from an in-memory ring buffer that holds up to 24 hours of events, capped at 10,000 events per connection filter, where a filter is one combination of country_code, confidence_tier, and interference_type. Reconnecting with the same parameters and the same Last-Event-ID replays the correct subset as long as the gap fits within both limits: once a filter's buffer is full, its oldest events are evicted even if they are less than 24 hours old.
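The two replay limits interact, and a small model makes the eviction case visible. This is a hypothetical model of one filter's buffer for reasoning about gaps, not Voidly's implementation:

```python
from collections import deque


class ReplayBuffer:
    """Hypothetical model of one filter's replay buffer (not Voidly's code)."""

    def __init__(self, max_events=10_000, max_age_s=24 * 3600):
        # deque(maxlen=...) silently drops the oldest entry when full,
        # which is the ring-buffer behaviour described above.
        self.events = deque(maxlen=max_events)
        self.max_age_s = max_age_s

    def append(self, event_id: str, ts: float) -> None:
        self.events.append((event_id, ts))

    def replay_after(self, last_event_id: str, now: float) -> list[str]:
        """IDs still retained that are newer than last_event_id."""
        last = int(last_event_id, 16)
        return [eid for eid, ts in self.events
                if int(eid, 16) > last and now - ts <= self.max_age_s]


buf = ReplayBuffer(max_events=3)
for n in range(1, 6):
    buf.append(f"{n:08x}", ts=0.0)
print(buf.replay_after("00000001", now=60.0))
# ['00000003', '00000004', '00000005']: 00000002 was evicted
```

With the cap shrunk to 3, event 00000002 is unrecoverable from the stream even though it is only a minute old; that is the same situation as a 24-hour gap, and the REST API is the way to fill it.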

Events older than 24 hours cannot be replayed from the stream. To recover a gap larger than 24 hours, use the REST API with a since timestamp:

GET /v1/incidents?since=2025-01-12T14:00:00Z&country_code=IR&confidence_tier=verified
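A client can choose between stream replay and a REST backfill from the age of its last received event. A sketch, assuming the REST API shares the api.voidly.ai host used by the stream; backfill_url is a hypothetical helper, and it checks only the 24-hour limit, so a gap that overflowed the 10,000-event buffer still needs separate handling:

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

REPLAY_WINDOW = timedelta(hours=24)


def backfill_url(last_seen: datetime, now: datetime, filters: dict):
    """REST URL covering a gap too old for stream replay, else None."""
    if now - last_seen <= REPLAY_WINDOW:
        return None  # reconnect with Last-Event-ID instead
    since = last_seen.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Keep ':' and ',' unescaped so the URL matches the form shown above.
    query = urlencode({"since": since, **filters}, safe=":,")
    return f"https://api.voidly.ai/v1/incidents?{query}"


filters = {"country_code": "IR", "confidence_tier": "verified"}
print(backfill_url(datetime(2025, 1, 12, 14, 0, tzinfo=timezone.utc),
                   datetime(2025, 1, 13, 15, 0, tzinfo=timezone.utc), filters))
```

Passing the same filters as the stream keeps the backfilled incidents consistent with what the stream would have delivered.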

The event ID format is a monotonically increasing hex-encoded 32-bit integer. Every ID in the examples above is eight lowercase hex digits wide, so standard string comparison orders them correctly: 8f3a1c2e is greater than 8f3a1c2d. String comparison is only reliable when IDs share a fixed width, so if zero-padding is not guaranteed, compare the parsed integer values instead. The sequence is global across all filtered streams: connecting with a different filter but the same Last-Event-ID replays events for that filter beginning after the specified point in the global sequence.
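Whether Voidly always zero-pads IDs to eight digits is not stated, so a defensive client can compare IDs numerically. A minimal sketch; id_key is a hypothetical helper name:

```python
def id_key(event_id: str) -> int:
    """Numeric sort key for a hex-encoded event ID."""
    return int(event_id, 16)


ids = ["ff", "100", "8f3a1c2d"]
print(sorted(ids))              # ['100', '8f3a1c2d', 'ff']  (string order)
print(sorted(ids, key=id_key))  # ['ff', '100', '8f3a1c2d']  (numeric order)
```

String order puts 0xff (255) after 0x100 (256); the integer key gives the correct order regardless of width.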

Python client example

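The Python standard library does not include an SSE client. The example below uses httpx with its streaming response support. The outer while True loop handles reconnection: after any disconnect, whether the server closed the stream cleanly or a network error or read timeout interrupted it, the loop pauses briefly and reopens the stream, sending the last seen event ID so the server can replay any missed events:

import json
import time

import httpx

def stream_censorship_events(api_key: str, countries: list[str]):
    """Yield (event_type, payload) tuples, reconnecting after any disconnect."""
    url = "https://api.voidly.ai/v1/stream"
    params = {
        "country_code": ",".join(countries),
        "confidence_tier": "verified",
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Accept": "text/event-stream",
        "Cache-Control": "no-cache",
    }
    last_event_id = None
    # 10 s for connect/write/pool, 90 s for reads: keepalives arrive every
    # 60 s, so 90 s of silence means the connection is stale.
    timeout = httpx.Timeout(10.0, read=90.0)

    with httpx.Client(timeout=timeout) as client:
        while True:  # reconnect loop
            if last_event_id:
                headers["Last-Event-ID"] = last_event_id
            try:
                with client.stream("GET", url, params=params, headers=headers) as resp:
                    resp.raise_for_status()  # 401, 429, etc. propagate to the caller
                    event_type, event_id, data = "message", None, []
                    for line in resp.iter_lines():
                        line = line.rstrip("\r\n")  # defensive: drop any kept line ending
                        if line == "":  # a blank line ends the current event
                            if data:
                                if event_id is not None:
                                    last_event_id = event_id
                                yield event_type, json.loads("\n".join(data))
                            event_type, event_id, data = "message", None, []
                        elif line.startswith(":"):
                            continue  # SSE comment, e.g. the keepalive
                        elif line.startswith("id:"):
                            event_id = line[3:].strip()
                        elif line.startswith("event:"):
                            event_type = line[6:].strip()
                        elif line.startswith("data:"):
                            data.append(line[5:].removeprefix(" "))
            except httpx.TransportError:
                pass  # network error or read timeout: reconnect below
            time.sleep(1)  # brief pause so a failing server is not hammered

The generator yields one (event_type, payload) tuple per event, where payload is the decoded JSON object. To consume it:

for event_type, event in stream_censorship_events(api_key="sk-...", countries=["IR", "CN", "RU"]):
    incident_id = event.get("incident_id")
    print(f"{event_type} {incident_id}: {event}")

A few notes on the implementation: httpx's default 5-second timeout would fire between sparse events, but disabling timeouts entirely (timeout=None) would leave a silently dropped connection hanging forever. The 90-second read timeout matches the stale-connection rule above: keepalives arrive every 60 seconds, so a read that waits longer raises httpx.ReadTimeout, a subclass of httpx.TransportError, and the loop reconnects. HTTP errors raised by raise_for_status, such as a 401 or a 429, are deliberately not caught. The Cache-Control: no-cache header prevents intermediate caches from buffering the stream; some CDN-backed deployments require this to receive events without delay.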

JavaScript / Node.js example

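In browsers, the native EventSource API handles reconnection automatically and sends Last-Event-ID on every reconnect without any additional code, but its constructor cannot set custom request headers such as Authorization. In Node.js, the eventsource package provides the same interface and, in its 1.x and 2.x releases, also accepts a headers option, which the example below uses to send the bearer token: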

const EventSource = require('eventsource');

const stream = new EventSource(
  'https://api.voidly.ai/v1/stream?country_code=CN,IR&confidence_tier=verified',
  { headers: { Authorization: 'Bearer <api_key>' } }
);

stream.addEventListener('incident_created', (e) => {
  const incident = JSON.parse(e.data);
  console.log(`New incident: ${incident.domain} blocked in ${incident.country_code}`);
});

stream.addEventListener('incident_updated', (e) => {
  const update = JSON.parse(e.data);
  console.log(`Updated: ${update.incident_id} → ${update.confidence_tier}`);
});

stream.addEventListener('incident_resolved', (e) => {
  const resolved = JSON.parse(e.data);
  console.log(`Resolved: ${resolved.incident_id} after ${resolved.duration_hours}h`);
});

stream.addEventListener('error', () => {
  // EventSource reconnects automatically with Last-Event-ID
});

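The error handler here is intentionally empty: when the connection drops, the EventSource implementation reconnects automatically and sends the correct Last-Event-ID. An HTTP error response is different: under the HTML specification, a non-200 status such as the 429 described below fails the connection permanently and sets readyState to CLOSED, with no retry. Add logic to the error handler if you need to detect that case, implement custom backoff, log connections, or alert on prolonged disconnections. The specification leaves the default reconnect delay to the implementation (a few seconds in practice), and a server can override it with a retry: field.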

Rate limits and connection limits

The stream endpoint enforces per-API-key limits on concurrent connections and on event throughput. A single connection to the global unfiltered stream at the anomaly tier during a major shutdown event could push thousands of events per minute — the throughput limit prevents downstream consumers from being overwhelmed:

Tier	Concurrent stream connections	Events / min
Unauthenticated	1	60
API key (free)	2	300
API key (research)	10	unlimited

If the events-per-minute limit is exceeded, the server inserts a synthetic rate_limited event and then drops events until the rate falls within the limit. The synthetic event has no ID (so it is not replayed on reconnect) and carries a data field indicating how many events were dropped in the current window. If your use case requires more than the research tier provides, contact info@ai-analytics.org.

Exceeding the concurrent connection limit returns HTTP 429 with a Content-Type: application/problem+json body before the stream begins — the connection never enters the event-stream state. The response includes a Retry-After header if an existing connection is expected to close soon (i.e., it was opened more than 23 hours ago and is approaching the 24-hour server-side maximum lifetime).
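Retry-After can carry either a number of seconds or an HTTP date, so a client reopening the stream after a 429 should accept both forms. A minimal sketch; retry_after_seconds and its 60-second fallback for a missing header are assumptions, not documented Voidly behaviour:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None, default=60.0):
    """Seconds to wait before reopening the stream after an HTTP 429.

    A missing header means no existing connection is expected to close
    soon, so fall back to a caller-chosen default (an assumption).
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "120"
        return float(value)
    when = parsedate_to_datetime(value)  # HTTP-date form, always GMT
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

Clamping at zero covers a date that has already passed by the time the response is processed.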

How streaming differs from webhooks

The alert delivery article covers HMAC-signed webhooks. SSE and webhooks both deliver censorship events in real time but they have meaningfully different operational characteristics:

SSE is a pull model: the client opens and maintains the connection. Delivery is in-process: events arrive in the same process that established the stream. No retry infrastructure is needed on the client side, because events missed during a disconnect are replayed via Last-Event-ID on reconnect, within the 24-hour replay window. An SSE connection does not survive a client restart, however: to resume without a gap, the client must persist its last event ID and reconnect with it.
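Persisting the last event ID is the one piece of state an SSE client needs across restarts. A minimal sketch using a local file; LastEventIdStore is a hypothetical helper, and saving only after an event has been fully processed is a design choice that accepts a possible duplicate after a crash in exchange for never skipping an event:

```python
import os
import tempfile
from pathlib import Path


class LastEventIdStore:
    """Persists the last processed event ID so a restarted client can resume."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        """Return the stored ID, or None if nothing has been saved yet."""
        try:
            return self.path.read_text().strip() or None
        except FileNotFoundError:
            return None

    def save(self, event_id: str) -> None:
        # Write to a temp file in the same directory, then swap it in with
        # os.replace, so a crash mid-write never leaves a truncated ID.
        fd, tmp = tempfile.mkstemp(dir=self.path.parent)
        with os.fdopen(fd, "w") as f:
            f.write(event_id)
        os.replace(tmp, self.path)
```

Pass store.load() as the initial Last-Event-ID and call store.save(event_id) after handling each event. If the stored ID is more than 24 hours old, the stream cannot fill the gap and the REST since query is needed instead.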

Webhooks are a push model: Voidly makes HTTP POST requests to your endpoint. This requires a publicly reachable server, HMAC signature verification on every delivery, and idempotency handling for retried deliveries. The webhook system includes exponential-backoff retry with a dead-letter queue, so delivery is at-least-once even through extended outages of your endpoint, up to the DLQ expiry window. Webhooks therefore survive client restarts: your server receives the events it missed while it was down, as long as it comes back within that window.

The right choice depends on the integration context:

Monitoring dashboards and newsroom alert systems — use SSE. The connection model is simpler, the EventSource API handles reconnection automatically, and in-browser delivery requires no server infrastructure on your side.
Production compliance workflows and integrations that need audit trails — use webhooks. The at-least-once delivery guarantee, HMAC-verified provenance, and dead-letter queue make webhooks the appropriate choice when missing an event has a compliance cost.

For the request-response complement to this streaming endpoint: The Voidly REST API: querying the global censorship index in real time →

For what the confidence_tier values in stream events mean and how incidents are promoted between tiers: From anomaly to verified incident: the Voidly confidence tier system →

For the HMAC-signed webhook alternative to SSE — push delivery with at-least-once guarantees and a dead-letter queue: Voidly's alert delivery system: PGP-encrypted email, webhooks, and RSS →

For how an anomaly becomes a stream event in the first place — the full pipeline from raw measurement to published incident: Voidly's real-time event pipeline: from measurement anomaly to journalist alert in under 8 minutes →