Voidly probe local measurement buffer: SQLite ring buffer, batch compression, and resilient upload
A Voidly probe in Iran faces a structural problem: the censorship infrastructure it is measuring also disrupts its upload path. When the WireGuard tunnel is blocked, the probe can't send measurements to the collector. Without local buffering, every measurement taken during a 30-minute disruption is permanently lost — exactly the measurements taken during an active censorship event, which are also the most valuable ones.
The solution is a local measurement buffer: a SQLite database on the probe's device that stores every measurement as it is recorded, independent of upload status. When connectivity resumes, the probe drains the buffer in batches. The buffer holds 72 hours of capacity at normal measurement rates, meaning the probe can survive extended upload failures and deliver a complete dataset when the tunnel comes back up.
This post covers the buffer design: the SQLite ring buffer schema and eviction logic, LZ4 batch compression, Ed25519 batch signatures, the priority queue for anomalous measurements, exponential backoff retry, and partial batch delivery with per-chunk acknowledgment.
The upload disruption problem
Upload disruptions have two causes. The first is transient network instability — brief connectivity losses that resolve in seconds or minutes. The second is active interference: the deep-packet-inspection (DPI) infrastructure that the probe is measuring also classifies WireGuard traffic and can block it selectively during periods of elevated censorship activity. This second cause is the harder one: blocks lasting 30 minutes or more are common in Iran and Myanmar during government-ordered network interventions.
Without buffering, the data loss is asymmetric. Measurements taken during normal operation upload immediately and are never at risk. Measurements taken during an active censorship event — when the probe is recording blocks, DNS tampering, and TLS failures — are the ones that fail to upload. The buffer inverts this: the probe records everything locally first, uploads second, and the upload can retry indefinitely until the tunnel is restored.
The SQLite ring buffer
The local measurement store lives at ~/.voidly/measurement_store.db on Linux and macOS, and at %APPDATA%\Voidly\measurement_store.db on Windows. The Tauri backend opens it with PRAGMA journal_mode=WAL and PRAGMA synchronous=NORMAL. WAL allows concurrent read/write access from the measurement loop and the upload thread. NORMAL keeps the database consistent across crashes without the per-commit fsync cost of FULL mode; the trade-off is that the most recently committed transactions can roll back after an OS crash or power loss.
The schema:
CREATE TABLE measurement_buffer (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    measurement_id  TEXT NOT NULL UNIQUE,
    domain          TEXT NOT NULL,
    test_type       TEXT NOT NULL,              -- 'http', 'dns', 'tcp', 'tls'
    result_json     TEXT NOT NULL,              -- compressed JSON blob
    probe_run_id    TEXT NOT NULL,
    is_anomalous    INTEGER NOT NULL DEFAULT 0,
    recorded_at     TEXT NOT NULL,
    uploaded_at     TEXT,                       -- NULL until successfully acknowledged
    batch_id        TEXT                        -- set when assigned to an upload batch
);

CREATE INDEX idx_measurement_buffer_upload
    ON measurement_buffer(uploaded_at, is_anomalous);
The ring buffer pattern is enforced after each insert. If the row count exceeds 50,000 (72-hour capacity at 1,000 tests/day), the probe deletes the oldest non-anomalous rows to bring the total back to 48,000. The eviction query:
DELETE FROM measurement_buffer
WHERE id IN (
    SELECT id FROM measurement_buffer
    WHERE is_anomalous = 0
      AND uploaded_at IS NULL
    ORDER BY recorded_at ASC
    LIMIT ?  -- eviction_count
);
Anomalous measurements (those where is_anomalous = 1) are never evicted by the ring buffer, regardless of age. A probe can record 200 anomalous measurements, fill the buffer to its soft limit with normal measurements, and the anomalous rows will survive. They are removed only after the server acknowledges receipt.
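The eviction rule can be modeled in memory to make its behavior concrete. The following is a minimal sketch, not the probe's actual code: BufferedRow and evict_oldest are hypothetical names, the rows are assumed to be ordered oldest first, and the filter mirrors the conditions of the eviction query (non-anomalous, not yet uploaded).

```rust
/// Hypothetical in-memory stand-in for a measurement_buffer row.
#[derive(Debug, Clone, PartialEq)]
pub struct BufferedRow {
    pub id: u64,
    pub is_anomalous: bool,
    pub uploaded: bool,
}

/// Trims `rows` (ordered oldest first) back toward `target_rows` once the
/// count exceeds `max_rows`. Only non-anomalous, not-yet-uploaded rows are
/// eligible, matching the WHERE clause of the eviction query.
/// Returns the number of rows actually removed.
pub fn evict_oldest(rows: &mut Vec<BufferedRow>, max_rows: usize, target_rows: usize) -> usize {
    if rows.len() <= max_rows {
        return 0;
    }
    // Equivalent of the LIMIT parameter (eviction_count).
    let mut remaining = rows.len().saturating_sub(target_rows);
    let before = rows.len();
    // Vec::retain visits elements in order, so the oldest eligible rows go first.
    rows.retain(|row| {
        if remaining > 0 && !row.is_anomalous && !row.uploaded {
            remaining -= 1;
            false
        } else {
            true
        }
    });
    before - rows.len()
}
```

With the limits from the text, the call would be evict_oldest(&mut rows, 50_000, 48_000). Because anomalous rows are never eligible, the function can remove fewer rows than requested when the buffer is dominated by anomalous measurements.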
LZ4 batch compression
Measurements are assembled into batches of up to 500 before upload. Each batch is compressed with LZ4 before transmission. The median raw batch size is 47 KB (500 measurements at ~94 bytes each). Median compressed: 9 KB. That is a 5.2x reduction — meaningful when the probe is uploading over a metered mobile connection or through a congested tunnel.
LZ4 is chosen over zstd despite lower compression ratio because of latency: LZ4 compresses a 500-measurement batch in 1.2 ms. zstd level 3 takes 8.4 ms for the same input. The difference matters because batch assembly happens on the main upload thread — 7 ms of extra latency per batch adds up when the probe is draining a full 50,000-row buffer after a 6-hour connectivity outage.
The batch structure before compression:
interface UploadBatch {
    batch_id: string;
    probe_id: string;
    created_at: string;
    measurement_count: number;
    measurements: MeasurementResult[];
    // Fields added immediately before upload:
    signature: string; // Ed25519 over SHA-256 of uncompressed JSON
    sequence: number;  // monotonic, resets on probe restart
}
Priority queue for anomalous measurements
Normal measurements are batched in FIFO order — oldest unuploaded rows first. But measurements where is_anomalous = 1 follow a different path: they are placed in a priority queue and included in the next available upload batch regardless of their age relative to the upload backlog.
The threshold for is_anomalous = 1 is classifier confidence above 0.6 on any of the five interference classes: DNS manipulation, TCP injection, TLS interception, HTTP blockpage, or BGP-level unreachability. When the probe detects an active block, that measurement is immediately written to SQLite with is_anomalous = 1 and added to the priority queue in memory.
The effect: when the upload channel is open, anomalous measurements reach the server within 8 minutes of detection — even if the probe has a multi-hour backlog of normal measurements queued ahead of them. The priority queue holds up to 200 measurements in memory. If the queue is full (not the common case — 200 anomalies in a short window means an extreme event), additional anomalous measurements still get the is_anomalous flag in SQLite and are never evicted, but they wait in the normal upload queue rather than jumping ahead.
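The two-queue behavior described above can be sketched with a pair of in-memory queues. This is an illustrative sketch under stated assumptions: UploadQueues, enqueue, and next_batch are hypothetical names, measurements are represented only by their IDs, and the real probe reads normal rows from SQLite instead of holding them in memory.

```rust
use std::collections::VecDeque;

/// Hypothetical model of the upload ordering: a bounded priority queue for
/// anomalous measurements plus a FIFO queue for everything else.
pub struct UploadQueues {
    priority: VecDeque<String>,
    normal: VecDeque<String>,
    priority_cap: usize,
}

impl UploadQueues {
    pub fn new(priority_cap: usize) -> Self {
        Self { priority: VecDeque::new(), normal: VecDeque::new(), priority_cap }
    }

    /// Anomalous measurements jump ahead while the priority queue has room;
    /// once it is full they wait in the normal FIFO queue.
    pub fn enqueue(&mut self, measurement_id: &str, is_anomalous: bool) {
        if is_anomalous && self.priority.len() < self.priority_cap {
            self.priority.push_back(measurement_id.to_string());
        } else {
            self.normal.push_back(measurement_id.to_string());
        }
    }

    /// Builds the next batch: priority entries first, then the oldest
    /// normal entries, up to `max_batch` measurements.
    pub fn next_batch(&mut self, max_batch: usize) -> Vec<String> {
        let mut batch = Vec::new();
        while batch.len() < max_batch {
            match self.priority.pop_front().or_else(|| self.normal.pop_front()) {
                Some(id) => batch.push(id),
                None => break,
            }
        }
        batch
    }
}
```

With the values from the text, the queues would be created with UploadQueues::new(200) and drained with next_batch(500).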
Upload retry with exponential backoff
The upload retry loop is a simple state machine in Rust. On each failed attempt, the delay doubles, capped at 4 hours:
use std::time::{Duration, Instant};

// UploadError is the probe's own error type (not shown here).
pub struct UploadRetryState {
    pub attempt: u32,
    pub next_retry_at: Instant,
    pub last_error: Option<UploadError>,
}

impl UploadRetryState {
    /// Delay before the retry that follows failed attempt `attempt` (0-based).
    pub fn backoff_delay(attempt: u32) -> Duration {
        let base = Duration::from_secs(30);
        let max = Duration::from_secs(4 * 3600); // 4 hours
        // Clamp the exponent at 9: 30s * 2^9 = 15360s already exceeds the cap.
        let delay = base * 2u32.saturating_pow(attempt.min(9));
        delay.min(max)
    }
    // Attempt 0: 30s, 1: 60s, 2: 120s, 3: 240s, 4: 480s, 5: 960s,
    // 6: 1920s, 7: 3840s, 8: 7680s, 9+: 14400s
}
The minimum retry interval of 30 seconds is chosen so that brief connectivity losses resolve before the first retry. The maximum of 4 hours ensures the probe doesn't flood the tunnel with connection attempts during a sustained block, while still resuming upload within a reasonable window once connectivity returns.
On a successful upload (server returns a batch_ack), the retry state resets to attempt 0. On a transient error (network timeout, DNS failure), the attempt counter increments. On a terminal error (INVALID_SIGNATURE after 3 consecutive attempts), the probe flags a key compromise condition and triggers the key regeneration flow.
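The three outcomes above map onto a small transition function. The sketch below is one possible reading, not the probe's actual code: UploadOutcome, NextAction, and RetryCounters are hypothetical names, and it assumes that only a successful upload resets the count of consecutive INVALID_SIGNATURE responses.

```rust
/// Hypothetical classification of a single upload attempt.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum UploadOutcome {
    Acked,            // server returned a batch_ack
    Transient,        // network timeout, DNS failure
    InvalidSignature, // server rejected the batch signature
}

/// What the upload loop should do next.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NextAction {
    Continue,
    RetryAfterBackoff,
    RegenerateKey,
}

#[derive(Debug, Default)]
pub struct RetryCounters {
    pub attempt: u32,
    pub invalid_signatures: u32,
}

impl RetryCounters {
    pub fn record(&mut self, outcome: UploadOutcome) -> NextAction {
        match outcome {
            UploadOutcome::Acked => {
                // Success resets both counters (assumption for the second one).
                self.attempt = 0;
                self.invalid_signatures = 0;
                NextAction::Continue
            }
            UploadOutcome::Transient => {
                self.attempt += 1;
                NextAction::RetryAfterBackoff
            }
            UploadOutcome::InvalidSignature => {
                self.invalid_signatures += 1;
                if self.invalid_signatures >= 3 {
                    NextAction::RegenerateKey
                } else {
                    self.attempt += 1;
                    NextAction::RetryAfterBackoff
                }
            }
        }
    }
}
```

The attempt counter is what feeds the backoff delay; the signature counter only decides when to escalate to key regeneration.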
Partial batch delivery and chunk resumption
A 500-measurement batch compresses to roughly 9 KB — small enough to upload as a single HTTP request in most conditions. But during congested or severely throttled connections, even 9 KB can time out mid-transfer. To handle this, each batch is split into 64 KB chunks before compression, and each chunk is uploaded separately.
Each chunk carries its batch_id, chunk_index, and total_chunks in the upload metadata header. The server acknowledges received chunks individually. If upload fails mid-batch — say, on chunk 5 of 8 — the server has already acknowledged chunks 0 through 4. On retry, the probe queries the server for which chunks of batch_id it has received and skips them, uploading only chunks 5, 6, and 7.
This matters most during buffer drain after a long outage. Without per-chunk acknowledgment, a probe draining 50,000 measurements over a flaky connection would re-send the same first 400 measurements every time a mid-batch failure occurred. With per-chunk resumption, each successful chunk transfer is permanent, and the probe makes monotonic progress through the backlog regardless of how many times individual transfers fail.
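The resumption step reduces to a set difference between all chunk indices and the ones the server reports as received. The sketch below assumes hypothetical helper names (split_into_chunks, pending_chunks) and takes the chunk size as a parameter; how the probe actually queries the server for acknowledged chunks is not shown.

```rust
use std::collections::BTreeSet;

/// Splits a payload into fixed-size chunks; the last chunk may be shorter.
/// `chunk_size` must be non-zero (slice::chunks panics on 0).
pub fn split_into_chunks(payload: &[u8], chunk_size: usize) -> Vec<&[u8]> {
    payload.chunks(chunk_size).collect()
}

/// Returns the chunk indices still to upload, in ascending order, given
/// the set of indices the server has already acknowledged.
pub fn pending_chunks(total_chunks: usize, acked: &BTreeSet<usize>) -> Vec<usize> {
    (0..total_chunks).filter(|index| !acked.contains(index)).collect()
}
```

For the example in the text, with 8 chunks and chunks 0 through 4 acknowledged, pending_chunks returns 5, 6, and 7.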
Ed25519 batch signatures
Before compression, each batch is signed. The signature covers the full uncompressed JSON serialization of the batch:
// Pseudocode for the signing step
let batch_json = serde_json::to_vec(&batch)?;
let hash = sha2::Sha256::digest(&batch_json);
let signature = probe_signing_key.sign(&hash);

// Signature goes into the upload metadata header,
// not inside the compressed body.
let upload_header = UploadHeader {
    batch_id: batch.batch_id.clone(),
    probe_id: batch.probe_id.clone(),
    signature: base64::encode(signature.to_bytes()),
    chunk_index,
    total_chunks,
};
The server verifies the signature before writing any measurements from the batch to TimescaleDB. Forged or corrupted batches are rejected with an INVALID_SIGNATURE error. If the probe receives INVALID_SIGNATURE on three consecutive batches, which suggests that its signing key has been compromised or corrupted, it enters a key regeneration flow: it generates a new Ed25519 keypair, stores the private key in the OS keychain, and initiates re-registration with the collector.
The Ed25519 signing key is distinct from the X25519 WireGuard key described in the probe architecture post. The WireGuard key authenticates the tunnel connection; the signing key authenticates the measurement data inside the tunnel. A compromise of one does not compromise the other.
Upload acknowledgment and buffer cleanup
When the server successfully receives and verifies a batch, it returns a batch_ack response:
{
    "batch_id": "01JD4K...",
    "accepted": 498,
    "rejected": 2,
    "reject_reasons": [
        "duplicate_measurement_id",
        "future_timestamp"
    ]
}
The probe processes the acknowledgment immediately. For the 498 accepted measurements, it sets uploaded_at = datetime('now') in SQLite. For the 2 rejected measurements, it writes the rejection reason to a rejection_reason column and sets a do_not_retry flag, excluding them from future upload attempts.
Rows don't accumulate indefinitely after upload. A nightly vacuum removes all rows where uploaded_at IS NOT NULL and uploaded_at is older than 24 hours:
DELETE FROM measurement_buffer
WHERE uploaded_at IS NOT NULL
  AND uploaded_at < datetime('now', '-24 hours');
VACUUM;
The 24-hour retention window after upload exists so that the probe can re-send a batch if the server's acknowledgment is lost in transit (a rare but possible edge case where the server wrote the measurements but the TCP connection dropped before the ack arrived). If the server receives a duplicate batch, it deduplicates on measurement_id and returns a normal batch_ack.
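Re-sending is safe only because ingest is idempotent on measurement_id. The sketch below illustrates that property with an in-memory set; it is not the collector's implementation. AckCounts and ingest are hypothetical names, and the real server deduplicates against its database, not a HashSet.

```rust
use std::collections::HashSet;

/// Hypothetical summary of one ingest call.
pub struct AckCounts {
    pub accepted: usize,
    pub duplicates: usize,
}

/// Records each measurement_id at most once. Re-sending a batch leaves
/// `seen` unchanged and reports every entry as a duplicate.
pub fn ingest(seen: &mut HashSet<String>, measurement_ids: &[&str]) -> AckCounts {
    let mut counts = AckCounts { accepted: 0, duplicates: 0 };
    for id in measurement_ids {
        // HashSet::insert returns false when the value was already present.
        if seen.insert((*id).to_owned()) {
            counts.accepted += 1;
        } else {
            counts.duplicates += 1;
        }
    }
    counts
}
```

Sending the same three IDs twice stores three measurements in total: the first call accepts all of them and the second reports three duplicates.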
Tauri storage path configuration
The measurement store path and buffer parameters are configurable via the Tauri app's settings file:
{
    "storage": {
        "measurement_db_path": "~/.voidly/measurement_store.db",
        "max_rows": 50000,
        "priority_queue_size": 200,
        "retention_hours": 72
    }
}
On Windows, the default path is %APPDATA%\Voidly\measurement_store.db. The max_rows limit (50,000) controls when ring buffer eviction triggers. priority_queue_size (200) sets the maximum number of anomalous measurements held in the in-memory priority queue. retention_hours (72) is informational: it documents the intended buffer capacity at 1,000 measurements/day, but the actual capacity bound is max_rows.
Operators with unusually high measurement rates can increase max_rows to extend buffer capacity. The practical limit is disk space: at 94 bytes per measurement JSON plus SQLite overhead, 200,000 rows consume roughly 30 MB, well within the available storage on any device that can run the probe application.
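As a rough check on the disk estimate, the raw JSON payload alone can be computed directly. The sketch uses a hypothetical helper, raw_json_bytes, with the 94-byte figure from the text; it deliberately ignores SQLite page, index, and per-column overhead, which the roughly 30 MB figure includes.

```rust
/// Lower bound on buffer size in bytes: row count times the
/// per-measurement JSON size. Excludes all SQLite overhead.
pub fn raw_json_bytes(rows: u64, bytes_per_measurement: u64) -> u64 {
    rows * bytes_per_measurement
}
```

For 200,000 rows at 94 bytes each this gives 18,800,000 bytes (18.8 MB) of raw JSON, so the 30 MB estimate leaves about 11 MB for overhead.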
Measurement loss in production
Over 6 months across 37 probes, the observed measurement loss rate is 0.003%, or 3 measurements lost per 100,000 recorded. The losses came from two events, both during probe OS crashes (not orderly shutdowns) in which SQLite WAL recovery failed. In each case the WAL file contained writes that had not yet been checkpointed to the main database file at the time of the crash, and the recovery process reported the WAL as corrupted.
The ring buffer has never reached capacity on any active probe. The highest observed fill rate is 31%, on a probe in a rural Myanmar location with daily 6-hour connectivity windows. At that fill rate, the probe has approximately 49 hours of additional buffer capacity before the oldest non-anomalous measurements begin to be evicted.
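The 49-hour figure follows from the fill level and the 72-hour capacity. The sketch below uses a hypothetical helper, remaining_capacity_hours, and assumes the buffer fills linearly at the normal measurement rate with no uploads succeeding in the meantime.

```rust
/// Hours of headroom left before eviction starts, assuming a linear
/// fill rate and no successful uploads. Rounds down to whole hours.
pub fn remaining_capacity_hours(fill_percent: u32, capacity_hours: u32) -> u32 {
    capacity_hours * (100 - fill_percent.min(100)) / 100
}
```

At 31% fill and 72 hours of capacity, the remaining 69% corresponds to 49 whole hours (49.68 before rounding down).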
The 0.003% loss rate compares favorably to the upload-only (no local buffer) model, where the expected loss rate in high-disruption environments like Iran during a network intervention would be 15–40% of measurements, depending on the duration of the disruption. Local buffering reduces measurement loss by roughly four orders of magnitude for the failure modes it covers, at the cost of a small SQLite database on the operator's device.
For the Tauri probe architecture that generates these measurements: The Voidly Probe: Tauri + boringtun network measurement at the operator's edge →
For how the probe maintains upload connectivity through NAT, firewalls, and censored networks: Voidly probe networking: staying connected through NAT, firewalls, and censored infrastructure →
For how the server-side ingest pipeline receives and normalizes uploaded batches: Voidly's probe-to-dataset ingest pipeline: normalization, quality filtering, and TimescaleDB indexing →
For the real-time event pipeline that processes uploaded measurements into alerts: Voidly's real-time event pipeline: from measurement anomaly to journalist alert in under 8 minutes →