Swarm SDK operational security: traffic analysis resistance, message size normalization, and timing jitter

March 10, 2026 · 8 min read · AI Analytics

Swarm SDK · Cryptography · Security

Strong encryption solves confidentiality. It does not solve traffic analysis. An adversary with government-grade RF monitoring equipment who cannot read your messages can still observe when your swarm is communicating, how many nodes are active, whether key rotation events are occurring, and whether the burst pattern on the RF channel matches the signature of an electronic warfare coordination event. Packet sizes and transmission timing are metadata, and metadata is visible even when ciphertext is not. The Swarm SDK addresses this with three independent mechanisms: message size normalization into fixed bins, transmission timing jitter, and a store-and-forward ring buffer that smooths burst patterns before they reach the radio.

The traffic analysis problem

Even with AES-256-GCM encryption covering every byte of payload, a passive RF monitor observing a drone mesh network can recover a significant amount of operational information from the observable channel characteristics. Frame sizes distinguish message types: a SenderKeyDistributionMessage is consistently 84 bytes on the wire; a SealedSenderEnvelope carrying an ML-KEM-768 ciphertext clusters around 1,256 bytes across 6 MAVLink fragments. An adversary who has profiled the SDK's message type size distribution can classify traffic in real time without ever touching the ciphertext.

Timing patterns compound the problem. A SenderKeyMessage broadcast to 20 nodes is deterministically followed by 20 individual SealedSender DM responses within a predictable window — each response arriving at an interval that reflects the gossip mesh fanout and the Double Ratchet processing latency on the recipient hardware. This sequence is a timing oracle: a monitor who sees a broadcast followed by a cluster of 20 point-to-point responses knows the swarm size and can infer that a group key event just occurred. Modern signal intelligence equipment performs this classification automatically.
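To make the two side channels concrete, here is a minimal sketch of what a passive monitor could compute from metadata alone. The 84-byte and ~1,256-byte signatures are the figures quoted above; the names `Guess`, `classify_frame`, and `infer_swarm_size`, the ±32-byte tolerance, and the event representation are assumptions for illustration and are not part of the SDK.

```rust
/// Message type guessed purely from on-wire size (hypothetical monitor-side model).
#[derive(Debug, PartialEq, Eq)]
pub enum Guess {
    SenderKeyDistribution,
    SealedSenderEnvelope,
    Unknown,
}

/// Classify a reassembled message by its total size alone.
/// The tolerance around 1,256 bytes is an assumed value.
pub fn classify_frame(total_len: usize) -> Guess {
    const SEALED_SENDER_CENTER: usize = 1256;
    const SEALED_SENDER_TOLERANCE: usize = 32; // assumption
    if total_len == 84 {
        Guess::SenderKeyDistribution
    } else if total_len.abs_diff(SEALED_SENDER_CENTER) <= SEALED_SENDER_TOLERANCE {
        Guess::SealedSenderEnvelope
    } else {
        Guess::Unknown
    }
}

/// Estimate swarm size from timing: find the first broadcast and count the
/// point-to-point transmissions that follow it within `window_ms`.
/// Each event is (timestamp_ms, is_broadcast).
pub fn infer_swarm_size(events: &[(u64, bool)], window_ms: u64) -> Option<usize> {
    let (t0, _) = *events.iter().find(|(_, is_broadcast)| *is_broadcast)?;
    let responses = events
        .iter()
        .filter(|(t, is_broadcast)| !*is_broadcast && *t > t0 && *t - t0 <= window_ms)
        .count();
    Some(responses)
}
```

Neither function touches ciphertext. Size normalization is aimed at the first, and jitter plus burst smoothing at the second.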

Message size normalization

The SDK normalizes all outgoing messages to one of six fixed size bins before encryption. The bin boundaries were chosen by analyzing the actual Swarm SDK message type size distribution: these six values cover 97.3% of observed traffic without requiring excessive padding waste on typical messages.

/// Fixed transmission size bins for traffic analysis resistance.
/// All outgoing messages are padded to the next bin boundary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MessageSizeBin {
    B64   = 64,
    B128  = 128,
    B256  = 256,
    B512  = 512,
    B1024 = 1024,
    B2048 = 2048,
}

impl MessageSizeBin {
    /// Select the smallest bin that fits payload_len bytes.
    /// Returns None if the payload exceeds the largest bin (2048 bytes).
    pub fn bin_for_size(payload_len: usize) -> Option<Self> {
        match payload_len {
            0..=64    => Some(Self::B64),
            65..=128  => Some(Self::B128),
            129..=256 => Some(Self::B256),
            257..=512 => Some(Self::B512),
            513..=1024 => Some(Self::B1024),
            1025..=2048 => Some(Self::B2048),
            _ => None,
        }
    }

    pub fn size_bytes(self) -> usize {
        self as usize
    }
}

A 64-byte DirectMessage sits exactly at the B64 boundary: zero padding wasted. A 65-byte message moves to B128, wasting 63 bytes. This just-over-the-boundary case is the worst one for B128, and it is acceptable: 63 bytes of padding at AES-256-GCM speeds on STM32H7 costs less than 0.01 ms additional encryption time. The same effect scales with bin size, up to 1,023 bytes of padding for a 1,025-byte payload in B2048. A SealedSenderEnvelope at ~1,256 bytes falls into B2048, consuming 792 bytes of padding; but SealedSender messages are rare key-establishment events, not steady-state telemetry.
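The padding figures above follow directly from the bin table. A small helper makes the arithmetic checkable; `BINS` and `padding_for` are illustrative names, not SDK items.

```rust
/// Bin sizes from the text, smallest first.
const BINS: [usize; 6] = [64, 128, 256, 512, 1024, 2048];

/// Padding bytes added when `payload_len` is rounded up to its bin,
/// or None if the payload exceeds the largest bin.
pub fn padding_for(payload_len: usize) -> Option<usize> {
    BINS.iter()
        .find(|&&bin| payload_len <= bin)
        .map(|&bin| bin - payload_len)
}
```

For the three sizes discussed, `padding_for(64)` is `Some(0)`, `padding_for(65)` is `Some(63)`, and `padding_for(1256)` is `Some(792)`.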

The padding itself is random fill rather than zeroes or the repeated length byte that strict PKCS#7 uses. Deterministic padding is known plaintext: with correct AES-GCM nonce handling it is not distinguishable on the wire, but it hands an adversary a fixed reference if a nonce is ever reused. Random fill removes that known-plaintext region. Critically, the padding is encrypted: AES-256-GCM covers the entire padded payload, so a monitor cannot identify the padding region by looking for a predictable suffix.
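A pad-then-encrypt step could be sketched as follows. The text does not say how the receiver recovers the true payload length, so the 2-byte big-endian length prefix here is purely an assumption, as are the names `pad_to_bin` and `unpad`. With this framing the bins apply to prefix plus payload, which differs from the SDK's accounting above. The `fill_random` closure stands in for whatever entropy source the platform provides.

```rust
/// Bin sizes from the text, smallest first.
const BINS: [usize; 6] = [64, 128, 256, 512, 1024, 2048];
/// Assumed framing: 2-byte big-endian payload length ahead of the payload.
const LEN_PREFIX: usize = 2;

/// Build the plaintext that would be handed to the AEAD:
/// [len: u16 BE][payload][random fill up to the bin size].
/// Returns None if prefix + payload exceeds the largest bin.
pub fn pad_to_bin(payload: &[u8], mut fill_random: impl FnMut(&mut [u8])) -> Option<Vec<u8>> {
    let framed_len = payload.len().checked_add(LEN_PREFIX)?;
    let bin = *BINS.iter().find(|&&b| framed_len <= b)?;
    let mut out = Vec::with_capacity(bin);
    // framed_len <= 2048, so the payload length always fits in a u16.
    out.extend_from_slice(&(payload.len() as u16).to_be_bytes());
    out.extend_from_slice(payload);
    out.resize(bin, 0);
    // Overwrite the tail with caller-supplied random bytes.
    fill_random(&mut out[framed_len..]);
    Some(out)
}

/// Recover the payload after decryption; None if the frame is malformed.
pub fn unpad(padded: &[u8]) -> Option<&[u8]> {
    let len = u16::from_be_bytes([*padded.get(0)?, *padded.get(1)?]) as usize;
    padded.get(LEN_PREFIX..LEN_PREFIX + len)
}
```

Because the padded buffer is what gets encrypted, the length prefix and the fill are both covered by the ciphertext and the authentication tag.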

Transmission timing jitter

Deterministic transmission intervals are timing oracles. If the SDK sends a gossip heartbeat exactly every 250 ms, an RF monitor can build a phase-locked loop against the transmitter and use phase deviation to detect anomalous events (key rotation, EW coordination, swarm split/merge) without decrypting anything. The Swarm SDK applies uniform random jitter of ±15% to every nominal transmission interval.

/// Timing jitter configuration for a single transmit queue.
pub struct JitterConfig {
    /// Nominal interval between transmissions, in milliseconds.
    pub base_interval_ms: u32,
    /// Jitter fraction applied symmetrically. 0.15 = ±15%.
    pub jitter_fraction: f32,
    /// Maximum number of queued frames before the oldest non-critical
    /// frames are dropped.
    pub max_backlog: usize,
}

impl JitterConfig {
    pub const DEFAULT: Self = Self {
        base_interval_ms: 250,
        jitter_fraction: 0.15,
        max_backlog: 32,
    };

    /// Compute the next transmission time offset using the STM32H7 hardware TRNG.
    /// Returns a duration in milliseconds in the range
    /// [base * (1 - jitter_fraction), base * (1 + jitter_fraction)].
    pub fn next_tx_time_ms(&self, trng: &mut Trng) -> u32 {
        // Read one 32-bit word from the STM32H7 RNG_DR register.
        let raw = trng.read_u32();
        let jitter_range = (self.base_interval_ms as f32 * self.jitter_fraction) as u32;
        // Map raw to [0, 2 * jitter_range], then shift to center on base_interval_ms.
        let offset = raw % (2 * jitter_range + 1);
        self.base_interval_ms - jitter_range + offset
    }
}

The STM32H7's hardware TRNG peripheral (RNG_DR register, sampled via the HAL) provides non-deterministic entropy seeded from thermal noise inside the chip. A software PRNG seeded at boot would produce a jitter pattern that looks random locally but is fully reproducible by an adversary who knows or can guess the seed, a real concern for devices with predictable boot sequences. The hardware TRNG eliminates this class of attack.

Jitter is applied to the inter-frame gap, not to ARQ retransmissions. Automatic repeat request retransmissions need deterministic timing to support round-trip-time estimation by the transport layer; adding jitter to retransmissions would corrupt the RTT samples and degrade link quality estimation. The jitter boundary is at the transmit queue drain rate, not inside the ARQ loop.
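One detail of the jitter mapping is worth spelling out. Reducing a 32-bit word with `%` is very slightly biased whenever the span does not divide 2^32. For a span of 75 the bias is negligible, but rejection sampling removes it entirely. The sketch below is a host-side illustration under stated assumptions: `EntropySource` is a hypothetical stand-in for the TRNG handle, `Lcg` is a deterministic test double (not a secure generator), and `jittered_interval_ms` is an illustrative name.

```rust
/// Hypothetical stand-in for the hardware TRNG handle.
pub trait EntropySource {
    fn read_u32(&mut self) -> u32;
}

/// Deterministic test double for host-side experiments. NOT a secure source.
pub struct Lcg(pub u32);

impl EntropySource for Lcg {
    fn read_u32(&mut self) -> u32 {
        self.0 = self.0.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
        self.0
    }
}

/// Draw uniformly from [0, span) by rejecting raw values that fall in the
/// incomplete final block of the 32-bit range.
pub fn uniform_below(src: &mut impl EntropySource, span: u32) -> u32 {
    assert!(span > 0, "span must be non-zero");
    // Accept raw values in [0, zone]; zone + 1 is a multiple of span.
    let zone = u32::MAX - (u32::MAX % span + 1) % span;
    loop {
        let raw = src.read_u32();
        if raw <= zone {
            return raw % span;
        }
    }
}

/// Jittered interval in [base - r, base + r], where r = floor(base * fraction).
/// Assumes base_ms is far below u32::MAX.
pub fn jittered_interval_ms(base_ms: u32, jitter_fraction: f32, src: &mut impl EntropySource) -> u32 {
    // Clamp so a fraction above 1.0 cannot underflow the subtraction below.
    let jitter_range = ((base_ms as f32 * jitter_fraction) as u32).min(base_ms);
    let offset = uniform_below(src, 2 * jitter_range + 1);
    base_ms - jitter_range + offset
}
```

With the default 250 ms base and a 0.15 fraction, the range is 213 ms to 287 ms inclusive, because the half-width of 37.5 ms truncates to 37.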

Store-and-forward ring buffer

Even with per-message jitter applied, burst transmissions are distinguishable. The gossip anti-entropy protocol sends 200 message IDs in a single AntiEntropyDigest — that is a burst of 14 MAVLink fragments at near-simultaneous intervals that no amount of per-message jitter will hide, because the burst itself is the distinguishing feature. The store-and-forward buffer decouples message generation from transmission, smoothing bursts into a rate-limited stream.

use std::collections::VecDeque;

/// Priority class for queued frames. CRITICAL frames are never dropped
/// when the buffer reaches capacity.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum FramePriority {
    Low      = 0,  // AntiEntropyDigest
    Normal   = 1,  // DirectMessage, SenderKeyMessage
    High     = 2,  // SealedSenderEnvelope
    Critical = 3,  // RevocationMessage, KeyRotation
}

pub struct QueuedFrame {
    pub payload: Vec<u8>,
    pub priority: FramePriority,
}

/// Token-bucket rate-limited transmit buffer with priority-aware drop policy.
pub struct StoreForwardBuffer {
    queue: VecDeque<QueuedFrame>,
    capacity: usize,
    /// Target transmit rate in frames per second.
    target_rate_fps: u32,
    /// Accumulated token count (whole tokens; one token permits one frame).
    tokens: u32,
    last_refill_ms: u64,
}

impl StoreForwardBuffer {
    pub fn new(capacity: usize, target_rate_fps: u32) -> Self {
        Self {
            queue: VecDeque::with_capacity(capacity),
            capacity,
            target_rate_fps,
            tokens: 0,
            last_refill_ms: 0,
        }
    }

    /// Enqueue a frame. If the buffer is full, evict the oldest LOW-priority
    /// frame to make room. If no LOW frame exists, evict the oldest NORMAL frame.
    /// Queued CRITICAL and HIGH frames are never evicted; if nothing can be
    /// evicted, the incoming frame is dropped regardless of its priority.
    pub fn enqueue(&mut self, frame: QueuedFrame) {
        if self.queue.len() < self.capacity {
            self.queue.push_back(frame);
            return;
        }
        // Find the oldest droppable frame (lowest priority, earliest position).
        let drop_idx = self.queue.iter().position(|f| f.priority == FramePriority::Low)
            .or_else(|| self.queue.iter().position(|f| f.priority == FramePriority::Normal));
        if let Some(idx) = drop_idx {
            self.queue.remove(idx);
            self.queue.push_back(frame);
        }
        // If no droppable frame exists, the incoming frame is silently dropped.
        // This only occurs when the buffer holds `capacity` CRITICAL or HIGH
        // frames, which indicates a severely degraded channel; the application
        // layer should have triggered degraded-mode before this point.
    }

    /// Drain up to one frame per token. Call once per millisecond tick.
    pub fn drain_tick(&mut self, now_ms: u64) -> Option<QueuedFrame> {
        let elapsed = now_ms.saturating_sub(self.last_refill_ms);
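        // Refill: one token per (1000 / target_rate_fps) milliseconds.
        // Clamp the rate and the interval so a configured rate of 0 cannot
        // divide by zero.
        let token_interval_ms = (1000 / u64::from(self.target_rate_fps.max(1))).max(1);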
        if elapsed >= token_interval_ms {
            self.tokens = self.tokens.saturating_add(1);
            self.last_refill_ms = now_ms;
        }
        if self.tokens > 0 {
            if let Some(frame) = self.queue.pop_front() {
                self.tokens -= 1;
                return Some(frame);
            }
        }
        None
    }
}

Buffer depth in practice varies with channel quality. At 5% packet loss the buffer drains faster than it fills, holding an average depth of 0 frames between gossip cycles. At 30% loss the average depth is 12 frames. At 60% loss — approaching the degraded-mode threshold — average depth reaches 67 frames. These figures are from STM32H7 bench testing with a simulated 64-node swarm at 10 gossip cycles per second.
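The smoothing effect is easy to see in a small host-side simulation. This sketch deliberately uses a simpler technique than the token bucket in the listing: a pacer that enforces a minimum gap between frames and therefore banks no credit while idle. `Pacer`, `drain_times`, and the 100 fps rate used in the note below are assumptions for illustration.

```rust
use std::collections::VecDeque;

/// Minimum-gap pacer: releases at most one frame per interval and
/// accumulates no credit while the queue is empty.
pub struct Pacer {
    queue: VecDeque<u32>,
    interval_ms: u64,
    next_tx_ms: u64,
}

impl Pacer {
    pub fn new(rate_fps: u32) -> Self {
        // Clamp so a rate of 0 or above 1000 fps still gives a valid interval.
        let interval_ms = (1000 / u64::from(rate_fps.max(1))).max(1);
        Self { queue: VecDeque::new(), interval_ms, next_tx_ms: 0 }
    }

    pub fn enqueue(&mut self, frame_id: u32) {
        self.queue.push_back(frame_id);
    }

    /// Call once per millisecond tick.
    pub fn tick(&mut self, now_ms: u64) -> Option<u32> {
        if now_ms < self.next_tx_ms {
            return None;
        }
        let frame = self.queue.pop_front()?;
        self.next_tx_ms = now_ms + self.interval_ms;
        Some(frame)
    }
}

/// Enqueue `burst` frames at t = 0 and record when each one leaves the pacer.
pub fn drain_times(burst: u32, rate_fps: u32) -> Vec<u64> {
    let mut pacer = Pacer::new(rate_fps);
    for id in 0..burst {
        pacer.enqueue(id);
    }
    let mut times = Vec::new();
    let mut now = 0u64;
    while times.len() < burst as usize {
        if pacer.tick(now).is_some() {
            times.push(now);
        }
        now += 1;
    }
    times
}
```

A 14-fragment burst paced at 100 fps leaves at t = 0, 10, 20, and so on up to 130 ms, instead of as one near-simultaneous cluster. A token bucket that keeps accumulating tokens while idle would let the same burst through at one frame per tick, so any burst allowance should be bounded deliberately.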

Degraded-channel operational mode

High-jamming environments demand a different operating posture. Continuing to broadcast at normal gossip rates maximizes RF exposure time; every transmitted frame is an opportunity for a jammer to lock on, for direction-finding equipment to obtain a bearing fix, or for an adversary's ML classifier to accumulate training data. The SDK's PacketLossEstimator triggers a mode switch when it detects sustained packet loss above 70% for more than 30 continuous seconds.
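The "sustained above a threshold" condition can be sketched as a small hold timer. How the SDK's PacketLossEstimator actually measures loss is not described here, so this sketch takes a ready-made loss-rate sample as input. `LossTrigger` and its fields are hypothetical names.

```rust
/// Fires once loss has stayed above `threshold` for more than `hold_ms`
/// without interruption. Hypothetical helper, not the SDK's estimator.
pub struct LossTrigger {
    threshold: f32,
    hold_ms: u64,
    /// Timestamp of the first sample in the current above-threshold run.
    above_since_ms: Option<u64>,
}

impl LossTrigger {
    pub fn new(threshold: f32, hold_ms: u64) -> Self {
        Self { threshold, hold_ms, above_since_ms: None }
    }

    /// Feed one loss-rate sample in [0.0, 1.0]. Returns true while the
    /// sustained-loss condition holds.
    pub fn update(&mut self, now_ms: u64, loss_rate: f32) -> bool {
        if loss_rate > self.threshold {
            let since = *self.above_since_ms.get_or_insert(now_ms);
            now_ms.saturating_sub(since) > self.hold_ms
        } else {
            // Any sample at or below the threshold restarts the clock.
            self.above_since_ms = None;
            false
        }
    }
}
```

With `LossTrigger::new(0.70, 30_000)`, a single good sample resets the 30-second window, which matches the "continuous" wording above. A production estimator would probably smooth the samples first so that one lucky packet does not postpone the switch.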

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperationalMode {
    /// Normal operation: k=3 gossip fanout, full anti-entropy, all size bins.
    Normal,
    /// Degraded: k=1 fanout, anti-entropy disabled, reduced bins, ±30% jitter.
    Degraded,
    /// Emergency beacon: position + authentication only, once per 60 seconds.
    EmergencyBeacon,
}

impl OperationalMode {
    pub fn gossip_fanout(self) -> usize {
        match self {
            Self::Normal         => 3,
            Self::Degraded       => 1,
            Self::EmergencyBeacon => 0,
        }
    }

    pub fn anti_entropy_enabled(self) -> bool {
        matches!(self, Self::Normal)
    }

    pub fn allowed_bins(self) -> &'static [MessageSizeBin] {
        match self {
            Self::Normal => &[
                MessageSizeBin::B64, MessageSizeBin::B128, MessageSizeBin::B256,
                MessageSizeBin::B512, MessageSizeBin::B1024, MessageSizeBin::B2048,
            ],
            Self::Degraded => &[MessageSizeBin::B128, MessageSizeBin::B256],
            Self::EmergencyBeacon => &[MessageSizeBin::B128],
        }
    }

    pub fn jitter_fraction(self) -> f32 {
        match self {
            Self::Normal          => 0.15,
            Self::Degraded        => 0.30,
            Self::EmergencyBeacon => 0.30,
        }
    }
}

In Degraded mode the gossip fanout drops from k=3 to k=1 (unicast to a single peer, not broadcast), anti-entropy is suspended entirely, message size bins are restricted to [128, 256] bytes, and jitter increases to ±30%. The net effect is a dramatic reduction in RF exposure while preserving point-to-point messaging capability between immediately adjacent nodes.

EmergencyBeacon is a last resort. In this mode all application-layer communication is suspended. The SDK transmits a single 128-byte authenticated position frame once every 60 seconds. The frame carries position, altitude, node ID, and an Ed25519 signature over the payload — enough for an operator to account for the node in the operational picture. All other transmissions cease.

RF fingerprint resistance

Commercial drone radio transceivers exhibit chip-specific RF fingerprints: power ramp-up curves, carrier frequency error, and IQ imbalance that differ between production lots and even between individual units. Sophisticated signal intelligence equipment can use these fingerprints to track individual platforms across frequency hops or to correlate transmissions from the same device across separate missions.

The Swarm SDK addresses this at the software layer with two mechanisms. First, it implements software-controlled power ramping: rather than allowing the radio's hardware default 2 ms ramp, the SDK drives a gradual 8 ms ramp via the HAL GPIO, smoothing the ramp-up signature and reducing its distinctiveness across units. Second, the SDK exposes a set_frequency_hop_plan() interface that the flight controller uses to coordinate frequency changes across the swarm. The hop plan itself is distributed over the gossip mesh as an EwEvent, ensuring all nodes synchronize to the new plan before the first hop occurs.

These are explicitly second-order defenses. The Swarm SDK does not control the radio hardware directly — it operates at the MAVLink/TUNNEL layer and hands frames to the flight controller's radio driver. The primary defense against RF fingerprinting is hardware diversity and physical RF countermeasures that are outside the SDK's scope. The SDK's contribution is to avoid making the situation worse: it does not generate timing or size patterns that make fingerprinting easier, and it provides a hook for coordinated frequency hopping that the platform operator can use.

What the SDK does not protect against

Traffic analysis resistance is not anonymity. The SDK's protections operate at the application message layer; they do not reach the radio-layer MAC address, the boot-time RF emission sequence, or chip-level physical characteristics.

The radio-layer MAC address is visible to any monitor in range. Drone operators who require link-layer anonymity must apply MAC randomization at the radio driver level, before packets reach the SDK. The SDK has no mechanism to randomize or conceal the MAC address — that identifier lives in the radio firmware, not in the application stack.

Boot-time RF emission is also outside scope. The interval between power-on and the first SDK-layer transmission is controlled by the flight controller boot sequence, radio firmware initialization, and the time required to acquire GPS lock. The SDK does not control when the radio first keys up, which means the boot-time emission pattern is a fingerprint that the jitter and bin-normalization systems do not touch. Operators who require boot-time emission control must implement it at the platform level, for example by delaying radio power-on until the flight controller is fully initialized and the operator explicitly authorizes RF emission.

Physical RF fingerprinting — chip-specific power amplifier nonlinearity, phase noise, and oscillator pulling characteristics — is entirely outside SDK scope. These are hardware properties that require hardware countermeasures. The SDK's software power ramping reduces one visible symptom (the ramp-up curve) but does not address the underlying physical characteristics that a sophisticated fingerprinting system uses.