
Swarm SDK message framing: binary wire format, fragmentation, and MAVLink packing

February 27, 2026 · 8 min read · AI Analytics

Swarm SDK · Cryptography · Protocol design

The Double Ratchet produces opaque ciphertext. The radio transports MAVLink v2 frames. Between those two layers sits the framing layer: the component responsible for serializing encrypted messages into a binary wire format, slicing them into fragments that fit inside a 253-byte MAVLink TUNNEL payload, and reassembling those fragments on the other end before handing the result back to the decryption layer. This article covers every decision in that middle layer — the binary header structure, the fragmentation algorithm, reassembly state, CONTROL frame authentication, and the measured cost on an STM32H7.

The framing problem

A Double Ratchet encrypted message for a typical 200-byte application payload comes out at roughly 284 bytes (the DirectMessage row in the size table below) once you account for the AES-256-GCM authentication tag, the encrypted header carrying the ratchet public key and chain indices, and the serialization overhead of the message type. MAVLink v2 TUNNEL frames carry at most 253 bytes of payload. The gap is obvious: a single encrypted message does not fit in a single frame.

The framing layer has three jobs. First, serialize the application message into a compact binary representation before passing it to the cryptographic layer. Second, after encryption, fragment the resulting ciphertext into chunks that each fit within a single TUNNEL payload, prepending a binary header to each chunk that identifies which message it belongs to, how many total fragments exist, and which position this chunk occupies. Third, on the receiving side, reassemble arriving fragments back into the complete ciphertext using a keyed buffer, detect timeout or loss, and deliver the reconstructed ciphertext to the decryption layer.

A fourth concern cuts across all three: correctness checking. Each fragment carries a CRC-16 over both its header and its payload slice. A corrupted fragment is caught at the receiver before it poisons the reassembly buffer. This is separate from — and complementary to — the AES-256-GCM authentication tag that covers the entire reassembled ciphertext. The per-frame CRC catches transmission errors early; the per-message AEAD tag catches any adversarial tampering that might survive the CRC.

The ordering of these operations matters. Encryption happens before fragmentation, not after. Each fragment therefore carries a slice of opaque ciphertext: an observer who intercepts a single fragment cannot read the message type or any content from the payload. The cleartext header exposes only framing metadata such as total_frames and payload_len. A partial interception reveals that some message of a given ciphertext size was in transit, and nothing about what it says.

SwarmFrame binary structure

Every fragment — whether it carries encrypted application data, an ACK, a NACK, or a CONTROL message — begins with a 16-byte header. The header is a packed C-compatible struct with no padding:

#[repr(C, packed)]
pub struct SwarmFrameHeader {
    pub magic: [u8; 2],        // 0x5700 ('W' for Swarm)
    pub version: u8,           // 0x01
    pub frame_type: FrameType, // DATA=0, ACK=1, NACK=2, CONTROL=3
    pub sequence: u16,         // message sequence number
    pub total_frames: u8,      // total fragments for this message
    pub frame_index: u8,       // 0-indexed fragment position
    pub payload_len: u16,      // bytes of payload in this frame
    pub message_id: u32,       // stable ID across all fragments of same message
    pub crc16: u16,            // CRC-16/CCITT of header + payload
}

The magic bytes 0x5700 let the receiver quickly discard malformed frames without parsing the rest of the header. The ASCII byte 0x57 is the letter W, chosen for Swarm. The zero second byte reserves space for a future protocol family discriminator without breaking the two-byte magic check.

The sequence field is a per-session message counter that wraps at 65535. It identifies this particular message independent of its fragmentation. The message_id is a 32-bit value generated once before fragmentation and repeated identically in every fragment of the same message. The two fields serve different purposes: sequence is used for ACK and NACK references (acknowledging the message, not the fragment), while message_id is the reassembly buffer key — it stays stable even when fragments arrive out of order or are retransmitted.

The total_frames and frame_index fields are one byte each. Total fragments is therefore capped at 255. The maximum message size the framing layer can handle is 255 fragments times 237 bytes per fragment payload: 60,435 bytes. In practice the largest message type in the current SDK is a SealedSenderMessage at roughly 1,140 bytes after encryption, requiring 6 fragments. The 255-fragment cap is an implementation ceiling, not a practical one.

The payload budget per fragment: MAVLink v2 TUNNEL carries 253 bytes maximum. Subtract the 16-byte SwarmFrame header and each fragment carries at most 237 bytes of ciphertext. This constraint drives everything else in the fragmentation algorithm.
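The header layout and payload budget described above can be sketched as a field-by-field encoder and decoder. This is a minimal illustration, not the SDK's implementation: the `Header` type, `encode_header`, and `decode_header` are hypothetical names, and little-endian byte order for the multi-byte header fields is an assumption (the article specifies little-endian only for ACK and CONTROL fields).

```rust
/// Size of the SwarmFrame header on the wire.
pub const HEADER_LEN: usize = 16;
/// MAVLink v2 TUNNEL payload limit quoted in the text.
pub const TUNNEL_MAX: usize = 253;
/// Ciphertext bytes available per fragment: 253 - 16 = 237.
pub const MAX_PAYLOAD: usize = TUNNEL_MAX - HEADER_LEN;

/// Hypothetical plain-data view of the header (magic and version are implied).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Header {
    pub frame_type: u8, // DATA=0, ACK=1, NACK=2, CONTROL=3
    pub sequence: u16,
    pub total_frames: u8,
    pub frame_index: u8,
    pub payload_len: u16,
    pub message_id: u32,
    pub crc16: u16,
}

/// Serialize field by field. Little-endian is an assumption of this sketch.
pub fn encode_header(h: &Header) -> [u8; HEADER_LEN] {
    let mut out = [0u8; HEADER_LEN];
    out[0] = 0x57; // magic, first byte
    out[1] = 0x00; // magic, second byte
    out[2] = 0x01; // version
    out[3] = h.frame_type;
    out[4..6].copy_from_slice(&h.sequence.to_le_bytes());
    out[6] = h.total_frames;
    out[7] = h.frame_index;
    out[8..10].copy_from_slice(&h.payload_len.to_le_bytes());
    out[10..14].copy_from_slice(&h.message_id.to_le_bytes());
    out[14..16].copy_from_slice(&h.crc16.to_le_bytes());
    out
}

/// Parse a header, checking magic and version before anything else.
pub fn decode_header(buf: &[u8]) -> Option<Header> {
    if buf.len() < HEADER_LEN || buf[0] != 0x57 || buf[1] != 0x00 || buf[2] != 0x01 {
        return None;
    }
    let h = Header {
        frame_type: buf[3],
        sequence: u16::from_le_bytes([buf[4], buf[5]]),
        total_frames: buf[6],
        frame_index: buf[7],
        payload_len: u16::from_le_bytes([buf[8], buf[9]]),
        message_id: u32::from_le_bytes([buf[10], buf[11], buf[12], buf[13]]),
        crc16: u16::from_le_bytes([buf[14], buf[15]]),
    };
    // Reject frames whose declared geometry cannot be valid.
    if h.frame_type > 3
        || h.total_frames == 0
        || h.frame_index >= h.total_frames
        || usize::from(h.payload_len) > MAX_PAYLOAD
    {
        return None;
    }
    Some(h)
}
```

Writing the fields out explicitly avoids taking references into a packed struct and makes the 16-byte total easy to audit against the field list.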

Message sizes before and after encryption

The number of fragments a message requires is determined by its size after encryption, not before. The encryption layer adds overhead: the AES-256-GCM tag (16 bytes per message), the encrypted header carrying the Double Ratchet public key and chain indices (76 bytes), and the serialization framing of the message type itself. The table below shows typical sizes for each message type in the current SDK:

Message type                    Plaintext    After encryption    Frames needed
───────────────────────────────────────────────────────────────────────────────
DirectMessage (text, 200B)       212B         284B (DR + MAC)       2
SealedSenderMessage (empty)      1,108B       1,140B               6
SenderKeyMessage (group, 200B)   200B         240B                  2
ControlMessage (RevocationMsg)    48B          80B                  1
PreKeyBundle (initial session)   512B         580B                  3
EnrollmentAnnouncement           320B         388B                  2

The SealedSenderMessage outlier deserves explanation. Even with an empty application payload, this message type is 1,108 bytes in plaintext. The overhead is structural: 512 bytes for the ML-KEM-768 encapsulated key (the post-quantum KEM ciphertext that hides the sender identity); 32 bytes for an ephemeral X25519 public key; 32 bytes for the HKDF salt; 64 bytes for the SenderCertificate (a signed binding of sender identity to the session); 64 bytes for the AEAD authentication tag on the inner sealed layer; and 404 bytes of serialization overhead for the nested protobuf message structure. This is the cost of sender identity hiding — the framing layer handles it transparently by allocating six fragments.

The Double Ratchet encryption overhead — 104 bytes of header material plus 16 bytes of GCM tag — is constant regardless of plaintext size. For very short messages (ControlMessage at 48 bytes plaintext), encryption adds 67% overhead but the total still fits in a single fragment. For larger messages the fixed overhead becomes a small fraction of the total.

Fragmentation algorithm

Fragmentation takes the encrypted ciphertext blob, a sequence number, and a message ID, and produces a vector of SwarmFrame values ready to hand to the MAVLink transport layer:

pub fn fragment_message(
    ciphertext: &[u8],
    sequence: u16,
    msg_id: u32,
) -> Vec<SwarmFrame> {
    const MAX_PAYLOAD: usize = 237;
    // Integer ceiling division: number of 237-byte chunks needed.
    let total = (ciphertext.len() + MAX_PAYLOAD - 1) / MAX_PAYLOAD;
    assert!(total <= 255, "message too large to fragment");

    ciphertext
        .chunks(MAX_PAYLOAD)
        .enumerate()
        .map(|(i, chunk)| {
            // Build the header with crc16 zeroed; the CRC is defined over
            // the header in that state followed by the payload slice.
            let mut header = SwarmFrameHeader {
                magic: [0x57, 0x00],
                version: 0x01,
                frame_type: FrameType::Data,
                sequence,
                total_frames: total as u8,
                frame_index: i as u8,
                payload_len: chunk.len() as u16,
                message_id: msg_id,
                crc16: 0,
            };
            // to_bytes() is assumed to serialize the packed 16-byte header.
            header.crc16 = crc16_ccitt(&[&header.to_bytes()[..], chunk].concat());
            SwarmFrame { header, payload: chunk.to_vec() }
        })
        .collect()
}

The ceiling division (len + MAX_PAYLOAD - 1) / MAX_PAYLOAD computes the exact fragment count using integer arithmetic only, which suits embedded targets where floating-point is expensive or disabled. The last fragment will naturally contain fewer than 237 bytes unless the ciphertext length is an exact multiple of 237; its payload_len field records the actual byte count so the receiver knows where the payload ends without needing a sentinel.
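The ceiling division and the short final fragment can be checked in isolation. The helpers below are hypothetical names introduced for illustration. Unlike the assert! in fragment_message, this sketch returns None for an oversized message instead of panicking, which is one possible design and not necessarily the SDK's.

```rust
pub const MAX_PAYLOAD: usize = 237;

/// Number of fragments for a ciphertext of `len` bytes, using integer
/// ceiling division. Returns None when more than 255 would be needed.
pub fn fragments_needed(len: usize) -> Option<u8> {
    let total = (len + MAX_PAYLOAD - 1) / MAX_PAYLOAD;
    if total <= 255 {
        Some(total as u8)
    } else {
        None
    }
}

/// Payload length carried by fragment `index` (0-based) of a ciphertext
/// of `len` bytes. Every fragment but the last is full.
pub fn payload_len_at(len: usize, index: usize) -> usize {
    let start = index * MAX_PAYLOAD;
    len.saturating_sub(start).min(MAX_PAYLOAD)
}
```

For the 284-byte DirectMessage in the table, this gives two fragments carrying 237 and 47 bytes.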

The CRC-16/CCITT computation covers both the header bytes and the payload slice. There is a subtlety here: when computing the CRC, the crc16 field itself is treated as zero. The CRC is computed over the header with that field zeroed, then the computed value is written back into the header before transmission. The receiver zeroes the crc16 field before verifying, replicating the same computation.
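The zero-then-fill procedure can be sketched as follows. "CRC-16/CCITT" names a family of parameter sets. This sketch assumes the common CCITT-FALSE variant (polynomial 0x1021, initial value 0xFFFF, no reflection), and the article does not say which one the SDK uses. The helper names `frame_crc`, `seal`, and `verify` are hypothetical, as is the little-endian placement of the CRC at header offset 14.

```rust
/// CRC-16/CCITT-FALSE: polynomial 0x1021, init 0xFFFF, no reflection.
/// The choice of variant is an assumption of this sketch.
pub fn crc16_ccitt(data: &[u8]) -> u16 {
    let mut crc: u16 = 0xFFFF;
    for &byte in data {
        crc ^= (byte as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 {
                (crc << 1) ^ 0x1021
            } else {
                crc << 1
            };
        }
    }
    crc
}

/// Offset of the crc16 field within the 16-byte header.
const CRC_OFFSET: usize = 14;

/// CRC over the header with its crc16 field zeroed, followed by the payload.
pub fn frame_crc(header: &[u8; 16], payload: &[u8]) -> u16 {
    let mut h = *header;
    h[CRC_OFFSET] = 0;
    h[CRC_OFFSET + 1] = 0;
    let mut buf = Vec::with_capacity(16 + payload.len());
    buf.extend_from_slice(&h);
    buf.extend_from_slice(payload);
    crc16_ccitt(&buf)
}

/// Sender side: compute the CRC and write it into the header.
pub fn seal(header: &mut [u8; 16], payload: &[u8]) {
    let crc = frame_crc(header, payload);
    header[CRC_OFFSET..].copy_from_slice(&crc.to_le_bytes());
}

/// Receiver side: recompute with the field zeroed and compare to the stored value.
pub fn verify(header: &[u8; 16], payload: &[u8]) -> bool {
    let stored = u16::from_le_bytes([header[CRC_OFFSET], header[CRC_OFFSET + 1]]);
    stored == frame_crc(header, payload)
}
```

Because `frame_crc` always zeroes the field on a copy, sender and receiver run the identical computation regardless of what the header currently holds in bytes 14 and 15.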

The message_id is generated by the caller before invoking fragment_message. This design decision is deliberate: the ID must exist before fragmentation so it can be embedded identically in every fragment. The caller generates it from a cryptographically random 32-bit value using the session CSPRNG. Randomness prevents an attacker from predicting message IDs and pre-staging a spoofed fragment in the reassembly buffer.

Reassembly buffer

The receiver maintains a reassembly map keyed on the pair (source_device_id, message_id). This two-part key is necessary: each device draws its message IDs independently from its own CSPRNG, so two different devices could generate the same 32-bit message_id value. The device ID disambiguates them.

pub struct ReassemblyState {
    pub total_frames: u8,
    pub frames: BTreeMap<u8, Vec<u8>>,  // frame_index -> payload
    pub created_at: Instant,
    pub sequence: u16,
}

// Keyed on (source_device_id, message_id)
type ReassemblyMap = HashMap<(DeviceId, u32), ReassemblyState>;

On each arriving DATA frame the receiver checks the CRC first, discards the frame if it fails, then looks up or creates a ReassemblyState entry for the (source, message_id) key. It inserts the payload slice at frame_index in the BTreeMap. A BTreeMap rather than a Vec is used here because fragments arrive in arbitrary order and the BTreeMap keeps them sorted by index without requiring random-access writes to a pre-allocated buffer.

Completion is detected when the number of entries in the BTreeMap equals total_frames. At that point the payloads are iterated in BTreeMap key order (which is fragment-index order), concatenated into a single byte vector, and passed to the decryption layer. The ReassemblyState entry is then removed from the map.
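The insert-and-complete path described above can be sketched as a single function. This is an illustration under assumptions: `accept_fragment` is a hypothetical name, `DeviceId` is taken to be a `u16` purely for the example, and the CRC check is assumed to have already passed before the call. The two early rejections (impossible indices, and a total_frames that disagrees with the first fragment seen) are defensive additions not described in the article.

```rust
use std::collections::{BTreeMap, HashMap};
use std::time::Instant;

pub type DeviceId = u16; // assumed width, for illustration only

pub struct ReassemblyState {
    pub total_frames: u8,
    pub frames: BTreeMap<u8, Vec<u8>>, // frame_index -> payload
    pub created_at: Instant,
    pub sequence: u16,
}

pub type ReassemblyMap = HashMap<(DeviceId, u32), ReassemblyState>;

/// Store one CRC-checked DATA fragment. Returns the full ciphertext once
/// every index in 0..total_frames is present, otherwise None.
pub fn accept_fragment(
    map: &mut ReassemblyMap,
    source: DeviceId,
    message_id: u32,
    sequence: u16,
    total_frames: u8,
    frame_index: u8,
    payload: &[u8],
) -> Option<Vec<u8>> {
    if total_frames == 0 || frame_index >= total_frames {
        return None; // impossible geometry, drop the fragment
    }
    let key = (source, message_id);
    let state = map.entry(key).or_insert_with(|| ReassemblyState {
        total_frames,
        frames: BTreeMap::new(),
        created_at: Instant::now(),
        sequence,
    });
    if state.total_frames != total_frames {
        return None; // disagrees with the first fragment seen, drop
    }
    // Duplicates overwrite the stored copy at the same index.
    state.frames.insert(frame_index, payload.to_vec());
    if state.frames.len() != usize::from(state.total_frames) {
        return None; // still waiting for more fragments
    }
    // Complete: BTreeMap iterates in ascending frame_index order.
    let done = map.remove(&key)?;
    Some(done.frames.into_values().flatten().collect())
}
```

Feeding fragments 2, 0, 1 of a three-fragment message returns None twice and then the concatenated ciphertext, after which the entry is gone from the map.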

Timeout is five seconds, chosen to match the MAVLink mesh transport retry window. The mesh transport retries a lost frame up to three times with exponential backoff; the total worst-case delivery time for a single fragment across three retries is approximately 4.2 seconds. Any reassembly entry older than five seconds can be assumed to have experienced unrecoverable loss. On timeout the entry is purged from the map and a NACK frame is generated listing the missing fragment indices.
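A periodic sweep implementing this timeout might look like the sketch below. The names `Pending`, `Expired`, and `purge_expired` are hypothetical, the device ID is again assumed to be a `u16`, and passing `now` as a parameter is a choice made here so the logic can be exercised without waiting in real time.

```rust
use std::collections::{BTreeMap, HashMap};
use std::time::{Duration, Instant};

pub const REASSEMBLY_TIMEOUT: Duration = Duration::from_secs(5);

/// Same shape as the ReassemblyState struct in the article.
pub struct Pending {
    pub total_frames: u8,
    pub frames: BTreeMap<u8, Vec<u8>>,
    pub created_at: Instant,
    pub sequence: u16,
}

/// What the caller needs in order to build one NACK.
#[derive(Debug, PartialEq, Eq)]
pub struct Expired {
    pub source: u16,
    pub sequence: u16,
    pub missing: Vec<u8>,
}

/// Remove entries that have reached the timeout and report, for each,
/// which fragment indices never arrived.
pub fn purge_expired(map: &mut HashMap<(u16, u32), Pending>, now: Instant) -> Vec<Expired> {
    let mut expired = Vec::new();
    map.retain(|&(source, _), st| {
        if now.duration_since(st.created_at) < REASSEMBLY_TIMEOUT {
            return true; // still inside the window, keep the entry
        }
        let missing: Vec<u8> = (0..st.total_frames)
            .filter(|i| !st.frames.contains_key(i))
            .collect();
        expired.push(Expired { source, sequence: st.sequence, missing });
        false // drop the entry
    });
    expired
}
```

Returning the missing indices alongside the sequence number keeps the sweep independent of how the NACK is encoded on the wire.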

Duplicate fragment detection is implicit: the BTreeMap insert at a given index is idempotent for identical payloads. A retransmitted fragment that arrives after the original has already been stored simply overwrites the same entry with the same bytes. If the CRC passes on the new copy but would fail on the stored copy (a corrupted original that was erroneously admitted), the retransmission corrects the buffer. This is an intentional design choice: CRC-16 has a non-zero collision probability, so permitting overwrites of stored entries provides a correction mechanism at no additional code complexity.

Encryption layer ordering

Three ordering decisions in the framing layer are worth making explicit because each has a security rationale that is not immediately obvious.

Encrypt before fragment. The alternative — fragment first, then encrypt each fragment independently — seems attractive because it allows each fragment to be independently authenticated. But it creates a significant traffic analysis leak: the fragment count and the size of the final fragment together constrain the range of possible plaintext lengths to a narrow window. An adversary who can passively observe fragment counts can distinguish a DirectMessage (2 fragments) from a SealedSenderMessage (6 fragments) without breaking any cryptography. Encrypting before fragmentation produces uniform opaque chunks; the plaintext type is not inferrable from the fragment count because the mapping from message type to encrypted size is not fixed across sessions.

Generate message_id before fragmentation. The message ID must be stable across all fragments of the same message, which means it must exist before the fragmentation loop runs. If message_id were generated per-fragment (or derived from a fragment-level counter), reassembly would require some other correlation mechanism. Using a pre-generated random 32-bit ID that is embedded in every fragment header makes reassembly a simple hash map lookup regardless of arrival order or retransmission.

CRC-16 per fragment, AEAD per message. Computing a single AEAD tag over the entire reassembled ciphertext would require buffering all fragments before any integrity check could be performed. Per-fragment CRC-16 lets the receiver reject corrupted fragments immediately on arrival, before they enter the reassembly buffer. This matters for memory pressure: a corrupted fragment that is admitted to the buffer and later detected at AEAD verification wastes buffer space for the full reassembly timeout. The per-fragment CRC is not a cryptographic integrity guarantee (CRC-16 is not collision-resistant under adversarial conditions), but it reliably catches the random bit-flip errors that dominate RF transmission failures. The AEAD tag on the fully reassembled ciphertext provides the cryptographic integrity guarantee.

CONTROL frames

CONTROL frames are structurally different from DATA frames. They do not carry encrypted Double Ratchet payload. Instead they carry authenticated plaintext: control data that must be readable by the framing layer itself, before any Double Ratchet session exists or after a session has been torn down.

The message types sent as CONTROL frames are: PreKeyBundleRequest (a device requesting the public key bundle it needs to initiate a session); PreKeyBundleResponse (the response carrying the bundle); RevocationMessage (notification that a device ID or key has been revoked); HeartbeatRequest and HeartbeatResponse (liveness probes used by the mesh layer to maintain routing tables); and AntiEntropyDigest (a compact Bloom-filter summary of known revocations, used for eventual consistency of the revocation set across the mesh).

These messages cannot be Double Ratchet encrypted because they are necessary to establish or maintain the sessions that Double Ratchet depends on. But transmitting them as completely unauthenticated plaintext would allow an adversary to inject false PreKeyBundles, forge revocations, or flood the mesh with bogus heartbeats. The Swarm SDK authenticates CONTROL frames with HMAC-SHA256 using the session MAC key — a symmetric key established during enrollment and shared across the mesh through the key management layer.

// CONTROL frame payload layout (fields are concatenated on the wire in this order)
// Total max payload: 237 bytes (fits in one frame; CONTROL messages are always 1 frame)
struct ControlPayload {
    control_type:  u8,        // PreKeyBundleRequest=0, PreKeyBundleResponse=1,
                              // RevocationMessage=2, HeartbeatRequest=3,
                              // HeartbeatResponse=4, AntiEntropyDigest=5
    body_len:      u16,       // length of control_body in bytes
    control_body:  Vec<u8>,   // serialized control message (variable length, body_len bytes)
    hmac_tag:      [u8; 32],  // HMAC-SHA256(session_mac_key, control_type || body_len || control_body)
}

// The HMAC input is: control_type byte || body_len (2 bytes, little-endian) || control_body bytes
// The session_mac_key is the 32-byte MAC key established during enrollment
// and distributed via the key management layer

The 32-byte HMAC tag and the 3 bytes of control_type and body_len together consume 35 of the 237 available payload bytes, leaving 202 bytes for the control message body. The largest CONTROL message type is a PreKeyBundleResponse carrying a full public key bundle, which fits within that budget. All CONTROL messages are guaranteed to fit in a single frame, so the total_frames field is always 1 and frame_index is always 0 for CONTROL frames. The fragmentation machinery does not apply.
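The CONTROL payload layout can be sketched as an encoder and an authenticating decoder. To keep the example dependency-free, the MAC is passed in as a closure: in a real build it would be HMAC-SHA256 under the session MAC key, supplied by whatever crypto library the SDK uses. `encode_control` and `decode_control` are hypothetical names. By the layout above, the body budget works out to 237 − 1 − 2 − 32 = 202 bytes.

```rust
pub const MAX_PAYLOAD: usize = 237;
pub const TAG_LEN: usize = 32;
/// control_type (1) + body_len (2) + hmac_tag (32) leave 202 bytes of body.
pub const MAX_BODY: usize = MAX_PAYLOAD - 1 - 2 - TAG_LEN;

/// Build a CONTROL payload. `mac` stands in for HMAC-SHA256 keyed with the
/// session MAC key; it is a parameter so this sketch needs no crypto crate.
pub fn encode_control<F>(control_type: u8, body: &[u8], mac: F) -> Option<Vec<u8>>
where
    F: Fn(&[u8]) -> [u8; TAG_LEN],
{
    if body.len() > MAX_BODY {
        return None; // would not fit in a single frame
    }
    let mut out = Vec::with_capacity(3 + body.len() + TAG_LEN);
    out.push(control_type);
    out.extend_from_slice(&(body.len() as u16).to_le_bytes());
    out.extend_from_slice(body);
    // The tag covers control_type || body_len || control_body: everything so far.
    let tag = mac(&out);
    out.extend_from_slice(&tag);
    Some(out)
}

/// Parse and authenticate. Returns (control_type, body) only if the tag matches.
pub fn decode_control<F>(payload: &[u8], mac: F) -> Option<(u8, &[u8])>
where
    F: Fn(&[u8]) -> [u8; TAG_LEN],
{
    if payload.len() < 3 + TAG_LEN || payload.len() > MAX_PAYLOAD {
        return None;
    }
    let body_len = usize::from(u16::from_le_bytes([payload[1], payload[2]]));
    if payload.len() != 3 + body_len + TAG_LEN {
        return None; // declared length disagrees with the actual length
    }
    let (signed, tag) = payload.split_at(3 + body_len);
    let expected = mac(signed);
    // Accumulate differences over all 32 bytes instead of returning at the
    // first mismatch, so the comparison does not branch on secret data.
    let diff = expected.iter().zip(tag).fold(0u8, |acc, (a, b)| acc | (a ^ b));
    if diff != 0 {
        return None;
    }
    Some((payload[0], &signed[3..]))
}
```

The decoder checks the declared body_len against the actual payload length before computing the MAC, so a malformed frame is rejected without touching the key.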

HMAC-SHA256 authentication of CONTROL frames provides integrity and authenticity but not confidentiality. PreKeyBundle contents — public keys — are not secret by definition. RevocationMessages contain device IDs and key fingerprints, which are considered non-sensitive operational data. HeartbeatRequest and HeartbeatResponse carry only a timestamp and a device ID. AntiEntropyDigest is a Bloom filter over revocation hashes. None of these require encryption; all require tamper detection, which the HMAC provides.

ACK and NACK frames

Acknowledgement operates at the message level, not the fragment level. After successful reassembly and decryption of a complete message, the receiver sends a single ACK frame to the source device. The ACK payload is two bytes: the sequence number of the acknowledged message in little-endian byte order. ACK frames use frame_type=1, total_frames=1, frame_index=0, and payload_len=2.

NACK frames are sent on reassembly timeout. The NACK payload encodes the sequence number being negatively acknowledged (two bytes) followed by a one-byte bitmask of missing fragment indices. Bit N of the bitmask being set means fragment N has not been received. The bitmask covers indices 0-7, meaning NACK can describe missing fragments only within the first eight positions. This is sufficient for current message sizes — the largest message (SealedSenderMessage, 6 fragments) fits within the 8-bit range. If the SDK is extended to support message types requiring more than 8 fragments, the NACK bitmask field will need to expand; the payload_len field in the header accommodates variable-length NACK payloads already.
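The ACK and NACK payloads are small enough to encode by hand. In the sketch below the function names are hypothetical, and little-endian order for the NACK sequence number is an assumption carried over from the ACK description. `encode_nack` returns None when a missing index falls outside 0-7, making the bitmask limitation explicit instead of silently truncating.

```rust
/// ACK payload: the acknowledged sequence number, little-endian (2 bytes).
pub fn encode_ack(sequence: u16) -> [u8; 2] {
    sequence.to_le_bytes()
}

/// NACK payload: sequence number (2 bytes) followed by a one-byte bitmask
/// in which bit N set means fragment N is missing. Returns None if any
/// index is above 7, which the one-byte mask cannot express.
pub fn encode_nack(sequence: u16, missing: &[u8]) -> Option<[u8; 3]> {
    let mut mask = 0u8;
    for &idx in missing {
        if idx > 7 {
            return None;
        }
        mask |= 1u8 << idx;
    }
    let [lo, hi] = sequence.to_le_bytes();
    Some([lo, hi, mask])
}

/// Inverse of encode_nack: recover the sequence number and missing indices.
pub fn decode_nack(payload: &[u8; 3]) -> (u16, Vec<u8>) {
    let sequence = u16::from_le_bytes([payload[0], payload[1]]);
    let missing: Vec<u8> = (0u8..8)
        .filter(|&i| (payload[2] & (1u8 << i)) != 0)
        .collect();
    (sequence, missing)
}
```

A NACK for sequence 7 with fragments 1 and 3 missing encodes as the three bytes 0x07, 0x00, 0x0A.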

The mesh transport layer handles retransmission in response to a NACK. The framing layer generates the NACK and passes it up to the transport layer; the transport layer is responsible for locating the cached copy of the original fragments (it retains outgoing frames for the transport retry window) and re-queuing the missing ones. The framing layer does not maintain its own retransmission queue.

Performance on STM32H7

Framing overhead on the STM32H7 (Cortex-M7, 480 MHz) was measured in isolation from the cryptographic layer to quantify the cost of the binary serialization and CRC computation independently of AES-256-GCM. All benchmarks used Rust release mode with codegen-units=1 and lto=thin.

Framing benchmark results — STM32H7 (Cortex-M7, 480 MHz)

Operation                                           Time
─────────────────────────────────────────────────────────
Fragmentation, 200B message (2 frames)              0.09 ms
  (header serialization + chunk iteration + CRC)
CRC-16/CCITT per frame (237-byte payload)           0.02 ms
Reassembly completion check + concatenation (2 fr.) 0.04 ms
Total framing overhead, SealedSender (6 frames)     0.54 ms

For comparison — encryption costs (same platform):
AES-256-GCM per frame (hardware accelerator)        0.14 ms
AES-256-GCM per frame (software)                   0.61 ms
Total encryption cost, SealedSender (6 frames, HW) 0.84 ms
Total encryption cost, SealedSender (6 frames, SW) 3.66 ms

Framing overhead is 0.54 ms for the most expensive message type. CRC computation over six frames at 0.02 ms each is 0.12 ms, and header serialization at roughly 0.07 ms per frame is 0.42 ms. Header serialization is therefore the larger share, and these two terms account for the full 0.54 ms, so chunk iteration and buffer allocation contribute little beyond them. The framing layer is not the bottleneck.

When the STM32H7 hardware AES accelerator is enabled, the total pipeline cost for a SealedSenderMessage is 0.54 ms framing plus 0.84 ms encryption, totalling 1.38 ms. The Double Ratchet chain advance (HKDF and key deletion) adds approximately 0.16 ms, bringing the end-to-end cost to roughly 1.54 ms per SealedSenderMessage. With a 10 ms MAVLink update period, this is 15.4% of the budget for the most expensive message type. In practice SealedSenderMessages are rare (they are sent only when establishing a new anonymous session), so this cost does not appear on the steady-state critical path.

Without the hardware accelerator, the software AES path raises the encryption cost to 3.66 ms. Adding 0.54 ms of framing and 0.16 ms for the chain advance makes the total pipeline 4.36 ms, or 43.6% of the 10 ms budget for a SealedSenderMessage. This is still within budget but leaves little headroom for other processing. Enabling the hardware accelerator is strongly recommended for deployments that anticipate frequent session establishment (for example, a swarm that dynamically splits and merges sub-groups in flight, triggering repeated SealedSender session initiations).
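The budget arithmetic in the last two paragraphs can be reproduced with integer microseconds, which avoids floating-point rounding. The constants are the benchmark figures quoted above; the function and constant names are hypothetical.

```rust
/// All times in microseconds, taken from the benchmark figures above.
pub const SEALED_SENDER_FRAMES: u32 = 6;
pub const SEALED_SENDER_FRAMING_US: u32 = 540;
pub const RATCHET_ADVANCE_US: u32 = 160;
pub const AES_HW_PER_FRAME_US: u32 = 140;
pub const AES_SW_PER_FRAME_US: u32 = 610;
pub const UPDATE_PERIOD_US: u32 = 10_000;

/// End-to-end cost of one SealedSenderMessage for a given per-frame AES cost:
/// framing + (frames x AES per frame) + ratchet chain advance.
pub fn sealed_sender_pipeline_us(aes_per_frame_us: u32) -> u32 {
    SEALED_SENDER_FRAMING_US + SEALED_SENDER_FRAMES * aes_per_frame_us + RATCHET_ADVANCE_US
}

/// Share of the 10 ms update period, in tenths of a percent
/// (a return value of 154 means 15.4%).
pub fn budget_permille(cost_us: u32) -> u32 {
    cost_us * 1000 / UPDATE_PERIOD_US
}
```

With the hardware figure this gives 1,540 µs (15.4% of the period), and with the software figure 4,360 µs (43.6%).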