
Swarm SDK on embedded Rust: no_std, static allocation, and binary size on STM32H7

10 min read · AI Analytics
Tags: Swarm SDK, Embedded Rust, Cryptography, Drone

The Swarm SDK ships in two configurations: a standard build that links against the Rust standard library for companion computers like the Jetson Nano (running Ubuntu), and a no_std build for bare-metal flight controllers like the STM32H7. The no_std build has no operating system, no thread scheduler, no heap allocator in the runtime — only the hardware and whatever the SDK itself brings. Every allocation must be tracked, every data structure must have a bounded size, and the total binary footprint must fit in the flight controller's flash alongside the autopilot firmware. This post covers how the no_std build works, what we had to change to make the cryptographic dependencies compile in this environment, and how binary size optimization brought the final static library from 1.2MB to 284KB.

The no_std constraint

Rust's standard library depends on an operating system for memory allocation, I/O, threading, and process management. On a bare-metal STM32H7, none of these exist. The #![no_std] attribute strips the std dependency and allows the crate to compile for thumbv7em-none-eabihf (Cortex-M7F with hardware floating point, no OS). The Rust alloc crate (heap-allocated types like Vec, Box, String) is available separately, provided a global allocator is registered.

# sdk/Cargo.toml (excerpt)
[features]
default = ["std"]
std = ["dep:std-only-feature"]
embedded = ["dep:cortex-m-alloc", "dep:cortex-m-rt"]

[dependencies]
cortex-m-alloc = { version = "0.4", optional = true }
cortex-m-rt    = { version = "0.7", optional = true }
ml-kem         = { version = "0.3", default-features = false, features = ["alloc"] }
x25519-dalek   = { version = "2.0", default-features = false, features = ["alloc", "static_secrets"] }
aes-gcm        = { version = "0.10", default-features = false, features = ["alloc", "aes"] }
ed25519-dalek  = { version = "2.1", default-features = false, features = ["alloc"] }
hkdf           = { version = "0.12", default-features = false }
hmac           = { version = "0.12", default-features = false }
sha2           = { version = "0.10", default-features = false }

The default-features = false on every cryptographic dependency disables their optional std features. The alloc feature enables heap-using APIs (like Vec<u8> return types) while keeping the crate off the standard library. Every crate in the dependency graph must be audited for hidden std dependencies: a single use std::sync::Mutex anywhere in the tree breaks the no_std build.

The static heap allocator

The Rust alloc crate requires a #[global_allocator] to be registered before it can use Vec, Box, or any other heap-allocated type. On bare metal, we use cortex-m-alloc with a statically declared backing buffer:

// sdk/src/embedded/heap.rs
use cortex_m_alloc::CortexMHeap;
use core::mem::MaybeUninit;

#[global_allocator]
static ALLOCATOR: CortexMHeap = CortexMHeap::empty();

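// 96KB static heap: enough for SDK state + message buffers
// Declared in .bss section; zero-initialized at startup
const HEAP_SIZE: usize = 96 * 1024; // 98_304 bytes
static mut HEAP: [MaybeUninit<u8>; HEAP_SIZE] = [MaybeUninit::uninit(); HEAP_SIZE];

/// Call once at startup, before any SDK initialization.
/// SAFETY: must be called exactly once, from a single-threaded context,
/// before the first heap allocation.
pub unsafe fn init_heap() {
    // addr_of_mut! takes the address without creating a reference to the static mut
    let start = core::ptr::addr_of_mut!(HEAP) as usize;
    ALLOCATOR.init(start, HEAP_SIZE);
}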
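
The "exactly once" requirement in the safety comment can also be enforced at runtime rather than by convention. A minimal sketch, assuming a hypothetical InitOnce latch that is not part of the SDK source shown here; it uses only core atomics, which thumbv7em targets support:

```rust
use core::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical one-shot latch (illustrative name, not an SDK type).
pub struct InitOnce {
    done: AtomicBool,
}

impl InitOnce {
    pub const fn new() -> Self {
        Self { done: AtomicBool::new(false) }
    }

    /// Returns true for the first caller only; every later call returns false.
    pub fn claim(&self) -> bool {
        // swap returns the previous value, so only the first caller observes `false`.
        !self.done.swap(true, Ordering::AcqRel)
    }
}
```

A startup routine could then hold a static InitOnce and call init_heap() only when claim() returns true, turning an accidental second call into a no-op.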

The 96KB heap is sized to accommodate peak SDK usage: the Double Ratchet message key cache (100 entries × ~80 bytes each = 8KB), the gossip deduplication ring (1,000 UUIDs × 16 bytes = 16KB), the fragment reassembly buffer (4 pending messages × max 3,760 bytes each = 15KB), and the SenderKey group state for up to 32 drones (32 × ~200 bytes = 6.4KB). Together these long-lived structures take about 45KB. The remaining ~50KB is headroom for transient allocations during cryptographic operations (key derivation outputs, message serialization) that are promptly freed.
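
The budget above can be written down as constants so that the build fails if the long-lived structures ever outgrow the heap. This is a sketch that reuses the per-structure estimates from the paragraph; the constant names and the 32 KiB minimum headroom are assumptions chosen for illustration, not values taken from the SDK:

```rust
/// Total static heap: 96 KiB, i.e. the 98_304-byte backing buffer.
pub const HEAP_BYTES: usize = 96 * 1024;

// Per-structure estimates, copied from the figures in the text.
pub const RATCHET_KEY_CACHE: usize = 100 * 80;    // 8,000 bytes
pub const GOSSIP_DEDUP_RING: usize = 1_000 * 16;  // 16,000 bytes
pub const FRAGMENT_REASSEMBLY: usize = 4 * 3_760; // 15,040 bytes
pub const SENDER_KEY_STATE: usize = 32 * 200;     // 6,400 bytes

/// Sum of the long-lived allocations.
pub const LONG_LIVED_BYTES: usize =
    RATCHET_KEY_CACHE + GOSSIP_DEDUP_RING + FRAGMENT_REASSEMBLY + SENDER_KEY_STATE;

/// What is left for transient allocations.
pub const TRANSIENT_HEADROOM: usize = HEAP_BYTES - LONG_LIVED_BYTES;

// Compile-time guard. The 32 KiB floor is a hypothetical threshold.
const _: () = assert!(TRANSIENT_HEADROOM >= 32 * 1024);
```

With these figures the long-lived total is 45,440 bytes and the headroom is 52,864 bytes, in line with the "~50KB" quoted above.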

Heap fragmentation is the primary risk in this allocation model. The SDK is designed to avoid long-lived allocations that grow and shrink frequently; instead, it uses bounded data structures with a fixed maximum size and pre-allocated capacity:

// Bounded gossip dedup ring: fixed capacity, no reallocation
use alloc::collections::VecDeque;

struct DeduplicationRing {
    ring: VecDeque<[u8; 16]>,  // VecDeque pre-allocated with_capacity(1000)
    capacity: usize,
}

impl DeduplicationRing {
    pub fn new() -> Self {
        Self {
            ring: VecDeque::with_capacity(1000),
            capacity: 1000,
        }
    }

    /// Returns true if `id` was new, false if it was already in the ring.
    pub fn insert(&mut self, id: [u8; 16]) -> bool {
        // Linear scan over at most `capacity` entries
        if self.ring.contains(&id) { return false; }
        if self.ring.len() == self.capacity {
            self.ring.pop_front();  // evict oldest
        }
        // len < capacity here, so push_back never reallocates
        self.ring.push_back(id);
        true
    }
}
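
The same bounded-ring idea also works without the heap. The following is a sketch of an array-backed alternative, not the SDK's implementation; ArrayDedupRing and its const parameter N are hypothetical names, and the type uses only core so it compiles with or without alloc:

```rust
/// Hypothetical heap-free dedup ring: N slots held inline in a plain array.
pub struct ArrayDedupRing<const N: usize> {
    slots: [[u8; 16]; N],
    next: usize, // slot that the next insert overwrites (the oldest once full)
    len: usize,  // number of valid entries, saturates at N
}

impl<const N: usize> ArrayDedupRing<N> {
    pub const fn new() -> Self {
        assert!(N > 0, "ring needs at least one slot");
        Self { slots: [[0u8; 16]; N], next: 0, len: 0 }
    }

    /// Returns true if `id` was new and has been recorded, false for a duplicate.
    pub fn insert(&mut self, id: [u8; 16]) -> bool {
        // Only the first `len` slots hold real entries until the ring fills up.
        if self.slots[..self.len].contains(&id) {
            return false;
        }
        self.slots[self.next] = id; // overwrites the oldest entry once full
        self.next = (self.next + 1) % N;
        if self.len < N {
            self.len += 1;
        }
        true
    }
}
```

The trade-off is placement: a 1,000-entry instance occupies 16,000 bytes of static or stack memory permanently, whereas the VecDeque version draws the same space from the shared heap.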

no_std compatibility across the dependency tree

The cryptographic crates vary in their no_std readiness:

Crate | no_std | alloc needed | Notes
ml-kem 0.3 | Yes | Yes (encapsulate output) | ML-KEM-768 ciphertext is 1,088 bytes; returned as Vec
x25519-dalek 2.0 | Yes | No (fixed-size arrays) | All operations on [u8; 32]; no heap needed
aes-gcm 0.10 | Yes | Optional (in-place variant) | encrypt_in_place() avoids alloc; we use this path
ed25519-dalek 2.1 | Yes | No (fixed-size signature) | 64-byte signature as fixed array; no heap
hkdf 0.12 | Yes | No | Outputs into caller-provided fixed array
hmac 0.12 | Yes | No | Stack-allocated; HMAC output is [u8; 32]
sha2 0.10 | Yes | No | Stack-allocated; digest output is [u8; 32]

The biggest no_std challenge was ml-kem: the Encapsulate operation returns a (Ciphertext1088, SharedSecret32) where the ciphertext type is a newtype wrapper around Vec<u8> in the current upstream implementation. We use the alloc feature flag to enable this, at the cost of requiring the heap. An alternative approach (a fixed-size [u8; 1088] output type) is on the ml-kem roadmap but not yet released; when it lands, we will remove the heap dependency for encapsulate-only code paths.
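
To make the fixed-size alternative concrete, here is a sketch of what a stack-allocated ciphertext wrapper could look like. FixedCiphertext is a hypothetical type written for this post, not the ml-kem crate's API; the only fact it relies on is the 1,088-byte ML-KEM-768 ciphertext length:

```rust
use core::convert::TryFrom;

/// ML-KEM-768 ciphertext length in bytes.
pub const CT_LEN: usize = 1088;

/// Hypothetical fixed-size ciphertext wrapper (not an ml-kem type).
pub struct FixedCiphertext(pub [u8; CT_LEN]);

impl FixedCiphertext {
    /// Copies from a slice, rejecting input that is not exactly CT_LEN bytes.
    pub fn from_slice(bytes: &[u8]) -> Option<Self> {
        let arr = <[u8; CT_LEN]>::try_from(bytes).ok()?;
        Some(Self(arr))
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}
```

Because the length is part of the type, a value like this can live on the stack or in a static buffer, and a wrong-sized input is rejected at the boundary instead of deeper in the decapsulation path.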

For AES-GCM, we use the encrypt_in_place() API that modifies the plaintext buffer in-place rather than allocating a new ciphertext buffer. This requires pre-allocating the message buffer with enough trailing capacity for the 16-byte GCM authentication tag:

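// Encrypt in place to avoid allocating a second ciphertext buffer
use alloc::vec::Vec;
use aes_gcm::{Aes256Gcm, KeyInit, AeadInPlace, Nonce};

// SDK error type; minimal definition shown so the excerpt stands alone
#[derive(Debug)]
pub enum AeadError {
    EncryptFailed,
}

pub fn encrypt_in_place(
    key: &[u8; 32],
    nonce: &[u8; 12],
    buf: &mut Vec<u8>,  // reserve capacity >= len + 16 (GCM tag) so appending the tag does not reallocate
) -> Result<(), AeadError> {
    let cipher = Aes256Gcm::new(key.into());
    // Empty associated data; the 16-byte tag is appended to buf
    cipher.encrypt_in_place(Nonce::from_slice(nonce), b"", buf)
        .map_err(|_| AeadError::EncryptFailed)
}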
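
The capacity requirement is easiest to satisfy at the point where the message buffer is created. A small sketch, assuming a hypothetical helper named message_buffer; it reserves room for the tag up front so that the later append stays within the existing allocation:

```rust
/// GCM authentication tag length in bytes.
pub const TAG_LEN: usize = 16;

/// Hypothetical helper: copies `plaintext` into a buffer that already has
/// room for the tag. In the no_std build, Vec comes from `alloc::vec::Vec`.
pub fn message_buffer(plaintext: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(plaintext.len() + TAG_LEN);
    buf.extend_from_slice(plaintext);
    buf
}
```

For a 235-byte payload this yields a buffer with capacity of at least 251 bytes, so appending the 16-byte tag does not move the allocation or touch the allocator again.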

Binary size optimization

The unoptimized debug build of the Swarm SDK static library for thumbv7em-none-eabihf is 4.7MB, far too large for a flight controller. The release build with standard optimization (opt-level = 3) is 1.2MB. Applying the full optimization stack brings it down to 284KB:

# .cargo/config.toml
[profile.release]
opt-level = "z"          # optimize for size (not speed)
lto = true               # link-time optimization: cross-crate dead code elimination
codegen-units = 1        # single codegen unit: best cross-function optimization (not strictly required for LTO)
panic = "abort"          # replace unwind tables with abort on panic (saves ~120KB)
strip = "symbols"        # strip debug symbols from output

[profile.release.build-override]
opt-level = 3            # build scripts and proc-macros still compile at speed

# Build and measure
cargo build --release --target thumbv7em-none-eabihf --features embedded --no-default-features
size target/thumbv7em-none-eabihf/release/libswarm_sdk.a

   text    data     bss     dec     hex filename
 271,432   8,192   4,096 283,720   45448 libswarm_sdk.a

Binary size breakdown after optimization:

Component | Before LTO | After LTO + opt-z | Savings
ML-KEM-768 (keygen + encap + decap) | 412 KB | 148 KB | 64%
Double Ratchet + X3DH | 218 KB | 62 KB | 72%
Sender Keys + Sealed Sender | 134 KB | 38 KB | 72%
Gossip mesh + fragmentation | 88 KB | 19 KB | 78%
Panic handler (unwind → abort) | 124 KB | 4 KB | 97%
Total | ~1,200 KB | 284 KB | 76%

The largest proportional savings (97%) came from switching to panic = "abort", which eliminates the Rust stack unwinding machinery (exception-handling tables, landing pad infrastructure) from the binary. In absolute terms, ML-KEM-768 (264 KB) and Double Ratchet + X3DH (156 KB) shrink by more than the panic handler's 120 KB. On a flight controller where a panic means the vehicle is in an unsafe state, aborting immediately rather than unwinding is the correct behavior anyway.

The opt-level = "z" setting (size optimization, more aggressive than "s") reduced the ML-KEM implementation by 24% compared with opt-level = 3, at a cost of approximately 8% slower encapsulation on the STM32H7's Cortex-M7. Given that encapsulation happens only during session establishment (infrequent), the size-speed trade-off is favorable.

SRAM layout

The STM32H7 has roughly 1MB of SRAM split across several regions rather than one contiguous block. The SDK's memory map uses three of them:

/* memory.x — linker script excerpt */
MEMORY {
    FLASH : ORIGIN = 0x08000000, LENGTH = 2048K
    DTCM  : ORIGIN = 0x20000000, LENGTH = 128K   /* data tightly coupled; fastest */
    SRAM1 : ORIGIN = 0x24000000, LENGTH = 512K   /* main SRAM */
    SRAM4 : ORIGIN = 0x38000000, LENGTH = 64K    /* backup SRAM; retained in Stop mode */
}

/* SDK uses SRAM1 for heap + buffers; DTCM for hot crypto state */
_stack_start  = ORIGIN(DTCM) + LENGTH(DTCM);  /* stack in DTCM for interrupt speed */
_heap_start   = ORIGIN(SRAM1);                 /* 96KB heap at SRAM1 start */
_sdk_state    = ORIGIN(SRAM1) + 98304;         /* static SDK state after heap */
_key_material = ORIGIN(SRAM4);                 /* long-term keys in backup SRAM */

Long-term key material (identity keypairs, fleet CA certificate) is stored in SRAM4 (backup SRAM), which is retained during STM32H7 Stop mode (the low-power state used between measurement cycles). This means the SDK does not need to re-derive session keys from flash on each wake cycle — session state survives power-saving sleep. On tamper detection, the emergency wipe procedure scrubs SRAM4 first (180ms) before proceeding to flash.
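
The wipe itself has to be written so the compiler cannot drop it as a dead store. The post does not show the SDK's wipe routine, so the following is only a sketch of the usual technique (volatile writes followed by a compiler fence); the function name scrub is an assumption:

```rust
use core::ptr;
use core::sync::atomic::{compiler_fence, Ordering};

/// Overwrites `region` with zeros using volatile writes, which the
/// optimizer is not allowed to elide even if the memory is never read again.
pub fn scrub(region: &mut [u8]) {
    for byte in region.iter_mut() {
        // SAFETY: `byte` is a valid, exclusive reference into `region`.
        unsafe { ptr::write_volatile(byte, 0) };
    }
    // Keep the compiler from moving later memory accesses ahead of the wipe.
    compiler_fence(Ordering::SeqCst);
}
```

On hardware, the slice passed in would cover the key-material region placed at _key_material by the linker script; how that slice is constructed from the linker symbol is left out here.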

Hardware-accelerated AES

The STM32H7 includes a hardware AES accelerator (CRYP peripheral) that performs AES-128/192/256 encryption at up to 200Mbps in hardware. The Rust aes-gcm crate defaults to a software implementation; enabling the hardware accelerator requires a custom BlockCipher implementation that delegates AES rounds to the CRYP peripheral via a hardware abstraction layer:

// sdk/src/embedded/hw_aes.rs
use stm32h7xx_hal::cryp::{Cryp, Config, Mode};

pub struct HwAes256 {
    cryp: Cryp,
}

impl HwAes256 {
    pub fn encrypt_block(&mut self, key: &[u8; 32], block: &mut [u8; 16]) {
        self.cryp.configure(Config {
            mode: Mode::Ecb,
            key_size: stm32h7xx_hal::cryp::KeySize::Bits256,
        });
        self.cryp.set_key(key);
        self.cryp.process_block(block);  // block modified in-place
    }
}

The hardware AES path reduces AES-256-GCM encrypt time on the STM32H7 from 0.61ms (software) to 0.14ms per call for a 235-byte payload, a roughly 4.4× speedup that directly reduces the per-message latency for SenderKeyMessage encryption.


For the Swarm SDK cryptographic architecture that this no_std build implements: Post-quantum mesh cryptography for drone swarms: the Swarm SDK design

For performance benchmarks of the cryptographic operations on STM32H7 and Jetson Nano: The Swarm SDK double ratchet: forward secrecy and post-compromise security in drone mesh networks

For how the SDK messages this binary produces are wrapped in MAVLink v2 TUNNEL frames: Swarm SDK MAVLink v2 integration: encrypting mesh messages inside 253-byte drone protocol frames

For what shipped in v0.4 (Situational Awareness API, EW Coordination, Adversarial Resilience, and RF Fingerprinting): Swarm SDK v0.4: situational awareness, electronic warfare coordination, and adversarial resilience

For how the no_std binary manages key rotation (SPK 7-day timer, OTP replenishment, and BKPSRAM zeroization): Swarm SDK key rotation: automated cryptographic material refresh in field-deployed drone meshes