Ultra low-latency margin engine
Replaced a legacy C++ risk system on a crypto derivatives venue. New engine runs margin checks inside the order path without adding measurable latency.
An engineering practice building sub-microsecond margin engines and optimized AI inference for trading venues, brokers and fintech teams. We work in Rust and modern C++, on the hot paths where a Python stack quietly costs you millions in latency and GPU rent.
Rust's type system lets us make invalid trades unrepresentable. With no garbage collector, latency is a straight line, not a sawtooth. Here are the three things you hire us to do, and the results that show up in the next quarter's report.
Eliminate a class of production incidents at compile time. Deterministic destruction, zero-cost abstractions, Send/Sync guarantees. No 3 AM pages for data races.
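A minimal sketch of two of those guarantees, using only the standard library. The `Account` and `MarginReservation` types are hypothetical: a reservation is an RAII guard whose `Drop` returns the margin at a known point (end of scope), and `assert_send_sync` turns "is this type safe to share across threads?" into a build-time check.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical account; free margin held in integer units (e.g. cents).
pub struct Account {
    pub free_margin: AtomicI64,
}

/// Hypothetical RAII guard: margin stays reserved until this value is dropped.
pub struct MarginReservation<'a> {
    account: &'a Account,
    amount: i64,
}

impl Account {
    /// Atomically reserves `amount`, or returns None if free margin is too low.
    pub fn reserve(&self, amount: i64) -> Option<MarginReservation<'_>> {
        let mut current = self.free_margin.load(Ordering::Acquire);
        loop {
            if current < amount {
                return None;
            }
            match self.free_margin.compare_exchange_weak(
                current,
                current - amount,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(MarginReservation { account: self, amount }),
                Err(actual) => current = actual, // another thread changed it; retry
            }
        }
    }
}

impl Drop for MarginReservation<'_> {
    // Deterministic: runs exactly when the guard goes out of scope.
    fn drop(&mut self) {
        self.account.free_margin.fetch_add(self.amount, Ordering::AcqRel);
    }
}

/// Compiles only if T is Send + Sync, so a type that is unsafe to share fails the build.
pub fn assert_send_sync<T: Send + Sync>() {}

/// Reserves and releases `amount` from `threads` threads; returns how many succeeded.
pub fn reserve_concurrently(account: Arc<Account>, threads: usize, amount: i64) -> usize {
    assert_send_sync::<Account>();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let acct = Arc::clone(&account);
            thread::spawn(move || {
                // The guard is a temporary, so it is dropped (margin released) at the end of this statement.
                let ok = acct.reserve(amount).is_some();
                ok
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).filter(|&ok| ok).count()
}
```

Swapping `AtomicI64` for a `Cell<i64>` would make `assert_send_sync::<Account>()` a compile error, which is the point: the data race never reaches production.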
Cache-aware data structures, lock-free queues, kernel-bypass networking (DPDK, Solarflare). Every hot path profiled; every commit benchmarked before it merges.
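For illustration, a bounded single-producer/single-consumer queue of the kind that might sit between a feed handler and a risk engine. This is a generic textbook design, not production code: the names `spsc`, `Producer` and `Consumer` are ours, and the 64-byte padding assumes a 64-byte cache line. Exactly one `Producer` and one `Consumer` exist and neither is `Clone`, so the single-writer rule on each index is enforced by the types rather than by convention.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Aligns a value to 64 bytes (assumed cache-line size) to avoid false sharing.
#[repr(align(64))]
struct CachePadded<T>(T);

struct Ring<T, const N: usize> {
    head: CachePadded<AtomicUsize>, // next slot to read; written only by the consumer
    tail: CachePadded<AtomicUsize>, // next slot to write; written only by the producer
    slots: [UnsafeCell<MaybeUninit<T>>; N],
}

// SAFETY: the only handles are one Producer and one Consumer (neither is Clone),
// and the head/tail protocol ensures they never access the same slot concurrently.
unsafe impl<T: Send, const N: usize> Sync for Ring<T, N> {}

pub struct Producer<T, const N: usize>(Arc<Ring<T, N>>);
pub struct Consumer<T, const N: usize>(Arc<Ring<T, N>>);

/// Creates a bounded SPSC queue; N must be a power of two.
pub fn spsc<T, const N: usize>() -> (Producer<T, N>, Consumer<T, N>) {
    assert!(N.is_power_of_two(), "capacity must be a power of two");
    let ring = Arc::new(Ring {
        head: CachePadded(AtomicUsize::new(0)),
        tail: CachePadded(AtomicUsize::new(0)),
        slots: std::array::from_fn(|_| UnsafeCell::new(MaybeUninit::uninit())),
    });
    (Producer(Arc::clone(&ring)), Consumer(ring))
}

impl<T, const N: usize> Producer<T, N> {
    /// Returns the value back if the queue is full; never blocks or allocates.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        let ring = &*self.0;
        let tail = ring.tail.0.load(Ordering::Relaxed);
        // Acquire pairs with the consumer's Release store of `head`.
        if tail.wrapping_sub(ring.head.0.load(Ordering::Acquire)) == N {
            return Err(value);
        }
        unsafe { (*ring.slots[tail & (N - 1)].get()).write(value) };
        // Release publishes the written slot to the consumer.
        ring.tail.0.store(tail.wrapping_add(1), Ordering::Release);
        Ok(())
    }
}

impl<T, const N: usize> Consumer<T, N> {
    /// Returns None if the queue is empty; never blocks.
    pub fn pop(&mut self) -> Option<T> {
        let ring = &*self.0;
        let head = ring.head.0.load(Ordering::Relaxed);
        if head == ring.tail.0.load(Ordering::Acquire) {
            return None;
        }
        let value = unsafe { (*ring.slots[head & (N - 1)].get()).assume_init_read() };
        ring.head.0.store(head.wrapping_add(1), Ordering::Release);
        Some(value)
    }
}

impl<T, const N: usize> Drop for Ring<T, N> {
    fn drop(&mut self) {
        // Drop items that were pushed but never popped.
        let (mut head, tail) = (*self.head.0.get_mut(), *self.tail.0.get_mut());
        while head != tail {
            unsafe { self.slots[head & (N - 1)].get_mut().assume_init_drop() };
            head = head.wrapping_add(1);
        }
    }
}
```

Because `push` returns the value on a full queue instead of blocking, the caller decides the backpressure policy (spin, drop, or shed load).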
Candle and Burn instead of PyTorch servers. Quantized models in the same binary as the order manager. Margin, VaR and fraud prediction — no network hop.
Four techniques applied together turn a 40 ms Triton inference into a 150 µs in-process call. Not every project needs all four, but if your latency budget for risk checks is tighter than your GPU budget, one of them will fit.
Deploy models with Candle and Burn, bypassing Python overhead entirely. Near-metal execution on critical paths with predictable tail latency.
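The production path uses Candle or Burn. The sketch below uses neither, only plain Rust, to show the property that keeps tail latency predictable: the model sits in the same process as its caller and owns its scratch buffers, so a prediction is an ordinary function call with no allocation and no network hop. `TinyMlp` and its shapes are hypothetical.

```rust
/// Hypothetical two-layer scorer living in the same binary as the order manager.
/// All buffers are allocated once in `new`; `forward` never touches the heap.
pub struct TinyMlp {
    w1: Vec<f32>, // hidden x inputs, row-major
    b1: Vec<f32>, // one bias per hidden unit
    w2: Vec<f32>, // one output weight per hidden unit
    b2: f32,
    inputs: usize,
    hidden: Vec<f32>, // scratch space reused on every call
}

impl TinyMlp {
    pub fn new(inputs: usize, w1: Vec<f32>, b1: Vec<f32>, w2: Vec<f32>, b2: f32) -> Self {
        let h = b1.len();
        assert_eq!(w1.len(), h * inputs, "w1 must be hidden x inputs");
        assert_eq!(w2.len(), h, "w2 must have one weight per hidden unit");
        Self { w1, b1, w2, b2, inputs, hidden: vec![0.0; h] }
    }

    /// ReLU hidden layer followed by a linear output; allocation-free.
    pub fn forward(&mut self, x: &[f32]) -> f32 {
        assert_eq!(x.len(), self.inputs);
        for (j, h) in self.hidden.iter_mut().enumerate() {
            let row = &self.w1[j * self.inputs..(j + 1) * self.inputs];
            let z: f32 = row.iter().zip(x).map(|(w, v)| w * v).sum::<f32>() + self.b1[j];
            *h = z.max(0.0);
        }
        self.hidden.iter().zip(&self.w2).map(|(h, w)| h * w).sum::<f32>() + self.b2
    }
}
```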
int8 and f16 quantization without sacrificing predictive accuracy in financial contexts. Backtested against float32 baselines on real production traces.
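A minimal sketch of symmetric per-tensor int8 quantization, one common scheme among several (per-channel scales and f16 are others). Values map to `q = round(x / scale)` with `scale = max|x| / 127`; the dot product accumulates in `i32` and rescales once at the end. The function names are hypothetical, and comparing against `dot_f32` is a toy version of checking against a float32 baseline.

```rust
/// Symmetric per-tensor int8 quantization: x ≈ scale * q, with q in [-127, 127].
pub struct QuantizedVec {
    pub values: Vec<i8>,
    pub scale: f32,
}

pub fn quantize(xs: &[f32]) -> QuantizedVec {
    let max_abs = xs.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    // An all-zero tensor quantizes to zeros under any positive scale.
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let values = xs
        .iter()
        .map(|x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    QuantizedVec { values, scale }
}

/// Int8 dot product: multiply-accumulate in i32 (safe for up to ~130k elements,
/// since each product is at most 127 * 127), then rescale once.
pub fn dot_q(a: &QuantizedVec, b: &QuantizedVec) -> f32 {
    assert_eq!(a.values.len(), b.values.len());
    let acc: i32 = a
        .values
        .iter()
        .zip(&b.values)
        .map(|(&x, &y)| x as i32 * y as i32)
        .sum();
    acc as f32 * a.scale * b.scale
}

/// Float32 reference used as the accuracy baseline.
pub fn dot_f32(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

In practice the weights are quantized once at load time and only the activations are quantized per call.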
Kernels written in Rust, targeting the specific matrix shapes your risk algorithm produces. Typical 2–6× speedup over stock cuBLAS in measured workloads.
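The kernels described here target GPUs. As a CPU-side illustration of the same idea only, here is a matrix multiply whose shapes are const generic parameters, so each risk-model shape gets its own monomorphized loop with bounds known at compile time. Whether the compiler fully unrolls or vectorizes it depends on the shape and target; this is a sketch, not a tuned kernel.

```rust
/// Shape-specialized matrix multiply: C[M][N] = A[M][K] * B[K][N].
/// M, K and N are compile-time constants, so every loop bound is known to the compiler.
/// CPU illustration only; not a GPU kernel.
pub fn matmul<const M: usize, const K: usize, const N: usize>(
    a: &[[f32; K]; M],
    b: &[[f32; N]; K],
) -> [[f32; N]; M] {
    let mut c = [[0.0f32; N]; M];
    for i in 0..M {
        for k in 0..K {
            let aik = a[i][k];
            // i-k-j order walks rows of B and C contiguously.
            for j in 0..N {
                c[i][j] += aik * b[k][j];
            }
        }
    }
    c
}
```

A call like `matmul(&a, &b)` with `a: [[f32; 2]; 2]` and `b: [[f32; 3]; 2]` infers `M = 2, K = 2, N = 3`; passing mismatched inner dimensions is a type error, not a runtime check.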
rkyv archives and shared-memory buffers between market-data ingest, risk engine and inference. Serialization measured in nanoseconds, not milliseconds.
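rkyv's archived types are read in place without a deserialization pass. The sketch below shows the same access pattern by hand with only the standard library; it is not rkyv's API. The 20-byte little-endian tick layout and the `TickView` name are assumptions for the example: a view borrows bytes from a shared buffer and decodes each field only when it is read.

```rust
/// Hypothetical fixed-layout tick record (little-endian):
/// bytes 0..8 = instrument id (u64), 8..16 = price in ticks (i64), 16..20 = qty (u32).
pub const TICK_LEN: usize = 20;

/// Borrowed view over one record: no parse step, no allocation, no copy of the record.
#[derive(Clone, Copy)]
pub struct TickView<'a> {
    bytes: &'a [u8; TICK_LEN],
}

impl<'a> TickView<'a> {
    /// Returns None if the buffer is shorter than one record.
    pub fn new(buf: &'a [u8]) -> Option<Self> {
        let bytes: &'a [u8; TICK_LEN] = buf.get(..TICK_LEN)?.try_into().ok()?;
        Some(Self { bytes })
    }
    pub fn instrument_id(&self) -> u64 {
        u64::from_le_bytes(self.bytes[0..8].try_into().unwrap())
    }
    pub fn price_ticks(&self) -> i64 {
        i64::from_le_bytes(self.bytes[8..16].try_into().unwrap())
    }
    pub fn qty(&self) -> u32 {
        u32::from_le_bytes(self.bytes[16..20].try_into().unwrap())
    }
}

/// Iterates over every complete record in a buffer; a trailing partial record is ignored.
pub fn ticks(buf: &[u8]) -> impl Iterator<Item = TickView<'_>> {
    buf.chunks_exact(TICK_LEN).filter_map(TickView::new)
}
```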
Numbers below are measured against the systems they replaced, at the same venues, on the same hardware class. Client names redacted by NDA — happy to name them on a call.
High-performance Rust runtime for deploying predictive models directly on trading servers. Replaced a Python/Torch inference tier, cut p99 latency by more than half, and removed an entire deployment layer.
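```rust
// Real-time margin calc with a quantized risk model.
// Assumes candle_core; the model is loaded elsewhere, accepts f16 input and
// lives on `device` (e.g. `Device::new_cuda(0)?`).
use candle_core::{DType, Device, Module, Result, Tensor};

struct MarketState {
    spot: f32,
    volatility: f32,
    // order-book view (shared-memory buffer) omitted from this snippet
}

fn calc_margin(s: &MarketState, risk: &impl Module, device: &Device) -> Result<f32> {
    // Copies the features to the device (not zero-copy), then casts to f16.
    let x = Tensor::new(&[[s.spot, s.volatility]], device)?.to_dtype(DType::F16)?;
    let var = risk.forward(&x)?;
    // Widen back to f32 and read the single output value.
    var.to_dtype(DType::F32)?.flatten_all()?.get(0)?.to_scalar::<f32>()
}
```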
Four phases per engagement. Typical duration 8–16 weeks. No ambiguous "discovery" — day one produces a benchmarked baseline and a written latency budget you can argue with.
Encode instruments, venues and order states in the type system. Invalid trades become unrepresentable before the first line of business logic is written.
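A minimal typestate sketch of that idea. The order states, the `Qty` newtype and the flat margin rule are hypothetical; the point is that `submit` exists only on `Order<Validated>`, and the only way to get one is through `check_margin`, so sending an unchecked order is a compile error rather than a runtime incident. A zero quantity is rejected at construction via `NonZeroU32`.

```rust
use std::marker::PhantomData;
use std::num::NonZeroU32;

// Hypothetical order lifecycle states, as zero-sized marker types.
pub struct New;
pub struct Validated;
pub struct Submitted;

/// Quantity that cannot be zero by construction.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Qty(pub NonZeroU32);

/// An order whose lifecycle state is part of its type.
pub struct Order<State> {
    pub instrument: &'static str,
    pub qty: Qty,
    pub limit_ticks: i64,
    _state: PhantomData<State>,
}

impl<S> Order<S> {
    // Same data, new state; private, so states change only through the methods below.
    fn into_state<T>(self) -> Order<T> {
        Order { instrument: self.instrument, qty: self.qty, limit_ticks: self.limit_ticks, _state: PhantomData }
    }
}

impl Order<New> {
    /// Rejects a zero quantity up front.
    pub fn new(instrument: &'static str, qty: u32, limit_ticks: i64) -> Option<Self> {
        let qty = Qty(NonZeroU32::new(qty)?);
        Some(Order { instrument, qty, limit_ticks, _state: PhantomData })
    }

    /// The only way to obtain an Order<Validated> (hypothetical flat per-unit margin rule).
    pub fn check_margin(self, free_margin: i64, margin_per_unit: i64) -> Result<Order<Validated>, Self> {
        let required = margin_per_unit.saturating_mul(i64::from(self.qty.0.get()));
        if required <= free_margin {
            Ok(self.into_state())
        } else {
            Err(self)
        }
    }
}

impl Order<Validated> {
    /// `submit` does not exist on Order<New>, so skipping the margin check fails to compile.
    pub fn submit(self) -> Order<Submitted> {
        self.into_state()
    }
}
```

The state markers are zero-sized, so `Order<New>` and `Order<Submitted>` have the same size and layout; the checks cost nothing at runtime beyond the margin comparison itself.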
SIMD where it earns its keep. Lock-free structures, NUMA-aware allocation, cache-line discipline. Profiled, not guessed.
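As an example of layout doing the work, here is a structure-of-arrays position store and a gross-notional pass written with eight independent accumulators. Floating-point addition is not associative, so the compiler generally will not split a single running sum into vector lanes on its own; explicit lanes give it a shape it can map to SIMD registers. This relies on auto-vectorization, which is not guaranteed (check the generated assembly), and `Positions` and `LANES = 8` are assumptions.

```rust
/// Hypothetical position store as structure-of-arrays: each field is contiguous,
/// so a pass over quantities and prices streams through whole cache lines.
pub struct Positions {
    pub qty: Vec<f32>,
    pub price: Vec<f32>,
}

const LANES: usize = 8;

/// Gross notional, sum(|qty * price|), using LANES independent accumulators.
pub fn gross_notional(p: &Positions) -> f32 {
    assert_eq!(p.qty.len(), p.price.len());
    let mut acc = [0.0f32; LANES];
    let q_chunks = p.qty.chunks_exact(LANES);
    let x_chunks = p.price.chunks_exact(LANES);
    let (q_rem, x_rem) = (q_chunks.remainder(), x_chunks.remainder());
    for (q, x) in q_chunks.zip(x_chunks) {
        // Fixed-size arrays let the compiler drop bounds checks in the inner loop.
        let q: &[f32; LANES] = q.try_into().unwrap();
        let x: &[f32; LANES] = x.try_into().unwrap();
        for l in 0..LANES {
            acc[l] += (q[l] * x[l]).abs();
        }
    }
    // Leftover elements that did not fill a full chunk.
    let tail: f32 = q_rem.iter().zip(x_rem).map(|(q, x)| (q * x).abs()).sum();
    acc.iter().sum::<f32>() + tail
}
```

The lane split changes summation order, so results can differ from a naive loop in the last bits; tests should compare with a tolerance unless the inputs are exactly representable.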
Historical replay against real order books. Criterion.rs in CI — every commit holds the latency budget or the build fails.
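Criterion.rs handles warm-up and statistics; the sketch below only illustrates the pass/fail logic of a latency gate: replay recorded events, time each call, take the nearest-rank p99 and fail if it exceeds the budget. `replay_within_budget` and `percentile` are hypothetical helpers using only the standard library, and per-call `Instant` timing adds its own overhead, so treat the numbers as coarse.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank percentile (p in (0, 100]) of a non-empty sample set; sorts in place.
pub fn percentile(samples: &mut [Duration], p: f64) -> Duration {
    assert!(!samples.is_empty() && p > 0.0 && p <= 100.0);
    samples.sort_unstable();
    // 1-based rank: ceil(p/100 * n), clamped so it is never zero.
    let rank = ((p * samples.len() as f64) / 100.0).ceil() as usize;
    samples[rank.max(1) - 1]
}

/// Replays recorded events through `handler`, timing each call.
/// Ok(p99) if p99 is within `budget`, Err(p99) otherwise (the "build fails" case).
pub fn replay_within_budget<E>(
    events: &[E],
    mut handler: impl FnMut(&E),
    budget: Duration,
) -> Result<Duration, Duration> {
    let mut samples = Vec::with_capacity(events.len());
    for e in events {
        let start = Instant::now();
        handler(e);
        samples.push(start.elapsed());
    }
    let p99 = percentile(&mut samples, 99.0);
    if p99 <= budget {
        Ok(p99)
    } else {
        Err(p99)
    }
}
```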
Canary deploy next to the legacy system. Shadow traffic for 2 weeks. Full cutover only after p99 and correctness pass human review.
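A sketch of the correctness half of a shadow run, assuming both engines can be called on the same recorded request: the legacy result stays authoritative, the candidate's is only compared, and disagreements beyond an absolute tolerance are collected for human review. `shadow_compare` and `ShadowReport` are hypothetical, and a NaN on either side counts as a mismatch.

```rust
/// Outcome of comparing legacy and candidate margin outputs on the same traffic.
#[derive(Debug, PartialEq)]
pub struct ShadowReport {
    pub compared: usize,
    pub mismatches: Vec<usize>, // indices of requests whose outputs disagree
}

/// Runs both engines on each request; `tol` is an absolute tolerance in margin units.
pub fn shadow_compare<R>(
    requests: &[R],
    legacy: impl Fn(&R) -> f64,
    candidate: impl Fn(&R) -> f64,
    tol: f64,
) -> ShadowReport {
    let mismatches = requests
        .iter()
        .enumerate()
        // Written as !(diff <= tol) so that a NaN on either side is flagged too.
        .filter(|&(_, r)| !((legacy(r) - candidate(r)).abs() <= tol))
        .map(|(i, _)| i)
        .collect();
    ShadowReport { compared: requests.len(), mismatches }
}
```

Latency is judged separately on the same traffic, so a candidate has to pass both the p99 comparison and an empty (or fully explained) mismatch list before cutover.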
If your trading stack is losing to the GIL, or your margin engine is a C++ tarpit from 2014 — let's talk. One 30-minute call, no pitch deck. We'll know in 15 minutes if it's worth doing.