AVAILABLE · Q3 2026 · AI & Fintech engagements

High load.
Low latency.
Rust, C++ & AI, done right.

An engineering practice building sub-microsecond margin engines and optimized AI inference for trading venues, brokers and fintech teams — in Rust and modern C++, where the Python stack quietly costs you millions in latency and GPU rent.

Book an intro call · See selected work · or hello@jetdev.eu

Margin check p99

<1µs

Replaces legacy C++ on the hot path.

AI inference p99

150µs

Quantized risk model on GPU, zero-copy.

Throughput

2.4M/s

Single-node on commodity hardware.

Uptime · 18 mo

99.995%

Across 3 live trading deployments.

01 · Core Competencies

Fast by default.
Safe by design.

Rust's type system makes invalid trades unrepresentable. No garbage collector means latency is a straight line, not a sawtooth. Three things you hire us to do — and the results that show up in the next quarter's report.

/ 01

Memory & concurrency safety

Eliminate a class of production incidents at compile time. Deterministic destruction, zero-cost abstractions, Send/Sync guarantees. No 3 AM pages for data races.
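A minimal sketch of what "Send/Sync guarantees" means in practice, assuming a hypothetical `PositionBook` updated by several worker threads. The shared state is wrapped in `Arc<Mutex<_>>`, which is `Send + Sync`, so the compiler accepts it. Swapping in `Rc<RefCell<_>>` would be rejected at compile time because `Rc` is not `Send`, which is how this class of data race never reaches production.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical net position shared by several risk workers.
#[derive(Default)]
struct PositionBook {
    net_qty: i64,
}

/// Applies batches of fills from one thread per batch.
/// `Arc<Mutex<PositionBook>>` is Send + Sync, so `thread::spawn` accepts it;
/// an `Rc<RefCell<PositionBook>>` here would fail to compile.
fn apply_fills_concurrently(fills: Vec<Vec<i64>>) -> i64 {
    let book = Arc::new(Mutex::new(PositionBook::default()));
    let handles: Vec<_> = fills
        .into_iter()
        .map(|batch| {
            let book = Arc::clone(&book);
            thread::spawn(move || {
                for qty in batch {
                    // The lock guard is a temporary, dropped at the end of
                    // this statement: deterministic destruction, no GC.
                    book.lock().unwrap().net_qty += qty;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Bind first so the guard is released before `book` goes out of scope.
    let net = book.lock().unwrap().net_qty;
    net
}
```

Whatever order the threads run in, the result is the same sum, because every write goes through the mutex.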

/ 02

Sub-microsecond latency

Cache-aware data structures, lock-free queues, kernel-bypass networking (DPDK, Solarflare). Every hot path profiled; every commit benchmarked before it merges.
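To illustrate the "lock-free queues" and "cache-aware" parts, here is a minimal single-producer/single-consumer ring buffer using only `std` atomics. It is a sketch, not the queue used in the engagements described here (production code would more likely build on Crossbeam). The head and tail indices are padded to separate 64-byte cache lines so producer and consumer do not false-share, and slots are handed over with Release/Acquire ordering instead of a lock.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Keeps the wrapped value on its own 64-byte cache line.
#[repr(align(64))]
struct CachePadded<T>(T);

/// Minimal SPSC ring buffer: exactly one thread calls `push`,
/// exactly one other thread calls `pop`. Capacity must be a power of two.
pub struct SpscQueue<T> {
    buf: Box<[UnsafeCell<Option<T>>]>,
    mask: usize,
    head: CachePadded<AtomicUsize>, // next slot to read (consumer-owned)
    tail: CachePadded<AtomicUsize>, // next slot to write (producer-owned)
}

// Safety: a slot is touched by only one side at a time; ownership passes
// via the Release store / Acquire load pairs on `tail` and `head`.
unsafe impl<T: Send> Sync for SpscQueue<T> {}

impl<T> SpscQueue<T> {
    pub fn with_capacity(cap: usize) -> Self {
        assert!(cap.is_power_of_two());
        let buf = (0..cap).map(|_| UnsafeCell::new(None)).collect();
        Self {
            buf,
            mask: cap - 1,
            head: CachePadded(AtomicUsize::new(0)),
            tail: CachePadded(AtomicUsize::new(0)),
        }
    }

    /// Producer side. Hands the value back if the queue is full.
    pub fn push(&self, value: T) -> Result<(), T> {
        let tail = self.tail.0.load(Ordering::Relaxed);
        let head = self.head.0.load(Ordering::Acquire); // see consumer's frees
        if tail.wrapping_sub(head) == self.buf.len() {
            return Err(value);
        }
        unsafe { *self.buf[tail & self.mask].get() = Some(value) };
        self.tail.0.store(tail.wrapping_add(1), Ordering::Release); // publish slot
        Ok(())
    }

    /// Consumer side. Returns `None` if the queue is empty.
    pub fn pop(&self) -> Option<T> {
        let head = self.head.0.load(Ordering::Relaxed);
        let tail = self.tail.0.load(Ordering::Acquire); // see producer's writes
        if head == tail {
            return None;
        }
        let value = unsafe { (*self.buf[head & self.mask].get()).take() };
        self.head.0.store(head.wrapping_add(1), Ordering::Release); // free slot
        value
    }
}
```

Neither side ever blocks: a full queue or an empty queue is reported immediately, and the caller decides whether to spin, drop or back off.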

/ 03

AI inference in-process

Candle and Burn instead of PyTorch servers. Quantized models in the same binary as the order manager. Margin, VaR and fraud prediction — no network hop.

02 · AI Optimization

How the Python tax gets paid off.

Four techniques applied together turn a 40 ms Triton inference into a 150 µs in-process call. Not every project needs all four — but if your risk budget is tighter than your GPU budget, one of them will fit.

A / 01

Native inference runtime

Deploy models with Candle and Burn, bypassing Python overhead entirely. Near-metal execution on critical paths with predictable tail latency.

A / 02

Model quantization

int8 and f16 quantization without sacrificing predictive accuracy in financial contexts. Backtested against float32 baselines on real production traces.
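As a sketch of the int8 half of this, assuming the simplest scheme (symmetric, per-tensor, round-to-nearest; real deployments often use per-channel scales), each weight is mapped to `q = round(x / scale)` with `scale = max|x| / 127`. The helper `max_abs_error` is a hypothetical stand-in for the "compare against the float32 baseline" step; a real backtest compares model outputs on production traces, not just reconstructed weights.

```rust
/// Symmetric per-tensor int8 quantization.
/// scale = max|x| / 127, q = round(x / scale) clamped to [-127, 127].
fn quantize_i8(xs: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = xs.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    // All-zero input: any non-zero scale works; 1.0 avoids division by zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = xs
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Maps int8 values back to f32 using the same scale.
fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

/// Worst-case absolute error against the f32 baseline. With
/// round-to-nearest it is bounded by scale / 2 (plus float rounding).
fn max_abs_error(baseline: &[f32], restored: &[f32]) -> f32 {
    baseline
        .iter()
        .zip(restored)
        .fold(0.0f32, |m, (a, b)| m.max((a - b).abs()))
}
```

The stored model shrinks roughly 4× versus f32, and the error bound tells you up front how much precision each weight can lose.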

A / 03

Custom CUDA / Metal kernels

Kernels written in Rust, targeting the specific matrix shapes your risk algorithm produces. Typical 2–6× speedup over stock cuBLAS in measured workloads.
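The core idea behind shape-specific kernels can be sketched on the CPU with const generics. This is not a CUDA or Metal kernel; it only shows what "targeting the specific matrix shapes" buys: when `M`, `K` and `N` are compile-time constants, every loop bound is known to the compiler, which can then unroll or vectorize for that one shape. The `i-k-j` loop order keeps the innermost access contiguous in both `b` and `c`.

```rust
/// (M x K) * (K x N) matrix multiply, specialized per shape at compile time.
/// Each distinct (M, K, N) gets its own monomorphized copy of this function.
fn matmul_fixed<const M: usize, const K: usize, const N: usize>(
    a: &[[f32; K]; M],
    b: &[[f32; N]; K],
) -> [[f32; N]; M] {
    let mut c = [[0.0f32; N]; M];
    for i in 0..M {
        for k in 0..K {
            let aik = a[i][k];
            // Innermost loop walks row k of `b` and row i of `c` contiguously.
            for j in 0..N {
                c[i][j] += aik * b[k][j];
            }
        }
    }
    c
}
```

Shape mismatches are compile errors rather than runtime checks, since the inner dimension `K` must agree between the two argument types.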

A / 04

Zero-copy data path

rkyv archives and shared-memory buffers between market-data ingest, risk engine and inference. Serialization measured in nanoseconds, not milliseconds.
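The principle behind zero-copy deserialization can be shown with `std` alone. This is a hand-rolled sketch, not rkyv's API: a hypothetical 24-byte little-endian quote record is read straight out of a shared buffer through a borrowed view. Nothing is allocated or copied when a record is "deserialized"; each field is decoded only when it is accessed.

```rust
/// Hypothetical fixed-layout quote record, 24 bytes, little-endian:
/// [0..8) instrument id (u64), [8..16) price in ticks (i64), [16..24) qty (u64).
#[derive(Clone, Copy)]
struct QuoteView<'a> {
    bytes: &'a [u8; 24], // borrowed from the shared buffer, never copied
}

impl<'a> QuoteView<'a> {
    const LEN: usize = 24;

    /// Borrows the first 24 bytes of `buf`; `None` if the buffer is too short.
    fn new(buf: &'a [u8]) -> Option<Self> {
        let bytes: &'a [u8; 24] = buf.get(..Self::LEN)?.try_into().ok()?;
        Some(Self { bytes })
    }

    fn instrument_id(&self) -> u64 {
        u64::from_le_bytes(self.bytes[0..8].try_into().unwrap())
    }

    fn price_ticks(&self) -> i64 {
        i64::from_le_bytes(self.bytes[8..16].try_into().unwrap())
    }

    fn qty(&self) -> u64 {
        u64::from_le_bytes(self.bytes[16..24].try_into().unwrap())
    }
}

/// Walks consecutive records in a buffer (e.g. a shared-memory segment)
/// without copying them out.
fn quotes(buf: &[u8]) -> impl Iterator<Item = QuoteView<'_>> {
    buf.chunks_exact(QuoteView::LEN).filter_map(QuoteView::new)
}
```

rkyv generalizes this to arbitrary Rust types with validation; the cost model is the same: the "parse" step is a bounds check and a pointer, not a copy.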

Rust

C++20

Tokio

Candle

Burn

rkyv

Serde

Dioxus

Tonic

Crossbeam

Criterion

DPDK

CUDA

03 · Selected Work

Two cases,
both in production.

Numbers below are measured against the systems they replaced, at the same venues, on the same hardware class. Client names redacted by NDA — happy to name them on a call.

Case 01 · Derivatives Exchange · 2024–25

Ultra low-latency margin engine

Replaced a legacy C++ risk system on a crypto derivatives venue. New engine runs margin checks inside the order path without adding measurable latency.

p99 margin check

0.8 µs

Throughput

2.4M/s

vs. legacy C++

−94%

Incidents / 6mo

Rust · Crossbeam · Tonic · Dioxus UI · DPDK

MARGIN_REQ_STREAM · LIVE: Margin $42.8M (+1.2%) · VaR 99% $3.2M (stable) · p99 0.8µs (on budget)

Case 02 · Fintech Platform · 2023–24

Edge AI inference runtime

High-performance Rust runtime for deploying predictive models directly on trading servers. Replaced a Python/Torch inference tier, cut p99 latency by more than half, and removed an entire deployment layer.

p99 inference

150 µs

vs. Python/Torch

−62%

Model size

−78%

GPUs retired

Rust · Burn · Candle · CUDA · ONNX

risk_engine_gpu.rs

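// Real-time margin calc, quantized (f16) risk model.
// `risk` is any candle `Module`; the order book is borrowed, not copied.
use candle_core::{DType, Device, Module, Result, Tensor};

struct MarketState<'a> {
    spot:       f32,
    volatility: f32,
    book:       &'a [u8], // view into the shared-memory order book
}

fn calc_margin(risk: &impl Module, s: &MarketState, dev: &Device)
    -> Result<f32> {

    // upload features to the device, cast to f16 for the quantized kernel
    let x = Tensor::new(&[s.spot, s.volatility], dev)?
        .to_dtype(DType::F16)?;

    // forward pass, assumed to return a rank-0 VaR tensor
    let var = risk.forward(&x)?.to_dtype(DType::F32)?;
    var.to_scalar::<f32>()
}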

Inference p99

<150µs

04 · Way of working

From whiteboard
to production.

Four phases per engagement. Typical duration 8–16 weeks. No ambiguous "discovery" — day one produces a benchmarked baseline and a written latency budget you can argue with.

Model the market

Encode instruments, venues and order states in the type system. Invalid trades become unrepresentable before the first line of business logic is written.
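A small sketch of what "invalid trades become unrepresentable" can look like, assuming a hypothetical three-state order lifecycle. `Qty` can only be built through a constructor that rejects zero (outside this module), and each order state is a zero-sized marker type, so calling `fill()` on an order that was never accepted is a compile error rather than a runtime check.

```rust
use std::marker::PhantomData;

/// Non-zero quantity. Outside this module, `Qty::new` is the only way in.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Qty(u64);

impl Qty {
    fn new(v: u64) -> Option<Self> {
        (v > 0).then_some(Qty(v))
    }
}

// Hypothetical lifecycle states as zero-sized marker types.
struct Pending;
struct Accepted;
struct Filled;

/// An order whose lifecycle state is part of its type.
struct Order<S> {
    id: u64,
    qty: Qty,
    _state: PhantomData<S>,
}

impl<S> Order<S> {
    /// Consumes the order and re-types it in the next state.
    fn transition<T>(self) -> Order<T> {
        Order { id: self.id, qty: self.qty, _state: PhantomData }
    }
}

impl Order<Pending> {
    fn new(id: u64, qty: Qty) -> Self {
        Order { id, qty, _state: PhantomData }
    }
    fn accept(self) -> Order<Accepted> {
        self.transition()
    }
}

impl Order<Accepted> {
    // `fill` exists only on accepted orders.
    fn fill(self) -> Order<Filled> {
        self.transition()
    }
}

impl Order<Filled> {
    fn filled_qty(&self) -> Qty {
        self.qty
    }
}
```

`Order::<Pending>::new(1, q).fill()` does not compile, and because each transition takes `self` by value, a stale handle to the pending order cannot be reused after it has been accepted.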

Optimize the hot path

SIMD where it earns its keep. Lock-free structures, NUMA-aware allocation, cache-line discipline. Profiled, not guessed.
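One way to combine "cache-line discipline" with "SIMD where it earns its keep" is a structure-of-arrays layout plus a loop shaped for vectorization. The sketch below assumes a hypothetical `Positions` store and computes gross notional. Each field is contiguous, so the loop only pulls in the cache lines it reads, and eight independent accumulators over fixed-size chunks give the compiler a body it can vectorize. Whether it actually does is something to confirm in the profiler or the generated assembly, not assume.

```rust
/// Structure-of-arrays: each field lives in its own contiguous buffer.
struct Positions {
    qty: Vec<f32>,
    price: Vec<f32>,
}

impl Positions {
    /// Gross notional = sum of |qty * price| over all positions.
    fn gross_notional(&self) -> f32 {
        assert_eq!(self.qty.len(), self.price.len());
        let mut acc = [0.0f32; 8]; // eight independent partial sums
        let q_chunks = self.qty.chunks_exact(8);
        let p_chunks = self.price.chunks_exact(8);
        // Leftover elements (fewer than 8) are handled after the main loop.
        let (q_tail, p_tail) = (q_chunks.remainder(), p_chunks.remainder());
        for (q, p) in q_chunks.zip(p_chunks) {
            for i in 0..8 {
                acc[i] += (q[i] * p[i]).abs();
            }
        }
        let tail: f32 = q_tail.iter().zip(p_tail).map(|(q, p)| (q * p).abs()).sum();
        acc.iter().sum::<f32>() + tail
    }
}
```

The split accumulators change the order of floating-point additions compared with a naive running sum, so results can differ in the last bits; that trade-off is part of "earning its keep".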

Backtest & benchmark

Historical replay against real order books. Criterion.rs in CI — every commit holds the latency budget or the build fails.
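Criterion.rs does the statistics properly (warm-up, outlier detection, regression reports). As a `std`-only sketch of the underlying check, the helpers below time a function, take the nearest-rank p99 and compare it with a latency budget; a CI test asserting on the returned flag fails the build on regression. `within_p99_budget` and its parameters are hypothetical names, not part of Criterion's API.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank percentile, p in (0, 100]. Sorts `samples` in place.
fn percentile(samples: &mut [Duration], p: f64) -> Duration {
    assert!(!samples.is_empty() && p > 0.0 && p <= 100.0);
    samples.sort_unstable();
    // 1-based rank = ceil(p/100 * n)
    let rank = (p * samples.len() as f64 / 100.0).ceil() as usize;
    samples[rank - 1]
}

/// Times `f` over `iters` calls; returns the p99 and whether it fits `budget`.
fn within_p99_budget<F: FnMut()>(mut f: F, iters: usize, budget: Duration) -> (Duration, bool) {
    let mut samples = Vec::with_capacity(iters);
    for _ in 0..iters {
        let t0 = Instant::now();
        f();
        samples.push(t0.elapsed());
    }
    let p99 = percentile(&mut samples, 99.0);
    (p99, p99 <= budget)
}
```

Wrap the measured work in `std::hint::black_box` so the optimizer cannot delete it; otherwise the benchmark may report the cost of an empty closure.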

Ship & observe

Canary deploy next to the legacy system. Shadow traffic for 2 weeks. Full cutover only after p99 and correctness pass human review.
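A sketch of the shadow-traffic comparison, under two assumptions: requests are reduced to hypothetical `u64` ids, and margins are integer cents so they can be compared exactly or within a tolerance. The legacy engine's answer is the one actually served; the new engine sees the same input, and disagreements are only counted for review.

```rust
/// Counters gathered during the shadow period.
#[derive(Debug, Default, PartialEq)]
struct ShadowStats {
    compared: u64,
    mismatches: u64,
}

/// Runs both engines on every request, serves the legacy result and
/// counts disagreements larger than `tolerance_cents`.
fn shadow_compare<L, N>(
    requests: &[u64],
    legacy: L,
    candidate: N,
    tolerance_cents: i64,
) -> (Vec<i64>, ShadowStats)
where
    L: Fn(u64) -> i64,
    N: Fn(u64) -> i64,
{
    let mut stats = ShadowStats::default();
    let mut served = Vec::with_capacity(requests.len());
    for &req in requests {
        let live = legacy(req); // what clients actually receive
        let shadow = candidate(req); // new engine, same input, not served
        stats.compared += 1;
        if (live - shadow).abs() > tolerance_cents {
            stats.mismatches += 1; // in practice: log inputs for human review
        }
        served.push(live);
    }
    (served, stats)
}
```

Cutover is then a decision made on the counters and the latency histograms, with the legacy path still live until it is made.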

Get in touch

Rewrite
in Rust?

If your trading stack is losing to the GIL, or your margin engine is a C++ tarpit from 2014 — let's talk. One 30-minute call, no pitch deck. We'll know in 15 minutes if it's worth doing.

Schedule a call · cargo email --me

No pitch decks · 30-minute call · Written latency budget on day one
