Device fit.

Nimbus8 reads your chip, measures your RAM, benchmarks on first load, and tiers every model recommendation to what your device can actually handle — no guesswork.

How it works

On first launch, Nimbus8 identifies your device's chip (A15, A16, A17 Pro, M1, M2, etc.) and measures available memory. When you install a model, the runtime runs a short benchmark (generating a handful of tokens) and records the actual throughput in tokens per second. That number is stored in the model registry and used to sort recommendations in the catalog.
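
To give a feel for that probe, here is a minimal Swift sketch: it reads the raw hardware identifier and installed RAM with public APIs, then times a short generation run. The `ModelHandle` protocol and its `generate` call are placeholders for illustration, not Nimbus8's actual runtime API.

```swift
import Foundation

/// Placeholder for the real inference handle; name and API are assumptions.
protocol ModelHandle {
    func generate(prompt: String, maxTokens: Int) -> String
}

/// Reads the raw hardware identifier (e.g. "iPhone15,2") via uname(3);
/// mapping it to a chip name like "A16" is a lookup-table exercise.
func hardwareIdentifier() -> String {
    var systemInfo = utsname()
    uname(&systemInfo)
    return withUnsafeBytes(of: &systemInfo.machine) { buffer in
        String(decoding: buffer.prefix(while: { $0 != 0 }), as: UTF8.self)
    }
}

/// Installed RAM in gigabytes.
func installedMemoryGB() -> Double {
    Double(ProcessInfo.processInfo.physicalMemory) / 1_073_741_824
}

/// Times a short generation run and returns the measured tokens per second.
func benchmark(model: ModelHandle, tokens: Int = 32) -> Double {
    let start = Date()
    _ = model.generate(prompt: "Hello", maxTokens: tokens)
    let elapsed = Date().timeIntervalSince(start)
    return Double(tokens) / elapsed
}
```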

Device tiers

The catalog tags every model with a recommended tier: devices with 6 GB RAM get small models (1B–3B), 8 GB devices can run mid-range (3B–8B), and iPads or iPhones with 16 GB+ can handle the larger open models. These aren't hard limits — you can always install a bigger model — but the UI will warn you if it's likely to be slow or cause memory pressure.
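
A rough sketch of how RAM-based tiering and the soft warning could be expressed in Swift. The thresholds mirror the ones above, but the type and function names are illustrative, not the real catalog code.

```swift
import Foundation

/// Model-size tiers keyed off installed RAM, mirroring the catalog thresholds:
/// ~6 GB -> 1B–3B, ~8 GB -> 3B–8B, 16 GB+ -> larger open models.
enum DeviceTier {
    case small      // 1B–3B parameters
    case midRange   // 3B–8B parameters
    case large      // larger open models

    static func current() -> DeviceTier {
        let ramGB = Double(ProcessInfo.processInfo.physicalMemory) / 1_073_741_824
        switch ramGB {
        case ..<7:  return .small
        case ..<12: return .midRange
        default:    return .large
        }
    }
}

/// Soft check: returns a warning string instead of blocking the install.
func installWarning(modelParamsB: Double, tier: DeviceTier) -> String? {
    switch (tier, modelParamsB) {
    case (.small, let p) where p > 3,
         (.midRange, let p) where p > 8:
        return "This model may be slow or cause memory pressure on this device."
    default:
        return nil
    }
}
```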

Memory budgets

iOS doesn't give apps unlimited memory. Nimbus8 tracks the system's memory pressure notifications and refuses to load a model if available memory is below a safe threshold (currently 400 MB headroom). This prevents the OS from killing the app mid-conversation.
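
As a rough illustration, the gate could look something like this in Swift, using `os_proc_available_memory()` and a Dispatch memory-pressure source. The 400 MB figure is the threshold quoted above; `unloadKVCache()` is a stand-in for whatever the runtime actually evicts first, not a Nimbus8 API.

```swift
import Foundation
import os

/// Headroom required before a load is allowed: 400 MB, the threshold quoted above.
let requiredHeadroomBytes = 400 * 1_048_576

/// Placeholder for whatever the runtime evicts under pressure.
func unloadKVCache() { /* release the resident model's KV cache */ }

/// Gate a model load on the memory actually available to this process (iOS 13+).
func canLoadModel(estimatedFootprintBytes: Int) -> Bool {
    os_proc_available_memory() - estimatedFootprintBytes > requiredHeadroomBytes
}

/// React to the system's memory-pressure notifications while a model is resident.
let pressureSource = DispatchSource.makeMemoryPressureSource(
    eventMask: [.warning, .critical],
    queue: .main
)
pressureSource.setEventHandler {
    if pressureSource.data.contains(.critical) {
        unloadKVCache()
    }
}
pressureSource.activate()
```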

Thermal management

Sustained inference generates heat. The adaptive engine monitors thermal state and adjusts batch size and thread count when the device heats up. The result: slightly slower tokens, but no thermal throttling cliff and no uncomfortably hot phone in your hand.
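
A minimal sketch of that feedback loop using the system thermal-state API. The `InferenceConfig` fields and the specific batch-size and thread-count values are illustrative, not Nimbus8's actual tuning.

```swift
import Foundation

/// Inference settings the adaptive engine can dial down under heat.
struct InferenceConfig {
    var batchSize: Int
    var threadCount: Int
}

/// Map the system thermal state to progressively lighter settings.
func config(for state: ProcessInfo.ThermalState) -> InferenceConfig {
    switch state {
    case .nominal:    return InferenceConfig(batchSize: 512, threadCount: 6)
    case .fair:       return InferenceConfig(batchSize: 256, threadCount: 4)
    case .serious:    return InferenceConfig(batchSize: 128, threadCount: 2)
    case .critical:   return InferenceConfig(batchSize: 64,  threadCount: 1)
    @unknown default: return InferenceConfig(batchSize: 128, threadCount: 2)
    }
}

// Re-tune whenever the OS reports a thermal state change.
let observer = NotificationCenter.default.addObserver(
    forName: ProcessInfo.thermalStateDidChangeNotification,
    object: nil, queue: .main
) { _ in
    let newConfig = config(for: ProcessInfo.processInfo.thermalState)
    // Hand `newConfig` to the inference engine here.
    print("Thermal state changed, new config: \(newConfig)")
}
```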