How it works
On first launch, Nimbus8 identifies your device's chip (A15, A16, A17 Pro, M1, M2, etc.) and measures available memory. When you install a model, the runtime runs a short benchmark — a handful of tokens — and records the actual throughput. That number is stored in the model registry and used to sort recommendations in the catalog.
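The benchmark-and-record flow can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: `generate` stands in for the runtime's token generator, and the registry shape and function names are invented, not the actual Nimbus8 API.

```python
import time

def benchmark_throughput(generate, n_tokens=32):
    """Generate a handful of tokens and return measured tokens/sec.

    `generate` is a hypothetical stand-in for the runtime's
    one-token generation call.
    """
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate()  # produce one token
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

def record_benchmark(registry, model_id, tokens_per_sec):
    # Store the measured figure in the (illustrative) model registry.
    registry[model_id] = {"tokens_per_sec": tokens_per_sec}

def sort_catalog(registry):
    # Fastest measured models first: the catalog's recommendation order.
    return sorted(registry,
                  key=lambda m: registry[m]["tokens_per_sec"],
                  reverse=True)
```

The key design point from the text is that recommendations are sorted by *measured* throughput on your device, not by a static per-chip lookup table.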
Device tiers
The catalog tags every model with a recommended tier: devices with 6 GB RAM get small models (1B–3B), 8 GB devices can run mid-range (3B–8B), and iPads or iPhones with 16 GB+ can handle the larger open models. These aren't hard limits — you can always install a bigger model — but the UI will warn you if it's likely to be slow or cause memory pressure.
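The tier mapping described above is simple enough to sketch directly. The function names and the soft-warning logic are illustrative assumptions; only the RAM cutoffs and parameter ranges come from the text.

```python
def recommended_tier(ram_gb):
    """Map device RAM to a recommended model-size range, in billions
    of parameters. (None = no stated upper bound for 16 GB+ devices.)
    """
    if ram_gb >= 16:
        return (8, None)   # larger open models
    if ram_gb >= 8:
        return (3, 8)      # mid-range
    return (1, 3)          # small models for 6 GB devices

def should_warn(ram_gb, model_params_b):
    """Soft limit: allow the install, but flag likely slowness or
    memory pressure when the model exceeds the tier's upper bound.
    """
    _, upper = recommended_tier(ram_gb)
    return upper is not None and model_params_b > upper
```

Note that `should_warn` never blocks an install, matching the "these aren't hard limits" behavior: the tier only drives the UI warning.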
Memory budgets
iOS enforces per-app memory limits and will terminate apps under memory pressure. Nimbus8 tracks the system's memory pressure notifications and refuses to load a model if available memory is below a safe threshold (currently 400 MB headroom). This prevents the OS from killing the app mid-conversation.
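The load gate reduces to one comparison. The 400 MB figure is from the text; the function name, the megabyte units, and the idea of checking against the model's resident footprint are illustrative assumptions.

```python
HEADROOM_MB = 400  # safe threshold stated in the docs

def can_load_model(available_mb, model_footprint_mb,
                   headroom_mb=HEADROOM_MB):
    """Refuse the load unless the model fits with at least
    `headroom_mb` of free memory left over afterwards.
    """
    return available_mb - model_footprint_mb >= headroom_mb
```

On a real device, `available_mb` would come from the platform (e.g. iOS exposes a per-process available-memory query), re-checked at load time rather than cached, since other apps change the picture constantly.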
Thermal management
Sustained inference generates heat. The adaptive engine monitors thermal state and reduces batch size and thread count as the device heats up. The result: slightly slower token generation, but no sudden thermal-throttling cliff and no uncomfortably hot phone in your hand.
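A minimal sketch of the adaptation policy, assuming thermal states named after iOS's nominal/fair/serious/critical levels. The specific batch sizes and thread counts are invented placeholders; the text only says both are reduced as the device heats up.

```python
# Illustrative settings per thermal state; real values would be
# tuned per chip. States mirror iOS's ProcessInfo thermal levels.
THERMAL_SETTINGS = {
    "nominal":  {"batch_size": 8, "threads": 6},
    "fair":     {"batch_size": 8, "threads": 4},
    "serious":  {"batch_size": 4, "threads": 2},
    "critical": {"batch_size": 1, "threads": 1},
}

def settings_for_thermal_state(state):
    """Return the inference settings for a thermal state, falling
    back to the most conservative profile for unknown states.
    """
    return THERMAL_SETTINGS.get(state, THERMAL_SETTINGS["critical"])
```

Backing off gradually like this trades a little throughput for avoiding the hardware's own throttling, which is far more abrupt than anything the app does voluntarily.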