Streaming replies
Token-by-token output with live "thinking" blocks you can fold away. Same rhythm as cloud chat apps — minus the round trip.
Gale is Nimbus8's everyday assistant. Stream replies from Llama, Qwen, Gemma, and Mistral with the polish of a modern chat app, plus attachments, vision, and tool calls, all on-device.
Vision and OCR routed by a capability manifest. If your chosen model can't see, Gale falls back to on-device OCR automatically.
No account, no API keys, no telemetry. Gale runs models locally and stays offline unless you explicitly invoke a network feature.
Nimbus8 ships with a built-in Hugging Face picker. Every model is filtered against your device — RAM, chip, and OS — so incompatible downloads never start in the first place. Quantized builds are prioritized. Install, swap, and uninstall without leaving the app.
Gale is Nimbus8's on-device chat module — a full-featured assistant that runs open language models directly on your iPhone or iPad. There is no cloud inference, no account, and no round trip to an API.
Every model you pick (Llama, Qwen, Gemma, Mistral, Phi, and others on Hugging Face) is downloaded once, stored in the app's iOS sandbox, and executed on your device's Neural Engine or GPU. Chats are saved locally; attachments never leave the phone.
Gale streams tokens as they're generated — the same polish as a modern cloud chat app, but the compute is local. Each token is written to the bubble in real time, with a blinking caret until the model finishes.
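The streaming loop described above can be sketched in a few lines. This is an illustrative pseudocode-style sketch, not Gale's actual API: `generate_tokens` stands in for the local decoder, and the caret character mimics the in-progress indicator.

```python
from typing import Iterator

def generate_tokens(prompt: str) -> Iterator[str]:
    # Stand-in for the on-device model: yields tokens as they are decoded.
    for token in ["Local", " inference", " streams", " immediately", "."]:
        yield token

def stream_reply(prompt: str) -> str:
    bubble = ""
    for token in generate_tokens(prompt):
        bubble += token          # each token is appended to the bubble in real time
        print(bubble + "|")      # caret shown until the model finishes
    return bubble
```

The key point is that the UI renders after every token rather than waiting for the full reply, which is why the first token appears the moment decoding starts.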
When a model supports explicit reasoning traces (R1-family, QwQ, and similar), Gale renders them as a foldable "Thinking" step card above the reply. You can expand it to read the model's scratchpad or collapse it for a cleaner view. Nothing is hidden — the model just speaks in two voices: reasoning and reply.
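Splitting a reply into its two voices is a simple parse. The sketch below assumes the common R1-style convention of wrapping the scratchpad in `<think>...</think>` tags; the tag name and function name are assumptions for illustration, not Gale's internals.

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split a model reply into (thinking, answer).

    Assumes the scratchpad is delimited by <think>...</think> tags,
    as in R1-family and QwQ outputs. Returns an empty thinking string
    when the model emits no trace.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if not match:
        return "", raw.strip()
    thinking = match.group(1).strip()   # content of the foldable "Thinking" card
    answer = raw[match.end():].strip()  # the visible reply below the card
    return thinking, answer
```

Both halves are kept, so collapsing the card hides the trace without discarding it.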
Gale accepts images, PDFs, and arbitrary files as inline attachments. A capability manifest routes each input to the right engine.
You never pick the mode manually. Gale reads the manifest for your chosen model and picks the fastest path that preserves fidelity.
Any open model you can install through Nimbus8's Hugging Face browser. The built-in picker filters the catalog down to models your device can actually run — quantization, context window, and memory footprint are measured against your chip and RAM.
Flagship picks verified on current-gen iPhones:
Nothing, by default. Gale runs models locally and stores chats in the iOS sandbox. The network is required only when you explicitly invoke a feature that needs it — downloading a new model from Hugging Face, or toggling the optional Mist web-search fusion.
There is no telemetry. There is no account. The app does not phone home. See the privacy policy for the full data flow.
Only to download models. Once a model is on your device, Gale works fully offline — airplane mode is a supported configuration.
No. Gale is explicitly local-first. If you want cloud chat, a dozen apps already do that well. Nimbus8's value is what happens when the cloud isn't available — or when you don't want to send your data there.
On iPhone 15 Pro and newer, a 3B MLX model streams at roughly 25–40 tokens per second. In practice it feels faster than most cloud chat UIs, because the first token appears immediately, with no network round trip.
Yes, locally, in the iOS app sandbox. You can clear any conversation from Gale's chat list. Uninstalling Nimbus8 deletes every chat and every installed model.