Streaming replies
Token-by-token output with live "thinking" blocks you can fold away. Same rhythm as cloud chat apps — minus the round trip.
Gale is Nimbus8's everyday assistant. Stream replies from Llama, Qwen, Gemma, and Mistral with the polish of a modern chat app, plus attachments, vision, and tool calls, all on-device.
Vision and OCR routed by a capability manifest. If your chosen model can't see, Gale falls back to on-device OCR automatically.
No account, no API keys, no telemetry. Gale runs models locally and stays offline unless you explicitly invoke a network feature.
Nimbus8 ships with a built-in Hugging Face picker. Every model is filtered against your device — RAM, chip, and OS — so incompatible downloads never start in the first place. Quantized builds are prioritized. Install, swap, and uninstall without leaving the app.
Gale is Nimbus8's on-device chat module — a full-featured assistant that runs open language models directly on your iPhone or iPad. There is no cloud inference, no account, and no round trip to an API.
Every model you pick (Llama, Qwen, Gemma, Mistral, Phi, and others on Hugging Face) is downloaded once, stored in the app's iOS sandbox, and executed on your device's Neural Engine or GPU. Chats are saved locally; attachments never leave the phone.
Gale streams tokens as they're generated — the same polish as a modern cloud chat app, but the compute is local. Each token is written to the bubble in real time, with a blinking caret until the model finishes.
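The streaming loop described above can be sketched in a few lines. This is an illustrative pseudocode-style sketch, not Gale's actual API: `generate_tokens` stands in for the local decoder, and the caret character mimics the in-progress indicator.

```python
from typing import Iterator

def generate_tokens(prompt: str) -> Iterator[str]:
    # Stand-in for the on-device model: yields tokens as they are decoded.
    for token in ["Local", " inference", " streams", " immediately", "."]:
        yield token

def stream_reply(prompt: str) -> str:
    bubble = ""
    for token in generate_tokens(prompt):
        bubble += token          # each token is appended to the bubble in real time
        print(bubble + "|")      # caret shown until the model finishes
    return bubble
```

The key point is that the UI renders after every token rather than waiting for the full reply, which is why the first token appears the moment decoding starts.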
When a model supports explicit reasoning traces (R1-family, QwQ, and similar), Gale renders them as a foldable "Thinking" step card above the reply. You can expand it to read the model's scratchpad or collapse it for a cleaner view. Nothing is hidden — the model just speaks in two voices: reasoning and reply.
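Splitting a reply into its two voices is a simple parse. The sketch below assumes the common R1-style convention of wrapping the scratchpad in `<think>...</think>` tags; the tag name and function name are assumptions for illustration, not Gale's internals.

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split a model reply into (thinking, answer).

    Assumes the scratchpad is delimited by <think>...</think> tags,
    as in R1-family and QwQ outputs. Returns an empty thinking string
    when the model emits no trace.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if not match:
        return "", raw.strip()
    thinking = match.group(1).strip()   # content of the foldable "Thinking" card
    answer = raw[match.end():].strip()  # the visible reply below the card
    return thinking, answer
```

Both halves are kept, so collapsing the card hides the trace without discarding it.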
Gale accepts images, PDFs, and arbitrary files as inline attachments. A capability manifest routes each input to the right engine.
You never pick the mode manually. Gale reads the manifest for your chosen model and picks the fastest path that preserves fidelity.
Any open model you can install through Nimbus8's Hugging Face browser. The built-in picker filters the catalog down to models your device can actually run — quantization, context window, and memory footprint are measured against your chip and RAM.
Flagship picks verified on current-gen iPhones:
Nothing, by default. Gale runs models locally and stores chats in the iOS sandbox. The network is required only when you explicitly invoke a feature that needs it — downloading a new model from Hugging Face, or toggling the optional Mist web-search fusion.
There is no telemetry. There is no account. The app does not phone home. See the privacy policy for the full data flow.
Only to download models. Once a model is on your device, Gale works fully offline — airplane mode is a supported configuration.
No. Gale is explicitly local-first. If you want cloud chat, a dozen apps already do that well. Nimbus8's value is what happens when the cloud isn't available — or when you don't want to send your data there.
On iPhone 15 Pro and newer, a 3B MLX model streams at roughly 25–40 tokens per second. In practice it feels faster than most cloud chat UIs, because the first token appears immediately, with no network round trip.
Yes, locally, in the iOS app sandbox. You can clear any conversation from Gale's chat list. Uninstalling Nimbus8 deletes every chat and every installed model.