Nimbus8 Coming to the App Store
Shipping · Gale · Chat

A real chat app for open models. On your iPhone.

Gale is Nimbus8's everyday assistant. Stream replies from Llama, Qwen, Gemma, and Mistral with the polish of a modern chat app — attachments, vision, and tool calls, all on-device.

iOS 17+ · iPhone & iPad
100% on-device
MLX · GGUF · Core ML · open models
[App mockup: a new chat running Llama-3.2-3B-Instruct-4bit, streaming a reply with a collapsible "Thinking" block above it]

Streaming replies

Token-by-token output with live "thinking" blocks you can fold away. Same rhythm as cloud chat apps — minus the round trip.

Photos, PDFs, files

Vision and OCR routed by a capability manifest. If your chosen model can't see, Gale falls back to on-device OCR automatically.

Zero cloud

No account, no API keys, no telemetry. Gale runs models locally and stays offline unless you explicitly invoke a network feature.

Open models

Browse Hugging Face.
Install in a tap.

Nimbus8 ships with a built-in Hugging Face picker. Every model is filtered against your device — RAM, chip, and OS — so incompatible downloads never start in the first place. Quantized builds are prioritised. Install, swap, and uninstall without leaving the app.

  • Device-fit first. Models that can't run on your iPhone are filtered out by default.
  • Verified quants. GGUF and MLX builds from known orgs (Unsloth, MLX Community, bartowski, Apple).
  • Live progress. Resumable downloads with a progress bar, size, and ETA — all on-device.
  • Zero telemetry. The only network call is the download itself. No analytics, no "suggested picks".
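The device-fit check boils down to simple arithmetic: weights cost roughly parameters × bits ÷ 8 bytes, and the model only installs if that plus runtime headroom fits in RAM. A minimal sketch in Python — the headroom figure and field names here are illustrative assumptions, not Nimbus8's actual filter:

```python
def weight_footprint_gb(params_b: float, bits: int) -> float:
    """Approximate weight size: parameters (in billions) x bits/8 bytes, in GB."""
    return params_b * 1e9 * bits / 8 / 1e9

def fits_device(params_b: float, bits: int, device_ram_gb: float,
                headroom_gb: float = 2.0) -> bool:
    """A model 'fits' if weights plus KV-cache/runtime headroom stay under RAM.

    headroom_gb is an assumed budget for the KV cache, OS, and app overhead.
    """
    return weight_footprint_gb(params_b, bits) + headroom_gb <= device_ram_gb

# A 3B model at 4-bit (~1.5 GB of weights) fits an 8 GB phone comfortably;
# a 7B at 4-bit (~3.5 GB) is a tight fit; a 7B at 8-bit does not fit.
print(fits_device(3, 4, 8.0))  # True
print(fits_device(7, 4, 8.0))  # True
print(fits_device(7, 8, 8.0))  # False
```

This is why the picker surfaces quantized builds first: dropping from 16-bit to 4-bit cuts the weight footprint by 4×, which is usually the difference between "fits your device" and "filtered out".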
[App mockup: the Hugging Face picker listing Llama-3.2-3B-Instruct-4bit (mlx-community, 1.9 GB), Qwen2.5-Coder-7B-Instruct-4bit (mlx-community, 4.4 GB, tight fit), and gemma-2-2b-it (google, 1.6 GB, gated), each with format and compatibility tags]

What is Gale?

Gale is Nimbus8's on-device chat module — a full-featured assistant that runs open language models directly on your iPhone or iPad. There is no cloud inference, no account, and no round trip to an API.

Every model you pick (Llama, Qwen, Gemma, Mistral, Phi, and others on Hugging Face) is downloaded once, stored in the app's iOS sandbox, and executed on your device's Neural Engine or GPU. Chats are saved locally; attachments never leave the phone.

How does streaming work in Gale?

Gale streams tokens as they're generated — the same polish as a modern cloud chat app, but the compute is local. Each token is written to the bubble in real time, with a blinking caret until the model finishes.

When a model supports explicit reasoning traces (R1-family, QwQ, and similar), Gale renders them as a foldable "Thinking" step card above the reply. You can expand it to read the model's scratchpad or collapse it for a cleaner view. Nothing is hidden — the model just speaks in two voices: reasoning and reply.
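Splitting the stream into those two voices is mostly delimiter parsing. R1-family models wrap their scratchpad in `<think>…</think>` tags; other models use different markers, so the delimiters below are an assumption, and this sketch is an illustration rather than Gale's actual parser:

```python
def split_reasoning(text: str, open_tag: str = "<think>",
                    close_tag: str = "</think>") -> tuple[str, str]:
    """Return (reasoning, reply) from a (possibly partial) streamed response.

    If no trace is present, reasoning is empty. If the close tag hasn't
    streamed in yet, everything after the open tag is still "thinking".
    """
    start = text.find(open_tag)
    if start == -1:
        return "", text
    end = text.find(close_tag, start)
    if end == -1:  # still streaming inside the trace: show the Thinking card
        return text[start + len(open_tag):], ""
    reasoning = text[start + len(open_tag):end]
    reply = text[end + len(close_tag):]
    return reasoning.strip(), reply.strip()

thinking, reply = split_reasoning(
    "<think>Check deadlines first.</think>Here are three priorities:")
print(thinking)  # Check deadlines first.
print(reply)     # Here are three priorities:
```

Running this on every streamed chunk is what lets the UI show a live, foldable "Thinking" card while the trace is still being generated, then swap focus to the reply bubble the moment the close tag arrives.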

Multimodal inputs: photos, PDFs, files

Gale accepts images, PDFs, and arbitrary files as inline attachments. A capability manifest routes each input to the right engine:

  • Vision-capable models (LLaVA, Qwen-VL, PaliGemma) receive images directly.
  • Text-only models get on-device OCR first — extracted text becomes part of the prompt.
  • PDFs are parsed natively when the model supports it, otherwise rendered to images + OCR.

You never pick the mode manually. Gale reads the manifest for your chosen model and picks the fastest path that preserves fidelity.
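The routing described above can be sketched as a small lookup against the manifest. The field names (`vision`, `pdf`) and path labels here are hypothetical, chosen only to illustrate the decision tree:

```python
def route_attachment(kind: str, manifest: dict) -> str:
    """Pick a processing path for one attachment given the model's manifest.

    kind: "image", "pdf", or "file". manifest: the chosen model's
    capability flags (field names are assumptions for this sketch).
    """
    if kind == "image":
        # Vision models see pixels; text-only models get OCR text instead.
        return "vision" if manifest.get("vision") else "ocr"
    if kind == "pdf":
        # Native PDF parsing when supported, else render pages and OCR them.
        return "native-pdf" if manifest.get("pdf") else "render+ocr"
    return "extract-text"  # arbitrary files: pull text into the prompt

llava = {"vision": True, "pdf": False}
llama_text = {"vision": False, "pdf": False}
print(route_attachment("image", llava))       # vision
print(route_attachment("image", llama_text))  # ocr
print(route_attachment("pdf", llama_text))    # render+ocr
```

The point of routing off a static manifest, rather than probing the model at runtime, is that the fallback decision is instant: the attachment starts processing before the model has even loaded.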

Which models does Gale support?

Any open model you can install through Nimbus8's Hugging Face browser. The built-in picker filters the catalog down to models your device can actually run — quantization, context window, and memory footprint are measured against your chip and RAM.

Flagship picks verified on current-gen iPhones:

  • Llama 3.2 3B — MLX — best general-purpose pick for iPhone 15 Pro and newer.
  • Qwen 2.5 7B — MLX — strongest reasoning where RAM allows.
  • Gemma 2 2B — MLX — fastest, cleanest small model.
  • Phi-3.5 Mini — GGUF via llama.cpp — widest device coverage.

What leaves my device when I use Gale?

Nothing, by default. Gale runs models locally and stores chats in the iOS sandbox. The network is required only when you explicitly invoke a feature that needs it — downloading a new model from Hugging Face, or toggling the optional Mist web-search fusion.

There is no telemetry. There is no account. The app does not phone home. See the privacy policy for the full data flow.

Getting started with Gale

  1. Install Nimbus8 from the App Store (coming soon).
  2. Open the Models browser and pick a model the app flags as "fits your device."
  3. Wait for the one-time download. Models live in the app sandbox — uninstalling Nimbus8 removes them.
  4. Tap Gale, pick your model in the composer, and start chatting.

FAQ

Does Gale need an internet connection?

Only to download models. Once a model is on your device, Gale works fully offline — airplane mode is a supported configuration.

Can I use my own API keys (OpenAI, Anthropic, etc.)?

No. Gale is explicitly local-first. If you want cloud chat, a dozen apps already do that well. Nimbus8's value is what happens when the cloud isn't available — or when you don't want to send your data there.

How fast is Gale compared to cloud chat?

On iPhone 15 Pro and newer, a 3B MLX model streams at roughly 25–40 tokens per second — faster than most cloud chat UIs feel in practice, because the first token appears immediately with no network round trip.
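As back-of-envelope arithmetic, total reply time is first-token latency plus token count divided by throughput. The 0.2 s first-token figure below is an illustrative assumption, not a measured number:

```python
def reply_seconds(tokens: int, tok_per_s: float,
                  first_token_s: float = 0.2) -> float:
    """Time to finish a reply: assumed prompt-processing delay + generation."""
    return first_token_s + tokens / tok_per_s

# A typical 150-token reply at 30 tok/s on-device:
print(round(reply_seconds(150, 30.0), 2))  # 5.2
```

At cloud speeds the generation itself may be faster, but the network round trip lands entirely in the first-token delay, which is the part users actually feel.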

Does Gale store my conversations?

Yes, locally, in the iOS app sandbox. You can clear any conversation from Gale's chat list. Uninstalling Nimbus8 deletes every chat and every installed model.