Nimbus8 Coming to the App Store
Exploring · Haze · Memory

Memory that stays on the phone.

Haze gives Nimbus8's modules a long-term memory — a private vector store scoped per conversation or across them. Embeddings, recall, and ranking all run locally; nothing is synced, mirrored, or phoned home.

iOS 17+ · iPhone & iPad
100% on-device
bge-small · e5-small · Core ML · local embeddings
Haze
What was that model I tried last month that handled long PDFs well?
Recalled · 3 memories
Mar 28, 2026 · Gale
Tested Phi-3.5-Mini-128k on the HOA PDF — handled 94 pages cleanly, recall was spotty after ~70k tokens.
0.92 · conversation: "pdf-long-context"
Mar 31, 2026 · Gale
Qwen2.5-7B-instruct with YaRN extension — better coherence at 64k than Phi, slower first-token latency.
0.88 · conversation: "pdf-long-context"
Apr 04, 2026 · Cirrus
Deepseek-Coder at 32k handled the monorepo map without thrashing. Not PDF but relevant for long-context fit.
0.71 · conversation: "monorepo-map"
You tested two long-context picks: Phi-3.5-Mini-128k (Mar 28, clean through ~70k tokens) and Qwen2.5-7B-instruct with YaRN (Mar 31, better coherence at 64k). Your notes favour the Qwen setup for long PDFs.

Per-conversation scope

Memories are scoped to one chat by default — context stays where it was written. You opt into cross-chat recall on the memories you want lifted to the global pool.

Local embeddings

bge-small and e5-small run on the Neural Engine via Core ML. No embedding API calls, no tokens shipped off-device — every vector is computed and stored locally.

Explainable recall

Every answer shows the memories it drew from, with similarity scores and source chats. You always know which past moments are shaping what the model just said.

What is Haze?

Haze is Nimbus8's on-device memory module — a private vector store that lets every other module (Gale, Cirrus, Mist, Ashe, and the rest) recall what you said, learned, or decided in past conversations without sending a single byte to a remote service.

Every memory is a short excerpt plus a local embedding. Retrieval happens with an on-device approximate nearest-neighbour search over that index. Haze is currently in the exploring phase: the runtime is being tuned against real device RAM budgets before it ships alongside the other modules.
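The write-and-recall loop described above can be sketched in a few lines of Python. This is purely illustrative, not Haze's implementation: the record fields are invented, and a shipping index would use an approximate nearest-neighbour structure rather than the linear scan shown here.

```python
from dataclasses import dataclass
import math

@dataclass
class Memory:
    text: str               # short excerpt saved from a chat
    embedding: list[float]  # vector computed by the local model
    conversation: str       # scope tag: the chat it was written in

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def recall(index, query_vec, k=3):
    # Linear scan for clarity; an on-device store would replace this
    # with an approximate nearest-neighbour search over the same vectors.
    scored = sorted(((cosine(m.embedding, query_vec), m) for m in index),
                    key=lambda s: s[0], reverse=True)
    return scored[:k]
```

The key property is that both write and query paths share one embedding space, so similarity scores are comparable across every stored memory.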

Per-conversation vs cross-chat

By default, Haze scopes memories to the conversation they were written in. A note you made while planning a trip doesn't bleed into a debugging session in Cirrus a week later. This keeps recall tight and relevant, and prevents accidental context leaks.

You can lift individual memories, or whole conversations, into a cross-chat pool. Cross-chat recall is opt-in per memory, and every lifted item is tagged visibly in the recall panel so you know what's drawing from where. There is no hidden global pool.
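Under those rules, the candidate set for any recall is a simple filter over the store. A minimal sketch, with hypothetical field names standing in for whatever Haze actually uses:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    conversation: str         # the chat the memory was written in
    cross_chat: bool = False  # opt-in flag: lifted into the shared pool

def visible(index, conversation):
    # A chat can recall its own memories plus anything the user has
    # explicitly lifted cross-chat; nothing else is reachable.
    return [m for m in index
            if m.conversation == conversation or m.cross_chat]
```

Because the filter runs before ranking, a memory that was never lifted can never influence another chat's answer, no matter how similar its embedding is.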

How embeddings are computed

Each memory is passed through a small embedding model (bge-small-en-v1.5 by default) converted to Core ML and run on the Neural Engine. Typical latency on an A17 Pro is a few milliseconds per chunk; on older devices, Haze falls back to a GGUF build of the same model via llama.cpp for broader reach.

Embeddings are stored alongside the source text in an encrypted on-disk index inside the app sandbox. No embedding API is ever called; there is no OpenAI, Cohere, or Voyage integration at all.

Ranking and recall

When a module asks Haze a question, the query is embedded with the same model used at write time, then matched against the index using cosine similarity. The top-K candidates are re-scored with a lightweight local reranker that weights recency, source module, and conversation scope.
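A lightweight reranker of that shape might combine the first-pass cosine score with recency and scope boosts. The weights and decay curve below are invented for illustration; Haze's actual scoring is not described beyond the factors it weighs:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Memory:
    text: str
    conversation: str
    created: date

def rerank(candidates, today, active_conversation,
           w_sim=0.7, w_recency=0.2, w_scope=0.1):
    # candidates: (cosine_similarity, Memory) pairs from the first pass.
    rescored = []
    for sim, mem in candidates:
        age_days = (today - mem.created).days
        recency = 1.0 / (1.0 + age_days / 30.0)  # gentle ~monthly decay
        scope = 1.0 if mem.conversation == active_conversation else 0.0
        score = w_sim * sim + w_recency * recency + w_scope * scope
        rescored.append((score, mem))
    rescored.sort(key=lambda s: s[0], reverse=True)
    return rescored
```

With weights like these, a slightly less similar memory from today's conversation can outrank a closer match from weeks ago in another chat, which matches the behaviour shown in the mockup.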

The result is what you see in the phone mockup: a short list of memories, each with its date, source module, similarity score, and conversation tag. The reply from the model is always grounded in those visible memories — if a memory isn't in the list, it didn't influence the answer.

Storage, encryption, deletion

Memories live in the app's iOS sandbox, behind standard iOS data protection. The vector index is encrypted at rest; keys are held in the Secure Enclave and never leave the device. There is no sync, no backup to iCloud (unless you explicitly enable encrypted device backup in iOS Settings), and no cross-device mirror.

You can delete a single memory, clear a conversation's memory scope, or wipe the entire vector store from Haze's settings. Uninstalling Nimbus8 removes everything — memories, embeddings, and the index itself.
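The three deletion scopes map naturally onto three store operations. A minimal sketch; the class and method names are hypothetical, not Haze's API:

```python
class MemoryStore:
    def __init__(self):
        # memory id -> (conversation, text); a stand-in for the
        # real encrypted on-disk index.
        self.items = {}

    def delete(self, memory_id):
        # Remove one memory.
        self.items.pop(memory_id, None)

    def clear_conversation(self, conversation):
        # Remove one chat's entire memory scope.
        self.items = {k: v for k, v in self.items.items()
                      if v[0] != conversation}

    def wipe(self):
        # Remove everything: memories, embeddings, the index itself.
        self.items.clear()
```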

Embedding models supported

Haze ships a curated set of small, fast embedding models. All of them are open-weight, run on the Neural Engine or GPU, and have a modest memory footprint:

  • bge-small-en-v1.5 — default. Strong English recall, ~33M params, Core ML optimized.
  • e5-small-v2 — alternative English pick with slightly different ranking behaviour on short queries.
  • gte-small — compact general-purpose English embedder with good performance on technical notes.
  • multilingual-e5-small — for memories written across multiple languages; shares an index across them.

You pick one per vector store at creation time. Switching models later means re-embedding the store, which Haze will do in the background on the next idle window.
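Re-embedding amounts to re-encoding every stored excerpt with the new model so the whole index lives in one vector space again. A sketch under that assumption, with a stub function standing in for the Core ML embedder:

```python
def reembed(store, embed):
    # store: list of dicts with "text" and "embedding" keys;
    # embed: the new model's encode function.
    # Haze does this incrementally during idle windows; this is a
    # single pass for clarity.
    for memory in store:
        memory["embedding"] = embed(memory["text"])
    return store
```

The old vectors are simply overwritten: mixing embeddings from two different models in one index would make similarity scores meaningless, which is why a model switch forces a full re-encode.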

FAQ

Does Haze need an internet connection?

Only to download the embedding model once. After that, every write and every recall happens offline — airplane mode is fully supported.

How many memories can Haze hold?

Target on a current iPhone is tens of thousands of memories per store with sub-100ms recall. The exact ceiling depends on device RAM and embedding dimension; Haze reports the current index size and expected query latency in settings.

Can modules share memories without me opting in?

No. Cross-chat and cross-module recall are both opt-in, and every lifted memory is visibly tagged in the recall panel. The default is that a conversation's memories stay with that conversation.

Is the vector store encrypted?

Yes. The index is encrypted at rest using a key held in the Secure Enclave, and it lives inside the app sandbox behind iOS data protection. Nothing is mirrored off-device.

When does Haze ship?

Haze is in the exploring phase. It's being tuned against real device RAM budgets before shipping alongside the other modules — no dates yet. It will land when it's right, not when it's ready to demo.