Per-conversation scope
Memories are scoped to one chat by default — context stays where it was written. You opt into cross-chat recall on the memories you want lifted to the global pool.
Haze gives Nimbus8's modules a long-term memory — a private vector store scoped per conversation or across them. Embeddings, recall, and ranking all run locally; nothing is synced, mirrored, or phoned home.
bge-small and e5-small run on the Neural Engine via Core ML. No embedding API calls, no tokens shipped off-device — every vector is computed and stored locally.
Every answer shows the memories it drew from, with similarity scores and source chats. You always know which past moments are shaping what the model just said.
Haze is Nimbus8's on-device memory module — a private vector store that lets every other module (Gale, Cirrus, Mist, Ashe, and the rest) recall what you said, learned, or decided in past conversations without sending a single byte to a remote service.
Every memory is a short excerpt plus a local embedding. Retrieval happens with an on-device approximate nearest-neighbour search over that index. Haze is currently in the exploring phase: the runtime is being tuned against real device RAM budgets before it ships alongside the other modules.
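The record-and-retrieve contract can be sketched in a few lines. This is an illustrative Python sketch with hypothetical names, not Haze's actual API, and it uses a brute-force scan where the shipping index would use an approximate nearest-neighbour structure:

```python
from dataclasses import dataclass
from math import sqrt

# Hypothetical shape of a Haze memory record: a short excerpt plus its
# local embedding and the conversation it was written in.
@dataclass
class Memory:
    excerpt: str
    embedding: list[float]
    conversation_id: str

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def recall(query: list[float], index: list[Memory], top_k: int) -> list[Memory]:
    # Rank every stored memory by similarity to the query embedding,
    # then keep the top-K. An ANN index makes this sublinear at scale.
    ranked = sorted(index, key=lambda m: cosine_similarity(query, m.embedding),
                    reverse=True)
    return ranked[:top_k]
```

The point of the sketch is the contract, not the data structure: whatever index sits underneath, a query embedding goes in and the K closest stored excerpts come out.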
By default, Haze scopes memories to the conversation they were written in. A note you made while planning a trip doesn't bleed into a debugging session in Cirrus a week later. This keeps recall tight and relevant — and prevents accidental context leaks.
You can lift individual memories, or whole conversations, into a cross-chat pool. Cross-chat recall is opt-in per memory, and every lifted item is tagged visibly in the recall panel so you know what's drawing from where. There is no hidden global pool.
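The scoping rule is simple enough to state as code. A minimal Python sketch with hypothetical names — every memory stays with its conversation unless it is explicitly lifted, and lifting is recorded so the recall panel can tag it:

```python
from dataclasses import dataclass

@dataclass
class ScopedMemory:
    excerpt: str
    conversation_id: str
    lifted: bool = False  # opt-in cross-chat flag, shown as a visible tag

def lift(memory: ScopedMemory) -> None:
    """The only way a memory enters the cross-chat pool."""
    memory.lifted = True

def visible(conversation_id: str, store: list[ScopedMemory]) -> list[ScopedMemory]:
    # Recall in a conversation sees its own memories plus lifted ones;
    # there is no hidden global pool to fall through to.
    return [m for m in store if m.conversation_id == conversation_id or m.lifted]
```

Before lifting, a trip-planning note is invisible to a debugging chat; after an explicit `lift`, it appears there, tagged with its origin.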
Each memory is passed through a small embedding model (bge-small-en-v1.5 by default) converted to Core ML and run on the Neural Engine. Typical latency on an A17 Pro is a few milliseconds per chunk; on older devices, Haze falls back to a GGUF build of the same model via llama.cpp for broader reach.
Embeddings are stored alongside the source text in an encrypted on-disk index inside the app sandbox. No embedding API is ever called — there is no OpenAI, Cohere, or Voyage integration to miss.
When a module asks Haze a question, the query is embedded with the same model used at write time, then matched against the index using cosine similarity. The top-K candidates are re-scored with a lightweight local reranker that weights recency, source module, and conversation scope.
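The two-stage shape of that pipeline — cosine top-K, then a cheap re-score — can be sketched as follows. The weights and the 30-day half-life below are assumptions for illustration, not Haze's tuned values:

```python
from dataclasses import dataclass
from math import exp, log

@dataclass
class Candidate:
    similarity: float        # cosine score from the first stage
    age_days: float          # recency signal
    same_module: bool        # written by the module now asking?
    same_conversation: bool  # scope signal

def rerank_score(c: Candidate) -> float:
    # Exponential recency decay with an assumed 30-day half-life.
    recency = exp(-c.age_days * log(2) / 30)
    score = 0.7 * c.similarity + 0.2 * recency
    if c.same_module:
        score += 0.05
    if c.same_conversation:
        score += 0.05
    return score

def rerank(candidates: list[Candidate]) -> list[Candidate]:
    return sorted(candidates, key=rerank_score, reverse=True)
```

Under weights like these, a fresh in-scope memory can outrank a year-old one with slightly higher raw similarity — which is the point of the second stage.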
The result is what you see in the phone mockup: a short list of memories, each with its date, source module, similarity score, and conversation tag. The reply from the model is always grounded in those visible memories — if a memory isn't in the list, it didn't influence the answer.
Memories live in the app's iOS sandbox, behind standard iOS data protection. The vector index is encrypted at rest; keys are held in the Secure Enclave and never leave the device. There is no sync, no backup to iCloud (unless you explicitly enable encrypted device backup in iOS Settings), and no cross-device mirror.
You can delete a single memory, clear a conversation's memory scope, or wipe the entire vector store from Haze's settings. Uninstalling Nimbus8 removes everything — memories, embeddings, and the index itself.
Haze ships a curated set of small, fast embedding models. All of them are open-weight, run on the Neural Engine or GPU, and keep a modest memory footprint:
You pick one per vector store at creation time. Switching models later means re-embedding the store, which Haze will do in the background on the next idle window.
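The background re-embed is a straightforward sweep. In this Python sketch, `embed` stands in for the on-device Core ML model call and all names are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class StoredMemory:
    excerpt: str
    embedding: list[float]
    model_id: str  # which embedding model produced the vector

def reembed(store: list[StoredMemory], model_id: str,
            embed: Callable[[str], list[float]]) -> None:
    # Re-embed only entries still carrying the old model's vectors, so an
    # interrupted sweep can resume from where it left off on the next idle window.
    for m in store:
        if m.model_id != model_id:
            m.embedding = embed(m.excerpt)
            m.model_id = model_id
```

Tracking the model ID per entry is what makes the switch safe: a query is only matched against vectors produced by the same model that embedded it.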
A network connection is needed only to download the embedding model once. After that, every write and every recall happens offline — airplane mode is fully supported.
The target on a current iPhone is tens of thousands of memories per store with sub-100 ms recall. The exact ceiling depends on device RAM and embedding dimension; Haze reports the current index size and expected query latency in settings.
No. Cross-chat and cross-module recall are both opt-in, and every lifted memory is visibly tagged in the recall panel. The default is that a conversation's memories stay with that conversation.
Yes. The index is encrypted at rest using a key held in the Secure Enclave, and it lives inside the app sandbox behind iOS data protection. Nothing is mirrored off-device.
Haze is in the exploring phase. It's being tuned against real device RAM budgets before shipping alongside the other modules — no dates yet. It will land when it's right, not when it's ready to demo.