Per-conversation scope
Memories are scoped to one chat by default — context stays where it was written. You opt into cross-chat recall on the memories you want lifted to the global pool.
Haze gives Nimbus8's modules a long-term memory — a private vector store scoped per conversation or across them. Embeddings, recall, and ranking all run locally; nothing is synced, mirrored, or phoned home.
bge-small and e5-small run on the Neural Engine via Core ML. No embedding API calls, no tokens shipped off-device — every vector is computed and stored locally.
Every answer shows the memories it drew from, with similarity scores and source chats. You always know which past moments are shaping what the model just said.
Haze is Nimbus8's on-device memory module — a private vector store that lets every other module (Gale, Cirrus, Mist, Ashe, and the rest) recall what you said, learned, or decided in past conversations without sending a single byte to a remote service.
Every memory is a short excerpt plus a local embedding. Retrieval happens with an on-device approximate nearest-neighbour search over that index. Haze is currently in the exploring phase: the runtime is being tuned against real device RAM budgets before it ships alongside the other modules.
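The record-and-retrieve contract can be sketched in a few lines. This is an illustrative Python sketch with hypothetical names, not Haze's actual API, and it uses a brute-force scan where the shipping index would use an approximate nearest-neighbour structure:

```python
from dataclasses import dataclass
from math import sqrt

# Hypothetical shape of a Haze memory record: a short excerpt plus its
# local embedding and the conversation it was written in.
@dataclass
class Memory:
    excerpt: str
    embedding: list[float]
    conversation_id: str

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def recall(query: list[float], index: list[Memory], top_k: int) -> list[Memory]:
    # Rank every stored memory by similarity to the query embedding,
    # then keep the top-K. An ANN index makes this sublinear at scale.
    ranked = sorted(index, key=lambda m: cosine_similarity(query, m.embedding),
                    reverse=True)
    return ranked[:top_k]
```

The point of the sketch is the contract, not the data structure: whatever index sits underneath, a query embedding goes in and the K closest stored excerpts come out.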
By default, Haze scopes memories to the conversation they were written in. A note you made while planning a trip doesn't bleed into a debugging session in Cirrus a week later. This keeps recall tight and relevant — and prevents accidental context leaks.
You can lift individual memories, or whole conversations, into a cross-chat pool. Cross-chat recall is opt-in per memory, and every lifted item is tagged visibly in the recall panel so you know what's drawing from where. There is no hidden global pool.
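The scoping rule is simple enough to state as code. A minimal Python sketch with hypothetical names — every memory stays with its conversation unless it is explicitly lifted, and lifting is recorded so the recall panel can tag it:

```python
from dataclasses import dataclass

@dataclass
class ScopedMemory:
    excerpt: str
    conversation_id: str
    lifted: bool = False  # opt-in cross-chat flag, shown as a visible tag

def lift(memory: ScopedMemory) -> None:
    """The only way a memory enters the cross-chat pool."""
    memory.lifted = True

def visible(conversation_id: str, store: list[ScopedMemory]) -> list[ScopedMemory]:
    # Recall in a conversation sees its own memories plus lifted ones;
    # there is no hidden global pool to fall through to.
    return [m for m in store if m.conversation_id == conversation_id or m.lifted]
```

Before lifting, a trip-planning note is invisible to a debugging chat; after an explicit `lift`, it appears there, tagged with its origin.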
Each memory is passed through a small embedding model (bge-small-en-v1.5 by default) converted to Core ML and run on the Neural Engine. Typical latency on an A17 Pro is a few milliseconds per chunk; on older devices, Haze falls back to a GGUF build of the same model via llama.cpp for broader reach.
Embeddings are stored alongside the source text in an encrypted on-disk index inside the app sandbox. No embedding API is ever called — there is no OpenAI, Cohere, or Voyage integration to miss.
When a module asks Haze a question, the query is embedded with the same model used at write time, then matched against the index using cosine similarity. The top-K candidates are re-scored with a lightweight local reranker that weights recency, source module, and conversation scope.
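The two-stage shape of that pipeline — cosine top-K, then a cheap re-score — can be sketched as follows. The weights and the 30-day half-life below are assumptions for illustration, not Haze's tuned values:

```python
from dataclasses import dataclass
from math import exp, log

@dataclass
class Candidate:
    similarity: float        # cosine score from the first stage
    age_days: float          # recency signal
    same_module: bool        # written by the module now asking?
    same_conversation: bool  # scope signal

def rerank_score(c: Candidate) -> float:
    # Exponential recency decay with an assumed 30-day half-life.
    recency = exp(-c.age_days * log(2) / 30)
    score = 0.7 * c.similarity + 0.2 * recency
    if c.same_module:
        score += 0.05
    if c.same_conversation:
        score += 0.05
    return score

def rerank(candidates: list[Candidate]) -> list[Candidate]:
    return sorted(candidates, key=rerank_score, reverse=True)
```

Under weights like these, a fresh in-scope memory can outrank a year-old one with slightly higher raw similarity — which is the point of the second stage.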
The result is what you see in the phone mockup: a short list of memories, each with its date, source module, similarity score, and conversation tag. The reply from the model is always grounded in those visible memories — if a memory isn't in the list, it didn't influence the answer.
Memories live in the app's iOS sandbox, behind standard iOS data protection. The vector index is encrypted at rest; keys are held in the Secure Enclave and never leave the device. There is no sync, no backup to iCloud (unless you explicitly enable encrypted device backup in iOS Settings), and no cross-device mirror.
You can delete a single memory, clear a conversation's memory scope, or wipe the entire vector store from Haze's settings. Uninstalling Nimbus8 removes everything — memories, embeddings, and the index itself.
Haze ships a curated set of small, fast embedding models. All of them are open-weight, run on the Neural Engine or GPU, and keep a modest memory footprint:
You pick one per vector store at creation time. Switching models later means re-embedding the store, which Haze will do in the background on the next idle window.
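The background re-embed is a straightforward sweep. In this Python sketch, `embed` stands in for the on-device Core ML model call and all names are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class StoredMemory:
    excerpt: str
    embedding: list[float]
    model_id: str  # which embedding model produced the vector

def reembed(store: list[StoredMemory], model_id: str,
            embed: Callable[[str], list[float]]) -> None:
    # Re-embed only entries still carrying the old model's vectors, so an
    # interrupted sweep can resume from where it left off on the next idle window.
    for m in store:
        if m.model_id != model_id:
            m.embedding = embed(m.excerpt)
            m.model_id = model_id
```

Tracking the model ID per entry is what makes the switch safe: a query is only matched against vectors produced by the same model that embedded it.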
A network connection is needed only to download the embedding model once. After that, every write and every recall happens offline — airplane mode is fully supported.
The target on a current iPhone is tens of thousands of memories per store with sub-100 ms recall. The exact ceiling depends on device RAM and embedding dimension; Haze reports the current index size and expected query latency in settings.
No. Cross-chat and cross-module recall are both opt-in, and every lifted memory is visibly tagged in the recall panel. The default is that a conversation's memories stay with that conversation.
Yes. The index is encrypted at rest using a key held in the Secure Enclave, and it lives inside the app sandbox behind iOS data protection. Nothing is mirrored off-device.
Haze is in the exploring phase. It's being tuned against real device RAM budgets before shipping alongside the other modules — no dates yet. It will land when it's right, not when it's ready to demo.