How it works
Under the hood
NuraSafe is built to run entirely on your device. Here’s what powers it under the hood.
Emergency-mode retrieval (unique to crises)
When you activate an emergency mode, NuraSafe doesn’t answer from general chat memory alone. It switches to a dedicated retrieval pipeline: the bundled knowledge base is filtered to your scenario (fire, flood, radiation, etc.) plus general safety content, then hybrid search finds the best passages to inject into the prompt.
Because answers draw on curated safety content rather than on the model’s memory alone, they can be as well grounded as answers from frontier-class cloud assistants that rely on curated retrieval. The difference is that nothing leaves your phone: no API calls, no uploads, full offline operation.
For everyday chit-chat, retrieval is often skipped so the model isn’t distracted. In a crisis, retrieval turns on when your question needs NuraSafe’s offline knowledge—so you get domain-sharp answers, not generic filler.
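As a rough illustration of the scenario filter described above, here is a minimal Python sketch. The `Passage` type, the `scenario` tag, and the `"general"` label are hypothetical names chosen for this example; the real knowledge-base schema may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    title: str
    text: str
    scenario: str  # hypothetical tag such as "fire", "flood", or "general"


def filter_for_mode(passages, active_mode):
    """Keep passages for the active scenario plus general safety content."""
    allowed = {active_mode, "general"}
    return [p for p in passages if p.scenario in allowed]


# Toy knowledge base, invented for the example.
kb = [
    Passage("Evacuating a building", "Stay low under smoke.", "fire"),
    Passage("Moving to higher ground", "Avoid walking through moving water.", "flood"),
    Passage("Emergency kit basics", "Keep water, a torch, and a radio.", "general"),
]

fire_pool = filter_for_mode(kb, "fire")
print([p.title for p in fire_pool])
```

Hybrid search would then run only over `fire_pool`, so flood-specific passages cannot crowd out fire guidance.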
Tools & stack
- On-device LLM for chat and reasoning (no cloud required for core use).
- Core ML runs the multilingual E5 embedding model for semantic search.
- The knowledge base ships with the app as structured content and is indexed locally.
- Embeddings are cached after first setup so repeat launches stay fast; vector search uses an in-memory store for low-latency retrieval.
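The caching idea in the last bullet can be sketched as follows. This is an assumption-laden illustration, not the app's code: `fake_embed` is a toy stand-in for the E5 model, and the JSON cache file and content-hash key are hypothetical choices made to show how a cache can be reused until the knowledge content changes.

```python
import hashlib
import json
import os
import tempfile


def content_key(passages):
    """Hash the knowledge text so the cache is invalidated when content changes."""
    digest = hashlib.sha256()
    for text in passages:
        digest.update(text.encode("utf-8"))
        digest.update(b"\x00")  # separator so passage boundaries affect the hash
    return digest.hexdigest()


def fake_embed(text):
    """Toy stand-in for the embedding model (NOT E5): a deterministic 4-number vector."""
    raw = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in raw[:4]]


def load_or_build(passages, cache_path, embed=fake_embed):
    """Return (vectors, was_cached); re-embed only when the content hash differs."""
    key = content_key(passages)
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cached = json.load(f)
        if cached.get("key") == key:
            return cached["vectors"], True
    vectors = [embed(text) for text in passages]
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump({"key": key, "vectors": vectors}, f)
    return vectors, False


cache_file = os.path.join(tempfile.mkdtemp(), "embeddings.json")
texts = ["Stay low under smoke.", "Move to higher ground."]

first, hit_first = load_or_build(texts, cache_file)    # first setup: builds the cache
second, hit_second = load_or_build(texts, cache_file)  # repeat launch: reads the cache
third, hit_third = load_or_build(texts + ["New passage."], cache_file)  # content update: rebuilds
```

Once loaded, the vectors stay in an ordinary in-memory list, which mirrors the in-memory store mentioned above.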
Intelligent retrieval (RAG)
When you send a message, it doesn’t always hit the knowledge base. A first, lightweight model pass reads your current message and recent conversation and decides: does this need NuraSafe’s safety knowledge, or is it general conversation?
- If it’s general chat, retrieval is skipped so the answer model isn’t distracted by unrelated passages.
- If knowledge is needed, the same pass produces a short search phrase tuned to your question (and to any active emergency mode, when that’s on).
- That phrase is checked for drift: if it doesn’t align well with what you actually asked, the system falls back to your wording so search stays faithful—without a slow second round of query generation.
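The routing and drift check above might look something like the sketch below. The lightweight model pass is not reproduced here; `router_output` is a hypothetical dictionary standing in for its result. The word-overlap measure, the stopword list, and the `min_overlap` value are all assumptions picked for illustration, since the source does not say how alignment is measured.

```python
import re

# Small illustrative stopword list (an assumption, not the app's list).
STOPWORDS = {"a", "an", "the", "is", "in", "my", "i", "do", "what",
             "to", "of", "and", "if", "how", "should"}


def content_tokens(text):
    """Lowercased word set with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def overlap(query, message):
    """Fraction of the query's content words that also appear in the message."""
    query_words = content_tokens(query)
    if not query_words:
        return 0.0
    return len(query_words & content_tokens(message)) / len(query_words)


def plan_retrieval(message, router_output, min_overlap=0.5):
    """Return the search phrase to use, or None when retrieval is skipped.

    router_output is a hypothetical result of the first pass:
    {"needs_knowledge": bool, "query": str}.
    """
    if not router_output["needs_knowledge"]:
        return None  # general chat: skip retrieval
    query = router_output["query"]
    if overlap(query, message) < min_overlap:
        return message  # drift detected: fall back to the user's own wording
    return query


message = "What should I do if smoke fills my hallway?"

aligned = plan_retrieval(message, {"needs_knowledge": True, "query": "smoke hallway evacuation"})
drifted = plan_retrieval(message, {"needs_knowledge": True, "query": "earthquake aftershock checklist"})
chat = plan_retrieval("Thanks, that helps!", {"needs_knowledge": False, "query": ""})
```

The fallback is a plain comparison rather than another model call, which is what keeps it cheaper than a second round of query generation.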
Dual search: lexical + semantic
When retrieval runs, NuraSafe searches in two ways at once:
- Lexical (BM25) scoring over the knowledge text—strong for exact words and titles.
- Semantic (E5) similarity over 384-dimensional embeddings—strong for meaning and paraphrases.
Vectors are loaded into memory for search; they’re built from the bundled knowledge base when you first use the app (or when content updates), with on-disk caching so you don’t pay full embedding cost every time.
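To make the two signals concrete, here is a small Python sketch of BM25 scoring and cosine similarity. It uses a common BM25 formulation with typical default parameters (`k1=1.5`, `b=0.75`); the app's actual parameters, tokenizer, and similarity function are not stated in the source, and the example passages are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a common BM25 variant."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count documents containing each term
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "Flood safety: move to higher ground",
    "Fire safety: stay low under smoke",
    "Emergency kit basics",
]
lexical = bm25_scores("flood higher ground", docs)
best = max(range(len(docs)), key=lambda i: lexical[i])
```

In the real pipeline the vectors passed to a function like `cosine` would be the 384-dimensional E5 embeddings; short toy vectors work the same way.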
Ranking & when to trust semantics
Semantic matches below a similarity threshold are dropped so weak matches don’t pollute results. The system also watches whether embeddings are meaningfully spread out or squeezed together. When they are squeezed together, the semantic signal isn’t reliable, so NuraSafe leans on lexical search alone. When semantic scores are healthy, results are ranked with a 60% semantic / 40% lexical blend, and the top passages are passed to the main model to write the answer, offline, on your phone.
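The ranking rules can be sketched as below. Only the 60/40 blend comes from the description above. The threshold value (`min_similarity`), the spread measure (range of similarity scores), its cutoff (`min_spread`), and the max-based normalization of lexical scores are assumptions made so the example runs. The sketch also assumes that a passage whose semantic match is dropped can still rank on its lexical score.

```python
def rank(lexical, semantic, top_k=3, min_similarity=0.3, min_spread=0.05,
         w_semantic=0.6, w_lexical=0.4):
    """Rank passage ids from two score dicts (passage_id -> score)."""
    # Scale lexical scores to 0..1 so they are comparable to similarities (assumption).
    max_lex = max(lexical.values(), default=0.0)
    lex_norm = {pid: (s / max_lex if max_lex > 0 else 0.0) for pid, s in lexical.items()}

    sims = list(semantic.values())
    spread = (max(sims) - min(sims)) if sims else 0.0

    if spread < min_spread:
        # Similarities are squeezed together: treat semantics as unreliable, go lexical only.
        combined = dict(lex_norm)
    else:
        # Drop weak semantic matches, then blend 60% semantic / 40% lexical.
        kept = {pid: s for pid, s in semantic.items() if s >= min_similarity}
        ids = set(lex_norm) | set(kept)
        combined = {
            pid: w_semantic * kept.get(pid, 0.0) + w_lexical * lex_norm.get(pid, 0.0)
            for pid in ids
        }

    ordered = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return [pid for pid, score in ordered if score > 0][:top_k]


lexical = {"a": 4.0, "b": 2.0, "c": 0.0}

healthy = rank(lexical, {"a": 0.5, "b": 0.9, "c": 0.2})
squeezed = rank(lexical, {"a": 0.81, "b": 0.82, "c": 0.80})
```

In the healthy case, passage `b` overtakes `a` because its strong semantic match outweighs the weaker lexical score (0.74 versus 0.70 after blending). In the squeezed case the order follows lexical scores alone.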
Full architecture
For implementation detail—KnowledgeBase.json, index build, VectorStore, ChatEngine orchestration, prompt injection, and constants—see the in-depth reference.
Technical architecture