How it works
Under the hood
NuraSafe is built to run entirely on your device. Here’s what powers it.
Tools & stack
- On-device LLM for chat and reasoning (no cloud required for core use).
- Core ML runs the multilingual E5 embedding model for semantic search.
- The knowledge base ships with the app as structured content and is indexed locally.
- Embeddings are cached after first setup so repeat launches stay fast; vector search uses an in-memory store for low-latency retrieval.
Intelligent retrieval (RAG)
When you send a message, NuraSafe doesn’t always query the knowledge base. A first, lightweight model pass reads your current message and the recent conversation and decides: does this need NuraSafe’s safety knowledge, or is it general conversation?
- If it’s general chat, retrieval is skipped so the answer model isn’t distracted by unrelated passages.
- If knowledge is needed, the same pass produces a short search phrase tuned to your question (and to any active emergency mode, when that’s on).
- That phrase is checked for drift: if it doesn’t align well with what you actually asked, the system falls back to your wording so search stays faithful—without a slow second round of query generation.
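The routing-and-drift step above can be sketched roughly as follows. This is an illustrative Python sketch, not NuraSafe’s actual (on-device, model-driven) implementation: the routing decision is shown as a simple keyword heuristic, and drift is approximated by token overlap between the generated phrase and your wording. All names and thresholds here are assumptions.

```python
def needs_knowledge(message: str) -> bool:
    # Stand-in for the lightweight model's routing decision; in the app this
    # is a model pass, here a keyword heuristic keeps the sketch runnable.
    safety_terms = {"bleeding", "burn", "cpr", "poison", "fracture"}
    return any(term in message.lower() for term in safety_terms)

def token_overlap(phrase: str, message: str) -> float:
    """Fraction of the generated phrase's tokens that appear in the message."""
    tp, tm = set(phrase.lower().split()), set(message.lower().split())
    return len(tp & tm) / len(tp) if tp else 0.0

def choose_search_phrase(user_message: str, generated_phrase: str,
                         drift_threshold: float = 0.5) -> str:
    # Drift check: if the model's phrase shares too little with the user's
    # own wording, fall back to the message itself -- no second generation
    # round is needed.
    if token_overlap(generated_phrase, user_message) < drift_threshold:
        return user_message
    return generated_phrase
```

The key design point survives the simplification: the drift check is a cheap comparison, so a misaligned search phrase costs a fallback, not another model call.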
Dual search: lexical + semantic
When retrieval runs, NuraSafe searches in two ways at once:
- Lexical (BM25) scoring over the knowledge text—strong for exact words and titles.
- Semantic (E5) similarity over 384-dimensional embeddings—strong for meaning and paraphrases.
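The two scorers can be sketched in a few lines each. This is a minimal, self-contained Python illustration of BM25 and cosine similarity, not the app’s Core ML / Swift code; the parameters (k1, b) use common textbook defaults, which may differ from NuraSafe’s constants.

```python
import math

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Minimal BM25: score each tokenized doc against the query terms."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            tf = d.count(t)
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def cosine(u, v):
    """Cosine similarity between two embedding vectors (e.g. 384-dim E5)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Running both over the same passages gives two independent score lists per query, which is what the ranking stage blends.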
Vectors are loaded into memory for search; they’re built from the bundled knowledge base when you first use the app (or when content updates), with on-disk caching so you don’t pay full embedding cost every time.
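A content-hash cache is one way to get that “don’t pay full embedding cost every time” behavior. The sketch below is an assumption about the general technique, not NuraSafe’s actual cache format: keying by a hash of each chunk’s text means a content update re-embeds only the chunks that changed.

```python
import hashlib
import json
import os

def cached_embeddings(chunks, embed_fn, cache_path="embeddings.json"):
    """Embed each text chunk once; reuse cached vectors on later launches.

    Keys are SHA-256 hashes of the chunk text, so edited chunks miss the
    cache and get re-embedded while unchanged chunks are loaded from disk.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    vectors = []
    for text in chunks:
        key = hashlib.sha256(text.encode()).hexdigest()
        if key not in cache:
            cache[key] = embed_fn(text)  # the expensive model call
        vectors.append(cache[key])
    with open(cache_path, "w") as f:
        json.dump(cache, f)
    return vectors
```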
Ranking & when to trust semantics
Semantic matches below a similarity threshold are dropped so weak matches don’t pollute results. The system also watches whether similarity scores are meaningfully spread out or squeezed together (squeezed scores are a sign the semantic signal isn’t discriminating); in that case, NuraSafe leans on lexical search alone. When semantic scores are healthy, results are ranked with a 60% semantic / 40% lexical blend, then the top passages are passed to the main model to write the answer—offline, on your phone.
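Putting the ranking rules together, a sketch of the blend-or-fall-back logic might look like this. The thresholds (similarity floor, spread floor) are illustrative placeholders, not NuraSafe’s real constants, and the spread test here is a simple standard deviation over the semantic scores.

```python
import statistics

def rank(passages, sem_scores, lex_scores,
         sem_floor=0.3, spread_floor=0.02, sem_weight=0.6):
    """Rank passages by a 60/40 semantic/lexical blend, unless the
    semantic scores are squeezed together, then rank lexically alone."""
    # Drop weak semantic matches below the similarity floor.
    sem = [s if s >= sem_floor else 0.0 for s in sem_scores]

    def norm(xs):  # scale scores to [0, 1] so the blend is comparable
        hi = max(xs) or 1.0
        return [x / hi for x in xs]

    # Squeezed-together scores mean the embedding signal can't
    # discriminate between passages -- fall back to lexical only.
    if statistics.pstdev(sem_scores) < spread_floor:
        blended = norm(lex_scores)
    else:
        blended = [sem_weight * s + (1 - sem_weight) * l
                   for s, l in zip(norm(sem), norm(lex_scores))]
    order = sorted(range(len(passages)), key=lambda i: blended[i], reverse=True)
    return [passages[i] for i in order]
```

Normalizing each score list before blending matters: BM25 and cosine similarity live on different scales, so mixing them raw would let one signal silently dominate the 60/40 weighting.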
Full architecture
For implementation detail—KnowledgeBase.json, the index build, VectorStore, ChatEngine orchestration, prompt injection, and constants—see the in-depth Technical architecture reference.