NuraSafe

How it works

Under the hood

NuraSafe is built to run entirely on your device. Here’s what powers it under the hood.

Tools & stack

  • On-device LLM for chat and reasoning (no cloud required for core use).
  • Core ML runs the multilingual E5 embedding model for semantic search.
  • The knowledge base ships with the app as structured content and is indexed locally.
  • Embeddings are cached after first setup so repeat launches stay fast; vector search uses an in-memory store for low-latency retrieval.
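The cached-embeddings-plus-in-memory-search idea can be sketched in a few lines. This is a hypothetical illustration in Python, not NuraSafe's actual Swift/Core ML code; the class and function names (`VectorStore`, `load_or_build`) are invented for the example.

```python
import json
import math

class VectorStore:
    """Minimal in-memory vector store: brute-force cosine similarity
    over cached embeddings. Illustrative only."""

    def __init__(self):
        self.entries = []  # (doc_id, embedding) pairs held in memory

    def add(self, doc_id, embedding):
        self.entries.append((doc_id, embedding))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def search(self, query_embedding, k=3):
        # Score every cached entry and return the k best matches.
        scored = [(self._cosine(query_embedding, e), d) for d, e in self.entries]
        scored.sort(reverse=True)
        return scored[:k]

def load_or_build(cache_path, build_fn):
    """Compute embeddings once on first setup, then reload the cache
    on later launches so startup stays fast."""
    try:
        with open(cache_path) as f:
            return json.load(f)
    except FileNotFoundError:
        data = build_fn()  # expensive: run the embedding model
        with open(cache_path, "w") as f:
            json.dump(data, f)
        return data
```

Because everything lives in memory and on local disk, lookups involve no network round-trip, which is what keeps retrieval low-latency and fully offline.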

Intelligent retrieval (RAG)

When you send a message, it doesn’t always hit the knowledge base. A lightweight first model pass reads your current message and the recent conversation and decides: does this need NuraSafe’s safety knowledge, or is it general conversation?

  • If it’s general chat, retrieval is skipped so the answer model isn’t distracted by unrelated passages.
  • If knowledge is needed, the same pass produces a short search phrase tuned to your question (and to any emergency mode that’s active).
  • That phrase is checked for drift: if it doesn’t align well with what you actually asked, the system falls back to your wording so search stays faithful—without a slow second round of query generation.
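The routing and drift-check steps above could look something like the following. This is a hedged sketch, not NuraSafe's implementation: `classify_fn` stands in for the single lightweight model pass (returning both the routing decision and the search phrase), `embed_fn` for the embedding model, and the 0.5 drift threshold is an invented placeholder.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def route_and_query(message, history, classify_fn, embed_fn, drift_threshold=0.5):
    """One model pass decides routing AND drafts the search phrase."""
    needs_knowledge, phrase = classify_fn(message, history)
    if not needs_knowledge:
        return None  # general chat: skip retrieval entirely

    # Drift check: if the generated phrase strays too far from the user's
    # own wording, fall back to that wording instead of paying for a
    # second round of query generation.
    if cosine(embed_fn(phrase), embed_fn(message)) >= drift_threshold:
        return phrase
    return message
```

The key design point is that the drift check is a cheap embedding comparison, so a bad generated query costs one similarity computation rather than another full model call.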

Ranking & when to trust semantics

Semantic matches below a similarity threshold are dropped so weak matches don’t pollute results. The system also watches whether embeddings are meaningfully spread out or squeezed together (a sign the semantic signal isn’t reliable). In that case, NuraSafe leans on lexical search alone. When semantic scores are healthy, results are ranked with a 60% semantic / 40% lexical blend, then the top passages are passed to the main model to write the answer—offline, on your phone.
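Here is one way the filtering, spread check, and 60/40 blend could fit together. Again a hypothetical Python sketch: the threshold values and the use of standard deviation as the "squeezed together" test are assumptions for illustration, not NuraSafe's actual constants.

```python
import statistics

def rank(candidates, sem_threshold=0.3, spread_floor=0.02,
         sem_weight=0.6, lex_weight=0.4, k=3):
    """candidates: list of (doc_id, semantic_score, lexical_score),
    with both scores normalized to [0, 1]."""
    # Drop weak semantic matches so they can't pollute the results.
    kept = [c for c in candidates if c[1] >= sem_threshold]

    # If the surviving semantic scores are squeezed together, the
    # semantic signal isn't discriminating: lean on lexical search alone.
    sems = [c[1] for c in kept]
    if len(sems) < 2 or statistics.pstdev(sems) < spread_floor:
        ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    else:
        # Healthy semantic scores: 60% semantic / 40% lexical blend.
        ranked = sorted(kept,
                        key=lambda c: sem_weight * c[1] + lex_weight * c[2],
                        reverse=True)
    return [c[0] for c in ranked[:k]]
```

Whatever the exact test for score spread, the fallback matters: when an embedding model returns near-identical similarities for everything, blending would just add noise, so lexical ranking alone is the safer signal.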

Full architecture

For implementation detail—KnowledgeBase.json, index build, VectorStore, ChatEngine orchestration, prompt injection, and constants—see the in-depth reference.

Technical architecture