
Building a Fully Private RAG Chatbot — No Cloud, No Data Leakage
In an era where AI assistants are everywhere, most rely on cloud-hosted Large Language Models (LLMs) like OpenAI or Anthropic. For many businesses, this is a non-issue. But for organisations with sensitive internal knowledge — whether due to IP concerns, compliance requirements, or security mandates — sending data to a public LLM is simply not an option.
Recently, Versalence AI delivered a fully private Retrieval-Augmented Generation (RAG) chatbot for a client with one clear mandate:
“Build an AI chatbot that can answer from our internal knowledge base — but without sending a single byte of data to any cloud-based LLM.”
The Challenge
The client’s environment was governed by strict data policies. Even anonymised or processed content was prohibited from leaving their infrastructure. This meant:
- No OpenAI, Anthropic, or similar cloud APIs
- No hybrid hosting where prompts are routed to the internet
- No vector database hosted on third-party infrastructure
We needed to deliver enterprise-grade AI capability while keeping 100% of the data processing local.
The Solution — A Local-Only RAG Workflow
We designed and deployed a local LLM + RAG pipeline, entirely contained within the client’s own servers.
Architecture Overview (illustrative code sketches for each stage follow the list):
- Internal Knowledge Base Ingestion
  - Source: Client's internal KB and document repository
  - Format: Primarily PDFs
- Document Processing Pipeline
  - PDFs stored in a local directory
  - Gotenberg.dev used for PDF-to-text conversion
  - Custom PDF Processor to handle:
    - Content cleaning
    - Text chunking (optimised for LLM context size)
    - Embedding generation
- Local RAG Vector Store
  - All embeddings stored in a self-hosted vector database
  - No external API calls: all storage and retrieval remain on the client's network
- Local LLM Integration
  - Model deployed on-premise (LLM server inside the client's infrastructure)
  - Retrieval-Augmented Generation workflow pulls from the vector store
  - Generates responses with zero internet dependence
- Webchat Interface
  - Secure chatbot widget deployed on the client's website
  - All interactions routed through the internal network to the LLM
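To make the processing pipeline concrete, here is a minimal sketch of the extraction and chunking steps. The actual build converted PDFs via Gotenberg; to keep this example self-contained, the pypdf library stands in for that extraction step, and the chunk size, overlap, and directory path are illustrative assumptions rather than the client's values.

```python
from pathlib import Path
from pypdf import PdfReader

CHUNK_SIZE = 800     # characters per chunk -- illustrative, tune to the LLM's context window
CHUNK_OVERLAP = 100  # overlap keeps sentences from being cut at chunk boundaries

def extract_text(pdf_path: Path) -> str:
    """Extract raw text from a local PDF (stand-in for the Gotenberg conversion step)."""
    reader = PdfReader(str(pdf_path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def clean(text: str) -> str:
    """Basic content cleaning: collapse runs of whitespace."""
    return " ".join(text.split())

def chunk(text: str) -> list[str]:
    """Fixed-size character chunking with overlap."""
    chunks = []
    step = CHUNK_SIZE - CHUNK_OVERLAP
    for start in range(0, len(text), step):
        piece = text[start:start + CHUNK_SIZE]
        if piece.strip():
            chunks.append(piece)
    return chunks

# Walk the local PDF directory and produce cleaned chunks ready for embedding.
corpus = []
for pdf in Path("./knowledge_base").glob("*.pdf"):
    corpus.extend(chunk(clean(extract_text(pdf))))
```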
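Continuing from that sketch (and reusing its corpus list), embedding and storage can stay entirely on-network with a locally downloaded embedding model and an in-process index. The model name and the choice of FAISS are assumptions here; any self-hosted vector database fits the same pattern.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Local embedding model; runs on-premise with no API calls once the weights
# are downloaded. The model choice is illustrative, not the client build's.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Embed the chunks produced by the processing sketch above.
embeddings = np.asarray(
    model.encode(corpus, normalize_embeddings=True), dtype="float32"
)

# Self-hosted index: FAISS keeps all vectors in local memory, never off-network.
index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine on normalised vectors
index.add(embeddings)

def retrieve(query: str, k: int = 4) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = np.asarray(model.encode([query], normalize_embeddings=True), dtype="float32")
    _, ids = index.search(q, k)
    return [corpus[i] for i in ids[0]]
```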
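The generation step is then a single HTTP call to the on-premise LLM server. This sketch reuses retrieve() from above and assumes an Ollama server on its default local port with an illustrative model name; any local inference server with an HTTP API slots in the same way.

```python
import requests

# Address of the on-premise LLM server; requests never leave the internal
# network. Ollama and its default port are assumptions for this sketch.
LLM_URL = "http://localhost:11434/api/generate"

def answer(question: str) -> str:
    """RAG workflow: retrieve local context, then generate with the local model."""
    context = "\n\n".join(retrieve(question))
    prompt = (
        "Answer using only the internal documents below. "
        "If the answer is not in them, say so.\n\n"
        f"Documents:\n{context}\n\nQuestion: {question}"
    )
    resp = requests.post(
        LLM_URL,
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```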
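Finally, the webchat widget only needs an internal endpoint to talk to. A minimal Flask sketch, reusing answer() from the previous sketch; the route, host address, and port are placeholders for whatever the internal network dictates.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.post("/chat")
def chat():
    """Receive a message from the website widget and answer via the local RAG stack."""
    question = request.get_json(force=True).get("message", "")
    return jsonify({"reply": answer(question)})

if __name__ == "__main__":
    # Bind to an internal address only (placeholder shown), so widget traffic
    # stays on the LAN and never crosses the public internet.
    app.run(host="10.0.0.5", port=8080)
```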
The Result
The outcome was a precise, fast, and fully secure chatbot that operates without touching the public internet or any external AI provider.
Key Wins for the Client:
- 100% data privacy: No risk of leakage to cloud LLM providers
- Regulatory compliance: Meets internal and external audit requirements
- Operational control: Client owns the entire AI stack, from storage to inference
- High answer accuracy: RAG ensures answers come only from verified internal documents
The Trade-Offs
Building AI without the cloud is possible — but it’s not instant.
- Time-Intensive Build: Local deployment meant careful tuning of models, embeddings, and hardware.
- Hardware Requirements: The client had to provision capable servers for LLM inference.
- Ongoing Maintenance: Updates to documents require re-processing and re-embedding locally (an incremental approach is sketched below).
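That maintenance loop can be scripted so re-processing stays incremental: a checksum manifest identifies which PDFs changed, and only those are re-extracted and re-embedded. A minimal sketch; the manifest filename and directory are illustrative.

```python
import hashlib
import json
from pathlib import Path

MANIFEST = Path("kb_manifest.json")  # illustrative location for the checksum manifest

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

# Load last-seen checksums, then detect new or changed PDFs.
seen = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
to_reprocess = []
for pdf in Path("./knowledge_base").glob("*.pdf"):
    digest = sha256(pdf)
    if seen.get(pdf.name) != digest:
        to_reprocess.append(pdf)  # only these need re-extraction and re-embedding
        seen[pdf.name] = digest

# Persist the manifest so the next maintenance run starts from this state.
MANIFEST.write_text(json.dumps(seen, indent=2))
```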
Despite these trade-offs, the payoff was peace of mind: a zero-leak, compliant AI solution that functions as an internal knowledge expert.
At Versalence AI, we don’t force clients into “one-size-fits-all” AI. Whether it’s cloud-native for speed or fully local for compliance, we design solutions to fit your operational reality.
If your organisation needs an AI chatbot but can’t risk data leaving your network, talk to us. We’ve done it. And we can do it for you.