Private, air-gapped RAG system letting a compliance team query 4M+ regulatory documents with 92% accuracy — zero data leaks to public LLMs.
A fintech compliance team needed to answer policy questions against 4M+ regulatory PDFs — SEC filings, FINRA rules, internal policy docs — in real time. The catch: none of this data could touch a public LLM API. Any leak would be a regulatory violation.
They were spending 3–4 hours per query using keyword search + manual reading. At 20+ queries per day per analyst, that was 60–80 hours of analyst time wasted on lookups.
A fully self-hosted RAG pipeline running on Azure Private Endpoint infrastructure:
text-embedding-3-large running on Azure OpenAI (no data leaves the tenant)Everything runs inside the client’s Azure subscription:
Regulatory PDFs are full of data tables. Standard PDF text extraction flattens them into garbage. I wrote a custom extractor that detects table regions (via pdfminer layout boxes), reconstructs row/column structure, and serializes to markdown before chunking. This alone moved accuracy from 71% to 88% on table-heavy queries.
Cross-encoders are underused. The retrieval quality improvement from adding a fine-tuned cross-encoder re-ranker was larger than any prompt engineering change. If you’re building RAG and haven’t benchmarked a re-ranker, do it first.
30 minutes, free, no deck. We'll figure out if I'm the right fit for your project.