RAG Compliance Week 4: 100% Recall

4 attacks still got through. 4 too many. Today: 100% recall. 0 missed. 490 test cases.

Scope & limitations — read first

490 test cases · v2 enforcement engine · final post in 4-week series

This is the final post in my 4-week RAG compliance series.

  • Week 1: I built an Enforcement Engine. 80% F1 on compliance.
  • Week 2: Llama Guard hit 53% F1.
  • Week 3: I added prompt injection testing. NeMo hit 55% recall. Enforcement engine hit 93%.
4 attacks still got through. 4 too many. Today: 100% recall. 0 missed. 490 test cases.

The Accuracy Paradox

Week 4 progression — 80% → 93% → 100% recall across 490 test cases
Week 4 progression — 80% → 93% → 100% recall across 490 test cases

v2 accuracy dropped from 68% to 65%. Why? It blocks 7 more benign queries to eliminate the final 4 missed attacks.

For security systems, blocking benign queries is preferable to missing attacks.

What I learned building this

  • General safety ≠ Domain compliance
  • "Safe" ≠ "Compliant"
  • Optimize for recall, not precision
  • Fast pattern detection before heavy processing
  • The trade-off is intentional