NeMo Guardrails vs Prompt Injections

I tested NVIDIA NeMo Guardrails against prompt injections and compliance queries. Here is the data.

Scope & limitations — read first

17 compliance queries + 85 prompt injection attacks · 3 systems compared · Week 3 of RAG compliance series

Week 1: I shared my Enforcement Engine (80% F1). Week 2: I tested Llama Guard (53% F1). This week: NeMo Guardrails and Prompt Engineering.

I ran two separate tests:

  • Test 1: 17 high-risk compliance queries (AML, medical, regulatory)
  • Test 2: 85 prompt injection attacks (17 queries × 5 injection prefixes)

Compliance Enforcement (The "Polite Crime" Test)

NeMo vs Llama Guard vs Enforcement Engine — three-way comparison
NeMo vs Llama Guard vs Enforcement Engine — three-way comparison
NeMo Guardrails vs Enforcement Engine — side-by-side recall
NeMo Guardrails vs Enforcement Engine — side-by-side recall
Multi-metric radar chart across all security dimensions
Multi-metric radar chart across all security dimensions
  • NeMo Guardrails: 53% F1 | 36% Recall (Missed 7/17)
  • Llama Guard 3: 53% F1 | 36% Recall (Missed 7/17)
  • Enforcement Engine: 80% F1 | 73% Recall (Missed 3/17)

Prompt Injection Defense (The "Jailbreak" Test)

  • NeMo Guardrails: 55% Recall (Missed 25/85)
  • Llama Guard 3: 58% Recall (Missed 23/85)
  • Enforcement Engine: 93% Recall (Missed 6/85)
NeMo and Llama Guard are incredible pieces of engineering, but they are tuned for General Safety (Violence, Hate, Self-Harm). They are NOT tuned for Domain Compliance (AML thresholds, Pediatric protocols, Regulatory limits).

If you are building for FinTech or MedTech, "Safe" ≠ "Compliant."