// projects

Built & tested.

Every project here backs a blog post with real data. Enforcement engines, voice AI, interview prep, and model benchmarks. Source code available on request.

01 · LLM Policy Enforcement Engine

Enforcement Engine

A 4-tier verification pipeline that enforces domain-specific KB rules on LLM outputs and blocks prompt injection attacks with 100% recall across 185 attack test cases.

100% · Attack recall
490 · Test cases
82 · KB rules

Results & architecture

Key features

Tier 0: XGBoost injection detection on CPU (~0.5ms)

Tier 1: Go sentinel for regex + obfuscation detection

Tier 2: Semantic routing with activation steering

Tier 3: LLM generation with NLI verification

42-45% better recall than Llama Guard 3 and NeMo Guardrails

Zero false positives on benign queries
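The tiers are ordered by cost: cheap checks run first and block early, so the expensive LLM + NLI stage only sees traffic that survived the fast filters. A minimal sketch of that ordering, with trivial stand-in predicates in place of the real XGBoost classifier, Go sentinel, semantic router, and NLI verifier (all function names and trigger strings here are illustrative, not the production rules):

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    allowed: bool
    tier: int      # which tier produced the decision
    reason: str

def tier0_fast_classifier(prompt: str) -> bool:
    # Stand-in for the ~0.5 ms CPU injection classifier (XGBoost in the real system).
    return "ignore previous instructions" in prompt.lower()

def tier1_pattern_sentinel(prompt: str) -> bool:
    # Stand-in for the Go regex + obfuscation sentinel.
    return any(tok in prompt.lower() for tok in ("base64:", "system prompt"))

def enforce(prompt: str) -> Verdict:
    """Run tiers in order of cost; block at the first tier that fires."""
    if tier0_fast_classifier(prompt):
        return Verdict(False, 0, "injection classifier")
    if tier1_pattern_sentinel(prompt):
        return Verdict(False, 1, "pattern sentinel")
    # Tiers 2-3 (semantic routing, then generation with NLI verification)
    # would run here before the output is released.
    return Verdict(True, 3, "passed all tiers")
```

The early-exit structure is what keeps median latency low: benign queries pay only the sub-millisecond tiers.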

Tech stack

Python · FastAPI · Go · XGBoost · DeBERTa NLI · gRPC · Sentence Transformers
Private repo · Request access →
02 · AI Legal Analysis with Real-Time Voice

Contract Paranoia

Real-time voice-based contract analysis using Google Gemini Live API and Agent Development Kit. Users talk to "Para," an AI legal buddy that flags risky clauses with search-grounded citations.

~1.6s · Latency
3 · Risk levels
Full-duplex · Voice

Results & architecture

Key features

Bidirectional voice with interruption support

Multi-agent: root agent + analyzer sub-agent

RED / YELLOW / GREEN clause risk flagging

Google Search grounding prevents hallucinated legal advice

Judge Agent audits quality with 8-point rubric

Session persistence with conversation recovery on drops
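The RED / YELLOW / GREEN flagging boils down to mapping each clause to a risk level plus supporting citations. A minimal sketch of that output shape, using keyword heuristics as a stand-in for the analyzer sub-agent (the term lists and field names here are hypothetical, not the production taxonomy):

```python
from dataclasses import dataclass, field

# Illustrative trigger terms only; the real analyzer is an LLM sub-agent.
RED_TERMS = ("indemnify", "unlimited liability", "waive")
YELLOW_TERMS = ("auto-renew", "arbitration", "exclusivity")

@dataclass
class ClauseFlag:
    clause: str
    risk: str                                   # "RED" | "YELLOW" | "GREEN"
    citations: list = field(default_factory=list)  # search-grounded sources in the real system

def flag_clause(clause: str) -> ClauseFlag:
    text = clause.lower()
    if any(t in text for t in RED_TERMS):
        return ClauseFlag(clause, "RED")
    if any(t in text for t in YELLOW_TERMS):
        return ClauseFlag(clause, "YELLOW")
    return ClauseFlag(clause, "GREEN")
```

Keeping the citations on the flag object is what lets the voice agent back every warning with a grounded source instead of free-form legal opinion.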

Tech stack

React · TypeScript · FastAPI · Gemini Live API · Google ADK · WebSocket · Docker · Cloud Run
Private repo · Request access →
03 · AI Mock Interview Platform

PrepVoice

Full-stack AI interview prep platform with real-time voice interaction. Analyzes job descriptions and resumes, conducts adaptive mock interviews, and tracks readiness progression.

5 · Scoring dims
3 · LLM backends
6+ · Domains

Key features

Real-time voice interviews with follow-up questions

Multi-dimensional scoring (technical, communication, depth, JD relevance, STAR)

Gap analysis between JD requirements and resume

Body language feedback via MediaPipe

Level-aware questions from junior to director

Session replay with full transcripts and scores

Tech stack

Next.js 14 · TypeScript · FastAPI · PostgreSQL · Ollama · Claude · Gemini · Web Speech API
Private repo · Request access →
04 · 1,500+ Tests on Small LLM JSON Generation

Structured Output JSON

Rigorous test harness showing that small LLMs fail at JSON generation in silent, dangerous ways. A well-instructed 2B model jumped from 30% to 90% compliance, outperforming 7B models run at default settings.

1,500+ · Tests
7 · Models
8 · KB rules

Results & architecture

Key features

Multi-backend: HuggingFace Transformers + Ollama

Detects parse failures, hallucinated fields, type mismatches, silent failures

Ablation study showing 1 rule held accuracy together

4 progressively complex JSON schemas tested

JSON Mode degraded 2 of 3 models tested

95% confidence intervals with proper statistics
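The failure modes the harness tracks can be bucketed with a small classifier over raw model output. A minimal sketch of that bucketing against a toy schema, using only the standard library (the `EXPECTED` schema and bucket names here are illustrative, not the harness's actual taxonomy):

```python
import json

# Hypothetical one-level schema: field name -> required Python type.
EXPECTED = {"name": str, "age": int}

def classify_output(raw: str) -> str:
    """Bucket a model's raw output into the failure modes the harness tracks."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return "parse_failure"
    if not isinstance(obj, dict):
        return "type_mismatch"
    if set(obj) - set(EXPECTED):
        return "hallucinated_fields"
    for key, typ in EXPECTED.items():
        if key not in obj:
            return "missing_fields"
        # Caveat: isinstance(True, int) is True in Python, so a boolean
        # slipping into an int field passes this check -- exactly the kind
        # of silent failure a real harness has to special-case.
        if not isinstance(obj[key], typ):
            return "type_mismatch"
    return "ok"
```

Counting buckets per model and prompt variant is what turns raw generations into the compliance percentages reported above.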

Tech stack

Python · PyTorch · HuggingFace · Ollama · Matplotlib · JSON Schema
Private repo · Request access →
05 · Context Position Bias in Small LLMs

Lost in the Middle

Empirical study testing whether the "Lost in the Middle" phenomenon from GPT-scale papers applies to 2-4B models. Each architecture shows distinct position-handling behavior — the classic U-curve does not appear.

~500/model · Trials
7 · Positions
3 · Models

Key features

Gemma-2B: strong recency bias (p=0.023)

Llama-3B: completely flat — no position effect (p=1.0)

Gemma-4B: weak middle dip, not statistically significant

7 hard semantic distractors per QA pair

70-100 document contexts (~7-10K tokens)

Replication of Liu et al., 2023 on smaller models
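The experimental core of a study like this is simple: fix the gold document at one of the 7 positions among shuffled distractors, run the trial, and tally accuracy per position. A minimal sketch of those two steps, with the model call abstracted away (function names here are illustrative, not the study's codebase):

```python
import random
from collections import defaultdict

def build_context(gold: str, distractors: list[str], position: int) -> list[str]:
    """Place the gold document at a fixed index among shuffled distractors."""
    docs = distractors[:]       # hard semantic distractors in the real study
    random.shuffle(docs)        # randomize everything except the gold slot
    docs.insert(position, gold)
    return docs

def accuracy_by_position(results: list[tuple[int, bool]]) -> dict[int, float]:
    """Tally per-position accuracy from (gold position, answered correctly) trials."""
    hits, totals = defaultdict(int), defaultdict(int)
    for pos, correct in results:
        totals[pos] += 1
        hits[pos] += correct
    return {pos: hits[pos] / totals[pos] for pos in totals}
```

A U-curve would show up here as high accuracy at the smallest and largest positions with a dip in the middle; the flat Llama-3B result means this dict comes out roughly constant across positions.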

Tech stack

Python · PyTorch · HuggingFace · SciPy · Matplotlib
Private repo · Request access →