// about

aiexplorer.dev

No corporate framing. Just a builder testing things and publishing results honestly.

What I do

By day, I am a Strategic AI Advisor and Enterprise Architect. I integrate modern AI into rigid legacy core systems in highly regulated industries, designing cloud-native solutions, API ecosystems, and the infrastructure required to make these models scale securely in production.

On the weekends, I run empirical experiments on LLMs and SLMs — adversarial tests, benchmarks, and compliance checks on open-source models to see where they actually break. I take published research papers and write the code to replicate their findings on smaller models, focusing on structured output failures, adversarial guardrails, context position bias, and compliance enforcement. Real benchmarks. Real limitations. No hype.

I also share the architectural reality of this work. I've spoken at the Kong API Summit (2024, 2025) about GenAI integration patterns and what it takes to build API-driven, agentic systems at scale.

Credentials

  • Post Graduate Program in AI & ML: Business Applications — McCombs School of Business, UT Austin (2024)
  • Google Cloud L400 Advanced
  • Kong API Summit speaker — 2024, 2025

Focus areas

  • RAG pipeline testing & compliance enforcement
  • Structured output benchmarking (1,500+ tests across 7 models)
  • Context position bias in small LLMs
  • Adversarial scenario testing (17 scenarios, 490 test cases)
  • NeMo Guardrails & Llama Guard comparison
  • Prompt injection defense

Models I test

  • Gemma-2B / 4B / 9B
  • Llama-3B / 7B / 8B
  • Latest Gemini models (Flash / Pro) — including video & audio via Gemini Live
  • Claude Opus / Sonnet / Haiku
  • Local inference on Apple Silicon via Ollama
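The local models above run behind Ollama's HTTP API, which listens on localhost:11434 by default. A minimal standard-library sketch of calling it (the model tag `gemma2:2b` is illustrative — use whatever you've pulled locally):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> dict:
    # stream=False returns a single JSON object instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama daemon with the model pulled):
# print(generate("gemma2:2b", "List three failure modes of small LLMs."))
```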

How I build & experiment

  • Applied research — translating academic arXiv papers into executable code and test harnesses
  • Agentic engineering — using Claude Code and Gemini Code Assist to build these research pipelines and orchestrate complex testing workflows
  • Hypothesis-driven — statistical validation for every test
  • Open reality — writing up what doesn't work, not just what does
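"Hypothesis-driven" in practice means running a significance test on pass/fail counts before claiming one model beats another. A minimal two-proportion z-test sketch — the counts are invented for illustration, not real benchmark results:

```python
import math

def two_proportion_z(pass_a: int, n_a: int, pass_b: int, n_b: int) -> float:
    """Z-statistic for H0: the two models have the same underlying pass rate."""
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)          # pooled pass rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical counts: model A passes 410/490 adversarial cases, model B 370/490.
z = two_proportion_z(410, 490, 370, 490)
print(round(z, 2))  # |z| > 1.96 rejects equal pass rates at the 5% level
```

A gap that looks large on a small test set can easily be noise; this is the cheapest way to check before writing it up.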