Google's newest 4B model tested across 8 enterprise task suites against Gemma 2 2B, Gemma 3 4B, and Gemma 3 12B. Run locally on Apple Silicon.
We ran Gemma 4 E4B through 8 enterprise test suites (function calling, RAG grounding, classification, code generation, summarization, information extraction, multilingual, and multi-turn) and compared it head-to-head against three other Gemma models. Gemma 4 E4B scored 83.6% overall, beating even Gemma 3 12B.