Interactive | Flagship proof | ⚡ 3 hours
The Hot-Loop Benchmark
What this proves
In this benchmark, when both models were equally correct, the faster and more concise one kept the build loop moving. That compounds.
A head-to-head test of Gemma 4 E4B and Qwen 3 VL 8B on an M4 Pro with 24 GB. Correctness was the same, but one model ran 5x faster in wall-clock time and gave answers 4x to 18x shorter. One of them became the default.
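The summary above does not show the harness itself, so here is a minimal sketch of how the two headline numbers (wall-clock speedup and answer-length ratio) could be computed. Everything in it is an assumption for illustration: the names `measure` and `compare` are hypothetical, `generate` stands in for whatever function sends a prompt to a model and returns its answer, and answer length is counted in characters, which the source does not specify.

```python
import time
from typing import Callable, Sequence


def measure(generate: Callable[[str], str], prompts: Sequence[str]) -> dict:
    """Run each prompt once; record total wall-clock seconds and answer lengths.

    `generate` is a hypothetical stand-in for a model call (prompt -> answer).
    Lengths are counted in characters, which is an assumption of this sketch.
    """
    lengths = []
    start = time.perf_counter()
    for prompt in prompts:
        answer = generate(prompt)
        lengths.append(len(answer))
    elapsed = time.perf_counter() - start
    return {"seconds": elapsed, "lengths": lengths}


def compare(baseline: dict, candidate: dict) -> dict:
    """Ratios above 1 mean the candidate is faster or shorter than the baseline."""
    # Per-prompt length ratios; skip prompts where the candidate answer is empty.
    ratios = [
        b / c
        for b, c in zip(baseline["lengths"], candidate["lengths"])
        if c > 0
    ]
    if not ratios:
        raise ValueError("no non-empty candidate answers to compare")
    if candidate["seconds"] <= 0:
        raise ValueError("candidate wall-clock time must be positive")
    return {
        "speedup": baseline["seconds"] / candidate["seconds"],
        "min_length_ratio": min(ratios),
        "max_length_ratio": max(ratios),
    }
```

Under these assumptions, a result such as "5x faster, 4x to 18x shorter" would correspond to `speedup` of 5 with `min_length_ratio` of 4 and `max_length_ratio` of 18. Timing the whole prompt set in one pass keeps the measurement close to what a build loop actually feels, rather than isolating per-token throughput.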
gemma-4 qwen-3 ollama