Builds

Interactive proofs, dashboards, and case studies that make a new AI capability obvious: what changed, what held up in practice, and the patterns worth borrowing.

Interactive ⚡ 3 hours

The Hot-Loop Benchmark

What this proves

In this benchmark, when both models were equally correct, the faster and more concise one kept the build loop moving. That advantage compounds with every iteration of the loop.

Head-to-head test of Gemma 4 E4B vs. Qwen 3 VL 8B on an M4 Pro with 24 GB of memory. Same correctness, 5x faster wall-clock time, 4x to 18x shorter answers. One of them became the default.
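The headline numbers are ratios over a shared set of prompts. The sketch below shows one way such a summary could be computed. It assumes a hypothetical per-prompt record (`Run`) and a hypothetical `compare` helper; it is not the benchmark's actual harness and contains none of its data. Timings and answers could come from wherever you measure them, for example the wall clock around each local model request.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical record: one model's answer to one benchmark prompt."""
    prompt_id: str
    correct: bool
    wall_seconds: float
    answer_chars: int


def compare(baseline: list[Run], candidate: list[Run]) -> dict:
    """Summarize `candidate` relative to `baseline` over the prompts both answered.

    Hypothetical helper: reports accuracy for each model, the aggregate
    wall-clock speedup, and the range of per-prompt answer-length ratios.
    """
    by_id = {r.prompt_id: r for r in candidate}
    pairs = [(a, by_id[a.prompt_id]) for a in baseline if a.prompt_id in by_id]
    if not pairs:
        raise ValueError("no prompts in common")

    # Accuracy over the shared prompts only, so both models see the same set.
    accuracy_a = sum(a.correct for a, _ in pairs) / len(pairs)
    accuracy_b = sum(b.correct for _, b in pairs) / len(pairs)

    # Aggregate speedup: total baseline time divided by total candidate time.
    total_a = sum(a.wall_seconds for a, _ in pairs)
    total_b = sum(b.wall_seconds for _, b in pairs)

    # Per-prompt "N times shorter" ratios; max(..., 1) avoids dividing by zero.
    shorter = [a.answer_chars / max(b.answer_chars, 1) for a, b in pairs]

    return {
        "accuracy_a": accuracy_a,
        "accuracy_b": accuracy_b,
        "speedup": total_a / total_b,
        "shorter_min": min(shorter),
        "shorter_max": max(shorter),
    }
```

Summing wall-clock time before dividing gives one aggregate speedup, so long prompts weigh more than short ones. Answer length is reported as a per-prompt range instead, which is the shape of a "4x to 18x shorter" claim.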

gemma-4, qwen-3, ollama
