Builds

Interactive proofs, dashboards, and case studies that make a new AI capability obvious: what changed, what held up in practice, and the patterns worth borrowing.

Interactive ⚡ 1 session

Agent Setup Scorecard

What this proves

A pass rate only helps when you know which agent setup produced it and what it fails on. The scorecard turns a fuzzy model choice into a named setup pattern, a failure row, and the next fix.

Compare three named agent setup patterns against the same dummy task suite, then read which setup to use and what still needs fixing before it touches a real workflow.
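To make the shape of such a scorecard concrete, here is a minimal Python sketch. It is not the build's actual implementation: the setup names, task names, pass/fail values, and fix suggestions are all hypothetical placeholders, and the source does not specify how rows are scored or ordered. It only illustrates how one setup's results over a shared task suite can become a pass rate, a failure row, and a next fix.

```python
from dataclasses import dataclass


@dataclass
class ScorecardRow:
    setup: str               # name of the agent setup pattern
    pass_rate: float         # fraction of tasks passed, 0.0 to 1.0
    failed_tasks: list[str]  # the failure row: which tasks this setup broke on
    next_fix: str            # suggested fix for the first failing task


def score_setup(setup: str, results: dict[str, bool],
                fixes: dict[str, str]) -> ScorecardRow:
    """Turn one setup's per-task pass/fail results into a scorecard row."""
    if not results:
        raise ValueError("results must contain at least one task")
    failed = [task for task, passed in results.items() if not passed]
    pass_rate = (len(results) - len(failed)) / len(results)
    # Assumption: the "next fix" is looked up by the first failing task.
    next_fix = fixes.get(failed[0], "investigate manually") if failed else "none"
    return ScorecardRow(setup, pass_rate, failed, next_fix)


def build_scorecard(runs: dict[str, dict[str, bool]],
                    fixes: dict[str, str]) -> list[ScorecardRow]:
    """Score every setup against the same task suite, best pass rate first."""
    rows = [score_setup(name, results, fixes) for name, results in runs.items()]
    return sorted(rows, key=lambda row: row.pass_rate, reverse=True)


# Hypothetical dummy data: three setups run against the same four dummy tasks.
DUMMY_RUNS = {
    "single-prompt": {"parse_form": True, "book_slot": False,
                      "summarize": True, "refund": False},
    "tool-calling": {"parse_form": True, "book_slot": True,
                     "summarize": True, "refund": False},
    "planner-executor": {"parse_form": True, "book_slot": True,
                         "summarize": False, "refund": False},
}

# Hypothetical mapping from a failing task to a suggested fix.
DUMMY_FIXES = {
    "book_slot": "add a calendar tool",
    "summarize": "shorten the planner's context",
    "refund": "add a policy lookup step",
}

if __name__ == "__main__":
    for row in build_scorecard(DUMMY_RUNS, DUMMY_FIXES):
        print(f"{row.setup:<18} {row.pass_rate:>5.0%}  "
              f"failed={row.failed_tasks}  next={row.next_fix}")
```

Keeping the failed tasks and the next fix in the same row as the pass rate is the point of the design: a single number never travels without the setup that produced it and the failure it hides.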

google-io-2026 · agent-evals · gemini
