Coinbase Halves AI Costs by Routing to Cheaper Models, No Engineer Caps

By Meridian48 News Desk · Summarised from DEV Community · June 30, 2026

Coinbase cut AI spend 50% by defaulting engineers to open-weight models GLM 5.2 and Kimi 2.7, improving cache hit rate from 5% to 60%, and routing tasks by complexity. CEO Brian Armstrong shared five levers, noting 91% of engineers never hit old usage limits. The shift signals enterprise budget pressure driving adoption of cheaper models, threatening OpenAI and Anthropic revenue.

Meridian48 take

The real story isn't cost-cutting but the enterprise shift to open-weight models—a trend that could reshape the AI market if quality holds.

Read the full reporting

Coinbase Cut Its AI Spend in Half Without Throttling Engineers - Here's the Playbook →

DEV Community

ai-cost-optimizationopen-weight-models

Coinbase Halves AI Costs by Routing to Cheaper Models, No Engineer Caps

Google shuts down Tenor GIF API, impacting X, Discord, and others

T-Mobile to Retire 1,100 Legacy Plans, Forcing Customer Migration

Inside the Shadow Market for Amazon Employee Access