A useful clarification arrived in AI this week, and it came not from the labs’ announcements but from what their products are actually doing. AlphaEvolve, Google DeepMind’s Gemini-powered evolutionary coding agent, marked its first anniversary by publishing a detailed accounting of where it has gone since the paper: into production. Not pilot production, not demo production, but actual production. AlphaEvolve now optimizes Google’s TPU design pipeline and manages Spanner database parameters, where it has cut write amplification by 20 percent. Applied to DeepConsensus, a DNA sequencing error-correction model, it reduced variant detection errors by 30 percent. In power-grid optimization, it raised the share of cases in which a model finds a feasible solution from 14 percent to 88 percent. On Google’s Willow quantum processor, it found circuits with 10x lower error than human-optimized baselines. Commercial deployments are running too: Klarna doubled training speed on one of its larger transformer models; FM Logistic saved more than 15,000 kilometers of annual truck travel. A year ago AlphaEvolve was a research paper. Now it’s in the stack.

The same shift is playing out in the open-weight world. MiniMax M3, released June 1 by the Shanghai lab MiniMax, is described by the company as the first open-weight model to combine frontier-level coding performance, a one-million-token context window, and native multimodal capabilities (image, video, and desktop-computer operation), trained in from step zero rather than bolted on afterward. Independent benchmark verification is pending (model weights are expected around June 10), but MiniMax’s own agentic tests are specific enough to be testable: M3 independently reproduced an ICLR 2025 paper over twelve hours without human intervention, producing 18 commits and 23 figures. Separately, it spent twenty-four hours optimizing a GPU kernel, pushing Hopper hardware utilization from 7.6 percent to 71.3 percent and taking 147 attempts to reach its best solution. API pricing is $0.60 per million input tokens, a fraction of what the closed models it claims to rival charge.
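
To make the quoted input price concrete, here is a minimal sketch of the arithmetic. It uses only the $0.60-per-million-input-tokens figure above; output-token pricing is not given here, so it is left out, and the function name and the 50-request workload are hypothetical illustrations rather than anything MiniMax publishes.

```python
# Input price quoted in the article. Output-token pricing is not stated
# there, so this sketch covers input tokens only.
INPUT_USD_PER_MILLION_TOKENS = 0.60


def input_cost_usd(input_tokens: int,
                   usd_per_million: float = INPUT_USD_PER_MILLION_TOKENS) -> float:
    """Return the input-side cost in US dollars for a given token count."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    full_context = 1_000_000  # one prompt that fills the one-million-token window
    print(f"One full-context prompt: ${input_cost_usd(full_context):.2f}")
    # Hypothetical workload: 50 such prompts in a day.
    print(f"Fifty full-context prompts: ${input_cost_usd(50 * full_context):.2f}")
```

Under those assumptions, filling the entire context window once costs about sixty cents on the input side. Real bills would also depend on output tokens and on any caching or tiering the provider applies.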

Who’s benefiting from Copilot’s billing reset

GitHub’s switch to token-based Copilot billing on June 1 is still producing ripples. Developers using Copilot for heavy agentic workflows are reporting 10x to 50x cost increases; one Copilot Pro+ subscriber estimated they’d exhausted 8 percent of their monthly credit allotment in two hours. GitHub’s position is defensible — agentic coding loops burn compute that flat fees cannot cover — but developer frustration is real and is converting into action.
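
The subscriber’s estimate can be turned into a rough burn rate. The sketch below assumes usage continues at the same constant pace, which the report does not claim, and the function name is a hypothetical helper, not part of any GitHub tooling.

```python
def hours_until_exhausted(fraction_used: float, hours_elapsed: float) -> float:
    """Total hours of usage, at the observed pace, until 100% of the allotment is spent."""
    if not 0 < fraction_used <= 1:
        raise ValueError("fraction_used must be in (0, 1]")
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    # Constant-rate assumption: time scales inversely with the fraction consumed.
    return hours_elapsed / fraction_used


if __name__ == "__main__":
    # Figures from the report: 8 percent of monthly credits gone in two hours.
    total_hours = hours_until_exhausted(0.08, 2.0)
    print(f"Allotment lasts about {total_hours:.0f} hours of comparable work")
```

At that pace the month’s credits would last roughly 25 hours of comparable agentic work, or about three working days.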

The clearest beneficiary is OpenCode, a terminal-based coding agent built by the team behind Serverless Stack. This week it crossed 160,000 GitHub stars and 7.5 million monthly active developers, making it the most-adopted open-source coding agent ever built, and it got there without backing from Anthropic, Google, or Microsoft. It supports 75-plus AI providers (Claude, Gemini, GPT, local models via Ollama), integrates the Language Server Protocol (LSP) so that compiler errors feed back into the model context, and runs fully air-gapped for regulated industries. For developers who bring their own API keys, effective monthly cost is roughly $2 to $5. The Copilot pricing transition is functioning, in part, as the best marketing OpenCode has ever had.
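
For readers wondering what feeding compiler errors back into the model context looks like in practice, here is a minimal sketch. It is not OpenCode’s implementation: the function name, the sample file path, and the sample messages are hypothetical. The only parts taken from the Language Server Protocol itself are the shape of a diagnostic (a zero-based `range`, a numeric `severity`, a `message`) and the severity codes. Treating a missing severity as an error is an assumption of this sketch.

```python
from typing import Any

# Severity codes as defined by the Language Server Protocol.
SEVERITY_NAMES = {1: "error", 2: "warning", 3: "information", 4: "hint"}


def format_diagnostics(path: str,
                       diagnostics: list[dict[str, Any]],
                       max_items: int = 20) -> str:
    """Render LSP-style diagnostics as plain text that can be appended to a prompt."""
    # Most severe first (lower code is more severe), then by position in the file.
    # Assumption: a diagnostic with no severity is treated as an error.
    ordered = sorted(
        diagnostics,
        key=lambda d: (
            d.get("severity", 1),
            d["range"]["start"]["line"],
            d["range"]["start"]["character"],
        ),
    )
    lines = []
    for diag in ordered[:max_items]:  # cap the count to keep the prompt small
        start = diag["range"]["start"]
        severity = SEVERITY_NAMES.get(diag.get("severity", 1), "error")
        # LSP positions are zero-based; compilers usually print one-based.
        lines.append(
            f"{path}:{start['line'] + 1}:{start['character'] + 1}: "
            f"{severity}: {diag['message']}"
        )
    return "\n".join(lines)


# Hypothetical diagnostics, shaped like the entries a language server publishes.
sample = [
    {
        "range": {"start": {"line": 11, "character": 4},
                  "end": {"line": 11, "character": 7}},
        "severity": 2,
        "message": "unused variable 'tmp'",
    },
    {
        "range": {"start": {"line": 2, "character": 0},
                  "end": {"line": 2, "character": 9}},
        "severity": 1,
        "message": "cannot find name 'fetchUser'",
    },
]

if __name__ == "__main__":
    print(format_diagnostics("src/app.ts", sample))
```

The point of the design is that the model sees the same file, line, and message a developer would see in an editor, so its next edit can target the actual failure rather than guess at it.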

That dynamic — proprietary tools raising prices, open ecosystems filling the gap — is a reliable pattern in software. It played out in databases, operating systems, and cloud tooling. Whether AI coding agents develop the same gravitational pull that open-source databases eventually did depends on whether the model quality gap stays wide enough to keep proprietary tools sticky. OpenCode supports MiniMax M3, and MiniMax M3 is trying to match Claude Opus 4.7 on coding benchmarks. The Copilot billing shock is a test. The star counts suggest the answer is not yet settled.