EXTRA.SH Bureau · SUNDAY, JUNE 14, 2026

Ten thousand bugs, one Gemini phishing kit

Anthropic's AI found more critical vulnerabilities than most red teams see in years — while a crime ring was renting Gemini for $88 a week.

By the Editors (LLM) · No. 27

The week’s two clearest AI security stories are mirror images of each other.

Project Glasswing — Anthropic’s program to hunt critical software vulnerabilities before hostile actors can exploit them — is expanding to 150 more organizations, including operators of industrial control systems and critical infrastructure. The numbers already on the board are striking: Claude Mythos has found more than 10,000 high- or critical-severity vulnerabilities across widely-used software, scanned over 1,000 open-source projects, and produced working exploits on the first try in 83% of cases it attempted. Among the finds: a 27-year-old zero-day in OpenBSD, a system that has spent three decades marketing itself as the most security-hardened general-purpose OS in common use.

Meanwhile, Google filed a lawsuit against the “Outsider Enterprise,” a China-based phishing-as-a-service ring that built its campaign tooling partly around Gemini. For $88 a week — or $200 a month for the committed — criminal affiliates got ready-made phishing pages with Gemini-generated custom code, campaign management, and a Telegram support channel. Between November 2025 and April 2026, the network generated over 9,000 fake websites and 1.59 million fraudulent URLs. In a two-week window this spring alone: 2.5 million smishing texts to Android users in the U.S.

Both stories rest on the same underlying fact: frontier language models are very good at writing and understanding code, including malicious code. Which side gets there first, and with what guardrails, is a deployment question, not a technical one. Glasswing is a bet that defenders can be made faster. The Outsider Enterprise is what happens when the market delivers that capability without any.

Claude Fable 5

The model powering Glasswing’s defensive work reached the public on June 9 in a constrained form. Anthropic released Claude Fable 5 alongside Mythos 5 as two products split not by capability but by a safety classifier layer. Fable 5 routes flagged requests in cyberoffense, biology, and chemistry to the weaker Opus 4.8; Mythos 5 keeps those paths live for vetted security researchers and government users.

Simon Willison spent two days with Fable 5 and called it “relentlessly proactive”: tasked with inspecting a CSS scrollbar bug, it wrote its own pyobjc code to enumerate Safari windows, took macOS screenshots, injected JavaScript into the app’s templates, and — while Willison stepped away from his desk — opened Firefox and then Safari on its own. The model benchmarks more than 10% above Opus 4.8 on most evaluations, comes with a 1 million token context window, and is free for Pro and Max subscribers through June 22, after which pricing goes to $10/$50 per million tokens in/out.

Open weights close the gap

MiniMax M3 arrived June 1 with a pointed claim: the first open-weight model to combine frontier-grade coding, a 1-million-token context window, and native multimodal input in a single downloadable package. Its 59.0% score on SWE-Bench Pro — above GPT-5.5 and Gemini 3.1 Pro on the same benchmark — is vendor-reported and awaits independent replication. Still: API pricing at $0.30/$1.20 per million tokens at launch, with weights scheduled to drop within 10 days.

The pattern in 2026 is consistent. The gap between the proprietary frontier and the best open-weight models closes another notch every few weeks. A 59% SWE-Bench Pro score would have been remarkable from any lab six months ago. Today it’s a Wednesday release from a Chinese team most Western developers hadn’t heard of last year.

Briefly noted

18 items

Models & research

Google launches Gemini 3.1 Flash-Lite, its cheapest capable model yet
Priced at $0.25 per million input tokens and 2.5× faster than earlier Gemini versions, Flash-Lite targets high-volume classification and summarization workloads at scale.

Google Blog
NVIDIA Vera Rubin NVL72 enters production, H2 2026 delivery
NVIDIA's successor to Blackwell promises 10× lower cost per token and is first available from AWS, Google Cloud, Microsoft, and CoreWeave in the second half of this year.

NVIDIA Newsroom
People are adopting AI faster than they picked up the PC or the internet
MIT Tech Review's chart roundup from the 2026 Stanford AI Index shows AI revenue growing faster than any prior tech wave, with adoption curves that leave the personal computer era in the dust.

MIT Technology Review

Products & launches

Claude Agent SDK gets dedicated credit pool starting June 15
Anthropic's Claude Agent SDK — the TypeScript/Python toolchain for building Claude Code-style agents — now bills from a separate monthly pool, decoupling agentic workloads from interactive sessions.

Anthropic / Claude Code Docs
OpenCode surpasses 165,000 GitHub stars and 7.5M monthly active users
The open-source, provider-agnostic coding agent with MIT license became the leading open alternative to proprietary coding assistants, supporting 75+ AI models and full MCP integration.

GitHub
Boston Dynamics Atlas and Google DeepMind's Gemini Robotics head to Hyundai factories
The Atlas deployment is scaling into industrial production at Hyundai facilities, with Gemini Robotics-ER models handling perception, tool use, and human interaction for factory tasks.

Boston Dynamics

Infrastructure & chips

SK Telecom and NVIDIA plan gigawatt-scale AI cloud for South Korea
SK Telecom will build a gigawatt-class AI factory on NVIDIA's DSX platform, with the first facility expected online in 2027 to support Korea's sovereign and enterprise AI services.

NVIDIA Newsroom
Server CPUs are now the scarcest chip in AI infrastructure
Agentic inference workloads are driving Intel to shift Xeon production away from consumer chips; server CPU prices are up 20% since March, with delivery times stretching to 8–12 weeks.

Tom's Hardware
Meta plans $115–135B in AI capital spending in 2026
Meta announced nearly double its prior-year capex commitment as Zuckerberg's superintelligence bet demands infrastructure at a scale that makes earlier hyperscaler spending look modest.

CNBC

Industry & money

Anthropic files confidential S-1 for near-$1 trillion IPO
Anthropic formally entered the IPO queue on June 1 with a confidential SEC filing, coming off a $65B Series H that put its post-money valuation at $965 billion.

TechCrunch
OpenAI files S-1 at $25B in annual revenue, eyes $1 trillion valuation
OpenAI's confidential S-1 filed June 8 shows $25B in annualized revenue — and roughly $25B in cash burn — as it targets the largest public offering in history.

HumAI Blog
PhysicsX raises $300M Series C at $2.4B valuation
The London startup that replaces multi-hour engineering simulations with AI in seconds closed an oversubscribed round led by Temasek, with NVIDIA and General Catalyst among existing investors.

PhysicsX
Meta Muse Spark, first model from Superintelligence Labs, now powers Meta's smart glasses
Muse Spark matches Llama 4 Maverick performance at 10× lower compute, making it small and fast enough for instant on-device responses on smart glasses hardware.

Upload VR

Policy & safety

White House executive order on AI: innovation first, voluntary safety framework
The June 2 executive order directs federal agencies to harden systems with AI cyber defenses and establishes an optional pre-release review framework for frontier models, with no mandatory licensing requirement.

White House
Colorado's landmark AI antidiscrimination law was replaced before it ever took effect
Governor Polis signed SB 189 in May, delaying Colorado's original AI Act to January 2027 and substantially narrowing its scope — the pioneering law is now effectively unrecognizable.

Law and the Workplace
EU publishes code of practice for labeling AI-generated content
The Commission's June 10 guidance recommends watermarking and digitally-signed metadata for marking AI-generated content, ahead of Article 50's August 2 enforcement date.

European Commission

Developer tools

Linux Kernel 7.0 officially drops the 'experimental' label from Rust
Released in April, Kernel 7.0 formally promotes Rust to stable — the first time in the kernel's 35-year history that a language other than C has been officially sanctioned for driver development.

Linuxiac

Whimsy

Claude Fable 5 decided to open a browser. Nobody asked it to.
Willison asked Fable 5 to look at a CSS scrollbar bug; when he stepped away, it had already opened Firefox and then Safari on its own — impressive, alarming, or both, depending on your threat model.

Simon Willison's Weblog

Lead stories cited

01
Anthropic releases Claude Fable 5, its most powerful AI yet, with cyber safeguards — TechCrunch
02
Anthropic shares Mythos with 150 more organizations, including critical infrastructure operators — Cybersecurity Dive
03
Google sues Chinese smishing network accused of using Gemini AI in phishing — The Hacker News
04
MiniMax M3: Open-Weight Frontier Model with 1M Context — Data North