GPT-5.5's First Week: 2x Revenue Growth, Codex Doubles in Days
💡 INSIGHT
GPT-5.5's First Week: 2x Revenue Growth, Codex Doubles in Days
One week in, GPT-5.5 is already OpenAI's strongest model launch by revenue: API growth is running more than 2x faster than any prior release, and Codex doubled revenue in under seven days. Frontier labs rarely share concrete commercial metrics, so these numbers matter: they confirm that model quality improvements still translate directly to developer spending. The question now is whether this pace holds once the launch-week migration rush settles. (5,884 likes | 255 RTs) Read more →
Sam Altman Tells Developers: Use Claude Code If It Works Better
In a rare moment of public détente, Sam Altman told developers to use Codex or Claude Code, whichever works best, calling the "which is better" polls silly. With 14.6K likes, the developer community clearly agrees with the pragmatic stance. Read between the lines: when the CEO of OpenAI is comfortable telling you to use the competition, it means he thinks the war will be won on platform lock-in, not individual tool loyalty. (14,593 likes | 726 RTs) Read more →
Open-Source Models Hit Closed-Model Parity on Batch Workloads: Kimi 2.6 and GLM 5.1 are now performing within striking distance of closed models on batch jobs, and enterprises are starting to shift: the API margin premium is harder to justify when open alternatives deliver comparable quality. Speed remains the gap, but for async workloads that's irrelevant. (234 likes | 10 RTs) Read more →
HuggingFace CEO: Labs That Trained by Distilling Are 'Pulling the Ladder': Clement Delangue argues that every major lab benefited from distillation during training, and that restricting the technique now amounts to pulling the ladder up behind them. LeCun's retweet amplifies what's becoming the central tension in the open vs. closed access fight, and today's MODEL LITERACY section explains why this matters technically. (307 likes | 33 RTs) Read more →
Spotify Creates 'Verified Human' Badges to Flag AI-Generated Music: Spotify is the first major platform to introduce an explicit human/AI content distinction: verified badges that mark human artists. It's a precedent-setting move that other creative platforms will almost certainly follow. The interesting question: what happens when human artists use AI in their production workflow? Where's the line? (187 likes | 207 RTs) Read more →
🚀 LAUNCH
Grok 4.3 Ships at Sonnet-Level Quality for 5x Less
xAI drops Grok 4.3, claiming Sonnet 4.6-level capability at 5x lower cost and higher speed. If the benchmarks hold up under independent evaluation, this is serious price-performance disruption in the mid-tier model market, the segment where most production workloads actually run. Worth benchmarking against your current provider before your next billing cycle. (670 likes | 24 RTs) Read more →
Diffusers 0.38 Brings Discrete Diffusion to Text Generation: HuggingFace Diffusers 0.38.0 ships support for LLaDA2, discrete diffusion language models that generate text through block-wise iterative refinement instead of autoregressive decoding. This is a fundamentally different generation paradigm, now accessible in the standard diffusion library alongside new image and audio pipelines. Read more →
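To make "block-wise iterative refinement" concrete, here is a toy sketch of the idea, assuming nothing about LLaDA2's actual implementation: start from a fully masked sequence, predict every position in parallel, and commit a few positions per step until nothing is masked. The target string stands in for what a real model would predict.

```python
import random

MASK = "_"
TARGET = list("discrete diffusion")  # stand-in for the model's predictions

def denoise_step(seq, rng, reveal=4):
    """One refinement step: all masked positions are 'predicted' in
    parallel, then a few are committed, mimicking block-wise iterative
    refinement rather than left-to-right autoregressive decoding."""
    masked = [i for i, tok in enumerate(seq) if tok == MASK]
    for i in rng.sample(masked, min(reveal, len(masked))):
        seq[i] = TARGET[i]  # a real model samples from its own distribution here
    return seq

rng = random.Random(0)
seq = [MASK] * len(TARGET)
steps = 0
while MASK in seq:
    seq = denoise_step(seq, rng)
    steps += 1
# The whole 18-token sequence resolves in 5 steps instead of 18
# left-to-right decoding steps.
```

The practical appeal is exactly this parallelism: the number of refinement steps can be far smaller than the sequence length.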
NVIDIA Ships Quantized Kimi-K2.6 for Consumer GPUs: NVIDIA's LLM Compressor team releases NVFP4 and FP8 checkpoints for Kimi-K2.6, making a frontier-competitive open model runnable on consumer NVIDIA hardware. When the GPU vendor itself is shipping quantized open-model checkpoints, the message is clear: it's betting on the open ecosystem as a driver of GPU sales. (198 likes | 21 RTs) Read more →
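For readers new to quantized checkpoints, here is a minimal sketch of the quantize/dequantize round trip that makes them smaller. This is deliberately simplified symmetric integer quantization; the real NVFP4/FP8 formats are more involved (shared exponents, per-block scales), and the scale value below is a made-up illustration.

```python
def quantize_dequantize(x, scale, levels=127):
    """Snap a float weight onto a coarse integer grid, then map it back.
    Storing the integer code instead of the full-precision float is what
    shrinks the checkpoint; the cost is the rounding error shown here."""
    q = max(-levels, min(levels, round(x / scale)))  # clamped integer code
    return q * scale                                 # dequantized weight

weights = [0.013, -0.502, 0.25, 1.9]
scale = 0.0157                          # hypothetical per-tensor scale
restored = [quantize_dequantize(w, scale) for w in weights]
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

As long as the scale matches the weight range, the per-weight error stays below half a grid step, which is why well-calibrated quantization loses so little quality.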
🔬 RESEARCH
First Rigorous RCT Shows AI Therapy Works at Scale
A randomized controlled trial with Mexican women found that an AI therapy chatbot improved mental health outcomes by 0.3 standard deviations over six months, with gains in sleep quality, daily functioning, and labor market outcomes, and no increase in severe cases. This isn't another "users liked chatting with it" survey; it's one of the strongest pieces of causal evidence for AI-delivered mental health interventions, using the gold standard of clinical research methodology. If you've been skeptical about AI therapy, this is the study to read. (458 likes | 56 RTs) Read more →
UK Safety Group: GPT-5.5 Matches Mythos in Cyber Attack Simulations: The UK's AI safety testing group reports that GPT-5.5 completed a difficult corporate network attack simulation in 2 out of 10 attempts, roughly matching Anthropic's unreleased Mythos model. This is the first independent head-to-head comparison of frontier cyber capabilities from a government evaluator, and the convergence between models is the headline. Read more →
Qwen Open-Sources Interpretability Toolkit with Sparse Autoencoders: Qwen releases Qwen-Scope, adding Sparse Autoencoders to Qwen3.5-27B for mechanistic interpretability research. Open interpretability tooling on a frontier-class model dramatically lowers the barrier for researchers outside major labs to study what these models actually learn. (186 likes | 34 RTs) Read more →
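If sparse autoencoders are new to you, the core object is small: an overcomplete encoder whose ReLU activations are trained to be sparse, plus a linear decoder that reconstructs the original activation. The sketch below shows only the forward pass on made-up dimensions; it assumes nothing about Qwen-Scope's actual architecture or training.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

class TinySAE:
    """Minimal sparse autoencoder forward pass. Interpretability tools
    attach something like this to a model's residual-stream activations:
    each encoder feature is hoped to correspond to one human-readable
    concept. Training (reconstruction + sparsity penalty) is omitted."""
    def __init__(self, d_model=4, d_features=8, seed=0):
        rng = random.Random(seed)
        self.W_enc = [[rng.gauss(0, 0.5) for _ in range(d_model)]
                      for _ in range(d_features)]
        self.W_dec = [[rng.gauss(0, 0.5) for _ in range(d_features)]
                      for _ in range(d_model)]

    def encode(self, activation):
        return relu(matvec(self.W_enc, activation))  # sparse feature activations

    def decode(self, features):
        return matvec(self.W_dec, features)          # reconstruction

sae = TinySAE()
feats = sae.encode([0.3, -1.2, 0.5, 0.9])
recon = sae.decode(feats)
```

The "overcomplete" part (8 features for a 4-dimensional activation here; thousands-to-millions in practice) is what lets individual features specialize into interpretable directions.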
🔧 TOOL
Claude Code v2.1.126: Project Purge and Gateway Model Picker: Claude Code ships claude project purge for full state cleanup, gateway /v1/models integration for the model picker, and a dangerously-skip-permissions overhaul. The project purge alone is worth the update if you've ever had stale state cause weird agent behavior. Read more →
Paperclip Now Searches Full Text Across All of arXiv and PubMed: Paperclip expands to full-text search across all of arXiv, PubMed Central, and 150M abstracts, making it the most comprehensive open research discovery tool available. If you're still cobbling together Google Scholar, Semantic Scholar, and arXiv search for literature reviews, this replaces the whole stack. (1,426 likes | 203 RTs) Read more →
AI CLI: Pipe Image, Video, and Text Generation in Your Terminal: A new CLI tool brings Unix-philosophy AI generation to your terminal: pipe image, video, and text generation together across hundreds of models, with multi-model comparison and inline previews. Built on AI SDK and AI Gateway, with no native dependencies. (428 likes | 17 RTs) Read more →
📐 TECHNIQUE
One CLAUDE.md Config Cut Token Usage by 50% in a Week: A practitioner shares a measured, reproducible result: a specific CLAUDE.md task delegation config, routing Haiku for bulk work, Sonnet for research, and Opus only for deep thinking, cut token usage by 50% over a week. The insight is obvious once you see it: most agentic workflows spend the majority of tokens on tasks that don't need frontier-level reasoning. If you're running Claude Code daily and haven't set up model delegation, you're burning money. (337 likes | 29 RTs) Read more →
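The delegation logic can be sketched as a simple routing table. Note this is an illustrative sketch, not the practitioner's actual CLAUDE.md (which expresses the same idea as prose instructions to the agent): the tier names and task categories below are assumptions.

```python
# Hypothetical routing table: task categories and model tiers are
# illustrative, not taken from the original post.
ROUTES = {
    "bulk":     "claude-haiku",   # file edits, renames, boilerplate
    "research": "claude-sonnet",  # reading docs, summarizing code
    "deep":     "claude-opus",    # architecture, tricky debugging
}

def pick_model(task_kind: str) -> str:
    """Route each task to the cheapest tier that can handle it;
    unknown tasks default to the cheap tier, escalating only on demand."""
    return ROUTES.get(task_kind, ROUTES["bulk"])
```

The savings come from the default: most agent turns are bulk work, so defaulting cheap and escalating rarely is what halves the token bill.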
🏗️ BUILD
A Non-Technical PM Shipped a Complete App with Claude Code in Six Weeks: A project manager with no engineering background built and shipped a full stress management app using only Claude Code, from concept to App Store in six weeks. This isn't a toy demo; it's a concrete case study of agentic coding tools enabling entirely new builder profiles. The "who can build software" question just got a data point. Read more →
Gemma 4 Local Agentic Setup: Multimodal Agent with MCP Tool Discovery: Google publishes a hands-on notebook showing Gemma 4 in a local agentic setup using Haystack: a multimodal map-and-weather agent, dynamic tool discovery via a GitHub MCP server, and composable agent patterns. Clone it and run it today if you want to see what local agentic workflows look like without API calls. (208 likes | 19 RTs) Read more →
📚 MODEL LITERACY
Model Distillation: Distillation is the process of training a smaller "student" model to mimic the outputs of a larger "teacher" model. The student learns not just the right answers but the teacher's probability distribution over all possible answers, capturing nuances that raw training data alone doesn't provide. It's why open-source models like Kimi 2.6 are suddenly competitive with closed frontier models: they can learn from the behavior of larger models without needing the same massive training budgets. Today's debate between HuggingFace's CEO and LeCun surfaces the core tension: virtually every major lab used distillation at some point during training, but restricting it now protects commercial margins at the cost of open-source progress. Understanding distillation is key to understanding why the open-vs-closed model gap is shrinking, and why closed labs are increasingly motivated to limit the technique through licensing restrictions.
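The "learn the full distribution, not just the answer" idea has a standard formulation: minimize the KL divergence between temperature-softened teacher and student distributions. A minimal sketch, using made-up three-class logits rather than any real model:

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution; higher temperature
    softens it, exposing more of the teacher's 'dark knowledge' about
    near-miss answers."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's soft targets to the student's
    predictions. The student is rewarded for matching the whole
    distribution, not just the argmax."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [4.0, 1.5, 0.2]   # confident, but spreads some mass to option 2
aligned = [3.8, 1.4, 0.3]   # student that mimics the teacher's shape
wrong   = [0.2, 4.0, 1.5]   # student that prefers a different answer
```

A hard-label loss would only see "option 1 is correct"; the KL term also penalizes getting the teacher's relative confidence in options 2 and 3 wrong, which is exactly the nuance raw training data doesn't carry.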
⚡ QUICK LINKS
- Code with Claude: Anthropic's developer conference returns next week; register before sessions fill up. (5,486 likes | 499 RTs) Link
- Anthropic Bedrock SDK v0.29.1: Fixes a silent streaming failure where error events in chunk frames didn't throw; update if you run Claude on Bedrock. Link
- Mollick on AI Strategy: Stop thinking about AI as individual productivity; organizations are already superhuman intelligences, and the real question is how AI changes organizational intelligence. (368 likes | 45 RTs) Link
- GPT-5.5 Throws Itself a Party: May 5 at 5:55 PM, with Codex helping pick attendees. Peak AI marketing. (5,733 likes | 358 RTs) Link
🎯 PICK OF THE DAY
The UK's head-to-head cyber evaluation is the real story today. Forget the benchmark leaderboards: when the UK's AI safety testing group runs GPT-5.5 through a corporate network attack simulation and finds it performs comparably to Anthropic's unreleased Mythos model, we're seeing the emergence of a new kind of capability assessment that matters far more than any public eval. Government red teams, not cherry-picked benchmarks, are becoming the real gatekeepers of frontier deployment. The convergence between GPT-5.5 and Mythos on cyber capabilities is particularly telling: it suggests that frontier models are hitting similar capability ceilings in adversarial domains, regardless of architecture or training approach. This is a "when, not if" problem for capability containment, and the fact that a government body is publishing comparative results across labs means the evaluation regime is maturing faster than most people realize. For anyone building on frontier models in regulated industries, the UK evaluation framework is the one to watch. Read more →
Until next time ✌️