Anthropic Drops Opus 4.7 — Agentic Coding Gets a Flagship Upgrade

🧠 LAUNCH

Anthropic Drops Opus 4.7 — Agentic Coding Gets a Flagship Upgrade

Claude Opus 4.7 lands as Anthropic's new flagship, and early customer numbers back it up: Cursor's internal benchmark jumped from 58% to 70%, and Notion saw a 14% eval lift with fewer tool calls. The story here isn't just "better model"; it's a model that is better at running long, multi-step agent workflows without losing the thread. If you're building anything agentic on Claude, the upgrade path is immediate: say "migrate to Opus 4.7" in Claude Code and it handles model names, prompts, and effort settings automatically. (70,230 likes | 9,004 RTs) Read more →

OpenAI's GPT Rosalind Is a Frontier Model Built for Biology

GPT Rosalind is OpenAI's first domain-specific frontier model, and it's aimed squarely at pharma: drug discovery, protein analysis, and translational medicine. Named after Rosalind Franklin, it signals that OpenAI sees vertical specialization — not just general-purpose scaling — as the next revenue play. If you work in bio/chem, read the technical report; if you don't, watch how this changes the "one model to rule them all" narrative. (7,737 likes | 695 RTs) Read more →

Qwen3.6-35B-A3B: An Open-Source Laptop Model That Beats the Flagship

Qwen3.6-35B-A3B shouldn't exist. A sparse MoE model with 35B total parameters but only 3B active, it runs on a MacBook, ships under Apache 2.0, and already beats Opus 4.7 on Simon Willison's pelican benchmark. Alibaba just made "frontier-tier agentic coding on consumer hardware" a real sentence. Pull the GGUF via Ollama and see for yourself. (8,766 likes | 1,264 RTs) Read more →

Tencent Open-Sources HY-World 2.0 for Interactive 3D Environments: HY-World 2.0 generates, reconstructs, and simulates interactive 3D environments from multimodal inputs. A serious open-weight entry in the world-model race — game devs and simulation researchers can run it today. (1,922 likes | 324 RTs) Read more →

Boston Dynamics' Spot Gets a Gemini Brain: Spot now uses Gemini Robotics for embodied reasoning — understanding surroundings, identifying objects, and following natural-language commands. One of the most visible demonstrations of foundation models in physical robotics, from two industry leaders who clearly rehearsed the demo. (1,172 likes | 207 RTs) Read more →

Google Puts AI Mode Inside Chrome for 3 Billion Users: AI Mode is now embedded directly in Chrome, turning the browser from search-and-click into a conversational web agent. With Chrome's 3B+ installed base, this is the largest-scale AI browsing deployment to date — and the clearest sign that the address bar's days are numbered. Read more →

🔧 TOOL

Codex Expands Way Beyond Code — Computer Use, Image Gen, 90+ Plugins

Codex just outgrew its name. The update adds computer use on Mac (click and type across apps), an in-app browser, gpt-image-1.5 generation, and 90+ new plugins spanning Jira, CircleCI, GitLab, and the Microsoft Suite. Sam Altman says computer use has been "even more useful than expected." The real play: Codex is becoming an all-purpose agent runtime, not just a coding assistant. (Compare Codex vs Claude Code →) (5,919 likes | 282 RTs) Read more →

One-Command Migration to Opus 4.7 in Claude Code: The new @ClaudeDevs account launches with a practical gift: say "migrate to Opus 4.7" in Claude Code and it updates model names, prompts, and effort settings automatically. That removes the friction from the biggest model upgrade of the year. (1,402 likes | 65 RTs) Read more →

Cloudflare Sandbox SDK Gives AI Agents a Secure Runtime: Cloudflare Sandbox SDK integrates with the OpenAI Agents SDK so AI agents can run code in secure, isolated environments. The infrastructure story is finally catching up to the agent story — production sandboxing is now a one-SDK integration, not a DevOps project. (295 likes | 41 RTs) Read more →

📝 TECHNIQUE

Boris Cherny's Field Notes on Getting the Most From Opus 4.7: The Claude Code creator (388K followers) shares hard-won tips after days of testing: Opus 4.7's prompting patterns are fundamentally different from 4.6, and the agentic features require unlearning old habits. Key insight — what worked before may actively hurt you now. (3,753 likes | 318 RTs) Read more →

The Missing Bridge: Porting HuggingFace Models to Apple MLX: HuggingFace publishes a step-by-step guide for porting Transformers models to Apple's MLX framework. With Apple Silicon dominating the local-inference hardware story and models like Qwen3.6 proving laptop-scale is real, this is the bridge the ecosystem has been waiting for. Read more →

🔬 RESEARCH

UK AI Safety Institute Confirms: Claude Mythos First to Clear a Cyber Benchmark: The AISI independently validated that Claude Mythos is the first model to complete an AISI cyber benchmark task. This is concrete third-party confirmation of the capabilities Anthropic previewed with cyber defenders last week — not marketing, but independent evaluation. (2,961 likes | 538 RTs) Read more →

The Redis Creator Challenges the 'Cybersecurity as Proof of Work' Thesis: antirez pushes back on Simon Willison's argument that cybersecurity functions as proof of work for AI systems, offering a technical counter-argument. A substantive voice in a debate that matters: does AI fundamentally change the attacker-defender asymmetry, or is that wishful thinking? (193 likes | 78 RTs) Read more →

Simon Willison's Pelican Test: A 21GB Local Model Outdraws Opus 4.7: Simon Willison ran Qwen3.6 on his laptop and got a better pelican than Opus 4.7. Beyond the meme: it's a concrete data point that open-weight MoE models are closing the gap on frontier APIs — on consumer hardware, with zero API costs. The implications for deployment economics are hard to ignore. (269 likes | 61 RTs) Read more →

💡 INSIGHT

Mollick Flags Opus 4.7's Adaptive Thinking as a UX Problem: Ethan Mollick reports that Opus 4.7's automatic effort routing regularly classifies queries outside math and code as "low effort," producing worse results. Unlike ChatGPT, there's no manual override. On launch day, that's a significant friction point for anyone using Claude outside of coding. (632 likes | 28 RTs) Read more →

Latent Space Declares the Pull Request Dead: Latent Space argues the PR — the oldest ritual in software collaboration — is dying as AI coding tools shift from "review my diff" to "run the whole task." Provocative, but well-argued: when agents write entire features end-to-end, who exactly is reviewing whom? (Read more on how Codex compares to ChatGPT →) Read more →

🏗️ BUILD

CodeBurn: See Where Your Claude Code Tokens Actually Go: With Opus 4.7's 1M context window and adaptive thinking, token spend is harder to predict than ever. CodeBurn gives developers visibility into exactly where their Claude Code tokens go — broken down by task, file, and tool call. Essential cost management as agentic workloads scale. (69 likes | 14 RTs) Read more →

🎓 MODEL LITERACY

Mixture of Experts (MoE), or Sparse Activation: Today's biggest surprise, a laptop model beating a flagship API, makes no sense until you understand MoE. Traditional "dense" models activate every parameter for every token, so a 35B model needs 35B parameters' worth of compute per token. MoE models like Qwen3.6-35B-A3B split the network into specialized "expert" sub-networks and route each token through only a handful of them: in this case, 3B active out of 35B total. The result: you get the knowledge capacity of a large model at roughly the per-token compute cost of a small one. The catch is memory: all 35B parameters still have to be loaded, which is why the local download is a 21GB file rather than something 3B-sized. This is why "model size" no longer means what it used to, and why a MacBook can now run what used to require a data center.
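
To make the routing idea concrete, here is a minimal sketch of top-k expert routing in plain Python. Everything in it is a toy assumption for illustration: the four "experts" simply scale their input, the gate matrix is hand-picked rather than learned, and the names (moe_forward, make_expert, GATE, EXPERTS) are hypothetical. It is not Qwen's implementation, and the source does not say how many experts Qwen3.6 uses. The two helper functions at the end only redo the arithmetic on the figures quoted above (3B active of 35B total, 21GB on disk), assuming decimal gigabytes and that the file holds every parameter.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical stand-in for an expert network: it just scales the input."""
    def expert(x):
        return [scale * v for v in x]
    return expert


def moe_forward(x, gate, experts, k):
    """Route one token vector x through only the top-k experts."""
    # Router: one score per expert (dot product of x with that expert's gate row).
    scores = [sum(g * v for g, v in zip(row, x)) for row in gate]
    # Sparse activation: keep the k best-scoring experts; the rest never run.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Normalize the chosen experts' scores into mixing weights.
    weights = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, chosen


def active_fraction(active_b, total_b):
    """Share of parameters used per token (both arguments in billions)."""
    return active_b / total_b


def bits_per_param(file_gb, total_b):
    """Average bits stored per parameter.

    Assumes decimal gigabytes and that the file contains every parameter.
    """
    return file_gb * 8 / total_b


# Hand-picked toy gate (one row per expert) and four toy experts.
GATE = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
EXPERTS = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]

if __name__ == "__main__":
    out, chosen = moe_forward([2.0, 1.0], GATE, EXPERTS, k=2)
    print("experts used:", chosen, "of", len(EXPERTS))
    print("output:", out)
    print("active fraction:", round(active_fraction(3, 35), 3))
    print("bits per parameter:", round(bits_per_param(21, 35), 1))
```

In the toy run, only two of the four experts execute for the token, which is the whole trick: compute scales with the experts you pick, not the experts you own. On the quoted figures, 3B of 35B is under 9% of parameters active per token, while 21GB over 35B parameters works out to about 4.8 bits each, which would be consistent with a quantized build (the source does not name the format). Real MoE layers use learned routers and usually add load balancing so that a few experts don't take all the traffic.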

⚡ QUICK LINKS

Gemini Nano Banana 2: Personalized image gen from your own photos, running on-device via Nano. Link
Android CLI: Google ships an agent-agnostic CLI for building Android apps 3x faster. (91 likes | 24 RTs) Link
NVIDIA Lyra 2.0: New open model from NVIDIA, trending on HuggingFace. (122 likes) Link
NucleusAI Nucleus-Image: Text-to-image model with strong early adoption, fragmenting the open generation space. (134 likes | 464 downloads) Link
Gas Town API Credit Controversy: GitHub issue alleges a popular dev tool silently spends users' API credits on its own self-improvement. Audit your tools. (193 likes | 92 RTs) Link

🎯 PICK OF THE DAY

A laptop model just beat the world's newest flagship API. Qwen3.6-35B-A3B (35 billion parameters, 3 billion active, Apache 2.0, runs on a MacBook) outperformed Opus 4.7 on Simon Willison's viral pelican benchmark the same day Opus 4.7 launched. Dismiss it as a party trick if you want; one informal benchmark is not a full evaluation. But the signal is hard to miss: the moat around frontier APIs is eroding from below. When a 21GB local model can outdraw a cloud flagship, the pricing, deployment, and control implications ripple through the entire industry. Enterprises paying per-token for API calls now have an open-source alternative that runs on hardware they already own, with zero data leaving their network. This doesn't mean frontier APIs are dead (they'll keep winning on the hardest reasoning tasks), but the floor of "good enough" just rose dramatically. The companies that will actually ship AI products in 2026 won't necessarily be the ones with the biggest API budgets. They'll be the ones who figured out which tasks need the frontier and which ones a laptop handles just fine. Read more →

Until next time ✌️