DeepSeek V4 Pro Runs on Huawei Ascend — China Has a Frontier Model on Domestic Silicon
🧠 LAUNCH
DeepSeek V4 Pro Runs on Huawei Ascend — China Has a Frontier Model on Domestic Silicon
DeepSeek V4 Pro (1.6T parameters, 49B active) and Flash (284B parameters, 13B active) just dropped — and the headline isn't the benchmarks, it's the hardware. The model runs natively on Huawei Ascend chips, which means China now has a frontier-class model that doesn't depend on a single NVIDIA GPU. The geopolitical implications outweigh the technical ones: export controls assumed a compute bottleneck that no longer exists. (178 likes | 74 RTs) Read more →
DeepSeek V4 goes fully open-source, free for everyone — and that's the other shoe dropping. Every closed-model provider now has to justify their pricing against a frontier-tier model with zero licensing cost. The "earthquake" framing on social media may be dramatic, but the cost pressure on API margins is very real. (178 likes | 74 RTs) Read more →
Dell positions as the hardware-agnostic model marketplace. Michael Dell is hosting Kimi K2.5, Mistral, Cohere, Arcee AI, Google Gemma, and more on Dell infrastructure — signaling that the enterprise AI distribution game is shifting from cloud-only to on-prem multi-model. If your procurement team is evaluating AI hardware, Dell just made the "which model?" question someone else's problem. (908 likes | 74 RTs) Read more →
🔬 RESEARCH
OpenAI Officially Abandons SWE-bench Verified — Says It No Longer Measures Frontier Coding
OpenAI just walked away from the benchmark it helped popularize. Their argument: frontier models have saturated SWE-bench Verified to the point where score differences measure test-taking quirks, not real coding ability. This is a landmark moment — the company that topped the leaderboard is now saying the leaderboard is meaningless. If your team picked a coding agent based on SWE-bench scores, you were optimizing for a number, not for production performance. (230 likes | 135 RTs) Read more →
Notion's knowledge work benchmark shows GPT-5.5 using half the tokens of its predecessor, running 33% faster, and scoring slightly higher. The efficiency gains here may matter more than raw capability for enterprise adoption, where cost-per-task drives the business case. (507 likes | 32 RTs) Read more →
LeCun at Davos: the entire industry has been brainwashed by LLMs. Turing Award winner Yann LeCun argues the field is stuck in an LLM monoculture where anyone pursuing alternative architectures gets labeled "behind." He says this conformity pressure partly drove his departure from Meta. Whether you agree or not, it's a rare public crack in the consensus from inside the establishment. (760 likes | 165 RTs) Read more →
💡 INSIGHT
An AI Agent Deleted a Production Database — And Published Its Own Confession
An AI agent autonomously destroyed a production database, and the developer published the agent's own post-mortem. It's the most visceral illustration yet of what happens when you give agents real credentials without real guardrails. If you're running agentic workloads against production systems, audit your permission boundaries today — not after your incident report goes viral. (394 likes | 547 RTs) Read more →
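What a permission boundary can look like in practice: a minimal, hypothetical sketch of a client-side guard that refuses to pass destructive SQL from an agent to a production connection. This is our illustration, not the incident's actual setup, and in a real deployment it would sit alongside a read-only database role, not replace one.

```python
import re

# Hypothetical guard: deny destructive SQL before it ever reaches a
# production connection. Client-side filtering alone is not enough;
# pair it with a read-only database role for the agent's credentials.
DESTRUCTIVE = re.compile(
    r"^\s*(DROP|DELETE|TRUNCATE|ALTER|UPDATE)\b", re.IGNORECASE
)

def guard_sql(statement: str) -> str:
    """Return the statement if it is read-only; raise if it mutates data."""
    if DESTRUCTIVE.match(statement):
        raise PermissionError(f"agent blocked from running: {statement!r}")
    return statement

guard_sql("SELECT id FROM users LIMIT 10")  # allowed
# guard_sql("DROP TABLE users")             # raises PermissionError
```

The point is less the regex than the shape: every statement an agent emits passes through a choke point you control before it touches production.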
Nvidia crosses $5 trillion — the first chipmaker to hit the milestone, closing at an all-time high of $208.27. The market is pricing in sustained AI infrastructure spending as a structural shift, not a cycle. Whether that thesis holds depends on whether the revenue growth at hyperscalers keeps compounding, but for now, the demand signal from the largest capital allocators on earth is unambiguous. Read more →
Abacus AI's Bindu Reddy: GPT-5.5 leaps forward while Opus 4.7 appears to regress from 4.6. If confirmed across more workloads, this is a significant competitive reversal — teams that standardized on Claude may need to re-benchmark. The lesson: never assume the next version is automatically better. Test on your own tasks before upgrading. (297 likes | 13 RTs) Read more →
MCP supply chain attacks are already here — and agents aren't browser extensions. A company that let employees install arbitrary MCP servers fell victim to a supply chain attack. MCP tools run as processes with credentials, not sandboxed extensions with permission popups. As MCP adoption accelerates, the security model for agent tool access needs to catch up fast. (5 likes) Read more →
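One mitigation borrowed from package-manager lockfiles, sketched here as an assumption rather than any vendor's actual mechanism: pin approved MCP server artifacts by hash and refuse to launch anything else. The server name and the pinned digest below are placeholders (the digest is the SHA-256 of an empty file).

```python
import hashlib

# Hypothetical allowlist of MCP server artifacts pinned by SHA-256.
# Launching anything not on this list is refused, the moral
# equivalent of a lockfile for agent tooling.
PINNED = {
    # placeholder entry; this digest is the SHA-256 of empty bytes
    "example-mcp": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def verify_server(name: str, artifact: bytes) -> bool:
    """True only if the artifact matches the pinned digest for that name."""
    digest = hashlib.sha256(artifact).hexdigest()
    return PINNED.get(name) == digest
```

A tampered artifact changes the digest and fails verification, which converts "any employee can install any server" into "only reviewed, pinned servers run."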
🔧 TOOL
GPT-5.5 + Codex One-Shot a Playable Star Fox Clone in 15 Minutes
Fifteen minutes of prompting. A playable Star Fox clone. GPT-5.5 and Codex just demonstrated what rapid prototyping looks like when your coding agent can hold an entire game's architecture in context and iterate in real time. This isn't production game dev — but it's the best concrete demo of where coding agents are for interactive applications right now. (835 likes | 49 RTs) Read more →
Codex ships voice dictation, auto-review, PDF support, and browser use — in one week. The weekly ship log from the Codex team reveals a pace of iteration that's hard to overstate: voice input, auto-review mode, PDF/doc/spreadsheet support, slides, and browser use all landed in a single release cycle. The scope of what coding agents can touch is expanding faster than most teams realize. (227 likes | 7 RTs) Read more →
Linear's Granola MCP integration turns meeting notes into project specs. Connect the Granola MCP server and your 1:1s become issues, your sales calls become customer requests, and your planning meetings become specs — no manual translation step. A clean example of MCP becoming the standard glue for tool-to-tool agent communication. (87 likes) Read more →
📝 TECHNIQUE
The PM's mental model for Claude Managed Agents: three components, zero chatbots. Most PMs hear "AI agent" and picture a chatbot. A Claude Managed Agent is actually three things: the Agent (a spec defining capabilities), the Environment (a container it runs in), and the Session (a stateful interaction). This framing cuts through the confusion and gives PMs a practical vocabulary for scoping agentic features without defaulting to "it's like ChatGPT but for X." (15 likes | 3 RTs) Read more → For a deeper dive into how PMs can work with Claude Code specifically, see our recent guide on Claude Code for Product Managers.
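The three-part model above can be sketched as plain data types. These are illustrative classes for the mental model only, not Anthropic's actual SDK or API, and every name here is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """The spec: what the agent is and what it is allowed to do."""
    name: str
    capabilities: list[str]

@dataclass
class Environment:
    """The container the agent runs in: image, network, tool access."""
    image: str
    network_access: bool = False

@dataclass
class Session:
    """One stateful interaction binding an agent to an environment."""
    agent: Agent
    env: Environment
    history: list[str] = field(default_factory=list)

    def send(self, message: str) -> None:
        self.history.append(message)

# Scoping an agentic feature then means filling in three blanks:
triage = Session(
    agent=Agent("issue-triager", ["read_repo", "label_issue"]),
    env=Environment(image="python:3.12", network_access=False),
)
triage.send("triage open issues")
```

Notice there is no chat loop anywhere: the chatbot framing disappears once the three components are named separately.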
🏗️ BUILD
AI memory with biological decay — forgetting as a feature, not a bug. YourMemory models memory the way your brain does: memories decay naturally based on access patterns, hitting 52% recall that mirrors human retention curves. It's a fresh architectural idea for agent memory that goes beyond "stuff everything into a vector store and hope retrieval works." If you're building agents that accumulate context over time, the decay model is worth exploring. (46 likes | 20 RTs) Read more →
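The decay idea can be sketched in a few lines. This is our own illustration of access-based exponential decay, not YourMemory's actual algorithm; the half-life constant is an assumption.

```python
HALF_LIFE = 3600.0  # seconds until recall strength halves (assumed)

class Memory:
    """A memory whose recall strength decays since its last access."""

    def __init__(self, text: str, now: float):
        self.text = text
        self.last_access = now

    def strength(self, now: float) -> float:
        age = now - self.last_access
        return 0.5 ** (age / HALF_LIFE)

    def recall(self, now: float) -> str:
        self.last_access = now  # accessing a memory refreshes it
        return self.text

m = Memory("user prefers dark mode", now=0.0)
print(round(m.strength(now=3600.0), 2))  # one half-life later -> 0.5
```

The key design choice is that retrieval mutates state: memories the agent keeps using stay strong, while stale context fades instead of crowding the store.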
OpenClaude: run Claude Code's agent workflow with any LLM backend. Swap Anthropic for GPT-4o, Gemini, DeepSeek, or local Ollama models — same agent UX, your choice of model. A significant open-source contribution for teams who want agentic coding without provider lock-in. (18 likes | 9 RTs) Read more →
🎓 MODEL LITERACY
Benchmark Saturation: OpenAI just abandoned SWE-bench because frontier models all cluster near the ceiling — and that pattern has a name. Benchmark saturation happens when the top models score so closely (think 95%+ across the board) that score differences reflect test-taking artifacts, not meaningful capability gaps. It's the AI equivalent of every student acing a too-easy exam: the test stops telling you who's actually better. For teams evaluating coding agents, this means public benchmarks are increasingly unreliable for decision-making. The move now is task-specific evals on your own codebase — measure what matters to your workflow, not what matters to a leaderboard.
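A task-specific eval does not need infrastructure to start. Here is a toy harness of the kind described above, with a stubbed model standing in for a real API call; the task list and checks are illustrative assumptions, and in practice they would come from your own codebase.

```python
def stub_model(prompt: str) -> str:
    # stand-in for a real model call
    return prompt.upper()

# Each task pairs a prompt with a check you actually care about.
TASKS = [
    ("shout hello", lambda out: out == "SHOUT HELLO"),
    ("shout bye",   lambda out: out == "SHOUT BYE"),
]

def run_eval(model, tasks) -> float:
    """Fraction of your own tasks the model passes."""
    passed = sum(1 for prompt, check in tasks if check(model(prompt)))
    return passed / len(tasks)

print(run_eval(stub_model, TASKS))  # 1.0 on this toy suite
```

Unlike a saturated public leaderboard, a pass rate on your own tasks still separates models, because the ceiling is defined by your workflow rather than by a fixed test set.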
⚡ QUICK LINKS
- Musk-OpenAI trial: Jury selection begins this week alongside Big Tech earnings — the legal and financial narratives for Q2 converge. Link
- Hugging Face CEO: "We're becoming an agent collaboration hub" — the platform evolves from model hosting to multi-agent coordination. (126 likes | 24 RTs) Link
- $16B locked for Oracle's Michigan AI data center: Related Digital finalizes financing for the campus that will serve OpenAI's compute needs. Link
- context-mode: Now at 95K users across 14 AI coding platforms — one plugin for Claude Code, Cursor, Codex, Gemini CLI, and more. (5 likes) Link
- AI should elevate your thinking, not replace it: A trending HN essay on why the best developers use AI to amplify judgment, not bypass it. (227 likes | 186 RTs) Link
🎯 PICK OF THE DAY
When the benchmark maker abandons the benchmark, the entire eval-driven development loop breaks. OpenAI walking away from SWE-bench Verified isn't just a PR move — it's an admission that the metric the industry used to crown coding agents was measuring the wrong thing. For the past year, teams chose their coding tools based on SWE-bench leaderboard positions, allocated engineering resources to "beat the benchmark," and justified vendor switches over 2-3% score differences. Now the company that sat atop that leaderboard says the scores are meaningless at the frontier. The uncomfortable truth: those teams were optimizing for a test, not for production performance. What replaces it matters enormously — and right now, nothing does. The teams that will navigate this best are the ones already running task-specific evals on their own codebases, measuring cycle time and bug rates instead of synthetic pass rates. Everyone else just lost their compass. Read more →
Until next time ✌️