OpenAI Model Disproves 80-Year-Old Math Conjecture Autonomously

🔬 RESEARCH

OpenAI Model Disproves 80-Year-Old Math Conjecture Autonomously

A general-purpose OpenAI model has disproved a conjecture on the planar unit distance problem, posed by Paul Erdős in 1946 and left unsettled by mathematicians ever since. This isn't "AI assists researcher": the model autonomously produced an original proof in discrete geometry. It is the first time AI has achieved a mathematical discovery of this magnitude, and it signals that scaled general reasoning, rather than domain-specific solvers, may be the viable path to scientific breakthroughs. (11,586 likes | 1,546 RTs) Read more →

NVIDIA's Diffusion Language Models Generate All Tokens at Once: Nemotron-Labs-Diffusion takes a fundamentally different approach — instead of predicting one token at a time like every GPT-style model, it generates multiple tokens simultaneously using diffusion. If this architecture scales, it breaks the autoregressive bottleneck that currently makes inference expensive and sequential. (1,091 likes | 165 RTs) Read more →

NanoGPT-Bench Tests Whether Coding Agents Can Actually Do Research: IntologyAI releases an internal eval that tests coding agents on real AI R&D problems — not just code generation but experimental design, hypothesis testing, and iteration. First serious benchmark for agent-as-researcher capabilities, and early results suggest most agents are worse at research than at coding. (208 likes | 44 RTs) Read more →

🧠 LAUNCH

Google I/O 2026: 100 Announcements From Gemini Omni to Universal Cart

Google went full firehose — Gemini Omni, Gemini 3.5 Flash, Universal Cart, Antigravity, and 96 more products in a single keynote. The sheer breadth is the point: Google is betting that platform ubiquity beats any single model advantage. If you build on any Google service, block an hour to scan the list — there's almost certainly something that changes your integration story. Read more →

Cohere Ships Command A+ — MoE Model Built for Hardware-Constrained Enterprise: Cohere's new flagship uses mixture-of-experts architecture specifically optimized to run on minimal hardware. The play is clear: enterprises that can't afford GPU clusters still need frontier-tier models, and Cohere is betting that efficiency beats raw scale for that market. (1,150 likes | 194 RTs) Read more →

Alibaba's Qwen3.7-Max Claims Agent Frontier at Flash-Tier Pricing: The Qwen team drops an agent-optimized model claiming to beat Gemini 3.5 Flash at lower cost. The open-weight frontier keeps compressing the gap with closed models, and the pricing pressure is real. (593 likes | 236 RTs) Read more →

💡 INSIGHT

Spotify's Chief Architect Reveals 4,500 Claude-Powered Deployments Per Day

This is what production AI adoption actually looks like at scale. Spotify's engineering org pushing 4,500 deploys daily with Claude isn't a pilot program — it's the operating rhythm of a 600M-user platform. The architecture details shared at Code with Claude London show how Claude is embedded in their CI/CD pipeline, not just their IDE. (5,302 likes | 374 RTs) Read more →

Microsoft Engineers Publicly Build Agents With Claude, Not Copilot: A senior Microsoft AI developer presenting on stage about building agents with Claude — not GitHub Copilot, not GPT — is a signal you don't see every day. Multi-provider adoption is now public reality at the world's largest software company. (1,548 likes | 230 RTs) Read more →

Simon Willison's 12K-Like Thread Exposes Gemini's Product Fragmentation: A day after Google's 100-announcement firehose, Simon Willison's viral thread captures the developer experience gap — personal vs workspace accounts, AI Studio vs Cloud console, free tier vs paid. The confusion undermines whatever Google shipped. (11,935 likes | 1,226 RTs) Read more →

OpenAI Offers $2M in Tokens to Every YC Startup This Batch: The tokenmaxxing era is here. OpenAI is locking in the next generation of startups before they evaluate alternatives — $2M per company is enough to build an entire product on the platform before ever paying a bill. Smart distribution play. (1,881 likes | 117 RTs) Read more →

HuggingFace Co-Founder Maps How AI Is Restructuring Software Itself: Thomas Wolf's long-form analysis argues that AI isn't just changing how we write code — it's restructuring the economics of software companies, the shape of engineering teams, and the definition of technical moats. One of the more thoughtful industry pieces this week. (1,832 likes | 296 RTs) Read more →

🔧 TOOL

Anthropic's New Guide Takes Computer Use From Demo to Production

Computer Use has been impressive in demos — but actually shipping it means handling flaky UIs, recovery from mis-clicks, and timeout management. Anthropic's new production guide covers the reliability patterns that separate "works on my machine" from "handles 10K sessions daily." If you've been waiting to deploy Computer Use for real, this is the missing manual. (1,941 likes | 149 RTs) Read more →

Transformers v5.9.0 Adds Native Cohere Command A+ MoE Support: If you want to run Cohere's new MoE model locally, this is the release that enables it. Native support means no custom code paths — just load and run. (no engagement data) Read more →

Kapso MCP Gives Your AI Agent a WhatsApp Number: Add an MCP server, get a WhatsApp number for your agent. As agents need real-world communication channels beyond chat widgets, WhatsApp integration is the most obvious missing piece — 2B monthly active users is hard to ignore. (620 likes | 31 RTs) Read more →

Claude Code v2.1.144 Ships Background Session Resume and Mid-Chat Model Switching: /resume brings back background sessions, /model switches models mid-conversation without starting over. Key quality-of-life upgrades for power users running multi-hour agent sessions who got tired of losing context. Read more →

📝 TECHNIQUE

How Claude Cowork Runs a 4,000-Account Sales Book — The Full Playbook: An Anthropic sales leader shares the exact workflow for managing 4,000 accounts with Claude Cowork — from account prioritization to meeting prep to follow-up generation. This isn't hypothetical; it's a working system handling real revenue. First detailed playbook for revenue teams evaluating AI copilots. Read more →

Why Structural Backpressure Beats Smarter Agents in Coding Loops: The argument is counterintuitive but compelling — adding formal verification gates to your agent loop works better than upgrading to a smarter model. Constraints force the agent to produce correct code rather than plausible code. Practical patterns for anyone running coding agents in production. (97 likes | 23 RTs) Read more →
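
To make the idea of a gated agent loop concrete, here is a minimal sketch. It is not taken from the linked piece: the gates are a plain syntax check and a single behavioral check rather than formal verification, and `scripted_agent`, `run_with_gates`, and the `add` task are hypothetical names invented for illustration. The point it tries to show is that a candidate is never accepted until every gate passes, and each rejection is fed back to the agent.

```python
from typing import Callable, List, Optional, Tuple

# A gate returns an error message, or None when the candidate passes.
Gate = Callable[[str], Optional[str]]


def compiles(source: str) -> Optional[str]:
    """Gate 1: the candidate must at least parse."""
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"syntax error: {exc.msg}"


def passes_spec(source: str) -> Optional[str]:
    """Gate 2: the candidate must behave as specified.

    exec() is used only for illustration; real agent output should run in a sandbox.
    """
    namespace: dict = {}
    exec(source, namespace)
    add = namespace.get("add")
    if add is None:
        return "missing function add"
    if add(2, 3) != 5:
        return "add(2, 3) should be 5"
    return None


def run_with_gates(propose: Callable[[Optional[str]], str],
                   gates: List[Gate],
                   max_attempts: int = 5) -> Optional[str]:
    """Accept a candidate only when every gate passes; otherwise push the error back."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = propose(feedback)
        for gate in gates:
            feedback = gate(candidate)
            if feedback is not None:
                break  # rejected: the error message goes to the next proposal
        else:
            return candidate  # no gate objected
    return None


def scripted_agent() -> Tuple[Callable[[Optional[str]], str], List[Optional[str]]]:
    """Hypothetical stand-in for a coding agent: replays three fixed attempts."""
    attempts = iter([
        "def add(a, b) return a + b",        # plausible, but does not parse
        "def add(a, b):\n    return a - b",  # parses, wrong behavior
        "def add(a, b):\n    return a + b",  # correct
    ])
    feedback_log: List[Optional[str]] = []

    def propose(feedback: Optional[str]) -> str:
        feedback_log.append(feedback)
        return next(attempts)

    return propose, feedback_log


if __name__ == "__main__":
    propose, feedback_log = scripted_agent()
    print(run_with_gates(propose, [compiles, passes_spec]))
    print(feedback_log)
```

In this toy run the first two attempts are rejected by the gates and only the third is returned, without the "agent" getting any smarter between attempts.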

Anthropic's Playbook for Running Claude Code Across Hundreds of Engineers: Team configuration, permission management, cost control, and the org patterns that actually work when Claude Code goes from one developer's tool to a company-wide platform. Worth reading before your next rollout. (3,996 likes | 402 RTs) Read more →

🏗️ BUILD

Railway's CEO: 3M Users, 100K Signups/Week, and Why PRs Are Dying: Jake Cooper on Latent Space reveals that Railway is seeing $200K+ monthly in coding agent infrastructure spend alone. His thesis: agents don't do pull requests, they deploy directly — and the infrastructure layer needs to be rebuilt for that workflow. If you're building deployment tooling, this is required listening. Read more →

🎓 MODEL LITERACY

Diffusion Language Models: Every major LLM you use today is autoregressive: it generates text one token at a time, left to right, each token dependent on all previous ones. Diffusion language models flip this entirely. They start with noise across the full sequence and iteratively refine all positions simultaneously, like a sculptor revealing a statue from a block of marble. NVIDIA's Nemotron-Labs-Diffusion, released today, shows this isn't just theory; it's shipping. The practical implication: autoregressive generation is inherently sequential (token 50 must wait for tokens 1-49), but diffusion can update many positions in the same pass, so the number of model passes no longer has to grow one-for-one with output length. If it scales, it could dramatically reduce latency and cost for long outputs. The trade-off is that current diffusion LMs still trail autoregressive models on quality benchmarks, though proponents argue the architecture has more headroom.
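
The difference in sequential work can be sketched with a toy decoder. This is an illustration only, not NVIDIA's method: it assumes a masked-token style of diffusion in which the sequence starts fully masked and the most confident positions are committed each step, and `toy_predict` is a hypothetical stand-in for a model forward pass that simply knows the answer.

```python
import random

MASK = "_"
TARGET = list("diffusion")  # stand-in for the text a trained model would produce


def toy_predict(seq, rng):
    """Hypothetical forward pass: a (token, confidence) guess for every position at once."""
    return [(TARGET[i], rng.random()) for i in range(len(seq))]


def autoregressive_decode(n, rng):
    """Left to right: one forward pass per generated token."""
    seq, passes = [], 0
    while len(seq) < n:
        guesses = toy_predict(seq + [MASK], rng)
        seq.append(guesses[-1][0])  # only the next position is used
        passes += 1
    return seq, passes


def diffusion_decode(n, tokens_per_step, rng):
    """Start fully masked; each pass commits the most confident masked positions."""
    seq, passes = [MASK] * n, 0
    while MASK in seq:
        guesses = toy_predict(seq, rng)
        passes += 1
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:tokens_per_step]:
            seq[i] = guesses[i][0]
    return seq, passes


if __name__ == "__main__":
    rng = random.Random(0)
    print(autoregressive_decode(len(TARGET), rng))
    print(diffusion_decode(len(TARGET), 3, rng))
```

For this 9-token target the autoregressive loop needs 9 passes, while the masked decoder committing 3 positions per step needs 3. A real model would trade some quality for each extra position it commits per pass, which the toy does not capture.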

⚡ QUICK LINKS

Mollick: From 'Strawberry' to Solved Conjectures in Two Years: The pace-of-progress framing that matters more than any single result. (621 likes | 101 RTs) Link
BBC: Adversarial Prompt Injection Is Now an SEO Problem: Google's AI search results are being manipulated — and they're fighting back quietly. (244 likes | 171 RTs) Link
Marlin-2B: A 2B-Parameter Video Understanding Model You Can Run Locally: Small enough for edge, capable enough for production video-to-text. (141 likes | 125 downloads) Link
Anthropic Brings Philosophers and Ethicists Into AI Character Formation: Treating alignment as a humanities question, not just a technical one. (235 likes | 33 RTs) Link

🎯 PICK OF THE DAY

A general-purpose model just disproved an 80-year-old math conjecture, and that should change how we think about funding research. OpenAI's model didn't just assist a mathematician or check a proof. It autonomously produced an original result on the planar unit distance problem, a conjecture Erdős posed in 1946 that the field of discrete geometry had not been able to crack. The crucial detail: this wasn't a math-specific model or a domain-tuned solver. It was a general-purpose reasoning system pointed at a hard problem. That distinction matters enormously. If scaled general reasoning, not narrow expert systems, is what produces scientific breakthroughs, it restructures how we should think about funding and organizing research itself. You don't need a protein-folding model and a math model and a chemistry model. You need reasoning that scales. Two years ago these models couldn't count the letters in "strawberry." Now they're generating proofs that extend human mathematical knowledge. The pace isn't slowing down. Read more →

Until next time ✌️