
22 items covered

A 30B Model With 3B Active Parameters Just Won Olympiad Gold in Physics and Math

🧠 LAUNCH

A 30B Model With 3B Active Parameters Just Won Olympiad Gold in Physics and Math.

A new Mixture-of-Experts reasoning model achieves gold-medal performance across both physics and math Olympiad evaluations, with only 3B of its 30B parameters active per forward pass. This isn't brute-force scaling; it's routing efficiency proving that you can match frontier reasoning at a fraction of the inference cost. The MoE efficiency thesis just got its most dramatic validation yet. Download it and benchmark against your own tasks: the compute savings are real. (1,164 likes | 133 RTs) Read more →

NVIDIA Drops SANA-WM: Open-Source World Model for 1-Minute 720p Video. A 2.6B-parameter world model that generates coherent 1-minute 720p videos, open-source and runnable on consumer hardware. NVIDIA is making accessible video generation serious rather than toy-demo quality. (284 likes | 118 RTs) Read more →


💡 INSIGHT

OpenAI Found and Fixed a 48-Hour Capability Regression in GPT-5.5 Codex.

OpenAI publicly acknowledged that GPT-5.5 powering Codex degraded for roughly 48 hours before they identified and patched two separate issues. This is rare transparency on production model stability, and a stark reminder that frontier models aren't static artifacts. If your Codex outputs felt worse this week, they literally were. The fix is live; check your recent completions against the same prompts. (7,467 likes | 494 RTs) Read more →

Frontier AI Has Broken the Open CTF Format, and the Scene Knows It.

Capture the Flag competitions, the training ground that produced a generation of security researchers, are dying. Teams now spend more time prompting models than actually hacking. The write-up from an active CTF competitor is blunt: frontier AI doesn't just help competitors, it fundamentally removes the skill expression that made competitions meaningful. This isn't just a security niche problem: it's the first competitive domain to fully break under AI capability pressure. (329 likes | 308 RTs) Read more →

Gemini Pro Rumored at GPT-5.5 Coding Level, at Half the Price. Google's unreleased Gemini Pro is reportedly matching GPT-5.5 on coding benchmarks at $12/1M output tokens, more than 50% cheaper than OpenAI's pricing. If true, Google's price-performance squeeze continues to compress margins across the entire frontier tier. (951 likes | 28 RTs) Read more →
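For a sense of what the rumored price means in practice, here is a small arithmetic sketch. The $12/1M figure and the "more than 50% cheaper" claim come from the item above; the function name and the 250,000-token workload are illustrative assumptions, and the competitor price is only a lower bound implied by the claim, not a quoted rate.

```python
def output_cost(tokens, price_per_million):
    """Dollar cost of generating `tokens` output tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million

RUMORED_PRICE = 12.0   # dollars per 1M output tokens, as reported above
MIN_DISCOUNT = 0.5     # "more than 50% cheaper"

# If $12 is more than 50% cheaper, the comparison price must exceed this value.
implied_floor = RUMORED_PRICE / (1 - MIN_DISCOUNT)

# Hypothetical workload: 250,000 output tokens.
workload_cost = output_cost(250_000, RUMORED_PRICE)

print(f"Implied comparison price: above ${implied_floor:.2f} per 1M output tokens")
print(f"250k output tokens at the rumored price: ${workload_cost:.2f}")
```

Swapping in your own monthly output-token volume gives a quick estimate of what the rumored rate would cost for your workload.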

Ex-xAI Cofounder Raising $1B at $5B Valuation for River AI Before Building Anything. Igor Babuschkin, formerly of xAI, is raising $1B with General Catalyst leading at a $5B valuation, with no product, no paper, no demo. The talent premium in AI has decoupled entirely from output. This is venture capital pricing pure optionality on pedigree. (209 likes | 20 RTs) Read more →


🔬 RESEARCH

PrimeIntellect Lets Claude Code and Codex Run Autonomously on AI Research Tasks.

PrimeIntellect ran Claude Code (Opus 4.7) and Codex (GPT-5.5) autonomously on actual AI research workflows: not code completion, not summarization, but end-to-end scientific investigation. The results show frontier agents can now handle research tasks that previously required PhD-level human oversight. This is the clearest signal yet that "AI researcher" is becoming a job description for models, not just people. (1,694 likes | 152 RTs) Read more →

Anthropic's Mythos Found 250 Security Vulnerabilities Where Prior Models Found 22. Anthropic's CFO revealed that their unreleased Mythos model discovered 250 security vulnerabilities on the same benchmark where previous frontier models found 22, roughly an 11x multiplier. That gap is precisely why Anthropic hasn't released it yet. The number explains the caution. (82 likes | 17 RTs) Read more →

Energy-Based Models Are Back: LeCun's Structural Verification Thesis Gets Traction. LeCun has argued for years that AI systems need to verify structural consistency before generating outputs. Energy-Based Models, which score plausibility rather than directly generating tokens, are now getting practical implementations that prove the thesis. The autoregressive paradigm isn't the only path forward. (212 likes | 37 RTs) Read more →
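The "score, don't generate" idea can be illustrated with a toy example. In the sketch below, a hand-written energy function assigns low energy to structurally consistent candidates and the lowest-energy candidate wins. Real energy-based models learn the energy function from data; the bracket-matching energy, the function names, and the candidate strings are all assumptions made purely for illustration.

```python
def bracket_energy(text):
    """Toy energy: 0 for balanced brackets, higher for each structural violation."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    energy = 0
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if stack and stack[-1] == pairs[ch]:
                stack.pop()
            else:
                energy += 1  # closing bracket that does not match the last opener
    return energy + len(stack)  # openers left unclosed also cost energy

def pick_lowest_energy(candidates, energy_fn):
    """Select the candidate the energy function considers most plausible."""
    return min(candidates, key=energy_fn)

# Hypothetical candidate outputs, as if proposed by some generator.
candidates = ["f(a[0]", "f(a[0])", "f(a[0)]"]
energies = [bracket_energy(c) for c in candidates]
best = pick_lowest_energy(candidates, bracket_energy)

print(energies)  # [1, 0, 2]
print(best)      # f(a[0])
```

The point of the design is the separation of concerns: one component proposes candidates, and a separate scoring function checks them for consistency before anything is emitted.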

The Second Scaling Law Remains Undefeated: More Thinking Tokens, Better Results, No Plateau. Ethan Mollick confirms what inference-time compute advocates have been arguing: adding thinking tokens consistently improves performance across hacking, math, science, and puzzle-solving with no diminishing returns yet observed. If you're budgeting inference costs, the question isn't whether to spend on thinking tokens, it's how much. (282 likes | 23 RTs) Read more →


🔧 TOOL

Codex Can Now Daisy-Chain Multiple Computers Through ChatGPT. Codex isn't limited to a single machine anymore: you can connect multiple computers and orchestrate across them from one ChatGPT session. This turns Codex from a coding agent into a multi-machine control plane. If you manage multiple dev environments, this is the workflow change you didn't know you needed. (515 likes | 71 RTs) Read more →

HomeClaw Ships CLI + MCP + OpenClaw Plugin for Apple Home Automation. Smart home control just went fully agent-native. HomeClaw exposes your entire Apple Home setup through a CLI, MCP, and an OpenClaw plugin, meaning any coding agent can now manage your lights, scenes, and devices through natural language. (197 likes | 19 RTs) Read more →

Open Code + Qwen 3.6 Plus: A Completely Free Coding Agent Stack. No subscription, no credit card, no API key costs: Open Code paired with Qwen 3.6 Plus during its free preview creates a fully functional coding agent setup at zero cost. If you've been waiting to try agentic coding without financial commitment, this is your window. (26 likes | 5 RTs) Read more →


📐 TECHNIQUE

DeepSeek-V4-Flash Makes Steering Vectors Practical Again. Steering vectors, activation-space vectors you add at inference time to modify model behavior without fine-tuning, were theoretically elegant but impractical on most architectures. DeepSeek-V4-Flash's design makes them work reliably, reopening a whole class of lightweight behavior modification that doesn't require training runs. (199 likes | 67 RTs) Read more →
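To make the mechanics concrete, here is a minimal sketch of the core operation in plain Python: derive a direction from contrasting activations, then add a scaled copy of it to a hidden state at inference time. The mean-difference recipe is one common way to build a steering vector and is an assumption here, not something the item attributes to DeepSeek-V4-Flash; the function names, the three-dimensional toy activations, and the `alpha` value are all hypothetical.

```python
def mean_difference(positive, negative):
    """Steering direction = mean of 'wanted' activations minus mean of 'unwanted' ones."""
    dim = len(positive[0])
    mean_pos = [sum(v[i] for v in positive) / len(positive) for i in range(dim)]
    mean_neg = [sum(v[i] for v in negative) / len(negative) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]

def add_steering_vector(hidden, steering, alpha=1.0):
    """Return hidden + alpha * steering; the model weights are never modified."""
    if len(hidden) != len(steering):
        raise ValueError("hidden state and steering vector must have the same size")
    return [h + alpha * s for h, s in zip(hidden, steering)]

# Toy activations standing in for one layer's hidden states on contrasting prompts.
positive = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
negative = [[0.0, 2.0, 1.0], [0.0, 2.0, 1.0]]

steering = mean_difference(positive, negative)
steered = add_steering_vector([0.5, 0.5, 0.5], steering, alpha=0.5)

print(steering)  # [2.0, 0.0, -1.0]
print(steered)   # [1.5, 0.5, 0.0]
```

In a real model the same addition would be applied to a chosen layer's activations during the forward pass, with `alpha` controlling how strongly behavior shifts.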

Anthropic Drops a 2-Hour Masterclass on Building Claude Agents. Taught by the engineer behind Claude Code, this covers terminal access, memory systems, hooks, and hallucination mitigation: the full stack for building autonomous agents that actually work in production. Two hours of concentrated architecture decisions from someone who shipped it. (118 likes | 16 RTs) Read more →


🏗️ BUILD

Multica: Open-Source Platform That Turns Coding Agents Into Managed Teammates. Assign tasks, track progress, compound skills: Multica treats coding agents like team members with a manager, not isolated tool calls. The open-source managed agent platform has exploded in popularity for teams that want orchestration without building it from scratch. (28,848 likes | 3,494 RTs) Read more →

Zerostack Hits v1.0: A Unix-Philosophy Coding Agent in Pure Rust. Every operation is a composable pipeline, not a monolithic session. Zerostack applies Unix design principles (do one thing well, pipe outputs to inputs) to coding agent architecture. Pure Rust, v1.0 stable, and fast enough to feel native. Run cargo install zerostack to try it. (78 likes | 23 RTs) Read more →

Building a Personal AI Agent: Motivation, Architecture, and What It Actually Means. A practitioner walks through why they built a personal agent, the architecture choices that mattered, and the philosophical implications of delegating daily decisions to a model. Less tutorial, more honest reflection on what personal agents are and aren't good at yet. (1,157 likes | 191 RTs) Read more →


🎓 MODEL LITERACY

Mixture of Experts (MoE) and Active Parameters: Today's 30B-A3B reasoning model hits Olympiad gold using only 3B of its 30B parameters per forward pass. MoE architectures route each input to a small subset of specialized "expert" sub-networks rather than activating every parameter for every token. This slashes inference cost while maintaining the model's total knowledge capacity. It's why a model can be "30B" in stored knowledge but "3B" in compute per query, and why the parameter count on the label is increasingly misleading. When comparing models, ask about active parameters, not total parameters.
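The routing step described above can be sketched in a few lines of plain Python. This is a toy illustration, assuming top-k gating with softmax weights: the eight scalar "experts", the fixed gate scores, and the function names are hypothetical, and in a real model the gate scores come from a learned router applied to each token. Nothing here reflects the actual configuration of the 30B-A3B model beyond the 3B-of-30B ratio quoted above.

```python
import math

def softmax(scores):
    """Convert raw gate scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_forward(x, experts, gate_scores, k):
    """Run only the selected experts and mix their outputs by gate weight."""
    weights = route_top_k(gate_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Hypothetical toy setup: 8 experts, each a simple scaling function.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
gate_scores = [0.1, 2.0, 0.3, 1.5, 0.0, 0.2, 0.1, 0.4]

selected = route_top_k(gate_scores, k=2)
output = moe_forward(3.0, experts, gate_scores, k=2)
active_fraction = 3e9 / 30e9  # 3B active of 30B total, from the item above

print(sorted(selected))            # [1, 3]
print(f"{active_fraction:.0%}")    # 10%
```

Only two of the eight experts run for this input, which is the source of the compute savings: the other six contribute stored capacity but no work on this forward pass.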


⚡ QUICK LINKS

  • Cerebras IPOs at $60B: AI chips get a public comparator beyond NVIDIA as wafer-scale meets market scrutiny. Link
  • Claude API Grey Markets: Oxford researchers find Chinese resellers offering Claude access at 10% of list price. (65 likes) Link
  • Warelay → OpenClaw: Simon Willison covers the rebrand of the AI assistant platform. Link
  • Microsoft's Free AI Agents Course: 15 lessons covering agentic RAG, multi-agent, MCP, and A2A, with code and video. (541 likes | 94 RTs) Link

🎯 PICK OF THE DAY

The death of CTF isn't a security story; it's a preview of every competitive skill domain. When frontier AI models can solve Capture the Flag challenges faster than human teams can read the problem statement, the competition isn't testing human skill anymore; it's testing prompt engineering speed. But the deeper signal is existential: CTF built an entire professional community around the identity of being the person who could break things that others couldn't. That identity is dissolving. The communities that grew up around competitive hacking, competitive math, and competitive programming all face the same fork. Gatekeep AI out and become irrelevant niche hobbies, or reinvent the format around human-AI collaboration and lose the thing that made them culturally distinctive. CTF is just the first domino because security tasks are perfectly structured for AI iteration loops. Every domain where "practice makes perfect" meets "AI iterates faster than you practice" is next. Read more →


Until next time ✌️