
22 items covered

🧠 LAUNCH

Mistral Small 4: 128 Experts, 256K Context, Configurable Reasoning.

Mistral just shipped its new MoE flagship — 128 experts, 119B total parameters, 256K context window, and configurable reasoning that lets you dial between speed and depth per query. The expert count alone is wild: most MoE models top out at 8-16 experts, so 128 is a fundamentally different architecture bet. With open weights and a context window that rivals frontier proprietary models, this is the most serious open-weight contender for agentic workflows since Mixtral. Download the weights and benchmark against your current stack. (2,576 likes | 324 RTs) Read more →

Google AI Studio Goes Full-Stack Vibe Coding.

Google AI Studio now supports full-stack multiplayer app building — complete with an Antigravity agent and Firebase backends. You can build real-time games and tools entirely from prompts, with the agent handling both frontend and infrastructure. This isn't a demo; it's Google's play to make AI Studio the default environment for prompt-to-production workflows. (2,116 likes | 226 RTs) Read more →

Google Stitch graduates from Labs into a full AI design canvas — multimodal input (text, images, code), a context-aware design agent, and production-ready front-end code output. Google is closing the design-to-code gap by letting designers go straight from design to shipped code, skipping the Figma-to-dev handoff step. (1,882 likes | 208 RTs) Read more →

Chandra OCR 2 takes the open-source OCR crown with 85.9% on the olmOCR benchmark, beating the previous SOTA. If your document parsing pipeline still relies on proprietary APIs, the open alternative just got harder to ignore. (269 likes | 30 RTs) Read more →

MiniMax 2.7 matches GLM-5 performance at one-third the cost — a new SOTA for cost-efficient open models. Chinese labs continue closing the frontier gap at a fraction of the price, and the cost curve shows no signs of flattening. Read more →


🔧 TOOL

Dispatch Bridges Your Phone to Claude Code Sessions.

Dispatch now launches Claude Code sessions directly — ask it to build something from your phone, come back to a working project on your desktop. This closes the loop between mobile AI assistant and full coding agent: you can sketch out an idea on your commute and have working code waiting when you sit down. The persistent session bridging is the real unlock here. (2,238 likes | 118 RTs) Read more →

DESIGN.md is a new portable, agent-readable design system standard — your coding agent reads your design system while building. With an MCP server connecting to Claude Code, Cursor, and Gemini CLI, this is the interop story that makes multi-agent design-to-code actually work. Add a DESIGN.md to your project. (1,743 likes | 107 RTs) Read more →

ElevenLabs MCP server has 11K users generating speech, sound effects, and music directly inside Claude. MCP adoption is real, and media generation is becoming a first-class agent capability — not a separate workflow. (297 likes | 29 RTs) Read more →


πŸ“ TECHNIQUE

Qwen 3.5 397B Running on a Mac at 5.7 Tokens/Sec.

A 209GB MoE model runs on an M3 Mac by quantizing and streaming weights from SSD at 17GB/s. Qwen 3.5 397B needs only 5.5GB of active memory because MoE's sparse activation means you load just the active expert subset — the rest stays on disk. This fundamentally redefines what "runs locally" means: model size no longer equals memory required. Try the SSD-streaming approach with your own MoE setups. (1,578 likes | 150 RTs) Read more →
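
To build intuition for why active memory can be so much smaller than the weight file, here is a back-of-envelope sketch. Every number except the 209GB file size is an assumption invented for the arithmetic (expert count, experts active per token, size of the shared non-expert weights), not Qwen's published config:

```python
# Back-of-envelope for why sparse MoE activation shrinks the resident
# working set. Only total_gb comes from the post; n_experts, active_k,
# and shared_gb are illustrative assumptions, not Qwen's real config.

total_gb  = 209    # quantized weight file streamed from SSD (from the post)
n_experts = 128    # assumed number of experts
active_k  = 8      # assumed experts activated per token
shared_gb = 2.0    # assumed always-resident weights (attention, embeddings)

expert_gb = (total_gb - shared_gb) / n_experts     # size of one expert slice
active_gb = shared_gb + active_k * expert_gb       # resident set per token

print(f"{active_gb:.1f} GB active of {total_gb} GB total")
```

Plug in a model's real expert count and top-k to estimate its actual resident footprint; the gap between total and active is what SSD streaming exploits.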

Intercom's Claude Code setup is the most detailed enterprise agent customization case study yet — 13 plugins, 100+ skills, and hooks for deterministic guardrails. The pattern: use hooks to enforce coding standards automatically, skills for domain-specific workflows, and plugins for tool integrations. Study this if you're scaling coding agents across teams. (1,813 likes | 111 RTs) Read more →
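
For a flavor of the hooks-for-guardrails pattern, here is a minimal sketch of a Claude Code settings fragment that runs a formatter after every file edit. Treat the event name and schema as an approximation to check against the current docs, and the prettier command as a placeholder for whatever standard you want enforced:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx prettier --write ." }
        ]
      }
    ]
  }
}
```

Because the hook fires deterministically on every matching tool call, the standard holds even when the model forgets to apply it.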

Local coding agents are now practical with small models. The latest generation of small models is capable enough to run coding agents locally — privacy-preserving, zero-cost agent workflows for standard development tasks. If you're already using Claude Code or Codex, running agents locally is the next step for tasks that don't need frontier reasoning. (1,149 likes | 128 RTs) Read more →


🔬 RESEARCH

LeCun proposes a cognitive science-inspired AI architecture: LeCun, Dupoux, and Malik lay out an alternative to pure scaling — a biologically-grounded learning approach that draws from cognitive science. Whether or not it pans out, it's the most concrete "there's another path" proposal from a Turing Award winner this year. (520 likes | 98 RTs) Read more →

NVIDIA's SPEED-Bench finally standardizes speculative decoding evaluation — the technique that makes LLM inference 2-3x faster by drafting with a small model and verifying with a large one. Until now, every paper used different benchmarks, making comparisons meaningless. This fixes that. Read more →

Neural cellular automata as a pre-pretraining step for LLMs: A novel approach that uses NCA to bootstrap language model training. If the results hold, this could reduce compute costs during the expensive early phase of pretraining — the phase where most tokens are "wasted" on basic pattern learning. (82 likes | 16 RTs) Read more →


💡 INSIGHT

Astral (ruff, uv) Is Joining OpenAI.

Astral — the team behind ruff and uv, the fastest Python linter and package manager in the ecosystem — is being acquired by OpenAI. This isn't a model play; it's an infrastructure play. OpenAI is betting that owning the developer toolchain creates stickier lock-in than any API contract. The Python ecosystem should be watching closely for roadmap changes. (804 likes | 496 RTs) Read more →

AI slop PRs are making major open-source repos unusable. HuggingFace's CEO reports a new AI-generated PR landing every couple of minutes on their biggest repos — most of them useless, all of them consuming maintainer time. The cost of zero-friction AI code generation is being externalized onto the people who maintain the code we all depend on. (1,170 likes | 98 RTs) Read more →

Maintainers are prompt-injecting their own CONTRIBUTING.md to detect and block AI-generated PRs. It's a creative defense — and a sign of how desperate the slop problem has become. When your contribution guidelines need adversarial prompt engineering, something has gone very wrong. (43 likes | 16 RTs) Read more →


πŸ—οΈ BUILD

NVIDIA Nemotron 3 Nano is a 4B hybrid model optimized for on-device deployment. At this size it runs on phones and edge hardware — practical for anyone building local-first AI features without cloud dependency. If you're shipping AI to constrained devices, this is your starting point. Read more →


🎓 MODEL LITERACY

Mixture of Experts (MoE) — Active vs. Total Parameters: Mistral Small 4 has 119B total parameters but activates only a fraction per token via its 128 experts. Each input token gets routed to a small subset of "expert" sub-networks, while the rest sit idle. This is also why Qwen 3.5 397B runs on a Mac with only 5.5GB of active memory — MoE's sparse activation means "model size" no longer equals "memory required" or "compute per token." When you see a 397B-parameter MoE model, it's not comparable to a 397B dense model — the effective compute is far lower. Understanding this distinction is critical for evaluating any MoE benchmark claim: a bigger number doesn't necessarily mean a smarter or more expensive model.
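
The routing mechanism behind this fits in a few lines. Here is a minimal sketch with illustrative toy numbers (8 experts, top-2, scalar functions standing in for expert FFNs and a seeded pseudo-random router standing in for a learned one; none of this is Mistral's real architecture):

```python
import math
import random

E, K = 8, 2  # 8 experts total, 2 active per token (toy numbers)

def router_scores(token, n_experts):
    # Stand-in for a learned router: deterministic pseudo-scores per token.
    rng = random.Random(token)
    return [rng.random() for _ in range(n_experts)]

def expert(i, x):
    # Stand-in for expert i's feed-forward network.
    return math.tanh(x + i)

def moe_layer(token, x):
    scores = router_scores(token, E)
    # Pick the top-K scoring experts for this token.
    topk = sorted(range(E), key=lambda i: scores[i], reverse=True)[:K]
    # Softmax-normalize over just the selected experts' scores.
    exps = [math.exp(scores[i]) for i in topk]
    z = sum(exps)
    # Only K of E experts ever execute; the other E-K cost nothing.
    y = sum((w / z) * expert(i, x) for w, i in zip(exps, topk))
    return y, topk

y, used = moe_layer(token=42, x=0.5)
print(used)  # 2 expert ids out of 8; the other 6 stay idle for this token
```

The toy preserves the key point: per token, 6 of the 8 experts contribute zero compute, which is why active parameters, not total parameters, drive per-token cost and memory.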


⚡ QUICK LINKS

  • Code with Claude: Anthropic's developer conference goes global — SF, London, and Tokyo this spring. (7,556 likes | 851 RTs) Link
  • Gemini 3.0 stumbles: Most users still stuck on 2.5 as competitors ship rapidly. (824 likes | 45 RTs) Link
  • Scaling Autoresearch: What happens when you give Karpathy's research agent a GPU cluster. (18 likes) Link
  • AlphaFold Database: DeepMind's proof case for AI-accelerated science — now foundational infrastructure for biology worldwide. (1,724 likes | 275 RTs) Link

🎯 PICK OF THE DAY

OpenAI buying Astral isn't about ruff or uv — it's about owning the developer surface area. When your toolchain vendor is also your model vendor, the lock-in runs deeper than any API contract. Astral built the fastest Python linter and package manager in the ecosystem — tools that millions of developers run dozens of times per day. That's not just distribution; it's habit. OpenAI already has ChatGPT for chat, Codex for coding agents, and now the tools that set up every Python project. The playbook is clear: surround developers with OpenAI-owned touchpoints at every stage of the workflow, from uv init to deployment. For the Python community, the key question is whether ruff and uv stay genuinely open or start accumulating OpenAI-specific integrations. For everyone else, it's a reminder that the AI wars aren't just about model benchmarks anymore — they're about who owns the pipes. Read more →


Until next time ✌️