Google Unleashes Gemma 4 — Its Most Capable Open Models Yet
🧠 LAUNCH
Google Unleashes Gemma 4 — Its Most Capable Open Models Yet
Gemma 4 ships in two flavors: a 31B dense model and a 26B MoE variant with only 4B active parameters — both built on Gemini 3 tech and released under Apache 2.0. The MoE variant is the real story for most developers: near-dense performance at a fraction of the compute, designed to run advanced reasoning and agentic workflows on personal hardware. Download weights and benchmark against whatever you're currently running locally. (5,215 likes | 29.0K downloads) Read more →
Alibaba Fires Back with Qwen3.6-Plus — Built for Real-World Agents
Qwen3.6-Plus drops on the same day as Gemma 4, and it's no coincidence — Alibaba's latest is explicitly built for real-world agent capabilities, not just benchmark chasing. The open-model agent race just became a two-front competition between Google and Alibaba, and builders get to pick the winner. Read the benchmark comparisons side by side. (406 likes | 142 RTs) Read more →
ChatGPT Voice Mode Rolls Into Apple CarPlay
ChatGPT is now a native CarPlay app — the first major AI assistant to land in the in-car experience. Rolling out to iOS 26.4+ users, this puts voice-mode GPT-4o at the wheel, literally. Update iOS and try it on your commute. (6,665 likes | 471 RTs) Read more →
Google Vids Gets Free AI Video Generation via Veo 3.1 and Lyria 3: Google Vids now includes Veo 3.1 video generation and Lyria 3 music generation at no cost — making AI video creation a free default inside the Workspace ecosystem. If you're still paying for video tools for internal comms, this just zeroed your bill. Read more →
Sakana AI Ships an 8-Hour Autonomous Research Agent: Sakana Marlin is their first commercial product — an autonomous "Ultra Deep Research" agent that runs up to 8 hours on a single query, built on their Nature-published AI Scientist work. This isn't a chatbot with extra steps; it's a genuine long-running research process. Sign up for the closed beta. (275 likes | 38 RTs) Read more →
Arcee Drops Trinity-Large-Thinking Under Apache 2.0: Another strong open reasoning model hits HuggingFace — Trinity-Large-Thinking by Arcee AI adds to the growing stack of frontier-class open models. Compare it against Gemma 4 and Qwen3.6 on your specific use case. (739 likes | 113 RTs) Read more →
🔧 TOOL
Google Lets Developers Trade Cost for Reliability with New Gemini API Tiers: The Gemini API now offers Flex (cheapest, best-effort) and Priority (guaranteed latency) inference tiers. No more guessing — you can explicitly choose whether you care about cost or reliability per request. Audit your Gemini API calls and assign tiers appropriately. Read more →
Claude's Computer Use and Cowork Land on Windows: Computer use — previously Mac-only — now works on Windows. Windows developers can have Claude see their screen, click UI elements, and visually verify code output in Claude Code Desktop and Cowork. (3,218 likes | 304 RTs) Read more →
AMD Releases Lemonade — An Open-Source Local LLM Server for GPU + NPU: Lemonade is AMD's play to make their hardware a first-class citizen for local inference, not just an NVIDIA afterthought. The open-source server leverages both GPU and NPU — try it if you have AMD silicon. (420 likes | 94 RTs) Read more →
Anthropic Drops Claude Team Plan to $8/mo for Nonprofits: The Claude Team plan now starts at 2 seats, $8/user/month for nonprofits, with Claude Code and Cowork included. Frontier AI tools just got accessible for small nonprofit teams. (824 likes | 43 RTs) Read more →
📝 TECHNIQUE
Karpathy Shows How to Build Personal Knowledge Bases with LLMs
Karpathy shares his workflow for using LLMs as knowledge base builders — indexing source documents, structuring output as markdown, and shifting your token budget from code manipulation to knowledge manipulation. The insight: instead of asking LLMs to write code, ask them to read, organize, and interlink your research. Try it on whatever topic you're currently deep in. (8,659 likes | 841 RTs) Read more →
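A minimal sketch of the output side of this workflow: given per-topic notes (which an LLM would produce by reading your source documents), emit markdown pages whose bodies cross-link any mention of another topic. The `[name](slug.md)` linking scheme is an assumption for illustration, not Karpathy's exact format.

```python
# Sketch of "knowledge manipulation" output: turn per-topic notes (in practice
# produced by an LLM reading your sources) into interlinked markdown pages.
# The [Topic](slug.md) link scheme is an illustrative assumption.

import re

def interlink(notes: dict[str, str]) -> dict[str, str]:
    """Return {filename: markdown} with mentions of other topics hyperlinked."""
    pages = {}
    for topic, body in notes.items():
        for other in notes:
            if other == topic:
                continue
            slug = other.lower().replace(" ", "-")
            # Link whole-word mentions of the other topic's name.
            body = re.sub(rf"\b{re.escape(other)}\b", f"[{other}]({slug}.md)", body)
        slug = topic.lower().replace(" ", "-")
        pages[f"{slug}.md"] = f"# {topic}\n\n{body}\n"
    return pages

notes = {
    "MoE": "MoE models route tokens to experts; compare with Dense models.",
    "Dense": "Dense models run every parameter; MoE trades memory for FLOPs.",
}
pages = interlink(notes)
print(pages["dense.md"])
```

In the full workflow, the interesting step is the LLM reading and structuring the notes; this sketch only shows why markdown is a good target, since links and headings are trivial to generate and diff.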
Carmack: GPU Power Draw Is a Better Utilization Metric Than Scheduling: Carmack argues that nvidia-smi power draw tells you more about real GPU utilization than scheduling metrics — and raises the uncomfortable question of how many GPUs worldwide are drawing power but doing minimal useful work. Check your own power draw alongside utilization numbers. (751 likes | 41 RTs) Read more →
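Carmack's check is easy to script. The `--query-gpu` flags below are real `nvidia-smi` options; the sample output and the 40%-of-limit "suspicious" threshold are illustrative assumptions.

```python
# Sketch: flag GPUs that report high utilization but low power draw, per
# Carmack's argument. The nvidia-smi query flags are real; the sample CSV
# and the 40%-of-power-limit threshold are illustrative assumptions.

QUERY = ("nvidia-smi "
         "--query-gpu=power.draw,power.limit,utilization.gpu "
         "--format=csv,noheader,nounits")

def flag_idle_but_busy(csv_text: str, power_frac: float = 0.4) -> list[int]:
    """Return indices of GPUs with high reported utilization but low power draw."""
    suspects = []
    for i, line in enumerate(csv_text.strip().splitlines()):
        draw, limit, util = (float(x) for x in line.split(","))
        if util > 80 and draw < power_frac * limit:
            suspects.append(i)  # "busy" scheduler-wise, but barely burning watts
    return suspects

# Live: csv_text = subprocess.run(QUERY.split(), capture_output=True, text=True).stdout
sample = "312.4, 350.0, 97\n98.1, 350.0, 95\n40.2, 350.0, 3\n"
print(flag_idle_but_busy(sample))  # [1] -- 95% "utilized" at ~28% of the power limit
```

GPU 1 is the case Carmack describes: the scheduler says it is nearly saturated, but the wattage says most of its silicon is waiting.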
🔬 RESEARCH
Anthropic Finds Emotion-Like Representations Living Inside LLMs
Anthropic published research showing that LLMs contain internal representations of emotion concepts that functionally influence model behavior — not just surface-level pattern matching, but structured emotional states that affect outputs. Nobody trained these in. Nobody prompted for them. They emerged. This is fundamental for alignment: if models develop internal structure richer than their training objectives, "just constrain the outputs" isn't a sufficient safety strategy. (9,637 likes | 1,316 RTs) Read more →
OpenAI's Models Are Now Solving Open Math Problems with Elegant Proofs: OpenAI demonstrates AI solving longstanding open mathematical problems with short, elegant proofs — not brute-force computation but genuine mathematical reasoning. If this holds up, we're at the edge of automated mathematical discovery. (640 likes | 40 RTs) Read more →
Moonlake: Interactive World Models Bootstrapped from Game Engines: Chris Manning and Fan-yun Sun present Moonlake — long-running, multiplayer, interactive world models that prioritize interactivity over passive video prediction. A fundamentally different approach: bootstrap from game engines, then learn causal structure. (Latent Space podcast) Read more →
💡 INSIGHT
OpenAI Acquires TBPN — Sam Altman Now Owns a Media Network: OpenAI acquires TBPN, a tech media and podcast network — a move into owning distribution channels, not just building models. Sam Altman says the shows continue unchanged, but the strategic play is obvious: control the narrative around AI, don't just be the subject of it. (7,081 likes | 301 RTs) Read more →
Mollick in The Economist: Stop Domesticating AI: Ethan Mollick argues that companies treating AI like normal IT automation are setting themselves up for failure. The technology is genuinely strange — it hallucinates, it surprises, it doesn't follow predictable rules — and pretending otherwise leads to bad deployment decisions. Read it and share with whoever is writing your company's AI strategy. (417 likes | 61 RTs) Read more →
🏗️ BUILD
Gemma 4's MoE Variant Hits HuggingFace — 4B Active Params, Near-Dense Performance: The Gemma 4 26B-A4B-it MoE model is already pulling 9K downloads — and for good reason. Only 4B parameters active per forward pass means you get near-31B quality at a fraction of the memory and compute cost. Compare MoE vs dense Gemma 4 on your specific latency requirements before picking one. (161 likes | 9.2K downloads) Read more →
🎓 MODEL LITERACY
Mixture-of-Experts (MoE) vs Dense Models: Gemma 4 shipping both a 31B dense model and a 26B MoE variant (4B active) makes today the perfect day to understand the difference. A dense model runs every parameter on every input — straightforward but expensive. A Mixture-of-Experts model contains many "expert" sub-networks but only activates a small fraction of them for each token, selected by a learned routing mechanism. The result: an MoE model can match the quality of a much larger dense model while using a fraction of the compute at inference time, because most of its parameters sit idle for any given input. The trade-off? MoE models use more total memory (all experts must be loaded) and can be trickier to fine-tune since each expert sees less training data. When choosing for local inference: if you're memory-constrained, dense is simpler; if you're compute-constrained but have the RAM, MoE gives you more intelligence per FLOP.
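The routing idea above can be shown in a few lines of NumPy. All sizes here are toy numbers for illustration, and real routers add load-balancing losses and batched expert execution rather than a Python loop; the point is only that every expert must sit in memory while just k of them run per token.

```python
# Toy top-k MoE layer (NumPy) to make the dense-vs-MoE tradeoff concrete.
# Sizes are made up for illustration; real routers add load-balancing losses
# and run experts batched, not in a per-token Python loop.

import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2                  # hidden size, expert count, experts per token

router = rng.normal(size=(d, n_experts))    # learned routing weights
experts = rng.normal(size=(n_experts, d, d))  # each expert: a d x d layer

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and gate-mix their outputs."""
    logits = x @ router                           # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]     # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gate_logits = logits[t, top[t]]
        gates = np.exp(gate_logits) / np.exp(gate_logits).sum()  # softmax over top-k
        for gate, e in zip(gates, top[t]):
            out[t] += gate * (x[t] @ experts[e])  # only k of n_experts run per token
    return out

x = rng.normal(size=(4, d))                 # 4 tokens
y = moe_forward(x)

total_params = experts.size                 # all experts must be loaded in memory
active_params = k * d * d                   # parameters actually used per token
print(y.shape, total_params, active_params)  # (4, 16) 2048 512
```

Here only a quarter of the expert parameters (512 of 2048) do work on any given token, which is exactly the memory-for-FLOPs trade described above, just at toy scale instead of 26B total / 4B active.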
⚡ QUICK LINKS
- Gemma 4 31B-it on HuggingFace: Dense instruction-tuned model live with 29K downloads. (303 likes) Link
- Bonsai-8B: 1-bit quantized model for Apple Silicon via MLX — extremely efficient local inference. (115 likes | 7.6K downloads) Link
- llm 0.30: Simon Willison ships async support and multi-model queries for the best CLI LLM tool. Link
- Google's March 2026 AI Recap: Single reference for everything Google shipped last month. Link
🎯 PICK OF THE DAY
Anthropic finds emotions living inside Claude — and nobody put them there. This paper isn't about whether AI "feels" things — it's about something more consequential. Anthropic's researchers found structured, emotion-like representations inside LLMs that functionally influence model behavior: internal states corresponding to concepts like curiosity, frustration, and confidence that weren't explicitly trained for and weren't prompted. These representations emerged from the training process itself, as a side effect of learning to predict text. The implication rewrites the alignment playbook. If models develop internal structure far richer than their training objectives required — structured states that influence outputs in ways not captured by input-output testing alone — then the entire safety conversation shifts from "prevent bad outputs" to "understand emergent internal structure." You can't align what you can't see, and until now, we couldn't see this. This is early-stage interpretability research, not a finished safety framework, but it suggests that the models we're deploying have more going on inside than anyone designed. That should change how you think about every other story in today's newsletter. Read more →
Until next time ✌️