Gemma 4 12B ships encoder-free multimodal with open weights

🧠 LAUNCH

Gemma 4 12B ships encoder-free multimodal with open weights.

Google DeepMind drops Gemma 4 12B — a unified decoder-only transformer that processes text, images, and video without a separate vision encoder, CLIP adapter, or modality-specific heads. Open weights, runs locally, and already supported in Transformers v5.10.1 and Ollama on day one. The encoder-free design isn't just elegant — it means one architecture to fine-tune, one attention space where the model reasons across modalities natively, and a significantly smaller deployment footprint. If you're building multimodal pipelines, download it and benchmark against your current stack. (8,272 likes | 1,099 RTs) Read more →

Ideogram 4.0 goes open-weight in a major strategy reversal.

Ideogram releases v4.0 as fully downloadable, fine-tunable open weights and claims it is the best open image generation model available. This is a hard pivot from Ideogram's previous closed-only approach, and it matters: you can now own the weights, run them on your infra, and fine-tune without API dependencies. If the claim holds up, the open-source image generation space just got its most capable contender. (4,186 likes | 462 RTs) Read more →

OpenAI expands GPT-Rosalind for enterprise life sciences. GPT-Rosalind, a model series purpose-built for life sciences research that combines agentic coding with domain-specific drug discovery and experimental workflow intelligence, gets new capabilities. If your team runs computational biology or pharma pipelines, this is worth evaluating. (1,753 likes | 173 RTs) Read more →

Anthropic formalizes its partner ecosystem with a services track. The new Claude Partner Network introduces a services track and partner hub — signaling Anthropic's transition from product-only to platform play with certified implementation partners. If you're a consultancy building on Claude, this is your on-ramp. Read more →

🔬 RESEARCH

Anthropic maps a year of real AI-enabled attacks to MITRE ATT&CK.

Anthropic examined 832 malicious accounts and mapped their activity onto the MITRE ATT&CK framework — producing the most comprehensive public dataset on how attackers actually use AI in practice. The findings are striking: real-world AI misuse clusters heavily around reconnaissance and social engineering, not the autonomous weapon scenarios dominating policy debates. If you're on a security team, this dataset should reshape your threat model. (553 likes | 71 RTs) Read more →

Microsoft MAI tech report reveals zero synthetic data training. swyx highlights that Microsoft's MAI technical report is unusually transparent — the model uses zero synthetic data for training. At a time when most frontier labs lean heavily on synthetic data pipelines, this is a deliberate and revealing methodological choice. One of the most detailed training reports at this scale. (1,839 likes | 229 RTs) Read more →

What is mid-training and why it explains model divergence. HuggingFace publishes a clear explainer on mid-training — the increasingly critical stage where base models are continued on curated domain-specific data before RLHF. Understanding this stage explains why models like Gemma 4 and MAI perform differently despite similar parameter counts and architectures. (384 likes | 50 RTs) Read more →

Axiom Math pushes toward provably correct AI outputs. Latent Space interviews Axiom Math on verified generation — the idea that formal verification can make AI outputs provably correct, not just probably correct. A glimpse at where reliability-critical AI is headed, especially for domains where "mostly right" isn't good enough. Read more →

📝 TECHNIQUE

Inside Claude Code's skills architecture: the missing manual.

Anthropic engineers explain how skills work internally in Claude Code — reusable, composable agent behaviors that persist across sessions and projects. The post covers the full architecture: how skills are discovered, loaded, and composed into workflows. If you're building Claude Code automations and wondered why some setups just work better, this is the doc you were missing. Read more →

Anthropic replaces internal dashboards with conversational analytics. Anthropic's own teams now use Claude for self-service data queries instead of maintaining traditional dashboards — a dogfooding case study with practical patterns for teams that want to kill their BI tool subscriptions. The key insight: natural language queries against structured data beat point-and-click dashboards when the questions change faster than the dashboards can be updated. Read more →

🔧 TOOL

Claude Cowork gets its official multi-agent playbook. This is the official best practices guide for Claude Cowork, the agent tab that spawns sub-agents and loops until tasks complete. It covers practical patterns for multi-agent delegation: when to fan out, how to scope sub-tasks, and what to keep in the main loop. If you've been guessing at Cowork patterns, start here. Read more →

Transformers v5.10.1 ships native Gemma 4 support. HuggingFace Transformers adds first-class Gemma 4 12B support including multi-token prediction — the framework plumbing that makes today's Gemma 4 launch immediately usable in production pipelines. Update and go. Read more →

Claude Code v2.1.162 adds agent visibility and effort persistence. The new release adds waitingFor (which shows what's blocking your agent), makes /effort persist across sessions, and gets the Grep/Glob tools working correctly on native builds. It's a small release with meaningful quality-of-life improvements for daily Claude Code users. Read more →

💡 INSIGHT

Uber's $1,500/month AI cap is the pricing signal everyone needed. Uber caps per-employee AI tool spend at $1,500/month — the first public data point from a major tech company on what they think AI coding tools are actually worth per seat. Every AI tool vendor just got a ceiling to price against, and every procurement team just got a benchmark to negotiate with. (452 likes | 42 RTs) Read more →

Claude Mythos already beat superforecasters' year-end predictions. Mollick flags that Claude Mythos hit the 3–4 hour METR task horizon in May — seven months ahead of the best superforecaster predictions for end-of-2026 AI agent capability. When your expert forecasters are consistently too conservative, the error bars on "what AI can do next year" are wider than anyone's planning for. (351 likes | 29 RTs) Read more →

Ted Chiang makes his definitive case against AI consciousness. The most respected voice at the intersection of AI and literature argues in The Atlantic that current AI systems are not conscious — and that the question itself is being asked wrong. Whether you agree or not, this essay will shape how the non-technical public thinks about AI for years. Required reading for anyone who has to explain AI to a board, a regulator, or a dinner party. (159 likes | 243 RTs) Read more →

Meta builds a $200/month consumer vibe-coding agent called 'Hatch'. Internal docs reveal Meta is building Hatch: you describe what you want in plain language and Hatch builds it, with consumer pricing of up to $200/month. Another major lab is betting that the future of software isn't writing code, it's describing outcomes. The consumer AI agent market is getting crowded fast. (84 likes | 6 RTs) Read more →

🏗️ BUILD

Gemma 4 12B instruction-tuned model live on HuggingFace. The IT variant is already the fastest-trending model on HuggingFace today, and GGUF quantizations from Unsloth are available for local deployment. If you want to run Gemma 4 locally without waiting for official quantization releases, Unsloth has you covered. (156 likes | 463 downloads) Read more →

🎓 MODEL LITERACY

Encoder-Free Multimodal Architecture: Most multimodal AI models use a two-part design — a vision encoder (like CLIP) processes images into embeddings, then a language model reasons over those embeddings alongside text. Gemma 4 throws this out entirely: one decoder-only transformer handles text, images, and video through the same attention mechanism with no separate encoder. Why does this matter? Fewer moving parts means easier fine-tuning (one model to train, not two), a smaller deployment footprint (no encoder weights to serve), and a single unified attention space where the model reasons across modalities natively instead of stitching together representations from architecturally different components. The trade-off is that the decoder must learn visual understanding from scratch rather than inheriting it from a pre-trained encoder — which is why this approach only becomes viable at sufficient scale and data.
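
To make the "same attention mechanism" idea concrete, here is a toy sketch of how an encoder-free model might feed both modalities into one sequence. It is not Gemma 4's implementation: the item above does not say how Gemma 4 turns pixels into tokens, so the single linear patch projection, the sizes, the tiny vocabulary, and every name below are assumptions chosen for illustration.

```python
import random

EMBED_DIM = 8  # hypothetical shared embedding width
PATCH = 2      # hypothetical patch side length, in pixels
VOCAB = {"what": 0, "is": 1, "this": 2, "?": 3}  # toy vocabulary

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    """Stand-in for learned weights: a rows x cols matrix of random floats."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Text path: an embedding table with one row per token id.
token_table = rand_matrix(len(VOCAB), EMBED_DIM)
# Image path: one linear projection from raw patch pixels (assumed here,
# in place of a separate pre-trained vision encoder).
patch_proj = rand_matrix(PATCH * PATCH, EMBED_DIM)


def embed_text(words):
    """Look up one embedding vector per word."""
    return [token_table[VOCAB[w]] for w in words]


def embed_image(image):
    """Cut a 2-D grayscale image into PATCH x PATCH tiles and project each one."""
    embeddings = []
    for top in range(0, len(image), PATCH):
        for left in range(0, len(image[0]), PATCH):
            pixels = [image[top + r][left + c]
                      for r in range(PATCH) for c in range(PATCH)]
            embeddings.append([
                sum(p * patch_proj[i][d] for i, p in enumerate(pixels))
                for d in range(EMBED_DIM)
            ])
    return embeddings


image = [
    [0.0, 0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6, 0.7],
    [0.8, 0.9, 1.0, 0.9],
    [0.8, 0.7, 0.6, 0.5],
]

# One flat sequence: image patches and text tokens share the same width, so a
# single decoder stack could attend over all of them together.
sequence = embed_image(image) + embed_text(["what", "is", "this", "?"])
print(len(sequence), len(sequence[0]))  # 8 8
```

The 4x4 image yields four patch vectors and the prompt yields four token vectors, all of width 8. In an encoder-based design, the image half of that sequence would instead come out of a second model with its own weights to train and serve.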

⚡ QUICK LINKS

Anthropic expands Glasswing to 150 more orgs: Claude Mythos Preview access widens across 15+ countries. (3,136 likes | 328 RTs) Link
OpenAI Python SDK v2.41.0: Built-in moderation endpoints for Responses and Chat Completions APIs. Link
DPO beyond chatbots: Practical guide to applying Direct Preference Optimization to code generation, summarization, and structured output. Link
Sam Altman backs new AI Executive Order: Calls the balance between frontier development and safety "right" — unusually unified industry consensus. (2,463 likes | 148 RTs) Link
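
For the DPO item above, the objective being applied is worth seeing once. This is a minimal sketch of the standard per-pair DPO loss, not code from the linked guide. It assumes you already have summed log-probabilities of a preferred and a rejected response under both the policy and a frozen reference model; the function and parameter names are hypothetical.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_margin - rejected_margin)))."""
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow for large |x|.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# The policy favors the chosen response more than the reference does,
# so the loss falls below log(2), its value when the two margins are equal.
print(dpo_loss(-1.0, -5.0, -2.0, -2.0) < math.log(2))  # True
```

Nothing in the formula is specific to chat. For code generation, summarization, or structured output, the pairs are simply a better and a worse completion for the same input.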

🎯 PICK OF THE DAY

The actual AI threat surface looks nothing like the one dominating policy debates. Anthropic's analysis of 832 real-world malicious accounts, mapped onto MITRE ATT&CK, is the most rigorous public accounting of how attackers actually use AI — and the gap between perceived and real threats is jarring. While policy conversations fixate on autonomous weapons and self-replicating agents, the data shows attackers overwhelmingly use AI for mundane but effective work: drafting phishing emails, generating social engineering scripts, automating reconnaissance. The sophisticated doomsday scenarios get the headlines; the actual damage comes from AI making existing attack playbooks faster and cheaper. This matters because security teams allocating resources based on hypothetical risks are defending against the wrong threat model. Every CISO should read this report, recalibrate their priorities, and ask a hard question: are we defending against what AI attacks actually look like, or what we imagine they might look like? (553 likes | 71 RTs) Read more →

Until next time ✌️