Claude Design Arrives: Anthropic Bets Big on Visual Prototyping

🧠 LAUNCH

Claude Design Arrives: Anthropic Bets Big on Visual Prototyping

Anthropic is pushing Claude beyond text and code into full UI/UX prototyping: Claude Design lets you sketch, iterate, and refine interfaces inside the conversation loop instead of bouncing between Figma and a chatbot. Nearly 6K likes suggest the developer community agrees this is the right move. The prototyping-first workflow means you go from idea to clickable mockup without leaving Claude. If you build interfaces, this could replace at least one tool in your stack. (5,784 likes | 179 RTs) Read more →

Claude Plugs Into Enterprise Compliance and Security Stacks

Claude now integrates with compliance APIs and security tooling for IT teams managing deployments at scale. This is Anthropic's direct answer to the biggest enterprise objection: "we can't govern Claude like our other apps." With native hooks into audit logging, access controls, and compliance frameworks, the gap between Claude and enterprise-approved software just got a lot smaller. If you manage Claude deployments, the compliance API docs are worth reviewing today. Read more →

ChatGPT Moves Into PowerPoint — Native Slide Creation, Not Just Suggestions. ChatGPT can now create and edit native, editable PowerPoint files directly — not just suggesting bullet points, but building the actual deck. This is the kind of boring-but-massive integration that drives daily enterprise usage. If you're on ChatGPT Plus, the PowerPoint beta is live. (1,554 likes | 106 RTs) Read more →

Alibaba Drops Qwen3.7-Max, Built Specifically for Coding Agents. Qwen3.7-Max is Alibaba's proprietary model optimized for long-horizon autonomous execution, coding agents, and MCP orchestration — a direct shot at Claude and GPT on the agent use case. If your agent workflows need a new benchmark, this is the challenger to test. (252 likes | 15 RTs) Read more →

Tencent Open-Sources Hy-MT2: SOTA Translation Across 33 Languages. Tencent releases fully open-source 7B and 30B-A3B translation models that hit state-of-the-art among open-source models across 33 languages. If you need production translation without API costs, this is the new baseline to benchmark against. (374 likes | 54 RTs) Read more →

🔧 TOOL

Claude Code's /usage Command Finally Shows Where Your Tokens Go. When you're running skills, agents, and MCP servers simultaneously, you have no idea which one is eating your budget — until now. The new /usage command breaks down token spend by component, so you can see if that MCP server is costing 10x what your skill invocations are. Run it in your next session. (3,862 likes | 262 RTs) Read more →

Claude Code v2.1.147: Pinned Sessions Survive Idle, /code-review Lands. Pinned background sessions now survive idle timeouts and auto-restart on updates — no more losing your long-running agent context because you stepped away. The /simplify command has been renamed to /code-review with effort levels and inline GitHub commenting. Power user features that reduce context loss. Read more →

Anthropic Python SDK Exposes Thinking Token Counts in Streaming. anthropic-sdk-python v0.104.0 adds a beta feature surfacing estimated thinking token counts in streaming deltas. If you're running extended thinking in production, you can finally monitor and budget for inference cost in real time instead of getting surprised on the invoice. Read more →

Google Antigravity Gets 30+ Life Science Database Integrations. Google's Antigravity platform ships purpose-built Science Skills integrating 30+ major life science databases — UniProt, PDB, NCBI, and more. Researchers can now query across datasets that previously required separate specialized tools and manual cross-referencing. (251 likes | 39 RTs) Read more →

💡 INSIGHT

Trump's Proposed 90-Day AI Review Window Would Reshape the Release Race

A reported executive order would impose a mandatory 90-day government review window before any AI model release. If enacted, this reshuffles everything — open-source rapid iteration, international competition, and the quarterly release cadence that frontier labs have settled into. OpenAI and Anthropic are reportedly pledging cooperation, which tells you who this regulation actually hurts: everyone else. The compliance overhead alone could price startups and open-source competitors out of the game. (69 likes | 6 RTs) Read more →

How Security Teams Are Actually Using Opus: Threat Detection to Incident Response. Anthropic publishes real-world case studies of security teams deploying Opus for threat detection, vulnerability analysis, and incident response. This moves the conversation from "can AI do security" to "here's how it's working in production." The partner deployment patterns are worth studying if you're evaluating AI for your security stack. Read more →

Disproving Erdős Cost Less Energy Than Three Almond Milk Lattes. Ethan Mollick puts OpenAI's math proof in resource context: 0.6–6.3 kWh of electricity and 3–31 liters of water to disprove an 80-year-old conjecture. That's less than three almond milk lattes. Next time someone tells you AI research costs too much, you have your rebuttal. (2,785 likes | 233 RTs) Read more →

Daytona Hits 850K Daily Sandbox Runs as the Default Agent Infrastructure Layer. Daytona is quietly becoming the plumbing behind coding agents — 850K daily sandbox runs, 74% month-over-month growth, and growing adoption as the default compute environment for agentic workflows. If you're building agents that need sandboxed execution, this is the provider scaling fastest. Read more →

🔬 RESEARCH

OpenAI's Model Disproves an 80-Year-Old Erdős Conjecture in Discrete Geometry

An OpenAI model has produced a formal disproof of a central conjecture in discrete geometry that stood for 80 years — the unit distance problem posed by Paul Erdős. This isn't "AI helps a mathematician" — the model generated the key construction autonomously. The full technical write-up details the proof approach, and whether you're excited or unsettled, this is a milestone for AI in pure mathematics. (634 likes | 437 RTs) Read more →

Mosaic Pushes the Pareto Frontier of ML Weather Forecasting. Mosaic matches the best deterministic weather forecasters while providing calibrated uncertainty estimates. Knowing that your 72-hour forecast carries 85% confidence matters as much as the prediction itself. For operational meteorology, this is a meaningful advance over models that give you a number without telling you how much to trust it. (1,041 likes | 116 RTs) Read more →
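
What "calibrated" means can be checked with a few lines of arithmetic: if a forecaster states 85% intervals, roughly 85% of observations should land inside them. The sketch below is not Mosaic's method or output format. It assumes each forecast is a Gaussian given as a mean and standard deviation, and the function name `interval_coverage` and the toy numbers are hypothetical.

```python
from statistics import NormalDist


def interval_coverage(means, stds, observed, confidence=0.85):
    """Fraction of observations inside each forecast's central interval.

    Assumes (hypothetically) that every forecast is Gaussian with the given
    mean and standard deviation. For a well-calibrated forecaster the result
    should be close to `confidence` over a large sample.
    """
    if not observed:
        raise ValueError("need at least one observation")
    # Half-width of the central interval, in standard deviations.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    hits = sum(
        1 for m, s, y in zip(means, stds, observed) if abs(y - m) <= z * s
    )
    return hits / len(observed)


# Toy numbers for illustration only, not real forecast data.
coverage = interval_coverage(
    means=[0.0, 0.0, 0.0, 0.0],
    stds=[1.0, 1.0, 1.0, 1.0],
    observed=[0.0, 1.0, -1.4, 2.0],
)
print(f"empirical coverage at the 85% level: {coverage:.2f}")
```

Coverage far below the stated level suggests overconfident intervals, and coverage far above it suggests intervals that are wider than they need to be.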

📝 TECHNIQUE

40K Stars: A Repo of Leaked System Prompts from Every Major AI Lab. A GitHub repo collecting extracted system prompts from Opus 4.7, Sonnet 4.6, ChatGPT 5.5, Gemini 3.5 Flash, and more has hit 40K+ stars. Security concern or masterclass in production prompt engineering — either way, seeing how labs actually prompt their own models reveals patterns that no tutorial will teach you. Study the structural choices: guardrails, persona framing, tool-use instructions. (40,549 likes | 6,743 RTs) Read more →

🏗️ BUILD

Datasette Agent: Simon Willison Ships AI-Powered Database Exploration. Willison combines Datasette with an AI agent that autonomously explores, queries, and analyzes databases — drop it on an unfamiliar SQLite file and it figures out the schema, runs queries, and surfaces insights. A practical tool for anyone who needs to make sense of data they didn't create. Read more →
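
The "figures out the schema" step can be approximated with Python's standard `sqlite3` module alone. This is a minimal sketch of schema discovery, not Datasette Agent's implementation, and the helper name `describe_schema` and the demo table are hypothetical.

```python
import sqlite3


def describe_schema(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for user tables."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in names:
        if table.startswith("sqlite_"):
            continue  # skip SQLite's internal bookkeeping tables
        # Quote the identifier so unusual table names do not break the PRAGMA.
        quoted = '"' + table.replace('"', '""') + '"'
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# Demo against an in-memory database with a made-up table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
for table, columns in describe_schema(demo).items():
    print(table, columns)
```

An agent would typically feed a summary like this to the model before letting it write queries, so the model works from real column names rather than guesses.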

physics-intern: A Simple Harness That Nearly Doubles Gemini's Physics Scores. A Hugging Face evaluation harness pushes Gemini 3.1 Pro from 17.7% to 31% on science problems, a roughly 1.75x gain, just by structuring the prompting better. No fine-tuning, no new model: pure prompt engineering. It shows how much low-hanging fruit remains in evaluation methodology and how much benchmark scores depend on how you ask the question. (296 likes | 43 RTs) Read more →

🎓 MODEL LITERACY

Test-Time Compute (Thinking Tokens): When a model encounters a hard problem, it can "think harder," spending more compute at inference time to reason through complexity before answering. Anthropic's Python SDK (v0.104.0) just started exposing estimated thinking token counts in streaming as a beta feature, which makes this spend visible and easier to budget. This matters because test-time compute is becoming a first-class cost and quality lever: a simple question might use 50 thinking tokens while a complex coding task burns 8,000. As extended thinking moves from lab curiosity to production budget line item, understanding that you pay not just per request but per thought step changes how you design prompts, set budgets, and choose which queries deserve deep reasoning versus quick answers.
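
To make the 50-versus-8,000 comparison concrete, here is a small worked calculation. The price per million tokens is a placeholder assumption, not a published rate, and `thinking_cost_usd` is a hypothetical helper rather than part of any SDK. The ratio between the two requests holds whatever the actual price is.

```python
def thinking_cost_usd(thinking_tokens, usd_per_million_tokens):
    """Cost of the thinking tokens alone, for a given per-million-token price."""
    return thinking_tokens * usd_per_million_tokens / 1_000_000


# Placeholder price for illustration only; substitute your model's real rate.
ASSUMED_USD_PER_MILLION = 10.0

simple_cost = thinking_cost_usd(50, ASSUMED_USD_PER_MILLION)
hard_cost = thinking_cost_usd(8_000, ASSUMED_USD_PER_MILLION)
ratio = 8_000 / 50  # independent of the assumed price

print(f"simple question: ${simple_cost:.4f} of thinking")
print(f"complex task:    ${hard_cost:.4f} of thinking")
print(f"the complex task uses {ratio:.0f}x the thinking tokens")
```

At a fixed price the hard request costs 160 times as much thinking as the simple one, which is why routing only the queries that need deep reasoning into extended thinking can matter more than shaving a few tokens off each prompt.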

⚡ QUICK LINKS

OpenAI Codex Goals: Structured agent task management — define high-level objectives, let the agent plan. (368 likes | 23 RTs) Link
Kimi K2.6: Open-source model that codes, designs, and orchestrates 100 agents at once. (172 likes | 109 RTs) Link
Simon Willison on Antigravity: What exactly is Google's agent harness, and should you build on it? (212 likes | 7 RTs) Link
Cursor Automations: No-repo and multi-repo agent workflows ship with a 50% launch discount. (19 likes) Link
DeepMind Asia Pacific Accelerator: Environmental AI applications — climate, disaster response, conservation. Link

🎯 PICK OF THE DAY

A 90-day review window won't slow the labs — it will freeze everyone else. Trump's proposed executive order requiring government review before any AI model release sounds like a brake on the frontier race. It isn't. Anthropic and OpenAI already operate on quarterly release cycles — a 90-day window is just paperwork they can absorb with dedicated policy teams. The real casualties are open-source projects and startups that move fast precisely because they don't have compliance departments. Every day of regulatory overhead is a day that well-capitalized incumbents extend their lead while smaller players burn runway waiting for approval. If you wanted to design a regulation that calcifies the current market structure while appearing to promote safety, this is exactly what it would look like. Watch for the official executive order text — the details on what counts as a "model release" and who's exempt will determine whether this is a speed bump or a moat. Read more →

Until next time ✌️