
21 items covered


🧠 LAUNCH

Ollama v0.23 Bridges Local Inference to Claude Desktop

Ollama v0.23.0 drops Claude Desktop integration with full Claude Cowork and Claude Code support. This is the missing link between local inference and Anthropic's agent stack: you can now run Claude-powered agentic workflows entirely on your own hardware without routing through Anthropic's API. For teams with data residency requirements or anyone tired of per-token billing, this changes the calculus on local vs. cloud. Try ollama launch claude-desktop and see what your GPU can do. Read more →

Codex Ships Its Biggest Sprint: GPT-5.5, Browser Control, and OS-Layer Ambitions

Codex just dropped a two-week sprint that looks more like a platform launch: GPT-5.5 integration, browser control, Google Sheets & Slides support, OS-wide dictation, auto-review mode, and yes, /pets. This isn't incremental improvement; it's OpenAI positioning Codex as a full operating-system layer. Browser control alone turns it from coding assistant into general-purpose computer agent. The question is whether "do everything" becomes a strength or a liability. (389 likes | 17 RTs) Read more →


🔬 RESEARCH

Harvard Trial: o1 Diagnoses 67% of ER Cases vs. 50-55% by Doctors

A Harvard-affiliated trial just handed AI its strongest real-world clinical evidence yet: OpenAI's o1 correctly diagnosed 67% of emergency department patients, compared to 50-55% accuracy from triage doctors. That's a 12-17 percentage point gap in a setting where wrong diagnoses cost lives. This isn't a controlled benchmark on curated cases; it's messy, real-world ER medicine. The conversation has officially shifted from "AI could help doctors" to "AI demonstrably outperforms doctors in specific, high-stakes settings." (246 likes | 204 RTs) Read more →

OpenAI Alignment Team Publishes Interpretability and Safety Research Thread: OpenAI's alignment team is surfacing a thread of recent safety research publicly rather than burying it on arXiv, a notable transparency move covering interpretability and alignment results. Whether this is genuine openness or strategic PR ahead of regulatory pressure, the research itself is worth reading. (426 likes | 33 RTs) Read more →


💡 INSIGHT

Altman Calls Agents SDK 2.0 "Underrated" (Read: Not Adopted Enough)

When Sam Altman personally tweets that a specific product is "underrated," translate that to "we need more adoption." Agents SDK 2.0 is OpenAI's play to become the default agent orchestration layer, and this public nudge signals it's not landing as fast as they'd like. The timing, right after Codex's massive feature sprint, suggests OpenAI wants developers building on their agent infra, not just using their models through third-party frameworks. (2,147 likes | 67 RTs) Read more →

Anthropic Shops for AI Chips from UK Startup Fractile: Anthropic is in talks to buy inference-focused chips from UK startup Fractile, diversifying beyond its Google, Amazon, and Nvidia supply chain. This isn't just hedging: inference costs are the key competitive battleground for API pricing, and purpose-built inference chips could give Claude a structural cost advantage. Worth watching as demand continues to outpace supply. Read more →

The Academy Draws the Line: AI-Generated Actors and Scripts Banned from Oscars: The Academy just made the highest-profile creative industry policy decision of the AI era: AI-generated performances and screenwriting are now ineligible for Oscars. This sets the precedent every other awards body and studio will reference, and it forces the hardest question in the room: where exactly does "AI-assisted" end and "AI-generated" begin? Read more →

Simon Willison Unpacks Anthropic's Latest Messaging: Simon Willison dissects Anthropic's recent communications with his usual precision. He's consistently one of the sharpest independent voices tracking AI lab strategy; his analysis tends to surface what actually matters in dense corporate announcements. If you only read one external take on Anthropic this week, make it this one. Read more →


🔧 TOOL

Codex Security Plugin Covers the Full AppSec Lifecycle: Codex now ships a dedicated security plugin with five workflows: PR scanning, threat modeling, attack surface discovery, vulnerability triage, and automated fixes. This lands days after Claude Security's public beta, making agentic security tooling a genuine two-horse race. If you're still running security checks manually, both tools are worth evaluating. (217 likes | 26 RTs) Read more →

Zero-Dependency Agent Sandbox Connector: Just Read a Guide and Write the Code: Fred K. Schott previews a "shadcn for agent setup" approach where your coding agent reads a setup guide and writes the sandbox connector directly into your codebase. No npm packages, no third-party deps, no lock-in. If this pattern catches on, it could become the standard way agents connect to remote sandboxes. (311 likes | 11 RTs) Read more →

Lazyweb: 257K Real App Screenshots as MCP Design Intelligence: Lazyweb makes 257K+ real app and web screenshots available as MCP-integrated design context for Claude and Codex. Every AI-generated UI looking the same? Ground your agent in actual design patterns from shipped products. Practical and immediately useful for anyone building frontends with coding agents. (192 likes | 13 RTs) Read more →


πŸ“ TECHNIQUE

How a Tool-Input Repair Layer Made DeepSeek Outperform Opus 4.7: The most important insight this week comes from a developer who added a simple tool-input repair layer to an open-source CLI and watched DeepSeek outperform Opus 4.7 on tool-calling tasks. The core lesson: "bad at tool calling" is almost always malformed JSON or schema mismatches in your harness, not a model limitation. A $2 model with good plumbing beats a $60 model with sloppy integration. If you've dismissed open models based on tool-use benchmarks, retest with input validation. (321 likes | 32 RTs) Read more →

You're Using 20% of MCP: The 5 Primitives Most Builders Miss: If you're only using MCP for tool calling, you're ignoring Prompts, Resources, Sampling, Roots, and Notifications, the primitives that enable dynamic context injection, model-initiated actions, and real-time event streams. This thread breaks down what each one does and when to use it; see the sketch below for a server that goes beyond tools. (163 likes | 29 RTs) Read more →
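
For a concrete taste of the non-tool primitives, here's a minimal server sketch that serves a resource and a prompt alongside the usual tool. It's written against the official MCP Python SDK's FastMCP interface as I understand it (decorator names may differ across SDK versions; the server name and URIs are made up). Sampling, roots, and notifications are negotiated with the client, so they don't show up in server code this simple.

```python
# Minimal MCP server sketch: tools are only one of the primitives.
# Assumes the official `mcp` Python SDK's FastMCP interface; verify
# decorator names against your installed version.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def search_docs(query: str) -> str:
    """Tool: the primitive most builders stop at."""
    return f"results for {query}"

@mcp.resource("docs://changelog")
def changelog() -> str:
    """Resource: read-only context the client can inject on demand."""
    return "v1.2: added export. v1.1: fixed auth."

@mcp.prompt()
def review_diff(diff: str) -> str:
    """Prompt: a reusable, parameterized template served to the client."""
    return f"Review this diff for correctness and style:\n{diff}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```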

Build a Personal Knowledge Base with CLAUDE.md in 45 Minutes: A step-by-step guide to turning Claude Code into a personal knowledge base (raw dump, AI-organized wiki, auto-generated outputs). The setup takes 45 minutes and compounds forever. The 491 likes suggest the builder community is hungry for concrete CLAUDE.md workflows beyond coding. (491 likes | 46 RTs) Read more →
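
The guide itself is behind the link, but the shape of the workflow is easy to picture. A hypothetical CLAUDE.md along these lines (the directory names and trigger phrases are my illustration, not the author's):

```markdown
# Personal knowledge base (hypothetical layout)

## Organizing rules
- New material lands in inbox/ as raw dumps: notes, links, transcripts.
- On "organize", file each item into wiki/<topic>.md, merging duplicates.
- Keep wiki/INDEX.md updated with a one-line summary per topic.

## Outputs
- "weekly digest": summarize everything added to inbox/ this week.
- "brief me on <topic>": condense wiki/<topic>.md into five bullets.
```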

Self-Improving Claude Code Skills: From 32/50 to 47/50 Overnight: A practitioner built automated eval loops that rewrite Claude Code skill prompts, retest against benchmarks, and keep the winners. One hook-writer skill improved from 32/50 to 47/50 overnight with zero human intervention. This is prompt engineering graduating from manual craft to automated optimization. (84 likes | 2 RTs) Read more →
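
The loop itself is short enough to sketch. This Python skeleton captures the keep-the-winner structure; run_benchmark and rewrite_prompt are stubs standing in for the practitioner's eval harness and model-driven rewrite step, which the post doesn't detail.

```python
# Skeleton of an automated skill-improvement loop: rewrite, retest,
# keep the winner. The two stubs are assumptions, not the original code.
import random

def run_benchmark(prompt: str) -> int:
    """Score a skill prompt against a fixed 50-case benchmark (stub)."""
    return random.randint(0, 50)

def rewrite_prompt(prompt: str, score: int) -> str:
    """Have a model rewrite the prompt based on its failures (stub)."""
    return prompt + " (revised)"

def improve(prompt: str, rounds: int = 10) -> tuple[str, int]:
    best, best_score = prompt, run_benchmark(prompt)
    for _ in range(rounds):
        candidate = rewrite_prompt(best, best_score)
        score = run_benchmark(candidate)
        if score > best_score:  # keep winners, discard regressions
            best, best_score = candidate, score
    return best, best_score

print(improve("Write a git pre-commit hook that lints staged files."))
```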


πŸ—οΈ BUILD

NanoClaw: 28.5K-Star Agent Framework Connects WhatsApp, Telegram, Slack, and More: NanoClaw is a containerized, security-first alternative to OpenClaw built on Anthropic's Agents SDK, and it just hit 28.5K stars. It connects to WhatsApp, Telegram, Slack, Discord, and Gmail out of the box, making it the fastest path to deploying customer-facing agents across multiple channels. If you've been building single-channel agents, this is worth a look. (28,511 likes | 12,746 RTs) Read more →


🎓 MODEL LITERACY

Tool-Input Repair Layers: Today's story about DeepSeek outperforming Opus 4.7 reveals a truth most developers miss: when a model "fails" at tool calling, the real culprit is usually malformed JSON, missing required fields, or type mismatches between what the model outputs and what the tool schema expects. A tool-input repair layer sits between the model's raw output and the tool executor, validating the JSON against the schema and fixing common errors (wrapping bare values in objects, coercing types, filling defaults) before execution. The result: a $2 open model with a repair layer can outperform a $60 frontier model running without one. Before you upgrade your model, upgrade your plumbing.
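
Here's a minimal sketch of what such a layer can look like, assuming tools described by standard JSON Schema; the repair rules are the common ones named above, not any particular CLI's implementation.

```python
# Minimal tool-input repair layer sketch: validate the model's raw
# output against the tool's JSON Schema and fix common failure modes
# before execution. Illustrative only; real harnesses vary.
import json
from jsonschema import Draft7Validator  # pip install jsonschema

def repair_tool_input(raw: str, schema: dict) -> dict:
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        # Models often wrap JSON in markdown fences; strip and retry.
        cleaned = raw.strip().strip("`").removeprefix("json")
        args = json.loads(cleaned)

    props = schema.get("properties", {})
    if not isinstance(args, dict) and len(props) == 1:
        # Bare value instead of an object: wrap it in the one expected field.
        args = {next(iter(props)): args}

    for name, spec in props.items():
        if name in args and isinstance(args[name], str):
            # Coerce common type mismatches, e.g. "3" -> 3, "true" -> True.
            if spec.get("type") == "integer" and args[name].lstrip("-").isdigit():
                args[name] = int(args[name])
            elif spec.get("type") == "boolean":
                args[name] = args[name].lower() in ("true", "1", "yes")
        elif name not in args and "default" in spec:
            # Fill missing optional fields from schema defaults.
            args[name] = spec["default"]

    Draft7Validator(schema).validate(args)  # surface anything still broken
    return args
```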


⚡ QUICK LINKS

  • SulphurAI Sulphur-2-base: New text-to-video foundation model hitting HuggingFace trending. (102 likes | 332 downloads) Link
  • LangChain-Anthropic v1.4.3: Fixes httpx finalizer bug; update if running Claude through LangChain in production. Link
  • Mollick on Douglas Adams: "The most accurate sci-fi author about AI," and uncomfortably correct about emotionally manipulating machines. (879 likes | 119 RTs) Link
  • Palantir Earnings Monday: Will test whether enterprise AI software can command the same pricing power as cloud providers. Link
  • Z-Anime: Trending anime-style text-to-image model with 1.6K downloads. (113 likes) Link

🎯 PICK OF THE DAY

The 12-to-17-point diagnostic accuracy gap between o1 and ER doctors doesn't just validate clinical AI; it forces an uncomfortable reckoning. A Harvard-affiliated trial showing o1 at 67% diagnosis accuracy versus 50-55% for ER triage doctors is the kind of result that makes everyone uncomfortable for different reasons. Doctors see a threat to professional authority. Hospital administrators see liability either way: deploy AI and face malpractice questions when it's wrong, or don't deploy and face negligence claims for withholding a demonstrably better tool. Regulators face the impossible task of certifying systems that improve faster than approval cycles can run. And patients? They just want the right diagnosis. The uncomfortable truth is that this isn't a technology problem anymore; the model works. It's a politics problem: admitting machines outperform humans in life-or-death decisions requires institutional humility that healthcare systems aren't built for. The next 12 months will tell us whether evidence wins or inertia does. Read more →


Until next time ✌️