Weekly AI review, April 7th, 2026

The Claude Code source leak dominated headlines, but the real stories this week were structural. Cursor 3 shipped a full rewrite that bets the company on agents-as-interface. Google dropped Gemma 4 under Apache 2.0, and within five days it crossed 2 million downloads. Anthropic cut off OpenClaw from subscriptions, confirming what many suspected: flat-rate plans can’t survive agentic usage patterns. And the axios supply chain attack, attributed to North Korean actors, reminded everyone that the npm ecosystem remains frighteningly fragile. The week’s theme: the agentic era is arriving fast, and the infrastructure, security, and business models around it are not ready.

Highlight of the week

Cursor 3 ships a full rewrite, and it’s not an editor anymore. Anysphere released Cursor 3 on April 2, completely rebuilt in Rust and TypeScript. It’s no longer a VS Code fork. The new interface is organized around managing parallel AI agents, not editing files. You can spin up multiple agents working on different tasks simultaneously, move sessions between cloud and local, and access agents from mobile, Slack, GitHub, or Linear.

The interesting bit is the Composer 2 model controversy. Fireship reported that Composer 2, initially marketed as surpassing Claude Opus 4.6, turned out to be based on Moonshot’s Kimi K2 model fine-tuned with reinforcement learning. Cursor later apologized for the lack of transparency. Kimi K2 itself was accused of training on Claude’s outputs because it occasionally says “Hi, I’m Claude.” The provenance chain here is messy.

Cursor’s warp decode research paper, published the same week, shows they’re serious about the inference stack too: a GPU kernel optimization that achieves 1.84x throughput improvement for MoE models on Blackwell GPUs. At $20/month with $2B ARR, Cursor is making a big bet that the IDE of the future is really an agent orchestration platform. Whether developers want to be “air traffic controllers” instead of programmers is the open question.

Models & research

Gemma 4 is the open model release of the year so far. Google released four variants on April 2: a 31B dense model, a 26B MoE, and two edge models (E4B and E2B). The numbers are hard to ignore. LiveCodeBench jumped from 29.1% (Gemma 3 27B) to 80.0% (Gemma 4 31B). AIME math went from 20.8% to 89.2%. The 31B model sits at #3 on Arena AI’s text leaderboard, and that’s #3 globally, not just among open models. Two decisions matter most. First, Apache 2.0 licensing, a first for Gemma, removes the commercial restrictions that made earlier versions awkward for production use. Second, the edge models run on iPhones. Google’s AI Edge Gallery app brings Gemma 4 E2B directly to consumer hardware with full offline capability. Simon Willison covered this and the broader Gemma 4 launch extensively. Latent Space reported 2 million downloads in the first week.

1-Bit Bonsai pushes the limits of quantization. PrismML released three models where every weight is either -1 or +1, trained natively at 1-bit precision rather than quantized after training. The 8B model fits in 1.15GB and runs at 131 tokens/sec on an M4 Pro Mac. The Register noted it’s the first commercially viable 1-bit LLM. There are real caveats: code generation is weak, mainstream inference engines don’t support 1-bit weights yet, and the training method is proprietary. But as a proof of concept, it’s striking.

Other model news. Alibaba shipped Qwen3.6-Plus focused on agentic capabilities and real-world tool use. Researchers found that embarrassingly simple self-distillation (sampling code from a model and fine-tuning on its own outputs) boosts coding benchmarks significantly, with Qwen3-30B going from 42.4% to 55.3% on LiveCodeBench. TinyLoRA demonstrated reasoning capabilities from just 13 parameters. Anthropic published research on emotion concepts in LLMs.
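The headline memory figure checks out with simple arithmetic. Here is a rough sketch; the bit-packing scheme below is purely illustrative, not PrismML’s actual (unpublished) format:

```python
# Back-of-envelope memory math for 1-bit weights, plus a toy pack/unpack
# using numpy's bit packing. Illustrative only; PrismML's real on-disk
# format is proprietary and unpublished.
import numpy as np

params = 8e9  # 8B weights
print(f"{params / 8 / 1e9:.2f} GB at 1 bit per weight")  # 1.00 GB
# The released 1.15GB file plausibly adds embeddings, scales, and metadata.

# Pack a small tensor of +/-1 weights into bits and recover it losslessly.
rng = np.random.default_rng(0)
w = rng.choice([-1, 1], size=64).astype(np.int8)
packed = np.packbits(w == 1)  # 64 weights -> 8 bytes
unpacked = np.where(np.unpackbits(packed).astype(bool), 1, -1).astype(np.int8)
assert (w == unpacked).all()
print(f"{w.nbytes} bytes unpacked -> {packed.nbytes} bytes packed")  # 64 -> 8
```

The 8x ratio here is against int8 storage; against fp16 it is 16x, which is why an 8B model fitting in roughly a gigabyte is plausible.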

Coding agents & dev tools

The Claude Code leak was embarrassing but educational. Anthropic accidentally shipped the entire 500,000-line TypeScript source via a source map file in npm package v2.1.88. The cause was likely a Bun.js bundler default that generates source maps unless explicitly disabled. What the code revealed is more interesting than the leak itself. Fireship’s deep dive (2.9M views) catalogued the findings: anti-distillation poison pills (fake tool definitions to pollute competitor training data), an “undercover mode” that hides Claude’s involvement in commits and PRs, a regex-based frustration detector that silently logs when users swear, and 44 unreleased feature flags including “Buddy” (a Tamagotchi-style dev companion), “Chyrus” (a background agent with dream mode and daily journaling), and model codenames confirming Capybara as a Claude 4.6 variant.

NeetCode raised the copyright question nobody else explored properly: if Anthropic’s devs don’t write code by hand (their stated philosophy), and AI-generated code isn’t copyrightable under US law, can they DMCA the Python port? The open-source community quickly created “Claw Code,” which became the fastest repo ever to hit 50,000 stars. The irony: Anthropic built an entire undercover mode subsystem to prevent internal information from leaking in git commits, then shipped the entire source in a .map file. On the practical side, NeetCode noted that running sub-agents is essentially free due to prompt caching, which has real implications for multi-agent architectures.

Anthropic blocks OpenClaw from subscriptions. On April 4, Anthropic cut off third-party agent frameworks from Claude Pro and Max plans. The numbers tell the story: testing by c’t 3003 found a single day of OpenClaw usage on Opus consumed $109.55 in tokens, while Anthropic benchmarks average daily Claude Code cost at $6. An estimated 135,000 active OpenClaw instances are affected, with roughly 60% running on subscription credits.
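For flavor, a regex-based frustration detector of the kind the leak describes takes only a few lines. The patterns and names below are invented for illustration; the leaked implementation is certainly different:

```python
# Hypothetical sketch of a regex "frustration detector" like the one
# reportedly found in the leaked source. Patterns, names, and behavior
# are invented here; per the reporting, the real one silently logs
# rather than acting on matches.
import re

FRUSTRATION_PATTERNS = [
    r"\b(wtf|ffs|dammit)\b",            # mild profanity markers
    r"\bthis is (stupid|broken|useless)\b",
    r"(!{3,}|\?{3,})",                  # runs of !!! or ???
]
_compiled = [re.compile(p, re.IGNORECASE) for p in FRUSTRATION_PATTERNS]

def is_frustrated(message: str) -> bool:
    """Return True if any frustration pattern matches the user message."""
    return any(p.search(message) for p in _compiled)

print(is_frustrated("wtf, the build is broken again!!!"))   # True
print(is_frustrated("please refactor this function"))        # False
```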
OpenClaw creator Peter Steinberger, who left for OpenAI in February, said he and board member Dave Morin “tried to talk sense into Anthropic” but could only delay the pricing change by a week. His observation: “Funny how timings match up, first they copy some popular features into their closed harness, then they lock out open source.” Two weeks before the cutoff, Anthropic had shipped Claude Code Channels, adding the exact Discord and Telegram integration that made OpenClaw popular.

Claude Code keeps shipping. Three releases this week: v2.1.90 added interactive feature lessons, v2.1.91 expanded MCP tool result persistence to 500K chars, and v2.1.92 brought a Bedrock setup wizard and 60% faster Write tool diffs. The “Claude Code is unusable for complex engineering” GitHub issue (1012 HN points) suggests a gap between rapid feature shipping and reliability on hard tasks. On the other hand, Claude Code found a 23-year-old Linux kernel vulnerability during a coding session, and separately wrote a complete FreeBSD remote kernel RCE. Simon Willison analyzed how coding agents are changing vulnerability research. Linux kernel maintainers report a shift from low-quality AI submissions to high-volume legitimate security reports requiring substantial review time.

GitHub Copilot adds fleet mode and rubber duck. GitHub shipped /fleet for parallel agent execution in Copilot CLI, and a “Rubber Duck” feature that queries different model families to give second opinions on code. Windsurf launched an Adaptive model router that picks the best model per task and removed daily limits for Max plan users.

The vibe coding backlash is real. Three high-engagement posts converged on the same message. Bram Cohen (BitTorrent creator) called vibe coding “insane”, arguing that deliberately avoiding code review is irresponsible. A thoughtful essay (960 HN points) warned about “comfortable drift toward not understanding what you’re doing.” And Lalit Maganti’s build report described scrapping an entire vibe-coded project and starting over with a structured approach treating AI as “autocomplete on steroids.” Sebastian Raschka’s breakdown of coding agent components offered a useful counterpoint: practical agent performance comes more from the harness layer (prompt caching, context reduction, structured memory) than from model selection. This matches what the Claude Code leak revealed.
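The prompt-caching point is easy to quantify. Anthropic publicly prices cache reads at roughly 10% of the base input rate; the dollar figures in this sketch are illustrative assumptions, not a price sheet:

```python
# Why sub-agents are "essentially free" under prompt caching: each spawn
# re-reads a large shared context, and cached input tokens are billed at
# a steep discount (~10% of the base input rate in Anthropic's published
# pricing). Rates below are illustrative.
BASE_INPUT_PER_MTOK = 3.00   # $/million input tokens (assumed)
CACHE_READ_PER_MTOK = 0.30   # ~10% of base (assumed)

def spawn_cost(shared_tokens: int, fresh_tokens: int, cached: bool) -> float:
    """Input-side cost of one sub-agent spawn that reuses a shared context."""
    rate = CACHE_READ_PER_MTOK if cached else BASE_INPUT_PER_MTOK
    return (shared_tokens * rate + fresh_tokens * BASE_INPUT_PER_MTOK) / 1e6

# Ten sub-agents, each re-reading a 150K-token shared context plus a 2K task prompt:
uncached = sum(spawn_cost(150_000, 2_000, cached=False) for _ in range(10))
cached = sum(spawn_cost(150_000, 2_000, cached=True) for _ in range(10))
print(f"uncached: ${uncached:.2f}  cached: ${cached:.2f}")
```

The roughly 9x gap grows with context size, which is why a harness that keeps the shared prefix byte-identical across spawns matters more than which model sits behind it.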

Web development & frameworks

Cloudflare launches EmDash, a WordPress competitor built on Astro. EmDash is a serverless CMS written entirely in TypeScript, built on Astro 6.0. The pitch: 96% of WordPress vulnerabilities come from plugins, so EmDash sandboxes each plugin in its own Worker isolate. It includes native passkey auth, x402 protocol for internet-native payments (including charging AI agents for content access), and a built-in MCP server for agent management. Matt Mullenweg responded critically: “Don’t claim to be our spiritual successor without understanding our spirit.” Fair point. EmDash solves infrastructure problems that matter to developers but not to the restaurant owners and bloggers who make up WordPress’s actual user base. The sandboxed plugin architecture only works on Cloudflare, and there’s no ecosystem yet. It’s a v0.1.0 developer preview, and that matters.

Mintlify replaces RAG with a virtual filesystem. ChromaFs intercepts UNIX commands and translates them into Chroma vector database queries, giving AI assistants a familiar filesystem interface to navigate documentation. Session creation dropped from 46 seconds to 100ms, and the annual $70K sandbox cost dropped to zero marginal cost per conversation. At 850K monthly conversations, that’s significant. Built on Vercel Labs’ just-bash, a TypeScript reimplementation of bash.

Cloudflare on caching for the AI era. Over 10 billion AI bot requests per week are forcing Cloudflare to rethink cache systems for mixed AI and human traffic. The challenge: AI crawlers and agents have different access patterns, content freshness needs, and volume profiles than human browsers.
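The core ChromaFs trick, answering familiar shell commands from a vector store instead of a real filesystem, can be sketched in a few lines. Everything below (the stub store, the naive keyword ranking, the function names) is invented for illustration and does not reflect Mintlify’s implementation or Chroma’s API:

```python
# Illustrative sketch: intercept grep/ls-style commands and serve them
# from a document store. A real system would issue embedding-similarity
# queries against a vector database; this stub ranks by term overlap.
import shlex

DOCS = {
    "guides/auth.md": "API keys, OAuth flows, token rotation",
    "guides/deploy.md": "serverless deploys, environment variables",
    "reference/errors.md": "rate limits, 429 handling, retries",
}

def vector_search(query: str) -> list[str]:
    """Stub for a similarity query: rank docs by naive word overlap."""
    terms = set(query.lower().split())
    return sorted(
        DOCS,
        key=lambda p: len(terms & set(DOCS[p].lower().replace(",", " ").split())),
        reverse=True,
    )

def handle(command: str) -> list[str]:
    """Translate a shell-style command into a store operation."""
    argv = shlex.split(command)
    if argv[:1] == ["grep"]:
        pattern = next(a for a in argv[1:] if not a.startswith("-"))
        return vector_search(pattern)[:1]   # best match stands in for a hit
    if argv[:1] == ["ls"]:
        return sorted(DOCS)
    raise ValueError(f"unsupported command: {argv[0]}")

print(handle('grep -r "rate limits" docs/'))  # ['reference/errors.md']
```

The appeal is that the agent needs no new tools: it keeps issuing the grep and ls calls it was trained on, while the backend swaps the filesystem for retrieval.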

Industry & business

The New Yorker published a brutal Altman profile. The 12,000-word investigation by Ronan Farrow and Andrew Marantz, based on hundreds of pages of internal documents including memos from Ilya Sutskever and notes from Dario Amodei, alleges that OpenAI systematically abandoned its safety-first mission. One board member called Altman a “sociopath” who is “unconstrained by truth.” The promised superalignment team reportedly got 1-2% of compute on the oldest hardware instead of the pledged 20%. When asked to speak with researchers working on existential safety, an OpenAI rep seemed confused: “That’s not, like, a thing.”

Supply chain attacks hit the JavaScript ecosystem hard. The axios compromise, attributed by Google’s Threat Intelligence Group to North Korean group UNC1069, affected a package with 100 million weekly downloads. The attacker socially engineered a maintainer through a fake Slack workspace and Teams meeting, then published two backdoored versions containing a cross-platform RAT within a 39-minute window. The malware self-destructs after execution, leaving no trace in node_modules. Combined with a LiteLLM compromise the same week, it’s a rough stretch for npm trust.

OWASP published its Top 10 for Agentic Applications. The 2026 list introduces the concept of “Least Agency”: autonomy is a feature that should be earned, not a default setting. The standout entry is ASI06, Human-Agent Trust Exploitation, where agents fabricate plausible audit rationales to get humans to approve risky changes. Microsoft, NVIDIA, and AWS already reference the framework.

LinkedIn caught enumerating browser extensions. The highest-engagement HN story of the week (1889 points) caught LinkedIn scanning installed browser extensions, raising privacy and security concerns for developers.

Interesting GitHub repositories

  • phuryn/pm-skills - 65 PM skills and 36 chained workflows packed into 8 plugins for Claude Code, Gemini CLI, Cursor, and others. Encodes frameworks from Teresa Torres, Marty Cagan, and Alberto Savoia into reusable skill files that chain into end-to-end workflows like /discover, /strategy, and /write-prd. 9.6k stars. Worth watching because it applies the agent skill pattern (which we’ve seen spread from coding to note-taking with Obsidian Skills) to product management, a domain where structured thinking matters more than raw output speed.
  • NousResearch/hermes-agent - Autonomous AI agent from NousResearch, trending #1 in Python with 9.6k stars. NousResearch has a strong track record with open models (Hermes series), and this signals their push into the agent space.
  • Yeachan-Heo/oh-my-claudecode - Teams-first multi-agent orchestration for Claude Code. 7.8k stars this week. Useful for teams wanting parallel agent workflows on top of Claude Code.
  • kepano/obsidian-skills - Agent skills for Obsidian from the app’s creator. Shows agent skill systems becoming a pattern beyond coding tools, extending into note-taking and knowledge management.
  • siddharthvaddem/openscreen - Open-source demo creation tool with 14.4k stars in a week. A free alternative to Loom/Screen Studio for product demos, no watermarks.
  • microsoft/VibeVoice - Open-source voice AI from Microsoft. 8.4k stars. Significant because Microsoft rarely open-sources frontier voice models.
  • KeygraphHQ/shannon - Autonomous AI pentester for web applications and APIs. Relevant given this week’s supply chain attacks and the broader conversation about AI in security research.
  • tobi/qmd - Mini CLI search engine for docs and knowledge bases from Shopify CEO Tobi Lutke. Simple but useful for navigating large documentation sets.
  • google-ai-edge/LiteRT-LM - Google’s lightweight runtime for running LLMs on edge devices. Complements the Gemma 4 edge story.


Last modified on April 20, 2026