The article distills six months of rapid LLM progress, highlighting a November 2025 inflection point when coding agents became reliably useful. Model leadership shuffled among Claude Sonnet 4.5, GPT-5.1, Gemini 3, GPT-5.1 Codex Max, and Claude Opus 4.5, with Opus 4.5 dominating afterward. Crucially, reinforcement-learning work at OpenAI and Anthropic produced coding agents that went from “often work” to “mostly work,” making them practical for daily development. The author recounts personal experiments, such as micro-javascript, a browser-playable JavaScript implementation in Python that runs on Pyodide (Python compiled to WebAssembly). He also tracks the meteoric rise of a three-month-old personal assistant project, Warelay, since rebranded as OpenClaw, which sparked a category of “Claws” (personal AI assistants). The piece is part reflection, part annotated PyCon US 2026 lightning talk.
The author summarizes six months of rapid LLM progress, centered on a November 2025 inflection point when coding agents became reliably useful. Model leadership flipped among Claude Sonnet 4.5, GPT-5.1, Gemini 3, GPT-5.1 Codex Max, and Claude Opus 4.5, with Opus 4.5 widely seen as the top model for a period. The real breakthrough was improved Reinforcement Learning from Verifiable Rewards, which turned coding agents into practical daily tools. A small open-source project, initially Warelay and later OpenClaw, exploded into prominence as a popular “Claw” personal AI assistant. The author also recounts hobby projects, including micro-javascript (a JavaScript implementation in Python running in the browser via Pyodide), that illustrate the experimentation better models enable. The piece highlights model competition, coding-agent quality, and rapid ecosystem growth.
The most newsworthy development over the past six months has been rapid LLM progress, especially a November 2025 inflection point when coding agents became reliably useful. Simon Willison traces shifting model leadership (Claude Sonnet 4.5 → GPT-5.1 → Gemini 3 → GPT-5.1 Codex Max → Claude Opus 4.5) and credits work by OpenAI and Anthropic on Reinforcement Learning from Verifiable Rewards for dramatically better code generation. A viral open-source project, originally “Warelay” and now OpenClaw, emerged as a popular personal AI assistant and gained massive attention within three months, helping establish “Claws” as the name for this category. Willison also recounts holiday experiments, including micro-javascript, a toy JavaScript-in-Python project run in Pyodide. These illustrate both how quickly new tools appear and how many developer prototypes are ephemeral. This matters for developers, tool builders, and product teams building practical AI coding workflows.
The article reviews the last six months of rapid progress in large language models (LLMs), as presented at PyCon US 2026. Key news: November 2025 marked an inflection point, when model leadership shifted repeatedly (Claude Sonnet 4.5, GPT-5.1, Gemini 3, GPT-5.1 Codex Max, Claude Opus 4.5) and coding agents, boosted by Reinforcement Learning from Verifiable Rewards, became practically useful for daily development. The author used a pelican-riding-a-bicycle prompt to compare models and built experiments such as the micro-javascript JavaScript-in-Python demo. He also recounts the rise of a fast-moving open-source project (Warelay → OpenClaw), which helped popularize “Claws” as a term for personal AI assistants. Takeaway: models and coding agents matured quickly, spawning new tooling and fast-rising open-source projects that matter to developers and product teams.