gpt-5.1 / claude opus 4.5 / openclaw

The article reviews the last six months of rapid progress in large language models (LLMs), as presented in a lightning talk at PyCon US 2026. The key development: November 2025 marked an inflection point, when model leadership shifted repeatedly (Claude Sonnet 4.5, GPT-5.1, Gemini 3, GPT-5.1 Codex Max, Claude Opus 4.5) and coding agents, boosted by Reinforcement Learning from Verifiable Rewards, became practically useful for daily development. The author used a pelican-on-a-bicycle prompt to compare models and built experiments like …


The last six months in LLMs in five minutes

The article distills six months of rapid LLM progress, highlighting a November 2025 inflection point when coding agents became reliably useful. Model leadership shuffled among Claude Sonnet 4.5, GPT-5.1, Gemini 3, GPT-5.1 Codex Max, and Claude Opus 4.5, with Opus dominating afterward. Crucially, reinforcement-learning work at OpenAI and Anthropic moved coding agents from often working to mostly working, making them practical for daily development. The author recounts personal experiments, such as a browser-playable micro-javascript project: a JavaScript implementation written in Python that runs in the browser via Pyodide (Python compiled to WebAssembly). The article also tracks the meteoric rise of Warelay, a three-month-old personal assistant project later rebranded as OpenClaw, which sparked a whole category of “Claws” (personal AI assistants). The piece is part reflection, part annotated version of the author's PyCon US 2026 lightning talk.

Source: simonwillison (RSS), 4h ago


The author summarizes six months of rapid LLM progress, centered on a November 2025 inflection point when coding agents became reliably useful. Model leadership flipped among Claude Sonnet 4.5, GPT-5.1, Gemini 3, GPT-5.1 Codex Max, and Claude Opus 4.5, with Opus 4.5 widely seen as the top model for a period. The real breakthrough was improved reinforcement learning from verifiable rewards, which turned coding agents into practical daily tools. A small open-source project, initially called Warelay and later OpenClaw, exploded into prominence as a popular “Claw” personal AI assistant. The author also recounts hobby projects (including a micro-javascript written in Python and run in the browser via Pyodide) that illustrate the kind of experimentation better models enable. Overall, the piece highlights model competition, improving coding-agent quality, and rapid ecosystem growth.

