The last six months in LLMs in five minutes


By AI Maestro, May 19, 2026

Simon Willison – simonwillison.net

PyCon US 2026 Lightning Talk

I presented this lightning talk at PyCon US 2026, attempting to summarize the last six months of developments in LLMs in five minutes.

The November inflection point

Six months is a pretty convenient time period to cover, because it captures what I’ve been calling the November 2025 inflection point. November was a critical month in LLMs, especially for coding.

The “best” model changed hands 5 times between Anthropic, OpenAI and Google

For one thing, the supposedly “best” model (depending mostly on vibes) changed hands five times between the three big providers.

Generate an SVG of a pelican riding a bicycle

As always, I’m using my “Generate an SVG of a pelican riding a bicycle” test to help illustrate the differences between the models.

Why this test? Because pelicans are hard to draw, bicycles are hard to draw, pelicans can’t ride bicycles… and there’s zero chance any AI lab would train a model for such a ridiculous task.

Five pelicans, one for each of the following models. Varying qualities!

At the start of November the widely acknowledged “best” model was Claude Sonnet 4.5, released on 29th September. It drew me this pelican.

In November it was overtaken by GPT-5.1, then Gemini 3, then GPT-5.1 Codex Max, and then Anthropic took the crown back again with Claude Opus 4.5.

I think Gemini 3 drew the best pelican out of this lot, but pelicans aren’t everything. Most practitioners will agree that Opus 4.5 held the crown for the next couple of months.

The coding agents got good

It took a little while for this to become clear, but the real news from November was that the coding agents got good.

OpenAI and Anthropic had spent most of 2025 running Reinforcement Learning from Verifiable Rewards to increase the quality of code written by their models, especially when paired up with their Codex and Claude Code agent harnesses.
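To make “verifiable rewards” concrete: the reward comes from a check that can be run automatically, such as whether generated code passes its tests. The following is a minimal Python sketch of that idea, not a description of how OpenAI or Anthropic actually train their models. The function name `verifiable_reward`, the test-case format, and the convention that the candidate code defines a function called `solve` are all assumptions made for illustration.

```python
def verifiable_reward(candidate_source, test_cases):
    """Return 1.0 if the candidate passes every test case, else 0.0.

    Hypothetical helper: assumes the candidate source defines `solve`.
    """
    namespace = {}
    try:
        # NOTE: exec is not a sandbox; a real pipeline would isolate untrusted code.
        exec(candidate_source, namespace)
        solve = namespace["solve"]
        for args, expected in test_cases:
            if solve(*args) != expected:
                return 0.0
    except Exception:
        # Syntax errors, a missing `solve`, and runtime errors all earn no reward.
        return 0.0
    return 1.0


# Each test case is (arguments, expected result).
tests = [((2, 3), 5), ((-1, 1), 0)]
good = "def solve(a, b):\n    return a + b\n"
bad = "def solve(a, b):\n    return a - b\n"

print(verifiable_reward(good, tests))  # 1.0
print(verifiable_reward(bad, tests))   # 0.0
```

The point of the sketch is that the score needs no human judgment: the tests either pass or they do not, which is what makes the reward “verifiable”.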

In November the results of this work became apparent. Coding agents went from often-work to mostly-work, crossing a quality barrier where you could use them as a daily-driver to get real work done, without needing to spend most of your time fixing their stupid mistakes.

Screenshot of "Initial commit" on GitHub to steipete/Warelay, commit f6dd362, steipete authored on Nov 24, 2025. It's a copy of the MIT license.

Also in November, this happened – the first commit to an obscure (back then) repo called “Warelay” by some guy called Pete.

December/January
(A little bit of LLM psychosis)

Over the holiday period, from December to January, a whole lot of us took advantage of the break to have a poke at these new models and coding agents and see what they could do.

They could do a lot! Some of us got a little bit over-excited. I had my own short-lived bout of a form of LLM psychosis as I started spinning up wildly ambitious projects to see how far I could push them.

micro-javascript playground
Execute JavaScript code in a sandboxed micro-javascript environment powered by Pyodide

var numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
var doubled = numbers.map(n => n * 2);
console.log('Doubled: ', doubled);
var evens = numbers.filter(n => n % 2 === 0);
console.log('Evens: ', evens);
var sum = numbers.reduce((a, b) => a + b, 0);
console.log('Sum: ', sum);

Output
Doubled: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
Evens: [2, 4, 6, 8, 10]
Sum: 55
Execution time: 8.00ms

About: micro-javascript is a pure Python JavaScript interpreter with configurable memory and time limits. This playground runs entirely in your browser using Pyodide (Python compiled to WebAssembly). View on GitHub

One of my projects was a vibe-coded implementation of JavaScript in Python (a loose port of MicroQuickJS), which I called micro-javascript. You can try it out in your browser in this playground.
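The playground describes micro-javascript as a pure Python interpreter with configurable memory and time limits. As a rough illustration of how an interpreter written in Python can enforce that kind of budget, here is a toy evaluator for arithmetic expressions that counts evaluation steps and stops once a limit is exceeded. This is not micro-javascript's code or API: the names `evaluate`, `StepLimitExceeded` and `max_steps`, and the nested-tuple expression format, are hypothetical and chosen only for this sketch.

```python
import operator

# Supported binary operators for the toy expression language.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


class StepLimitExceeded(Exception):
    """Raised when evaluation uses more steps than the budget allows."""


def evaluate(expr, max_steps=100):
    """Evaluate a nested-tuple expression such as ("+", 1, ("*", 2, 3)).

    Every node visited costs one step; the budget caps total work.
    """
    steps = 0

    def walk(node):
        nonlocal steps
        steps += 1
        if steps > max_steps:
            raise StepLimitExceeded(f"more than {max_steps} steps")
        if isinstance(node, (int, float)):
            return node
        op, left, right = node
        return OPS[op](walk(left), walk(right))

    return walk(expr)


print(evaluate(("+", 1, ("*", 2, 3))))  # 7

try:
    # This expression has five nodes, so a budget of three is not enough.
    evaluate(("+", 1, ("*", 2, 3)), max_steps=3)
except StepLimitExceeded as exc:
    print("stopped:", exc)
```

Counting steps inside the interpreter loop, rather than relying on the host to kill a runaway program, is one plausible way to keep untrusted code from running forever; a real implementation would need to cover far more than this.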

JavaScript running in Python running in Pyodide running in WebAssembly running in JavaScript

That playground demo shows JavaScript code run using my micro-javascript library, in Python, running inside Pyodide, running in WebAssembly, running in JavaScript, running in a browser!

It’s pretty cool! But did anyone out there need a buggy, slow, insecure half-baked implementation of JavaScript in Python?

They did not. I have quite a few other projects from that holiday period that I have since quietly retired!

February 2026

On to February. Remember that Warelay project that had its first commit at the end of November?

Warelay → CLAWDIS → CLAWDBOT → Clawdbot → Moltbot →

Originally published at simonwillison.net. Curated by AI Maestro.
