Pretty much a whole year of nothing really. Just coming with a bunch of abstraction and ideas trying to solve an unsolvable problem. Getting reliable results from an unreliable process while assuming the process is reliable.
At least when herding cats, you can be sure that if the cats are hungry, they will try to get where the food is.
I’m not sure how to tell you how obvious it is you haven’t actually used these tools.
Why do people assume negative critique is ignorance?
People denied that bicycles could possibly balance even as others happily pedaled by. This is the same thing.
Please tell me which one of the headings is not about increased usage of LLMs and derived tools, and is instead about some improvement along the axes of reliability or any other kind of usefulness.
Here is the changelog for OpenBSD 7.8:
https://www.openbsd.org/78.html
There's nothing here that says "we made it easier to use more of it." It's about using it better and fixing underlying problems.
The coding agent heading. Claude Code and tools like it represent a huge improvement in what you can usefully get done with LLMs.
Mistakes and hallucinations matter a whole lot less if a reasoning LLM can try the code, see that it doesn't work and fix the problem.
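That try/observe/fix loop can be sketched in a few lines. This is a hypothetical minimal illustration, not any particular agent's implementation; `write_attempt` stands in for a call to an LLM:

```python
import subprocess
import sys

def agent_loop(write_attempt, max_tries=3):
    """Minimal sketch of the loop coding agents run: generate code,
    execute it, and feed any error output back into the next attempt.
    write_attempt(feedback) -> source string (a stand-in for an LLM call)."""
    feedback = ""
    for _ in range(max_tries):
        source = write_attempt(feedback)
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return source  # the code ran cleanly, so accept it
        feedback = result.stderr  # pass the error back for the next attempt
    return None  # gave up after max_tries failures
```

The point is that execution acts as a ground-truth check: a hallucinated API call fails loudly, and the failure message gives the model something concrete to correct.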
Bicycles don't balance, the human on the bicycle is the one doing the balancing.
Yes, that is the analogy I am making. People argued that bicycles (a tool for humans to use) could not possibly work - even as people were successfully using them.
People use drugs as well but I'm not sure I'd call that successful use of chemical compounds without further context. There are many analogies one can apply here that would be equally valid.
Yikes.
People did?
I'm curious how all of the progress will be seen if it does indeed result in mass unemployment (but not eradication) of professional software engineers.
My prediction: If we can successfully get rid of most software engineers, we can get rid of most knowledge work. Given the state of robotics, manual labor is likely to outlive intellectual labor.
I nearly added a section about that. I wanted to contrast the thing where many companies are reducing junior engineering hires with the thing where Cloudflare and Shopify are hiring 1,000+ interns. I ran out of time and hadn't figured out a good way to frame it though so I dropped it.
Not in this review: it was also a record year for intelligent systems aiding and prompting human users into fatal self-harm.
Will 2026 fare better?
The people working on this stuff have convinced themselves they're on a religious quest so it's not going to get better: https://x.com/RobertFreundLaw/status/2006111090539687956
I really hope so.
The big labs are (mostly) investing a lot of resources into reducing the chance their models will trigger self-harm and AI psychosis and suchlike. See the GPT-4o retirement (and resulting backlash) for an example of that.
But the number of users is exploding too. If they make these incidents 5x less likely but sign up 10x more people, the absolute numbers still get worse on that front.
Also essential self-fulfilment.
But that one doesn't make headlines ;)
Sure -- but that's fair game in engineering. I work on cars. If we kill people with safety faults I expect it to make more headlines than all the fun roadtrips.
What I find interesting with chat bots is that they're "web apps" so to speak, but with safety engineering aspects that type of developer is typically not exposed to or familiar with.
One of the tough problems here is privacy. AI labs really don't want to be in the habit of actively monitoring people's conversations with their bots, but they also need to prevent bad situations from arising and getting worse.
Great summary of the year in LLMs. Is there a predictions (for 2026) blogpost as well?
Given how badly my 2025 predictions aged I'm probably going to sit that one out! https://simonwillison.net/2025/Jan/10/ai-predictions/
Remember, back in the day, when a year of progress was like, oh, they voted to add some syntactic sugar to Java...
> they voted to add some syntactic sugar to Java...
I remember when we just wanted to rewrite everything in Rust.
Those were the simpler times, when crypto bros seemed like the worst venture capitalism could conjure.
Crypto bros in hindsight were so much less dangerous than AI bros. At least they weren't trying to construct data centers in rural America or prop up artificial stocks like $NVDA.
These are excellent every year, thank you for all the wonderful work you do.
> The (only?) year of MCP
I'd like to believe that, but MCP is quickly turning into an enterprise thing, so I think it will stick around for good.
I think it will stick around, but I don't think it will have another year where it's the hot thing it was back in January through May.
> The year of YOLO and the Normalization of Deviance #
On the topic of AI agents deleting home folders: I was able to run agents in Firejail by sandboxing vscode (most of my agents are vscode based ones, like Kilo Code).
I wrote a little guide on how I did it https://softwareengineeringstandard.com/2025/12/15/ai-agents...
It took a bit of tweaking (vscode crashed a bunch of times when it couldn't read its config files), but I got there in the end. Now it can only write to my projects folder, and all of my projects are backed up in git.
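A minimal sketch of that kind of Firejail setup (the paths and flags here are illustrative, not the exact configuration from the guide):

```shell
# Run VS Code under Firejail so agents can only write inside whitelisted paths.
# --whitelist hides everything else in $HOME from the sandboxed process;
# VS Code's own config directories must be whitelisted too, or it crashes
# on startup when it can't read its config files.
firejail --noprofile \
  --whitelist=~/projects \
  --whitelist=~/.config/Code \
  --whitelist=~/.vscode \
  code ~/projects
```

With the projects folder also tracked in git, the worst case for a misbehaving agent is reverting a working tree rather than losing a home directory.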
> Vendor-independent options include GitHub Copilot CLI, Amp, OpenHands CLI, and Pi
...and the best of them all, OpenCode[1] :)
[1]: https://opencode.ai
How did I miss this until now! Thank you for sharing.
Good call, I'll add that. I think I mentally scrambled it with OpenHands.
Thanks for adding pi to it though :)
What amazing progress in such a short time. The future is bright! Happy New Year y'all!
2025: The Year in LLMs
I will never stop treating hallucinations as inventions. I dare you to stop me. I double dog dare you.