2025: The Year in LLMs

87 points | by simonw 2 hours ago

36 comments

  • skydhash 42 minutes ago

    Pretty much a whole year of nothing, really. Just coming up with a bunch of abstractions and ideas to solve an unsolvable problem: getting reliable results from an unreliable process while assuming the process is reliable.

    At least when herding cats, you can be sure that if the cats are hungry, they will try to get where the food is.

      MattRix 36 minutes ago

      I’m not sure how to tell you how obvious it is that you haven’t actually used these tools.

        skydhash 29 minutes ago

        Why do people assume negative critique is ignorance?

          dmd 26 minutes ago

          People denied that bicycles could possibly balance even as others happily pedaled by. This is the same thing.

            skydhash 14 minutes ago

            Please tell me which one of the headings is not about increased usage of LLMs and derived tools, and is instead about some improvement along the axes of reliability or any other kind of usefulness.

            Here is the changelog for OpenBSD 7.8:

            https://www.openbsd.org/78.html

            There's nothing here that says "we made it easier to use more of it." It's about using it better and fixing underlying problems.

              simonw 10 minutes ago

              The coding agent heading. Claude Code and tools like it represent a huge improvement in what you can usefully get done with LLMs.

              Mistakes and hallucinations matter a whole lot less if a reasoning LLM can try the code, see that it doesn't work and fix the problem.
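              A minimal sketch of that try-it-and-fix-it loop, assuming a hypothetical `model` callable that takes a prompt and returns Python source. Real coding agents like Claude Code do far more (tool calls, file edits, running test suites); here "it doesn't work" just means the snippet exits non-zero, and its error output is fed back into the next prompt. The names `generate_and_fix` and `make_scripted_model` are illustrative, not any tool's API.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Callable, Optional

# Hypothetical model interface: prompt in, Python source code out.
Model = Callable[[str], str]


def run_snippet(code: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run code in a fresh Python interpreter and report (succeeded, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0, proc.stderr


def generate_and_fix(task: str, model: Model, max_attempts: int = 3) -> Optional[str]:
    """Ask the model for code, run it, and feed any error back until it runs."""
    prompt = task
    for _ in range(max_attempts):
        code = model(prompt)
        ok, stderr = run_snippet(code)
        if ok:
            return code
        # The key step: the failure output becomes part of the next prompt,
        # so a hallucinated name or API surfaces as a concrete error to fix.
        prompt = f"{task}\n\nYour previous attempt failed with:\n{stderr}\nPlease fix it."
    return None  # give up and hand the problem back to the human


def make_scripted_model(replies: list[str]) -> tuple[Model, list[str]]:
    """Stand-in for a real LLM: returns canned replies and records each prompt."""
    seen_prompts: list[str] = []
    it = iter(replies)

    def model(prompt: str) -> str:
        seen_prompts.append(prompt)
        return next(it)

    return model, seen_prompts


if __name__ == "__main__":
    model, prompts = make_scripted_model([
        "print(undefined_helper())",  # attempt 1: hallucinated function -> NameError
        "print('hello')",             # attempt 2: corrected version
    ])
    print(generate_and_fix("Print a greeting.", model))
    print("NameError" in prompts[1])
```

              The scripted model stands in for an LLM so the loop can be exercised offline: the first reply fails with a `NameError`, that traceback is included in the second prompt, and the second reply is returned because it runs cleanly.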

            measurablefunc 21 minutes ago

            Bicycles don't balance, the human on the bicycle is the one doing the balancing.

              dmd 17 minutes ago

              Yes, that is the analogy I am making. People argued that bicycles (a tool for humans to use) could not possibly work - even as people were successfully using them.

                measurablefunc 6 minutes ago

                People use drugs as well but I'm not sure I'd call that successful use of chemical compounds without further context. There are many analogies one can apply here that would be equally valid.

              moralestapia 18 minutes ago

              Yikes.

            tehnub 10 minutes ago

            People did?

  • websiteapi 25 minutes ago

    I'm curious how all of the progress will be seen if it does indeed result in mass unemployment (but not eradication) of professional software engineers.

      ori_b 11 minutes ago

      My prediction: If we can successfully get rid of most software engineers, we can get rid of most knowledge work. Given the state of robotics, manual labor is likely to outlive intellectual labor.

      simonw 21 minutes ago

      I nearly added a section about that. I wanted to contrast the thing where many companies are reducing junior engineering hires with the thing where Cloudflare and Shopify are hiring 1,000+ interns. I ran out of time, though, and hadn't figured out a good way to frame it, so I dropped it.

  • sho_hn 27 minutes ago

    Not in this review: it was also a record year for intelligent systems aiding and prompting human users into fatal self-harm.

    Will 2026 fare better?

      measurablefunc 23 minutes ago

      The people working on this stuff have convinced themselves they're on a religious quest so it's not going to get better: https://x.com/RobertFreundLaw/status/2006111090539687956

      simonw 23 minutes ago

      I really hope so.

      The big labs are (mostly) investing a lot of resources into reducing the chance their models will trigger self-harm and AI psychosis and suchlike. See the GPT-4o retirement (and resulting backlash) for an example of that.

      But the number of users is exploding too. If they make things 5x less likely to happen but sign up 10x more people, that's still twice as many incidents overall, so it won't be good on that front.

      andai 23 minutes ago

      Also essential self-fulfilment.

      But that one doesn't make headlines ;)

        sho_hn 21 minutes ago

        Sure -- but that's fair game in engineering. I work on cars. If we kill people with safety faults I expect it to make more headlines than all the fun roadtrips.

        What I find interesting about chatbots is that they're "web apps," so to speak, but with safety-engineering aspects that that type of developer is typically neither exposed to nor familiar with.

          simonw 15 minutes ago

          One of the tough problems here is privacy. AI labs really don't want to be in the habit of actively monitoring people's conversations with their bots, but they also need to prevent bad situations from arising and getting worse.

  • npalli an hour ago

    Great summary of the year in LLMs. Is there a predictions post for 2026 as well?

  • waldrews an hour ago

    Remember, back in the day, when a year of progress was like, oh, they voted to add some syntactic sugar to Java...

      throwup238 an hour ago

      > they voted to add some syntactic sugar to Java...

      I remember when we just wanted to rewrite everything in Rust.

      Those were the simpler times, when crypto bros seemed like the worst venture capitalism could conjure.

        OGEnthusiast 39 minutes ago

        Crypto bros in hindsight were so much less dangerous than AI bros. At least they weren't trying to construct data centers in rural America or prop up artificial stocks like $NVDA.

  • AndyNemmity an hour ago

    These are excellent every year, thank you for all the wonderful work you do.

  • the_mitsuhiko an hour ago

    > The (only?) year of MCP

    I'd like to believe that, but MCP is quickly turning into an enterprise thing, so I think it will stick around for good.

      simonw an hour ago

      I think it will stick around, but I don't think it will have another year where it's the hot thing it was back in January through May.

  • aussieguy1234 40 minutes ago

    > The year of YOLO and the Normalization of Deviance

    Regarding this section, which includes AI agents deleting home folders: I was able to run agents in Firejail by isolating VS Code (most of my agents are VS Code-based ones, like Kilo Code).

    I wrote a little guide on how I did it https://softwareengineeringstandard.com/2025/12/15/ai-agents...

    It took a bit of tweaking, with VS Code crashing a bunch of times because it couldn't read its config files, but I got there in the end. Now it can only write to my projects folder. All of my projects are backed up in git.
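    A rough sketch of the same idea as a tiny Python launcher, assuming Firejail's `--whitelist=` option (which hides the rest of `$HOME` from the sandboxed process) and assuming VS Code keeps its settings in `~/.config/Code` and extensions in `~/.vscode`, the usual Linux defaults. The function name and paths are illustrative and not taken from the linked guide. Whitelisting those config directories is one plausible fix for the config-file crashes mentioned above, at the cost of leaving them writable too; this only narrows what the editor sees under `$HOME` and is not a complete sandbox on its own.

```python
import shlex
from pathlib import Path, PurePath


def firejail_argv(projects: PurePath, home: PurePath, editor: str = "code") -> list:
    """Build a firejail command line exposing only the projects dir plus the
    VS Code config/extension dirs inside $HOME (paths are assumed defaults)."""
    writable = [
        projects,
        home / ".config" / "Code",  # assumed: VS Code user settings/state on Linux
        home / ".vscode",           # assumed: VS Code extensions directory
    ]
    argv = ["firejail"]
    argv += [f"--whitelist={p}" for p in writable]
    argv += [editor, str(projects)]  # open the projects folder in the editor
    return argv


if __name__ == "__main__":
    home = Path.home()
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(firejail_argv(home / "projects", home)))
```

    Printing the command with `shlex.join` rather than launching it keeps the sketch safe to run anywhere; paste the output into a terminal on a machine with Firejail installed to try it.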

  • sanreau an hour ago

    > Vendor-independent options include GitHub Copilot CLI, Amp, OpenHands CLI, and Pi

    ...and the best of them all, OpenCode[1] :)

    [1]: https://opencode.ai

      nineteen999 30 minutes ago

      How did I miss this until now! Thank you for sharing.

      simonw an hour ago

      Good call, I'll add that. I think I mentally scrambled it with OpenHands.

        the_mitsuhiko 41 minutes ago

        Thanks for adding pi to it though :)

  • agentifysh 31 minutes ago

    What amazing progress in such a short time. The future is bright! Happy New Year y'all!

  • castwide 42 minutes ago

    2025: The Year in LLMs

    I will never stop treating hallucinations as inventions. I dare you to stop me. i double dog dare y