10 comments

  • Tiberium 2 minutes ago

    The frontend examples, especially the first one, look uncannily similar to what Gemini 3 Pro usually produces. Make of that what you will :)

      reissbaker a few seconds ago

      I don't mind if they're distilling frontier models to make them cheaper, and open-sourcing the weights!

  • XCSme 15 minutes ago

    Funny how they didn't include Gemini 3.0 Pro in the bar chart comparison, considering that it seems to do the best in the table view.

  • esafak 40 minutes ago

    The Terminal-Bench scores look weak, but the rest looks nice. I hope once the benchmarks are saturated, companies can focus on shrinking the models. Until then, let the games continue.

      CuriouslyC 32 minutes ago

      We're not gonna see significant model shrinkage until the money tap dries up. Between now and then, we'll see new benchmarks/evals that probe the holes in model capabilities, in cycles, as each new round gets saturated.

        lanthissa 17 minutes ago

        Isn't Gemini 3 Flash already model shrinkage that does well in coding?

          hedgehog 14 minutes ago

          Smaller open-weights models are also improving noticeably (like Qwen3 Coder 30B); the improvements are happening at all sizes.

            cmrdporcupine 9 minutes ago

            Devstral Small 24B looks promising as something I want to try fine-tuning on DSLs, etc., and then embedding in tooling.

      bigyabai 16 minutes ago

      It's a good model, for what it is. Z.ai's big business prop is that you can get Claude Code with their GLM models at much lower prices than what Anthropic charges. This model is going to be great for that agentic coding application.

  • cmrdporcupine 17 minutes ago

    Running it in Crush right now and so far fairly impressed. It seems roughly in the same zone as Sonnet, but not as good as Opus or GPT 5.2.