2 comments

  • aetherspawn 9 minutes ago

    It makes sense to me that distributing across more parameters results in models that can be quant more heavily (information theory - more bits available)

  • lwansbrough 11 minutes ago

    Anyone with a billion dollars want to try this and report back?