
  • Tiberium an hour ago

    Sounds like a hallucination unless proven otherwise. Even the leading LLMs produce those from time to time, and they always look plausible like that. It could also be that the session had a lot of previous context, like 800K+ tokens, which (I think) makes hallucinations more likely.

    Relevant comment from the OP which makes a hallucination more likely:

    > There is one tool call result that includes a string that printed a pathname including minecraft.py because it was listing the files in a Python virtual environment and the Pygments package has a lexer called minecraft.py

      andy99 2 minutes ago

      I realize hallucination has no precise definition, but this doesn’t sound at all like anything I’ve ever heard called hallucination. Hallucination is usually plausible wrong answers or made-up info that ends up fitting the most likely response (like a manufactured citation) and comes from the way LLMs work at predicting tokens. This example demonstrates completely implausible output; it’s not something that fits with hallucination.

      All that said, it doesn’t require cross-session leakage. It could just be training data, or something like those nightingale (probably the wrong bird) data generations where they prompt an LLM with nothing and it starts spitting out conversations.

      macNchz an hour ago

      The person posting this claims to have reproduced in a separate context down the thread:

      > Same thing just happened on a Claude Mobile session in same Enterprise account. Common theme in both is Sonnet 5, first response after more than 5 minutes (cache miss).

      xyzzy_plugh an hour ago

      I don't disagree but this sort of thing has to be investigated regardless.

      It's unfortunate that there is so little transparency that even if they deny there was a leak we will never know for certain.

      alserio 20 minutes ago

      Why? What makes it more likely?

  • bix6 26 minutes ago

    So the options are: this amazing tech is so stupid it just randomly brings up Minecraft, or it’s got a major security issue?

      27183 25 minutes ago

      ¿Por qué no los dos? (Why not both?)

  • ryantsuji 9 minutes ago

    Note the repro condition: first response after 5+ min, i.e. a cache miss. A cache leak would show up on hits (someone else's cached prefix), not on misses where everything is recomputed from your own tokens.
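A minimal sketch of the hit/miss distinction, assuming a hypothetical prefix cache keyed by a hash of the full token prefix (not any vendor's actual implementation). On a miss the state is computed only from the caller's own tokens; only a hit can hand back state that was stored earlier:

```python
import hashlib

# Hypothetical prefix cache: key -> precomputed state for that exact prefix.
cache: dict[str, list[str]] = {}

def prefix_key(tokens: list[str]) -> str:
    # The key depends on every token in the prefix.
    return hashlib.sha256("\x00".join(tokens).encode()).hexdigest()

def compute_state(tokens: list[str]) -> list[str]:
    # Stand-in for running the model over the prompt (e.g. building a KV cache).
    return list(tokens)

def get_state(tokens: list[str]) -> tuple[list[str], str]:
    key = prefix_key(tokens)
    if key in cache:
        # Hit: whatever was stored under this key is returned as-is.
        return cache[key], "hit"
    # Miss: recompute purely from this request's own tokens.
    state = compute_state(tokens)
    cache[key] = state
    return state, "miss"

print(get_state(["system", "hello"])[1])  # miss
print(get_state(["system", "hello"])[1])  # hit
```

If the cache logic is the only shared component, a cross-session leak would need a hit with a wrong key or wrong stored state; the miss path never reads anyone else's entry.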

  • Avicebron an hour ago

    In order, Fable 5 has rejected:

    "Recipe for red-braised pork, I have pork shoulder"

    "Write up a framework for MCP patterns I can give to claude code"

    "explain the biomechanics of motion in c. elegans" (I get this one, I mostly did it to test and it's related to my hobby project)

    Do we get an extra day of functional Fable 5 because it's down?

  • ec109685 an hour ago

    Caching doesn’t work the way the bug reporter implies. Caches are shared (at least across the enterprise), but the cache key is always a function of all the input that comes before that point.

    We achieved significant savings simply by moving everything that varies across individuals out of the system prompt, so that every session starts from a shared cache point.

    For example you never want your system prompt to start with the time that the session started. Move that to the first user message if needed.
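To make the timestamp point concrete, here is a toy sketch assuming block-level caching where each block's key is a chained hash of all preceding blocks (the block size and token strings are made up for illustration). Any difference in the first block changes every later key, so nothing can be shared:

```python
import hashlib

BLOCK = 4  # tokens per cache block (illustrative; real systems use larger blocks)

def block_keys(tokens: list[str]) -> list[str]:
    """Chained hashes: block i's key covers every token up to the end of block i."""
    h = hashlib.sha256()
    keys = []
    full = len(tokens) - len(tokens) % BLOCK  # only complete blocks are cacheable
    for i in range(0, full, BLOCK):
        h.update("\x00".join(tokens[i:i + BLOCK]).encode() + b"\x01")
        keys.append(h.hexdigest())  # hexdigest() does not reset the running hash
    return keys

def shared_blocks(a: list[str], b: list[str]) -> int:
    """Number of leading blocks two sessions could reuse from each other."""
    count = 0
    for ka, kb in zip(block_keys(a), block_keys(b)):
        if ka != kb:
            break
        count += 1
    return count

system = [f"sys{i}" for i in range(8)]  # 8-token system prompt = 2 blocks
user = ["hi", "there", "x"]

# Timestamp first: the very first block differs, so no prefix is shared.
print(shared_blocks(["t=09:00"] + system + user, ["t=09:07"] + system + user))  # 0
# Timestamp moved after the system prompt: both system-prompt blocks are shared.
print(shared_blocks(system + ["t=09:00"] + user, system + ["t=09:07"] + user))  # 2
```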

      macNchz an hour ago

      Caching is not supposed to work like that, but that doesn’t preclude the cache key computation function from having bugs.

        marginalia_nu 43 minutes ago

        Yeah there's quite a lot of potential bugs that could have this shape. If I were to guess it could be a buffer in a buffer pool not being sized and zeroed correctly, allowing stale data to bleed between sessions.
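A toy illustration of that failure mode, with a hypothetical single-buffer pool: if a buffer is reused without being zeroed and a separate bug reads past what the current request wrote, bytes from the previous request come back out. This only shows the general bug class, not what happened in this report:

```python
class BufferPool:
    """Hypothetical fixed-size buffer pool that recycles bytearrays between requests."""

    def __init__(self, count: int, size: int):
        self.free = [bytearray(size) for _ in range(count)]

    def acquire(self, zero: bool) -> bytearray:
        buf = self.free.pop()
        if zero:
            buf[:] = bytes(len(buf))  # wipe leftovers from the previous user
        return buf

    def release(self, buf: bytearray) -> None:
        self.free.append(buf)  # returned as-is; contents are not cleared here

def handle(pool: BufferPool, payload: bytes, read_len: int, zero: bool) -> bytes:
    buf = pool.acquire(zero)
    buf[:len(payload)] = payload
    # Bug: read_len comes from elsewhere and can exceed what this request wrote.
    out = bytes(buf[:read_len])
    pool.release(buf)
    return out

pool = BufferPool(count=1, size=16)
handle(pool, b"secret-minecraft", 16, zero=False)  # session 1
print(handle(pool, b"hi", 16, zero=False))          # b'hicret-minecraft'
```

Zeroing on acquire hides the symptom; fixing the length bug removes the cause. Either one alone stops this particular leak.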

      estebarb 9 minutes ago

      Hash functions necessarily have collisions. Also, it is perfectly possible to introduce bugs in the hashing (the hash inputs or the hash function itself) that allow cross-account contamination.
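A toy sketch of why a cache can confirm the full prefix on a hit instead of trusting the hash alone. The hash here is deliberately terrible (a sum mod 16, purely hypothetical) so a collision is easy to trigger; a strong hash makes collisions extremely rare but not impossible:

```python
from typing import Optional

def weak_key(tokens: list[int]) -> int:
    # Deliberately bad toy hash: any permutation of the same tokens collides.
    return sum(tokens) % 16

class PrefixCache:
    def __init__(self, verify: bool):
        self.verify = verify
        self.entries: dict[int, tuple[tuple[int, ...], str]] = {}

    def put(self, tokens: list[int], state: str) -> None:
        # Store the full prefix next to the state so a hit can be double-checked.
        self.entries[weak_key(tokens)] = (tuple(tokens), state)

    def get(self, tokens: list[int]) -> Optional[str]:
        entry = self.entries.get(weak_key(tokens))
        if entry is None:
            return None
        stored, state = entry
        if self.verify and stored != tuple(tokens):
            return None  # key matched but prefix differs: a collision, so treat as a miss
        return state

for verify in (False, True):
    cache = PrefixCache(verify)
    cache.put([1, 2], "tenant A state")
    print(verify, cache.get([2, 1]))
```

Without verification, the lookup for `[2, 1]` returns the state stored for `[1, 2]`; with verification it is a miss.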

      Waterluvian 20 minutes ago

      There is a massive incentive for optimization, so I expect they’re doing a ton of very clever tricks, all of which make this kind of bug more likely.

      supriyo-biswas an hour ago

      There could also just be a bug where the output tokens of session 1 were shared with session 2, due to a race condition or similar.
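A deterministic sketch of one such race, assuming a hypothetical table of reusable output slots: session 1 finishes and releases its slot, session 2 gets the same slot, and session 2's reader happens to poll before its own generator has written anything. Tagging each write with the owning request ID lets the reader reject leftovers:

```python
from typing import Optional

class SlotTable:
    """Hypothetical table of output slots that get reused across requests."""

    def __init__(self, n: int):
        self.free = list(range(n))
        self.data: dict[int, tuple[str, str]] = {}  # slot -> (request_id, text)

    def assign(self) -> int:
        return self.free.pop()

    def write(self, slot: int, request_id: str, text: str) -> None:
        self.data[slot] = (request_id, text)

    def release(self, slot: int) -> None:
        self.free.append(slot)  # old contents stay in place

    def read(self, slot: int, request_id: str, check_owner: bool) -> Optional[str]:
        entry = self.data.get(slot)
        if entry is None:
            return None
        owner, text = entry
        if check_owner and owner != request_id:
            return None  # leftover output from an earlier request, not ours
        return text

def simulate(check_owner: bool) -> Optional[str]:
    table = SlotTable(1)
    s1 = table.assign()
    table.write(s1, "req-1", "session 1 output")
    table.release(s1)
    s2 = table.assign()  # the same slot is handed to session 2
    # Unlucky interleaving: session 2's reader polls before its generator writes.
    return table.read(s2, "req-2", check_owner)

print(simulate(check_owner=False))  # session 1 output
print(simulate(check_owner=True))   # None
```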

  • acepl an hour ago

    Oh yes, we do not need programmers any more…

      emehex an hour ago

      "Coding is largely solved"

        supriyo-biswas 7 minutes ago

        The funny thing is that at my current employer, they mentioned that "coding is increasingly becoming a solved problem" and in the same breath mentioned that one project was too hard for anyone to do, so they're not doing it and would rather sell existing features...

        consp an hour ago

        While it's abused by LLM vendors, I've been hearing that phrase in one form or another since the early '00s, and it's likely way older.

          ethagnawl 36 minutes ago

          Sure, but have you ever seen it actually play out in practice like it is now? Whether or not it's true (of course it's not), people are currently behaving as if it is and firing/hiring accordingly.

            philipov 8 minutes ago

            Well, when was the last time you wrote machine code by hand?

            ... but then they went and changed what coding meant.

            We've always been layering abstractions on top of abstractions. If we get to an abstraction that works well enough that you no longer have to dive down into the previous layer, we say we've solved coding, and change what coding means. Obviously LLMs aren't there yet.

        techpression an hour ago

        I love that quote, especially considering the insane number of bugs being produced. It’s as easy to debunk as someone claiming “I can jump to the moon”.

      kylehotchkiss an hour ago

      50% unemployment :D

  • jstummbillig an hour ago

    Is there anything in particular about LLMs that would make separating customer data harder than in any other SaaS product?

      adam_arthur 17 minutes ago

      Vibe-coding the implementation.

      I haven't had many issues with Codex, but it seems Claude Code has major issues reported almost daily.

      They also happen to be the most boastful about not reading or looking at the code.

      LLMs are very capable, but not nearly to the level they seem to be messaging.

      (We've actually moved on from vibe-coding to having the LLM vibe code itself in a loop)

        rabbidruster 6 minutes ago

        Interestingly I had an almost identical experience to this report in codex. It output a user memory file that looked awfully real and wasn't at all related to my work.

        27183 9 minutes ago

        > having the LLM vibe code itself in a loop

        The business-Latin name for this is Recursive Self-Improvement.

      27183 35 minutes ago

      If I had to hazard a guess, doing anything in a multi-tenant way on a GPU is going to be hard mode compared to most SaaS, due to the lack of memory-safe tooling. I've built multi-tenant SaaS systems, and I've done a little GPU programming (a long time ago), but I've never tried to combine the two disciplines.

      woadwarrior01 31 minutes ago

      It'd be terribly compute inefficient to not share prefix caches (KV cache) across customers.

        acepl 22 minutes ago

        What is the probability that two customers will have exactly the same tokens in cache? Wouldn't it require using the exact same CLAUDE.md, skills, MCPs and context? After that point it is even less likely, given the nondeterminism of both LLMs and humans.

          27183 17 minutes ago

          I suspect what GP is getting at is that there will be a strong incentive to implement some structural sharing across tenants to avoid redundantly storing the same tokens over and over. At least I'd be tempted to do this if I were working with a very precious, constrained resource (e.g. VRAM). Doing this correctly seems... very difficult.
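A minimal sketch of that kind of cross-tenant structural sharing. It is entirely hypothetical, loosely in the spirit of paged KV storage with prefix caching: physical blocks are keyed by a chained hash of the prefix and reference-counted, so two tenants with an identical system prompt point at the same physical blocks while their divergent suffixes get separate ones. Every piece of bookkeeping here (keys, refcounts, freeing) is a place where a bug could hand one tenant's block to another:

```python
import hashlib

BLOCK = 4  # tokens per block (illustrative)

class SharedBlockStore:
    def __init__(self):
        self.blocks: dict[str, int] = {}  # chained prefix hash -> physical block id
        self.refs: dict[int, int] = {}    # physical block id -> reference count
        self.next_id = 0

    def allocate(self, tokens: list[str]) -> list[int]:
        """Return a block table for this request, reusing blocks for identical prefixes."""
        table, h = [], hashlib.sha256()
        full = len(tokens) - len(tokens) % BLOCK  # a partial tail block is not shared
        for i in range(0, full, BLOCK):
            h.update("\x00".join(tokens[i:i + BLOCK]).encode() + b"\x01")
            key = h.hexdigest()
            if key not in self.blocks:
                self.blocks[key] = self.next_id
                self.refs[self.next_id] = 0
                self.next_id += 1
            pid = self.blocks[key]
            self.refs[pid] += 1
            table.append(pid)
        return table

    def release(self, table: list[int]) -> None:
        for pid in table:
            self.refs[pid] -= 1
            if self.refs[pid] == 0:
                # Last user is gone: drop the block and its key mapping.
                del self.refs[pid]
                self.blocks = {k: v for k, v in self.blocks.items() if v != pid}

store = SharedBlockStore()
system = [f"sys{i}" for i in range(8)]  # same 8-token system prompt for both tenants
a = store.allocate(system + ["a1", "a2", "a3", "a4"])
b = store.allocate(system + ["b1", "b2", "b3", "b4"])
print(a, b)  # [0, 1, 2] [0, 1, 3]
```

Sharing only kicks in for byte-identical prefixes, which is why it pays off mainly for common system prompts and tool definitions rather than whole conversations.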

  • Kapura 21 minutes ago

    happy fourth of july everybody!