I struggle with these abstractions over context windows, especially when Anthropic is actively focused on improving things like compaction, and knowing the eventual* goal is for the models to have real memory layers baked in. Until then we have to optimize for how agents work best, and ephemeral context is a part of that (they weren't RL'd/trained with memory abstractions, so we shouldn't use them at inference either). Constant, task-specific rediscovery has worked well for me; it doesn't suffer from context decay, though it does eat more tokens.
Otherwise, the ability to search back through history is valuable, and a simple (rip)grep/jq combo over the session directory covers it. Simple example of mine: https://github.com/backnotprop/rg_history
There is certainly a level where at any time you could be building some abstraction that is no longer required in a month, or 3.
I feel that way too. I have a lot of these things.
But the reality is, it doesn't really happen that often in my actual experience. Everyone as a whole is very slow to understand what these things mean, so for now you get quite a bit of time out of an improved, customized system of your own.
My somewhat naive heuristic would be that memory abstractions are a complete misstep in terms of optimization. There is no "super claude mem" or "continual claude" until there actually is.
I tend to agree with you, however compacting has gotten much worse.
So... it's tough. I think memory abstractions are generally a mistake, and generally not needed, however I also think that compacting has gotten so wrong recently that they are also required until Claude Code releases a version with improved compacting.
But I don't do memory abstraction like this at all. I use skills to manage plans, and the plans are the memory abstraction.
But that is more than memory. That is also about having a detailed set of things that must occur.
I’m interested to see your setup.
I think planning is a critical part of the process. I just built https://github.com/backnotprop/plannotator for a simple UX enhancement
There are a quadrillion startups (mem0, langmem, zep, supermemory), open source repos (claude-mem, beads), and tools that do this.
My approach is literally just a top-level, local, git version controlled memory system with 3 commands:
- /handoff - End of session, capture into an inbox.md
- /sync - Route inbox.md to custom organised markdown files
- /engineering (or /projects, /tasks, /research) - Load context into next session
I didn't want a database or an MCP server or embeddings or auto-indexing when I can build something frictionless that works with git and markdown.
Repo: https://github.com/ossa-ma/double (just published it publicly but it's about the idea imo)
Writeup: https://ossa-ma.github.io/blog/double
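For anyone curious how the /sync routing could work mechanically, here is a minimal Python sketch. The `## engineering`-style heading convention, the `memory/` directory, and the function names are my assumptions for illustration, not code from the repo above.

```python
# Hypothetical sketch of a /sync-style step: route tagged entries from
# inbox.md into per-category markdown files. The "## <category>" convention
# and the memory/ directory are assumptions, not the repo's actual layout.
from pathlib import Path

CATEGORIES = {"engineering", "projects", "tasks", "research"}  # mirrors the slash commands

def sync(inbox_path: str = "inbox.md", memory_dir: str = "memory") -> None:
    inbox = Path(inbox_path)
    if not inbox.exists():
        return
    Path(memory_dir).mkdir(exist_ok=True)
    current, buffer = None, []

    def flush() -> None:
        # Append the buffered entry to its category file.
        if current and buffer:
            with (Path(memory_dir) / f"{current}.md").open("a") as f:
                f.write("\n".join(buffer).strip() + "\n\n")

    for line in inbox.read_text().splitlines():
        heading = line[3:].strip().lower() if line.startswith("## ") else None
        if heading in CATEGORIES:
            flush()
            current, buffer = heading, []
        elif current is not None:
            buffer.append(line)
    flush()
    inbox.write_text("")  # inbox is cleared once everything is routed

if __name__ == "__main__":
    sync()
```

Because everything stays as plain markdown in the repo, git gives you the history of the memory itself for free.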
Your approach essentially matches mine, but I call them plans. I agree with you that the other tools don't seem to add any value compared to this structure.
I think at this point in time, we both have it right.
Is anyone else just completely overwhelmed with the number of things you _need_ for claude code? Agents, subagents, skills, claude.md, agents.md, rules, hooks, etc.
We use Cursor where I work and I find it a good medium for still being in control and knowing what is happening with all of the changes being reviewed in an IDE. Claude feels more like a black box, and one with so many options that it's just overwhelming, yet I continue to try and figure out the best way to use it for my personal projects.
I'm in Claude Code 30+ hr/wk and always have at least three tabs of CC agents open in my terminal.
Agree with the other comments: pretty much running vanilla everything and only the Playwright MCP (IMO way better than the native chrome integration) and ccstatusline (for fun). Subagents can be as simple as saying "do X task(s) with subagent(s)". Skills are just self @-ing markdown files.
Two of the most important things are 1) maintaining a short (<250 lines) CLAUDE.md and 2) having a /scratch directory where the agent can write one-off scripts to do whatever it needs to.
How can you - or any human - review that much code?
I like the fine-tuning aspect of it quite a lot. It makes sense to me. What I have now is a very streamlined process of autonomous agent work, which more and more often can simply be managed rather than controlled at a code-review level for everything.
I agree that this level of fine-tuning feels overwhelming and might leave you doubting whether you're using Claude to its full potential. The beauty is that fine-tuning and macro usage don't interfere with each other when you stay in your lane.
For example, I don't use the planning agent anymore; instead I incorporated that process into the normal agents, much to the project's advantage. Consistency is key. Anthropic did the right thing.
Codex is quite a different beast and comes at this from the opposite direction, so to speak.
I use both, Codex and Claude Opus especially, in my daily work and have found them complementary, not mutually exclusive. It's like two evangelists who are on par, working with different tools toward a goal they both share.
you really don't need any of this crap. you just need Claude Code and a CLAUDE.md in the directories where you need to direct it. complicated AI setups are mid curve
I refuse to learn all the complicated configuration because none of it will matter when they drop the next model.
Things that need special settings now won’t in the future and vice versa.
It’s not worth investing a bunch of time into learning features and prompting tricks that will be obsoleted soon
I wish that were true. Models don't feel like they've really had massive leaps.
They do get better, but not enough to change any of the configuration I have.
But you are correct, there is a real possibility that the time invested will be obsolete at some point.
For sure, the work toward MCPs is basically obsolete via skills. These things happen.
It seems to mostly ignore Claude.md
It does; CLAUDE.md is the least effective way to communicate with it.
It's always interesting reading other people's approaches, because I just find them all so very different than my experience.
I need Agents, and Skills to perform well.
It feels like Claude is taking more of the Android approach of a less opinionated but more open stack, so people are bending it to the shape that matches their workflow. I think of the amnesia problem as pretty agent-agnostic, though: knowing what happens while you're delivering product is more of an agent-execution-layer problem than a tool problem, and it gets bigger when you have swarms coordinating. Jaya wrote a pretty good article about this: https://x.com/AustinBaggio/status/2004599657520123933?s=20
This isn't necessary. Claude will read CLAUDE.md from both places: I stick general preferences in what it calls "user memory" and project-specific preferences in the working directory.
A CLAUDE.md file will give you 90% of what you need.
Consider more when you're 50+ hours in and understand what more you want.
In my experience, I'm mostly at the point where I entirely ignore CLAUDE.md - so it's very interesting how many people have very different experiences.
With Opus 4.5 in Claude Code, I'm doing fine with just a (very detailed) CLAUDE.md.
Do you find you want to share the .md with the teams you work with? Or is it more for your solo coding?
I'm the opposite: I find it straightforward to use all these things, and am surprised people aren't getting it.
I've been trying to write blogs explaining it recently, but I don't think I'm very good at making it sound interesting to people.
What can I explain that you would be interested in?
Here was my latest attempt today.
https://vexjoy.com/posts/everything-that-can-be-deterministi...
You say "My Claude Code Setup" but where is the actual setup there? I generally agree with everything about how LLMs should be called you say, but I don't see any concrete steps of changing Claude Code's settings in there? Where are the "35 agents. 68 skills. 234MB of context."? Is the implementation of the "Layer 4" programs intended to be left to the reader? That's hardly approachable.
I got similar feedback with my first blog post on my do router - https://vexjoy.com/posts/the-do-router/
Here is what I don't get: it's trivial to do this. Mine is of course customized to me and what I do.
The idea is to communicate the ideas, so you can use them in your own setup.
It's trivial to put, for example, my do-router blog post into Claude Code and generate one customized for you.
So what does it matter to see my exact version?
These are the types of things I don't get. If I give you my details, it's less approachable for sure.
The most approachable thing I could do would be to release individual skills.
Like I have skills for generating images with google nano banana. That would be approachable and easy.
But it doesn't communicate the why. I'm trying to communicate the why.
I just don't have much faith in "if you're doing it right the results will be magically better than what you get otherwise" anymore. Any single person saying "the problems you run into with using LLMs will be solved if you do it my way" has to really wow me if they want me to put in effort on their tips. I generally agree with your "why" for setting it up like that. I'm skeptical that it will get over the hump of where I still run into issues.
When you've tried 10 ways of doing it but they all end up getting into a "feed the error back into the LLM and see what it suggests next" you aren't that motivated to put that much effort into trying out an 11th.
The current state of things is extremely useful for a lot of things already.
That's completely fair, I also don't have much faith in that anymore. Very often, the people who make those claims have the most basic implementation that barely is one.
I'm not sure if the problems you run into with using LLMs will be solved if you do it my way. My problems are solved doing it my way. If I heard more about your problems, I would have a specific answer to them.
These are the solutions to where I have run into issues.
For sure, but my solutions are not "feed the error back into the LLM." My solutions are varied, but as the blog shows, they are: move as much as possible into scripts and deterministic solutions, and keep the LLM to the smallest possible scope.
The current state of things is extremely useful for a subset of things. That subset of things feels small to me. But it may be that everything a certain person wants to do exists in that subset of things.
It just depends. We're all doing radically different things, and trying very different things.
I certainly understand and appreciate your perspective.
You don't need all that, just have Claude write the same documentation you would (should) write for any project. I find it best to record things chronologically and then have Claude do periodic reviews of the docs and update key design documents and roadmap milestones. The best part is you get a written record of everything that you can review when you need to remember when and why something changed. They also come in handy for plan mode since they act as a guide to the existing code.
The PMs were right all along!
All I use is curse words and it does a damn great job most of the time
Same here :)))), he's really good at understanding when you're pissed off.
Yep, that usually works best.
Don't forget about the co-agents.. yeah.
Nope, I spend time learning my tools.
What's the data retention/deletion policy and is there a self-hosted option planned? I'd prefer not to send proprietary code to third-party servers.
Honestly, very reasonable ask; you're not the first person to ask for a self-hosted version. We have a privacy policy we've drafted that is up to date with the current version of the product: https://www.ensue-network.ai/privacy-policy.
The project is still in alpha, so you could shape what we build next - what do you need to see, or what gets you comfortable sending proprietary code to other external services?
> what do you need to see, or what gets you comfortable sending proprietary code to other external services?
Honestly? It just has to be local.
At work, we have contracts with OpenAI, Anthropic, and Google with isolated/private hosting requirements, coupled with internal, custom, private API endpoints that enforce our enterprise constraints. Those endpoints perform extensive logging of everything, and reject calls that contain even small portions of code if it's identified as belonging to a secret/critical project.
There's just no way we're going to negotiate, pay for, and build something like that for every possible small AI tooling vendor.
And at home, I feed AI a ton of personal/private information, even when just writing software for my own use. I also give the AI relatively wide latitude to vibe-code and execute things. The level of trust I need in external services that insert themselves in that loop is very high. I'm just not going to insert a hard dependency on an external service like this -- and that's putting aside the whole "could disappear / raise prices / enshittify at any time" aspect of relying on a cloud provider.
There are a lot of people interested in forming some sort of memory layer around vendored LLM services. I don't think they realize how much impact a single error that disappears from your immediate attention can have on downstream performance. Now think of the accrual of those errors over time and your inability to discern whether it was service degradation, or a bad prompt, or a bad AGENTS.md, or now this "long term memory" or whatever. If this sort of feature is ever going to be viable, the service providers will offer the best solution only behind their API, optimized for their models and their infrastructure.
Just put a claude.md file in your directory. If you want more details about a subdirectory put one in there too.
Claude itself can just update the claude.md file with whatever you might have forgotten to put in there.
You can stick it in git and it lives with the project.
I don't understand the use case. I think if you don't use agents, and skills currently effectively, then perhaps this is useful.
If you're using them though, we no longer have the problem of Claude forgetting things.
stop wasting context space with this stuff ミ · · 彡
Congrats on this! How does this differ from claude-mem? I've been using claude-mem for a while now.
https://github.com/thedotmack/claude-mem
Have you tried https://github.com/steveyegge/beads
oh yeah, beads is awesome! I'd say this is a bit more general purpose right now, especially what's in the skill!
Non-starter for us. We can't ship proprietary data to third-party servers.
I just ask Claude to look at past conversations where I was working on x… it sometimes thinks it can’t see them, but it can.
I’ll give this a go though and let you know!
Here is a simple skill (markdown instructions only) that sets up a nice ripgrep approach, with a utility for discovering the current session.
https://github.com/backnotprop/rg_history
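For a sense of what that looks like without the skill wrapper, here is a rough Python sketch. The `~/.claude/projects` JSONL layout and the `message`/`content` field names are assumptions about where a typical install keeps transcripts; treat it as a starting point, not the skill's actual implementation.

```python
# Rough sketch of "grep your session history": scan Claude Code session
# transcripts (assumed to live as JSONL under ~/.claude/projects/) for a
# pattern, newest files first, and print short snippets of the matches.
import json
from pathlib import Path

SESSION_DIR = Path.home() / ".claude" / "projects"  # assumption; adjust for your install

def search_history(pattern: str, limit: int = 20) -> list[str]:
    hits: list[str] = []
    if not SESSION_DIR.exists():
        return hits
    files = sorted(SESSION_DIR.rglob("*.jsonl"), key=lambda p: p.stat().st_mtime, reverse=True)
    for path in files:
        for line in path.read_text(errors="ignore").splitlines():
            if pattern.lower() not in line.lower():
                continue
            try:
                # Field names are a guess at the transcript schema.
                text = json.loads(line).get("message", {}).get("content", "")
            except (json.JSONDecodeError, AttributeError):
                text = line
            hits.append(f"{path.name}: {str(text)[:200]}")
            if len(hits) >= limit:
                return hits
    return hits

if __name__ == "__main__":
    for hit in search_history("design doc"):
        print(hit)
```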
I mostly use it during long Claude Code research sessions so I don’t lose my place between days.
I run it in automatic mode with decent namespacing, so thoughts, notes, and whole conversations just accumulate in a structured way. As I work, it stores the session and builds small semantic, entity-based hypergraphs of what I was thinking about.
Later I’ll come back and ask things like:
- what was I actually trying to fix here?
- what research threads exist already?
- where did my reasoning drift?
Sometimes I’ll even ask Claude to reflect on its own reasoning in a past session and point out where it was being reactive or missed connections.
Your site advertises careers in San Francisco/Remote. California law requires compensation disclosures.
Good flag - we're still pretty early. I think the strict requirement for compensation disclosures only kicks in past 15 employees in CA? Did I get this wrong?
Thank you for specifying it wasn't magic or AGI.
> Not magic. Not AGI. Just state.
Very clearly AI written
You're absolutely right!
jk it is AGI. First.
What is the advantage over summarizing previous sessions for the new one?
Or, over continuing the same session and compacting?
You can use it with summaries for sure, but summaries often miss edge cases and long sessions drift. This makes it easier to jump between tasks, come back days later, and reorient without missing something that the summarization or compaction might have gotten rid of. I've often found that post-compaction, even the memory of the current session feels so much dumber.
Maybe you are in a Claude Code session and think, "didn't I already make a design doc for a system like this one?" Or you could even look at your thought process in a previous session and reflect. But right now I mainly use it for reviewing research and the hypergraph retrieval.
I absolutely love this concept! It's like the thing that I've been looking for my whole life. Well, at least since I've been using Claude Code, which is this year.
I'm sold.
With that said, I can't think of a way that this would work. How does this work? I took a very quick glance, and it's not obvious at first glance.
The whole problem is, the AI is short on context, it has limited memory. Of course, you can store lots of memory elsewhere, but how do you solve the problem of having the AI not know what's in the memory as it goes from step to step? How does it sort of find the relevant memory at the time that that relevance is most active?
Could you just walk through the sort of conceptual mechanism of action of this thing?
Appreciate it - yeah, you're right, models don't work well when you just give them a giant dump of memory. We store memories in a small DB - think key/value pairs with embeddings. Every time you ask Claude something, the skill:
1. Embeds the current request.
2. Runs a semantic + timestamp-weighted search over your past sessions. Returns only the top N items that look relevant to this request.
3. Those get injected into the prompt as context (like extra system/user messages), so Claude sees just enough to stay oriented without blowing context limits.
Think of it like attention over your historical work, more so than brute-force recall. Context on demand, basically giving you an infinite context window. Bookmark + semantic grep + temporal rank. It doesn't "know everything all the time." It just knows how to ask its own past: "What from memory might matter for this?"
When you try it, I’d love to hear where the mechanism breaks for you.
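A minimal sketch of what that scoring step could look like, assuming the request is already embedded and using a simple exponential time decay; this illustrates the idea, not Ensue's actual code or weighting.

```python
# Sketch of semantic + timestamp-weighted recall: score each stored memory by
# cosine similarity to the current request, decay the score by age, take top N.
import math
import time

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def recall(query_vec: list[float], memories: list[dict], n: int = 5,
           half_life_days: float = 14.0) -> list[dict]:
    """memories: [{"text": str, "vec": list[float], "ts": unix_seconds}, ...]"""
    now = time.time()
    scored = []
    for m in memories:
        age_days = (now - m["ts"]) / 86400
        recency = 0.5 ** (age_days / half_life_days)  # exponential time decay
        scored.append((cosine(query_vec, m["vec"]) * recency, m))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [m for _, m in scored[:n]]
```

The top-N texts are then what gets injected into the prompt as extra context, per step 3 above.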
It looks to me like the skill sets up a connection to their MCP server at api.ensue-network.ai during Claude session start via https://github.com/mutable-state-inc/ensue-skill/blob/main/s...
Then Claude uses the MCP tools according to the SKILL definition: https://github.com/mutable-state-inc/ensue-skill/blob/main/s...
Yeah, so you can run it in automatic mode or read-only mode. In automatic mode it hooks onto the conversation and tool calls, so the entire conversation gets stored. If you don't want to get super deep, then read-only is safe and only stores what you ask. You could ask it things like "why is my reasoning dumb" by recalling past conversations, or even give it the Claude tool-call sequence and ask "how can Claude be smarter next time".
I think of it like a file tree with proper namespacing, and keep abstract concepts in separate directories. So, for example, my food preferences will be in /preferences/sandos, or you can even do things like /system-design preferences and then load them into a relevant conversation next time.
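In code terms, that namespacing is roughly a prefix lookup over path-like keys. A toy sketch, with keys invented purely to echo the examples above:

```python
# Toy illustration of namespaced memories: path-like keys, loaded by prefix.
memories = {
    "/preferences/sandos": "likes toasted sourdough, hold the onions",
    "/preferences/coffee": "flat white, no sugar",
    "/system-design/preferences": "boring tech first; Postgres before bespoke stores",
}

def load_namespace(prefix: str) -> dict[str, str]:
    # Everything under one namespace gets pulled into the next session's context.
    return {k: v for k, v in memories.items() if k.startswith(prefix)}

print(load_namespace("/preferences"))
```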
Total speculation:
Text Index of past conversations, using prompt-like summaries.