I have developed my personal agentic file format `.ag` for this purpose.
Here is the template I start with:
  #!/usr/bin/env gpt-agent
  input: <target/context (e.g., cwd)>
  task: |
    <one clear objective>
  output: |
    <deliverables + required format>
  require:
    - <must be true before start; otherwise stop + report>
  invariant:
    - <must stay true while working (scope + safety)>
  ensure:
    - <must be true at the end (definition of done)>
  rescue: |
    <what to do if any requirement/invariant/ensure cannot be met>
There are some comments from earlier discussing other literate programming tools.
Oh dear... but... but why let some LLM set of unknown source of unknown iteration... execute code... on your machine...?
I was excited by the possibly extravagant implementation idea and... then I read enough to realize it's based on yet another LLM... Sorry, no, never. You do you.
Roger that. Thank you! Apparently, while I've been employed in security as a software engineer for at least 19 years now, I've never ever considered it at all serious, and still do not.
Sorry, I have literally no interest in any of it that makes you dependent on it, atrophies the mind, degrades research and social skills, and negates self-confidence with respect to other authors, their work, and attributions. Nor do any of my colleagues in the military, or those I know better in person.
Constant research and general IDEs like JetBrains's, IDA Pro, Sublime Text, VS Code, etc., backed by forums, chats, and communities, are absolutely enough for the accountable and fun work in our teams, who manage to keep to adequate deadlines.
I just disable it everywhere possible, and will do all my life. The close case to my environment was VS Code, and hopefully there's no reason to build it from source since they still leave built-in options to disable it: https://stackoverflow.com/a/79534407/5113030 (How can I disable GitHub Copilot in VS Code?...)
Isn't it just inadequate to not think and develop your mind, let alone pass control of your environment to yet another model or "advanced T9" of unknown source of unknown iteration?
In pentesting, random black-box IO, experimental unverified medical intel, log data approximation: why not? But in environment control, education, art or programming, fine art... No, never ^^
You can use this without letting the markdown scripts you write execute any code at all, whether that is via Claude Code or other AI tool in future.
The default permissions are to not allow execution. Which means that you can use the eval and text-generation capabilities of LLMs to perform assessments and evaluations of piped-in content without ever executing code themselves.
The script shebang has to explicitly add the permissions to run code, which you control. It supports the full Claude Code flag model for this.
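Since the permissions live in the shebang, a reviewer can check what a script asks for before running it. The following is a minimal sketch of that audit step, not part of claude-run: the helper name `shebang_flags` is hypothetical, and the example shebang is the one quoted elsewhere in this thread.

```python
import shlex


def shebang_flags(script_text: str) -> list[str]:
    """List the arguments that follow the interpreter on a shebang line."""
    lines = script_text.splitlines()
    first_line = lines[0] if lines else ""
    if not first_line.startswith("#!"):
        return []
    parts = shlex.split(first_line[2:])
    # Skip "/usr/bin/env" plus the command it launches, or a direct interpreter path.
    skip = 2 if parts and parts[0].endswith("/env") else 1
    return parts[skip:]


# Example shebang taken from this thread; the body text is made up.
example = "#!/usr/bin/env claude-run --permission-mode bypassPermissions\nDo the task.\n"
flags = shebang_flags(example)
print(flags)
```

A script whose shebang carries no flags returns an empty list, which matches the "no execution unless you opt in" default described above.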
One silly fun thing about PHP is php tags. So you can do notebooks like this for kinda free... if you want to execute code? just put it in <?php ?> blocks.
So ... you are letting a nondeterministic LLM operate on the shell, via quasi-shellscript. This will appeal mostly to people who do not have the skillset to write an actual shell-script.
In short, isn't that like giving a voice-controlled scalpel to a random guy on the street and telling them 'just tell it to do neurosurgery', and hoping it accidentally does the right procedure?
I know this will not appeal to developers who don’t see a legitimate role for the use of AI coding tools with nondeterministic output.
It is intended to be a useful complement to traditional Shell scripting, Python scripting etc. for people who want to add composable AI tooling to their automation pipelines.
I also find that it helps improve the reliability of AI in workflows when you can break down prompts into re-useable single-task-focused modules that leverage LLMs for tasks they are good at (format.md, summarize-logs.md, etc). These can then be chained with traditional Shell scripts and command line tools.
Examples are summarizing reports, formatting content. These become composable building blocks.
So I hope that is something that has practical utility even for users like yourself who don’t see a role for plain language prompting in automation per se.
In practice this is a way to add composable AI-based tooling into scripts.
Many people are concerned about (or outright opposed to) the use of AI coding tools. I get that this will not be useful for them. Many folks like myself find tools like Claude helpful, and this just makes it easier to use them in automation pipelines.
My view is that readability and ease of understanding have a real impact on auditability. Nondeterministic output also clearly has a significant impact on auditability.
The balance between readability and determinism for auditability partly relates to developer philosophy. Tech is famous for religious arguments. I have friends who hate AI coding, and want to avoid nondeterministic tools at all costs. And other friends whose productivity has increased significantly, and who see the future of programming as natural language.
The quality of AI models and tools like Claude Code is improving fast, and there are many developers who find value in them, myself included. I built this to make life easier for developers who want to use AI tools for automation.
I find it much faster to parse and understand plain language than many code scripts I've seen. It was one of Python's great insights that people spend more time reading code than writing it. And there is a tradeoff in auditability between determinism and the ability to quickly read and understand what systems do.
There are clearly many people who find AI useful, and who are becoming skilled in its use as a tool. This is just a little tool that I put together for myself and other people who fall in that basket.
Learning where to use AI tools appropriately - how to constrain the dangers while maximizing the value - is part of the challenge. From using this particular tool for real work, it fits some use cases well, and can make things easier both to understand and share, as well as to write.
I hope it's useful for some other people wanting to use AI for scripting and automation.
I think that quickly understandable instructions are part of auditability. Not the whole thing, and their use needs to be balanced with safety and security. But an important part of it.
I accept there are plenty of folks who don't see AI tools that way. We're sharing this for people who see the value in this new approach, even though it is a fast-moving field and there are a lot of imperfections.
Any reasonably competent Claude Code user who is careful about setting permissions boundaries is no more going to delete their hard drive than a competent command line user would. There will be things that go wrong with AI, as before it.
In years of tech support, I've personally had to help people who neutered their Windows install or deleted files they needed. Those things happen and I'd argue they come down to skill issues, with AI or without. New tools have a learning curve.
I get that you think it's bizarre to see readability with AI-based tools as more auditable, and I really do understand that perspective.
- Lets you make regular Markdown files directly executable using a shebang line.
- It keeps the Markdown itself clean and standard rather than using variable placeholders or any kind of special syntax.
- Includes support for session isolation
- Allows you to keep script use separate from your regular Claude Code subscription, by allowing you to specify the provider cloud / model in scripts, or switch them on the fly.
Another commenter suggested a custom format for executable llm scripts, which looks like the direction mdflow takes.
Using claude-switcher you can also use multiple clouds/keys for billing and failover, and to keep your subscription tokens for interactive or personal use, which I think is also useful.
This is absolutely a new type of nondeterministic tool, so you're spot on there.
One of the key things we realized when we started using it is that the approach allows you to mix deterministic and nondeterministic tools together as part of a composable chain.
So you can, for example, use LLMs for their evaluation capabilities in a natural language script, as part of a broader chain that wraps it in deterministic code, and the plain language script can itself include and run nested deterministic code.
So it allows us to create pipelines that combine the best of both approaches as appropriate based on the sub-task at hand.
One thing to consider is that the steps in the pipeline can be deterministic (the code executed) while the outputs (summaries, reviews, evaluations, explanations) may be nondeterministic. An example would be summarizing data calculated via a traditional script, and piping it to a report-format markdown script that generates the report and summarizes the results.
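As a rough sketch of the deterministic half of such a pipeline, the numbers can be computed and serialized by ordinary code, and only the wording of the report is left to the LLM step. The function name, the field names and the sample latencies are invented for illustration.

```python
import json
import statistics


def build_report_input(latencies_ms: list[float]) -> str:
    """Deterministic step: compute the figures and serialize them as JSON."""
    summary = {
        "count": len(latencies_ms),
        "mean_ms": statistics.mean(latencies_ms),
        "max_ms": max(latencies_ms),
    }
    # sort_keys keeps the serialized text stable from run to run.
    return json.dumps(summary, sort_keys=True)


payload = build_report_input([120.0, 80.0, 100.0])
# The payload would then be piped to the nondeterministic step,
# e.g. a hypothetical report-format markdown script reading stdin.
print(payload)
```

The figures in the final report stay reproducible because the model only receives them; it never calculates them.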
I agree that this is a choice by each person using tools like this, and that it is up to each of us as developers whether a tool like this suits the use case at hand.
My own view is that the world is rapidly moving to more human language programming tools, and that system automation and shell scripting will be part of this. There is a wide array of sensible potential use cases I can see between the two polarized views of "never use an LLM" and "let's vibe code system automation".
The markdown format provides a way to clearly format and structure instructions that Claude Code understands well, with clear markers for structure/headings, tables and code blocks.
Making a markdown `.md` text file executable with Claude Code is effective in practice because Claude Code can easily understand the content.
Claude Code supports a set of flags to control behaviour such as permissions, and both `--permission-mode bypassPermissions` and `--dangerously-skip-permissions` are examples of those.
The claude-run helper supports passing in the Claude Code flags that are relevant to a shell-scripting-like context.
It also adds a couple of convenience flags (`--aws`, `--azure`, `--vercel`, `--vertex` for cloud API key use).
I'm not the target demographic, but this seems like a step backwards.
Like, once upon a time maybe you gave your jr programmer a list of things to do, and depending on their skill, familiarity with the cli, hangover status, spelling abilities, etc, you'll get different results. So you write a deterministic shell script.
There’s nothing markdown-specific about this. It could just as well be some other markup language, plaintext, or even any other textual language.
You could, for example, put a C program on lines 2 and further and expect/hope/pray Claude to interpret or compile and run that (adding a comment “run the following program; download and compile an interpreter or compiler if needed first” as an instruction to Claude would improve your chances)
Yes, you can use this with any text file and file extension to send file content to Claude Code with unix-like pipe support. Markdown happens to be a format that models like Claude work well with, and it provides a very readable way to mix structured and unstructured content along with code. But I use this with other plain text files regularly.
You could also quickly pass commented code/scripts straight into Claude Code without changing how they execute. The prompt instructions could go at the top of a valid file (say python/typescript) as comments, e.g.
`claude-run --azure --opus my_script.py`
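To illustrate, here is a minimal sketch of what such a `my_script.py` might look like. The prompt wording and the `word_count` function are made up for this example; the point is only that the instructions sit in ordinary comments, so the file still runs unchanged under Python.

```python
# PROMPT (hypothetical wording, read by the AI tool as plain text):
# Review word_count() below for edge cases and suggest unit tests.
# Do not modify this file.


def word_count(text: str) -> int:
    """Count whitespace-separated words in a string."""
    return len(text.split())


if __name__ == "__main__":
    # Running the file with Python ignores the comment block entirely.
    print(word_count("runnable markdown is just text"))
```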
https://bellard.org/tcc/tcc-doc.html#:~:text=ab.o.-,Scriptin...
I have developed my personal agentic file format `.ag` for this purpose.
Here is the template I start with:
Thanks for this. I find templates helpful too, and that's a neat structure. I use templates heavily with Obsidian for non-code tasks also. If you want to try it out, you can use this with the claude-run tooling with flags etc with files using your `.ag` extension with the modified shebang.
`#!/usr/bin/env claude-run --permission-mode bypassPermissions`
Or use the .ag files you have unmodified:
`claude-run --opus --vercel task.ag`
Great idea, this reminds me of a Makefile! However, I do dread the cmake version of this that will nevertheless emerge in the next 10 years.
I found this useful.
This could be a dinosaur mindset from 2022, but would it not make sense to prompt the LLM to create a bash script based on these instructions, so it could be more deterministic? Claude Code is pretty reliable, but this is probably only one and a half nines at best.
As for safety, running this in a devcontainer[1][2] or as part of a CI system should be completely fine.
1. (conventional usage) https://code.visualstudio.com/docs/devcontainers/containers
2. (actual spec) https://containers.dev/
Thank you, and yes! That is what I already frequently do for quick automation tasks.
As you say, Claude is actually very good at writing shell scripts and using tools on-the-fly. But I know there is an AI-confidence factor involved for developers making the choice to leverage that.
For simple tasks (in practice) I already find you can often prompt the whole thing.
For tasks where you already have the other traditional scripts or building blocks, or where it is complex, then you might break it up.
Interestingly, you can intermix these approaches.
You can have runnable markdown that writes and runs scripts on the fly, mixed with running command line tools, and chained along with traditional tools in a bash script, and then call that script from a runnable markdown that passes in test results, or analyzes the code base and passes recommendations in.
The composability and ability to combine and embed code blocks and tool use within plain language is quite powerful. I’m still learning how to use this.
I’m glad it is already useful and thank you.
Also +1 to using containers and sandboxed environments! It means you can yolo it and skip permissions dangerously to experiment with vibe automation :)
More seriously, I agree that setting permissions to the minimum needed for the task and using sandboxed containers is sensible.
IDK man it feels like you are making a less-useful unsafe wheel.
- file types exist for a reason
- this is just prompt engineering which is already easy to do
I agree that script execution safety is a real concern, as it is with AI coding tools generally. By default the runnable markdown files do not have permission to execute code, unless you specifically add those permissions.
I can see there might be valid arguments for enforcing file type associations for execution at the OS level. These are just text files, and Unix-like environments support making text files executable with a shebang as a universal convention.
I am a fan of that unix-like philosophy generally: tools that try to do a single thing well, can be chained together, and allow users to flexibly create automations using plain text. So I tried to stick with that approach for these scripts.
I'm a bear of little brain, and prompt engineering makes my head hurt. So part of the motivation was to be able to save prompts and collections of prompts once I've got them working, and then execute on demand. I think the high readability of markdown as scripts is helpful for creating assets that can be saved, shared and re-used, as they are self-documenting.
As far as I understand, by default your claude-shebang files inherit the permissions that have been previously granted in the current directory you're executing them in.
The ability to execute code is not granted as part of the directory permissions. By default the scripts will not be able to execute code, only run analysis and text gen tasks. You need to explicitly add the flags for permissions to execute code. There is an example of this above and a few more in the repo README.
Why wouldn't Claude Code, called by you, do its normal .claude/settings.local.json processing?
The constraints work consistently with Claude’s -p mode. It is isolated from your regular Claude interactive sessions and settings on purpose, which makes it safer by default because you have to explicitly add permissions.
You can try this out and you’ll see what I mean if you run a few simple examples. This approach was based on experimentation and trying to be consistent with Claude’s own philosophy here.
Does it help get repeatable results if you say, "Use a random seed of 42 for this task"? Or if you somehow lower the temperature, so it's more deterministic?
At the moment, it looks like Claude Code does not support using ‘temperature’ or ‘seed’ flags. It would be awesome if they add that.
Asking for a seed within the prompt means that when Claude writes code, it could use that seed in any randomization functions it writes. But sadly it wouldn’t affect the determinism of Claude’s own text generation.
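A small sketch of the first half of that point: code that takes an explicit seed gives the same pseudo-random results on every run. The function name and ID range are illustrative assumptions, and nothing here influences how the model itself samples text.

```python
import random


def sample_ids(seed: int, count: int = 3) -> list[int]:
    """Pick `count` pseudo-random IDs; the same seed always gives the same list."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return [rng.randint(1, 1000) for _ in range(count)]


# Two calls with seed 42 agree with each other.
first = sample_ids(42)
second = sample_ids(42)
print(first == second)
```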
There is active interest on GitHub in supporting this. But the most recent issue about it that I could see was closed in July as “not planned”.
I think I get the idea... but why not officially create a new file type altogether?
Not only would it avoid any confusion (Markdown wasn't meant to be executable?), but it would also allow future extensions in a domain that is moving fast.
The recent incident (https://news.ycombinator.com/item?id=46532075) regarding Claude Code's changelog shows that pure Markdown can break things if it is consumed raw.
Also, regarding: "Detect my OS and architecture, download the right binary from GitHub releases, extract to ~/.local/bin, update my shell config."
I have a hard time seeing how this is "more auditable" than a shell script with hardcoded URLs/paths.
"the right binary" is something that would make me reject an issue from a PM, asking for clarifications because it's way too vague.
But maybe that's why I'll soon get the sack?
Another format is an interesting idea. This tool will work with any text file content and file extension. So you could create another text-based format yourself and use it in theory.
I think the reasons Markdown is appealing include:
- It's just a text file.
- LLMs like Claude have high comprehension of the format, so Claude Code does very well with it.
- You can mix structured and unstructured text, and code with plain language: YAML frontmatter, outline/headings, code blocks, tables, links and images etc.
I used a heavily condensed version of the example prompt that Pete Koomen posted about as a simplified example, so that's really just me cutting it back to the most simple form of the concept.
In real-use it would be detailed, verbose and specific, and include the actual code blocks and external shell script references to retrieve and execute. So this really is just a proof of concept to give an idea of the sort of thing that people could create in future.
I know lots of us developers joke about getting the sack and losing out to AI. But for what it's worth, the sorts of points you raise are exactly why I think skilled developers become even more valuable than ever with AI.
Programming will change massively this next decade. But it has many times even in my life. So I'm definitely in the camp that thinks this is a new programming abstraction level, and Claude Code and Codex and others are useful tools that improve the productivity of skilled coders. Especially when they are used carefully and thoughtfully.
I use .ag and it's YAML-inspired. I have incorporated ideas from Design by contract: https://news.ycombinator.com/item?id=46554477
The YAML-inspired format is clear, and pretty cool actually. You can use those `.ag` files directly with claude-run in the shebang or from the command line unmodified. The shebang gets stripped before the file is passed to Claude Code.
cat StarWars.mkv > claude "Make Leia the hero Jedi warrior, and Luke the handsome prince she rescues" > vlc
Aside from what several others said about having done something similar locally, wouldn't this be a trivial modification to Simon Willison's `llm` wrapper?
Big fan of Simon Willison. It would be great to see support for executing markdown files directly added to other tools like `llm`. And to Claude Code, Codex themselves.
claude-run is just a bunch of little convenience scripts, but for it to work effectively with code execution, the handling needs to do a little more than just `cat` the file output, for example stripping shebang lines, supporting flags and permissions and a few other things. But all very simple if you see the repo.
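To make the shebang-stripping step concrete, here is a minimal sketch of the idea. It is not the actual claude-run code (see the repo for that); the function name and the sample script text are assumptions for illustration.

```python
def strip_shebang(text: str) -> str:
    """Return `text` without its first line when that line is a shebang."""
    if text.startswith("#!"):
        # Keep everything after the first newline; a lone shebang yields "".
        _, _, rest = text.partition("\n")
        return rest
    return text


script = "#!/usr/bin/env claude-run\n# Summarize\nSummarize the piped-in log.\n"
prompt = strip_shebang(script)
print(prompt)
```

Files without a shebang pass through untouched, so the same handling works whether the script is executed directly or passed by name on the command line.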
Adding support for session isolation and support for different cloud providers and API keys to keep things separate from one's personal Claude subscription took a little work. But that is optional.
A few people have already mentioned similar tools here but one worth mentioning is Atuin Desktop (yes, the same shell history Atuin): https://blog.atuin.sh/atuin-desktop-runbooks-that-run/
"Executable runbooks" is the name given to the concept there
Thanks, it’s great to see people trying different approaches to runnable prompts and variations on literate programming. I think it’s an area with a lot of potential, and I expect there will be a lot of interesting ideas come out of it.
Executable markdown? Why not just write a shell script?
There are some tasks that LLMs are good at, but which can be hard to do with traditional command line tools or scripts. This is true even when you are a skilled coder and expert in Shell scripting. Examples include summarization, judgement-based evaluation, formatting etc.
Executable markdown provides a method of building these tasks into traditional pipelines as small, single-task-focused, composable modules. They also have the advantage that they can be easily shared and re-used.
Looks more like executable prompt-files, as there seem to be no extra markdown-handling except removing the shebang. I know AIs are good at handling Markdown-Syntax, but do they support other markup-languages too? So you could use whatever you want here.
Yes you can use this with any text file format and file extension. Markdown just happens to work well with Claude Code and is very readable. But some other comments here mention `.ag` as a nice alternative, and plain text with C code. But you can also use it to send yaml, xml, simple text, or commented code in directly.
The scripts are all pretty simple but they also:
- Handle script-context-relevant flags and control code execution permissions
- Convenience flags for directing scripts to run across cloud providers rather than a personal Claude subscription.
- Session isolation, especially between your regular interactive `claude` command and running with API keys
This means that your runnable script use can be kept isolated from your regular personal Claude environment that you use for interactive development.
Jupyter notebooks also exist. Also see https://en.wikipedia.org/wiki/Literate_programming
I'm also a fan and heavy user of Jupyter notebooks and literate programming in general. I think the use case for runnable markdown files with AI tooling for automation applies to complementary cases.
Having said that, there are ad hoc automation tasks that I've traditionally used Jupyter notebooks to do that I'm finding are easier to get running using markdown files and Claude Code. It's early days and I still am getting a feel for this myself.
There are some earlier comments discussing other literate programming tools.
Oh dear... but... but why let some LLM set of unknown source of unknown iteration... execute code... in your machine...?
I was excited in the possibly extravagant implementation idea and... when I read enough to realize it's based on some yet another LLM... Sorry, no, never. You do you.
> but why let some LLM set of unknown source of unknown iteration... execute code... in your machine...?
That’s entirely what Claude Code does.
Roger that. Thank you! Apparently, while I've been employed in security as a software engineer for at least 19 years now, I've never ever considered it all serious, and still do not.
Sorry, I have literally no interest in all of it that makes you dependent on it, atrophies mind, degrades research and social skills, and negates self-confidence with respect to other authors, their work, and attributions. Nor any of my colleagues in military and those I know better in person.
Constant research, general IDEs like JetBrains's, IDA Pro, Sublime Text, VS Code, etc. backed by forums, chats, and Communities, is absolutely enough for the accountable and fun work in our teams, who manage to keep in adequate deadlines.
I just disable it everywhere possible, and will do all my life. The close case to my environment was VS Code, and hopefully there's no reason to build it from source since they still leave built-in options to disable it: https://stackoverflow.com/a/79534407/5113030 (How can I disable GitHub Copilot in VS Code?...)
Isn't it just inadequate to not think and develop your mind, and let alone pass control of your environment to a yet another model or "advanced T9" of unknown source of unknown iteration.
In pentesting, random black-box IO, medicine experimental unverified intel, log data approximation why not? But in environment control, education, art or programming, fine art... No, never ^^
Related: https://www.tomshardware.com/tech-industry/artificial-intell...
You can use this without letting the markdown scripts you write execute any code at all, whether that is via Claude Code or other AI tool in future.
The default permissions are to not allow execution. Which means that you can use the eval and text-generation capabilities of LLMs to perform assessments and evaluations of piped-in content without ever executing code themselves.
The script shebang has to explicitly add the permissions to run code, which you control. It supports the full Claude Code flag model for this.
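As an illustration of that opt-in, the sketch below creates two executable markdown files that differ only in their shebang line. The file names and prompt text are made up; the shebang lines follow the form quoted elsewhere in this thread.

```shell
# Sketch only: file names and prompts are hypothetical.
dir=$(mktemp -d)

# No permission flags: per the description above, this script can evaluate
# piped-in content and generate text, but is not allowed to execute code.
cat > "$dir/review.md" <<'EOF'
#!/usr/bin/env claude-run
Review the piped-in diff and list any risky changes.
EOF

# Explicit opt-in: the script author grants execution permissions.
cat > "$dir/run-tests.md" <<'EOF'
#!/usr/bin/env claude-run --permission-mode bypassPermissions
Run the test suite and summarize any failures.
EOF

chmod +x "$dir/review.md" "$dir/run-tests.md"
head -n 1 "$dir/review.md" "$dir/run-tests.md"
```

Reading the first line of a script is then enough to see whether it can execute anything.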
One silly fun thing about PHP is php tags. So you can do notebooks like this for kinda free... if you want to execute code? just put it in <?php ?> blocks.
Did something similar some time ago: https://gitlab.com/ceving/mdexec
So ... you are letting a nondeterministic LLM operate on the shell, via quasi-shellscript. This will appeal mostly to people who do not have the skillset to write an actual shell-script.
In short, isn't that like giving a voice-controlled scalpel to a random guy on the street and telling them 'just tell it to do neurosurgery', and hoping it accidentally does the right procedure?
I know this will not appeal to developers who don’t see a legitimate role for the use of AI coding tools with nondeterministic output.
It is intended to be a useful complement to traditional Shell scripting, Python scripting etc. for people who want to add composable AI tooling to their automation pipelines.
I also find that it helps improve the reliability of AI in workflows when you can break down prompts into re-useable single-task-focused modules that leverage LLMs for tasks they are good at (format.md, summarize-logs.md, etc). These can then be chained with traditional Shell scripts and command line tools.
Examples are summarizing reports, formatting content. These become composable building blocks.
So I hope that is something that has practical utility even for users like yourself who don’t see a role for plain language prompting in automation per se.
In practice this is a way to add composable AI-based tooling into scripts.
Many people are concerned about (or outright opposed to) the use of AI coding tools. I get that this will not be useful for them. Many folks like myself find tools like Claude helpful, and this just makes it easier to use them in automation pipelines.
Don’t worry, it’s “more auditable”!
I get the intent, but it’s bizarre to hear invocation of nondeterministic tools that occasionally delete people’s entire drives called “more auditable”.
My view is that readability and ease of understanding have a real impact on auditability. Nondeterministic output also clearly has a significant impact on auditability.
The balance between readability and determinism for auditability partly relates to developer philosophy. Tech is famous for religious arguments. I have friends who hate AI coding, and want to avoid nondeterministic tools at all costs. And other friends whose productivity has increased significantly, and who see the future of programming as natural language.
The quality of AI models and tools like Claude Code is improving fast, and there are many developers who find value in them, myself included. I built this to make life easier for developers who want to use AI tools for automation.
I find it much faster to parse and understand plain language than many code scripts I've seen. It was one of Python's great insights that people spend more time reading code than writing it. And there is a tradeoff in auditability between determinism and the ability to quickly read and understand what systems do.
There are clearly many people who find AI useful, and who are becoming skilled in its use as a tool. This is just a little tool that I put together for myself and other people who fall in that basket.
Learning where to use AI tools appropriately - how to constrain the dangers while maximizing the value - is part of the challenge. From using this particular tool for real work, it fits some use cases well, and can make things easier both to understand and share, as well as to write.
I hope it's useful for some other people wanting to use AI for scripting and automation.
I think that quickly understandable instructions are part of auditability. Not the whole thing, and their use needs to be balanced with safety and security. But an important part of it.
I accept there are plenty of folks who don't see AI tools that way. We're sharing this for people who see the value in this new approach, even though it is a fast-moving field and there are a lot of imperfections.
Any reasonably competent Claude Code user who is careful about setting permissions boundaries is no more going to delete their hard drive than a competent command line user would. There will be things that go wrong with AI, as before it.
In years of tech support, I've personally had to help people who neutered their Windows install or deleted files they needed. Those things happen and I'd argue they come down to skill issues, with AI or without. New tools have a learning curve.
I get that you think it's bizarre to describe readable AI-based tools as more auditable, and I really do understand that perspective.
A shitty org-mode reinvention, now without being reproducible.
I've been doing this for a couple of weeks
Why not MdFlow? https://github.com/johnlindquist/mdflow
Thanks, I hadn't seen that. This tool:
- Lets you make regular Markdown files directly executable using a shebang line.
- It keeps the Markdown itself clean and standard rather than using variable placeholders or any kind of special syntax.
- Includes support for session isolation
- Allows you to keep script use separate from your regular Claude Code subscription, by allowing you to specify the provider cloud / model in scripts, or switch them on the fly.
Another commenter suggested a custom format for executable llm scripts, which looks like the direction mdflow takes.
Using claude-switcher you can also use multiple clouds/keys for billing and failover, and to keep your subscription tokens for interactive or personal use, which I think is also useful.
1)What…
…could possibly go wrong?
Internal screaming intensifies
lmao nondeterministic shell scripting
This is absolutely a new type of nondeterministic tool, so you're spot on there.
One of the key things we realized when we started using it is that the approach allows you to mix deterministic and nondeterministic tools together as part of a composable chain.
So you can, for example, use LLMs for their evaluation capabilities in a natural language script, wrap that script in deterministic code as part of a broader chain, and also nest deterministic code inside the plain language script itself.
So it allows us to create pipelines that combine the best of both approaches as appropriate based on the sub-task at hand.
If you mix deterministic and nondeterministic, then the result is nondeterministic.
Which means your entire pipeline is tainted.
If your process is fine with that, whatever, but don't pretend that the result can be controlled.
One thing to consider is that the steps in the pipeline can be deterministic (the code executed) while the outputs (summaries, reviews, evaluations, explanations) may be nondeterministic. An example would be summarizing data calculated via a traditional script, and piping it to a report-format markdown script that generates the report and summarizes the results.
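A rough sketch of that shape, under stated assumptions: the log format is invented, and `REPORT_CMD` is a hypothetical variable that would point at an executable markdown script. It defaults to `cat` so the example runs without any LLM installed.

```shell
# Deterministic stage: count ERROR lines per service from log text on stdin.
# Assumes a made-up log format: "<LEVEL> <service> <message...>".
count_errors() {
    awk '$1 == "ERROR" { n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Nondeterministic stage: REPORT_CMD is a hypothetical variable that would
# name an executable markdown script. Defaults to cat for a dry run.
write_report() {
    ${REPORT_CMD:-cat}
}

printf '%s\n' \
    'ERROR api timeout' \
    'INFO api ok' \
    'ERROR db connection refused' \
    'ERROR api timeout' |
    count_errors | write_report
```

The counts coming out of `count_errors` are the same on every run; only the wording of the final report would vary once `REPORT_CMD` points at a real prompt script.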
I agree that this is a choice by each person using tools like this, and that it is up to each of us as developers whether a tool like this suits the use case at hand.
My own view is that the world is rapidly moving to more human language programming tools, and that system automation and shell scripting will be part of this. There is a wide array of sensible potential use cases I can see between the two polarized views of "never use an LLM" and "let's vibe code system automation".
If the code executed is deterministic - then so is the output.
If your output is in any way nondeterministic, then so is the code executed. Fin. No nuance to mathematically be had.
Randomness is nothing new. Various algorithms have always been non-deterministic. Randomness is in most standard libraries.
My problem here is not with what you're doing - but that you're presenting as if you do not understand what you're doing.
Exactly. The above is a terrible idea.
I guess these so called “developers” these days did not ever think about why this is needed. Ever.
The “senior/staff” engineers of 2025 are now at the same knowledge level of juniors in 2015 or were not at all “senior” to begin with ideas like this.
What does any of this have to do with Markdown?
The markdown format provides a way to clearly format and structure instructions that Claude Code understands well, with clear markers for structure/headings, tables and code blocks.
Making a markdown `.md` text file executable with Claude Code is effective in practice because Claude Code can easily understand the content.
There's something similar in concept here: https://runme.dev/
error: cannot decode raw data
(iOS Safari)
And we're officially going down the drain with linux / command line:
```
#!/usr/bin/env claude-run --permission-mode bypassPermissions
```
Claude Code supports a set of flags to control behaviour such as permissions, and both `--permission-mode bypassPermissions` and `--dangerously-skip-permissions` are examples of those.
The claude-run helper supports passing in those flags supported by Claude Code itself that are relevant to a shell-scripting like context.
It also adds a couple of convenience flags (`--aws`, `--azure`, `--vercel`, `--vertex` for cloud API key use).