> $ fjson --help
Rust port of FracturedJsonJs: human-friendly JSON formatter with optional comment support.
Usage: fjson [OPTIONS] [FILE]...
Arguments:
[FILE]... Input file(s). If not specified, reads from stdin
Options:
-o, --output <FILE>
Output file. If not specified, writes to stdout
-c, --compact
Minify output (remove all whitespace)
-w, --max-width <MAX_WIDTH>
Maximum line length before wrapping [default: 120]
-i, --indent <INDENT>
Number of spaces per indentation level [default: 4]
-t, --tabs
Use tabs instead of spaces for indentation
--eol <EOL>
Line ending style [default: lf] [possible values: lf, crlf]
--comments <COMMENTS>
How to handle comments in input [default: error] [possible values: error, remove, preserve]
--trailing-commas
Allow trailing commas in input
--preserve-blanks
Preserve blank lines from input
--number-align <NUMBER_ALIGN>
Number alignment style in arrays [default: decimal] [possible values: left, right, decimal, normalize]
--max-inline-complexity <MAX_INLINE_COMPLEXITY>
Maximum nesting depth for inline formatting (-1 to disable) [default: 2]
--max-table-complexity <MAX_TABLE_COMPLEXITY>
Maximum nesting depth for table formatting (-1 to disable) [default: 2]
--simple-bracket-padding
Add padding inside brackets for simple arrays/objects
--no-nested-bracket-padding
Disable padding inside brackets for nested arrays/objects
-h, --help
Print help
-V, --version
Print version
> Mandating safety and consistency within the spec is a MAJOR help towards raising the safety of all implementations and avoiding these security vulnerabilities in your infrastructure.
OK, this is a valid point, although there is still the possibility of incorrect implementations (adding test cases would help with that problem, though).
RCL (https://github.com/ruuda/rcl) pretty-prints its output by default. Pipe to `rcl e` to pretty-print RCL (which has slightly lighter key-value syntax, good if you only want to inspect it), while `rcl je` produces json output.
It doesn’t align tables like FracturedJson, but it does format values on a single line where possible. The pretty printer is based on the classic A Prettier Printer by Philip Wadler; the algorithm is quite elegant. Any value will be formatted wide if it fits the target width, otherwise tall.
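The wide-or-tall rule is easy to sketch. Below is a minimal illustration of that decision in Python; it is not RCL's or FracturedJson's actual code, and the render helper and its parameters are made up for the example: try the one-line layout first, and fall back to one-child-per-line only when it exceeds the target width.

    import json

    def render(value, width=40, indent=0, step=2):
        flat = json.dumps(value)                 # candidate one-line layout
        if indent + len(flat) <= width:
            return flat                          # "wide": it fits on one line
        pad, child_pad = " " * indent, " " * (indent + step)
        if isinstance(value, dict):              # "tall": one child per line
            items = [child_pad + json.dumps(k) + ": " + render(v, width, indent + step, step)
                     for k, v in value.items()]
            return "{\n" + ",\n".join(items) + "\n" + pad + "}"
        if isinstance(value, list):
            items = [child_pad + render(v, width, indent + step, step) for v in value]
            return "[\n" + ",\n".join(items) + "\n" + pad + "]"
        return flat                              # scalars are always flat

    doc = {"id": 7, "tags": ["a", "b"], "matrix": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}
    print(render(doc, width=40))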
And BTW, thanks for supporting comments - the reason given for keeping comments out of standard Json is silly ( "they would be used for parsing directives" ).
It's a pretty sensible policy, really. Corollary to Hyrum's Law - do not permit your API to have any behaviours, useful or otherwise, which someone might depend on but which aren't part of your design goals. For programmers in particular, who are sodding munchkins and cannot be trusted not to do something clever but unintended just because it solves a problem for them, that means aggressively hamstringing everything.
A flathead screwdriver should bend like rubber if someone tries to use it as a prybar.
JSON is used as config files and static resources all the time. These types of files really need comments. Preventing comments in JSON is punishing the wide majority to prevent a small minority from doing something stupid. But stupid gonna stupid; it's just condescending of Mister JSON to think he can do anything about it.
> A flathead screwdriver should bend like rubber if someone tries to use it as a prybar.
While I admire his design goals, people will just work around it in a pinch by adding a "comment" or "_comment" or "_comment_${random_uuid}", simply because they want to do the job they need.
If your screwdriver bends like rubber when prying, damn it, I'll just put a screw next to it, so it thinks it is being used for driving screws and thus behaves correctly.
On one hand, it has made JSON more ubiquitous due to its frozen state. On the other hand, it forces everyone to move to something else and fragments progress. It would be much easier for people to move to a JSON 2.0 rather than having hundreds of JSON + x standards. Everyone is just reinventing JSON with their own little twist, and I feel sad that we haven't standardized on a single solution that doesn't go super crazy like XML.
I don't disagree with the choice, but seeing how things turned out I can't help but look at the greener grass on the other side.
"do not permit your API to have any behaviours, useful or otherwise, which someone might depend on but which aren't part of your design goals"
I can't follow this law to the letter: the contents of a string value is a behaviour someone might depend on, and I can't remove that.
Preventing APIs from depending on the value of a comment is no different, so your argument is not a reason for not having comments.
I was talking about the parent comment, which has spaces inside the parentheses (I do prefer no spaces inside brackets and braces in my JSONs, but that's another story).
This is pretty cool, but I hope it isn't used for human-readable config files. TOML/YAML are better options for that. Git diffs can also get tricky with the realignment, etc.
I can see this potentially being useful in debug-mode APIs, where comments are somehow sent along as well and rendered nicely. Especially useful for game-dev JSONs.
Yeah, but it's a fun slogan. My real peeve is constantly getting the spaces wrong and having no tooling to compensate for its warts. If there were linters and test frameworks and unit tests etc. for YAML, I'd just sigh and move on. But the current situation is, for instance in ADO YAML: "So it's time to cut a release and time is short - we have a surprise for you! This will make some condition go true which triggers something not tested up till now, and you will now randomly commit shit on the release branch until it builds again."
Stuff that would have been structurally impossible in XML will happen in yaml. And I don't even like XML.
I think it would be better to require quotation marks around all string values, in order to avoid this kind of problem. (It is not the only problem with YAML, but it is my opinion that any format with multiple types should require explicitly marking when a value is a string, and YAML (and some other formats) doesn't.) (If keys are required to be strings, then it can be reasonable to allow keys to be unquoted, as long as the set of characters that unquoted keys can contain is restricted (and unquoted empty strings as keys are disallowed).)
It might be that there’s some setting that fixes this or some better library that everyone should be switching to, but YAML has nothing that I want and has been a repeated source of footguns, so I haven’t found it worth looking into. (I am vaguely aware that different tools do configure YAML parsing with different defaults, which is actually worse. It’s another layer of complexity on an already unnecessarily complex base language.)
The 1.1 spec was released about _twenty_ years ago; I explicitly used the word _implemented_ for a reason. As in: our YAML lib vendor had begun officially supporting that version more than ten years ago.
1.1 partially fixed it, so that strings (quoted "no") did not become Boolean false. 1.2 strengthened it by removing unquoted no from the list of tokens which could be interpreted as Boolean false.
Because there's a metric ton of software out there that was built once upon a time and then that bit was never updated. I've seen this issue out in the wild across more industries than I can count.
I'm not here coming down on Java for lacking lambda features; the problem is that I did not update my Java environment past the 2014 version, not a problem with Java.
I think this mixes up two separate things. If you're working with Java, it's conceivable that you could probably update with some effort. If you're an aerospace engineer using software that was certified decades ago for an exorbitant amount of money, it's never going to happen. Swap for nearly any industry of your liking, since most of the world runs on legacy software by definition. A very large number of people running into issues like these are not in a position where they could solve the problem even if they wanted to.
That’s about 99% of the argument I am making. The problem is legacy software and bad certification workflows, not the software being used.
If I’m working with Java it’s indeed conceivable that I could update with some effort.
If I’m working with Node it’s conceivable that I could update with some effort.
If I'm working with YAML, is it not conceivable that I could update with some effort?
PHP is stupid because version 3 did not support object oriented programming.
CSS is bad because version 2 did not support grid layouts or flexbox.
Why should I critique on these based on something that they have fixed a long time ago instead of working on updating to the version which contain the fix I am complaining about?
It's a gradient, but the onus shifts squarely to one side once the spec has changed and a number of libraries have begun supporting the new spec.
I made a silly Groovy script called "mommyjson" that doesn't try to preserve JSON formatting but just focuses on giving you the parentage (thus the name), including array indexes, object names, etc., all on the same line, so that when you find something, you know exactly where it is semantically. Not gonna claim that everybody should use it or that it cures insomnia, cancer & hangnails, but feel free to borrow it:
This is good! There are a number of these, so it seems like it's definitely something people want. The most popular of which I think is gron[0]. My own is jstream[1]. One tiny point of friendly feedback: you may want to consider adding an example usage/output so folks can literally see what it does.
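For anyone who hasn't seen the output style these tools produce, here is a rough Python sketch of the general idea (one fully-qualified path per value); it's illustrative only, not the Groovy script or either tool mentioned above.

    import json
    import sys

    def flatten(value, path="json"):
        # Emit one "path = value" line per leaf, so a grep hit shows exactly
        # where that value lives in the structure.
        if isinstance(value, dict):
            for key, child in value.items():
                yield from flatten(child, path + "." + key)
        elif isinstance(value, list):
            for index, child in enumerate(value):
                yield from flatten(child, "%s[%d]" % (path, index))
        else:
            yield "%s = %s" % (path, json.dumps(value))

    for line in flatten(json.load(sys.stdin)):
        print(line)

Run it as, say, python flatten.py < data.json | grep price to get each match together with its full parentage on one line.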
Is JSON a format that needs improvement for human readability? I think there are much better ways to present data to users, and JSON is a format that should be used to transfer data from system to system.
If you're reaching for a tool like this, it's because you don't have a well-defined schema and corresponding dedicated visualization; you're looking at some arbitrary internal or insufficiently-documented transfer-level data with nested structure, perhaps in the midst of a debug breakpoint, and need a quick and concise visualization without the ability (or time) to add substantial code into the running runtime. Especially if you're working on integration code with third parties, it's common to come across this situation daily.
I think yes? I fairly often find that I have something in JSON, which probably is from some system to system comms, and I'm trying to read it. Once it's not trivially small I often pipe it through jq or python -m json.tool or whatever, I like the idea of something that just does a better job of that.
If you discard the human-readability component of it, JSON is an incredibly inefficient choice of encoding. Other than its ubiquity, you should only be using JSON because it’s both human and machine readable (and being human-readable is mainly valuable for debugging)
This is interesting.
I'd very much like to see a code formatter do that kind of thing; currently formatters are pretty much inflexible, which sometimes makes it hard to get structure out of formatted code.
I just built a C++ formatter that does this (owned by my employer, unfortunately). There are really only two formatting objects: tab-aligned tables, and single-line rows. Both objects also support a right-floating column/tab-aligned "//" comment.
Both objects desugar to a sequence of segments (lines).
The result is that you can freely mix expression/assignment blocks & statements. Things like switch-case blocks & macro tables are suddenly trivial to format in 2d.
Because comments are handled as right floating, all comments nicely align.
I vibe coded the base layer in an hour. I'm using it with autogenerated code, so the output is manually coded based on my input. The tricky bit would be "discovering" tables & blocks. I'd just use a combo of an LSP and direct observation of sequential statements.
Right. In my previous work, I wrote a custom XML formatter for making it look table-like, which was our use case. Of course, an ideal solution would have been to move away from XML, but you can't run away from legacy.
I have a JSON formatter called Virtuous (https://marketplace.visualstudio.com/items?itemName=karyfoun...) and until now I thought that was the best way to format JSON. I must confess that I'll throw away my own formatter in favor of this one. What a great job.
What I like about FracturedJson is the middle ground between too-sparse pretty printing and too-compact non-pretty printing; Nu doesn't give me that by default.
One thing that neither fractured json nor nushell gives me, which I'd like, is the ability to associate an annotation with a particular datum, convert to json, convert back to the first language, and have that comment still be attached to that datum. Of course the intermediate json would need to have some extra fields to carry the annotations, which would be fine.
I really like this, I think I'd find it useful fairly often and I like the idea of just making something that I use irregularly but not that rarely a bit better.
But then I found it's in C#. And apparently the CLI app isn't even published any more (apparently nobody wanted it? Surprises me but ok). Anyway, I don't think I want this enough to install .NET to get it, so that's that. But I'd have liked a version in Go or Rust or whatever.
I'm the maintainer of FracturedJson. The decision to stop publishing a binary for the CLI version was made a long time ago: fewer features, fewer users, less mature .NET tooling (as far as I knew). And as you say, .NET isn't a common language for distributing CLI tools.
I plan to take a new look at that when I have the time. But a port to a more CLI-friendly platform could probably do a better job.
These JSON files are actually readable, congrats.
I’m wondering whether this could be handled via an additional attached file instead. For example, I could have mycomplexdata.json and an accompanying mycomplexdata.jsonfranc. When the file is opened in the IDE, the IDE would merge the two automatically.
That way, the original JSON file stays clean and isn’t polluted with extra data.
I had to do a double take on the repo author here :)
this tool also looks super useful, I spend so much time at work looking at json logs that this will surely come in handy. It's the kind of thing I didn't even know I needed, but now that I've seen it, it makes perfect sense.
I like this idea a lot. Currently the biggest issue for adoption seems to be the missing packages for most programming languages, and for homebrew/etc.
It should even be possible to compile the dotnet library to a C-compatible shared library and provide packages for many other languages.
Works right up until you get an entity where the field `comments` is suddenly relevant and then you need to go change everything everywhere. Much better to use the right tool for the job, if you want JSONC, be explicit and use JSONC.
Surely it could be suffixed or keyed with a less likely collision target than this very simplistic example. I suppose JSONC and similar exist, although they are rarely used in the wild in contrast to actual JSON usage, so compatibility is important.
Personally, I think if your JSON needs comments then it's probably for config or something the user is expected to edit themselves, and at that point you have better options than plain JSON and adding commentary to the actual payload.
If it's purely for machine consumption then I suspect you might be describing a schema and there are also tools for that.
idk... "ans: 42 // an old reference from DA API" seems easier to read than wasting 4 of your lines
multiply that over a long file... it takes a toll
---
also sometimes one field contains a lot of separate data (because it's straight up easier to deserialize into a single std::vector and then do stuff) - so you need comments between data points
This looks very readable. The one example I didn't like is the expanded one where it expanded all but one of the elements. I feel like that should be an all-or-nothing thing, but there are bound to be edge cases.
I could see FracturedJson being great for a browser extension or preview extension in IDE, as in just for _viewing_ JSON in a formatted way, not formatting the source.
How many times do you actually need to look at large JSONs? The cost of readability is too high, IMO.
Personally, I don't spend much time looking at complex JSON; a binary format like Protobuf along with a typed DSL is often what you need. You can still derive JSON from Proto if you need that. In return, you get faster transport and type safety.
Also, on another note, tools like jq are so ubiquitous that any format that isn't directly supported by jq will have a really hard time seeing mass adoption.
Love the spirit, but the attack-plans example IMO looks worse with this formatting. I don’t love the horizontal scrolling through properties of an object.
I don't know if you spend a fraction of your life scrolling vertically through megabyte-sized JSON files, but if something can reduce the height of the file, that's welcome. We don't need to read every single line from left to right; we just need to quickly browse through the entire file. If a line in this format is longer than fits the screen, it's likely we don't need to know what's in the cut-off right corner anyway.
Gigabytes even (people do the silliest things). But ‘find’ gets me there 90% of the time, and at that point the amount of vertical scrolling isn’t really any different than in a 2kb file.
Nice... I like using JSON to stdout for logging, this would be a nice formatting option when doing local dev to prettify it without full decomposition.
I tokenized these and they seem to use around 20% fewer tokens than the original JSONs. Which makes me think a schema like this might optimize latency and costs in constrained LLM decoding.
I know that LLMs are very familiar with JSON, and choosing uncommon schemas just to reduce tokens hurts semantic performance. But a schema that is sufficiently JSON-like probably won't disrupt the model's paths/patterns that much, avoiding unintended bias.
Yeah, but I tried switching to minified JSON on a semantic labelling task and saw a ~5% accuracy drop.
I suspect this happened because most of the pre-training corpus was pretty-printed JSON, and the LLM was forced to derail from its likely path and also lost all the "visual cues" of nesting depth.
This might happen here too, but maybe to a lesser extent. Anyways, I'll stop building castles in the air now and try it sometime.
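If you do want to try it, here is a rough way to compare token counts of different layouts, assuming the tiktoken package is installed; the encoding name and the sample document are just illustrative.

    import json
    import tiktoken

    doc = {"name": "example", "active": True, "coords": [[1.5, 2.5], [3.0, 4.0]]}
    pretty = json.dumps(doc, indent=4)                # classic pretty-printed layout
    compact = json.dumps(doc, separators=(",", ":"))  # fully minified layout

    enc = tiktoken.get_encoding("cl100k_base")
    for label, text in [("pretty", pretty), ("compact", compact)]:
        print(label, len(enc.encode(text)))           # token count per layout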
if you really care about structured output switch to XML. much better results, which is why all AI providers tend to use pseudo-xml in their system prompts and tool definitions
The trouble with yaml is that it's too hard to keep track of how indented something is if its parent is off the screen. I have to keep a t-square on my desk and hang it from the top of my monitor whenever this comes up.
That, and the fact that it has enough bells and whistles that there are YAML parser exploits out there.
That is remarkable. I recently implemented this very functionality in Python using roughly 200 lines of code, completely unaware that a pre-built library was available.
Give https://rcl-lang.org/#intuitive-json-queries a try! It can fill a similar role, but the syntax is very similar to Python/TypeScript/Rust, so you don’t need an LLM to write the query for you.
The issue isn’t jq’s syntax. It’s that I already use other tools that fill that niche and have done since as long as jq has been a thing. And frankly, I personally believe the other tools are superior so I don’t want to fallback to jq just because someone on HN tells me to.
It looks like there are two maintained implementations of this at the moment - one in C# https://github.com/j-brooke/FracturedJson/wiki/.NET-Library and another in TypeScript/JavaScript https://github.com/j-brooke/FracturedJsonJs. They each have their own test suite.
There's an older pure Python version but it's no longer maintained - the author of that recently replaced it with a Python library wrapping the C# code.
This looks to me like the perfect opportunity for a language-independent conformance suite - a set of tests defined as data files that can be shared across multiple implementations.
This would not only guarantee that the existing C# and TypeScript implementations behaved exactly the same way, but would also make it much easier to build and then maintain more implementations across other languages.
Interestingly the now-deprecated Python library does actually use a data-driven test suite in the kind of shape I'm describing: https://github.com/masaccio/compact-json/tree/main/tests/dat...
That new Python library is https://pypi.org/project/fractured-json/ but it's a wrapper around the C# library and says "You must install a valid .NET runtime" - that makes it mostly a non-starter as a dependency for other Python projects because it breaks the ability to "pip install" them without a significant extra step.
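To make the shape of that concrete: a conformance case could be just a data file holding the input, the options, and the expected output, and each implementation then only needs a thin runner. This is a hypothetical sketch in Python, not FracturedJson's actual fixture format; format_json and the option name are made up for illustration.

    import json
    import pathlib

    # Example case file, e.g. cases/inline_array.json:
    # {"options": {"max_width": 60}, "input": "[1,2,3]", "expected": "[1, 2, 3]"}

    def format_json(source, options):
        # Adapter around whichever implementation is under test
        # (C#, TypeScript, Rust, ...); left unimplemented on purpose.
        raise NotImplementedError

    def run_suite(case_dir):
        failures = []
        for path in sorted(pathlib.Path(case_dir).glob("*.json")):
            case = json.loads(path.read_text())
            actual = format_json(case["input"], case.get("options", {}))
            if actual != case["expected"]:
                failures.append(path.name)
        return failures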
Just ported it to Rust and plan on maintaining it, if you want to add it to your original comment.
More details on a sibling comment:
https://github.com/fcoury/fracturedjson-rs https://crates.io/crates/fracturedjson
Comment with details: https://news.ycombinator.com/item?id=46468641
This is a good idea, though I don’t think it would guarantee program equivalence beyond the test cases.
Depends on how comprehensive the test suite is.
And OK it's not equivalent to a formal proof, but passing 1,000+ tests that cover every aspect of the specification is pretty close from a practical perspective, especially for a visual formatting tool.
With mutation testing you can guarantee that all the behavior in the code is tested.
UC Berkeley: “Top-level functional equivalence requires that, for any possible set of inputs x, the two pieces of code produce the same output. … testing, or input-output (I/O) equivalence, is the default correctness metric used by the community. … It is infeasible to guarantee full top-level functional equivalence (i.e., equivalence for any value of x) with testing since this would require testing on a number of inputs so large as to be practically infinite.”
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-...
In practice, mutation-based fuzzers are able to see, white-box, where the branches are in the underlying code; with a differential fuzz test built on that approach, it's generally possible to generate test cases that exercise all of those branches.
So I think in the computer-science-theory case of arbitrary functions it's not possible, but for the actual shape of behavior in question from this library I think it's realistic that a decent corpus of 'real' examples plus differential fuzzing would give you more confidence than anyone has in nearly any program's correctness here on real Earth.
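A minimal differential harness is easy to sketch even without a coverage-guided fuzzer; here's the shape of it in Python, with format_a/format_b as placeholders for two ports of the formatter. A real fuzzer (libFuzzer, Atheris, etc.) would explore branches far more effectively than this naive generator.

    import json
    import random
    import string

    def random_value(depth=0):
        # Naive generator for JSON-shaped values.
        kinds = ["int", "float", "str", "bool", "null"]
        if depth < 3:
            kinds += ["list", "dict"]
        kind = random.choice(kinds)
        if kind == "int":
            return random.randint(-10**6, 10**6)
        if kind == "float":
            return random.uniform(-1e6, 1e6)
        if kind == "str":
            return "".join(random.choices(string.printable, k=random.randint(0, 12)))
        if kind == "bool":
            return random.choice([True, False])
        if kind == "null":
            return None
        if kind == "list":
            return [random_value(depth + 1) for _ in range(random.randint(0, 4))]
        return {"k%d" % i: random_value(depth + 1) for i in range(random.randint(0, 4))}

    def format_a(text): ...   # placeholder: implementation/port #1
    def format_b(text): ...   # placeholder: implementation/port #2

    def differential_run(iterations=10000):
        for _ in range(iterations):
            doc = json.dumps(random_value())
            assert format_a(doc) == format_b(doc), "divergence on: " + doc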
Yes, there are different levels of sureness being described.
When I hear guarantee, it makes me think of correctness proofs.
Confidence is more of a practical notion for how much you trust the system for a given use case. Testing can definitely provide confidence in this scenario.
How is that relevant for mutation testing?
You can guarantee that all the cases in the code are tested. That doesn't necessarily mean that all the behaviour is tested. If two implementations use very different approaches, which happen to have different behaviour on the Mersenne primes (for deep mathematical reasons), but one of them special-cases byte values using a lookup table generated from the other, you wouldn't expect mutation testing to catch the discrepancy. Each implementation is still the local optimum as far as passing tests is concerned, and the mutation test harness wouldn't know that "disable the small integer cache" is the kind of mutation that shouldn't affect whether tests pass.
There are only 8 32-bit Mersenne primes, 4 of which are byte-valued. Fuzzing might catch the bug, if it happened to hit one of the four other 32-bit Mersenne primes (which, in many fuzzers, is more likely than a uniform distribution would suggest), but I'm sure you can imagine situations where it wouldn't.
> but one of them special-cases byte values using a lookup table generated from the other, you wouldn't expect mutation testing to catch the discrepancy
Sure you would. If the mutation tester mutates that lookup table. Which is quite easy to do, and which mutmut will do (if that lookup table is inside a function, because mutmut is based on mutant schemata).
If the mutation tester mutates that lookup table, then that will eventually lead to all entries in the lookup table being tested. That does not mean that the four divergent values outside the lookup table will end up being tested.
I think if you hit full path coverage in each of them independently and run all the cases through both and check they're consistent you're still done.
Or branch coverage for the lesser version, the idea is still to generate interesting cases based on each implementation, not based solely on one of them.
If the buggy implementation relies indirectly on the assumption that 2^n - 1 is composite, by performing a calculation that's only valid for composite values on a prime value, there won't be a separate path for the failing case. If the Mersenne numbers don't affect flow control in a special way in either implementation, there's no reason for the path coverage heuristic to produce a case that distinguishes the implementations.
Well yeah, but then any discrepancies that are found can be discussed (to decide which of the behaviors is the expected one) and then added as a test for all existing and future implementations.
This is also basically a pure function which makes it super simple to write a harness.
Data driven test suites are really good for building trust in a library. Both the html5lib-tests suite and my recent xss-bench are examples of this!
I ported it to Rust, with a CLI tool that allows you to format JSON in this format:
https://github.com/fcoury/fracturedjson-rs
https://crates.io/crates/fracturedjson
And install with:
cargo install fracturedjson
Ports are a derivative work; you should preserve the original author's copyright attribution.
This is great! The more human-readable, the better!
I've also been working in the other direction, making JSON more machine-readable:
https://github.com/kstenerud/bonjson/
It has EXACTLY the same capabilities and limitations as JSON, so it works as a drop-in replacement that's 35x faster for a machine to read and write.
No extra types. No extra features. Anything JSON can do, it can do. Anything JSON can't do, it can't do.
This is very interesting, though the limitations for 'security' reasons seem somewhat surprising to me compared to the claim "Anything JSON can do, it can do. Anything JSON can't do, it can't do.".
Simplest example, "a\u0000b" is a perfectly valid and in-bounds JSON string that valid JSON data sets may have in it. Doesn't it end up falling short of 'Anything JSON can do, it can do" to refuse to serialize that string?
"a\u0000b" ("a" followed by a vertical tabulation control code) is also a perfectly valid and in-bounds BONJSON string. What BONJSON rejects is any invalid UTF-8 sequences, which shouldn't even be present in the data to begin with.
You're thinking of "a\u000b". "a\u0000b" is the three-character string also written "a\x00b".
Bleh... This is why my text formats use \[10c0de] to escape unicode codepoints. Much easier for humans to parse.
My example was a three character string where the second one is \u0000, which is the NUL character in the middle of the string.
The spec on GitHub says that including NUL is banned as a security stance: after parsing, someone might call strlen and accidentally truncate to a shorter string in C.
Which I think has some merit, but it's valid string content in JSON (and in UTF-8), so it is deliberately breaking 1:1 parity with JSON in the name of a security hypothetical.
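For the record, stock JSON parsers agree that this is in-bounds; for example, Python's json module round-trips it as a three-character string with a NUL in the middle:

    import json

    s = json.loads('"a\\u0000b"')   # parse the JSON string literal "a\u0000b"
    print(len(s))                   # 3
    print(s == "a\x00b")            # True: NUL is the middle character
    print(json.dumps(s))            # "a\u0000b"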
The spec says that implementations must disable NUL by default (as in, the default configuration must disallow). https://github.com/kstenerud/bonjson/blob/main/bonjson.md#nu...
Users can of course enable NUL in the rare cases where they need it, but I want safe defaults.
Actually, I'll make that section clearer.
So I think it's a very neat format, but my feedback as a random person on the Internet is that I don't think it does uphold the claimed vision in the end of being 1:1 to JSON (the security parts, but also you do end up adding extra types too) and that's a bit of a shame compared to the top line deliverable.
Just focusing narrowly on the \0 part to explain why I say so: the spec proposes that implementations have to either hard-ban embedded \0 or disallow it by default with an opt-in. So if someone comes along with a dataset that has it, they can get support only if they configure both the serializer and the parser to allow it. But if you're willing to exert that level of special-case extra control, I think all of the other preexisting binary-JSON implementations that exist meet the top-line definition you are setting as well. For some binary-JSON implementation which has additional types, if someone is in full end-to-end control to special-case things, then they could just choose not to use those types too; the mere existence of extra types in the binary format is no more of a "problem" for 1:1 than this choice.
IMO the deliverable that a 1:1 mapping would give us is: "there is no BONJSON data that won't losslessly round-trip to JSON and vice versa". The benefit comes when that holds over all future data that you haven't seen yet; the downside of using something that is not bijective is that you run for a long time and then suddenly you have data-dependent failures in your system because you can't 1:1-map legal data.
And especially with this guarantee, what will inevitably happen is that some downstream handling will also take it as a given that it can strlen(), since it "knew" the BONJSON format spec banned NUL. So when you do get it as in-bounds data, you also won't be able to trivially flip the switch; instead you are stuck with legal JSON that you can't ingest into your system without an expensive audit, because the reduction from 1:1 gets entrenched as an invariant in the handling code.
Note that my vantage point might be a bit skewed here: I work on Protobuf, and this shape of ecosystem-interoperability topic is top of mind for me in ways that it doesn't necessarily need to be for small projects. I also recognize that "what even is legal JSON" itself is not actually completely clear, so take it all with a grain of salt (and again, I do think it looks like a very nice encoding in general).
Oh yes, I do understand what you're getting at. I'm willing to go a little off-script in order to make things safer. The NUL thing can be configured away if needed, but requires a conscious decision to do so.
Friction? yeah, but that's just how it's gonna be.
For the invalid Unicode and duplicate key handling, I'll offer no quarter. The needs of the many outweigh the needs of the few.
But I'll still say it's 1:1 because marketing.
> But I'll still say it's 1:1 because marketing.
Isn't that lying? Marketing is when you help connect people who require a product or service (the market) with a provider of that product or service.
Did you read "Parsing JSON is a minefield"?
Can you tell me what the context was that led you to create this?
Unrelated JSON experience:
I worked on a serializer which saves/loads JSON files as well as binary files (using a common interface).
For my own use case I found JSON to be restrictive for no benefit (because I don't use it in a JavaScript ecosystem).
So I changed the JSON format into something way more lax (optional commas, optional colons, optional quotes, multi-line strings, comments).
I wish we would stop pretending JSON is a good human-readable format outside of where it makes sense, and that we had a standard alternative for those non-JSON-centric cases.
I know a lot of formats already exist, but none has really taken off so far.
Basically, for better or worse JSON is here to stay. It exists in all standard libraries. Swift's codec system revolves around it (it only handles types that are compatible with JSON).
It sucks, but we're stuck with JSON. So the idea here is to make it suck a little less by stopping all this insane text processing for data that never ever meets a human directly.
The progression I envisage is:
1. Dev reaches for JSON because it's easy and ubiquitous.
2. Dev switches to BONJSON because it's more efficient and requires no changes to their code other than changing the codec library.
3. Dev switches to a sane format after the complexity of their app reaches a certain level where a substantial code change is warranted.
Thanks for the details!
I'm in the JS ecosystem pretty regularly and "restrictive with no benefit" is the right description. I use JSON5 now when I have to, which greatly reduces the restrictions. I already have a build step so throwing in a JSON5 -> JSON converter is negligible.
As for FracturedJson, it looks great. The basic problem statement of "either minified and unreadable or prettified and verbose" isn't one I had put my finger on before, but now that it's been said I can't unsee it.
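The build-step conversion really is a few lines in whatever language the build runs in; sketched here in Python, assuming the json5 package from PyPI and example file names:

    import json
    import json5  # third-party parser for the JSON5 superset (pip install json5)

    with open("config.json5") as src:
        data = json5.load(src)          # accepts comments, trailing commas, etc.

    with open("config.json", "w") as dst:
        json.dump(data, dst, indent=2)  # emit plain JSON for anything downstream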
Have you heard of EDN? It's mostly used in Clojure and ClojureScript, as it is to Clojure what JSON is to JS.
If you need custom data types, you can use tagged elements, but that requires you to have functions registered to convert the data type to/from representable values (often strings).
It natively supports quite a bit more than JSON does, without writing custom data readers/writers.
https://github.com/edn-format/edn
Another thing to possibly consider would be ASN.1 (you can also use the nonstandard extensions that I made up, called ASN.1X, if you want some of the additional types I included such as a key/value list). (You are not required to implement or use all of the types or other features of ASN.1 in your programs; only use the parts that you use for your specific application.) Unlike EDN, ASN.1 has a proper byte string type, it is not limited to Unicode, it has a clearly defined canonical form (DER, which is probably the best format (and is the format used by X.509 certificates); BER is too messy), etc. DER is a binary format (and the consistent framing of different types in DER makes it easier to implement and work with than the formats that use inconsistent framing, although that also makes it less compact); I made up a text format called TER, which is intended to be converted to DER.
I haven't, but it's an interesting format for sure.
I've found a more comprehensive documentation here. [1]
At first glance, I would say it's a bit more complex than it should be for a "human readable" format.
[1] https://edn-format.dev/
That's neat, but I'm much more intrigued by your Concise Encoding project[1]. I see that it only has a single Go reference implementation that hasn't been updated in 3 years. Is the project still relevant?
Thanks for sharing your work!
[1]: https://concise-encoding.org/
Thanks!
I'm actually having second thoughts with Concise Encoding. It's gotten very big with all the features it has, which makes it less likely to be adopted (people don't like new things).
I've been toying around with a less ambitious format called ORB: https://github.com/kstenerud/orb
It's essentially an extension of BONJSON (so it can read BONJSON documents natively) that adds extra types and features.
I'm still trying to decide what types will actually be of use in the real world... CE's graph type is cool, but if nobody uses it...
I use ASN.1X, so I use some types that those other formats do not have. Some of the types of ASN.1 are: unordered set, ISO 2022 string, object identifier, bit string. I added some additional types into ASN.1X, such as: TRON string, rational numbers, key/value list (with any types for keys and for values (and the types of keys do not necessarily have to match); for one thing, keys do not have to be Unicode), and references to other nodes. However, ASN.1 (and ASN.1X) does not distinguish between qNaN and sNaN. I had also made up TER, which is a text format that can be converted to DER (like how ORT can be converted to ORB, although it works differently, and it is not compatible with JSON (TER somewhat resembles PostScript)).
Your extensions of JSON with comments, hexadecimal notation, optional commas, etc. are useful though (my own program to convert JSON to DER does treat commas as spaces, although that is an implementation detail).
Probably, we need a formal data model, because JSON is just a notation. It does not mandate the bit width of numbers, for example, or whether ints are different from floats. Once there is such a formal model, we can map it 1:1 between representations.
I am writing this because I work on a related topic https://replicated.wiki/blog/args.html
Reminds me of Lite3 that was posted here not long ago:
https://github.com/fastserial/lite3
What about compression rates?
It compresses fairly similarly to JSON.
I think JSON is too limited and has some problems, so BONJSON has mostly the same problems. There are many other formats as well, some of which add additional types beyond JSON and some don't. Also, a few programs may expect (and possibly require) that files may contain invalid UTF-8, even though that is not proper JSON (I think it would be better that they should not use JSON, due to this and other issues), so there is that too. Using normalized Unicode has its own problems, as does allowing 64-bit integers when some programs expect it and others don't. JSON and Unicode are just not good formats, in general. (There is also an issue with JSON.stringify(-0), but that is an issue with JavaScript that does not seem to be relevant to BONJSON, as far as I can tell.)
Nevertheless, I believe your claims are mostly accurate, except for a few issues with which things are allowed or not allowed, due to JavaScript and other things (although in some of these cases, the BONJSON specification allows options to control this). Sometimes rejecting certain things is helpful, but not always; for example, sometimes you do want to allow mismatched surrogates, and sometimes you might want to allow null characters. (The defaults are probably reasonable, but are often the result of a bad design anyway, as I had mentioned above.) Also, the top of the specification says it is safe against many attacks, but these are a feature of the implementation, which would also be the case if you implement JSON or other formats (although the specification for BONJSON does specify that implementations are supposed to check for these things to make them safe).
(The issue of overlong UTF-8 encodings in IIS web servers is another security issue, caused by using one format for validation and a different one for usage. In that case there are actually two usages: one is the handling of relative URLs (using the ASCII form) and the other is the handling of file names on the server (which might use UTF-16; on top of that there is the internal splitting of file paths into individual pieces and the internal handling of relative file paths). There are reasons to avoid and to check for overlong UTF-8 encodings, although this is a more general issue than the character encoding itself.)
Another issue is canonical forms; the canonical form of JSON can be messy, especially for numbers (I don't know exactly what the canonical form for numbers in JSON is, but I have read that it is apparently complicated).
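(A small Python illustration of why, independent of any particular canonicalization spec: the serialized text of a number depends on how it happened to be parsed, so a canonical form has to pin down number formatting exactly.)

```python
import json

# Two JSON texts denoting the same mathematical value parse to different Python types...
a = json.loads("100")   # int   -> 100
b = json.loads("1e2")   # float -> 100.0

print(json.dumps(a))    # "100"
print(json.dumps(b))    # "100.0" -- different text for the same value

# Even a pure float can change its spelling on a round trip:
print(json.dumps(json.loads("1E+2")))  # "100.0"
print(json.dumps(json.loads("0.10")))  # "0.1"
```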
I think DER is better. BONJSON is more compact, but that also makes the framing more complicated to handle than DER (which uses consistent framing for all types). I also wrote a program to convert JSON to DER (I also made up some nonstandard types, although the conversion from JSON to DER only uses one of them, the key/value list; the other types it needs are standard ASN.1 types). Furthermore, DER is already a canonical form (and I had made up SDER and SDSER for when you do not want canonical form but also do not want the messiness of BER; SDSER does have chunking and does not require the length to be known ahead of time, so it is more like BONJSON in these ways). Because of the consistent framing, you can easily ignore any types that you do not use; even though there are many types, you do not necessarily need all of them.
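(A rough sketch, in Python, of what that consistent framing buys you; it assumes single-byte tags and the definite-length form that DER requires, and it is only an illustration, not a real ASN.1 decoder.)

```python
def read_tlv(buf, pos=0):
    """Read one DER tag-length-value element starting at pos.

    Assumes single-byte tags and definite-length encoding (as DER requires).
    Returns (tag, value_bytes, next_pos).
    """
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low bits give how many bytes encode the length
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag, buf[pos:pos + length], pos + length

def skip_unknown(buf, known_tags):
    """Yield only the elements whose tag we understand; skip the rest untouched."""
    pos = 0
    while pos < len(buf):
        tag, value, pos = read_tlv(buf, pos)
        if tag in known_tags:
            yield tag, value

# Example: an INTEGER (0x02) followed by some private tag we don't care about.
buf = bytes([0x02, 0x01, 0x2A,          # INTEGER 42
             0xC3, 0x02, 0xDE, 0xAD])   # unknown type, skipped by framing alone
print(list(skip_unknown(buf, known_tags={0x02})))  # [(2, b'*')]
```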
Yup, and that's perfectly valid. I'm OK with BONJSON not fitting everyone's use case. For me, safety is by far more important than edge cases for systems that require bad data representations. Anyone who needs unsafe things can just stick with JSON (or fix the underlying problems that led to these requirements).
Safe, sane defaults, and some configurability for people who (hopefully) know what they're doing. Falling into success rather than falling into failure.
BONJSON is a small spec, and easy to implement ( https://github.com/kstenerud/ksbonjson/blob/main/library/src... and https://github.com/kstenerud/ksbonjson/blob/main/library/src... ).
It's not the end-all-be-all of data formats; it's just here to make the JSON pipeline suck less.
JSON implementations can be made just as safe, but the issue is that unsafe JSON implementations are still considered valid implementations (and so almost all JSON implementations are unsafe because nobody is an authority on which design is correct). Mandating safety and consistency within the spec is a MAJOR help towards raising the safety of all implementations and avoiding these security vulnerabilities in your infrastructure.
> Safe, sane defaults, and some configurability for people who (hopefully) know what they're doing.
Yes, I agree (if you want to use it at all; as I have mentioned, you should consider whether you should be using JSON or something related in the first place), although some of the things that you specify as not having options will make it more restrictive than JSON, even if those restrictions are reasonable as defaults. One of these is mismatched surrogates: matched surrogates should always be disallowed, but an option to allow mismatched surrogates should be permitted (though not required). Also, I think checking for duplicate names probably should not use normalized Unicode. Furthermore, the part that says that names MUST NOT be null seems redundant to me, since it already says that names MUST be strings (for compatibility with JSON) and null is not a string.
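(A quick illustration of the duplicate-name point, as a Python sketch rather than anything from the BONJSON spec: two names that differ at the code-point level can collapse into one after normalization.)

```python
import unicodedata

# "é" written two ways: precomposed U+00E9 vs. "e" followed by combining acute U+0301
k1 = "\u00e9"
k2 = "e\u0301"

print(k1 == k2)  # False: distinct code-point sequences, so a literal duplicate check
                 # sees two different names
print(unicodedata.normalize("NFC", k1) == unicodedata.normalize("NFC", k2))
                 # True: a normalization-based duplicate check would flag them as duplicates
```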
> Mandating safety and consistency within the spec is a MAJOR help towards raising the safety of all implementations and avoiding these security vulnerabilities in your infrastructure.
OK, this is a valid point, although there is still the possibility of incorrect implementations (adding test cases would help with that problem, though).
Is there an option for it to read the contents from a pipe? That's by far my biggest use for the jq app.
There's a C# CLI app in the repo: https://github.com/j-brooke/FracturedJson/blob/main/Fracture...
It looks like both the JavaScript version and the new Python C# wrapper have equivalent CLI tools as well.
I don't see a CLI tool in the TypeScript repo.
Huh, you're right - could have sworn I saw one but I must have been mistaken.
RCL (https://github.com/ruuda/rcl) pretty-prints its output by default. Pipe to `rcl e` to pretty-print RCL (which has slightly lighter key-value syntax, good if you only want to inspect it), while `rcl je` produces json output.
It doesn’t align tables like FracturedJson, but it does format values on a single line where possible. The pretty printer is based on the classic A Prettier Printer by Philip Wadler; the algorithm is quite elegant. Any value will be formatted wide if it fits the target width, otherwise tall.
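(A toy sketch of that wide-or-tall rule in Python; RCL's actual printer is built on Wadler-style document combinators, so this is only the core decision, not the real algorithm.)

```python
import json

def fmt(value, width=40, indent=0):
    """Wadler-style choice: render `value` on one line if it fits within `width`, otherwise go tall."""
    pad = " " * indent
    flat = json.dumps(value)  # the "wide" candidate
    if len(pad) + len(flat) <= width or not isinstance(value, (dict, list)):
        return pad + flat
    if isinstance(value, dict):
        body = ",\n".join(pad + "  " + json.dumps(k) + ": " + fmt(v, width, indent + 2).lstrip()
                          for k, v in value.items())
        return pad + "{\n" + body + "\n" + pad + "}"
    body = ",\n".join(fmt(v, width, indent + 2) for v in value)
    return pad + "[\n" + body + "\n" + pad + "]"

doc = {"name": "rcl", "tags": ["pretty", "printer"], "nested": {"a": [1, 2, 3], "b": list(range(12))}}
print(fmt(doc))
```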
I don't know, but you can always use <() process substitution to create a temp file.
You can (usually) specify the input file name as “-” (single hyphen) to read from stdin
Or you can use `/dev/stdin`, which has the upside of not needing tool support.
I somewhat regularly use this on Linux. I think it also works on OS X
And conversely, `/dev/stdout` (resp. `/dev/stderr`) is a convenient way to "redirect" output to stdout (resp stderr) instead of a file
This would be amazing chained with jq; that was my first thought as well.
Nice.
And BTW, thanks for supporting comments - the reason given for keeping comments out of standard JSON is silly ( "they would be used for parsing directives" ).
It's a pretty sensible policy, really. Corollary to Hyrum's Law - do not permit your API to have any behaviours, useful or otherwise, which someone might depend on but which aren't part of your design goals. For programmers in particular, who are sodding munchkins and cannot be trusted not to do something clever but unintended just because it solves a problem for them, that means aggressively hamstringing everything.
A flathead screwdriver should bend like rubber if someone tries to use it as a prybar.
JSON is used for config files and static resources all the time. These types of files really need comments. Preventing comments in JSON is punishing the wide majority to prevent a small minority from doing something stupid. But stupid gonna stupid; it's just condescending of Mister JSON to think he can do anything about it.
> A flathead screwdriver should bend like rubber if someone tries to use it as a prybar.
While I admire his design goals, people will just work around it in a pinch by adding a "comment" or "_comment" or "_comment_${random_uuid}", simply because they want to do the job they need.
If your screwdriver bends like rubber when prying, damn it, I'll just put a screw next to it, so it thinks it is being used for driving screws and thus behaves correctly.
And we wonder why people are calling for licensed professional software engineers.
On one hand, it has made JSON more ubiquitous due to its frozen state. On the other hand, it forces everyone to move to something else and fragments progress. It would be much easier for people to move to a JSON 2.0 rather than having hundreds of JSON + x standards. Everyone is just reinventing JSON with their own little twist, and I feel sad that we haven't standardized on a single solution that doesn't go super crazy like XML.
I don't disagree with the choice, but seeing how things turned out I just can't help but look at the greener grass on the other side.
> A flathead screwdriver should bend like rubber if someone tries to use it as a prybar.
Better not let me near your JSON files then. I pound in wall anchors with the bottom of my drill if my hammer is not within arm's reach.
"do not permit your API to have any behaviours, useful or otherwise, which someone might depend on but which aren't part of your design goals"
I cannot follow this law anyway: my API necessarily depends on, say, the contents of a string value. Preventing APIs from depending on the value of a comment is no different, so your argument is not a reason for not having comments.
XML people were doing crazy things in the Java/.NET world and "<!--[if IE 6]>" was still a thing in HTML when JSON was being designed.
I also would have wanted comments, but I see why Crockford must have been skeptical. He just didn't want JSON to be the next XML.
Unrelated: why the spaces inside the parentheses? It's not the first time I've seen this, but it is incorrect!
JSON doesn't have parentheses, but it does have braces and brackets. The JSON spec specifically allows spaces.
> Insignificant whitespace is allowed before or after any token.
I was talking about the parent comment, which has spaces inside the parentheses (I do prefer no spaces inside brackets and braces in my JSON, but that's another story).
Probably someone who writes C/C++ and formats their code that way
Personally, I find it hard to read.
Agreed. But I find this easier to read:
Spaces help group things.
This is pretty cool, but I hope it isn't used for human-readable config files. TOML/YAML are better options for that. Git diff also can be tricky with realignment, etc.
I can see potential usefulness of this is in debug mode APIs, where somehow comments are sent as well and are rendered nicely. Especially useful in game dev jsons.
Yaml is the worst. Humans and LLMs alike get it wrong. I used to laugh at XML but Yaml made me look at XML wistfully.
Yaml - just say Norway
The Norway issue is a bit blown out of proportion seeing as the country should really be a string `"no"` rather than the `no` value
YAML strings should really require delimiters rather than being context-dependent.
Yeah, but it's a fun slogan. My real peeve is constantly getting the spaces wrong, with no tooling to compensate for its warts. If there were linters and test frameworks and unit tests etc. for YAML, I'd just sigh and move on. But the current situation is, for instance in ADO YAML: "So it's time to cut a release and time is short - we have a surprise for you! This will make some condition go true which triggers something not tested up till now, and you will now randomly commit shit on the release branch until it builds again."
Stuff that would have been structurally impossible in XML will happen in yaml. And I don't even like XML.
> If there were linters
Available and kept up to date. For Python and PHP I found:
https://github.com/j13k/yaml-lint
https://github.com/adrienverge/yamllint
Also .net:
https://github.com/aaubry/YamlDotNet
And NPM/js:
https://github.com/stoplightio/spectral
Just say Norway to YAML.
This is a reference to YAML parsing the two-letter ISO country code for Norway, `no`, as equivalent to a boolean falsy value. It is a relatively common source of problems. One solution is to escape the value by quoting it: `"no"`. More context: https://www.bram.us/2022/01/11/yaml-the-norway-problem/
I think it would be better to require quotation marks around all string values, in order to avoid this kind of problem. (It is not the only problem with YAML, but it is my opinion that any format with multiple types should require explicitly marking whether a value is a string, and YAML (and some other formats) doesn't.) (If keys are required to be strings, then it can be reasonable to allow keys to be unquoted, provided the set of characters that unquoted keys can contain is restricted (and unquoted empty strings are disallowed as keys).)
We stopped having this problem over ten years ago when spec 1.1 was implemented. Why are people still harking on about it?
Current PyYAML:
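(A minimal sketch of the behaviour being referred to; this assumes PyYAML's default `safe_load`, which still resolves YAML 1.1-style booleans:)

```python
import yaml  # PyYAML

print(yaml.safe_load("country: no"))    # {'country': False} -- the Norway problem
print(yaml.safe_load('country: "no"'))  # {'country': 'no'}  -- quoting keeps it a string
```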
Other people did not stop having this problem.
It might be that there's some setting that fixes this or some better library that everyone should be switching to, but YAML has nothing that I want and has been a repeated source of footguns, so I haven't found it worth looking into. (I am vaguely aware that different tools do configure YAML parsing with different defaults, which is actually worse. It's another layer of complexity on an already unnecessarily complex base language.)
The ancient rule of ”use software that is updated with bugfixes” certainly applies here.
A new spec version doesn’t mean we stop having the problem.
E.g. Kubernetes wrote about solving this only five months ago[1], by moving from YAML to KYAML, a YAML subset.
[1]: https://kubernetes.io/blog/2025/07/28/kubernetes-v1-34-sneak...
The 1.1 spec was released about _twenty_ years ago, I explicitly used the word _implemented_ for a reason. As in: Our Yaml lib vendor had begun officially supporting that version more than ten years ago.
Note that you reference 1.1; I think that version still had the Norway behavior.
1.1 partially fixed it, so that strings (quoted ”no”) did not become Boolean false. 1.2 strengthened it by removing unquoted no from the list of tokens that could be interpreted as Boolean false.
Because there's a metric ton of software out there that was built once upon a time and then that bit was never updated. I've seen this issue out in the wild across more industries than I can count.
I’m not here coming down on Java for lacking lambda features; the problem is that I did not update my Java environment past the 2014 version, which is not a problem with Java.
I think this mixes up two separate things. If you're working with Java, it's conceivable that you could probably update with some effort. If you're an aerospace engineer using software that was certified decades ago for an exorbitant amount of money, it's never going to happen. Swap for nearly any industry of your liking, since most of the world runs on legacy software by definition. A very large number of people running into issues like these are not in a position where they could solve the problem even if they wanted to.
That’s about 99% of the argument I am making. The problem is legacy software and bad certification workflows, not the software being used.
If I’m working with Java it’s indeed conceivable that I could update with some effort.
If I’m working with Node it’s conceivable that I could update with some effort.
If I’m working with YAML, is it not conceivable that I could update with some effort?
PHP is stupid because version 3 did not support object oriented programming.
CSS is bad because version 2 did not support grid layouts or flexbox.
Why should I critique these based on something they fixed a long time ago, instead of working on updating to the version which contains the fix I am complaining about?
There is a gradient limit where the onus shifts squarely to one side once the spec has changed and a number of libraries have begun supporting the new spec.
Because once a technology develops a reputation for having a problem it's practically impossible to rehabilitate it.
Now add brackets and end-tags, I'll reconsider. ;)
Brackets work fine:
End tags, I’m not sure what that means. But three dashes are part of the spec to delineate sections:
YAML works really well with LLMs (not to generate but to consume). So yes, we use it all the time in our service.
I made a silly Groovy script called "mommyjson" that doesn't try to preserve JSON formatting but just focuses on giving you the parentage (thus the name), including array indexes, object names, etc., all on the same line, so that when you find something, you know exactly where it is semantically. Not gonna claim that everybody should use it or that it cures insomnia, cancer & hangnails, but feel free to borrow it:
https://github.com/zaboople/bin/blob/master/mommyjson.groovy
(btw I would happily upvote a python port, since groovy is not so popular)
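(Since a Python port was asked for, here is a rough sketch of the same idea: emit one line per leaf, carrying its full parentage. It is illustrative only and does not try to match mommyjson's actual output format.)

```python
import json
import sys

def flatten(node, path="$"):
    """Emit one 'path = value' line per leaf so any grep hit shows its full parentage."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{path}[{index}]")
    else:
        yield f"{path} = {json.dumps(node)}"

if __name__ == "__main__":
    for line in flatten(json.load(sys.stdin)):
        print(line)
```

Usage would be something like `cat data.json | python flatten.py | grep price`, and every match shows exactly where it lives in the structure.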
This is good! There are a number of these, so it seems like it's definitely something people want. The most popular of these, I think, is gron[0]. My own is jstream[1]. One tiny point of friendly feedback: you may want to consider adding an example of usage/output so folks can see literally what it does.
[0] - https://github.com/tomnomnom/gron [1] - https://github.com/ckampfe/jstream
Do you have an output example?
Is JSON a format that needs improvement for human readability? I think there are much better ways to present data to users, and JSON is a format that should be used to transfer data from system to system.
If you're reaching for a tool like this, it's because you don't have a well-defined schema and corresponding dedicated visualization; you're looking at some arbitrary internal or insufficiently-documented transfer-level data with nested structure, perhaps in the midst of a debug breakpoint, and need a quick and concise visualization without the ability (or time) to add substantial code into the running runtime. Especially if you're working on integration code with third parties, it's common to come across this situation daily.
I think yes? I fairly often find that I have something in JSON, which probably is from some system to system comms, and I'm trying to read it. Once it's not trivially small I often pipe it through jq or python -m json.tool or whatever, I like the idea of something that just does a better job of that.
If you discard the human-readability component of it, JSON is an incredibly inefficient choice of encoding. Other than its ubiquity, you should only be using JSON because it’s both human and machine readable (and being human-readable is mainly valuable for debugging)
I think JSON is not really so good either way, due to problems with the data and with the file format.
This is interesting. I’d very much like to see a code formatter do that kind of thing; currently formatters are pretty much inflexible, which sometimes makes it hard to get structure out of formatted code.
I just built a C++ formatter that does this (owned by my employee, unfortunately). There's really only two formatting objects: tab-aligned tables, and single line rows. Both objects also support a right-floating column/tab aligned "//" comment.
Both objects desugar to a sequence of segments (lines).
The result is that you can freely mix expression/assignment blocks & statements. Things like switch-case blocks & macro tables are suddenly trivial to format in 2d.
Because comments are handled as right floating, all comments nicely align.
I vibe coded the base layer in an hour. I'm using it with autogenerated code, so the output is manually coded based on my input. The tricky bit would be "discovering" tables & blocks. I'd just use a combo of an LSP and direct observation of sequential statements.
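(A toy sketch of the two formatting objects described above: a column-aligned table whose rows carry a right-floating `//` comment. Python rather than C++ for brevity, and the rows are made up.)

```python
def format_table(rows, comment_marker="//"):
    """Align each column to its widest cell, then float the trailing comments into one aligned column."""
    cells = [row[:-1] for row in rows]      # code columns
    comments = [row[-1] for row in rows]    # right-floating comment per row
    widths = [max(len(row[i]) for row in cells) for i in range(len(cells[0]))]
    lines = []
    for row, comment in zip(cells, comments):
        body = "  ".join(cell.ljust(width) for cell, width in zip(row, widths))
        lines.append(f"{body}  {comment_marker} {comment}")
    return "\n".join(lines)

rows = [
    ["case kRed:",   "paint(RED);",   "break;", "warm colour"],
    ["case kGreen:", "paint(GREEN);", "break;", "go"],
    ["case kBlue:",  "paint(BLUE);",  "break;", "cold colour"],
]
print(format_table(rows))
```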
You built it, but your employee owns it? That sounds highly unusual.
Autocorrect — employer. Too late to change, now!
Probably a single-letter typo. Makes complete sense if changed to “employer.”
Auto corrected employer?
Right. In my previous work, I wrote a custom XML formatter for making it look table-like which was our use case. Of course, an ideal solution would have been to move away from XML, but can't run away from legacy.
I have a JSON formatter called Virtuous (https://marketplace.visualstudio.com/items?itemName=karyfoun...) and till now I thought that was the best way to format JSON, but I must confess that I'll throw away my own formatter in favor of this one. What a great job.
When I want something more readable than json I usually use nushell. The syntax is almost the same and you can just pipe through "from json" and "to json" to convert: https://gist.github.com/MatrixManAtYrService/9d25fddc15b2494...
What I like about fractured json is the middle ground between too-sparse pretty printing, and too-compact non-pretty printing, nu doesn't give me that by default.
One thing that neither fractured json nor nushell gives me, which I'd like, is the ability to associate an annotation with a particular datum, convert to json, convert back to the first language, and have that comment still be attached to that datum. Of course the intermediate json would need to have some extra fields to carry the annotations, which would be fine.
Why reinvent the parser when the role of this library is formatting?
I really like this, I think I'd find it useful fairly often and I like the idea of just making something that I use irregularly but not that rarely a bit better.
But then I found it's in C#. And apparently the CLI app isn't even published any more (apparently nobody wanted it? Surprises me but ok). Anyway, I don't think I want this enough to install .NET to get it, so that's that. But I'd have liked a version in Go or Rust or whatever.
I'm the maintainer of FracturedJson. The decision to stop publishing a binary for the CLI version was made a long time ago: fewer features, fewer users, less mature .NET tooling (as far as I knew). And as you say, .NET isn't a common language for distributing CLI tools.
I plan to take a new look at that when I have the time. But a port to a more CLI-friendly platform could probably do a better job.
I really liked the idea, so I am porting it to Rust https://github.com/fcoury/fracturedjson-rs
These JSON files are actually readable, congrats. I’m wondering whether this could be handled via an additional attached file instead. For example, I could have mycomplexdata.json and an accompanying mycomplexdata.jsonfranc. When the file is opened in the IDE, the IDE would merge the two automatically.
That way, the original JSON file stays clean and isn’t polluted with extra data.
I had to do a double take on the repo author here :)
This tool also looks super useful. I spend so much time at work looking at JSON logs that this will surely come in handy. It’s the kind of thing I didn’t even know I needed, but now that I’ve seen it, it makes perfect sense.
I like this idea a lot. Currently the biggest issue for adoption seems to be the missing packages for most programming languages, and for homebrew/etc.
It should even be possible to compile the dotnet library to a C-compatible shared library and provide packages for many other languages.
While I wish JSON formally supported comments, it seems more sensible (compatible) to just nest them inside of a keyed list or object as strings.
Works right up until you get an entity where the field `comments` is suddenly relevant and then you need to go change everything everywhere. Much better to use the right tool for the job, if you want JSONC, be explicit and use JSONC.
Surely it could be suffixed or keyed with a less likely collision target than this very simplistic example. I suppose JSONC and similar exist, although they are rarely used in the wild in contrast to actual JSON usage, and compatibility is important.
Hadn't heard of JSONC, but I've always been a proponent of JSON5 for this reason.
https://github.com/json5/json5
Personally, I think if your JSON needs comments then it's probably for config or something the user is expected to edit themselves, and at that point you have better options than plain JSON and adding commentary to the actual payload.
If it's purely for machine consumption then I suspect you might be describing a schema and there are also tools for that.
idk... "ans: 42 // an old reference from DA API" seems easier to read than wasting 4 lines of yours
multiply that for a long file... it takes a toll
---
also sometimes one field contains a lot of separate data (because it's straight up easier to deserialize into a single std::vector and then do stuff) - so you need comments between data points
I had a need to put all array data on single lines, but otherwise normal JSON: https://github.com/kardianos/json
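(If you want that behaviour without pulling in a dependency, here is a rough Python sketch of the idea, rather than the linked package's actual algorithm: indent objects normally but keep arrays of scalars on one line.)

```python
import json

def dumps_arrays_inline(value, indent=0):
    """Pretty-print JSON, but keep arrays whose elements are all scalars on a single line."""
    pad = " " * indent
    if isinstance(value, dict):
        if not value:
            return "{}"
        items = (f'{pad}  {json.dumps(k)}: {dumps_arrays_inline(v, indent + 2)}'
                 for k, v in value.items())
        return "{\n" + ",\n".join(items) + "\n" + pad + "}"
    if isinstance(value, list):
        if all(not isinstance(v, (dict, list)) for v in value):
            return json.dumps(value)                      # scalar-only array: one line
        items = (pad + "  " + dumps_arrays_inline(v, indent + 2) for v in value)
        return "[\n" + ",\n".join(items) + "\n" + pad + "]"
    return json.dumps(value)

doc = {"points": [1, 2, 3, 5, 8], "meta": {"labels": ["a", "b"], "nested": [{"x": 1}, {"y": 2}]}}
print(dumps_arrays_inline(doc))
```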
This looks very readable. The one example I didn't like is the expanded one where it expanded all but one of the elements. I feel like that should be an all-or-nothing thing, but there are bound to be edge cases.
The lengths people go to in order to (not) use XML. It has everything: comments, validation, schema, what have you.
Though, I guess, the only(?) great XML workflow is with C# LINQ
Generally much larger for the same data, and not readable unless using something that indents. Even then, I'd argue it's still less legible.
I could see FracturedJson being great for a browser extension or a preview extension in an IDE, as in just for _viewing_ JSON in a formatted way, not formatting the source.
How many times do you actually need to look at large JSONs? The cost of readability is too high, IMO.
Personally, I don't spend much time looking at complex JSON; a binary format like Protobuf along with a typed DSL is often what you need. You can still derive JSON from Proto if you need that. In return, you get faster transport and type safety.
Also, on another note, tools like jq are so ubiquitous that any format that isn't directly supported by jq will have a really hard time seeing mass adoption.
Love the spirit, but the attack-plans example IMO looks worse with this formatting. I don’t love the horizontal scrolling through properties of an object.
I don't know if you spend a fraction of your life scrolling vertically through megabyte-sized JSON files, but if something can reduce the height of the file, that's welcome. We don't need to read every single line from left to right; we just need to quickly browse through the entire file. If a line in this format is longer than fits the screen, it's likely we don't need to know what's in the cut-off right corner anyway.
Gigabytes even (people do the silliest things). But ‘find’ gets me there 90% of the time, and at that point the amount of vertical scrolling isn’t really any different than in a 2kb file.
Your use case is different then. Some of us actually want to browse through the entire JSON.
Nice... I like using JSON to stdout for logging, this would be a nice formatting option when doing local dev to prettify it without full decomposition.
I tokenized these and they seem to use around 20% fewer tokens than the original JSONs. Which makes me think a schema like this might optimize latency and costs in constrained LLM decoding.
I know that LLMs are very familiar with JSON, and choosing uncommon schemas just to reduce tokens hurts semantic performance. But a schema that is sufficiently JSON-like probably won't disrupt model paths/patterns that much or introduce unintended bias.
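(For anyone who wants to reproduce that kind of comparison, a minimal sketch; the encoding name is just an example, and the ratio will vary by model and by document.)

```python
import json
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # example encoding; use the one matching your model

doc = {"id": 7, "coords": [[1, 2], [3, 4], [5, 6]], "tags": ["alpha", "beta"]}
renderings = {
    "pretty":   json.dumps(doc, indent=4),
    "minified": json.dumps(doc, separators=(",", ":")),
}
for label, text in renderings.items():
    print(label, len(enc.encode(text)))
```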
Minified JSON would use even fewer tokens.
Yeah, but I tried switching to minified JSON on a semantic labelling task and saw a ~5% accuracy drop.
I suspect this happened because most of the pre-training corpus was pretty-printed JSON, and the LLM was forced to derail from its likely path and also lost all the "visual cues" of nesting depth.
This might happen here too, but maybe to a lesser extent. Anyways, I'll stop building castles in the air now and try it sometime.
If you really care about structured output, switch to XML. Much better results, which is why all AI providers tend to use pseudo-XML in their system prompts and tool definitions.
If my json is too complicated to be readable I just use jq to find the things I want out of it.
All this work and no mention of YAML in the repository; that's kind of funny to me.
The trouble with yaml is that it's too hard to keep track of how indented something is if its parent is off the screen. I have to keep a t-square on my desk and hang it from the top of my monitor whenever this comes up.
That, and the fact that it has enough bells and whistles that there are YAML parser exploits out there.
Let's implement this formatting in all code editors and replace YAML with it :)
That is remarkable. I recently implemented this very functionality in Python using roughly 200 lines of code, completely unaware that a pre-built library was available.
I love doing this type of formatting with source code. It'll be nice when people start writing linters that format code like this
Linters shouldn’t format code.
I prefer JSON5
Great. Now integrate this into every JSON library and tool so I get to see its output more often.
I think integration into jq would be both powerful and sufficient.
Powerful but not sufficient. There’s plenty of us who don’t use jq for various reasons.
LLMs have allowed me to start using jq for more than pretty printing JSON.
Give https://rcl-lang.org/#intuitive-json-queries a try! It can fill a similar role, but the syntax is very similar to Python/TypeScript/Rust, so you don’t need an LLM to write the query for you.
Nice! Thanks!
The issue isn’t jq’s syntax. It’s that I already use other tools that fill that niche and have done for as long as jq has been a thing. And frankly, I personally believe the other tools are superior, so I don’t want to fall back to jq just because someone on HN tells me to.
Great stuff
I am a bit saddened by the fact that people get obsessed with syntactic innovation, or even less than that. Don't we have plenty of urgent problems around us?
People have a problem and are trying to solve it. We are not all required, nor able, to solve whatever the world’s most urgent problem is today.
In this case they are formatting JSON in an easier-to-read way. It’s not an alternative to CRDTs; it is a totally different issue.
What can I say. I want all problems in my life to be like that.