Analog Hacker News

throw03172019 8 minutes ago

So CSV with a “typed” header?

dtagames 2 hours ago

Personally, I'm against anything that goes against the standard LLM data formats of JSON and MD. Any perceived economy is outweighed by confusion when none of these alternative formats exist in the training data in any real sense and every one of them has to be translated (by the LLM) to be used in your code or to apply to your real data.

Any tokens you saved will be lost 3x over in that process, as well as introducing confusing new context information that's unrelated to your app.

[-]

maheshvaikri99 2 hours ago

Fair point, but I'd push back on "none of these alternative formats exist in training data."

ISON isn't inventing new syntax. It's CSV/TSV with a header - which LLMs have seen billions of times. The table format:

table.users id name email 1 Alice alice@example.com

...is structurally identical to markdown tables and CSVs that dominate training corpora.

On the "3x translation overhead" - ISON isn't meant for LLM-to-code interfaces where you need JSON for an API call. It's for context stuffing: RAG results, memory retrieval, multi-agent state passing.

If I'm injecting 50 user records into context for an LLM to reason over, I never convert back to JSON. The LLM reads ISON directly, reasons over it, and responds.

The benchmark: same data, same prompt, same task. ISON uses fewer tokens and gets equivalent accuracy. Happy to share the test cases if you want to verify.

dClauzel 2 hours ago

Just use CSV at this point :D

Show HN: ISON – Data format that uses 30-70% fewer tokens than JSON for LLMs