Do you hate XML? (2010)

22 points | by theanonymousone 2 hours ago

26 comments

  • kccqzy 29 minutes ago

    In my opinion, the reason people hate XML is what the M signifies: it is a markup language, and most of the time we don’t need a markup language. Markup languages are great for rich text documents. They are just not a good fit for representing data. The markup nature of XML introduces an unnecessary choice between using an attribute or a child element to represent data; for HTML such ambiguity doesn’t actually exist, but for data it does. Consider this piece of XML from the Python docs:

        <country name="Liechtenstein">
            <rank>1</rank>
            <year>2008</year>
            <gdppc>141100</gdppc>
            <neighbor name="Austria" direction="E"/>
            <neighbor name="Switzerland" direction="W"/>
        </country>
    
    Why is the country name an attribute but not the rank? Why is all the information about neighbors in attributes rather than in child elements?

    Furthermore, parsing JSON or YAML gives you an AST that consists of basic data types like lists and dictionaries. Parsing XML gives you an AST that requires a lot more effort to turn into data in your domain. Even on the web, very few people like to use the verbose XML DOM API (childNodes, nodeType, getElementsByTagName et al.); it is basically unheard of for anyone to use it outside the web, such as in Python, even though the DOM API has been in the Python standard library since forever (see https://github.com/python/cpython/blob/3.14/Lib/xml/dom/mini... for example).
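
    A minimal sketch of the contrast described above, using only the Python standard library. The XML is a trimmed copy of the sample; the JSON rendering of the same record is a made-up assumption, not something from the Python docs.

```python
import json
from xml.dom import minidom

XML_DOC = """<country name="Liechtenstein">
    <rank>1</rank>
    <year>2008</year>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
</country>"""

# Hypothetical JSON rendering of the same record (an assumption for comparison).
JSON_DOC = """{"name": "Liechtenstein", "rank": 1, "year": 2008,
  "neighbors": [{"name": "Austria", "direction": "E"},
                {"name": "Switzerland", "direction": "W"}]}"""

# JSON: the parse result is already plain dicts, lists, ints and strings.
data = json.loads(JSON_DOC)
json_rank = data["rank"]
json_neighbors = [n["name"] for n in data["neighbors"]]

# XML via the DOM API: walk nodes, then convert text to the types you want.
dom = minidom.parseString(XML_DOC)
country = dom.documentElement
rank_node = country.getElementsByTagName("rank")[0]
dom_rank = int(rank_node.firstChild.data)  # text content is always a string
dom_neighbors = [
    n.getAttribute("name") for n in country.getElementsByTagName("neighbor")
]
```

    Both paths end up with the same values; the DOM path just needs explicit navigation and type conversion along the way.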

      mickeyp 8 minutes ago

      Because SAX parsing is a thing, and the visitor pattern makes it easy to elide searches in sub-trees if an attribute does not match.

      So if name == "foobar" then read; else ignore. For a 500 GiB XML file that makes a difference.

      As for your other point about an "AST" (it's actually just a DOM): that's the benefit? And you're in for a surprise when you learn that a deeply nested JSON structure, deserialised into whatever memory format is most appropriate for your pet language, is also an abstract data type that you act on with getters/accessors/what-have-yous, and that it is in all but name a DOM.

      And we do have tools to deal with it: XSLT for transformation. For querying? XPath.
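
      A rough sketch of the "if name matches then read, else ignore" idea with Python's built-in `xml.sax`. The handler class, the `<data>` wrapper, and the sample values are hypothetical. Note that SAX still tokenizes the skipped elements; the saving is that nothing is built in memory for them.

```python
import xml.sax

class NamedCountryHandler(xml.sax.ContentHandler):
    """Collect <rank> text only inside <country> elements whose name matches."""

    def __init__(self, wanted_name):
        super().__init__()
        self.wanted_name = wanted_name
        self.inside_match = False
        self.inside_rank = False
        self.ranks = []

    def startElement(self, name, attrs):
        if name == "country":
            # Decide once per subtree whether to pay attention to it.
            self.inside_match = attrs.get("name") == self.wanted_name
        elif name == "rank" and self.inside_match:
            self.inside_rank = True
            self.ranks.append("")

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.inside_rank:
            self.ranks[-1] += content

    def endElement(self, name):
        if name == "rank":
            self.inside_rank = False
        elif name == "country":
            self.inside_match = False

# Hypothetical input; a real use case would stream a large file instead.
DOC = (
    '<data>'
    '<country name="foobar"><rank>4</rank></country>'
    '<country name="other"><rank>9</rank></country>'
    '</data>'
)

handler = NamedCountryHandler("foobar")
xml.sax.parseString(DOC.encode("utf-8"), handler)
```

      For a file on disk, `xml.sax.parse(path, handler)` streams the input the same way without loading it whole.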

      dofm 2 minutes ago

      > Why is the country name an attribute but not the rank?

      Perhaps because it's an example of what is possible in XML and how to parse it, and not, in fact, a particularly good or canonical example of XML?

        woodruffw a minute ago

        I think GP’s question was rhetorical. They know it’s an example.

      cryptos 25 minutes ago

      Interesting point of view. JSON is also not the right thing to use in many scenarios, but it is the de-facto standard now. Maybe something like protobuf is the way to go.

      actionfromafar 25 minutes ago

      YAML made me not hate XML.

        SoftTalker 18 minutes ago

        Agreed. Among text based formats, nothing I hate more than YAML.

      sfn42 8 minutes ago

      Not really. In C# I use a parsing library for which I just write a class and then the library automatically serializes the JSON into an instance of that class.

      I can do the same thing with XML. Of course it doesn't necessarily go that smoothly with all xml, but as long as the xml is fairly simple like a JSON document would be it's totally fine. It's only when you start to use all the features of xml that don't fit neatly into a class model that it starts to get annoying. But if JSON serves your needs then simple xml does as well. I wouldn't use it because JSON works just fine but it's not as bad as people make it seem, unless people make it really bad.
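
      The comment is about a C# library that does the mapping automatically; the following is only a hand-written Python analogue, with a hypothetical `Country` class and made-up inputs, to show that simple XML and simple JSON can land in the same class.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Country:
    name: str
    rank: int

def country_from_json(text: str) -> Country:
    # JSON keys map straight onto the dataclass fields.
    return Country(**json.loads(text))

def country_from_xml(text: str) -> Country:
    # Simple element-only XML maps almost as directly; values need converting.
    root = ET.fromstring(text)
    return Country(name=root.findtext("name"), rank=int(root.findtext("rank")))

from_json = country_from_json('{"name": "Liechtenstein", "rank": 1}')
from_xml = country_from_xml(
    "<country><name>Liechtenstein</name><rank>1</rank></country>"
)
```

      Once attributes, mixed content, or namespaces enter the picture, the XML mapping needs more decisions than this sketch makes.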

  • cryptos 27 minutes ago

    At least XML is hated for the wrong reasons (e.g. verbosity, esthetics) most of the time. There was for sure an era where it was overused (see Apache Cocoon from 2006 https://en.wikipedia.org/wiki/Apache_Cocoon). But XML is still a pretty good format to exchange (and store) data and make sure the data conforms to a certain schema. JSON Schema in comparison is not nearly as powerful.

      AnimalMuppet 13 minutes ago

      1. What, in your view, are the right reasons to hate XML?

      2. To me, verbosity and aesthetics seem like perfectly valid reasons to hate XML. Once you learn S expressions, XML looks disgusting. They implemented half of Common Lisp in a markup language.

  • crispyambulance 27 minutes ago

    XML was a good, well-intentioned idea.

    The problem, IMHO, was the rampant "xml-abuse" in the aughts. WS-* standards and over-engineered garbage like SOAP ("complex object access protocol") made people loathe XML.

    I did like JAXB in Java, XSLT, schemas, XPath. Never got into XSL, but it seemed like a good thing too. It worked best when your tooling manipulated it for you or at least helped you in an intelligent way. Much of the hate for XML came from situations where you had to deal with someone's over-the-top, one-size-fits-all schema without the benefit of tooling to at least hint you in the right direction.

    It still survives in WPF and C# *.proj files. If it were just me, I would still use it for object serialization. But JSON is king now even though it's inferior.

  • mickeyp 15 minutes ago

    XML is unfairly maligned. Yes, people bought into it too much 26 years ago, but then you would too if you had to maintain someone else's massive packed struct dumped into a file and documented in a poorly-maintained Word document --- or worse, a brace of dumb IETF RFCs that contradict each other.

    I am glad that younger generations are looking at it with fresh eyes. XML is a useful format; it has its place in your toolbox. Ignore the haters.

  • reenorap 18 minutes ago

    I’ve hated XML since 2004. The worst part about it is the tags vs attributes fights. They both do the same thing and the only difference is preference. Having two ways of doing the same thing invites and incites religious positions and causes unnecessary fighting. There should be one, opinionated way of doing things so you avoid confusion.

      jolmg 12 minutes ago

      > The worst part about it is the tags vs attributes fights. They both do the same thing and the only difference is preference.

      They're not the same thing. If you look at it as the extensible markup language for documents that it is, "tags" (i.e. inner content) would be visible and "attributes" would not. If your XML document was processed by an application to convert to another type of document (PDF, etc.), and it didn't recognize a particular tag, it would be sensible for attributes to disappear, but inner content ("tags") to remain.

      It only seems like a preference thing if you look at XML as a structured data format like JSON.
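
      A small illustration of that distinction, assuming a made-up `<fancy>` element that the consuming application has never heard of. Flattening the document to text keeps the inner content and silently drops the attribute.

```python
import xml.etree.ElementTree as ET

# <fancy> and its color attribute are hypothetical, unknown to the consumer.
doc = ET.fromstring(
    '<p>XML is a <fancy color="red">markup</fancy> language.</p>'
)

# itertext() walks the text of all descendants in document order;
# attributes are never part of it.
visible_text = "".join(doc.itertext())
```

      Here `visible_text` is `"XML is a markup language."`: the unknown tag's content survives, its attribute does not.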

        edflsafoiewq a few seconds ago

        In data structure terms, attributes do allow nodes to be decorated with additional information without forcing any change on existing parsers. In JSON, this would require swapping, e.g. "str" -> {"value": "str", "attrib1": "..."}.
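
        A sketch of that point with hypothetical `book`/`title` data: a reader written before the `lang` decoration existed keeps working on the XML, while the equivalent JSON reader now receives a different shape.

```python
import json
import xml.etree.ElementTree as ET

def read_title_xml(text):
    # Written before any attributes existed; only looks at element text.
    return ET.fromstring(text).findtext("title")

def read_title_json(text):
    # Written when "title" was a plain string.
    return json.loads(text)["title"]

old_xml = "<book><title>Dune</title></book>"
new_xml = '<book><title lang="en">Dune</title></book>'  # decorated later

old_json = '{"title": "Dune"}'
new_json = '{"title": {"value": "Dune", "lang": "en"}}'  # shape had to change

xml_before, xml_after = read_title_xml(old_xml), read_title_xml(new_xml)
json_before, json_after = read_title_json(old_json), read_title_json(new_json)
```

        `xml_after` is still the string `"Dune"`, whereas `json_after` is now a dict that the old caller was never written to handle.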

      rf15 15 minutes ago

      yeah it's not a good design to have tags have two sets of children: a Set of key-value children and then a List of tree object children.

  • jolmg 19 minutes ago

    > developers must become domain experts [my emphasis] in a rich and complex space that is essentially unrelated to the application itself.

    XML is a markup language, but most people that used it just needed a standard structured data format. In comes JSON which is more easily compatible with the object systems of various languages and in particular is compatible with Javascript syntax, and XML loses most of the people that used it.

    As a markup language though, it seems pretty good. It's just that the number of people that actually need an extensible markup language is much smaller.

  • int_19h 18 minutes ago

    Honestly I miss it. As overengineered as it was, at least we had proper tooling for it, and while there were dialects in the associated tech (e.g. XML Schema vs RELAX NG vs Schematron) it was minor compared to the wild west that JSON is to this day.

  • jjgreen 21 minutes ago

    It will be back like vinyl.

  • trueno 6 minutes ago

    i dont hate it. the declaration kind of annoys me from time to time, digging into attributes can be annoying, and its obviously not the best form of structured data.

    json is just easier for my brain at this point if it needs to go over http, but ive seen some pretty... poorly designed json structures.

    csv is always a good time. love when i can just plop important data into a table and query away

  • hackrmn 25 minutes ago

    Every time XML comes up, I feel obligated to share my opinion (I too wrote XML at the turn of the millennium, have seen it rise, and still occasionally witness it being excommunicated).

    XML is verbose and therefore uglier than it ought to be. I think most of the haters hate it for that alone -- there's not much else to hate because you don't have to deal with the rest, it's not really imposed on you unless you really have to deal with someone else's XML application.

    What do I mean? Well, the brackets thing and the necessity to repeat the name of every element twice, in correct (LIFO, last in first out) order, isn't great, admittedly.

    What XML has, and what the dev-bro alternatives that have since flooded the void it left lack and keep reinventing, is namespaces, attributes, and interop built on the two. Sure you can write JSON and YAML (the latter deservingly being incredibly hard to parse correctly -- they tried to design a better XML but failed IMO) -- but these suck as meta-languages because there's not much "meta" there. JSON, for example, allows you to create properties and has a few types (kind of more than XML, really) but it leaves semantics up to you, and namespaces are up to you to re-invent, poorly. If you think I am stretching the argument, see if you can represent an HTML document (no, not Markdown) with a JSON file.
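
    As a rough illustration of what namespaces buy you, here is a sketch with two made-up vocabularies (the `urn:example:...` URIs and the element names are assumptions). Both define a `title` element, and a namespace-aware parser keeps them apart.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both define <title>.
DOC = """<order xmlns="urn:example:shop" xmlns:ship="urn:example:shipping">
  <title>Keyboard</title>
  <ship:title>Express</ship:title>
</order>"""

# Prefixes used in queries are local to this mapping; only the URIs matter.
NS = {"shop": "urn:example:shop", "ship": "urn:example:shipping"}

root = ET.fromstring(DOC)
product_title = root.findtext("shop:title", namespaces=NS)
shipping_title = root.findtext("ship:title", namespaces=NS)
```

    In JSON the usual workaround is a naming convention such as prefixed keys, which every consumer has to agree on separately.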

    YAML is a similar story, albeit with a few cool things like aliases. I think it's a better attempt to give the world a better XML, but the jury is still out on that one.

    The killer thing with XML, for better and for worse, was the plethora of tools to work with it. I wrote a fair share of XSLT documents to transform data, back when there was momentum in XHTML, for example. XSLT barely supports JSON and it's not pretty. XPath cannot natively understand YAML -- unless you convert it to XML, which I guess re-animates XML as some sort of Frankenstein's monster. And even if it were a [pretty] monster, dealing with an intermediate representation for that kind of purpose is a can of worms all of its own.

    Ironically, nobody seems to hate HTML 5. Or perhaps React has basically turned it into a greasy cogwheel nobody needs to look at. Because if you look at it, it's in my opinion an abomination even compared to XML (unpopular opinion) -- the parser is quirky and behaviour is defined by the standard per element type (i.e. some elements need a closing tag and some do not, and what happens if you forget a closing tag is element-specific; care to remember the set of rules to ensure your document renders to your liking?). It has no namespaces but it has "custom elements", which require a dash in the name as a poor man's namespace that you can't omit, and now we have a Web of `x-spinner` and `x-carousel` because it turns out everyone rightfully wanted a default namespace but didn't get one. Anyway, it's all plumbing, right -- the idea of _writing_ HTML has largely come and gone. And I am digressing.

      jolmg 5 minutes ago

      > Well, the brackets thing and the necessity to repeat name of every element twice,

      As a document format, it's supposed to be hand-written by humans. If you have paragraphs between the opening tag and closing tag, it makes sense to let the reader know what they're seeing the closing of.

      After deciding you do want to repeat the element name, the angle brackets make more sense. Otherwise, you can have a syntax like LaTeX's.

      int_19h 17 minutes ago

      I don't like HTML5 and to this day I don't understand what was actually gained by dropping XHTML.

        tommica 7 minutes ago

        Not having the page break because of a small mistake. Though I did get pretty good at writing XHTML, and strictness is a blessing in certain cases.

        kccqzy 4 minutes ago

        XHTML was dropped because it wasn’t backwards compatible, and it was too strict in its syntax. Minor syntax errors that could be automatically corrected by the browser turned into full page errors.
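
        A quick sketch of the strict versus forgiving behaviour, using Python's standard library as a stand-in. `html.parser` is only a lenient tokenizer, not a browser's HTML5 parser, and the broken snippet is made up.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

BROKEN = "<p>Hello <b>world</p>"  # <b> is never closed

def parses_as_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

class TextCollector(HTMLParser):
    """Lenient HTML tokenizer that just gathers text content."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

strict_ok = parses_as_xml(BROKEN)

collector = TextCollector()
collector.feed(BROKEN)
collector.close()
lenient_text = "".join(collector.chunks)
```

        The XML parser rejects the whole input, while the lenient one still yields "Hello world".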