XML vs YAML vs JSON: A Study To Find Answers
November 5, 2008
XML is commonly used for web application messaging - sending information back to a browser from a web server, or sending information between web services. It's dead easy to do this and it works very well, hence XML has become the de facto choice for data exchange in web applications.
Alternatives such as YAML and JSON have found significant support in recent years. Both aim to be a more suitable alternative to XML in some cases.
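To make the difference concrete, here is a minimal sketch that writes the same small record as JSON and as XML using only Python's standard library. The record and its field names are made up purely for illustration, and the XML layout (one child element per field) is just one of several reasonable ways to map it.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented purely for illustration.
record = {"name": "Ada", "language": "Python", "year": 2008}

# JSON: a single standard-library call.
as_json = json.dumps(record)

# XML: build the tree by hand, one child element per field.
# (This mapping is an assumption; attributes would work too.)
root = ET.Element("record")
for key, value in record.items():
    child = ET.SubElement(root, key)
    child.text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)
print(len(as_json), "characters of JSON vs", len(as_xml), "characters of XML")
```

YAML is left out only because Python has no YAML support in its standard library; a third-party package would be needed.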
How much interest is there in knowing which is best? Let's see.
Ok, so that's not an exact search. But it does suggest a huge amount of interest in a comparison between XML, YAML and JSON. (And no, Google, I didn't mean "xml vs xml" nor "yaml vs jason", but thanks anyway.)
XML might not be the best choice in all cases, but that's no revelation.
Dare Obasanjo referred to JSON as being
"another nail in the coffin of XML on the Web".
Tim Bray solved the problem for us 2 years ago.
David Megginson decided it all ends up looking like XML when you add a little complexity, but did note that:
JSON [has] the important advantage [of making] the most trivial cases easy to represent.
James Bennett reminds us that JSON works:
because most people don't really need all that overhead, and because it's often possible to do really interesting things with really simple formats
Even 6 years ago David Mertz pointed out
"some situations where YAML provides a better object serialization format than XML".
And, of course, Dustin Diaz informed the masses that JSON was not only fast but so easy it'll make you sick.
There's no end to the argument, but there's not much factual evidence either.
Ultimately, I think Jeff Atwood best sums up the gist of the issue.
I don't necessarily think XML sucks, but the mindless, blanket application of XML as a dessert topping and a floor wax certainly does. Like all tools, it's a question of how you use it.
So we know XML is not ideal, and JSON or YAML may be better in some cases. JSON might be faster, YAML might be better (and more beautiful).
But in what cases would you go for one instead of the other? What benefits might you see and where? I want cold hard facts, numbers, charts and answers.
How much academic research has been done in the field? Let's see what journal articles have been published that compare XML, YAML or JSON in any way.
Ok, I'm getting desperate. The ACM and the IEEE are not small. They should have at least something of relevance.
Searching ... searching ... searching ...
Nope, turns out the ACM and IEEE journal archives contain nothing of direct relevance. There's even one article that relates to a completely different YAML.
Google Scholar, can you help?
Well, there is one academic article that explicitly compares XML, YAML and JSON (PDF, 200Kb).
It seems that both YAML and JSON are faster to encode for up to about 5000 elements, then XML takes over. It also looks like both YAML and JSON require twice as much memory as XML when decoding. I couldn't determine whether the article relates this to real-world performance (the article is in Portuguese, which I don't speak).
The point: there's a huge amount of interest in some form of performance comparison, yet not much academic research appears to have been undertaken.
There is no clear sign of any scientifically-arranged, repeatable, verifiable hard-evidence-based comparison. So I'll do just that.
Goodbye life for the next 2-3 months, and hello data object serialization formats for the new world.
I'll run some tests to determine which of the three technologies offers the least:
The tests will be strictly scientific - I'll be doing my best to remove or minimise any influencing factors. Everything is going to be precise, exact and - most importantly - repeatable.
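As a rough idea of what a repeatable timing test could look like, here is a minimal sketch using Python's standard library. It is not the test harness for this study: the record shape, the function names (`make_records`, `encode_json`, `encode_xml`, `best_time`) and the element counts are all assumptions chosen for illustration. YAML is omitted because it would need a third-party package.

```python
import json
import timeit
import xml.etree.ElementTree as ET


def make_records(n):
    # Hypothetical test data: n small records with two fields each.
    return [{"id": i, "name": f"item{i}"} for i in range(n)]


def encode_json(records):
    return json.dumps(records)


def encode_xml(records):
    # One <record> element per item, one child element per field.
    root = ET.Element("records")
    for rec in records:
        item = ET.SubElement(root, "record")
        for key, value in rec.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def best_time(func, records, repeats=5, number=10):
    # Run the encoder `number` times per repeat and keep the fastest
    # repeat, which reduces noise from other processes on the machine.
    timings = timeit.repeat(lambda: func(records), repeat=repeats, number=number)
    return min(timings) / number


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        records = make_records(n)
        print(n, "elements:",
              "json", best_time(encode_json, records),
              "xml", best_time(encode_xml, records))
```

Taking the minimum of several repeats is one common way to cut down on outside interference, but numbers from a sketch like this say as much about the particular libraries as about the formats themselves.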
The results themselves might excite or scare a small number of developers. For the benefit of the rest of the world, I'll also be looking into why this is actually useful.
Just to top it all off, I'll also look at whether we need to be sending string-based serialised data between web services and whether we might be better off opting for much, much faster choices such as Google Protocol Buffers. And anything else along the way that may be relevant, time permitting.
I've set up some tests into the perception of time delays. I'd like to initiate some form of distributed stress testing on some web services. There will surely be plenty of tests and tasks that would benefit from a few minutes of everyone's time.
This is part of my final year project, due at the end of April 2009. I'll have some results before then (I hope!) and will write up short pieces where possible. I'll try to make full and final results available after I finish my final exams, so that'll be some time around the end of June 2009.