Mark Pilgrim expresses his concerns about how some aggregators plan to handle invalid Atom feeds. Mark believes that rejecting invalid XML on the client side is a bad idea, and proposes a thought experiment in which all web browsers use strict XML parsers and refuse to display XHTML that isn’t well-formed.
This is a bit of a bait-and-switch, though. XHTML has the whole sordid history of HTML on its back, and since browsers have generally been forgiving of even the most convoluted HTML there’s a substantial backwards-compatibility issue. The XML-based Atom, however, is brand-spanking new, so it doesn’t have the same baggage as XHTML.
So, let’s try another thought experiment. I’ve copied Mark’s Atom newsfeed and made it invalid XML by adding a single & character, then uploaded it to my site:
http://www.bradsoft.com/feeds/badatom.xml
Mark asks us to “imagine that all web browsers use strict XML parsers,” but rather than use our imaginations, let’s see what happens when we browse this feed in Internet Explorer:
Hmmm…IE appears to be doing client-side validation, and it shows an error instead of displaying the feed’s contents. Okay, so let’s try Mozilla:
Looks like Mozilla does the same thing. How about Opera?
So, the most popular Windows browsers all check the XML on the client side, and fail to display the contents of the invalid Atom feed. There’s nothing surprising here, of course – any conforming XML parser, validating or not, will reject this feed because it isn’t well-formed.
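You don’t need a browser to see this, either. Here’s a rough sketch using the XML parser in Python’s standard library. The tiny feed below is a made-up stand-in (it is not Mark’s actual feed), and `check_well_formed` is just a helper name I picked for illustration. The broken copy differs from the good one by a single unescaped & character:

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down feed used only for illustration.
GOOD_FEED = """<feed>
  <title>Example feed</title>
  <entry><title>Tom &amp; Jerry</title></entry>
</feed>"""

# Same feed, but with the ampersand left unescaped (not well-formed XML).
BAD_FEED = GOOD_FEED.replace("&amp;", "&")


def check_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


if __name__ == "__main__":
    for name, feed in (("good", GOOD_FEED), ("bad", BAD_FEED)):
        ok, error = check_well_formed(feed)
        print(name, "feed:", "well-formed" if ok else "rejected (" + error + ")")
```

The parser refuses the broken copy outright rather than guessing at what the author meant, which is the same behavior the browsers show.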
Consumers of RSS feeds have had to code around all sorts of validation problems in order to be backwards-compatible with existing feeds. Atom, however, is new, so customers aren’t already subscribed to hundreds of invalid Atom feeds. Being well-formed is a requirement of XML, and Atom is defined as an XML format, so why not expect Atom feeds to be well-formed? Let’s get it right this time.
Experiment?
Nick Bradbury points out that viewing XML files in popular browsers triggers errors when the file isn’t well-formed. He might…