FeedDemon and well-formed Atom feeds

NetNewsWire creator Brent Simmons recently announced that NetNewsWire’s future support for Atom will require Atom feeds to be well-formed. Some people aren’t too happy about this, claiming that he’s applying a double standard that will make Atom appear less useful than RSS.

So, I’ll add to the stink by stating that my plan is the same as Brent’s. FeedDemon will also support Atom, but if an Atom feed isn’t well-formed XML, FeedDemon will display an error rather than try to parse it. In fairness I have to consider this decision open to input from my customers, but I want to explain why I believe this is important.

When I started coding FeedDemon, I immediately ran into an ugly problem: a huge number of RSS feeds are invalid. This made it impossible to use an off-the-shelf validating XML parser, since it would choke on so many existing feeds. A number of very popular RSS feeds are shockingly invalid, and I couldn’t expect FeedDemon to compete in the RSS aggregator market if it couldn’t handle them. So, I coded my own XML parser, and made it extremely forgiving of problematic feeds.
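To make the tradeoff concrete, here's a small Python sketch (my illustration, not FeedDemon's actual parser) of how a conforming XML parser reacts to one of the most common feed mistakes, an unescaped ampersand:

```python
import xml.etree.ElementTree as ET

# A common real-world feed mistake: an unescaped ampersand in a title.
broken_feed = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>News & Views</title>
  </channel>
</rss>"""

try:
    ET.fromstring(broken_feed)
    print("well-formed")
except ET.ParseError as err:
    # A conforming parser must stop here; a "forgiving" one has to guess.
    print("not well-formed:", err)
```

A forgiving parser has to recover from errors like this one feed at a time, which is exactly the per-feed workaround code I'd rather not be writing.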

Atom, however, is a new format, and there’s a chance we can get it right. Rather than wasting our time working around validation issues, aggregator authors such as myself can spend our time coding the features our users really want. This isn’t just self-serving on my part: it will make it easier for anyone who wants to consume Atom feeds if they can expect them to at least be well-formed. It’s not like well-formed XML is hard to do – Tim Bray has listed the four “Bozo Factor” rules that are required, and I have to think that anyone who can spell “XML” can follow these rules.
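As a rough illustration of how little is being asked of publishers, here's a Python sketch (not tied to any particular publishing tool) of escaping user text and checking the result before it goes out the door:

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """True if doc parses as XML -- the bar a feed has to clear."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# Escaping user text before it goes into a feed handles the usual culprits.
title = "Dungeons & Dragons <news>"
entry = f"<title>{escape(title)}</title>"

assert not is_well_formed(f"<title>{title}</title>")  # raw text breaks the feed
assert is_well_formed(entry)                          # escaped text is fine
```

If every publishing tool ran a check like this before posting, malformed feeds would largely disappear at the source.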

If both NetNewsWire and FeedDemon require well-formed Atom feeds, then perhaps we’ve provided authors of Atom feeds enough incentive to spit out valid XML instead of the tag soup that has infected too many RSS feeds.

Note: Be sure to read part II of this topic.

30 thoughts on “FeedDemon and well-formed Atom feeds”

  1. FeedDemon and Atom

    Yes, it would be easier and “fairer” to be liberal with what we accept – but that’s why we’re stuck with Windows IE CSS hacks, isn’t it?

  2. I think this is a great idea. It’s not about using market clout to destroy the future of an upcoming standard; it’s about making sure that standard is easy for anyone to parse and maintains a standard of being well-formed XML.
    It might take a bit of a rough start, but if the majority of aggregators enforce this then everyone will be better off.
    I just hope the rest of the community doesn’t decide the opposite should be true and FD / NNW lose customers as a result.

  3. So one day I’m at lunch with C and we are explaining news readers to another person in the group. Sites produce a feed, a news reader reads the feed, you bring the news to you. So on and so forth. Then we got to talking about using smart quotes on the site. C flips out and exclaims we can’t use smart quotes on the site because when he copies and pastes from our site into blogger it spits out broken xml and 50 people send him email telling him that his feed is broken. For many, many people with very few subscribers (me) this won’t be a problem. But C is Cory Doctorow, posting on boingboing.net. A very smart guy, with very little time. He wants to do the right thing, but if his publishing tool is broken, he is going to be the one who pays for it.
    I’m not trying to say that enforcing valid XML is a “bad” thing, just that it’s going to affect people in an adverse way that you might not foresee. If all publishing tools enforced validity, I guess it wouldn’t be a problem. But I always think about Zeldman and his hand rolled xml feed and Cory with his smart quote copy-n-paste problem.

  4. I think that it’s the best option.
    I’m working on a project where we try to aggregate feeds from several academic institutions, and the lack of a DTD for RSS is a “delayer”… it has brought us to this world of “funky” feeds..
    Look at the BBC.. they even created their own DTD :)
    I’m really looking forward to the Atom spec.

  5. Great news, Nick. As I posted over at Brent’s, the full definition of what constitutes a valid Atom feed is still under discussion. My own preference would be the switch between accept/reject happens at the level of DTD validity, errors in content being considered non-fatal. Your opinion on these (and any other) issues would be valued on the atom-syntax list.

  6. Applause, Applause

    Being a geek and a blogger, I am following the discussions around the evolving syndication standards quite closely without getting myself too much into debate. The RSS 1.0 vs. 2.0 war was already dirty enough. For my own part, I can say I am clear suppor…

  7. Atom & XML

    The respective authors of NetNewsWire and FeedDemon announced this week that their Atom support would be strictly XML. This means that Atom feeds will necessarily have to be valid XML to be read in these two aggregators, and that …

  8. Ironically, this page is not valid XHTML. In fact, it is so incredibly invalid that the W3C validator presents an error I’ve never seen before, and doesn’t even show the offending source.
    I also note that you are using Typepad, which lists as one of its selling points strict standards compliance (producing valid XHTML by default). But something you have done (I honestly do not know what) has slipped past Typepad’s defenses.
    Luckily, my browser has been *specially designed* to ignore author errors like this and display the rest of your page anyway. And thank goodness for that, otherwise we would not be having this stimulating conversation.
    There are no exceptions to Postel’s Law.

  9. OK, I dug into the source for this page and figured out the problem. It’s the trackback you received from “Znarf Infos” — it contains characters that are illegal according to your specified character set. That is, in fact, what the W3C validator was trying to tell me, but I had never seen the error message before and didn’t understand it.
    Now let’s pretend that you were doing XHTML the right way. And by that, I mean that you were serving it with a MIME type of application/xhtml+xml. This triggers an unforgiving XML mode in Mozilla; if your page is not well-formed XML, it will display an XML debugging error instead of the contents of your page. This is analogous to the behavior you are suggesting incorporating into FeedDemon.
    Let’s further pretend that every browser works this way.
    Now let’s pretend that you were hyper-diligent with your smartquotes and your ampersands, and that you validated your page immediately after authoring it, either with the W3C’s validator or by viewing it in Mozilla (making sure the page was visible in its strict XML mode). Or perhaps such validation could be built into Typepad itself, and it would not let you post an invalid page. Regardless, the page was valid when you posted it.
    So you have done everything right, and yet the page is now invalid, due to circumstances completely and utterly beyond your control. Because someone linked to your page, and your (uncustomizable) publishing tools are buggy, and they wrecked your page even after you did everything right.
    What happens next? Well, first of all, the discussion certainly stops, since everyone is using unforgiving browsers and all they see when they visit this page is an XML debugging error. Second of all, some of those people will be frustrated enough to hunt down your email address and tell you that your site is broken. Some of them might be intelligent enough to give you a URL; others will just curse at you and tell you you suck. (Surely you understand what I’m talking about — you’ve dealt with end-user bug reports before.)
    Keep in mind that, during all this, you don’t have the slightest idea what’s going on, since the page was valid when you authored it. Does Typepad even give you the option of deleting or editing trackbacks? I’m guessing it lets you delete them. Of course that assumes that you can figure out what the problem is in the first place.
    But wait! Let’s pretend that the administration page displays the trackbacks before it lets you delete them. But of course you can’t see that page, for the same reason your users can’t view your published page — bad characters snuck in. Now you’ve got a catch-22, and you’re sending emails off to Six Apart saying “WTF? I’m totally locked out of my admin page, and my readers are screaming at me, and why the heck am I paying for this grief?”
    This is the world you’re advocating, a world where clients enforce data quality, no exceptions. Think about who it hurts, before you go jumping into it whole hog.
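Mark’s scenario hinges on a concrete failure: bytes that are illegal under the page’s declared character set. A minimal sketch of that mismatch (Python, with a hypothetical Windows-1252 trackback landing on a UTF-8 page):

```python
# The page declares UTF-8, but a trackback injects Windows-1252 smart quotes.
trackback = "it sounds \u201cgreat\u201d".encode("windows-1252")

try:
    trackback.decode("utf-8")
    print("decodes cleanly")
except UnicodeDecodeError as err:
    # A strict XML client would refuse the whole page over these few bytes.
    print("illegal bytes for the declared charset:", err)
```

The page author never typed those bytes, yet under strict processing the whole page goes dark until they are found and removed.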

  10. Thought experiment

    The client is the wrong place to enforce data integrity. It’s just the wrong place. If you want to do it, of course I can’t stop you. But think about who it will hurt.

  11. XML thoughts

    If we all lived in a perfect world, or intended to, Nick Bradbury would be right. Unfortunately, we don’t, so Mark Pilgrim’s approach just works. On the one hand, the Web developed organically, with “standards” coming after impl…

  13. Mark has a point,
    the only aspect that can make a difference is that Atom is made for machine-to-machine interoperability, and HTML for machine-to-human.
    I think that an Atom viewer for humans should have an option to be “tolerant” or “unforgiving”… On this point, Tim Bray is right..
    my 2 cents.

  14. The whole reason we have XHTML (XML) is that it was impossible to parse/reuse HTML docs consistently because they were unpredictable in structure, with every syntax mistake under the sun being tolerated.
    Now, if Atom and RSS are indeed XML, then in no way, shape, or form should they be tolerated in any other form, i.e. non-validating XML. If they are, then we just lost the best feature of XML, and to some extent, XML as a standard: predictable parsability.

  15. “Atom, however, is a new format, and there’s a chance we can get it right.”
    This is a dangerous way of thinking, that gets proven wrong again and again. (Netscape: “what we need to do to win the browser war is rewrite the code. This time we’ll get it right.”)
    Sure we may have learned things since the last format, but new things have happened since then. People haven’t gotten smarter, more responsible, or less lazy since the last format. So there’s certainly no way we can “get it more right” on the human level.
    On the other hand, computers got faster and parsers got better, so make them do the dirty work of suffering through poorly-formed XML.

  17. My 2 cents on Postel’s Law

    For those who aren’t deep into the blogging community (or “blogosphere,” as it is sometimes referred to), you may not have seen or heard of the Great Postel’s Law Debate now ensuing.

  18. Validate this!

    Much Ado about Validation, and Stuff: The creators of FeedDemon and NetNewsWire have both stated their tough-love position on Atom validation: bad feeds ought to be met with errors and a refusal to parse the page. Hilarity does not ensue….

  19. Well-formed Atom Feeds Only, says Nick

    And I agree with him. Nick Bradbury, FeedDemon’s author, makes a great point about Atom feeds vs. RSS feeds in a couple of good posts:When I started coding FeedDemon, I…

  20. FeedDemon and well-formed Atom feeds

    This from Nick Bradbury on the use of RSS feeds using Atom, and what’s required for FeedDemon, which explains why a colleague’s feed doesn’t work. Nick Bradbury: FeedDemon and well-formed Atom feeds NetNewsWire creator Brent Simmons recently announced th…

Comments are closed.