OPML Validator Beta

It’s great to see that Dave Winer is working on an OPML Validator – this is a big help to aggregator developers who wish to ensure OPML import/export compatibility with each other, so my thanks to Dave for investing his time in this.

I have to confess that validating FeedDemon’s OPML export has been an eye-opener. Although FeedDemon’s OPML can be imported by every aggregator I’ve tested with, it still contained a few problems (all of which will be corrected in the next build).

I also tested FeedDemon’s OPML import with the OPML exported by a wide array of other aggregators, and here are the most common problems I’ve found (some of which FeedDemon is also guilty of):

  • Using the title rather than text attribute for folder <outline> nodes (see first observation below)
  • Failing to include a <head> section
  • Unescaped ampersands (& instead of &amp;) in the text or title attributes
  • Failing to include type="rss" on subscription list <outline> nodes
  • Using xmlurl (lowercase ‘u’) instead of xmlUrl (uppercase ‘U’)
  • Using htmlurl (lowercase ‘u’) instead of htmlUrl (uppercase ‘U’)
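
As a rough illustration, the problems listed above can be checked with a short script. This is only a minimal sketch under stated assumptions, not how the OPML Validator or FeedDemon actually works: the names check_opml and BARE_AMP are hypothetical, and the script uses nothing beyond Python's standard re and xml.etree.ElementTree modules. It treats any <outline> node carrying a feed URL as a subscription node.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper pattern: an ampersand that does not start an entity
# or character reference such as &amp; or &#38;.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def check_opml(source):
    """Return a list of human-readable problems found in an OPML string."""
    problems = []

    # An unescaped ampersand makes the XML ill-formed, so report it first,
    # then escape it only so that the remaining checks can still run.
    if BARE_AMP.search(source):
        problems.append("unescaped ampersand")
        source = BARE_AMP.sub("&amp;", source)

    root = ET.fromstring(source)
    if root.find("head") is None:
        problems.append("missing <head> section")

    for node in root.iter("outline"):
        attrs = node.attrib
        label = attrs.get("text") or attrs.get("title") or "(unnamed)"

        if "text" not in attrs:
            problems.append(f"{label}: no text attribute")

        # Flag wrongly cased attribute names such as xmlurl or htmlurl.
        for name in attrs:
            for expected in ("xmlUrl", "htmlUrl"):
                if name.lower() == expected.lower() and name != expected:
                    problems.append(f"{label}: {name} should be {expected}")

        # Assumption: a node with a feed URL is a subscription node.
        has_feed_url = any(name.lower() == "xmlurl" for name in attrs)
        if has_feed_url and attrs.get("type") != "rss":
            problems.append(f'{label}: missing type="rss"')

    return problems
```

Calling check_opml() on an exported file's contents returns an empty list when none of these problems is present. Nodes placed after a closed <body/> element are not detected by this sketch.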

Observations:

  • Several aggregators expect title rather than text as the identifying attribute on <outline> nodes. For this reason, aggregators may wish to include both of these attributes in their OPML export for compatibility, even though doing so would cause the OPML to fail validation.
  • The OPML Validator accepts XML that contains unescaped ampersands (test case).
  • The OPML Validator accepts <outline> nodes that come after a closed <body/> element (test case).
  • The OPML Validator treats unknown attributes as an error, but I believe they should be permitted if they’re correctly namespaced. It would be nice to see namespaces “officially” supported, since an aggregator may wish to include per-feed information in its OPML export which isn’t part of the spec but may be of use to other aggregators (ex: number of times the feed has been visited, the update frequency, unique ID in a service such as NewsGator or Bloglines, name and version of the aggregator which exported the OPML, etc.)
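
To make the first and last observations concrete, here is a minimal sketch of an exporter that writes both text and title on a subscription node and adds one namespaced extension attribute. Everything specific in it is an assumption made for illustration: the namespace URI, the agg prefix, the visits attribute, and the feed_outline function are hypothetical and are not part of the OPML spec or of any aggregator's real export.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix, chosen only for this illustration.
EXT_NS = "http://example.com/ns/aggregator"
ET.register_namespace("agg", EXT_NS)


def feed_outline(name, xml_url, html_url, visits):
    """Build one subscription <outline> node."""
    node = ET.Element("outline")
    node.set("type", "rss")
    # Duplicate the name so readers that expect either attribute can find it.
    node.set("text", name)
    node.set("title", name)
    node.set("xmlUrl", xml_url)
    node.set("htmlUrl", html_url)
    # Namespaced attribute (hypothetical) carrying per-feed information.
    node.set(f"{{{EXT_NS}}}visits", str(visits))
    return node


node = feed_outline(
    "Tom & Jerry", "http://example.com/rss.xml", "http://example.com/", 12
)
# ElementTree escapes the ampersand and declares xmlns:agg when serializing.
print(ET.tostring(node, encoding="unicode"))
```

Building the node through an XML library rather than by string concatenation also avoids the unescaped ampersand problem, since the serializer handles the escaping.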

Update: The OPML validation guidelines have been updated to accommodate the observation about duplicating text/title.

22 thoughts on “OPML Validator Beta”

  1. I’m working through your list of comments.
    Let’s start here: “Several aggregators expect title rather than text as the identifying attribute on nodes. For this reason, aggregators may wish to include both of these attributes in their OPML export for compatibility, even though doing so would cause the OPML to fail validation.”
    The problem with including both is that title is one of the attributes for type="rss". So if a defensive exporter decided to follow your advice, there might be two title attributes in one outline element. I’m not sure if this is legal XML, but I suppose it might be. But it’s a horrible mess for OPML. Better to call it an error, imho.
    http://www.opml.org/guidelinesForValidation#subscriptionLists

  2. Just to be clear, I’m not suggesting that the OPML Validator accept duplication of title/text, nor am I suggesting that title occur twice in the same node.
    The problem is that some RSS readers rely on ‘title’ rather than ‘text’ when determining the title of a feed, so aggregators may wish to include both attributes for best compatibility.
    Ex: if a feed is named ‘MyFeed’ the outline node would be <outline text="MyFeed" title="MyFeed"…>

  3. Yes, I understood that. But title already means something. So if you’re trying to accommodate people who are looking for titles, you’re going to end up with two title attributes on some nodes: one for the kludge, and one because title is already one of the optional attributes for nodes of type rss. I included a link to the section of the guidelines where this is explained; I think it’ll be pretty clear if you read it.

  4. Yep, I read that, but I guess I’m not clear on what ‘title’ is supposed to mean in an RSS node. The section you referenced says “title is probably the same as text, and if so should be omitted,” but doesn’t say what ‘title’ means when it’s *not* the same as ‘text.’
    I haven’t seen any OPML documents which use ‘title’ for something other than the title of the feed, so I assumed (perhaps incorrectly) that it was safe to use the same content for ‘title’ as ‘text’. This appears to be what you’re doing in your OPML, since ‘title’ and ‘text’ are the same in the subscription lists in the section you referenced.

  5. Interesting. We’re getting somewhere.
    I think I was wrong in saying it should be omitted. In the case of a type="rss" node, while text and title likely have the same value, they serve different purposes. text is what’s displayed by an outline editor, and title is the title of the feed.
    You were right, sorry for the confusion.
    How do you feel about me changing the guidelines?

  6. Ah! OK, so as far as an aggregator is concerned, ‘title’ would be the actual title element from the feed, while ‘text’ would be the title the user gave to the feed (ie: they renamed the feed in their subscriptions)? If so, then this is useful information to have in a subscription list.
    I’m 100% in favor of changing the guidelines if doing so adds clarity.
    Thanks again for taking time to do this work.

  7. OPML Validator

    Importing and exporting lists of RSS feeds between aggregators is generally done using the OPML format.
    Dave Winer has put online some “Guidelines” along with an OPML validator that checks whether an OPML file conforms.
    Nick Brad…

  8. GNC-2005-11-1 #113

    First of the month so as always we ask you a single time each month to vote for the Geek News Central over at Podcast Alley Click to Vote Here! Are you coming to the Podcast Academy? The agenda is…

  9. Changes to OPML Validator Beta

    Dave has updated his OPML Validator Beta in response to comments; among the changes are ones that clarify the difference between error and warning conditions as I discussed here and in the comments to the announcement and to the update. Thanks, Dave -…

  10. OPML validator

    Nick Bradbury suggested that the validator warn about outline elements outside the body of an OPML document.

  11. I’ve asked Dave about this before, but never got a reply. I’m hoping you may be able to get an answer out of him, or at least provide an answer yourself.
    As you’ve mentioned here (and the guidelines have recently been updated to reflect) it’s generally a good idea to include both text and title attributes in subscription lists so that RSS readers that only recognise one or the other will still work. However it’s not clear whether this applies to non-RSS nodes (namely “folder” nodes) in a structured subscription list.
    I would assume that if an RSS reader only recognised title attributes for RSS nodes, then it’s a fair bet they would treat folder nodes in the same way, but I may be wrong. Last time I checked, Dave’s validator treated a folder node with both title and text attributes as invalid. However, in his example file for structured subscription lists, that’s exactly how the folder nodes are written. So which is correct?

  12. James, technically I believe ‘text’ is the correct attribute to use on folder nodes, but I’ve always duplicated the content of ‘text’ in the ‘title’ attribute since some aggregators expect that.

  13. Thanks for the confirmation Nick. Now if only you could persuade Dave to allow that in the OPML validator we might just be able to produce OPML that is both interoperable and valid.

  14. Some RSS readers rely on ‘title’ rather than ‘text’ when determining the title of a feed, so aggregators may wish to include both attributes for best compatibility. Yeah, you have to include both.

Comments are closed.