A big reason behind RSS’s wide adoption is its simplicity, but the ambiguity of the RSS 2.0 spec has also led to problems. Case in point: the RSS 2.0 spec states that “entity-encoded HTML is allowed” in the <description> element, which has caused more than a little confusion about how to properly encode < and >. For example, if a feed does not use entity-encoded HTML, how should < and > be encoded so that they’re not interpreted as HTML tag brackets by aggregators such as FeedDemon? And if a feed does contain HTML, how should the tag brackets be encoded so an aggregator knows to treat it like HTML? As you can imagine, this ambiguity has cost aggregator developers plenty of hair, since they have had to deal with countless badly encoded feeds.
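To make the ambiguity concrete, here is a minimal Python sketch of what an aggregator faces. The feed item and the function names are made up for illustration; this is not how FeedDemon or any particular aggregator works. The point is that once the XML parser has decoded the description, nothing in the text says whether the brackets are markup or literal characters.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical feed item. The XML alone does not say whether the author
# meant "a bold tag" or "the literal characters <b>".
ITEM_XML = "<item><description>Use &lt;b&gt; for bold text</description></item>"


def description_text(item_xml: str) -> str:
    """Return the description after the XML parser has decoded entities once."""
    return ET.fromstring(item_xml).findtext("description", default="")


def display_as_plain_text(text: str) -> str:
    """Reading A: the text is plain text, so escape it and show brackets literally."""
    return html.escape(text, quote=False)


def display_as_html(text: str) -> str:
    """Reading B: the text is HTML, so pass it through as markup."""
    return text


decoded = description_text(ITEM_XML)
print(decoded)                         # Use <b> for bold text
print(display_as_plain_text(decoded))  # Use &lt;b&gt; for bold text
print(display_as_html(decoded))        # Use <b> for bold text
```

Both readings start from the same decoded string, but one shows the reader the characters "<b>" and the other starts an unclosed bold run. The aggregator has to guess which one the feed author intended.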
This issue has come to the surface again due to concerns about silent data loss, resulting in lengthy comment threads (see here and here) featuring the usual suspects.
Behind the scenes, a number of aggregator developers bounced around ideas for solving this once and for all, and I’m happy to report that this has resulted in a proposed clarification for the RSS 2.0 spec. The proposed clarification states very clearly that “<description> contains entity-encoded HTML.”
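Read literally, “contains entity-encoded HTML” suggests a simple rule for feed producers: decide what the HTML should be, then let the XML layer encode it. The Python sketch below is my own illustration of that reading, not text from the proposal, and the helper names are hypothetical. Note that plain text has to be turned into HTML first, so a literal < ends up escaped twice.

```python
import html
import xml.etree.ElementTree as ET


def description_from_plain_text(text: str) -> str:
    """Turn plain text into HTML so literal < and > survive as characters."""
    return html.escape(text, quote=False)


def build_item(description_html: str) -> str:
    """Serialize an <item>; ElementTree applies the XML-level entity encoding."""
    item = ET.Element("item")
    ET.SubElement(item, "description").text = description_html
    return ET.tostring(item, encoding="unicode")


# Case 1: the description really is HTML, so the tag brackets are encoded once.
markup_item = build_item("This is <b>bold</b>")

# Case 2: the description is plain text containing a literal less-than sign.
plain_item = build_item(description_from_plain_text("5 < 6"))

print(markup_item)  # <item><description>This is &lt;b&gt;bold&lt;/b&gt;</description></item>
print(plain_item)   # <item><description>5 &amp;lt; 6</description></item>
```

An aggregator that follows the same reading decodes the XML once and treats the result as HTML, so the first item renders bold text and the second renders the characters "5 < 6".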
Of course, this won’t magically fix every badly encoded RSS feed, but it does resolve the ambiguity of the existing spec. The hope is that this will lead to less guesswork down the road, especially since the proposal comes with a set of examples for properly encoding descriptions.