Dave Winer writes about relying on titles for de-duping items across feeds:
“You can do a decent job of figuring out if you’ve seen an item before and not show it to the user if you look at the title of the story…. It ain’t perfect, but then neither is anything else in the world.”
Dave, I understand where you’re coming from – as an aggregator developer, I’ve certainly seen enough to know that many feeds are far from perfect, and it’s up to developers such as myself to handle imperfect feeds without involving the end user. But I disagree that title is reliable enough to use for de-duping. If you’re subscribed to a lot of blog or news feeds, then yes, this likely works well enough (for the exact reasons you stated). But there are many other feed sources out there, some of which use the same title over and over again (and they often don’t use guids, so you have to rely on various combinations of link to determine uniqueness).
For example, I’m subscribed to a bug report feed in which all edits to the original bug come through with the same title. I’m also subscribed to a feed from a web-based forum in which all replies to a specific forum post have the same title prefixed with “RE:”. We’ll see many more feeds like this as RSS continues to break out of the blog/media world, so I don’t think title alone is good enough for de-duping.
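To make the difference concrete, here is a minimal sketch of de-duping on something other than title alone. It assumes items have already been parsed into plain dicts; the names item_key and dedupe are hypothetical, and falling back to a hash of link, title and description is just one plausible combination, not what any particular aggregator actually does.

```python
import hashlib


def item_key(item: dict) -> str:
    """Return a de-duplication key for one parsed feed item."""
    guid = (item.get("guid") or "").strip()
    if guid:
        # The feed supplied a guid, so trust it.
        return "guid:" + guid
    # No guid: fall back to a combination of fields. Which fields to
    # combine is an assumption made for this sketch.
    parts = [(item.get(name) or "").strip()
             for name in ("link", "title", "description")]
    digest = hashlib.sha1("\x1f".join(parts).encode("utf-8")).hexdigest()
    return "hash:" + digest


def dedupe(items):
    """Keep the first occurrence of each item, preserving feed order."""
    seen = set()
    unique = []
    for item in items:
        key = item_key(item)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

With a title-only key, two edits to the same bug report (same title, different link or description) would collapse into one item; with the key above they stay separate, while an exact repeat of an item is still dropped.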
In other words, not only is the world less than perfect, it’s even more less than perfect than we thought ;)