An Attention Namespace for OPML

In a recent post I said that OPML would be a great format for sharing attention data, but I wasn’t sure whether this would be possible due to uncertainty over OPML’s support for namespaces. This afternoon I talked with Dave Winer and Steve Gillmor, and to make a long story short, I’m happy to report that namespaces will be supported by OPML. So an attention namespace for OPML seems like a fine idea at this stage.

As I mentioned previously, FeedDemon already stores attention data in OPML, but it uses a proprietary fd: namespace which relies on attributes that make little sense outside of FeedDemon. What I propose is that aggregator users and developers have an open discussion about what specific attention data could (and should) be collected by aggregators.

Although there’s a lot of attention data that could be stored in OPML, my recommendation is that we keep it simple – otherwise, we risk seeing each aggregator support a different subset of attention data. So rather than come up with a huge list of attributes, I’ll start by recommending a single piece of attention data: rank.

We need a way to rank feeds that makes sense across aggregators, so that when you export OPML from one aggregator, the aggregator you import into would know which feeds you’re paying the most attention to. This could be used for any number of things – recommending related feeds, giving higher ranked feeds higher priority in feed listings, etc.

Although user interface and workflow differences require each aggregator to have its own algorithm for ranking feeds, we should be able to define a ranking attribute that makes sense to every aggregator. In FeedDemon’s case, a simple scale (say, 0-100) would work: feeds you rarely read would be ranked closer to zero, while feeds you read all the time would be ranked closer to 100. Whether this makes sense outside of FeedDemon remains to be seen, so I’d love to hear from developers of other aggregators about this.
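
As a rough illustration of the 0-100 idea, here is a minimal Python sketch of how an importing aggregator might read such a rank from OPML. The namespace URI, the `att` prefix and the `rank` attribute name are placeholders invented for this example; no attention namespace has actually been defined, so treat all of them as assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; no attention namespace has been defined yet.
ATT_NS = "http://example.org/attention"

SAMPLE_OPML = """<opml version="1.1" xmlns:att="http://example.org/attention">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="Rarely read" type="rss" xmlUrl="http://example.com/a.xml" att:rank="12"/>
    <outline text="Read daily" type="rss" xmlUrl="http://example.com/b.xml" att:rank="95"/>
    <outline text="No attention data" type="rss" xmlUrl="http://example.com/c.xml"/>
  </body>
</opml>"""


def feeds_by_rank(opml_text):
    """Return (xmlUrl, rank) pairs, highest rank first, unranked feeds last."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url is None:
            continue  # a folder or other non-feed outline
        raw = outline.get("{%s}rank" % ATT_NS)
        try:
            # Clamp to the 0-100 scale suggested above.
            rank = max(0, min(100, int(raw)))
        except (TypeError, ValueError):
            rank = None  # attribute missing or not an integer
        feeds.append((url, rank))
    # Ranked feeds sort ahead of unranked ones, then by descending rank.
    feeds.sort(key=lambda feed: (feed[1] is not None, feed[1] or 0), reverse=True)
    return feeds


for url, rank in feeds_by_rank(SAMPLE_OPML):
    print(url, rank)
```

An aggregator that doesn't understand the namespace would simply ignore the extra attribute and import the subscriptions as usual.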

Beyond rank, what other attention data do you think aggregators should collect? And how should they use that data to serve you better?

40 thoughts on “An Attention Namespace for OPML”

  1. I just have a note for aggregator developers: you need to make sure that high-traffic feeds get treated properly. E.g. when I’m subscribing to an aggregated feed such as http://del.icio.us/rss/tag/rails I’ll only click on a few of the feed’s items, yet that doesn’t mean this feed isn’t as important to me as a feed where I read every single article.
    I’m not sure how one could properly handle this situation on the OPML level though. Maybe there needs to be both a metric of clicks/item (which is the rating you propose) and a metric of items/time, or something else that would allow aggregators to cope with high-traffic feeds.
    See also: http://dekstop.de/weblog/2005/11/searchfox_attention_arithmetics/
    Nick, I’m looking forward to your implementation, I think it’s a great idea.

  2. That’s an excellent point, Martin, and it’s one that a very sharp FeedDemon user named Radek has raised in our forums:
    http://www.newsgator.com/forum/shwmessage.aspx?forumid=7&messageid=9245
    Taking clicks/item into account when determining rank makes a lot of sense. Perhaps the attention namespace should include attributes for (1) the number of items in the feed when it was exported, (2) the number of clicks that feed received, and (3) the date the user subscribed to the feed. These values could help aggregators determine attention without forcing them to use a specific algorithm.
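
To show how those three values could feed different algorithms, here is a small Python sketch. The field names and the two metrics are assumptions made for illustration, not anything FeedDemon or another aggregator is known to use.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FeedAttention:
    # Hypothetical field names for the three values suggested above.
    item_count: int   # items in the feed when it was exported
    click_count: int  # clicks the feed received
    subscribed: date  # date the user subscribed


def clicks_per_item(feed):
    """Share of items the user clicked; penalizes high-traffic feeds."""
    return feed.click_count / feed.item_count if feed.item_count else 0.0


def clicks_per_day(feed, exported):
    """Clicks per day of subscription; independent of feed volume."""
    days = max((exported - feed.subscribed).days, 1)
    return feed.click_count / days


exported = date(2005, 11, 21)
subscribed = exported - timedelta(days=100)
busy = FeedAttention(item_count=1000, click_count=50, subscribed=subscribed)
quiet = FeedAttention(item_count=10, click_count=10, subscribed=subscribed)

print(clicks_per_item(busy), clicks_per_item(quiet))
print(clicks_per_day(busy, exported), clicks_per_day(quiet, exported))
```

The high-traffic feed looks unimportant by clicks/item but more important by clicks/day, which is the situation Martin describes. Exporting the raw counts lets the importing aggregator pick whichever measure suits it.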

  3. Oh, one thing I forgot to mention in my previous comment is that the newspaper views used by many aggregators (including FeedDemon) make tracking item clicks tricky at best.
    For example, in FeedDemon I’ll often read an entire page of news items, and I never click on any of them – but I did read them. This is why FeedDemon also tracks clicks at the feed level, and includes that in its attention algorithm.

  4. I’m wondering if there’s any way to track changes in rank over time. Putting it in physical terms, if rank is equivalent to speed, I’d like to measure acceleration (and deceleration).

  5. Ed, how rank has changed over time would certainly be useful to know. I’m wondering, though, how it could be expressed in a way that’s not tied to a specific application or algorithm?

  6. Nick, do you have an example of the data you’re using (proposing), please?
    I’m particularly interested in seeing what advantage there is in putting attention terms in namespaces inside OPML, compared to, say, RSS. The reason I have doubts is that unlike RSS, which generally follows the XML+namespaces approach to structure and semantics, OPML has its own interpretation/extension point, the type attribute (see http://dannyayers.com/archives/2005/11/18/opml-revisited-2/ ).

  7. I was discussing this the other week with my brother. We both read our news in a newspaper format and have difficulty ensuring our collections are completely read. I use Bloglines and prioritise by moving feeds up and down between collections as my ranking system. Sometimes I’ll read a collection that has many unread feeds only to get through half of them. I want to tell my reader the point I’ve managed to reach in that session.
    So, I’d like to see a news reader that has the ability to mark feed items in my collections as read as I scroll and view them on screen, or an option to click and say “I’ve read to here”. I see this viewed-item information combined with a feed subscription date, a “last read” date, a click-through counter, and my own human ranking (like BlogBridge does), all used to generate my rank. On top of this, the ability to filter by rank methods.
    I’d also like the scroll view counter for items to be intelligent and know when I’ve skipped content by scrolling too quickly to read items.
    As for the namespace. Rank, yes. Definitely. Please.
    As for the number of items in a feed on export, I don’t believe it is needed. The aggregator that imports can get that info from a “last read” date stamp for a feed instead.
    The number of clicks that feed received should be related to its rank as well. But for the sake of tracking what users value most, maybe it’s worth including. On the flip side, partial-content feeds getting more click-throughs doesn’t mean they’re any more important or deserving of attention than feeds I might simply read and not click through to.
    I’m just not sure clicking through to comment or to learn more about a site can be separated, in attention terms, from clicking through to simply read the entire article.
    A definite yes to click-through counting only on full feeds for me, though. Not easy to do, I’d think.
    The date the user subscribed to a feed, definitely. Useful for plotting a user’s feed-reading history. Imagine tracking your interests over time and charting that by topic. :) I could reminisce about all the feed-reading phases of my life. lol.
    So my feed namespace recommendations would be: Rank, Last Read Date and Subscription Date.
    Sorry for the long comment. :)

  8. Nick, no problem with being public. I just worry about details being lost when doing this kind of thing in a discussion thread. But I will gladly post here and in my own blog. I’m working on the info.

  9. Danny, as you recall from our previous discussion, I’m proposing this not because of its technical merits (I freely admit that there are technically superior solutions), but instead because existing aggregators already support OPML. This means that users don’t need to import attention separately – that to me is the biggest benefit.
    At this stage I don’t have examples, as I’m simply asking for input on what attention data should be stored in OPML.

  10. Rank should be a float with a range of 0 to 1. This way any aggregator could scale the number to whichever local scale they wished.
    5 stars would be: int (rank * 5)
    Zagat Rating would be: int (rank * 30)
    This way more than just simple newsfeeds could be ranked. Any URI / web resource can be rated. This could include an end point for a web service.
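
The two formulas above translate directly into code. This is a minimal Python sketch; the function names are made up for the example, and `int()` is used because that is what the comment's formulas use.

```python
def to_local_scale(rank, top):
    """Map a rank in [0, 1] onto an integer scale from 0 to `top`."""
    if not 0.0 <= rank <= 1.0:
        raise ValueError("rank must be between 0 and 1")
    return int(rank * top)  # int() truncates toward zero


def from_percent(percent):
    """Convert a 0-100 rank into the proposed 0-1 float."""
    return percent / 100.0


rank = from_percent(75)
print(to_local_scale(rank, 5))   # star rating
print(to_local_scale(rank, 30))  # Zagat-style rating
```

Note that truncation means only a rank of exactly 1 maps to the top of the local scale; an aggregator might prefer `round()` or a ceiling instead, which is a design choice the proposal leaves open.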

  11. Er, yes Nick, but existing aggregators already support RSS ;-) But you’re right – figuring out what data points are needed (in the attention domain model) is definitely the first job.
    Ok, “rank” seems worth having, but I’d suggest the definition needs to be clearer – i.e. what is the ranking value a measure of?
    Another question: how do the characteristics listed in Attention.xml line up against the ones you’ve actually been using in FeedDemon? If there’s even a moderate match, those terms might be the best starting point (even if a different format is used).
    Another reference for you: “MeNow” – attention/presence stuff:
    http://crschmidt.net/semweb/menow/

  12. Using attributes already defined by Attention.xml makes a lot of sense, and would certainly simplify transforming between the two formats.
    Of the attention.xml attributes, the obvious ones to use at the feed level are etag, tags, lastupdated, lastread and dateadded. Dateremoved could also be useful, since that could tell the importing aggregator not to recommend feeds you removed from the exporting aggregator.
    Of course, it could be reasonably argued that we might as well use all of the attention.xml attributes, but IMO a simple subset is a good starting point.

  13. Yeah, the definition of ‘rank’ does need to be clearer, but it’s tricky to do that without trying to attach it to a specific algorithm (which I believe would be a very bad idea).
    In my mind, ‘rank’ simply expresses how important a feed is to the user. The higher the value, the more important it is. It’s up to each aggregator as to how ‘rank’ is calculated, but it must be within a specific range of values.
    BTW, I agree with Ted that a float between 0 and 1 makes more sense than an integer value.

  14. Isn’t the best algorithm highly personalized? With iTunes, I spend my time tweaking my Smart Playlists for what I value rather than depending on Apple to give me some baked-in solution.

  15. Ok, but if your application has rank by page-view count, and my application has rank by cat-photo count (according to our local measures of importance), and we exchange data, is a ‘neutral’ rank measure in the format going to tell either of us anything useful? I.e., the value is 100. 100 what?

  16. I want to reiterate what Danny is saying (at least what I think he is saying). Storing attention data from which you can calculate a rank would be far more useful than the rank itself.
    Rank is only meaningful to the application that created it. Trying to exchange rank between applications would be like trying to compare apples and oranges.

  17. More on attention formats (and OPML namespaces)

    It was great to see Nick Bradbury start to talk about the same idea I was talking about earlier: a discussion about the attributes of attention we’d like to see collected:
    What I propose is that aggregator users and developers have an op…

  18. James, I agree 100% that it would be more useful to store attention data that could be used to calculate rank, and that’s actually part of my goal here – to determine exactly what data should be captured.
    However, each aggregator may have application-specific attention data that makes little sense elsewhere. In FeedDemon’s case, drag-and-dropping an item into a news bin increases its rank – but this action only makes sense in FeedDemon. Likewise, flagging an item increases its rank, but many aggregators don’t support flagging items.
    Since each aggregator will have its own method of determining rank, a generic ‘rank’ attribute could help the importing aggregator determine how important each feed is to the user. If the exporting aggregator includes enough attention data in the OPML for the importing aggregator to calculate rank, then the rank attribute could be ignored.
    I don’t believe exchanging rank between applications would have to be like comparing apples and oranges – I think of it along the same lines as sharing a list of your favorite artists between music stores. If one aggregator knows which feeds you read the most, then surely that’s useful information to another aggregator?

  19. Thanks for your comment re: the vote attribute. I agree with you when you wrote:
    “subscribing to a feed is automatically a vote for it.”
    I want to clarify. I wrote:
    “A vote would work at the item level. (I repeat: by item I mean RSS item, webpage, blog post, podcast, or video or whatever – if it has an url it can be voted for). Voting would be explicit, requiring a user action, maybe a quick check of a box. ”
    So, I’m not talking of a ‘vote’ at the feed level, I’m talking at a more granular level – specific content the feeds point to that has value to you.

  20. I agree with Alex – The blog world is getting so large that reading all the items within each feed is really starting to hurt. I want to keep up with various feeds, but can no longer keep up with all the items within the feeds.
    Before we start thinking of rank, we have to start thinking of categorization, and the ability to find/seek what we are looking for. Having been around for a while, I remember the web before search (Altavista, Google, etc.), and how difficult it was to find something when you didn’t know where to look. Now that it has been categorized (still problems, just ask Scoble), we are turning our attention to rank and context to help find information in the clutter.
    We all know that blogs and the contributions to blogs are growing like wildfire – and we are now producing so much that we can’t find what we are looking for or, conversely, weed out what we don’t want.
    OPML is not the answer, nor do I think a centralized search engine is. Thinking outside of the box for a second, I would like to think of DNS. It contains information that allows us to find what we are looking for, and it’s not centralized.
    Could we not do for tags what we have done for .net, .com, .org, etc.? Could we not have tags that are registered within a library/DNS that we could then point to an “item” (defined by Alex in previous post) rather than a computer with a numerical address?
    Tags are also something that I’m working on to categorize marketing messages, but I don’t want to “subscribe”; I want to publish what I’m interested in and have “items” delivered to me – now that would be aggregation at its finest (IMHO).

  21. We spent some time the last two days looking exactly at our OPML stuff. http://www.blogbridge.com/archives/2005/11/geek_preliminar.php is a write-up with our contribution to the OPML discussion.
    The most important thing is that we will definitely rely on an application-specific namespace, like I am sure others will. Our write-up lists a few new tags that seem to me to be very generally useful. We offer these as ideas to see what people think. They are:
    rating = user supplied rating (a la NetFlix) of the feed
    customTitle, customCreator, customDescription = each a local override to the title, creator and description that may or may not be part of the feed.
    Limit = number of items or posts to keep around
    Anyway, see http://www.blogbridge.com/archives/2005/11/geek_preliminar.php for further details.

  22. I have to go to bed, but something popped into my head re: “rank” when I read this post in NNW.
    Rank (0-1, 0-100, or on any scale) is a normalized quantity that will have little meaning from one aggregated context to another. However, you could instead store two pieces of information which allow that rank to be calculated:
    number of views / number of refreshes
    Or something similar. I refresh manually in NNW; I don’t always read all of my RSS “Groups”. I focus on my “Local” group, a group dedicated to blogs from my alma mater, and … well, one or two others. However, I often hold my “Political” group until the weekend, and go through the political rantings of the week in one go.
    This means that I may do a “refresh all” several times in the week, but view some feeds less frequently. The only way to calculate “rank” is to actually store both pieces of information (# refreshes and # views) and carry both numbers from one aggregated context to another.
    Better suggestions may have been made in the comments to this post, or you may have come up with other ideas already… I just think that a single, normalized value that loses context will be impossible to (accurately, consistently) move from one aggregator to another. I could be wrong.
    G’night from GMT!
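
One way to see why carrying both numbers matters is to merge history from two aggregators. This Python sketch is only an illustration; the `ViewStats` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ViewStats:
    # Hypothetical per-feed counters carried in the exported OPML.
    views: int      # times the user actually viewed the feed
    refreshes: int  # times the feed was refreshed

    def ratio(self):
        """Views per refresh, or None when there is no data yet."""
        return self.views / self.refreshes if self.refreshes else None

    def merged(self, other):
        """Combine counters from two aggregators into one history."""
        return ViewStats(self.views + other.views,
                         self.refreshes + other.refreshes)


old_reader = ViewStats(views=18, refreshes=20)
new_reader = ViewStats(views=1, refreshes=80)
combined = old_reader.merged(new_reader)

print(old_reader.ratio(), new_reader.ratio(), combined.ratio())
```

With raw counters the combined ratio weights each aggregator by how much it was actually used. With only two normalized values there is no way to know that the second one rests on four times as many refreshes.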

  23. Now that it’s the weekend, I’ll toss some more things into the pile to be mulled over.
    Rank should be the value that is used to sort the feeds from top to bottom within a given client. The variables used will also be useful and should be passed. The reason I’m saying this about rank is that rank is based upon so many other variables that mean so many different things to different people. For example, the idea that Views and Refreshes are the variables used to calculate Rank could be meaningless when comparing 2 feeds that update at much different intervals.
    What I really want to see from the OPML file that I get is what you find important enough to rank at the top. In the end, I don’t care how you calculate that field, because those feeds will soon be integrated into my listings and my ranking systems will take over. It might be valuable for me to sync up my feeds and for a global ranking to be calculated with an arbitrary formula. With these variables, I have My Rank, Your Rank, and Global Rank. When a feed is imported from a Global / Community Store, My Rank is set to Your Rank or Global / Community Rank.
    Example: Rank to me is (x + y) * (z/2). The values of x, y, and z are immaterial. Rank to you is (x / y) / z. Again, the formula and the actual values of x, y, and z are immaterial. Now let’s say the Global / Community Store Rank is (Total_Subscribers_Rank / Number_of_Subscribers). Suppose I subscribe to a feed from a community OPML list with, say, Rank = 0.999999. For me, Rank is (x + y) * (z/2), but I don’t have any x, y, or z values yet. So the feed keeps a rank of 0.999999 until I collect a minimum threshold of x, y, and z values. Say that minimum threshold is 10: I keep the Global Rank until I have 10 x, y, and z values.
    Now these formulas need not be user-entered, though that would be cool. They could be the difference between one reader and another. The whole point is that I don’t need to care what your rank formula is once I have enough data to calculate -my- Rank. Therefore I’m suggesting one major thing, and one cool thing.
    The major suggestion is that clients which consume OPML / feeds set a threshold value as I described above. And of course the cool thing would be for a client to allow custom ranking formulas.
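
The threshold idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function names, the shape of the local data (a plain list of observations), and `mean_score`, which merely stands in for whatever formula a client actually uses.

```python
def effective_rank(imported_rank, observations, local_formula, threshold=10):
    """Use the imported rank until enough local data has been collected."""
    if len(observations) < threshold:
        return imported_rank
    return local_formula(observations)


def mean_score(observations):
    """Stand-in for a client's own ranking formula (hypothetical)."""
    return sum(observations) / len(observations)


# Just imported from a community list: too little local data so far.
print(effective_rank(0.999999, [0.2, 0.4, 0.3], mean_score))

# After ten local observations the client's own formula takes over.
print(effective_rank(0.999999, [0.5] * 10, mean_score))
```

Passing the formula in as a function also covers the "cool thing" from the comment: a client could let users supply their own ranking formula without changing the fallback logic.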

  24. Hi! Just wished to add to what Pito has already said about extra attributes we use in BlogBridge.
    A while ago we added another display type, a photo display, for feeds which simply list images with some descriptions, like Flickr does. So it would be a big help to us, and to everyone who is going to follow this path, if there was an attribute on a feed outline in OPML to define a sub-type of a feed. This way, the type could be ‘rss’ and the sub-type (or item type):
    * articles — traditional list of articles
    * images — items are images with descriptions
    * links — feeds with links (like from del.icio.us or simpy).
    It looks like a meaningful addition, which readers could interpret to provide better rendering off the bat.
