Dynamic RSS Feeds and Bandwidth Consumption

Scoble has been writing about RSS bandwidth concerns lately, so I thought I’d once again post on this topic. I’ve posted before about using conditional HTTP GET (If-Modified-Since) to decrease RSS bandwidth consumption, but here’s a quick recap of how it works:

Almost all aggregators store the date/time that a feed was last updated, and they pass this to the HTTP server via the If-Modified-Since HTTP header the next time they request the feed. If the feed hasn’t changed since that date/time, the server returns an HTTP status code 304 to let the aggregator know the feed hasn’t changed. So, the feed isn’t re-downloaded when it hasn’t changed, resulting in very little unnecessary bandwidth usage.
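
To make that exchange concrete, here’s a rough sketch of the server side of the handshake in Python – an illustration only, not the code any particular server or aggregator uses; the function name and the idea of passing in the feed’s last-modified time are my own:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def respond_to_feed_request(if_modified_since, feed_last_modified):
    """Decide (status, extra headers) for a feed request.

    if_modified_since  -- value of the If-Modified-Since request header, or None
    feed_last_modified -- aware UTC datetime when the feed content last changed
    """
    if if_modified_since:
        try:
            client_time = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            client_time = None
        # feed unchanged since the client's cached copy: 304, no body sent
        if client_time and feed_last_modified <= client_time:
            return 304, {}
    # feed changed (or no header sent): full feed plus a Last-Modified header
    return 200, {"Last-Modified": format_datetime(feed_last_modified, usegmt=True)}

updated = datetime(2004, 9, 1, 12, 0, tzinfo=timezone.utc)
```

The 304 branch is where the bandwidth savings come from: the response carries no feed body at all.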

This sounds simple enough, but there’s a big problem here: many high-traffic RSS feeds are generated dynamically by server-side code, and the HTTP server won’t automatically support conditional HTTP GET for dynamic feeds. So, all too often the feed is rebuilt every time it’s requested – an obvious waste of both bandwidth and CPU time. One solution is to write your own code to return a 304 based on the If-Modified-Since header, but in many cases it makes more sense to use a static feed that’s rebuilt only when new information needs to be added to it. For example, my FeedDemon FAQ feed is a static RSS file that’s regenerated whenever I add a new entry to the FeedDemon FAQ. This way, my HTTP server takes care of the If-Modified-Since comparison, and there’s no unnecessary regeneration of the feed.
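
For a static file, the server does that comparison itself from the file’s timestamp. Roughly, and purely as an illustration (the `serve_static_feed` helper is hypothetical, not real server code), the built-in behavior amounts to:

```python
import os
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def serve_static_feed(path, if_modified_since=None):
    """Mimic a web server's built-in conditional GET for a static file."""
    # the file's modification time serves as the feed's Last-Modified value
    mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    mtime = mtime.replace(microsecond=0)  # HTTP dates have 1-second resolution
    if if_modified_since:
        cached = parsedate_to_datetime(if_modified_since)
        if mtime <= cached:
            return 304, None  # client's copy is current; send no body
    with open(path, "rb") as f:
        body = f.read()
    return 200, body
```

The point is that none of this has to be written for a static feed: the server already does it, so rebuilding the file only when an entry changes gets conditional GET for free.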

However, while this works well for feeds that don’t change often, it’s not the best approach for feeds that need to be updated more frequently. This is the problem I faced with my support forum feeds, which are generated dynamically from information stored in a SQL Server database. Since new forum posts are often made every few minutes, I decided to use server-side code to limit how often aggregators can download the feeds. Almost all aggregators support conditional HTTP GET, so I simply check the If-Modified-Since date/time, and if it’s within the last 15 minutes I return a 304 to tell the aggregator the feed hasn’t changed – even if it has. This prevents aggregators from downloading the entire feed more often than once every 15 minutes.

Here’s a snippet of the ASP.NET code I use to do this:

  Dim dtNowUtc As DateTime = DateTime.Now.ToUniversalTime
  Dim sDtModHdr As String = Request.Headers.Get("If-Modified-Since")
  ' does the request contain an If-Modified-Since header?
  If (sDtModHdr <> "") AndAlso IsDate(sDtModHdr) Then
    ' convert it to a UTC date
    Dim dtModHdrUtc As DateTime = Convert.ToDateTime(sDtModHdr).ToUniversalTime
    ' if it's within the last 15 minutes, return 304 and exit
    If DateTime.Compare(dtModHdrUtc, dtNowUtc.AddMinutes(-15)) > 0 Then
      Response.StatusCode = 304
      Response.End()
      Exit Sub
    End If
  End If
  ' add Last-Modified header - FeedDemon stores this with the cached feed so it's
  ' passed back to the server the next time the feed is requested
  Response.AddHeader("Last-Modified", dtNowUtc.ToString("r"))

Now, I’ll be the first to admit it’s not the most elegant hack, but so far it has worked very well for me. I considered checking the date/time of the most recent forum post and using that for the If-Modified-Since comparison, but that would’ve required a database hit each time the feed was requested, so I opted for the less precise but more CPU-friendly solution.

23 thoughts on “Dynamic RSS Feeds and Bandwidth Consumption”

  1. Extremely minor but as long as you’re posting code… I’m curious as to why you don’t use string.Empty instead of “”.

  2. Heh, the idea sometimes touches two brains at the same time. I was thinking along the lines of query string params to pass the date, but this is generally non-standard, so the HTTP header idea is probably the best.
    I’m in Russia on paid-per-Mb cable, so I really feel the pain as my blog list grows bigger ;)

  3. Heh, the idea sometimes touches two brains at the same time. I was thinking along the lines of query string params to pass the date this morning when I stumbled on your article, but this is generally non-standard, so the HTTP header idea is probably the best.
    I’m in Russia on paid-per-Mb cable, so I really feel the pain as my blog list grows bigger ;)

  4. Nick,
    I’ve been thinking about Scoble’s recent blog entry as well. In fact, I went back to check my log entries, and he has a point….
    Here’s a thought for FeedDemon that maybe you haven’t considered. Use an approach similar to Bloglines, and aggregate content for all your users. You can then push the status to your desktop clients using whatever method you like. You could use a more efficient protocol between the client and your server.
    This allows the FeedDemon server to make one status poll for all the users. If you look at your logs you’ll see that this is exactly what Bloglines does, and I think it’s a pretty decent approach.
    This changes your model of selling only desktop software a bit, but it is worth giving it some thought.

  5. Excellent! I’m working on an ASP.NET app that consumes loads of feeds, but I need to optimize the retrieval process. At the moment I’m using Atom.NET and RSS.NET (SourceForge projects) to load each feed and check its last-modified date. I guess it would be much more efficient to manually check each If-Modified-Since header before actually loading it. However, this field is null on every feed I’ve tried so far. Any idea why? Here’s how I load the ModifiedSince header (C#):
    HttpRequest r = new HttpRequest(null, feed.FeedURL, null);
    string lastmodified = r.Headers.Get("If-Modified-Since");
    -kenny

  6. The flaw in this approach is that you’re relying on aggregators supporting conditional GETs, which was the problem in the first place. Those clients that don’t support conditional GETs will never send an If-Modified-Since header and therefore will always receive a freshly generated copy.

  7. mmj: I’d not be surprised to see cases where unconditional GETs for RSS feeds are returned either an error or some static data suggesting that an updated aggregator is required.

  8. I can see a scenario in which this would break quite badly.
    A shared cache (i.e. an ISP’s proxy server) requests the feed for the first time, and stores the result.
    Somebody else at the same ISP requests the same feed within 15 minutes. Since they last checked the feed more than 15 minutes ago, the cache sees that its own copy is fresher, so it validates its copy (i.e. sends a request with an If-Modified-Since matching its own copy’s Last-Modified). It receives a 304 response, and sends that to the client. It updates its own copy to reflect how recently it checked for freshness.
    A third person requests the feed, again within 15 minutes. The cache notices that it’s got a fresher copy, validates it again, and the same thing happens again and again.
    As long as at least one user of this proxy requests the feed in each 15 minute period, no users will ever receive an up-to-date feed.
    You can’t fix this by switching off public caching, as that would undermine your efforts to save bandwidth completely.
    It would be better to set an Expires header for 15 minutes into the future. That way nobody should be requesting feeds more often.
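
For what it’s worth, emitting such an Expires header (plus the equivalent Cache-Control max-age, which modern caches prefer) takes only a few lines; a hedged Python sketch, with `freshness_headers` being a made-up helper name:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def freshness_headers(minutes=15):
    """Headers telling clients and shared caches the feed is fresh for `minutes`."""
    now = datetime.now(timezone.utc)
    return {
        # absolute expiry time, RFC 1123 format
        "Expires": format_datetime(now + timedelta(minutes=minutes), usegmt=True),
        # relative max-age in seconds; takes precedence where supported
        "Cache-Control": "max-age=%d" % (minutes * 60),
    }
```

With these headers, well-behaved clients and proxies won’t even revalidate until the window passes, avoiding the stale-304 loop described above.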

  9. Well that’s not quite the same issue, as Nick’s proposal here isn’t about banning identical IP addresses. It’s the same underlying principle though; the wrong people are being left out because the system can’t deal with shared caches effectively. It’s just the “identifying mark” is the Last-Modified header rather than the IP address.

  10. mmj, I agree with Gywn. If I were shelling out $$$ to pay for bandwidth consumed by aggregators that fail to support essential features like conditional HTTP GET, I’d ban them from retrieving my feeds. I wouldn’t be surprised to see this happen with some high-profile feeds before too long.

  11. Random Intel.

    RSS Scoble discusses RSS traffic. Even on my small site, I’ve noticed an increase in web traffic as a result of news aggregators. But then again, my site is fairly small, so it’s not an issue. Anyways, looking for some information, I stumb…

Comments are closed.