Sanitizing CSS: 10 Tips for Aggregator Developers

Earlier this week I wrote about sanitizing CSS, and I’ve been thinking about it a bit more. Like many RSS aggregators, for security and presentation reasons the current version of FeedDemon strips all inline styles before displaying a feed, and I thought this was the best approach. But after seeing the Wikipedia feed that Sam Ruby pointed me to, I’m rethinking that.

Just so you know what I’m talking about, take a look at the screenshots linked below. The first shows the Wikipedia feed in FeedDemon with all inline styles removed, while the second shows the same feed with styles intact:

Screenshot 1: styles removed | Screenshot 2: styles intact

As you can see, the feed is far more useful with the styles intact. So rather than blindly strip all inline styles, the next version of FeedDemon will use a "whitelist" of allowed CSS properties and values. FeedDemon’s whitelist will be based on the same rules that Bloglines uses, as outlined in the Sanitization Rules wiki. However, I may make FeedDemon’s whitelist even stricter, since I’m not convinced that it’s wise to enable things like background images and CSS cursors in feed content.

At this point, you might be wondering why RSS aggregators need to bother whitelisting inline styles – why not just leave all the inline styles intact? Beyond the security issues, one problem is that some people will use things like excessively large font sizes to make their posts stand out. Other people will deliberately insert "prank" CSS, like a page full of offensive images designed to ruin the reading experience.

These annoyances aren’t really a problem when the post is viewed by itself or within its feed – after all, if you subscribe to a feed that annoys you, you’ll simply unsubscribe from it. But it’s a different story when it’s combined with posts from other feeds in a "river of news" view, or in a search feed from Technorati or Google. The latter issue is the one that concerns me the most, since theoretically someone could ruin a ton of RSS search feeds by littering their blog with popular keywords, and then injecting some nasty CSS into the blog’s feed.

Luckily for me, I’ve already got a ton of CSS parsing code which I wrote for TopStyle, so it won’t be a big deal to add inline style whitelisting to FeedDemon. But if you’re an aggregator developer who’d also like to whitelist inline styles and you don’t have a background in CSS, you might appreciate a few tips I learned the hard way:

  1. Assuming valid CSS is an invalid assumption. Trust me: just like HTML and RSS, plenty of people use completely invalid CSS. Things like unclosed quotes and declarations without colons can trip up your parser if you assume that inline styles will be correctly written.
  2. Quotes can be escaped. Although rarely used in practice, characters inside CSS values may be escaped with a backslash. This is most commonly used in the box model hack, which relies on escaped quotes to trick outdated browsers into ignoring specific styles (ex: <p style="width:400px;voice-family: "\"}\"";voice-family:inherit;width:300px;">). In other words, your parser can’t assume that quotes always mark the start or end of a value.
  3. Quotes are optional, and single quotes are allowed. Although XHTML requires attribute values to be inside quotes, browsers don’t enforce this requirement. In addition, it’s fine to use single quotes instead of double quotes around values. So make sure your parser handles all three variants (ex: <p style="color:red">, <p style='color:red'> and <p style=color:red>).
  4. Negative values are sometimes allowed. Unlike values for padding properties, values for margin properties can be negative (ex: <p style="margin:-10px">).
  5. Pixels are often the assumed length unit. One of the things I’m doing in FeedDemon is stripping excessively large font sizes (ex: <p style="font-size: 800px">), which requires enforcing a max size based on the length unit. If you plan to do the same thing, keep in mind that when the length unit is missing, browsers may assume that pixels (px) were intended (the CSS spec requires a unit on non-zero lengths, but browsers rendering in quirks mode are forgiving). So <p style="font-size: 12"> may be treated the same as <p style="font-size: 12px">.
  6. Font sizes can get you into trouble. If you "flow" multiple posts in the same newspaper page (as FeedDemon does), you have to be careful that a font size declared in an unclosed tag in one post doesn’t affect subsequent posts. The problem gets worse with relative font sizes (ex: <p style="font-size: smaller">), since improperly nested relative font sizes could result in a tiny single-pixel font size (or a huge font size when "larger" is used).
  7. Floats can also get you into trouble. If your aggregator uses a multi-column newspaper view, be careful that floated elements don’t overlap posts in adjacent columns (ex: <img style="float:right" src="http://nick.typepad.com/images/basil.gif" />). And you might want to consider only permitting images to be floated, to avoid having floating DIVs, etc., causing problems.
  8. Strip class and id attributes. If your newspaper view relies on classes and/or ids to identify items in the page, I recommend removing class and id attributes from the actual posts – otherwise a post could use the same class names that you use in your newspaper, potentially creating all kinds of havoc.
  9. Remove top-level tags. Although they shouldn’t be there, I’ve seen some feeds that contain top-level tags such as BODY and HTML in their posts. Imagine the impact on your river of news if some prankish feed author inserts a styled BODY tag into their feed.
  10. If your aggregator embeds IE, get out of the local zone. This applies more to script than it does to CSS, but it bears repeating: if you’re embedding the WebBrowser object, don’t allow locally displayed content to operate in the local zone. If you’re not sure what I’m talking about, refer to my earlier post on this topic.
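
To make tips 1, 2, 4 and 5 a little more concrete, here is a minimal Python sketch of an inline style whitelist. It is not FeedDemon's code: the property list, the 32px font cap and the unit conversion factors are all assumptions picked for illustration, and a real whitelist (such as the one in the Sanitization Rules wiki) would cover far more properties and values.

```python
import re

# Hypothetical whitelist and limits, chosen only for illustration.
ALLOWED = {"color", "font-size", "font-weight", "margin", "padding", "text-align"}
MAX_FONT_PX = 32
# Rough conversions to pixels (assumes a 16px base font); "" means no unit given.
PX_PER_UNIT = {"px": 1.0, "": 1.0, "pt": 96 / 72, "em": 16.0, "%": 0.16}

LENGTH_RE = re.compile(r"^(-?\d*\.?\d+)(px|pt|em|%|)$")
# Conservative value filter: no quotes, backslashes or parentheses, so
# url(...) and expression(...) never get through.
SAFE_VALUE_RE = re.compile(r"^[a-z0-9#%.,\- ]+$")


def split_declarations(style):
    """Split on ';' while respecting quotes and backslash escapes."""
    parts, buf, quote, i = [], [], None, 0
    while i < len(style):
        ch = style[i]
        if ch == "\\" and i + 1 < len(style):
            buf.append(style[i:i + 2])  # keep the escaped pair together
            i += 2
            continue
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == ";":
            parts.append("".join(buf))
            buf = []
            i += 1
            continue
        buf.append(ch)
        i += 1
    if quote is None:
        parts.append("".join(buf))  # a tail with an unclosed quote is dropped
    return parts


def length_ok(prop, value):
    m = LENGTH_RE.match(value)
    if not m:
        return False  # keywords such as "smaller" are rejected in this sketch
    number, unit = float(m.group(1)), m.group(2)
    if number < 0:
        return prop == "margin"  # negative values only make sense for margins
    if prop == "font-size":
        return number * PX_PER_UNIT[unit] <= MAX_FONT_PX
    return True


def sanitize_style(style):
    kept = []
    for decl in split_declarations(style):
        prop, sep, value = decl.partition(":")
        prop, value = prop.strip().lower(), value.strip().lower()
        if not sep or prop not in ALLOWED:
            continue  # no colon, or a property that is not whitelisted
        if not SAFE_VALUE_RE.match(value):
            continue
        if prop in ("font-size", "margin", "padding"):
            if not all(length_ok(prop, v) for v in value.split()):
                continue
        kept.append(f"{prop}: {value}")
    return "; ".join(kept)
```

The sketch drops anything it doesn't recognize rather than trying to repair it, which seems the safer default for a whitelist. Tip 3 (optional and single quotes around the attribute itself) belongs to the HTML parser that extracts the style attribute, so it isn't handled here.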

In addition to the above tips, the W3C’s rules for handling CSS parsing errors may also be of help.
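
Tips 8 and 9 (stripping class and id attributes, and removing top-level tags) can be sketched with Python's standard html.parser module. This is only an illustration under stated assumptions: the names PostCleaner and clean_post are hypothetical, only HTML and BODY are treated as top-level tags, and the sketch does none of the script or style sanitizing that a real aggregator would also need.

```python
from html import escape
from html.parser import HTMLParser

DROPPED_TAGS = {"html", "body"}   # top-level tags that should not appear in a post
DROPPED_ATTRS = {"class", "id"}   # attributes that could collide with the newspaper's own


class PostCleaner(HTMLParser):
    """Re-serializes a post, omitting the tags and attributes listed above."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in DROPPED_TAGS:
            return
        pieces = []
        for name, value in attrs:
            if name in DROPPED_ATTRS:
                continue
            safe = escape(value or "", quote=True)
            pieces.append(f' {name}="{safe}"')
        self.out.append(f"<{tag}{''.join(pieces)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are written as plain start tags.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in DROPPED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def clean_post(html_text):
    parser = PostCleaner()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, clean_post('<body style="font-size:800px"><p class="item" style="color:red">Hi</p></body>') keeps the paragraph and its style attribute but loses the BODY tag and the class, so the post can no longer restyle the rest of the page or borrow the newspaper's class names.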

Response: On Stripping Styles for Security

Adrian Sutton blogs about the lack of CSS support in RSS aggregators, and concludes:

"There has been a huge push in recent years to move away from the old habits of early HTML and to leverage CSS for presentation – the fact that it doesn’t work in feed readers is a major pain for people trying to do the right thing. It’s good that we identified a security threat and dealt with it quickly – but it’s not acceptable to stop there. We need to work to get the functionality that we used to have back without reintroducing the security risks. It’s not simple, but it is important."

That’s a valid point, and I’m glad Adrian raises it. As the author of both an HTML/CSS editor (TopStyle) and an RSS aggregator (FeedDemon), this is something that I’ve wrestled with quite a bit. On the one hand, I’ve promoted the power of CSS by creating a web authoring tool tailored for building CSS-based sites, yet on the other hand I’m taking that power away by creating an RSS reader that removes CSS from feeds. What gives?

It all started back in 2003, when Mark Pilgrim’s "platypus prank" illustrated how feeds containing CSS could be a problem. Most RSS aggregator developers (myself included) tackled this problem by completely removing all styles from feed content. Since then, I’ve experimented with stripping only "unsafe" CSS from feeds, and despite Adrian’s claim that doing so requires a lot of work, it’s actually quite easy to do (especially for me, since I already have code in TopStyle that could do this, and it would be painless to plunk it into FeedDemon).

The real problem isn’t security, though: it’s presentation (ironically). Leaving styles intact makes sense if you’re reading one post at a time, but it makes less sense in a river of news where posts from multiple feeds flow down the page. The purpose of a river of news isn’t to retain the presentation of any single post, but instead to provide a common presentation for all posts, making it easy to pick out the ones that interest you. If each post had its own style, you could end up with a river of news that looks like a ransom note. Given how some bloggers and MSM outlets will do anything to grab your attention, I’ll wager that outcome is far from unlikely.

Another problem – and this is one that bothers me when I don the TopStyle hat – is that if I followed Bloglines’ approach and permitted a whitelist of inline styles, then feed authors couldn’t use classes defined in an external style sheet. In other words, they’d be forced to resort to using style attributes on individual HTML tags, which kills the maintenance benefit of using CSS in the first place. To me, the best thing about CSS is that it enables storing a site’s presentation in a single file – just change the external style sheet, and that change will be reflected site-wide. This benefit is lost when you use inline styles.

So, perhaps the real question isn’t whether RSS aggregators should support inline styles, but whether they should support external styles as well? Despite my love for CSS, my vote would be no – not because it would be hard to do, but because of the potential impact on the feed-reading experience.

And if only inline styles are supported, which ones make the cut? Personally, I’d want a smaller whitelist than the one Bloglines supports, and I’d also want to make sure that properties such as "float" don’t impact subsequent posts in a river of news view.

Update: Sam Ruby points out that there’s a Sanitization Rules wiki devoted to this topic.

Is Steve Jobs Giving Up on the Desktop?

Apparently a few people were surprised when I blogged about being disappointed when I heard that third-party developers wouldn’t be able to build native iPhone apps.  After all, I’m a Windows desktop developer, so why would I care about the iPhone?

The truth is, even though I’m a desktop developer, I think the future of computing is mobile.  Before long, mobile devices will provide the ability to carry around the equivalent of the Hitchhiker’s Guide to the Galaxy.  You’ll be able to find out whatever you want to know regardless of where you are.  That’s insanely powerful.

But so far I’ve hated every mobile device I’ve seen.  They’re too clunky, too geeky, and generally just too user-hostile.  I’ve stuck with a woefully outdated, underpowered cell phone for years because of this.

So I got pretty excited when I saw the iPhone.  Finally a mobile device which recognized the importance of the user experience.  This was something I could develop for.

Then Apple announced that native iPhone apps wouldn’t be possible.  Why not?  According to Steve Jobs:

“You don’t want your phone to be an open platform.  You need it to work when you need it to work. Cingular doesn’t want to see their West Coast network go down because some application messed up.”

Wow.  Is it just me, or does that sound like Steve Jobs is willing to give up on the desktop because it can’t be secured?  After all, the iPhone runs OS X, and Jobs basically said that Apple can’t secure their OS enough to trust third party developers to write native apps for it.  So we’re supposed to believe that they can secure their browser enough to run web apps on the iPhone?

Sure, there’s a lot of power in combining a great mobile device with a great web app like Google Maps, but even successful web developers realize the importance of native apps.  For the iPhone to really take off outside of the geekosphere, it has to be able to access data that’s not on the web, and it has to provide a seamless user experience. 

For that to happen, Apple needs to open up the iPhone to outside developers.

Simplicity Ain’t So Simple, Part VI: Simple = Secure

If you want to create software that’s used by a lot of people, you already know you’ve got to make it simple. But if you’re designing a desktop application which connects to the Internet, you’ve also got to make it secure.

To some that may seem obvious, but plenty of developers – including myself – have been blindsided by security holes that we should’ve seen a mile away. All too often we look only at the positive side of the technology we’re involved with, failing to see that if we get lucky and our software gains mainstream acceptance, it then becomes a possible point of attack.

Somehow you’ve got to keep your customers safe without making them feel limited by your security restrictions, and unless your target audience is security experts, you’ve got to do it in a way that doesn’t require a special “security” page in your options dialog that’s stuffed with cryptic settings.

Bottom line, if you want to appeal to a mainstream audience, you’ve got to make your software simple and secure, which is extremely hard to do. How you go about doing that depends on the type of software you’re working on, but there are some basic things you can do:

  • Don’t think of security as a “feature” that you can add later: think about it as part of the foundation of your software. Adding security later may complicate your software, as well as require changing its behavior in ways existing users won’t like.
  • Don’t require customers to do anything extra to make your product secure. If you need to add any security-related settings, make them default to the most secure values possible – and then think about whether you can drop those settings entirely (after all, do you really want to enable customers to make your software less secure?).
  • If you have to warn customers when a specific action has a security risk, make sure your warning dialog uses clear, ungeeky language. You can’t expect customers to make an educated decision if they have to decipher your jargon.
  • Take off your white hat and wear a black one for a while. If you were tasked with exploiting your application, how would you do it? And I don’t mean this in an “old school” hack-the-EXE way: I mean how could you exploit existing features to make customers susceptible to phishing, XSS attacks, etc.
  • Always, always, always think about how spam could affect the technology that your software relies upon. Anything that requires showing end users information from the Internet is susceptible to spam-like behavior, even if nobody is doing it yet.
  • If your software connects to the Internet, don’t read this list and think it doesn’t apply to you. It does.

Making your software secure is your job, not your customers’. So don’t let securing your software complicate their lives – make it simple for them (and yourself) by baking security into your application from day one, and do it in a way that doesn’t require them to keep a technical manual by their side.

Why do firewalls have to be such a PITA?

I’m in a ranting mood today, so it’s the perfect time for me to complain about the state of firewalls.  Specifically, about how they’re an incredible pain for desktop developers and support technicians to deal with.

Here’s the deal: every single time a new version of FeedDemon is released, we get complaints that it no longer connects to the Internet.  And every single time the culprit has been a firewall which silently blocks the new version.  Now, I can certainly understand why a firewall would warn the user that an executable has changed – it should do that – but I fail to understand why it would block a changed application without informing the user.  As far as the end user is concerned, the application just doesn’t work.

Even worse is that some software firewalls continue to block applications even after they’ve been disabled.  So savvy end users who disable their firewall in an attempt to determine whether it’s blocking an application are led to believe that the firewall isn’t the problem, so it must be the application’s fault.  And unbelievably, we’ve even seen ZoneAlarm continue to block applications despite the fact that it has been uninstalled (figure that one out, folks)!

This is so clearly insane that I have to think it’s on purpose, like it’s part of a vast Web 2.0 conspiracy to get people to stop using desktop applications by making them impossible to support.

OK, so maybe that’s a stretch, but visit the support forum of any desktop application that connects to the Internet, and I’ll bet you’ll find people complaining that they upgraded to the newest version of the application and now it won’t connect.  This situation is wasting countless hours for end users, programmers and support staff alike.

Surely firewall developers can do better than this?

Kudos to Microsoft

I’ll echo Sam Ruby’s comments:

“When the IE team screws up, it makes front page news everywhere. If life were fair, items like this one would get equal coverage.”

Microsoft takes a lot of crap about security, but they deserve credit for putting security first in IE7’s RSS implementation. I also want to thank Microsoft’s Sean Lyndersay for reaching out to aggregator developers to help make our products more secure.

And while I’m at it, I’ll also give credit to Microsoft for figuring out why Marc Orchant kept having trouble with FeedDemon and IE7. I was involved in a lengthy email thread between Marc and members of Microsoft’s IE team, and it was impressive to witness Microsoft’s debugging efforts. (BTW, it turned out that the problem was caused by having an older version of Google Desktop installed.)

Feed Security and FeedDemon, Part III

Last month I promised to talk about the exploits that James Snell uncovered which left feed readers vulnerable to some very annoying script-based attacks. I didn’t want to provide details of the exploits until other feed readers had patched them, but now that James has published his test suites, I figure it’s time to open the kimono. But before I go any further, I should first make two things clear:

  1. Every one of the vulnerabilities was fixed in FeedDemon 2.0.0.25.
  2. FeedDemon’s newspapers operate in the Internet Zone rather than the local zone, so any script that makes it into a feed would not be able to access your local machine (ie: your data).

Now that that’s out of the way, some details…

Like most feed readers, FeedDemon has always stripped potentially unsafe content from feeds, but James found several ways to get around this and was able to trick FeedDemon (and other feed readers) into displaying popups, toggling read status, and performing other annoying actions. The big “gotchas” that bit me were:

  • I failed to handle whitespace in the word “javascript:” when used inside HTML attributes. Ex: javas cript:window.alert('gotcha')
  • I failed to handle HTML entities that aren’t terminated with semicolons. Ex: &#106&#x61&#118&#x61&#115c
  • I failed to cleanse the xml:base attribute in Atom feeds. Ex: xml:base="javascript:window.alert('gotcha')"
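
The first two gotchas come down to normalizing a URL before checking its scheme. Here is a minimal Python sketch of that idea, not the actual FeedDemon fix: the function name is_safe_url and the list of allowed schemes are assumptions made for illustration. It relies on the standard library's html.unescape, which decodes numeric entities even when the trailing semicolon is missing.

```python
import html
import re

# Hypothetical list of schemes an aggregator might allow in links and images.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

SCHEME_RE = re.compile(r"([a-z][a-z0-9+.\-]*):")


def is_safe_url(value):
    """Return True if the URL is relative or uses a whitelisted scheme."""
    # Decode entities first, including ones without a closing semicolon.
    decoded = html.unescape(value)
    # Remove whitespace and control characters anywhere in the value, since
    # browsers may ignore them inside the scheme ("javas cript:").
    # This compacted copy is used only for the check, never for output.
    compact = re.sub(r"[\x00-\x20]+", "", decoded).lower()
    m = SCHEME_RE.match(compact)
    if not m:
        return True  # no scheme at all, so treat it as a relative URL
    return m.group(1) in ALLOWED_SCHEMES
```

Because relative URLs are resolved against xml:base, the third gotcha suggests running the same check on the xml:base value itself; otherwise a harmless-looking relative link could inherit an unsafe base.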

At first glance it may seem that people would simply unsubscribe from any feed that contains an annoying exploit, but the risk goes beyond that. For example, an attacker could hijack a popular site (yes, it is possible) and inject malicious script into that site’s feed. That way everyone subscribed to that feed would be exposed to the attack (ruining the site’s reputation in the process).

The current crop of feed readers (including FeedDemon) protects against this with ultra-aggressive parsing and cleansing of feed content, but that’s a never-ending battle. People will always find holes and exploit them, forcing aggregator developers to put out a steady stream of patches. I don’t think any aggregator developer looks forward to that future, so let’s come up with a better solution.

The real problem for desktop feed readers is that most of them can’t simply disable script since they rely on JavaScript to interact with the browser (for example, FeedDemon’s newspapers use JavaScript to change an item’s appearance, among other things). However, it is possible to access the DOM through your application code directly, which means there is a way to block all script-based attacks without limiting the feed reader from interacting with the browser.

So, if you’re developing a Windows-based feed reader which uses Internet Explorer’s WebBrowser object, here’s what I recommend:

  • Enable the local zone lockdown feature to prevent script from accessing the local machine.
  • Use the DLCTL_NO_SCRIPTS flag to completely disable scripting when viewing a page generated from your application. You can intercept the BeforeNavigate2 event to re-enable scripting before navigating to an external page.
  • Access the DOM through application code via the various IHTMLDocument interfaces rather than through JavaScript (since script will no longer work after taking the previous step).

I’ve talked with a couple of aggregator developers about this approach, and they agree that it should work (and testing here shows that it does). You can expect FeedDemon to follow this approach soon.

As an aside, it may seem odd that I’d help my competitors by sharing this information, but I figure security flaws are an industry-wide problem, and not something that each developer should tackle alone. If we’re going to prevent RSS from becoming as annoying as email, we need to work together on this.

I Missed my Calling

Yesterday I was talking with Brent Simmons and Brian Kellner about feed security, and how you really have to think like a hacker to find vulnerabilities in your software. That reminded me of my own brief experience as a software cracker, which I told them about.

See, back in the early 1990s I had a short consulting stint with a large financial institution, working on the desktop piece of a client-server application that transferred millions of dollars over the wire. I was concerned that our login dialog might be vulnerable to password-sniffing, and when I raised this issue with my Program Manager, he tasked me with thinking of ways this could be accomplished. So I made it my calling to figure out how to get the PM’s username and password in a way that wouldn’t require physical access to his computer.

Much to my surprise, it took very little time. Here’s what I did:

I wrote a small program in Visual Basic which sat in the background waiting for a window with the same title as our login dialog to appear (this was back in the Windows 3.1 days, when it was simple to do things like that). After the login dialog was detected, I’d start monitoring the keyboard and record any keystrokes that were entered into the username and password controls. When the dialog was OK’ed, my app would write the user’s login to a text file stored on a network share.

Next I had to find a way to get my app onto the PM’s system without him knowing it. I figured out that if I gave my program the same icon as MS Word, it would look like a Word document when it was attached to an MS Mail message (this was before email clients started blocking EXEs). So I modified the program to load a document into MS Word when it was executed – that way, when the “victim” double-clicked the attachment icon, it would act just like he’d double-clicked a Word document.

When I was confident that my little program worked, I emailed it to the PM. Later that day I checked the network share for the text file containing his login, and sure enough it was there. Suffice to say, his eyes got bigger than should be humanly possible when I showed him his username and password.

I have to admit, I was pretty pleased with my cracking skills when I pulled that off. And after Brent heard this story, he said that I missed my calling :)

Side note: after discovering how easy it was to sniff passwords with a simple VB program, I emailed one of the editors of the Visual Basic Programmer’s Journal, which was the leading magazine for VB developers back then. He got in touch with someone at Microsoft about this, and they told him something along the lines of “yeah, that’s a known problem with Windows 3.1.” Oh, and the name of the editor I contacted was Robert Scoble, who later joined Microsoft and became their most famous blogger. I wonder if he remembers my email?

Feed Security and FeedDemon, Part II

In my previous post I wrote about FeedDemon’s security features, the most important of which is the fact that FeedDemon’s newspapers operate in Internet Explorer’s “Internet Zone” instead of the less secure local zone. This means that even if someone finds a way to trick FeedDemon into running script, it can’t access the local zone (so it can’t touch your hard drive, for example).

It’s a good thing that FeedDemon has this feature, because while I was on vacation, Sam Ruby and James Snell talked about ways to get feed readers to run script – some of which FeedDemon is vulnerable to.

I want to stress that none of these vulnerabilities compromise your local machine, but as James Snell discusses in a subsequent blog post, the fact that script can be run inside FeedDemon is still a problem, and it’s one I take very seriously. If nothing else, these vulnerabilities could be very annoying if exploited. For example, if someone hacked a popular feed so that it contained an exploit which forced a JavaScript popup to appear to all subscribers, there would be a lot of unhappy feed consumers out there.

I also want to add that every feed reader I tried is vulnerable to the same exploits, but I realize that’s no excuse for my own code and it’s small comfort to FeedDemon users.

I’ve spent the past week fixing these flaws, and James Snell has kindly tested a private FeedDemon build and found that every vulnerability has been addressed. We plan to release this new build (v2.0.0.25) as soon as we’ve completed testing it (which may take a few days).

In the future I plan to write about how the specific vulnerabilities were resolved, but I don’t want to do that until I’m sure that other feed readers have patched them. In the meantime, if you’re the known author of a feed reader and would like details on the solutions, please feel free to contact me – I’d be happy to share the logic behind the fixes.

As a side note, I’d like to thank those who let us know about the problems before making them public. This was a responsible way to get the vulnerabilities fixed without putting customers at risk, and we appreciate it.