This weekend much of the geekosphere was buzzing about the “Web 3.0” article in the NY Times, but from where I stand, Web 3.0 does not validate.
Apparently, Web 3.0 is the latest re-branding of the Semantic Web, an attempt to turn the Web of documents into a Web of data. Don’t get me wrong – the goals of the Semantic Web are good ones, and I believe many of those goals will be met in my lifetime. But too much of the Semantic Web relies on data being valid – that is, valid XML, XHTML, RDF, etc. – and too many of us will never publish valid data.
Unless the world comes up with a way to punish those who publish invalid data, invalid data will always exist. Yeah, companies like Google could be the punishers by refusing to index data that isn’t valid, but what are the chances of that happening? Google’s Web search is successful in part because it makes sense of the chaos of the invalid Web. Why mess with that formula?
If the Semantic Web hopes to exist, it’s going to have to deal with invalid HTML, malformed XML, and RSS with ambiguous entity escaping. It’s also going to have to filter out every new variation of spam, and be smart enough to know when people lie.
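To make the point concrete, here is a rough sketch in Python of the gap between a strict consumer and a forgiving one. The `MESSY` fragment and the helper names (`strict_parse`, `TextCollector`, `lenient_text`) are made up for illustration; only the standard-library parsers are real. It is not how any search engine actually works, just the general idea of recovering what you can instead of rejecting the whole document.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical feed fragment: a bare ampersand and an unclosed <b> tag.
MESSY = "<item><title>Fish & Chips</title><b>unclosed</item>"


def strict_parse(text):
    """Return the parsed tree, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


class TextCollector(HTMLParser):
    """Lenient parser: collects whatever text it finds and tolerates tag errors."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty runs of text.
        if data.strip():
            self.chunks.append(data.strip())


def lenient_text(text):
    """Return the text content recovered from possibly broken markup."""
    parser = TextCollector()
    parser.feed(text)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    print("strict:", strict_parse(MESSY))
    print("lenient:", lenient_text(MESSY))
```

The strict parser gives up on the whole fragment, while the lenient one still pulls out the title and the stray text. Anything that wants to consume the real Web is going to look a lot more like the second one.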
The Semantic Web may happen, but if it does, it’s going to be a helluva lot messier than the architects would like.