Google Web Authoring Statistics

As part of their work with the WHATWG (Web Hypertext Application Technology Working Group), Google have released the results of an analysis of a billion HTML documents in the wild. It makes interesting reading, and there are some horrors and surprises in there: the widespread use of class names like 'smalltext', 'white' and 'link', for example.

I hope this is the baseline for a longitudinal study that will let us see how the web evolves over time. There's no analysis of doctypes, which would have been useful, and of course, with every element taken in isolation, any generalisations made from such stats are wholly invalid. My suspicion is that the web standards:tag soup ratio is still pretty darned small but that matters are improving, though I can't prove it. Yet.


It always amazes me that Google's main search results pages don't conform to any decent web standard, such as XHTML. It's about time that a big company led by example!

Posted by: Dave at January 26, 2006 10:38 PM
