Google Web Authoring Statistics
As part of their work with the WHATWG (Web Hypertext Application Technology Working Group), Google have released the results of an analysis of a billion HTML documents in the wild. It makes interesting reading, and there are some horrors and surprises in there: the widespread use of class names like 'smalltext', 'white' and 'link', for example.
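To give a feel for what a class-name tally involves at toy scale, here is a minimal sketch using Python's standard `html.parser` module. It is not Google's method, which isn't described here; `ClassNameCounter` and `count_class_names` are hypothetical names made up for illustration, and the sample documents are invented.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassNameCounter(HTMLParser):
    """Tally class names seen on start tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                self.counts.update(value.split())


def count_class_names(documents):
    """Return a Counter of class names across an iterable of HTML strings."""
    counts = Counter()
    for html in documents:
        parser = ClassNameCounter()
        parser.feed(html)
        parser.close()
        counts.update(parser.counts)
    return counts


if __name__ == "__main__":
    # Invented sample documents, not data from the study.
    docs = [
        '<p class="smalltext white">hi</p>',
        '<a class="link smalltext" href="#">x</a>',
    ]
    for name, n in count_class_names(docs).most_common():
        print(name, n)
```

Like the published stats, this counts each element in isolation, so it says nothing about how class names combine within a page.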
I hope this is the baseline for a longitudinal study which will let us see how the web is evolving over time. There's no analysis of doctypes, which would have been useful, and of course with every element taken in isolation, any generalisations made from such stats are wholly invalid. My suspicion is that the web standards:tag soup ratio is still pretty darned small, though matters are improving. I can't prove it. Yet.