Between the Devil and the Deep Blue Sea
[This piece was written for Public Sector Forums and is cross-posted here to allow comments from those who don't have access to that site.]
Or SOCITM and SiteMorse vs. the ODPM and Site Confidence...
There's a pithy saying, much loved by researchers and statisticians, that goes something like this:
Be sure to measure what you value, because you will surely come to value what you measure.
Sage advice, which you would be wise to follow whatever business you're in. In reality, a greater danger comes from the likelihood that others will come to value what you measure, or worse still, what others measure about you.
We're all familiar with the arguments for and against automated web testing. It should form an integral part of any web team's quality assurance policy, and can save enormous amounts of time pinpointing problems buried deep in your site. By itself an automated testing tool can be a valuable aid in improving the quality of your website. But when automated tests are used to compare websites, the problems start to come thick and fast. The recent disparity between the 'performance' tests from SiteMorse and Site Confidence is a case in point.
Who can you trust? SiteMorse will tell you that their tests are a valid measure of a site's performance. Site Confidence will tell you the same. Yet, as previously reported on PSF, the results from each vary wildly. SOCITM have offered this explanation for the variation:
"The reality is that both the SiteMorse and Site Confidence products test download speed in different ways and to a different depth. Neither is right or wrong, just different."
And therein lies the real problem. If both are valid tests of site performance then neither is of any value without knowing precisely what is being tested, and how those tests are being conducted. The difficulty is that no-one is in a position to make a judgement about the validity of the tests, because no-one outside of the two companies knows the detail.
It's worryingly easy to pick holes in automated tests. Site Confidence publishes a 'UK 100' benchmark table on its website, and at the time of writing it has Next On-Line Shopping sitting proudly at number 1, with an average download speed of 3.30 sec for a page weighing 15.33kb. The problem is that the Next homepage is actually over 56kb. At number 5 is Thomas Cook, with a reported page size of 24.92kb, when in fact it's a whopping 172kb. Where does the problem lie in this case? Are the sites serving something different to the Site Confidence tool? Is the tool missing some elements, perhaps those referenced within style sheets, or those from different domains? The real problem is that we can't tell from the information provided, and the same holds true for SiteMorse league tables.
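To see how two tools could report very different weights for the same page, here is a minimal Python sketch. It is not how Site Confidence or SiteMorse measure anything (that detail isn't public, which is the point). The resource list, the byte sizes and the `page_weight` function are all hypothetical, invented purely for illustration.

```python
from urllib.parse import urlparse

# Hypothetical resources for one page: (URL, size in bytes, where it was referenced).
# Every figure here is invented for illustration only.
RESOURCES = [
    ("http://www.example.com/", 15_000, "html"),
    ("http://www.example.com/logo.gif", 8_000, "html"),
    ("http://www.example.com/bg.png", 20_000, "css"),
    ("http://cdn.example.net/banner.jpg", 30_000, "html"),
]


def page_weight(resources, page_host, follow_css=True, follow_other_hosts=True):
    """Sum resource sizes, optionally skipping CSS-referenced or off-host files."""
    total = 0
    for url, size, found_in in resources:
        if found_in == "css" and not follow_css:
            continue  # this "tool" ignores files referenced from style sheets
        if urlparse(url).hostname != page_host and not follow_other_hosts:
            continue  # this "tool" ignores files served from other domains
        total += size
    return total


full = page_weight(RESOURCES, "www.example.com")
narrow = page_weight(
    RESOURCES, "www.example.com", follow_css=False, follow_other_hosts=False
)
print(full, narrow)  # prints: 73000 23000
```

Both totals are "correct" for the rules applied, yet one is more than three times the other. Without knowing which rules a tool follows, the published figure tells us very little.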
A few associates and I have been in correspondence with SOCITM for some months now about the use of automated tests for Better Connected. To date the responses from SOCITM have not completely alleviated our concerns. While some issues have been addressed by SiteMorse, many remain unanswered, and perhaps the greater concern is the attitude of SOCITM. For example, when pressed on why SOCITM hadn't sought a third party view of SiteMorse's testing methods, the response was:
You wonder why we have not done an independent audit of the SM tests. To date when detailed points have been raised, SM has found the reason and a satisfactory explanation, almost always some misunderstanding of the standard, or some problem caused by the CMS or by the ISP. In other words, there has been little point in mounting what would be an expensive exercise. You may, of course, not be satisfied with the explanations in the attached document to this set of detailed points.
I'll leave you to draw your own conclusions from that response, other than to say that I wasn't the slightest bit comforted by it.
Our concerns extend beyond Better Connected to the publication of web league tables in general. The fact is that we know very little about how SiteMorse conduct their tests, or what they are actually measuring. In some cases SiteMorse, or any testing company, will have to assert their interpretation of guidelines and recommendations to test against them, and have to make assumptions about what effect a particular problem might have on a user. For example, SiteMorse will report an error against WCAG guideline 1.1 if the alt attribute of an image contains a filename, despite there being legitimate circumstances where such an alt attribute might be required. The fact is there are only two WCAG guidelines which can be wholly tested by automated tools.
While SOCITM make no use of the accessibility tests from SiteMorse, there are similar concerns about performance tests based on no recognised standard, or which have no impact on users. For example, SiteMorse raises a warning for title elements with a length of more than 128 characters, citing the 1992 W3C Style Guide for Online Hypertext as the source of the guidance. This guide is at best a good read for those with an interest in the history of the web, but for SiteMorse to use it as the basis for testing sites over a decade later is highly questionable. To quote from the first paragraph of the guide:
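The alt attribute example shows how much interpretation hides inside a simple-sounding rule. The Python sketch below is my own guess at what a "filename in alt text" check could look like; it is not SiteMorse's code, and the regular expression, the `AltFilenameChecker` class and the sample markup are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

# Assumed heuristic: alt text that is a single token ending in a common
# image extension "looks like" a filename. A real tool may use other rules.
FILENAME_RE = re.compile(r"^\S+\.(gif|jpe?g|png|bmp)$", re.IGNORECASE)


class AltFilenameChecker(HTMLParser):
    """Collects the alt text of every img element that looks like a filename."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        alt = dict(attrs).get("alt")
        if alt and FILENAME_RE.match(alt.strip()):
            self.flagged.append(alt)


def filename_alts(html):
    checker = AltFilenameChecker()
    checker.feed(html)
    checker.close()
    return checker.flagged


sample = (
    '<img src="logo.gif" alt="Council logo">'
    '<img src="photo1.jpg" alt="photo1.jpg">'
    '<a href="report.png"><img src="thumb.png" alt="report.png"></a>'
)
print(filename_alts(sample))  # prints: ['photo1.jpg', 'report.png']
```

The check flags both the second and third images. The second is probably a lazy default, but the third is a thumbnail linking to a file of that name, where the filename may arguably be the most useful text. The pattern match cannot tell the two apart; only a person looking at the page can.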
It has not been updated to discuss recent developments in HTML., and is out of date in many places, except for the addition of a few new pages, with given dates.
SiteMorse justifies the use of this test in league tables by saying that many browsers truncate the title in the title bar. But this ignores the fact that the title element is used for more than just title bar presentation (for example for search engine indexing), and that the truncation can depend on the size of the browser window (at 800x600 on my PC, using Firefox, the title is truncated at 101 characters, for example). While it may be useful as a warning to a web developer, who can then review the title for the use of the clearest possible language, it certainly should not be used as an indicator in the compilation of league tables.
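A title length check of this kind is trivial to write, which is perhaps part of its appeal. The following Python sketch is an assumption about how such a warning might be produced, not SiteMorse's implementation; only the 128 character threshold comes from the test described above, and the `TitleExtractor` class and `title_warning` function are hypothetical names.

```python
from html.parser import HTMLParser

TITLE_LIMIT = 128  # the threshold SiteMorse is reported to use


class TitleExtractor(HTMLParser):
    """Collects the text inside the title element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def title_warning(html, limit=TITLE_LIMIT):
    """Return a warning string if the title exceeds the limit, else None."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join(parser.title.split())  # collapse runs of whitespace
    if len(title) > limit:
        return f"warning: title is {len(title)} characters (limit {limit})"
    return None


print(title_warning("<title>" + "x" * 129 + "</title>"))
# prints: warning: title is 129 characters (limit 128)
```

Note what the check does not know: how wide the browser window is, where truncation actually happens, or whether the title reads well. It returns a number, and the judgement about whether that number matters is left to whoever compiles the table.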
From our correspondence with SOCITM it became clear very quickly that SOCITM don't know much about how SiteMorse tests either. As evidenced above, there has been blind acceptance of the explanations given by the company, and no independent expert view has been sought.
In most other arenas league tables are based on clear and transparent criteria. Football, exam results, Olympic medals - all rely on known, verifiable facts. Unfortunately the same cannot be said of the current LA site league tables.
Our main assertion is that SOCITM should be working with local authorities and UK e-standards bodies (if there are any left) to produce a specification for the testing of websites using meaningful, independently assessed measures which are based on consensus, rather than blindly accepting the existing, opaque tests offered by SiteMorse, Site Confidence or any other private concern. There needs to be public discussion about precisely what we should be measuring, how those measures are conducted and what conclusions it would be valid to draw from the results.
In the end it all comes down to a question of credibility - for Better Connected, SOCITM, the testing companies, and most importantly those of us who are responsible for local authority websites. It's likely that league tables are here to stay, but unless we are prepared to question the numbers behind the tables, and the way those numbers are produced, we're probably getting what we deserve.