First published: 30th June 2008
McAfee's second annual report on malicious websites worldwide, "Mapping the Mal Web, Revisited", finds that the .hk top-level domain (TLD) has become the "most dangerous" place to surf the web, jumping from 28th place last year. Predictably, this result has attracted some attention in the local press, and anger among legitimate .hk webmasters who feel their sites are being unjustly maligned. What is the reality behind the report?
Obviously, and this is mentioned explicitly in the 2007 report, .hk does not equal Hong Kong:
Individual domains can be owned by persons from any nationality. For example, .com's are registered to people of almost every nationality. This data should not be used to infer riskiness of nationality.
Many of Hong Kong's best-known, and perhaps highest-traffic, websites, such as netvigator.com and scmp.com, are not in the .hk TLD.
Unfortunately, the published report omits some key information that makes it difficult to understand what is going on:
- What is being measured? The report is based on "9.9 million site reports" of "the most trafficked web sites", accounting for over 95% of web traffic. However, the precise meanings of these terms are not explained, and the details could introduce subtle biases. Is that 95% of page views, or of bytes? How was it measured? If it was measured by software installed on users' computers, like the Alexa toolbar, then the sample population is "people who do not mind being monitored", and the monitoring software might be more heavily installed by some language groups than others, particularly if it is only available in a few languages. This would bias the choice of "popular" sites for a TLD: the sites that are really most popular will probably be in the local language, but the sites most visited by outsiders are likely to be the dodgy, spamvertised sites. On the other hand, if traffic was measured directly, where were the monitoring stations established? All methods of measurement introduce some bias, so why has McAfee not explained its methods so that we can understand the potential bias ourselves?
- How many sites were measured in each TLD? We know 9.9 million sites were measured in total, and the report states that the rankings were restricted to TLDs with at least 2000 tested sites, but exactly how many sites were tested in each TLD? Two thousand is not a large sample, which suggests that, for the "smaller" TLDs, the ranking could easily be skewed by a small number of dodgy sites. The "change in risk" statistic shows a large positive spike for .hk and a large negative spike for .tk, last year's "most risky" TLD. This is a classic warning sign that the statistic reflects random variation in a small sample rather than an underlying trend in the data.
- When was the data collected? The report does not specify whether the data was collected over the whole year preceding its release, or a limited testing period, or some other time period. The Web is not a static entity, and the exact time period could make a huge difference to the results, particularly in the light of events in Hong Kong over the last year described below.
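The sample-size concern in the second point can be made concrete with a rough sketch. It assumes, purely for illustration, that each tested site is independently risky with the same probability; the 4% rate used here is a hypothetical figure, not one taken from McAfee's report, and the function names are invented for this example.

```python
import math
import random


def standard_error(p, n):
    """Standard error of an observed proportion over n independent sites."""
    return math.sqrt(p * (1 - p) / n)


def simulate_observed_rates(p, n, trials, seed=0):
    """Draw `trials` samples of n sites each; return the observed risky fractions."""
    rng = random.Random(seed)
    rates = []
    for _ in range(trials):
        # Count how many of the n sampled sites turn out risky.
        risky = sum(1 for _ in range(n) if rng.random() < p)
        rates.append(risky / n)
    return rates


if __name__ == "__main__":
    TRUE_RATE = 0.04  # hypothetical underlying rate, NOT a figure from the report

    # Compare the 2000-site cut-off with a sample one hundred times larger.
    for n in (2_000, 200_000):
        print(f"n={n}: standard error ~ {standard_error(TRUE_RATE, n):.4%}")

    # Repeated 2000-site samples from the same underlying rate.
    rates = simulate_observed_rates(TRUE_RATE, 2_000, trials=100)
    print(f"observed range: {min(rates):.2%} to {max(rates):.2%}")
```

Under these assumptions, a TLD sampled at the 2000-site cut-off has ten times the standard error of one sampled at 200,000 sites, so its measured risk can move noticeably from year to year with no real change. Real dodgy sites are unlikely to be independent, since one spammer may register thousands of domains at once, which would probably make the swings larger than this sketch suggests.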
In addition, in its discussion the report cites reports by Sophos and Sunbelt as confirming the dramatic increase in the risk of .hk during the last year. This is a gross misrepresentation of those reports. First, the Sophos report was published in January 2007 and related to data from 2006, covering some of the same period as McAfee's March 2007 report, so, if anything, it merely confirms that Hong Kong was "dangerous" before the sudden rise claimed by McAfee. Second, the Sophos report (which is about spam relay locations) aggregates Hong Kong with China, so little or nothing can be inferred about the specific situation in Hong Kong. The Sunbelt report related to one specific case of one .hk domain, endeny.hk, that was used by the Storm worm. While that was a significant case, it does not reflect the general riskiness of the TLD; indeed, it is an example of an incident that could skew the statistics for a small TLD. The domain no longer exists and is available for registration with HKDNR.
A significant event during the last year that bears on this report is the delisting of over 8000 .hk domain names by HKDNR, as previously reported in this newsletter and at the AVAR Conference. The delisting was a result of cooperation between OFTA and HKDNR on combating spamvertised domains reported to OFTA following the introduction of the Unsolicited Electronic Messages Ordinance (UEMO). This event links directly to the three significant omissions in McAfee's report listed above. First, the sites were heavily spamvertised to victims outside Hong Kong, and could therefore be over-represented in geographically biased traffic statistics. Second, the number of delisted sites was over 8000, far more than the 2000-site cut-off for TLDs to be ranked, so their inclusion or exclusion would have a large effect on the results. Third, the sites were delisted around June to September 2007, so McAfee's data collection dates are highly significant for .hk in particular.
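To see how much the inclusion or exclusion of a block of delisted sites could matter, consider a small worked example. McAfee did not publish the number of .hk sites it tested, so every count below is hypothetical and chosen only to show the arithmetic; the function names are likewise invented for this sketch.

```python
def risky_share(tested, risky):
    """Fraction of tested sites that were flagged as risky."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return risky / tested


def share_with_bulk(other_tested, other_risky, bulk_risky):
    """Risky share once a block of `bulk_risky` flagged sites joins the sample."""
    return risky_share(other_tested + bulk_risky, other_risky + bulk_risky)


if __name__ == "__main__":
    # Hypothetical sample of ordinary .hk sites: NOT figures from the report.
    OTHER_TESTED = 20_000
    OTHER_RISKY = 400  # a 2% baseline, assumed for illustration

    # How many of the spamvertised domains happened to be in the sample?
    for bulk in (0, 2_000, 8_000):
        share = share_with_bulk(OTHER_TESTED, OTHER_RISKY, bulk)
        print(f"{bulk} delisted sites in sample -> risky share {share:.1%}")
```

With these made-up numbers, the measured risk runs from 2% if none of the spamvertised domains were sampled to 30% if 8000 of them were, which is why the collection dates and sample sizes matter so much for .hk.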
What is the final conclusion? Without the missing methodology details, McAfee's report is questionable and almost useless: the biases cannot be understood, and the web changes quickly. It is doubtful that choosing sites by their TLD will significantly alter the riskiness of your surfing. Users who want to make their surfing safer should stop following links in dodgy, unsolicited emails.
OFTA and HKDNR should be praised for their actions in shutting down many spamvertised domains, but it should also be remembered that HKDNR's efforts to increase the number of .hk registrations attracted the spammers and malware distributors in the first place. The .hk TLD is a valuable resource for Hong Kong, and HKDNR should remember that it needs to protect that value for all of us. On a related point, a puzzling omission from the factors constituting a "Hong Kong Link" for the purposes of the UEMO was the involvement of a .hk domain name. Although the idea was proposed during the consultation period, it was left out of the Bill because .hk domain names could be registered by non-Hong Kong entities. It is clear that, if an entity chooses to register a .hk domain, it is claiming an association with Hong Kong, so it is entirely reasonable to require compliance with Hong Kong laws. Amending the UEMO to make a .hk domain constitute a Hong Kong Link would put the delisting of abused domains on a stronger legal footing.