February 29, 2024
Health scam websites: identifying related domains using clustering techniques

Introduction

A recent study reported an emerging trend in which large numbers of cheap domain registrations are used to promote bogus health products such as ‘keto’-related dietary supplements. Frequently, the referring sites spoof the appearance of popular news websites to build credibility, whilst actually presenting fake news articles or featuring false endorsements. In some related cases, the sites may instead direct users to legitimate product pages, and are intended to generate click-through revenue via affiliate schemes[1].

It is often the case that these scams make use of new-gTLDs (generic top-level domains) such as .sbs and .cloud, as domains on these extensions can be purchased at very low cost (typically around $1). In one reported cluster, the scam involved the registration of very large numbers of randomly-generated domain names between around March and June 2023. The domain names all began with ‘keto’, followed by a string of (typically six or seven) random alphabetical characters, followed by three random digits[2].
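The reported name structure can be expressed as a simple pattern check. The following is a minimal sketch rather than the method used in the study referenced above; the function names and the default letter-count range (six or seven, taken from the 'typically' figure quoted above) are assumptions made for illustration.

```python
import re


def keto_pattern(min_letters: int = 6, max_letters: int = 7) -> re.Pattern:
    """Build a regex for 'keto' + N letters + three digits.

    The letter-count bounds are illustrative assumptions and can be widened.
    """
    return re.compile(
        rf"keto[a-z]{{{min_letters},{max_letters}}}[0-9]{{3}}"
    )


REPORTED = keto_pattern()


def matches_reported_pattern(sld: str, pattern: re.Pattern = REPORTED) -> bool:
    """Return True if the whole SLD (lower-cased) fits the pattern."""
    return pattern.fullmatch(sld.lower()) is not None
```

For example, matches_reported_pattern("ketoekezat333") is true, whereas a name such as "ketodiet" is rejected because it lacks the trailing digits. Passing keto_pattern(4, 8) widens the check to names of 11 to 15 characters in total.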

With this trend in mind, we investigate the use of domain ‘clustering’ analysis techniques to identify examples of the domains in question – these types of approach can potentially be used in ‘real time’ to alert brand owners of new registrations relating to similar scams as they arise.

 

Analysis

For the initial analysis, we consider the full set of .cloud domain names, based on zone-file data from ICANN’s CZDS service (as of 12-Jan-2024). Currently there are around 364,000 domains registered on this TLD. Of the .cloud domains, there are over 3,400 with names beginning ‘keto’. Only around 1% of these resolve to any live website content as of the time of analysis (between 18 and 19-Jan-2024); a handful of these were found to be promoting dietary supplement products, though did not appear to constitute part of the ‘fake news article’ cluster referenced above.
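As an illustration of the first step, the sketch below extracts the distinct SLDs from zone-file lines and counts those beginning with 'keto'. It assumes the usual DNS master-file layout, in which the first whitespace-separated field of each record is the fully-qualified owner name (e.g. 'example.cloud.'); the function name and the sample records are hypothetical.

```python
def slds_from_zone_lines(lines, tld="cloud"):
    """Collect the distinct second-level labels under the given TLD.

    Assumes each record starts with a fully-qualified owner name
    (trailing dot included); other lines are skipped.
    """
    suffix = "." + tld + "."
    slds = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        owner = fields[0].lower()
        if not owner.endswith(suffix):
            continue  # the TLD apex itself, comments, other zones
        # The label immediately left of the TLD is the SLD,
        # so 'www.example.cloud.' still yields 'example'.
        sld = owner[: -len(suffix)].split(".")[-1]
        if sld:
            slds.add(sld)
    return slds


# Hypothetical sample records, for illustration only
sample = [
    "ketoabcdef123.cloud.\t3600\tin\tns\tns1.example.net.",
    "KetoDiet.cloud. 3600 in ns ns2.example.net.",
    "www.other.cloud. 3600 in a 192.0.2.1",
    "cloud. 3600 in ns a.nic.example.",
]
all_slds = slds_from_zone_lines(sample)
keto_slds = {s for s in all_slds if s.startswith("keto")}
```

Using a set means that a domain with several name-server records is counted only once.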

In total, 1,611 of the ‘keto’ domains were found to end with a three-digit string. All of these are between 11 and 15 characters in total SLD length[3] and follow the format of the domain names reported above. Accordingly, all are potentially part of the set of registrations carried out for the original scam campaign (although none resolved to live sites as of the date of analysis).

Significantly, however, this set of 1,611 domains also share a number of other characteristics which could be used to ‘cluster’ them together (and potentially form part of an ‘early warning’ algorithm to alert of the appearance of new associated registrations of interest):

All domains sit within a narrow range of domain-name entropy values (a measure of the length and amount of ‘randomness’ of the domain-name string)[4],[5], as a consequence of their similar name structures. The domains within the cluster all have entropy values between 2.6 and 3.9, compared with wider distributions for the set of all ‘keto’ .cloud domain names and for the total set of all .cloud domains (Figure 1).

 

Figure 1: Distributions of domain-name entropy values for the 1,611 domains in the cluster, compared with the set of all ‘keto’ .cloud domain names (right-hand axis) and the total set of all .cloud domains (left-hand axis)

 

The vast majority (98.3%) of the domains in the cluster were registered in a ten-day period between 19 and 28-May-2023 (Figure 2).

 

Figure 2: Daily numbers of registrations for the domains in the cluster

 

Almost all (99.7%) of the domains in the cluster were registered through the same registrar (a retail-grade provider previously noted as being popular with infringers)[6] and using the same privacy-protection service provider (1,606 instances).
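The three characteristics above could be combined into a simple score for each newly observed domain. The sketch below is illustrative only: it assumes that 'entropy' means the Shannon entropy of the character frequencies in the SLD (in bits per character), which is consistent with the quoted range but is not spelled out here, and the function names, the scoring scheme and the registrar argument are hypothetical. The thresholds are the ones quoted above.

```python
import math
from collections import Counter
from datetime import date


def shannon_entropy(s: str) -> float:
    """Shannon entropy (bits per character) of the characters in s."""
    n = len(s)
    counts = Counter(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cluster_score(sld, created, registrar, reference_registrar,
                  entropy_range=(2.6, 3.9),
                  date_range=(date(2023, 5, 19), date(2023, 5, 28))):
    """Count how many of the three cluster characteristics a domain shares.

    Returns an integer from 0 to 3; the cut-off for raising an alert
    is left to the caller.
    """
    score = 0
    if entropy_range[0] <= shannon_entropy(sld) <= entropy_range[1]:
        score += 1
    if date_range[0] <= created <= date_range[1]:
        score += 1
    if registrar == reference_registrar:
        score += 1
    return score
```

A score rather than a strict filter is used because the registration-date and registrar characteristics hold for most, but not all, of the domains in the cluster.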

For certain examples of domains within the cluster for which historical (cached) copies of the former website content are available, it is possible to verify that these sites were indeed previously associated with the ‘fake news’ health scams (Figure 3).

 

Figure 3: Cached screenshot (from DomainTools[7]) of content from an example of one of the sites in the cluster (ketoekezat333.cloud)

 

Within the .cloud zone-file dataset, we also find what appear to be other clusters of related domains, probably also groups of automated registrations associated with other former scams or affiliate revenue generation schemes. Examples include sets of domains of the form acv-ketomirrorXXX.cloud (100 instances), am-sXXX.cloud (71), guangyaoXXX.cloud (62) and videomediaseoXXX.cloud (443) (where ‘XXX’ are strings of three digits in each case). Widening out the search, we also find a similar cluster of domains on the .cyou extension (another new-gTLD often linked to infringing content)[8],[9], comprising 90 domains with names of the form ketoAAAXXX.cyou (where ‘AAA’ is a string of alphabetical characters of variable lengths), all between 7 and 17 characters in SLD length and with entropy values between 2.1 and 3.8. Other clusters feature slightly different patterns, such as a group of 200 ketoAAA.bar domains of 9 or 10 characters in SLD length, and 194 ketoXXXXmeto.buzz, 518 ketoXXXXdark.buzz and 178 ketoXXXXdark.today domains.
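One way in which clusters with a fixed stem and a numeric component might be surfaced is to reduce each name to a 'template' and count how often each template occurs. The sketch below is a hypothetical illustration, not necessarily the approach used in this analysis; the function names and the minimum-count threshold are assumptions.

```python
import re
from collections import Counter


def name_template(sld: str) -> str:
    """Replace every digit with 'X', e.g. 'am-s042' -> 'am-sXXX'."""
    return re.sub(r"[0-9]", "X", sld.lower())


def frequent_templates(slds, min_count=50):
    """Return templates that contain digits and are shared by many SLDs."""
    counts = Counter(name_template(s) for s in slds)
    return {
        template: n
        for template, n in counts.items()
        if n >= min_count and "X" in template
    }
```

This only groups names in which the non-numeric part is identical, so it would not by itself capture patterns with a random alphabetical component (such as ketoAAAXXX), for which name length and entropy are more useful features. Because SLDs are lower-cased first, an 'X' in a template can only have come from a digit.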

 

Conclusion

The findings illustrate how scammers make use of groups of low-cost domains with shared characteristics (often resulting from automated registrations), as part of high-volume scam campaigns. The large numbers are probably an indication that the scam sites are often short-lived (as borne out by the fact that none of the domains in the identified clusters currently resolve to live sites), presumably as part of an intention to generate revenue quickly and then deactivate the sites before they can be found and shut down through enforcement actions. However, the very fact that these domains do feature characteristics in common means that, when a new scam campaign is identified, it should be possible to design algorithms which, when combined with standard domain-monitoring techniques, can quickly identify additional associated registrations so that timely enforcements can be launched.

 

[1] https://www.techradar.com/pro/security/scammers-are-buying-up-cheap-domain-names-to-host-sites-that-sell-dodgy-health-products

[2] https://www.netcraft.com/blog/health-product-scam-campaigns-abusing-cheap-tlds/

[3] All domain lengths in this study are specified as that of the SLD (second-level domain) name, i.e. the part of the domain name to the left of the dot

[4] https://www.linkedin.com/pulse/investigating-use-domain-name-entropy-clustering-results-barnett/

[5] https://www.iamstobbs.com/opinion/the-randomest-domain-names-entropy-as-an-indicator-of-tld-threat-level

[6] ‘Website impersonations: a case study of domain names targeting the UK government’, forthcoming Stobbs blog post (link TBC)

[7] https://research.domaintools.com/research/screenshot-history/ketoekezat333.cloud/#1

[8] https://www.iamstobbs.com/opinion/expert-.watches-.new-.online-.website-.news-.lol-a-review-of-the-current-state-of-the-new-gtld-programme

[9] https://circleid.com/posts/20230117-the-highest-threat-tlds-part-2

