Using Google AdWords on Content networks, a click-fraud investigation.

Posted at: 4:41 PM on 02 February 2010 by Muhimbi

The following article is an ‘open kimono’ session in which I discuss some of the internals of my company as well as our marketing program. It is my opinion that we are the victim of click-fraud; however, my investigation is not 100% scientific and I have had to make some assumptions based on observations and time constraints. Please draw your own conclusion and consider everyone innocent until proven guilty. The figures, charts and tables presented in this article originate from Google’s own Analytics and AdWords software.

Update: Latest developments and responses from Google can be found at the end of this post.

After witnessing unexpected browsing behaviour from visitors who arrived on our site via a Google AdWords campaign that we ran a year ago for our PDF Converter for SharePoint, I was pretty sure that we were the victim of click-fraud.

Unfortunately, due to a lack of time and detailed figures to back up my suspicions, I decided not to pursue the matter at the time. However, after recently analysing another campaign it became clear that something suspicious is going on. Naturally it is not Google who is committing the fraud, but they are not doing enough to prevent it either.

Note that Google settled a click-fraud related class action lawsuit in 2006 for $90M, a drop in the ocean compared to their level of revenue. The problems appear to be ongoing. Read on for my findings.

 

What is Google AdWords?

Ever wonder how Google make tens of billions of dollars each year? One word: Advertising! AdWords is the platform that allows customers to specify keywords, bids and budget for displaying adverts next to Google’s search results as well as in-line on any website that is willing to display adverts in exchange for a share in the revenue.

When creating a campaign you can specify where the ads appear:

  1. In Google’s Search Results: Based on the search terms and the keywords specified, relevant adverts are displayed next to and above the search results. The more you are willing to pay, the higher the advert will be displayed, increasing the chances of a user clicking it. Every time an advert is clicked, Google charges the advertiser a fee. As Google’s site is a trusted entity, this way of advertising is relatively fraud proof. In all fairness it appears to work exceptionally well and Muhimbi probably could not survive without it.
     
  2. On the content network: This is where I suspect the majority of fraud is taking place. Anyone who can host any kind of content, e.g. a blog, can sign up as an affiliate and place Google ads in their content. Every time an advert is clicked on the content network Google charges the advertiser and part of the income is paid to the ‘owner’ of the content.
     

Although Google is putting a lot of effort into preventing fraud, the engineer in me can think of many ways to abuse the content network program, particularly using cheap labour, proxy servers and spyware-like applications to simulate real user clicks.

For more detail read Wikipedia’s definition of the AdWords platform.

 

Muhimbi’s market and products

What makes this investigation relatively easy is the fact that we serve a niche market. All our products are aimed at corporate IT departments for use in their SharePoint environment. The campaign discussed in this article is for our Workflow Power Pack, a product that allows SharePoint Designer developers to embed C# or VB code into their workflows. A great product, but I believe there is a box shot of our product next to the definition of the word niche.

We are a small but extremely committed company, which makes it difficult to swallow that our hard-earned money appears to be used for funding criminal activity.

 

Normal, genuine, users

Based on our experience with other campaigns as well as ‘organic visitors’ who visit our site via external links or regular Google searches, our normal audience has the following characteristics:

  1. When we send a newsletter to these users, the email rarely bounces.
     
  2. They visit during weekdays. During the weekend our site has 75% fewer visitors than on weekdays.
     
  3. They arrive on our site via Google with relevant keyword searches or via links from external sites that are relevant to our niche.
     
  4. They browse around before going to the download page. Only 41% go from the landing page directly to the download page.
     
  5. Roughly half of interested visitors contact our support / sales department at some stage for further information.
     

[Figure: Regular visits by day of the week. Guess which data points represent the weekend.]

 

Evil users

The usage pattern of these alleged fraudulent users is completely different:

  1. A large proportion of newsletters sent to these users bounce.
     
  2. They visit every day of the week including the weekend.
     
  3. They arrive on our site via questionable, unrelated sites. More about this later.
     
  4. 75% of the users go directly to the download page from the landing page without getting any further information about the product.
     
  5. None get in contact with our support / sales department to request any kind of information. I would like to think the information on the site is crystal clear; however, this does not match the pattern we see from other products and campaigns.
     
  6. They spend an average of 1 minute on the site. I am not sure if this is the minimum that Google Analytics reports, but these people are clearly ‘very committed’.
     

[Figure: Visits by day of the week for pages related to this particular campaign. No weekend dips, clearly hard workers.]
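The weekday versus weekend difference between the two groups can be put into a single number. The following is a minimal Python sketch and not the method used to produce the charts: the function name `weekend_drop` and all daily visit counts are hypothetical, chosen only so that the 'genuine' series shows the 75% weekend dip described earlier and the 'suspect' series shows none.

```python
from datetime import date, timedelta
from statistics import mean


def weekend_drop(daily_visits):
    """Return the fractional drop in average visits at weekends versus weekdays.

    daily_visits maps a datetime.date to the number of visits on that day.
    A value near 0.75 means weekends get 75% fewer visits; near 0 means no dip.
    """
    weekday = [v for d, v in daily_visits.items() if d.weekday() < 5]   # Mon-Fri
    weekend = [v for d, v in daily_visits.items() if d.weekday() >= 5]  # Sat-Sun
    return 1 - mean(weekend) / mean(weekday)


# Hypothetical figures for illustration only (one week, starting on a Monday).
start = date(2010, 1, 4)
genuine = {start + timedelta(days=i): v
           for i, v in enumerate([100, 100, 100, 100, 100, 25, 25])}
suspect = {start + timedelta(days=i): v
           for i, v in enumerate([40, 40, 40, 40, 40, 40, 40])}

print(f"genuine traffic weekend drop: {weekend_drop(genuine):.0%}")
print(f"suspect traffic weekend drop: {weekend_drop(suspect):.0%}")
```

A niche B2B audience with almost no weekend drop is not proof of fraud on its own, but it is a cheap signal to check before looking at the referring sites in detail.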

 

So why do these alleged fraudulent users go through the effort of downloading our software and registering for newsletters after clicking the advertisement, which is when they make their money? The reason behind this is that many Google campaigns as well as marketing professionals measure the success of their campaign based on conversions. For example:

  1. A user enters the site via an AdWords campaign and downloads the software. This is considered a conversion and a sign of a campaign being successful. The marketing executive will get a pat on the back from the CEO and everyone is happy (initially).
     
  2. A user enters the site via an AdWords campaign and subscribes to a newsletter. This is considered a conversion as well resulting in CEO –> pats marketing on back–> Happy –> time expires –> Sad –> Fired –> Divorce –> Death (See the pain these people are causing!)
     

Clever marketing people and, if configured that way, Google AdWords measure conversions and allocate more budget to sites that generate these conversions. This gives fraudsters a good reason to simulate some activity after clicking an advertisement.

 

Which sites are the worst offenders?

It should come as no surprise that sites that allow anonymous users to host their own content and insert Google advertisements are the worst offenders as it is almost impossible to trace these people. From a geographic perspective it appears that Chinese web sites are the worst, but many other countries are just as bad.

Our advert has been displayed 2.5 million times on 767 sites over the course of one month. 611 different sites have referred at least one visitor. Out of those sites I consider about 60 sites relevant in the loosest sense of the word (Intentionally or not, eggheadcafe.com for example is legit although until recently very mischievous in the way they presented and positioned their advertisements to make them look like clickable answers to questions. They still do it on some threads, but not as bad as it used to be).

76 sites had an amazing 100% click-through ratio (CTR, the number of clicks divided by the number of impressions), 213 had a CTR of more than 20%, 345 more than 10% and 450 more than 1% (which is still an amazing rate considering mail.google.com has a 0.02% CTR).
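For anyone who wants to run the same kind of breakdown on their own placement report, here is a minimal Python sketch. It assumes the report has already been exported and loaded into a dictionary of domain to (clicks, impressions); the helper names `ctr` and `count_above` are my own, and the sample contains only five rows taken from the tables in this article rather than the full 767 sites.

```python
def ctr(clicks, impressions):
    """Click-through ratio as a fraction (0.02 means 2%)."""
    return clicks / impressions if impressions else 0.0


def count_above(sites, threshold):
    """Count the sites whose CTR is strictly greater than threshold."""
    return sum(1 for clicks, impr in sites.values()
               if ctr(clicks, impr) > threshold)


# A small sample of rows from the tables in this article: (clicks, impressions).
sites = {
    "thaimanga.net": (249, 485640),
    "conduit.com": (139, 7221),
    "mail.google.com": (74, 399955),
    "9mine.com": (9, 34),
    "hbrsd.com": (19, 87),
}

for threshold in (0.20, 0.10, 0.01):
    print(f"CTR > {threshold:.0%}: {count_above(sites, threshold)} sites")
```

Note that the buckets are cumulative, just like the figures quoted above: a site with a CTR over 20% is also counted in the 10% and 1% buckets.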

I consider sites such as divxphoto.com to be irrelevant, as the domain is for sale and doesn’t actually display any advertisements(!). Most of the other sites on the list can be categorised as domain for sale, dodgy software download site, driver download site or rubbish content aggregation site.

 

Listed below are the top 15 sites by highest number of advertising clicks.

Domain            Clicks  Impr.    CTR     Notes
thaimanga.net     249     485640   0.05%   Manga comics, not relevant to our advert.
softpedia.com     223     158636   0.14%   Download site, not relevant to our advert.
incoto.com        178     192949   0.09%   Some Chinese site
webs.com          159     19109    0.83%   Create your own website service, which makes it easy to host dodgy content.
conduit.com       139     7221     1.92%   Browser toolbar company. God knows what is going on here.
mail.google.com   74      399955   0.02%   Wow, a legit one
eggheadcafe.com   53      109068   0.05%   Legit, but sometimes misleading i.m.o.
csdn.net          50      146830   0.03%   Chinese programming site, maybe legit, probably not.
gyanii.com        48      7481     0.64%   Software download site, looks rubbish and full of advertisements.
blogspot.com      41      13915    0.29%   Host your own content. Partly legit.
csharpfr.com      38      12915    0.29%   French C# site, probably legit.
pin5i.com         34      15196    0.22%   Chinese programming site. Could be legit or just an aggregator. My Chinese isn’t what it used to be.
dotnet-news.com   33      2806     1.18%   Another French .net site related to csharpfr.com. Possibly legit, but I wonder why they are generating so many clicks

Green: Most likely legit - Amber: Likely to be illegitimate - Red: Almost certainly illegitimate

 

Listed below are the top 15 Sites by Click Through Ratio with more than 25 impressions (otherwise the table would contain 76 sites with a 100% CTR after a single impression, which is rather useless).

Domain               Clicks  Impr.  CTR      Notes
9mine.com            9       34     26.47%   Free games, not relevant to our advert.
hbrsd.com            19      87     21.84%   Domain for sale, no ads. Who knows where the clicks came from.
5dmail.net           6       36     16.67%   Chinese site, could be legit, could be aggregator.
boxsoftware.net      5       37     13.51%   Spanish software download site
meiying.com          6       48     12.50%   Dodgy site to display just SharePoint related ads without any content.
myalbums.tk          18      160    11.25%   Dodgy site to display just MS Development related ads without any content.
codehaus.org         6       54     11.11%   Some open source site. Could be legit, but not relevant so doesn't explain the high CTR.
micorcsolft.com.cn   8       76     10.53%   Site doesn't even exist.
douziwang.cn         4       40     10.00%   Similar to meiying.com. Dodgy site to just display ads for MS Dev tools
paramegsoft.com      3       30     10.00%   Arabic online games, not relevant to our advert.
kidwaresoftware.com  4       46     8.70%    Possibly legit site, but not relevant so doesn't explain the high CTR
thaiboxsoftware.com  3       38     7.89%    Thai software download site. Glad to see we are so popular in Thailand
netcsharp.cn         3       40     7.50%    Malware site as reported by Google Chrome, yet Google allow advertisements.
download3k.com       3       45     6.67%    Another software download site
technos-sources.com  2       30     6.67%    French tech site. Could be legit, could be aggregator. Doesn't explain the high CTR.

Green: Most likely legit - Amber: Likely to be illegitimate - Red: Almost certainly illegitimate
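The ranking by CTR with a minimum number of impressions can be reproduced with a few lines of Python. This is only a sketch under assumptions: the function name `top_by_ctr` and the tuple layout (domain, clicks, impressions) are mine, the first four rows come from the tables above, and the last row is a made-up single-impression site added to show why the threshold is needed.

```python
def top_by_ctr(rows, min_impressions=25, limit=15):
    """Return up to `limit` rows with more than `min_impressions` impressions,
    ordered by click-through ratio, highest first.

    Each row is a tuple of (domain, clicks, impressions).
    """
    eligible = [row for row in rows if row[2] > min_impressions]
    return sorted(eligible, key=lambda row: row[1] / row[2], reverse=True)[:limit]


rows = [
    ("mail.google.com", 74, 399955),
    ("myalbums.tk", 18, 160),
    ("9mine.com", 9, 34),
    ("hbrsd.com", 19, 87),
    ("single-hit.example", 1, 1),  # hypothetical: 100% CTR after one impression
]

for domain, clicks, impressions in top_by_ctr(rows):
    print(f"{domain:20} {clicks:4} {impressions:7} {clicks / impressions:8.2%}")
```

Without the impression threshold the hypothetical single-hit site would top the list, which is exactly the noise the 25-impression cut-off is meant to remove.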

 
I realise that detecting and solving click fraud is much more difficult than actually causing it, especially without access to key information such as site demographics, visitor behaviour, click streams and conversion data. On the other hand, as Google Analytics tracks the Muhimbi Site, they actually have most of this data. I will present my findings to Google and give them a chance to respond and hopefully improve the situation for everyone. Perhaps some kind of validation system or list of ‘trusted sites’ could be created by Google.

As a Google shareholder I wonder how much of Google’s income actually comes from this kind of alleged criminal activity. According to Google: “…we manage the problem of invalid clicks very well. We have a large team of expert engineers and analysts devoted to it. By far, most invalid clicks are caught by our automatic filters and discarded *before* they reach an advertiser’s bill. And for the clicks that are not caught in advance, advertisers can notify Google and ask for reimbursement.”

This situation cannot continue any longer. I am naturally upset that my company appears to be the victim of fraud, but what about the thousands (millions?) of other advertisers who do not have the knowledge or resources to detect fraud? It should not be up to the customers to research and report fraud. Google should step up its game and clean up its act, no matter how difficult or painful it is.

So, is Google guilty of fraud? I seriously doubt it; however, they appear to be profiteering from other people’s criminal activity in a manner not dissimilar to the way illegal media sharing sites are behaving. “We are not doing anything illegal, we can’t help it that other people upload illegal movies / music / software / <insert excuse here> to our site even though it has clearly been designed for this purpose.”

… Not good enough. To be continued.

 

02-Feb-2010 - Update 1: We are clearly not the only party experiencing click-fraud. For more information visit the links below:

Report Google click-fraud here.

 

18-Feb-2010 - Update 2: Google have responded and claim that only a small percentage of the clicks are fraudulent. According to them, the remaining clicks are all part of normal user activity. Their response appears to be largely automated, so I have replied asking for further details as I don’t accept their findings. I find it astonishing how much they downplay the issue of click-fraud. Apparently it is up to me to manually exclude domains that I consider not to be relevant. This is just laughable.

 

19-Feb-2010 – Update 3:

 

01-Mar-2010 – Update 4: Received another reply from Google AdWords support. They have disabled some of the accounts that have caused fraudulent clicks, but they are not allowed to tell which clicks were the fraudulent ones. Apparently it is up to us to police Google’s content network, painstakingly go through all reports, check out each domain and then take Google’s word for it about them taking the appropriate action. In light of my findings, Google’s word is not worth much to me at the moment, even if they have the best of intentions.

 


5 Comments:

  • FYI - those "domain for sale" pages like hbrsd.com are parked domains that expired at the registrar or belong to a domain speculator. Most registrars park expired domains, putting google or yahoo ads on them.

    The rest of them sound pretty fishy though. Maybe you can find a pattern to the IP addresses, useragents, or click-paths of the traffic from the heavy-traffic ones like thaimanga.net?

    Good luck!

    By Anonymous Anonymous, At 02 February, 2010 22:01  

  • Thanks, but advertising on parked domains is one of the things Google should prevent from happening.

    By Blogger Muhimbi, At 03 February, 2010 07:48  

  • Google does a poor job letting its users know that parked domains span *both* the search and content networks. You can, actually, block them via the site and category exclusion tool. Look for parked domains on the page types tab on that tool.

    By Anonymous Richard Ball, At 22 February, 2010 22:36  

  • Thanks Richard. They should exclude this by default as parked domains are dodgy in nature anyway.

    By Blogger Muhimbi, At 23 February, 2010 08:51  

  • I've argued for a few years that they should isolate the "AdSense for Domains" traffic on the AdWords side to a new network. Kind of like there's a content network, there should be a domain network. As it is implemented now, it's like a shadow network that spans both the existing search and content networks. That's a poor design, IMHO.

    By Anonymous Richard Ball, At 25 February, 2010 02:26  
