Monday, December 31, 2007

The DoubleClick bind

It's been an ongoing struggle to reconcile DoubleClick's (DFA) stats with what we get elsewhere, for example from Google Analytics, WebTrends SDC, and server log files. Most of us trust the data provided by tag-based analytics and mistrust DoubleClick's numbers, but it would be nice to know what really causes the differences. And they are BIG differences: DoubleClick's numbers (clicks) can be 100% higher than Google Analytics or WebTrends (page views).

It becomes an issue with people who are invested in spending money on interactive advertising. They understandably want to take the DoubleClick numbers at face value, because that makes the clickthrough rate look a lot better.

We're trying to look into it, with a lot of help from WebTrends custom reports, server logs, SDC logs, and Google Analytics as a sort of backup.

We know this:

DFA's numbers are way higher than the tag-based data, as much as 100% higher.

DFA's numbers are very similar to the numbers in server logs.

The two tag-based reports, WebTrends SDC and Google Analytics, agree with each other but are usually only half the size of DFA stats and server logs.

When we carefully compare individual page view events between server logs and SDC (tag-based) logs, we can separate out the "extra" events that don't show up in the tagged logs. Close examination does NOT show a pattern of repeated User Agent or IP information. There's no obvious evidence of one or two entities doing a lot of clicking. No easy answer about bots or cottage-industry click fraud.
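The comparison described above can be sketched roughly as follows. This is only an illustration, not the process we actually ran: it assumes each log has already been parsed into dictionaries with hypothetical keys `ip`, `ua`, and `url`, and it matches events on those three fields alone, which is cruder than a careful event-by-event comparison with timestamps.

```python
from collections import Counter


def event_key(hit):
    # Hypothetical record layout: each parsed log line is a dict
    # with "ip", "ua" (user agent), and "url" keys.
    return (hit["ip"], hit["ua"], hit["url"])


def extra_events(server_hits, tag_hits):
    """Events present in the server log but missing from the tag log."""
    server = Counter(event_key(h) for h in server_hits)
    tagged = Counter(event_key(h) for h in tag_hits)
    # Counter subtraction keeps only the positive differences.
    return server - tagged


def top_sources(extras, n=5):
    """Most frequent (ip, ua) pairs among the extra events."""
    sources = Counter()
    for (ip, ua, _url), count in extras.items():
        sources[(ip, ua)] += count
    return sources.most_common(n)
```

If one or two (IP, User Agent) pairs dominated the output of `top_sources`, that would suggest a single bot or click farm. In our data nothing stood out that way.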

However, we are finding one interesting thing that points to a possible explanation, or at least a corroboration of the notion that DoubleClick's numbers contain a lot of non-human traffic. DoubleClick hits are, as you know, marked with extra parameters. So, a given banner destination page will have, in the logs, some hits with DC marker parameters (coming from banner clicks) and other hits without those marker parameters (coming from non-banner sources). We did separate analyses of these two groups, analyzing only hits that were also first hits in a visit.

We found that if the hits had DC marker parameters, the discrepancy between server and SDC/Google logs was a huge one - in the up-to-100%-more range. But if the hits did NOT have DC marker parameters, the discrepancy was really quite small --- server and SDC/Google more or less had the same numbers. In other words, the hits that appeared in server logs but did not appear in tag-based logs were mostly banner hits.
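To make the split concrete, here is a minimal sketch of that two-group comparison. It is not our actual tooling. The parameter name `dc_marker` is a placeholder I made up for illustration (the real marker parameters depend on how the campaign is set up), and the inputs are assumed to be plain lists of request URLs for first-hits-in-visit, one list per log source.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical marker name, used only for illustration.
MARKER_PARAM = "dc_marker"


def has_marker(url, marker=MARKER_PARAM):
    """True if the URL's query string carries the banner marker parameter."""
    query = urlsplit(url).query
    return marker in parse_qs(query, keep_blank_values=True)


def discrepancy_by_group(server_urls, tag_urls, marker=MARKER_PARAM):
    """Count hits per group in each log and report the server-log excess."""
    counts = {True: [0, 0], False: [0, 0]}  # [server count, tag count]
    for url in server_urls:
        counts[has_marker(url, marker)][0] += 1
    for url in tag_urls:
        counts[has_marker(url, marker)][1] += 1

    result = {}
    for flag, (server_n, tag_n) in counts.items():
        label = "banner" if flag else "non-banner"
        # Excess is relative to the tag-based count: 1.0 means 100% more.
        excess = (server_n - tag_n) / tag_n if tag_n else None
        result[label] = {"server": server_n, "tag": tag_n, "excess": excess}
    return result
```

With this kind of breakdown, the pattern we saw would show up as an excess near 1.0 for the "banner" group and an excess near 0 for the "non-banner" group.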

We all know that there are several reasons for an event to appear in server logs but not in tag logs. Two big ones are: 1) an entity that does not execute javascript will not appear in tag logs (rather, the identity of the page being requested won't, although the hit will appear as a site hit), and 2) an entity that does not request images will not appear in tag logs.

It looks to us like a big proportion of clicks on banners are by entities that don't request images or don't execute javascript - basically, bots.

Someday, I'll talk to an insider who understands what's going on and who runs these bots and why. Could be legitimate, or not.

In the meantime, I'm continuing to say that WebTrends SDC/Google Analytics numbers for banner traffic are the ones to trust, because they are probably the human traffic. DoubleClick numbers contain vast amounts of non-human traffic.


Anonymous Anonymous said...

If you used DoubleClick Spotlight tags, you'd be able to track actual activity resulting from the online display advertising. These are cookie-based counters that only report on visits from users that have been exposed to advertising. They are shown as post-click (those that clicked on the banners) and post-impression (those that never clicked). You're able to accurately attribute any activity within that campaign back to specific ads/creatives/etc. I would only use the site analytics data for directional use... to see how much traffic your campaign is driving vs overall traffic recorded on the site. DoubleClick's Spotlight counters should be all you need to evaluate the advertising campaign.

6:54 PM  
Anonymous Bernard said...

I've been in a situation where all DoubleClick banners from one site were tagged with WT.mc_id, and I had a custom hit report on that parameter in both a server-log profile and an SDC-log profile for the destination site.
The first matches DC's clickthrough numbers quite closely. The second indeed varies strongly. The user agents in the logs don't lead me to believe this is due to bots.
Now if I look at my own behaviour as a surfer: once every few hundred banner impressions, I click one by accident. In that case I try to close the window as fast as I can, well before the WT tag gets the chance to execute.
So isn't the difference we see just plain natural? At least for me it is; I hate banners... :-)

8:44 AM  
Anonymous Maurette said...

Hi Chris,
Do you have any more updates on the DoubleClick issues? We're comparing post-impression views from Omniture to DFA and we see a 30% difference, with DFA being 30% higher than Omniture. Do you have any idea why this might be the case?

1:33 AM  
Anonymous Anonymous said...

The flaw in DoubleClick tags is that they record impressions regardless of whether an ad is being served. Even if you comment out the tag in HTML, an ad impression is being recorded in DoubleClick. Also, if for some reason you mistakenly use a tag with the same IDs more than once on a page, not only are you now double counting your impressions, but you're also double counting the visits in the site metrics.

9:07 AM  
Blogger Seonetworker said...


We also have a 50% difference between the banner clicks from a Flash ad campaign in Germany and our Google Analytics stats. It is very difficult to find information about this on the World Wide Web, so thank you very much for this information. Besides visitors closing the window, clicking back before the page loads, and bots that don't load the JavaScript (how many can there be?), I wonder whether popup blockers acting on the Flash ad could also be a factor?

Could it be that the media industry is hiding some information, and so getting paid for non-human visits or for clicks that never reach the target site?

Where are the case studies?

Where can we find tools to prevent click fraud?

Thank you very much,
SEO Hamburg

3:41 AM  
Blogger cg said...

I've never found anybody who can really explain how DoubleClick counts a click. If somebody clicks twice because they get impatient, does DoubleClick count it as 2 clicks? If that's how DoubleClick works, is there a specific pause time that has to happen between clicks (1 second for example) or does it count a rapid-fire double click as two clicks?

As for popup blockers for flash ads --- I don't think that can affect the click difference, since the user will never see the ad or click on it.

Bernard did a good test - he made a report counting hits (never confuse DoubleClick clicks with visits!) and found that the server logs corresponded closely, but not the SDC log. (At least, that's how I'm reading his response). He also checked the server logs' UA fields to see if the server logs were full of bots (which would ignore SDC tags) and concluded they weren't spiders or bots.

The idea of people closing the destination window/tab before the SDC, SiteCatalyst, or GA tag has a chance to fire is reasonable. The difference Bernard found strongly indicates that clicks are being missed by the javascript tags, period. And that DoubleClick's numbers do correspond closely to something we can measure, namely server logs, which miss virtually nothing.

So the question may boil down to --- the clicks that are seen by the javascript tags are those where the person hung around long enough to actually see the page and trigger the javascript tag. In my mind, those VISITS (I would definitely count the visit stat and not the hit stat in Analytics) are more legitimate as a measure of banner success than is the click statistic.

7:43 AM  
Blogger cg said...

Regarding Spotlight tags --- the post-impression stats, without a control group, are pretty useless. If the Spotlight program has control groups built in, then fine. Otherwise, it's a way for DoubleClick to sell more stuff because if you triple the impressions you pay for, you're likely to triple your post-impression stats, with absolutely no conclusions possible about causality or attribution.

8:49 AM  
