Sunday, March 20, 2011

Google Analytics Now Has Word Clouds!

Word clouds. Still around, are they? They need to either improve or fade away. Here are my top six reasons why this is Google pandering to a perceived need rather than acting as an analytics leader.

1. The size of type does nothing that a quantity-sorted list doesn't do.
2. Adjacency means almost nothing. What good is alphabetic sorting?
3. Color (when used) means nothing; it's just for dazzle.
4. They irritate smart executives, who know they are gratuitous and lose respect for the analyst.
5. They dazzle not-so-smart executives, who then expect you to find tremendous insights in them.
6. They frustrate good analysts, who would rather use that big chunk of real estate for something truly worthy.

A word cloud has several available dimensions, in other words --- size of type, location within the cloud, color, and color intensity. A good word cloud would use all of these dimensions but current word clouds use just one. Location (adjacency) could represent either similarity or co-occurrence. Color could represent a typology (problem words vs solution words, or whatever fits into your strategy). Color intensity could represent something quantitative, such as pages per visit.
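For illustration, here's a minimal sketch (in Python, with invented data) of what a multi-dimension word cloud could look like: font size carries frequency, color carries a problem/solution typology, and opacity carries pages per visit. Everything here, terms, counts, and palette, is a made-up example, not any real generator's output.

```python
def render_cloud(words):
    """words: list of dicts with keys term, count,
       category ('problem' | 'solution'), and pages_per_visit."""
    max_count = max(w["count"] for w in words)
    max_ppv = max(w["pages_per_visit"] for w in words)
    colors = {"problem": "#c0392b", "solution": "#27ae60"}  # arbitrary palette
    spans = []
    for w in sorted(words, key=lambda w: w["term"]):  # adjacency: alphabetic for now
        size = 10 + 30 * w["count"] / max_count               # font size <- frequency
        opacity = 0.3 + 0.7 * w["pages_per_visit"] / max_ppv  # intensity <- pages/visit
        spans.append(
            '<span style="font-size:%dpx;color:%s;opacity:%.2f">%s</span>'
            % (size, colors[w["category"]], opacity, w["term"])
        )
    return " ".join(spans)

cloud = render_cloud([
    {"term": "refund", "count": 120, "category": "problem", "pages_per_visit": 1.2},
    {"term": "coupon", "count": 300, "category": "solution", "pages_per_visit": 4.5},
    {"term": "broken", "count": 80, "category": "problem", "pages_per_visit": 2.0},
])
print(cloud)
```

The adjacency dimension is still wasted here (it's alphabetic); swapping the sort key for a similarity or co-occurrence measure is where the real win would be.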

I don't know of any word cloud generator that uses more than font size, do you? Please slam me if I have missed something here.

Google absolutely has the in-house talent to make word clouds into something better. What a shame they didn't bother.

Sunday, January 30, 2011

What, exactly, do you consider a SEARCH ENGINE?

The question above is just me being curious.

Consider this list:


Would you say the above are all "search engines"?

Do you want them in your search engines reports, or kept out?

Do you want to be paying for text ads on them as part of your paid search program, or not?

All of them have a search function. They may display results from their own site, or paid search ads, or organic results from the internet at large. Or all three.
Some of them are explicitly counted as "search engines" by SiteCatalyst, WebTrends or Google Analytics. Some of them are not counted as search engines but you and I probably would consider them as such.

In short, the search engines reports of our web analytics tools are messy because "search engine" just isn't easily definable these days.
Please think back to your thought process as you scanned my little list. Would you say you have mental rules about what is and what isn't a search engine? What are your rules?
Better yet, would you say you have mental buckets for these sites that aren't just yes/no for being a search engine? I, personally, find it useful to chop up all these referring sites into a few groups. The following are the biggest groups:
  • Really-truly search engines, where people go to search the whole internet. I think we'd all agree that Google is one.
  • Shopping search engines, which also have value as price/feature comparison engines. ShopZilla and Nextag qualify for this.
  • Primarily within-site search engines that happen to show related paid content. Target is definitely in this category ... the paid results may lead you to buy something at another store, but Target seems willing to take that risk. Same for Amazon. They've probably done the cost-benefit math.
  • Same thing as the above, but you have to submit an XML file to get listed. Froogle comes to mind, may it rest in peace.
  • Sites with a whole other purpose, of which search is a tiny part. (Have you ever noticed that if you do a search on Facebook, below all the FB-related hits is a set of organic results powered by Bing?)
Getting analytics tools to give me results according to my subcategories is not easy. As far as they're concerned, a domain is a search engine, or it is not.
But I would claim that my subcategories give me more information about my audience, and better input into my marketing activities, than the simple search-engine dichotomy.
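As a sketch of what that bucketing could look like in practice, here's a tiny Python classifier. The bucket names and domain lists are illustrative assumptions pulled from the examples above, not a complete taxonomy.

```python
# Illustrative buckets; extend with your own domains and categories.
BUCKETS = {
    "general search": {"google.com", "bing.com", "yahoo.com"},
    "shopping search": {"shopzilla.com", "nextag.com"},
    "within-site search": {"target.com", "amazon.com"},
    "search-as-a-side-feature": {"facebook.com"},
}

def classify_referrer(domain):
    """Map a referring domain to a subcategory, not just a yes/no flag."""
    domain = domain.lower().removeprefix("www.")
    for bucket, domains in BUCKETS.items():
        if domain in domains:
            return bucket
    return "not a search engine"

print(classify_referrer("www.nextag.com"))  # shopping search
print(classify_referrer("example.org"))     # not a search engine
```

Feeding a table like this into a report (say, via a lookup on the referrer dimension) gets you the subcategory view that the analytics tools won't give you directly.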

Monday, January 12, 2009

The tough task of operationalizing

The WebTrends blog asked "How do you define success?" this morning.

Yes, that’s THE question, and as Deborah (the author) said, the answer is usually pretty broad, just a starting point. More traffic, more sales, more customer engagement (ugh) are typical answers.

The FOLLOWUP question is, for me, a lot more difficult. The conversation usually branches after the first question, into 1) “Okay, but what does that mean?” or 2) “Well, aside from that, what kinds of things indicate partial success, or probable future success?”. And the one that seems to be the killer: "Which (of these two measures that we've worked out in the past hour) is more important?"

Getting people to operationalize, i.e. turn their generalities into specifics that are fairly finite and objective, is challenging and requires a really experienced person in the room (hopefully it's the metrics person). For a metrics person, the task is to turn a big generality into one or more littler, more objective ones, and continue that process until something pops out that can somehow be turned into reliable data and quantified. Sometimes the "data" version is right at the top level. Sometimes it's several levels removed. Sometimes it's nowhere. And often (another topic) it leads to surveys (because it's really just opinions) or crossing to other channels.

Surprisingly often, these “define success” working sessions for web analytics are the first time that the people in the room have actually had to work through, in their own minds, what they are after and how they’ll know it when they see it. Seriously. It’s amazing to me how people can do their jobs without operationalizing their goals, but they do it every day, all day, and seem successful.

The flip side of operationalizing is living with ambiguity, which is a valuable skill in itself. The curse of a mentality that likes to operationalize is the danger of being compelled to operationalize. "What do you mean by 'I love you'?"

One reason I like being in analytics is because the metrics person is often the one who pushes others to do this kind of mental exercise. In fact, I’ve seen metrics people labeled as “pests” about it (by those who are comfortable with ambiguity) … eventually accompanied by “that session was hard but it was incredibly helpful and productive.”

It’s pretty neat when, in one of these sessions, we reach a point where the operationalized objectives start to easily turn into actions … “we need to do more of such and such because it’s clearly related to an important objective” … “we have to think about whether we should be doing such and such because its relationship to the objectives is really pretty thin; it's cool but coolness isn’t enough.” Or, simply, "we won't be able to measure that unless we do such and such." It is NOT easy to get to this point in the discussion.

I dislike, I mean REALLY dislike, multi-syllabic jargon but “operationalize” is one to keep.

Friday, October 24, 2008

The few blogs I like

I have almost a hundred analytics-related blogs in my reader and I'm realizing there are only a handful that I refer people to, usually for individual posts. So I'm going to list them here.
the MymoTech blog - I have no idea what "mymo" means but this blogger, Michael Helbling, is awfully good at telling it like it is ... when he actually posts, that is. His description of his experience with the TCO (Total Cost of Ownership) of Google Analytics is one of the few done anywhere and it lays it out beautifully.
The Big Integration - one of two blogs by Jacques Warren, who consistently seems to add something to the general conversation. I give his blogs points for a fresh perspective almost every time. In other words, his writing tends to yank my leash a little bit.
The WebTrends Outsider - written by a group of non-WT employees who act like product insiders. Clearly there is a ton of WT experience here, an ability to think outside the WT box, and some kind of crazy urge to help people. And a little bit of poking at WebTrends. Somebody's gotta do that.
Gary Angel's blog. The postings tend to be really long and multi-parted. But he's the most original thinker in analytics, IMHO. Not because he's coming up with original ideas (he does that), but because he pulls existing concepts and thoughts from other fields and gets them to work with web analytics. His fault, if any, is in not referring to those other fields and therefore maintaining our delusion that we are doing something special and unique. I think a lot of web analytics people are bottom-up --- they migrated into analytics and flourished there. I have a feeling that Gary Angel is one of those people who was already flourishing in other, bigger areas and migrated to analytics where he overlays intelligent adaptations of what he has absorbed elsewhere. It takes a special kind of mind to be able to do that. I save his blog for when I have the time to read it more than once. It makes me walk around to other people's offices at work saying "hey, you know, we oughta ..."
The Bruce Clay Inc. analytics blog. When I grow up I want to work there, because they probably know everything. If I can keep up with their blogs (analytics, search stuff ...) then I feel like I have a chance of keeping up, period.
Eric Peterson's blog. Eric has two things going for him that put his blog on this list. 1) He's very very smart which means he thinks and writes really well. 2) He's a true insider in the industry, possibly even "an elder statesman" and deservedly so. #1 is good enough all by itself. #2 adds the spice - early knowledge, the confidence to make some controversy (in a good way), and a great ability to see things with seasoned perspective. His recent take on Google Analytics is a perfect example.
Kevin Hillstrom's blog. He comes out of the catalog world and, please believe me in this, the catalog world is incredibly important, and different. And old. He knows data and selling and writes wonderfully and he doesn't pull punches. My new go-here-first blog as of February 2009.

Sunday, August 24, 2008

Google Trends for web sites? No thanks

Here's why I stay away from this potential source of great competitive information, at least for now: I don't trust it at all.

I compared GTfWS data to our own data for the most prominent site for which I have solid web traffic info (but not Google Analytics). I analyzed seven months of our data, looking for a week-by-week stat, any stat, that would produce a trend line resembling what GTfWS produced for the same period, same site. Unh-unh. Nope. No way. Zip. Nada. Neither the trend lines nor the quantities resembled each other. No stat I tried matched, certainly not the particular stat that Google Trends says it's showing - Daily Unique Visitors (per week).

I don't know where they get their estimates. I know I've seen other people's similar analyses that corresponded very well - but those were sites that use Google Analytics. I know that my Google Toolbar connects to its home base every time I click on a page, even though I have "Send usage statistics to Google" turned off in the settings. Maybe these are related. Don't know, don't care. For competitive data, I'm staying away.

Thursday, February 21, 2008

One True Thing

If you know me personally at all, you know that I have more fun on my job than a person has a right to expect. A lot of the reasons boil down to: talented and conscientious coworkers; smart and ethical management; a tool that lets me stretch my web-analytics legs; and my own quirky personality that's addicted to getting insights from quantification in general.

There's another piece - the clients and the fact that, as an agency, we have a variety of them.

This means we also have a variety of types of relationships with those clients in the analytics corner of the business. Our "deliverables" run the gamut:
  • setting up their reporting tool and letting them look at the tool's output (with ongoing help as needed),
  • delivering Excel scorecards of varying degrees of detail that allow them to skip looking at the tool's output,
  • delivering the above plus written interpretation or summary or recommendations,
  • one-time deep-dive analyses of a single question,

and my favorite:

  • the free-ranging relationship that has come to be known here as "One True Thing" reporting.

I want to talk about the last one because it's just too enjoyable to keep to myself.

The name "One True Thing" comes from a smart and creative client person with a limited budget who wants it all. He appreciates that the detailed scorecard is important but wants to skip the other time-consuming deliverables while still getting at important stuff that he can actually use, whether to influence decisions or impress his organization. (Hey, both of these are really important!)

He said, "If I can get just One True Thing every month out of analytics, something that's interesting and helpful, then the whole effort is worth the money we put into it."

What resulted was very simple. We have a four-person phone call every month, a few days after the detailed scorecard is delivered. By the time of the call, he's looked at the data we send and may have questions or head-scratching puzzlers. He and his site manager also talk about conundrums they're dealing with in his business (where analytics might help), or recent events and changes for which they want to measure effectiveness. On our end, we might talk about new metrics or methods we've been playing with.

He and his site manager know his business needs, I know how to express his issues as analytics problems (I hope), and in addition our engagement manager and I each have tons of web experience so we throw that into the discussion as well. There's a lot of "what the heck," "I wonder why," and "wouldn't it be cool if" going on, and by the end of the phone call we've scoped a "One True Thing" topic that our analytics team will investigate before the next phone call. And, of course, we go over the outcome from the last One True Thing question, usually with just an informal email and some graphs instead of a formal report (saves billable time). The best part is if the analysis gets them to say "That's great! Just what we needed!"

A lot of mutual education goes on. We've all gotten better at these calls over time. The two of them have become more analytics-savvy and for our part we understand their business better than ever before. They are getting a lot for their money and we are getting to poke into analytics questions that we wouldn't have thought of. Along the way we get to invent new analyses and maybe add them to our overall analytics practice. We're also proposing this kind of relationship to other clients and the One True Thing Phone Call has become a bit of a standard. Or at least we'd like it to be.

The web analytics world tends to be a little too formal, IMHO, emphasizing quantity of data and dashboards as our output when what might help most of all would be these kinds of conversations that stress questions, answers, and insights without running up a big bill. Obviously, the point of this blog entry is to suggest that more people get back to basics like this. Partly because it's a good thing in itself, and partly because it's so much fun.

Saturday, January 19, 2008

Web 2.0 and usability

I just discovered a provocative article on Web 2.0 by Jakob Nielsen, the sometimes controversial usability pundit.

His main point: “If you focus on over-hyped technology developments, you risk diverting resources from the high-ROI design issues that really matter to your users …”

Early in the article, he provides a working definition of Web 2.0 as a structure for the rest of his article:

- "Rich" Internet Applications (RIA)
- Community features, social networks, and user-generated content
- Mashups (using other sites' services as a development platform)
- Advertising as the main or only business model

The rest of the article is about the usability risks for each of these facets of Web 2.0, in the context of five kinds of web sites – marketing, e-comm, media, intranets, web applications.

Whether you agree or disagree, it’s worth reading. My main reaction was a big hurray, simply because of the refreshing clarity with which he defined his terms and deconstructed the issue. He's a hero to me just for that reason (another writer who jazzes me in the same way is Gary Angel over at SEMphonics). If I could find more people who work within a logical structure when they write, I would be a lot less confused in general.

It's almost a footnote that I agreed with most of his conclusions.

Monday, December 31, 2007

The DoubleClick bind

It's been an ongoing struggle to reconcile DoubleClick's (DFA) stats with what we get elsewhere, for example from Google Analytics, WebTrends SDC, and server log files. Most of us trust the data provided by the tag-based analytics and mistrust DoubleClick's numbers, but it would be nice to know what really causes the differences. And they are BIG differences, because DoubleClick's numbers (clicks) can be 100% higher than Google Analytics or WebTrends (page views).

It becomes an issue with people who are invested in spending money on interactive advertising. Understandably, they want to take the DoubleClick numbers at face value: those numbers make the clickthrough rate look a lot better.

We're trying to look into it, with a lot of help from WebTrends custom reports, server logs, SDC logs, and Google Analytics as a sort of backup.

We know this:

DFA's numbers are way higher than the tag-based data, as much as 100% higher.

DFA's numbers are very similar to the numbers in server logs.

The two tag-based reports, WebTrends SDC and Google Analytics, agree with each other but are usually only about half the size of the DFA stats and server logs.

When we carefully compare individual page view events between server logs and SDC (tag-based) logs, we can separate out the "extra" events that don't show up in the tagged logs. Close examination does NOT show a pattern of repeated User Agent or IP information. There's no obvious evidence of one or two entities doing a lot of clicking. No easy answer about bots or cottage-industry clickfraud.

However, we are finding one interesting thing that points to a possible explanation, or at least a corroboration of the notion that DoubleClick's numbers contain a lot of non-humans. DoubleClick hits are, as you know, marked with extra parameters. So, a given banner destination page will have, in the logs, some hits with DC marker parameters (coming from banner clicks) and other hits without those marker parameters (coming from non-banner sources). We did separate analyses of these two groups - only analyzing hits that were also first hits in a visit.

We found that if the hits had DC marker parameters, the discrepancy between server and SDC/Google logs was a huge one - in the up-to-100%-more range. But if the hits did NOT have DC marker parameters, the discrepancy was really quite small --- server and SDC/Google more or less had the same numbers. In other words, the hits that appeared in server logs but did not appear in tag-based logs were mostly banner hits.
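The comparison above can be sketched roughly like this in Python, on an invented mini-log. Here "dclid" stands in for whatever marker parameter your DoubleClick URLs actually carry; the hits and counts are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

DC_MARKER = "dclid"  # hypothetical stand-in for the real DC marker parameter

def split_discrepancy(server_hits):
    """server_hits: list of (url, seen_in_tag_log) for first hits of visits.
       Returns {group: (server_count, tag_count)} split by banner marker."""
    counts = {"banner": [0, 0], "non-banner": [0, 0]}
    for url, tagged in server_hits:
        params = parse_qs(urlparse(url).query)
        group = "banner" if DC_MARKER in params else "non-banner"
        counts[group][0] += 1          # every hit is in the server log
        counts[group][1] += int(tagged)  # only tag-fired hits count here
    return {g: tuple(c) for g, c in counts.items()}

hits = [
    ("http://example.com/?dclid=abc", False),  # banner click, tag never fired
    ("http://example.com/?dclid=def", True),
    ("http://example.com/", True),
    ("http://example.com/", True),
]
print(split_discrepancy(hits))
# {'banner': (2, 1), 'non-banner': (2, 2)}
```

In our real data the banner group's server count dwarfed its tag count, while the non-banner group's two counts nearly matched, which is exactly the pattern the code would surface.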

We all know that there are several reasons for an event to appear in server logs but not in tag logs. Two big ones: 1) an entity that does not execute JavaScript never fires the tag, so its page views won't appear in tag logs even though its requests show up in server logs, and 2) an entity that does not request images will not appear in tag logs, because the tag data is sent via an image request.

It looks to us like a big proportion of clicks on banners are by entities that don't request images or don't execute JavaScript - basically, bots.

Someday, I'll talk to an insider who understands what's going on and who runs these bots and why. Could be legitimate, or not.

In the meantime, I'm continuing to say that the WebTrends SDC/Google Analytics numbers for banner traffic are the ones to trust, because they are probably the human traffic. DoubleClick numbers contain vast amounts of non-human traffic.