Sunday, August 27, 2006

Getting hands dirty - log files

Somebody asked me about my previous log file comment. Hands-on work directly with logs is a best practice that I feel strongly about. It will help avoid problems later in setup, help troubleshoot, and teach you about the inner workings of your analysis tool.

Here is a minor rant:

Open a log with Excel (space delimited), sessionize it by sorting on IP/UA or cookie or whatever, and look at individual visits carefully. There are a lot of things to look at, but landing page redirects, GET/POST actions, status codes, and cookie consistency are four big ones that pay off most of the time. Of course, check whether the important fields are being logged, period.
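If the log is too big for Excel, the same sort-and-eyeball routine can be roughed out in a few lines of Python. This is only a sketch: the field order in FIELDS is my assumption (match it to the "#Fields:" header of your own log), and the grouping is the crude IP/UA kind described above, with no inactivity timeout to split one visitor's hits into separate visits.

```python
from collections import defaultdict

# Assumed field order for a space-delimited log. This is hypothetical;
# change it to match the "#Fields:" header of your own log file.
FIELDS = ["date", "time", "c_ip", "method", "uri", "status", "user_agent"]


def parse_line(line):
    """Split one log line into a dict; return None for comments or odd lines."""
    if not line.strip() or line.startswith("#"):
        return None
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def sessionize(lines):
    """Group hits by (IP, user agent), keeping log order within each group."""
    visits = defaultdict(list)
    for line in lines:
        hit = parse_line(line)
        if hit is not None:
            visits[(hit["c_ip"], hit["user_agent"])].append(hit)
    return dict(visits)


def status_summary(hits):
    """Count how often each status code appears in one visit."""
    counts = defaultdict(int)
    for hit in hits:
        counts[hit["status"]] += 1
    return dict(counts)
```

Calling sessionize() on the lines of a log and then printing the method, uri and status of each hit in a visit gives you the landing page redirect, GET/POST and status code checks in one pass.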

For maximum benefit do this: go to the site yourself, perform all the important-to-know-about actions, and examine your own visit to make sure that the logs look like what you expect. How to find your own hits? You can find your own IP (use ipconfig from command window, usually) or do it this easy way: when you arrive at the site, refresh the home page --- but FIRST add something to the home page URL in the address window - like the parameter "special=chris-g-tracking-visit". Then just search for that string in the logs and you'll be able to pull out the rest of your visit using the cookie or IP field.
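The search-for-your-marker trick is easy to script too. Here is a minimal sketch, assuming the log lines are space delimited and the client IP sits in the third column (ip_index=2 is my guess, so check your own field order). It finds the IP behind any hit containing the marker string, then returns every hit from that IP.

```python
def find_my_visit(lines, marker, ip_index=2):
    """Return all log lines sharing an IP with a line that contains `marker`.

    `lines` must be a list (it is scanned twice). `ip_index` is the assumed
    zero-based position of the client IP in a space-delimited line.
    """
    my_ips = set()
    for line in lines:
        if marker in line:
            parts = line.split()
            if len(parts) > ip_index:
                my_ips.add(parts[ip_index])

    visit = []
    for line in lines:
        parts = line.split()
        if len(parts) > ip_index and parts[ip_index] in my_ips:
            visit.append(line)
    return visit
```

Swap the IP column for the cookie column if your visitors sit behind a shared proxy.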

A variation on log-diving is creating a test log, small and 100% understood by you. It's the fastest way I know to debug the more difficult reports like campaigns. It's also the fastest way to understand the more bizarre statistics in WT like the "Most Recent" ones. (If you make one, make sure you have an extra last line that is a day later than the next-to-last line. WT won't analyze this fake line but it's critical to have it there.)
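A test log like that can be typed by hand, but here is a rough sketch of generating one. The log format, the "/sentinel" page, and the 0.0.0.0 address are all made up for illustration; the only part taken from the advice above is the extra last line dated a day after the last real hit.

```python
from datetime import datetime, timedelta


def build_test_log(hits):
    """Build a tiny test log from (datetime, ip, uri) tuples in time order.

    The field layout is an assumption for illustration. A final dummy line,
    dated one day after the last real hit, is appended as the extra last line.
    """
    stamp = "%Y-%m-%d %H:%M:%S"
    lines = ["#Fields: date time c-ip cs-method cs-uri-stem sc-status"]
    for when, ip, uri in hits:
        lines.append(f"{when.strftime(stamp)} {ip} GET {uri} 200")

    # The extra line: one day later than the next-to-last line.
    day_later = hits[-1][0] + timedelta(days=1)
    lines.append(f"{day_later.strftime(stamp)} 0.0.0.0 GET /sentinel 200")
    return lines
```

Write the result out with "\n".join(lines) and point your analysis tool at it.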

End message: garbage in, garbage out. Open that trash can and get dirty.

Wednesday, August 16, 2006

Back to basics ... again

The best first thing you can do to understand web analytics is to open a log file and mess around. It will either help you understand your site, or it will allow you to skip over the first fifty or so stupid questions about your analytics tools.

Then when you get really good at analytics, the best next thing you can do when you get stuck is ... open a log file and mess around. You never get away from it.

Wednesday, August 09, 2006

Functionalism - why I like it

Gary Angel is doing some writing on a concept he calls functionalism. Ordinarily I steer clear of people slapping names onto things, especially when the names are academic retreads. Yes, I guess it's better than inventing a new word. But here's the thing. Gary Angel truly is working on delineating an out-of-the-ordinary approach and the name "functionalism" even sorta fits.

In any field, an emerging construct has to have a name in order to turn into a building block. Analytics Functionalism has that kind of potential.

What Gary Angel is talking about is, at its simplest level, grouping pages by functions then looking at the traffic patterns of those groups. It's a drill-up way of doing analytics, as opposed to the drill-down catechism that drives some analytics vendors. But it's a lot more than that. First of all, the groupings or page types he has chosen to share are thought-provoking and he even has little corollaries for them here and there. Second, when I say "looking at the ... patterns" I mean he has specific analyses in mind, as opposed to just "smart eyeballing" as my friend Lou puts it. More about that from me after I do some more thinking.
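To make the drill-up idea concrete, here is my own rough sketch, not Gary Angel's method or his page types. The page-to-function mapping and the labels ("navigation", "product_info", "checkout") are invented for illustration. The code relabels the pages of one visit by function and counts the moves between functions.

```python
from collections import Counter

# Hypothetical mapping of pages to functions; the labels are made up
# for this example and are not taken from any published scheme.
PAGE_FUNCTIONS = {
    "/": "navigation",
    "/products": "navigation",
    "/products/widget": "product_info",
    "/cart": "checkout",
}


def function_transitions(visit_pages, page_functions=PAGE_FUNCTIONS):
    """Count function-to-function moves within one visit's page sequence."""
    labels = [page_functions.get(page, "unclassified") for page in visit_pages]
    return Counter(zip(labels, labels[1:]))
```

Summing these counters across visits gives a traffic pattern at the level of page groups rather than individual pages.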

You MUST read his white paper and his blog. More than once, please. Then we'll talk.

Now, to digress a bit. You may have figured out after, what, three whole posts? that I am not fresh out of school. Too true. My analytics road has gone through four different early-stage, human-factors-oriented fields to get here --- environmental psychology, white collar productivity research, facility management, office ergonomics, and now it's web site traffic analytics. I must be attracted to small fields still in the creative phase or something. I have helped start up professional associations when the referenced professions didn't even have a common name yet (I'm taking a pass on the current Web Analytics Association). I've worked on the first textbooks in these fields, have helped write federal, state, and international standards, and more. And best of all I've had the chance to hang out with the thinkers and movers in those fields - the ones exuding the above-mentioned creative juice. People I can't name because I don't want this blog to show up when people google them for term papers.

So I'm saying that I think I know structure-emerging-from-entropy when I see it, and this is the genuine article. If enough people get it and use it and take it further, we'll someday have a real profession instead of a vocation.

Friday, August 04, 2006

I want my within-visit associations!

Yesterday I had a brief correspondence with Robert Allison of SAS, who has been posting several really good dashboards on Dashboard Spy. The one that I wanted to know about was this one because it had to do with web analytics and because some of the stats were really cool yet simple. I wanted to know the identity of the program that had produced, from raw site activity, the underlying data that his dashboard was based on. As it turned out, he got the data from Stephen Few's book. I have that book on order so I'll check further when I get it.

Meanwhile, the point of this is my frustration with web analytics programs today, which are for the most part dressed-up tabulators with a sessionizer tacked on. You get the equivalent of cubes for data within a hit, hit data with first-hit extras (referrer etc), and some visitor history dimensions.

I want associations more than anything else. I got out of grad school umpteen years ago and this was an old idea back then.

Allison's (and presumably Few's) dashboard had, for example, these two lists:

Top 10 [pairs of] Products Purchased Together but Not Displayed Together
Top 10 [pairs of] Products Displayed Together but Not Purchased Together

Those two tables are worth a lot not only to web managers but product managers. Try getting them out of a web analytics program.
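For what it's worth, the counting behind those two lists is not hard once you have the within-visit data. This is a sketch under my own assumptions about the input: each element of groups_a or groups_b is the list of products that appeared together in one purchase or on one displayed page. I don't know how the dashboard's numbers were actually produced.

```python
from collections import Counter
from itertools import combinations


def pair_counts(groups):
    """Count how often each unordered product pair appears together."""
    counts = Counter()
    for group in groups:
        # sorted(set(...)) makes each pair unique and order-independent
        for pair in combinations(sorted(set(group)), 2):
            counts[pair] += 1
    return counts


def top_pairs_a_not_b(groups_a, groups_b, n=10):
    """Top n pairs that occur together in groups_a but never in groups_b."""
    a = pair_counts(groups_a)
    b = pair_counts(groups_b)
    only_a = Counter({pair: c for pair, c in a.items() if pair not in b})
    return only_a.most_common(n)
```

Call top_pairs_a_not_b(purchases, displays) for the first list and top_pairs_a_not_b(displays, purchases) for the second.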

PostScript: In response to my question, Rob Allison remarked that the stats didn't seem very sophisticated to him. He's right. But then, he's in R&D and let's not forget that he actually works for SAS (!) which means what is sophisticated for him is some kind of nirvana for us web analytics drones. But let's at least get web analytics out of the within-hit crosstab world.

Wednesday, August 02, 2006

Convergence on Excel --- who knew?

Having cut my analytic teeth on multivariate analysis tools, it's kind of a shock to realize that I've ended up focusing on a spreadsheet program. And that I'm not too unhappy about it. Of course, it means tossing out just about every consideration of statistical significance, interactions, or assumptions about normal distributions. Luckily with samples the size I'm working with now --- millions --- I don't feel too guilty about TOTALLY BETRAYING EVERYTHING MY PROFESSORS TAUGHT ME. haha, they're still struggling to get samples in the low four figures.

Anyway, at AdTech last week I "taught" a little session on some of the small things I do with Excel to explore the data that fall out of web analytics programs like WebTrends, Hitbox, Omniture. The response was quite nice even though none of the tricks were earthshaking or anything. The PowerPoint is here: