ClickTracks
 
JavaScript vs Log Files - The Great Debate

Introduction
ClickTracks is a web analysis program heavily biased towards the presentation of marketing data. This document examines how the implementation choice will determine the quality and type of data made available for common marketing analysis needs.

The architecture of ClickTracks is extremely flexible. The software can gather the raw data from log files, from JavaScript or from a combination of the two. Since the software uses simple flat files and an open XML hierarchy, the implementer can easily devise complex systems that optimize use of each data source and combine them.

This document outlines the differences in how ClickTracks processes log files and JavaScript data. We assume that you, the reader, are already familiar with the general concept of log files (a file generated by the server and read later) and JavaScript (code inserted into each page of the site that pings a different server, often hosted by a third party).

Log files
Log files have been around since the beginning of the web, though they've only recently become interesting to marketing folks. Basic analysis of web data started with open source programs like Analog (note: the author of Analog is Dr. Stephen Turner, who is also the CTO of ClickTracks).

Within these early programs, log files could easily yield useful data like:

  • the bandwidth consumed on the server
  • 404 errors
  • peak usage
  • etc.
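
As a rough illustration of this kind of basic analysis, the Python sketch below totals bandwidth, counts 404 errors and tallies hits per hour. It assumes the server writes the Common Log Format; the function name `summarize` and the sample lines are made up for this example and are not part of Analog or ClickTracks.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines):
    """Return (total bytes served, 404 count per URL, hits per hour)."""
    total_bytes = 0
    not_found = Counter()
    hits_per_hour = Counter()
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        size = match.group("size")
        if size != "-":  # "-" means no body was sent
            total_bytes += int(size)
        parts = match.group("request").split()
        url = parts[1] if len(parts) >= 2 else ""
        if match.group("status") == "404":
            not_found[url] += 1
        # The time field looks like 10/Oct/2000:13:55:36 -0700
        hour = match.group("time").split(":")[1]
        hits_per_hour[hour] += 1
    return total_bytes, not_found, hits_per_hour


# Invented sample data for illustration only.
SAMPLE = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:58:02 -0700] "GET /missing.html HTTP/1.0" 404 512',
    '10.0.0.1 - - [10/Oct/2000:14:01:10 -0700] "GET /logo.gif HTTP/1.0" 304 -',
]

if __name__ == "__main__":
    total, missing, by_hour = summarize(SAMPLE)
    print(total, dict(missing), dict(by_hour))
```

The hour with the highest count in `hits_per_hour` gives a crude view of peak usage.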

More complex data requires more sophisticated analysis. Modern log analyzers like ClickTracks can also determine:

  1. Visitor sessions: These can be determined with acceptable accuracy if the analysis software is able to strip graphics files, then join distinct pages into a single visitor session, actively managing issues caused by dynamic IP addresses and session timeouts. ClickTracks uses well-established heuristic algorithms for managing all this.
  2. Session accuracy: If a session cookie is available, session accuracy is improved. ClickTracks Analyzer can handle standard session cookies generated by JSP, ASP and PHP. A custom session cookie can be configured in ClickTracks Pro.
  3. Unique visitors: ClickTracks Pro can calculate unique visitors if a unique persistent cookie is available and the Pro server is configured to use it (see ‘cookie tracking’ in the Pro Server). A database built into Pro manages the cookie and maps it back to the original campaign, even if many weeks have elapsed between clickthrough and purchase.
  4. Heuristics: Tracking visitors across multiple sites/domains is usually done by falling back on heuristics when the cookie is dropped. Cookies are often not transferred when the user moves from domain to domain, so a fallback mechanism is needed.
  5. Robots and spiders: Robots and spiders are automatically filtered/placed into other reports. In ClickTracks, a robot is identified both through simple useragent checks, and also using more detailed pattern recognition within the session.
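
The session-building steps above can be sketched in a few lines of Python. This is a simplified illustration, not the ClickTracks algorithm: it assumes hits arrive as `(timestamp, ip, user_agent, url)` tuples, uses an assumed 30-minute timeout, identifies a visitor by IP address plus user agent, and filters robots with a simple user agent check only. It does not attempt to handle dynamic IP addresses or pattern recognition within the session.

```python
from datetime import datetime, timedelta

IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".ico")
ROBOT_MARKERS = ("bot", "crawler", "spider")  # illustrative list only
SESSION_TIMEOUT = timedelta(minutes=30)       # assumed value


def build_sessions(hits):
    """Group (timestamp, ip, user_agent, url) hits into lists of page URLs."""
    sessions = []       # every session found, in order of first hit
    open_sessions = {}  # (ip, user_agent) -> (last_seen, pages)
    for timestamp, ip, agent, url in sorted(hits):
        if url.lower().endswith(IMAGE_SUFFIXES):
            continue  # strip graphics files
        if any(marker in agent.lower() for marker in ROBOT_MARKERS):
            continue  # simple user agent check for robots
        key = (ip, agent)
        previous = open_sessions.get(key)
        if previous is None or timestamp - previous[0] > SESSION_TIMEOUT:
            pages = []  # first hit, or the previous session timed out
            sessions.append(pages)
        else:
            pages = previous[1]
        pages.append(url)
        open_sessions[key] = (timestamp, pages)
    return sessions


# Invented sample data for illustration only.
T0 = datetime(2004, 1, 1, 12, 0)
HITS = [
    (T0, "10.0.0.1", "Mozilla/4.0", "/index.html"),
    (T0 + timedelta(seconds=1), "10.0.0.1", "Mozilla/4.0", "/logo.gif"),
    (T0 + timedelta(minutes=5), "10.0.0.1", "Mozilla/4.0", "/products.html"),
    (T0 + timedelta(minutes=50), "10.0.0.1", "Mozilla/4.0", "/index.html"),
    (T0 + timedelta(minutes=6), "10.0.0.9", "ExampleBot/1.0", "/index.html"),
]

if __name__ == "__main__":
    for pages in build_sessions(HITS):
        print(pages)
```

When a session cookie is present in the log, its value could replace the `(ip, agent)` key, which is why a cookie improves session accuracy.
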

Problems with log files
Much has been said about log files and their disadvantages. To summarize:

They aren't plug and play: Although ClickTracks can take advantage of both session and persistent cookies, the fact remains that these must be present in the log file, and it’s the responsibility of the web site manager to set them. While there’s nothing complicated about doing this, some companies can't gather the IT resources needed. In the long term, online businesses should do this themselves, but other factors can delay or prevent it.

They aren't 100% accurate: Caching of pages by ISPs and proxies can distort the data and lead to inaccuracies. For a while this was a major differentiator promoted by vendors selling only JavaScript solutions, which suffer less from caching problems. In general the number of cached pages has declined as the cost of maintaining the cache hardware has outweighed the cost of the bandwidth saved. Nevertheless, log files are somewhat inaccurate.

JavaScript
JavaScript (sometimes known as ‘client side tagging’, ‘page tagging’ or erroneously as ‘cookies’) requires some code to be inserted into each page that will allow that page to be tracked. When the page is loaded in the end user's browser, a request is sent to a server (often part of a third-party service) and the data is collected.

JavaScript quickly gained popularity because of the ease with which one can generate reliable visitor session data. Since the script is able to set its own cookie, companies that need good session data can get it without needing the IT department to set a session cookie. When IT resources are already spread thin, it’s useful to move responsibility for setting the cookie elsewhere. By extension, JavaScript can also set a persistent cookie.

JavaScript can also more easily parse data from the contents of the page, data that is often not available in the URL. The total purchase value of a shopping cart is a good example.

JavaScript-based tracking also nicely sidesteps the problem of tracking over multiple domains, since the session cookie exists inside the domain where the data is gathered, and not the domains of the site.
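
To make the data flow concrete, the Python sketch below shows what the collecting server might do with the request a page tag sends. Everything about the request is an assumption for illustration: the host `collector.example.com`, the path `hit.gif` and the parameter names `page`, `ref` and `total` are hypothetical and are not the format used by ClickTracks.

```python
from urllib.parse import parse_qs, urlsplit


def parse_beacon(request_url):
    """Turn a page-tag request URL into a flat hit record."""
    query = parse_qs(urlsplit(request_url).query)

    def first(name, default=""):
        values = query.get(name)
        return values[0] if values else default

    total_text = first("total")
    return {
        "page": first("page"),
        "referrer": first("ref"),
        # The cart total is read from the page by the script, so it is
        # only present on pages where the tag found one.
        "cart_total": float(total_text) if total_text else None,
    }


# Hypothetical request, as the tag on an order confirmation page might send it.
BEACON = (
    "http://collector.example.com/hit.gif"
    "?page=%2Fcheckout%2Fthanks&ref=%2Fcart&total=149.95"
)

if __name__ == "__main__":
    print(parse_beacon(BEACON))
```

Because the purchase value travels in the request itself, it reaches the collector even though it never appears in the URL of the page being viewed.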

Problems with JavaScript

  1. It can't capture everything: Some server activity, such as redirects and PDF downloads, is opaque to the script. No tagged page loads in the browser, so nothing is recorded.
  2. It doesn't offer technical stats: Log file analysis is still needed for technical stats like bandwidth/404s. You always end up needing both.
  3. It's trapped in third-party Neverland: In almost all cases your data is trapped on a third party service. As you grow and become frustrated with your present system, you must weigh up the problem inherent in switching.
  4. It's not 100% accurate: While more accurate than log files, it’s still not perfect. For example, JavaScript errors, DNS failures and other glitches result in no data being recorded, whereas a log file would still capture the request.
  5. It causes instability: Pages become more unreliable as more JavaScript is added. The problem shows up as a general loss of reliability rather than as easily identified failure points.
  6. It presents cookie issues: The cookies issued are ‘third party’ in that they do not originate from the domain hosting the web pages. For session cookies this is OK but persistent cookies require special handling through P3P and compact privacy headers. Our present JavaScript code handles this, but implementers should be aware that future changes to IE and other browsers might clamp down on third party cookies.
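
As an illustration of the persistent cookie handling mentioned in the last item, the Python sketch below builds the two response headers a collecting server might send: a `P3P` header carrying a compact policy and a `Set-Cookie` header for a long-lived visitor cookie. The cookie name `visitor`, the one-year lifetime and the compact policy tokens are placeholders chosen for this example. A real compact policy must reflect the site's actual privacy policy, and this is not the code shipped with ClickTracks.

```python
import uuid
from http.cookies import SimpleCookie

# Placeholder compact policy; real tokens must match the site's P3P policy.
COMPACT_POLICY = 'CP="NOI DSP COR NID"'


def visitor_cookie_headers(visitor_id=None, days=365):
    """Return (name, value) response headers for a persistent visitor cookie."""
    cookie = SimpleCookie()
    cookie["visitor"] = visitor_id or uuid.uuid4().hex
    cookie["visitor"]["path"] = "/"
    cookie["visitor"]["max-age"] = days * 24 * 60 * 60  # lifetime in seconds
    return [
        ("P3P", COMPACT_POLICY),
        ("Set-Cookie", cookie["visitor"].OutputString()),
    ]


if __name__ == "__main__":
    for name, value in visitor_cookie_headers():
        print(f"{name}: {value}")
```

A session cookie would simply omit the `max-age` attribute, which is why it needs no special handling.
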

ClickTracks combines both
ClickTracks is one of a handful of vendors that supports both the log file and JavaScript approaches, and gives the customer total flexibility. Log files are simple, effective and inexpensive to process, and in truth most of our customers choose this approach.

Some customers also need JavaScript because of problems with tracking across all their domains, or their need to parse ROI data from a complex shopping cart, or simply because they prefer the convenience. In this situation we supply our customers with the JDC (JavaScript Data Collector).

The JavaScript Data Collector is a set of simple open scripts that permit the customer to use our well-proven JavaScript code, and to implement and host it themselves. This eliminates the concern of being dependent on our servers and of having data trapped on a third-party system. The JDC can be freely mixed with log files, with some subdomains tracked by log files and some by the JDC, all combined into a single view with accurate tracking as the user moves from one domain to another.
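
Conceptually, combining the two sources comes down to merging two time-ordered streams of hits. The Python sketch below shows the idea under stated assumptions: each source yields `(timestamp, visitor_id, url)` tuples already sorted by time, and the same visitor identifier is available in both. The record layout and the function name `merge_hits` are hypothetical and do not describe the actual JDC or ClickTracks file formats.

```python
import heapq


def merge_hits(log_hits, jdc_hits):
    """Merge two time-ordered hit streams into one list, tagging the source."""
    tagged_log = ((ts, visitor, url, "log") for ts, visitor, url in log_hits)
    tagged_jdc = ((ts, visitor, url, "jdc") for ts, visitor, url in jdc_hits)
    # heapq.merge assumes each input is already sorted by the key.
    return list(heapq.merge(tagged_log, tagged_jdc, key=lambda hit: hit[0]))


# Invented sample data: timestamps are seconds since an arbitrary start.
LOG_HITS = [
    (100, "v1", "http://www.example.com/index.html"),
    (160, "v1", "http://www.example.com/products.html"),
]
JDC_HITS = [
    (130, "v1", "http://shop.example.com/cart"),
    (200, "v1", "http://shop.example.com/thanks"),
]

if __name__ == "__main__":
    for hit in merge_hits(LOG_HITS, JDC_HITS):
        print(hit)
```

Once merged, the combined stream can be grouped by visitor to follow a single path that crosses from a log-tracked subdomain to a JDC-tracked one.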

The JDC runs as a CGI-bin program and is written in Perl, with an admin interface in PHP. All the code is open and customizable by the customer. A typical installation runs on an inexpensive co-located Linux server. It’s also possible to install this on a Windows server.

It’s your choice
Web analytics is a complex subject, but the underlying technology is relatively simple. There are, after all, only two ways to get the data. As a company, ClickTracks strives to describe its technology openly and to avoid confusing customers with complex claims of technological superiority. We claim superiority in the way the software presents the data, but the underlying process of counting visitor stats is mature and stable. The techniques have been unchanged for the past five years.

Our open approach extends to your choice of how to gather the data. We aim to help you make the right choice for your business.

