


Part III of IV: What is a Visitor Anyway?
By Dane Christensen
I, Robot
When you think "visitors", you think "people", right? Actually, a very large number of the visitors to your site aren't people; they're robots. Also known as spiders, crawlers or bots, these automated programs scour the web, searching and digging for information on behalf of their masters. There are many, many robots designed to perform various tasks, like indexing sites for search engines, harvesting e-mail addresses for spammers, or gathering data for price comparison sites. But there is one thing that all robots have in common: they never buy anything.
Since robots aren't people—they don't look at your pages, don't think about them, don't navigate around based on interesting links, and never click the checkout button—does it make sense to mix their behavior in with the rest of your human visitors' data? No. If their activity were mixed in, the quality and accuracy of your data would be severely compromised. Fortunately, ClickTracks goes to great lengths to ensure that robot visits don't pollute your data.
Search Engine Crawlers
The most common kind of robot is the search engine crawler, which we consider a "legitimate" robot. These crawlers are considerate: they do everyone a favor by identifying themselves in your logs with a distinctive user agent (user agent is the technical term for the client software making a request, such as a web browser). So, for example, requests by Google's robot show up in your logs with a user agent that includes the name "Googlebot". This means ClickTracks can easily filter "Googlebot" out of your logs when analyzing data about human visitors. Conversely, it also gives ClickTracks the data needed to provide the Robot report available in ClickTracks Optimizer and Pro.
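To make the idea concrete, here is a minimal sketch of user agent filtering. It is not ClickTracks' implementation. It assumes your server writes the common "combined" log format, in which the user agent is the last quoted field, and the function names and the one-entry token list are hypothetical, chosen only for illustration.

```python
import re

# Assumption: logs are in the "combined" format, where the user agent
# is the final quoted field on each line.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)

# Hypothetical list of crawler name fragments; a real list would be longer.
CRAWLER_TOKENS = ("googlebot",)


def user_agent_of(line):
    """Return the user agent string from one log line, or None if unparseable."""
    match = LOG_PATTERN.match(line.strip())
    return match.group("user_agent") if match else None


def is_search_crawler(line):
    """True when the line's user agent contains a known crawler name."""
    agent = user_agent_of(line)
    if agent is None:
        return False
    agent = agent.lower()
    return any(token in agent for token in CRAWLER_TOKENS)


def split_hits(lines):
    """Separate log lines into (other_hits, crawler_hits)."""
    others, crawlers = [], []
    for line in lines:
        (crawlers if is_search_crawler(line) else others).append(line)
    return others, crawlers
```

Keeping the crawler hits in their own list, rather than discarding them, mirrors the point above: the same identification that lets you filter crawlers out of visitor data also lets you report on them separately.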
More Friendly Robots
Besides the search engines' bots, there are a host of other crawlers dutifully scurrying around the Web in search of useful data. Wherever they go, they politely identify themselves by a unique user agent name, such as libwww, HealthMon, or check_http. ClickTracks keeps track of these friendly robots (currently about 70 of them) and simply ignores them in analysis. They don't show up in the Robot report either: visits from these robots won't have much correlation to actual traffic from real people on your site, so there's not much point in trying to keep track of them.
Tech Bots
Another class of bots monitors web servers to make sure they are alive and well. They work by requesting a page every few minutes and paging a techie if the page fails to load or takes too long. Because these bots visit so frequently, it's easy to see how they could have a huge impact on visitor counts.
Furthermore, marketing folks are often unaware that the IT guys have let such bots loose on the site. Be sure to ask your techies whether they're using such services and, if they are, ask them for the bot's IP address so you can add it to 'Exclusions' inside ClickTracks.
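For a sense of scale, a monitor that checks one page every five minutes (an assumed interval, for illustration) makes 24 × 60 / 5 = 288 requests a day. The sketch below shows the general idea of excluding hits by IP address. It does not describe how ClickTracks' 'Exclusions' feature works internally; the addresses come from ranges reserved for documentation, and the names are hypothetical.

```python
import ipaddress

# Hypothetical addresses of monitoring services, taken from ranges
# reserved for documentation. Replace with what your IT team gives you.
EXCLUDED_NETWORKS = [
    ipaddress.ip_network("192.0.2.50/32"),    # a single monitoring host
    ipaddress.ip_network("198.51.100.0/24"),  # a whole monitoring subnet
]


def is_excluded(ip_text, networks=EXCLUDED_NETWORKS):
    """True when the address falls inside any excluded network."""
    try:
        address = ipaddress.ip_address(ip_text)
    except ValueError:
        # Not a valid IP address, so there is nothing to match against.
        return False
    return any(address in network for network in networks)


def drop_excluded(hits):
    """Keep only hits whose first element (the IP) is not excluded.

    Each hit is assumed to be a tuple such as (ip, path).
    """
    return [hit for hit in hits if not is_excluded(hit[0])]
```

Matching on networks rather than single addresses is a design choice: monitoring services sometimes check from several machines, and one subnet entry covers them all.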
The Bad Bots
Finally, there are the illegitimate robots. These "rogue" bots identify themselves as standard user agents, like Internet Explorer or Mozilla, so they appear to be real people. Now, bear in mind that a robot doesn't really need a web browser; after all, it can't even see! While these bots could just as easily identify themselves with their own user agent (as all legitimate ones do), they don't. Why? Often, though not always, it's because they're up to no good.
These bad bots are up to things like snooping through private data and other nefarious deeds. They know that if they identify themselves clearly, webmasters will refuse to serve pages to them. So they sneak around the web, disguised as real people, snatching up pieces of data wherever they find them.
Will the Real Visitor Please Stand Up?
As in a science fiction movie, it is possible to tell the robots from the humans when you look closely. One telltale sign is that robots tend to look at lots of pages for very short periods of time, while people tend to look at only a few pages for longer periods of time. ClickTracks uses various behavior traits to try to filter out the rogue bots so they don't pollute your data. It's very difficult to profile a robot precisely, so no one can get it perfect, but ClickTracks' robot detection algorithm is among the best in the industry.
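The "many pages, very little time" sign can be sketched as a simple heuristic. This is not ClickTracks' detection algorithm, which is not described here. The thresholds, the `Session` structure, and the function names are all assumptions made up for illustration, and a real detector would combine several traits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    visitor_id: str
    # Time of each page view in seconds, sorted in ascending order.
    timestamps: List[float]


# Hypothetical thresholds; sensible values depend on the site.
MIN_PAGES = 20
MAX_AVG_SECONDS = 2.0


def average_gap(session):
    """Average seconds between consecutive page views, or None if under 2 views."""
    times = session.timestamps
    if len(times) < 2:
        return None
    return (times[-1] - times[0]) / (len(times) - 1)


def looks_like_robot(session, min_pages=MIN_PAGES, max_avg_seconds=MAX_AVG_SECONDS):
    """Flag sessions that view many pages with very little time on each."""
    gap = average_gap(session)
    if gap is None:
        return False
    return len(session.timestamps) >= min_pages and gap <= max_avg_seconds
```

Requiring both conditions is deliberate: a person who clicks quickly through three pages and a person who reads forty pages slowly should both still count as human.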
Robots, JavaScript and Log Files
Another important thing to know about robots is that they don't execute JavaScript—they just read files. That means that if you have a JavaScript method of data collection, you'll never see any data from robots. (That's also why you can only get the Robot Report if you use ClickTracks with log files—not the hosted solution.) If you've been using a hosted solution and you then switch to log files, or vice versa, you could see a significant change in your visitor statistics and all the stats that derive from them.
Whether you are comparing reports generated from logs with reports from hosted data, two date ranges where one had more robot activity than the other, or two web analytics programs that use different methods for screening robots, robots can make a significant difference in your results.
At ClickTracks, we take robots seriously, getting as much useful data from them as possible while not allowing them to throw off the results from your living, breathing, buying human visitors.