Simple fact: bad bots can screw up your web analytics data.

Interesting post from Marshall Sponder this morning, himself referencing Jay Harper on SEOmoz and Judah Phillips. So I'll add my two cents from the "bot" side of things, specifically regarding the Web Analytics Solution Profiler (WASP).
No surprise

From a server-side perspective, IT has known since the early days of Yahoo! that crawlers would affect web server logs. If you want some fun historical tidbits, look at those early posts:
"I count the accesses to my page to see if it's being used . Similarly, I browse through the access logs to see _who_'s using the page" then someone replying "This ablity to know *exactly* what someone looks like is going to be very very sigificant down the road." (March 1995: Visitor counts?)
This is not new; it just seems marketing took a long time to find out :)
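To make the server-side point concrete, here is a minimal sketch of tagging likely crawler hits in a web server log. It assumes an Apache "combined" log format and a naive user-agent token list; the file name and tokens are illustrative assumptions, not how WASP or any particular analytics tool works. Note that it flags bot hits rather than dropping them:

```python
import re

# Illustrative token list: Googlebot, Yahoo! Slurp, and friends.
BOT_TOKENS = ("bot", "crawler", "spider", "slurp")

# Apache combined log: IP ident user [time] "request" status bytes "referrer" "user-agent"
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) \S+ "([^"]*)" "([^"]*)"'
)

def tag_hits(path):
    """Yield (is_bot, ip, request, user_agent) for each parseable log line."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = LINE_RE.match(line)
            if not m:
                continue
            ip, _time, request, _status, _referrer, agent = m.groups()
            is_bot = any(tok in agent.lower() for tok in BOT_TOKENS)
            yield is_bot, ip, request, agent

if __name__ == "__main__":
    hits = list(tag_hits("access.log"))  # hypothetical log file name
    bots = sum(1 for is_bot, *_ in hits if is_bot)
    print(f"{bots} of {len(hits)} hits look like crawlers")
```

Real crawlers are messier than a four-token list, of course, but even this crude pass makes the point: the bots have been in the logs all along.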
The web analyst role

Data cleansing and validation are an essential part of the web analyst's job, and whenever there's a spike in traffic you should be able to explain it. As Avinash said a while back: "data quality sucks, let's just get over it".
Depending on your experience and skills, and of course on the web analytics solution you are using, it might be fairly easy to identify misbehaving visitors and spot outliers. The next step is to segment the data to exclude what shouldn't be there. Not "delete": "exclude". I recently saw a post suggesting the creation of Google Analytics filters to completely get rid of non-US traffic for a US-centric website. Don't do that... you still want to know where your unqualified traffic is coming from! Whether it comes from outside your geographic market, from bad keywords or referrers, or from anything else, what you think of as "unqualified traffic" can still help you optimize your site and even uncover new opportunities. Segment, segment, segment... a small sketch of the idea follows.
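Here is a minimal sketch of "exclude, don't delete" using pandas. The column names and the qualification rule (US-only, to echo the example above) are illustrative assumptions; the point is that every row stays in the dataset and a flag drives the segmentation:

```python
import pandas as pd

# Toy visit data; real exports would come from your analytics tool.
visits = pd.DataFrame({
    "country":   ["US", "US", "CA", "FR", "US"],
    "keyword":   ["buy widgets", "widgets", "free stuff", "widgets", "buy widgets"],
    "pageviews": [5, 3, 1, 2, 8],
})

# Flag rather than drop: non-US traffic stays in the dataset.
visits["qualified"] = visits["country"].eq("US")

qualified = visits[visits["qualified"]]
unqualified = visits[~visits["qualified"]]

print("Qualified pageviews:", qualified["pageviews"].sum())

# The "unqualified" segment is still there to mine for new opportunities.
print(unqualified.groupby("country")["pageviews"].sum())
```

A filter would have thrown the CA and FR rows away for good; a segment lets you report on qualified traffic while keeping the rest around to study.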
The upcoming version of WASP will include the following options: