WASP: a good tool to audit site tags
There's an interesting thread on the Yahoo! Web Analytics forum: "What is a good tool/methodology to audit a large site for tags?" As often happens, my answer was getting long! It started like this:
"I'm starting a new job with a big, complicated site, and want to get a handle on its data collection strengths and weaknesses. Any suggestions?"
Potential solutions
Even without lots of details, and as others pointed out, we can identify a couple of potential solutions. Of course, WASP comes to mind first... but let's look at other alternatives:
- Maxamine's flagship product, now owned by Accenture, is the obvious one. But the price tag is high and it seems it's not available as a standalone product anymore. So one would have to hire Accenture consulting, which is probably an issue for lots of smaller companies, web and analytics agencies, etc.
- EpikOne's SiteScanGA offers a Premium subscription. But unless your site is tagged with Google Analytics, this is of little use... Furthermore, SiteScanGA doesn't actually look at your site; it looks at whatever Google Search has in its cache. So it won't scan any "no-index,no-follow" areas of your site (secured sections, transactions, etc.), and since Google might not have your latest page version, this further lowers the value of SiteScanGA.
- Debugging proxies and manual methods: if you are a techy, Charles and Firebug are your friends. But who would want to walk through tens of thousands of pages manually?
Poor guy's solution
Someone suggested that "if one had access to the actual server, could you not literally download the entire site to a local machine, then do a grep for the relevant text?"
That would be a very poor guy's (and very bad) way of doing quality assurance! Here's an analogy: would you trust your site works correctly just because you can see the company logo on the page?
It would only tell you the source code is there; it wouldn't guarantee that the code works and is actually sending the right data. I think that's how EpikOne SiteScanGA and other basic tag checking tools work: they don't run the code, they simply look for a string in the source file.
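To make the grep problem concrete, here is a minimal Python sketch of a source-only check. The marker strings and the sample page are assumptions chosen for illustration; this is not how SiteScanGA or any other tool is actually implemented.

```python
import re

# Illustrative marker strings only (assumption): real tag snippets
# vary by vendor and by version of the tag.
TAG_MARKERS = {
    "Google Analytics": re.compile(r"urchinTracker|google-analytics\.com"),
    "SiteCatalyst": re.compile(r"s_code\.js"),
}


def static_tag_check(html):
    """Report which marker strings appear anywhere in the raw page source."""
    return {name: bool(pattern.search(html)) for name, pattern in TAG_MARKERS.items()}


# Made-up page: the tag is in the source, but inside an HTML comment,
# so a browser would never execute it.
page = """<html><body>
<!-- <script src="http://www.google-analytics.com/urchin.js"></script>
<script>urchinTracker();</script> -->
</body></html>"""

result = static_tag_check(page)
print(result)
```

The check reports Google Analytics as present even though no data would ever be sent from this page. A string match cannot tell a working tag from a commented-out or broken one.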
Here comes WASP!
As you can see in a case study I made on a site using SiteCatalyst, there are things you simply cannot find unless you actually load the page and run the code. From the start, WASP's goal was to run "in situ", in the context of a real user. This makes WASP a uniquely robust way of doing quality assurance.
Someone suggested that since most sites are based on templates (CMS, ecommerce catalogues, etc.), why not check just one page of each template? There are two problems with that:
- "You don't know what you don't know": ask site owners if all their pages are tagged and at best they will say "yes"; usually they will say "I think so"... WASP often tells otherwise by crawling all pages and finding whole sections that were missed! Static pages and, more surprisingly, transactions are usually the areas where tags are missing or bad.
- "All pages using a template are alike": yes and no. Since tags are often populated automatically from the data filled in the template, there might be cases where unexpected values are set: special characters, blanks, missing values, wrong data types, values outside the acceptable range, etc. Since the person who controls the template is not the same as the one who populates it with data, the likelihood of errors is significant.
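The kinds of bad values listed above can be checked mechanically once the values sent by a page have been captured. Here is a minimal Python sketch; the parameter names, the rules and the captured values are all hypothetical and do not come from WASP or from any vendor's specification.

```python
def validate_tag_values(values, rules):
    """Return a list of (parameter, problem) pairs for one captured tag call."""
    problems = []
    for name, rule in rules.items():
        raw = values.get(name)
        if raw is None:
            problems.append((name, "missing"))
        elif raw.strip() == "":
            problems.append((name, "blank"))
        elif rule["type"] == "number":
            try:
                number = float(raw)
            except ValueError:
                problems.append((name, "wrong data type"))
                continue
            low, high = rule["range"]
            if not (low <= number <= high):
                problems.append((name, "out of range"))
        elif rule["type"] == "text":
            if any(char in raw for char in rule["forbidden"]):
                problems.append((name, "special characters"))
    return problems


# Hypothetical rules for a product page template (assumption).
RULES = {
    "pageName": {"type": "text", "forbidden": "<>\"'"},
    "category": {"type": "text", "forbidden": "<>\"'"},
    "price": {"type": "number", "range": (0.0, 10000.0)},
    "quantity": {"type": "number", "range": (1, 100)},
}

# Made-up values as they might be sent by one page built from that template.
captured = {"pageName": "Product <b>X</b>", "category": "  ", "price": "free"}

problems = validate_tag_values(captured, RULES)
print(problems)
```

Checking one page per template would only run these rules against one set of data. The errors come from the data, so every page has to be looked at.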
Of course, you don't want to inflate your site statistics with a visit that would span thousands of page views. WASP already offers different ways to filter out your data: modify the User Agent string or block your IP address. A future release will include a "stealth mode" that will actually block the call from sending the data.
Ok, I need WASP now!
Hold on to your hat for a few more weeks. I'm literally working overnight to deliver WASP v1.0. In the meantime, you can get started with the beta release and help identify bugs, issues and feature requests in the WASP support forum.
I'm already receiving lots of requests for the commercial version but I want to make sure it's stable and well tested before selling it. I can tell you I've run scans of 30,000 pages in a single pass without major issues.
Case studies wanted!
I'm also looking to write other case studies. Put WASP (and me) to the challenge! Send me a note with your web analytics implementation audit challenge and I will consider it for a future case study. For free! You just have to allow me to mention you, your company or your client in my case study. Good deal, isn't it? :)
Note: I'm putting the final touches on a video that will demonstrate a WASP crawl in action. Stay tuned!


6 comments:
Hi Stephane - you know I'm a fan of your solution and of you, so I want to correct one thing. The former Maxamine/now Accenture solution is still available as a stand-alone solution. While we can now provide consulting services as well (something we could not do before), people can still purchase the software itself. For people looking for cheap or free, or for organizations with few and relatively small sites, you are right - it is not a good fit. It is an enterprise level product and our customers tend to be organizations with very large web operations.
So there's room for all of us :-)
Best,
Debbie
Debbie: thank you for following up on this! I'm glad to hear the product is still available.
I heard some people are using Maxamine and WASP concurrently since the two approaches are slightly different.
I agree, there's room for all of us! :)
Hi Stephane,
I only discovered WASP recently and it is a great tool indeed. I still have to see how I can potentially use it for doing a large audit on our site - the recursive feature seems not to work correctly on our sites (still have to test further, however). Looking forward to the final commercial version.
Now regarding quality audit - WASP is good to detect that tags are run and view/report values sent. Great. But what is the best way to run "stress"-testing against your tags? I mean to know in what % your tags are run correctly, especially in different conditions (slow connection, heavy workload on server, pages with heavy content,...).
Regards,
Michaël
Michael: it depends on whether you want to stress your website/web application or the web analytics tool itself. For web testing, you could look at high-end tools like LoadRunner, or use the built-in feature in MS Visual Studio or some open source alternatives (search for "open source web stress test").
Regarding stress testing the tool itself, other than checking your page load time/latency, I wouldn't really advise doing it. In fact, the thing you should watch for is that your web analytics solution doesn't interfere with the "perceived" user experience, regardless of what the technical measurement tells you. Most web analytics vendors take great care in this respect, so unless you really mess up your implementation, this shouldn't be an issue.
Well we put our tag scripts at the end of the page so interference is minimal (from a customer point of view).
But - maybe I'm saying something silly :-) - how can anyone evaluate in what % the script is correctly run in real life condition.
Is the script run correctly in 98%? 95%? 80% of cases? (when JavaScript is activated).
So I can say to the business: data collected for the site is XX% accurate.
Or am I chasing the impossible? :-)
Cheers,
Michael
Good point! Some reasons why the tag wouldn't fire:
- JavaScript error on the page
- User exits the page before the onload event
You wouldn't be able to reproduce this from stress testing.
What I would suggest is to do an A/B test:
a) put the tags last
b) put the tags first (it shouldn't even be noticeable to the user)
Let it run for a while and then check the difference...
Or better yet, double tag the pages at the top and the bottom and send each one to a different report suite/profile/account.
You might want to check out a study that was made by StoneTemple Consulting at http://www.stonetemple.com/articles/analytics-report-august-2007.shtml
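As a rough sketch of how the double-tagging comparison could be read, the ratio of the two counts gives an estimate of how often the bottom tag fires when the top one does. The page view numbers below are made up for illustration, and the function name is hypothetical.

```python
def bottom_tag_capture_rate(top_page_views, bottom_page_views):
    """Share of page views seen by the top tag that the bottom tag also recorded."""
    if top_page_views <= 0:
        raise ValueError("top_page_views must be positive")
    return bottom_page_views / top_page_views


# Made-up counts, one per report suite/profile/account, over the same period.
rate = bottom_tag_capture_rate(top_page_views=10000, bottom_page_views=9400)
print(f"Bottom tag captured {rate:.1%} of the page views seen by the top tag")
```

This only compares the two tag positions with each other. Page views where neither tag fired (JavaScript disabled, for instance) stay invisible to both.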