Friday, 13 March, 2009

WASP case study: incomplete page HTML

Symptom:

The WASP crawler report shows several pages from a site with "None" as the solution name. However, these pages are actually tagged, and Google Analytics reports the data correctly.

In Firefox itself, the WASP status icon and sidebar show the pages as tagged with Google Analytics, but again the crawler reports "None" for the same pages.

Diagnostic:

When a WASP crawl reports problems with tags on some pages, it points to a potential issue that needs further investigation. Even if the page appears to load correctly at first glance, we need to look for a plausible reason why WASP was unable to capture the tags on some pages while others are reported fine.

Resolution:

When loading some of the pages reported in error, I noticed that the page loading indicator in the status bar took slightly longer than usual, even though the page itself appeared to load correctly. Also, the WASP status bar did show that Google Analytics was present. A simple "View Source" revealed something quite interesting: the HTML for the page is incomplete and stops right after the GA tags.
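
The same check can be done without opening "View Source" on every page. Below is a minimal sketch in Python, assuming the page source has already been saved as a string. It is not how WASP works internally; the names looks_truncated and _EndTagCollector are hypothetical and exist only for this illustration. It flags a page when the closing body or html tag never shows up. HTML does allow those closing tags to be omitted, so treat a hit as a hint to look closer, not as proof of an error.

```python
from html.parser import HTMLParser


class _EndTagCollector(HTMLParser):
    """Records the name of every closing tag seen in the document."""

    def __init__(self):
        super().__init__()
        self.closed = set()

    def handle_endtag(self, tag):
        # HTMLParser passes tag names in lowercase.
        self.closed.add(tag)


def looks_truncated(html_text):
    """Return True when </body> or </html> never appears in html_text.

    Heuristic only: a valid page may legally omit both closing tags.
    """
    collector = _EndTagCollector()
    collector.feed(html_text)
    collector.close()
    return not {"body", "html"} <= collector.closed


# Illustrative inputs, not taken from the site in this case study.
complete_page = (
    "<html><head><title>ok</title></head>"
    "<body><p>content</p><script>var tag = 1;</script></body></html>"
)
cut_off_page = (
    "<html><head><title>ok</title></head>"
    "<body><p>content</p><script>var tag = 1;</script>"
)

print(looks_truncated(complete_page))
print(looks_truncated(cut_off_page))
```

Running this over the saved source of the pages reported as "None" would quickly show whether they all stop short in the same way.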

Conclusion:

This issue is a perfect demonstration of how a page might appear to load correctly from the user's perspective but still technically be in error. This seemingly minor issue does, in fact, have other side effects, since a page that is not well-formed HTML can impact your SEO.

2 comments:

Very interesting. So is there a way for the WASP status bar icon and the WASP crawler to display the same information?

Or are they *intentionally* using different methodologies to report a page's status? That is, is the status bar icon only meant to show the tag is present, while the crawler looks at a page in a slightly different way?

The fact that they didn't show the same info in this case isn't intentional. The crawler uses a timeout and moves on to the next page after a certain delay. Since the pages in error seemed to take longer to load, maybe that's why the WASP crawler didn't catch the tag.