As I work in search engine marketing – I frequently test my sites' page load times just to make sure I'm getting my content to visitors as quickly as possible. I don't spend a lot of time worrying about my blog – until I posted an embedded PDF recently and noticed a significant performance drop-off. I ran this site through the great tool at Webpagetest.org and was appalled to see 10+ second load times! I dug into the numbers and saw that my Scribd embed was hogging bandwidth like a sonofabitch. There are basically two players in this field – Scribd and DocStoc – so I figured I would test page load time with each. The results are pretty interesting:
DocStoc loads a Flash object – while Scribd loads up a remote page via an iframe. Intuitively one might expect Flash to perform worse – but in this case the biggest culprit is the sheer volume of third-party scripts Scribd embeds. Depending on the browser, Scribd makes between 49 and 63 requests compared to DocStoc's consistent 12. Additionally, Scribd serves up over twice the total page weight (552-743KB vs DocStoc's 257KB). Scribd clearly isn't concerned enough about the user experience of their embedded documents.
Some notes about the test – first off, it's clearly unscientific, but the difference in load time was such that further tests seemed pretty pointless. I used the default embed code from each site and uploaded the same simple text document to each (a simple robots.txt file). Each page was otherwise identical and loaded no other external resources. The pages can be seen here and here.
I'm amazed at how many times I've heard some variant of: "But Senator/Governor, your website said in 2006 that (you hated children|opposed access to birth control|approved of the individual mandate)." There's a simple solution to letting some things slip down the memory hole… namely blocking the Internet Archive bot. If you are a politician you should IMMEDIATELY add the following to a robots.txt file in all of your servers' document roots:
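The original snippet isn't shown above, but based on the four bots discussed below, it presumably looked something like this (the user-agent strings are my best reconstruction, not a verbatim copy of the original file):

```
User-agent: ia_archiver
Disallow: /

User-agent: dotbot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /
```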
ia_archiver is the name of the Internet Archive's crawler. Luckily it is a friendly bot that obeys robots.txt directives. The other three bots listed are just remnants from my default robots.txt file: Dotbot is the crawler behind SEOmoz's Open Site Explorer, MJ12bot is MajesticSEO's crawler, and AhrefsBot is for – you guessed it – ahrefs.com. As these are primarily SEO intelligence tools, blocking them makes sense if you are actively engaging in SEO. Politicians' mileage may vary.
In addition, every page should carry the following header:
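The header itself didn't survive in this copy of the post. Given the goal of keeping pages out of caches and archives, a plausible candidate – and this is my assumption, not the original text – is the `noarchive` directive sent as an HTTP response header:

```
X-Robots-Tag: noarchive
```

This tells compliant crawlers not to store a cached copy of the page, and unlike a meta tag it works for non-HTML resources like PDFs as well.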
Finally – if you find something of your own that you need removed from Google's index, you can always follow these directions; on Bing, follow these directions.
While reading through the Google Analytics Event Tracking Guide, I came across this nugget:
In general, a "bounce" is described as a single-page visit to your site. In Analytics, a bounce is calculated specifically as a session that triggers only a single GIF request, such as when a user comes to a single page on your website and then exits without causing any other request to the Analytics server for that session. However, if you implement Event Tracking for your site, you might notice a change in bounce rate metrics for those pages where Event Tracking is present. This is because Event Tracking, like page tracking, is classified as an interaction request.
Of course, having read several SEO-related posts identifying bounce rate as a ranking factor (or at a minimum a quality signal), I devised a way to game it.
<div id="header" onMouseOver="pageTracker._trackEvent('bounce', 'bouncecheck', 'Look Ma No Bounce');">
I figure a mouseover on my header will be triggered often enough to dramatically drop my bounce rate without looking too artificially manipulated. I also wonder whether a body onLoad handler would be overkill, or whether it would fire before the pageview GIF request even completes. Anyway, after implementing this for one day, you can see the huge difference below.
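For context, a minimal page wiring this up with the classic synchronous ga.js tracker might look like the following sketch. The `UA-XXXXXX-X` property ID is a placeholder, and the event names simply mirror the snippet above:

```html
<html>
<head>
<!-- Load ga.js over http or https depending on the current page -->
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
// First interaction: the normal pageview GIF request
var pageTracker = _gat._getTracker("UA-XXXXXX-X");
pageTracker._trackPageview();
</script>
</head>
<body>
<!-- Second interaction: any mouseover on the header fires an event,
     so the session is no longer counted as a bounce -->
<div id="header" onMouseOver="pageTracker._trackEvent('bounce', 'bouncecheck', 'Look Ma No Bounce');">
  ...
</div>
</body>
</html>
```

Note that repeated mouseovers will fire repeated events; a guard variable that only sends the event once per page load would be a little kinder to your event quota.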