Page Load Testing: Embedding With DocStoc and Scribd

As I work in search engine marketing, I frequently test my sites’ page load times just to make sure I’m getting my content to visitors as quickly as possible. I don’t spend a lot of time worrying about my blog – until I posted an embedded PDF recently and noticed a significant performance drop-off. I ran this site through a page load testing tool and was appalled to see 10+ second load times! I dug into the numbers and saw that my Scribd embed was hogging bandwidth like a sonofabitch. There are basically two players in this field – Scribd and DocStoc – so I figured I would test page load time with each. The results are pretty interesting:

Page      Browser   Load Time   Fully Loaded   Requests   Bytes In
DocStoc   Chrome    0.616s      4.324s         12         258 KB
DocStoc   Firefox   1.295s      3.681s         12         258 KB
DocStoc   IE8       1.348s      3.900s         12         258 KB
Scribd    Firefox   7.800s      8.556s         49         557 KB
Scribd    Chrome    8.378s      9.198s         57         552 KB
Scribd    IE8       15.269s     15.755s        63         742 KB

DocStoc loads a Flash object, while Scribd loads up a remote page via an iframe. Intuitively one might expect Flash to perform worse – but in this case the biggest culprit is the sheer volume of third-party scripts Scribd is embedding. Depending on the browser, Scribd makes between 49 and 63 requests compared to DocStoc’s consistent 12. Additionally, Scribd serves up more than twice the total page weight that DocStoc does (552–742 KB vs. 258 KB). Scribd clearly isn’t concerned enough about user experience with their embedded documents.

Some notes about the test – first off, it’s clearly unscientific, but the difference in load time was large enough that further tests seemed pretty pointless. I used the default embed code from each site and uploaded the same simple text document to each (a simple robots.txt file). Each page was otherwise identical and loaded no other external resources. The pages can be seen here and here.
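Incidentally, if you want to sanity-check numbers like these without a third-party tool, the Navigation Timing API reports comparable figures right in the browser console – a minimal sketch, assuming a browser that exposes performance.timing:

window.addEventListener('load', function () {
  // loadEventEnd isn't populated until the load handler finishes, so defer one tick
  setTimeout(function () {
    var t = performance.timing;
    console.log('Fully loaded: ' + ((t.loadEventEnd - t.navigationStart) / 1000) + 's');
  }, 0);
});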

A few items of interest – Scribd is including Twitter, Facebook, and Google+ external JavaScript files, which is probably not as noticeable on many sites, but is still a very heavy bandwidth decision. The one item that stands apart, though, is Scribd’s decision to include a Quantcast tracking code. Looks like a bid to inflate their Quantcast numbers, judging by their traffic there.

A Robots.txt Tip for Politicians

I’m amazed at how many times I’ve heard some variant of: “But Senator/Governor, your website said in 2006 that (you hated children|opposed access to birth control|approved of the individual mandate).” There’s a simple solution to letting some things slip down the memory hole… namely, blocking the Internet Archive’s bot. If you are a politician, you should IMMEDIATELY add the following to a robots.txt file in all of your server’s document roots:
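# ia_archiver is the Internet Archive's crawler; Disallow: / blocks it site-wide
User-agent: ia_archiver
Disallow: /

# leftovers from my default robots.txt (SEO crawlers, explained below)
User-agent: dotbot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /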

ia_archiver is the name of the crawler for the Internet Archive. Luckily, it is a friendly bot that obeys robots.txt directives. The other three bots listed are just remnants from my default robots.txt file: DotBot is the crawler behind SEOmoz’s Open Site Explorer, MJ12bot is MajesticSEO’s crawler, and AhrefsBot is for – you guessed it – Ahrefs. As these are primarily SEO intelligence tools, blocking them makes sense if you are actively engaging in SEO. Politicians’ mileage may vary.

In addition, every page should carry the following header:
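X-Robots-Tag: noarchive

The noarchive directive tells Google and Bing not to keep a cached copy of the page, so old versions can’t be pulled up from the search engines’ caches either.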

Finally – if you find something on your own website that you need removed, on Google you can always follow these directions, and on Bing follow these directions.

Lower Your Bounce Rate With One Line of Code

While reading through the Google Analytics Event Tracking Guide, I came across this nugget:

In general, a “bounce” is described as a single-page visit to your site. In Analytics, a bounce is calculated specifically as a session that triggers only a single GIF request, such as when a user comes to a single page on your website and then exits without causing any other request to the Analytics server for that session. However, if you implement Event Tracking for your site, you might notice a change in bounce rate metrics for those pages where Event Tracking is present. This is because Event Tracking, like page tracking, is classified as an interaction request.

Of course, having read through several SEO-related posts identifying bounce rate as a ranking factor (or at a minimum a quality signal), I devised a way to game it.

<div id="header" onMouseOver="pageTracker._trackEvent('bounce', 'bouncecheck', 'Look Ma No Bounce');">
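For what it’s worth, if you’re running Google’s newer asynchronous snippet instead of the classic pageTracker object, the equivalent call (a sketch with the same category/action/label) queues the event through _gaq:

<div id="header" onMouseOver="_gaq.push(['_trackEvent', 'bounce', 'bouncecheck', 'Look Ma No Bounce']);">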

I figure a mouseover on my header will be triggered often enough to dramatically drop my bounce rate without looking too artificially manipulated. Plus, I wonder if a body onLoad handler would be overkill, or would fire faster than the initial GIF request. Anyway, after implementing this for one day, you can see the huge difference below.

Now, while my method is obviously aimed solely at gaming the system, there are some legitimate uses. A few examples that come to mind where firing off an event makes sense: video plays, the end of a JavaScript animation, newsletter signups, RSS subscriptions – and there are probably a thousand more.
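To take the first example, wiring the same trackEvent call to an HTML5 video’s play event only takes a few lines – a minimal sketch, assuming the classic ga.js pageTracker object from above and a hypothetical 'promo-video' element id:

// 'promo-video' is a hypothetical id for an HTML5 video element on the page
var video = document.getElementById('promo-video');
video.addEventListener('play', function () {
  // same category/action/label pattern as the header example above
  pageTracker._trackEvent('video', 'play', 'promo-video');
});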