A Robots.txt Tip for Politicians

I’m amazed at how many times I’ve heard some variant of: “But Senator/Governor, your website said in 2006 that (you hated children|opposed access to birth control|approved of the individual mandate).” There’s a simple way to let some things slip down the memory hole: block the Internet Archive’s bot. If you are a politician, you should IMMEDIATELY add the following to a robots.txt file in each of your server’s document roots:

ia_archiver is the name of the crawler for the Internet Archive. Luckily, it is a friendly bot that obeys robots.txt directives. The other three bots listed are just remnants from my default robots.txt file. Dotbot is the crawler behind SEOmoz’s Open Site Explorer, MJ12bot is MajesticSEO’s crawler, and Ahrefsbot is for, you guessed it, Ahrefs. As these are primarily SEO intelligence tools, blocking them makes sense if you are actively engaging in SEO. Politicians’ mileage may vary.
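The rules described above can be written out and checked before they go live. The following is a minimal Python sketch, assuming each of the four bots is blocked from the entire site with Disallow: / (a reconstruction from the description, not necessarily the exact file). It uses the standard library's urllib.robotparser; the example.com URL and the helper names are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Bot names as given in the text; robots.txt user-agent matching is
# case-insensitive. Blocking each one site-wide is an assumption.
BLOCKED_BOTS = ["ia_archiver", "Dotbot", "MJ12bot", "Ahrefsbot"]


def build_robots_txt(bots):
    """Return robots.txt text with one 'Disallow: /' group per bot."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(BLOCKED_BOTS)

# Parse the generated rules locally instead of fetching them from a server.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def is_allowed(user_agent, url="https://example.com/2006/position-paper.html"):
    """True if the rules let this user agent fetch the (placeholder) URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(robots_txt)
    for agent in BLOCKED_BOTS + ["Googlebot"]:
        print(agent, "allowed" if is_allowed(agent) else "blocked")
```

Crawlers that are not listed, such as Googlebot in the loop above, stay unaffected because the file has no catch-all User-agent: * group.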

In addition, every page should carry the following header:
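As a hedged illustration only: the sketch below assumes the header in question is a noarchive robots directive, which asks search engines not to offer a cached copy of the page, sent here as an X-Robots-Tag HTTP response header. That choice is an assumption, as are the WSGI setup and the names add_noarchive_header and demo_app.

```python
def add_noarchive_header(app):
    """Wrap a WSGI app so every response carries X-Robots-Tag: noarchive.

    The 'noarchive' value is an assumption about which header is meant.
    """
    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Drop any existing X-Robots-Tag so the header is not duplicated.
            headers = [h for h in headers if h[0].lower() != "x-robots-tag"]
            headers.append(("X-Robots-Tag", "noarchive"))
            return start_response(status, headers, exc_info)
        return app(environ, patched_start_response)
    return wrapped


def demo_app(environ, start_response):
    """Hypothetical stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>Hello</p>"]


application = add_noarchive_header(demo_app)

if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    # Serve locally; inspect the response headers with any HTTP client.
    with make_server("127.0.0.1", 8000, application) as server:
        server.serve_forever()
```

Setting the header in one wrapper keeps it on every page without touching individual templates.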

Finally, if you find something from your own website on Google that you need removed, you can always follow these directions; on Bing, follow these directions.



Page Load Testing – Background vs Inline Images

I was recently working to speed up a website and was getting horrible load times for relatively small images (50 KB) served from a large Amazon EC2 server. I decided to create two simple pages and test whether using background images made a noticeable difference. My two test variants look like this:

inline image test page

Source for background image test page

You will notice that I kept the same number of elements (of the same type) on each of the pages. I left the source blank for the background image test variant to prevent polluting the test.
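A setup along these lines can be generated with a short script. This is a sketch under stated assumptions, not the original test pages: the image paths, the tile size, the number of images, and the function names are all hypothetical, and "left the source blank" is read here as an img element with no src attribute.

```python
# Hypothetical image paths and count; the originals are not given.
IMAGE_URLS = [f"/images/photo{i}.jpg" for i in range(1, 6)]

# Assumed tile size so the background images have an area to paint.
STYLE = "<style>.tile { width: 200px; height: 150px; }</style>"


def render_page(title, tiles):
    """Wrap the tile markup in a minimal HTML document."""
    body = "\n".join(tiles)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>{title}</title>\n{STYLE}\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )


def inline_variant(urls):
    """Each div holds an img whose src points at the image."""
    tiles = [f'<div class="tile"><img src="{url}" alt=""></div>' for url in urls]
    return render_page("Inline image test", tiles)


def background_variant(urls):
    """Same div and img pairs, but the image is the div's background and
    the img has no src, so the img itself requests nothing."""
    tiles = [
        f'<div class="tile" style="background-image: url({url})"><img alt=""></div>'
        for url in urls
    ]
    return render_page("Background image test", tiles)


inline_html = inline_variant(IMAGE_URLS)
background_html = background_variant(IMAGE_URLS)

if __name__ == "__main__":
    for name, html in (("inline", inline_html), ("background", background_html)):
        print(name, "divs:", html.count("<div"), "imgs:", html.count("<img"))
```

Keeping the element counts identical means any timing difference between the two pages comes from how the images are referenced rather than from the size of the DOM.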

results: inline image test page

results: background image test page

The version with background images actually gave me a “DOM Ready” after 0.0225 seconds, while the fully loaded page load time was roughly the same as for the inline image version. Could using only background images speed up the firing of $(document).ready()? It turns out that background images are only downloaded once the element they belong to (the containing div, span, etc.) is available, so the round trips required to fetch the images no longer hold up the rest of the page.

Full test results can be seen here and here. Additional reading can be found here: