Friday, 13 May 2005

Robots.txt, The White House, Iraq And My Old Website

Even now, six months after moving my blog from http://users.chariot.net.au/~jktaheny/blogger/blog.htm to http://taheny.com, Google still ranks my old website address above this one for most searches. This is annoying, as I want visitors to come here and read 'Joe's up to date Ramblings'.

After some contemplation, today I added a robots.txt file to my old website. In theory, the following directives should stop all search engine robots from crawling the old blog:
User-agent: *
Disallow: /
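
As a rough check of what those two lines mean, here is a minimal sketch using Python's standard `urllib.robotparser` module. The helper name `is_allowed` and the choice of "Googlebot" as the user agent are assumptions made for illustration. It only tests the rules themselves; crawlers request the file from the root of the host (`/robots.txt`), so where the file is placed matters as well.

```python
from urllib.robotparser import RobotFileParser

# The two directives from the post: they apply to every robot ("*")
# and disallow every path on the site ("/").
RULES = """\
User-agent: *
Disallow: /
"""


def is_allowed(rules_text, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url.

    Hypothetical helper for illustration; it parses the rules from a string
    instead of downloading a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


old_blog = "http://users.chariot.net.au/~jktaheny/blogger/blog.htm"
print(is_allowed(RULES, "Googlebot", old_blog))
```

A well-behaved robot that honours these rules would skip the old blog address. An empty `Disallow:` line has the opposite meaning and permits everything.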

If the file works and the search robots (including Google) stop visiting my old blog, I hope and expect it to drop out of the search rankings and be replaced by my current blog. We will see.

When looking for robots.txt advice, I came across the White House's robots.txt file, which has over 2,000 lines. That many lines in a robots.txt file is not uncommon for large websites (whitehouse.gov has over 600,000 pages).

What I did find strange was that almost all the paths listed in the file were Iraq-related. Here is a random screenshot:

[Screenshot of part of the White House robots.txt file; image not available]

I know Iraq has been a major issue, but surely most of the White House web pages are not Iraq-related. If so, why does the White House not want so many Iraq pages spidered? Are they embarrassed by the mess they have made?
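
The "almost all" impression above could be checked by counting, rather than eyeballing, the disallowed paths. Below is a minimal sketch; the function names are hypothetical and the `SAMPLE` text is invented for illustration. It is not the real whitehouse.gov file.

```python
def disallowed_paths(robots_text):
    """Collect the non-empty paths from every Disallow line in a robots.txt text."""
    paths = []
    for line in robots_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def share_matching(paths, keyword):
    """Return the fraction of paths that contain keyword (case-insensitive)."""
    if not paths:
        return 0.0
    hits = sum(1 for path in paths if keyword.lower() in path.lower())
    return hits / len(paths)


# Invented sample, NOT taken from any real site.
SAMPLE = """\
User-agent: *
Disallow: /example/iraq/text
Disallow: /example/news/iraq
Disallow: /example/search
"""

paths = disallowed_paths(SAMPLE)
print(len(paths), share_matching(paths, "iraq"))
```

Running the same two functions over the text of a real robots.txt file would give the actual proportion of Iraq-related entries.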


UPDATE: The White House/robots.txt/Iraq issue has been covered many times before.
01:19    


© 2003-08, Joe Taheny