Most of you know I’m a fan of the sitemaps.org project. I love that the major search engines (Google, Yahoo, MSN, and now Ask) are all collaborating on supporting this standard.
Google just announced that there have been some important updates to the project. Most notable to me is the fact that they’ve added another method of notifying the search engines of your sitemap XML file’s location.
Previously, we had these two methods for submitting sitemaps to search engines:
- Upload it via the search engine’s webmaster tools interface
- Send an HTTP request or ping to the search engine
A third method is now supported: the robots.txt file can contain a Sitemap directive specifying the sitemap XML file's location, like this:
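```
Sitemap: http://www.example.com/sitemap.xml
```

(with the placeholder URL replaced by the location of your own sitemap file, of course)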
The addition of this one line of text will tell the search engine where to find your sitemap XML file.
The previous methods are still supported, and there is value in using a combination of them. Our content management system, for example, has had sitemap XML support built in for quite some time, so we'll soon be adding support for this new robots.txt method as well.
Here is my breakdown of the value provided by each method.
Submit via the Interface
This gives you visual confirmation that the sitemap you uploaded was valid and received by the search engine. It also shows the date last uploaded. There are services out there that will generate a sitemap XML file based on your website URL, which you can then upload into your webmaster account.
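If you've never looked inside one of these files, a minimal sitemap XML document (with a placeholder URL standing in for your own pages) is just an entry per page:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
  </url>
</urlset>
```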
Send an HTTP Request
This is great for a systems-level approach because it can be done programmatically. It also has the advantage of being a “push” method which means no waiting around for the search engine spiders to show up and find your new sitemap file. Using this method requires some programming; it isn’t easily done manually.
For example, Tweak will update your sitemap XML file and send a ping to the search engine when you make changes to your website. This could be done manually but would be pretty cumbersome.
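The ping itself is just an HTTP GET request with your sitemap's location as a query parameter. Here's a minimal sketch in Python, using Google's documented ping endpoint as the example (each search engine publishes its own):

```python
from urllib.parse import quote
from urllib.request import urlopen

# Example ping endpoint; each search engine documents its own.
GOOGLE_PING = "http://www.google.com/webmasters/sitemaps/ping"

def build_ping_url(ping_base, sitemap_url):
    # URL-encode the sitemap location so it survives as a query parameter.
    return ping_base + "?sitemap=" + quote(sitemap_url, safe="")

def ping_search_engine(ping_base, sitemap_url):
    # An HTTP 200 response means the ping was received; it says nothing
    # about whether the sitemap itself parsed cleanly.
    with urlopen(build_ping_url(ping_base, sitemap_url)) as response:
        return response.status
```

A CMS would call something like this right after rewriting the sitemap file, which is essentially what the Tweak example above automates.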
Add a Line to robots.txt
This is great for both a systems-level approach and a manual one. It's easy enough to crack open a robots.txt file, add the single line, and save your changes to the server. It has the disadvantage of being a "pull" method, meaning you have to wait for the search engine spiders to pull a copy of your robots.txt file. Overall, though, it's the simplest, cleanest method, and unless your site is brand new, the search engines are probably spidering it often enough to negate this concern.