Declined submissions and error messages like "URL not under Sitemap path" are often caused by incorrect use of multiple server names. Why are example.com and www.example.com different when both serve the same content? How can you avoid the confusion?
Google Sitemaps KB ·
Many Web hosting services configure Web sites so that they are accessible under several addresses. This is meant to please users, but very often the supposed convenience causes all sorts of trouble, because the setup is left half done.
So what is a canonical server name? It's part of the URL of your site, http://www.example.com/. The canonical server name, or host name, consists of two or more components delimited by dots. From right to left, these are the TLD like com or net, the site's name (domain), and one or more optional sub-domain prefixes like www, mail, ftp, www.name.dept or city.state. By convention the www prefix is used to serve Web pages and ftp to host downloadable files; other prefixes stand for segments of a huge site, or separate development servers from the production system.
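To make the right-to-left reading concrete, here is a minimal Python sketch that splits a host name into the components described above. The function name split_host is made up for illustration, and the sketch assumes a single-label TLD such as com or net; suffixes made of several labels would need a public suffix list, which is ignored here.

```python
def split_host(host):
    """Split a host name into (prefixes, domain, tld), reading right to left.

    Assumption: the TLD is a single label (com, net, ...).
    """
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError("expected at least a domain and a TLD: %r" % host)
    tld = labels[-1]         # rightmost component
    domain = labels[-2]      # the site's name
    prefixes = labels[:-2]   # optional sub-domain prefixes, in written order
    return prefixes, domain, tld


print(split_host("www.example.com"))  # (['www'], 'example', 'com')
print(split_host("example.com"))      # ([], 'example', 'com')
```

Note that the two example hosts differ in their prefix list, which is exactly why they count as two different server names.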
Each server name is a unique address, like a phone number, and points to different content by default. Many sites host each sub-domain on its own computer, or even run multiple server computers per prefix. Those huge sites set the standard for small sites too. To allow for future scalability, one should make use of sub-domain prefixes, e.g. www for Web content.
In real life, however, zillions of Webmasters don't think in large-scale dimensions. They sign up at a hosting service, get the usual small business setup, and are happy that their sites respond to both example.com and www.example.com. Technically, however, both are still different servers, able to serve different content. To a client requesting pages from another address, for example a visitor's browser or a search engine crawler, it is not apparent that both servers pull their content from the same directory on the Web server's hard disk.
Search engines like Google have learned to deal with those incomplete setups. It works fine as long as they don't get confused by links that use both server names. A URL is a unique address; that is, http://www.example.com/page.html and http://example.com/page.html point to two different pages. Those pages may or may not carry the same content. Using both variants leads to dilution of link popularity (PageRank), unnecessary problems with duplicate content filters, and all kinds of other troubles that lower search engine visibility, which means lost traffic.
So what can I do to avoid those troubles? I'm pretty sure that I don't use both server names in my HTML code, but I cannot control external links and URL drops pointing to my site. Also, I want a visitor to be able to type in the shortened variant and actually land on my site.
First, use only one server name in URLs everywhere: business cards, flyers, TV and radio spots, Google Sitemaps, link submissions, and internal links as well. Your decision makes one of the two server names the canonical server name of your Web site. You must stick with the chosen server name; that is, you cannot change your mind later on.
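A quick way to check a list of URLs, such as the entries of a Sitemap or the internal links of a page, against the chosen server name is sketched below. The helper name non_canonical_urls, the CANONICAL_HOST value, and the sample URLs are assumptions made for this example.

```python
from urllib.parse import urlsplit

CANONICAL_HOST = "www.example.com"  # assumption: the server name you chose


def non_canonical_urls(urls, canonical_host=CANONICAL_HOST):
    """Return the URLs whose host differs from the canonical server name."""
    offenders = []
    for url in urls:
        host = urlsplit(url).hostname  # lower-cased host without port, or None
        if host != canonical_host:
            offenders.append(url)
    return offenders


sitemap_urls = [
    "http://www.example.com/",
    "http://www.example.com/page.html",
    "http://example.com/page.html",  # uses the other server name
]
print(non_canonical_urls(sitemap_urls))  # ['http://example.com/page.html']
```

Any URL the check reports should be rewritten to the canonical server name before the list is submitted or published.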
Second, make sure that all URLs containing the unused server name respond with a permanent redirect to the URL containing the canonical server name. If you've opted for the www prefix, for example, the URL http://example.com/page.html must respond with the HTTP status code 301 (Moved Permanently), which is a redirect rather than an error, telling the client (browser or crawler) that the page has to be requested from http://www.example.com/page.html.
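The redirect rule can be sketched as a small piece of WSGI middleware in Python. This is only an illustration under assumptions: the names canonical_redirect, site and demo are hypothetical, the www variant is taken as canonical, and the site is assumed to be served over plain http. On most hosts the same effect is achieved in the Web server configuration instead.

```python
CANONICAL_HOST = "www.example.com"  # assumption: the www variant was chosen


def canonical_redirect(app, canonical_host=CANONICAL_HOST):
    """Wrap a WSGI app so requests for any other host get a 301 redirect."""

    def middleware(environ, start_response):
        # The Host header may carry a port, e.g. "example.com:80".
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != canonical_host:
            location = "http://" + canonical_host + environ.get("PATH_INFO", "/")
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)

    return middleware


def site(environ, start_response):
    """Stand-in for the real site: always answers 200."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"page content"]


def demo(host, path):
    """Call the wrapped app with a fake request; return (status, Location)."""
    seen = {}

    def start_response(status, headers):
        seen["status"], seen["headers"] = status, dict(headers)

    canonical_redirect(site)({"HTTP_HOST": host, "PATH_INFO": path}, start_response)
    return seen["status"], seen["headers"].get("Location")


print(demo("example.com", "/page.html"))
# ('301 Moved Permanently', 'http://www.example.com/page.html')
print(demo("www.example.com", "/page.html"))
# ('200 OK', None)
```

The path and query string are carried over unchanged, so every page on the unused server name redirects to its exact counterpart rather than to the home page.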
If your site is hosted by a professional hosting service, you can ask for this setup. Unfortunately, many Web hosts have no idea why this is important, and will deny your request or simply tell you that it's impossible. In that case, go get a real host, or do it yourself; it's easy.
Tuesday, January 03, 2006 by Sebastian
Author: The Google Sitemaps Group
Last Update: December 10, 2005