No, Google Sitemaps is a robots inclusion protocol lacking any syntax for deletions. Remove deleted URLs from the XML file, and ensure your server responds with 404 or 410 to Googlebot.


Deleted and renamed pages must be removed from your Google Sitemap. Having invalid or redirecting URLs in a sitemap burns resources, and blows up the crawler problem reports. Once Google knows a URL, Googlebot will try to fetch it more or less until the server dies.

Google Sitemaps is an instrument to submit new Web objects and content changes to Google's crawler, as an addition to the regular crawling process. Google Sitemaps is in no way a catalogue of all URLs per Web server where Googlebot ignores URLs not included in the sitemap. Thus deleting a URL entry from a sitemap will not stop Googlebot from requesting it again and again.

The only way to tell a search engine crawler that a page has vanished is via the HTTP response code. Google provides an additional method to remove URLs from Google's index immediately, that is, before the next (regular) crawl, but the removal procedure has disadvantages, e.g. it does not delete URLs forever.

The HTTP protocol defines return codes to tell a user agent (browser, crawler ...) the status of a resource (URL). If a page is found at the requested address (URL), and its content can be delivered to the user agent, the Web server sends a header containing the return code 200 OK to the user agent, before it sends the content. Otherwise it sends an error code. With static Web objects (HTML pages, images ...) this happens in the background, although the Webmaster can configure specific return codes for particular areas or resources. With dynamic pages the Webmaster can set the HTTP return code sent to the user agent per page. The most important HTTP return codes usable for moved and deleted resources (URLs) are explained below:
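
To see which return code a server really sends for a URL, the request has to be made without following redirects, because many HTTP clients silently follow a 301 or 302 and only report the final 200. The following is a minimal Python sketch using the standard library's http.client. The function names fetch_status and describe_status, and the wording of the notes, are made up for illustration and are not part of any Google tool:

```python
import http.client
from urllib.parse import urlsplit

# Short notes on the return codes discussed in this article (our wording).
STATUS_NOTES = {
    200: "OK: content delivered, URL may stay in the sitemap",
    301: "Moved Permanently: list only the new address in the sitemap",
    302: "Found: temporary redirect, avoid for intentional redirects",
    404: "Not Found: crawler may request the URL again",
    410: "Gone: resource removed permanently",
}


def describe_status(code):
    """Return a short note for an HTTP return code."""
    return STATUS_NOTES.get(code, "other return code: %d" % code)


def fetch_status(url, timeout=10):
    """Send one HEAD request and return (status, Location header or None).

    http.client does not follow redirects, so a 301 or 302 is reported as is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

A call such as fetch_status("http://www.example.com/directory/page.html") returns the code and, for redirects, the new location; describe_status(code) turns the code into a readable note. Some servers answer HEAD requests differently from GET requests, so treat the result as a first check only.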


HTTP return code 404 - Not Found

The 404 return code is a generic error code, used if a resource is not available, and the server does not know, or does not want to reveal, whether the resource is permanently gone or just temporarily unavailable or blocked. Usually the Web server is configured to send the user agent a custom error page, which still carries the 404 error code in the header, and provides the visitor with information (e.g. error messages) and options (e.g. links to related resources). On Apache Web servers this can be done in the .htaccess file located in the root directory:

ErrorDocument 404 /error.htm?errno=404

Because the user agent does not know whether the resource has vanished or not, it might request it again. Therefore the 404 code is not suitable to tell a search engine crawler that it should forget a resource.

Google provides a procedure to delete URLs responding with a 404 code from its databases. Go to Googlebot's URL Console and create an account. You should use the same email address and password as for the Google Sitemaps account. Once the account is active, you can submit dead links found in your Google Sitemaps Stats, server error logs etc. under "Remove an outdated link". It can take five days until the deletion is completed. Every time you log in, you get a status report stating which submitted URLs are not yet removed. Ensure that during this process the URL responds with a "404 not found" error code.


HTTP return code 410 - Gone

The 410 return code tells the user agent that a resource has been removed permanently. Search engine crawlers usually mark resources responding with a 410 code as delisted, and do not request them again. That's not always the case with Google's supplemental index, where dead resources can still appear in search results, even years after their deletion. A 404/410 return code may move a cached resource from the current search index to the supplemental index. However, if a page was deleted and there is no forwarding address (e.g. a new page with similar content), the Web server should send a 410 header. It's good style to make use of a custom error page for human visitors.


HTTP return code 302 - Found (Elsewhere)

The 302 return code tells the user agent that the requested URL is temporarily unavailable, but the content is available from another address. In the 302 header the server gives the user agent a new location (URL), and the user agent will then request this resource. For various reasons a Webmaster should avoid 302 redirects, because they lead to all sorts of trouble with search engines. The most common cause of 302 responses is an invalid URL used in internal links and link submissions, e.g. one with a missing trailing slash (see valid URLs). Unfortunately, 302 is the default return code for most redirects, for example Response.Redirect(location) in ASP, header("Location: $location") in PHP, RewriteRule as well as ErrorDocument 404 http://www.example.com/page(!!) in Apache's .htaccess directives. All server-side programming languages provide methods to set the redirect response code to 301.


HTTP return code 301 - Moved Permanently

The 301 redirect code tells the user agent that a resource has been moved and will never be available at the old address again. All intentional redirects (e.g. renamed URLs, moved URLs ...) must send the requesting user agent a 301 header with the new permanent address. Many scripts make use of redirects to 'link' to external resources, usually because this is a simple way to track outgoing traffic. That's a lazy and wacky hack, but if it cannot be avoided, the script should at least do a permanent redirect.
As for deleted pages, often it makes sense to 301-redirect requests instead of sending a dead page error (404 or 410), especially when there is a page with similar content available on the Web server and other sites link to the deleted page.
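
The choice between 301, 410 and 404 described in the sections above can be sketched as a small WSGI application in Python. This is only an illustration under assumptions: the lookup tables MOVED, GONE and LIVE and the example paths in them are hypothetical, and a real site would fill them from its CMS or database:

```python
# Hypothetical lookup tables; a real site would load these from its own data.
MOVED = {
    "/directory/page.html":
        "http://www.example.com/other-directory/other-page.html",
}
GONE = {"/directory/deleted-page.html"}
LIVE = {"/", "/other-directory/other-page.html"}


def app(environ, start_response):
    """WSGI application that picks the return code per requested path."""
    path = environ.get("PATH_INFO", "/")
    if path in MOVED:
        # Renamed or replaced page: permanent redirect to the new address.
        start_response("301 Moved Permanently",
                       [("Location", MOVED[path]), ("Content-Length", "0")])
        return [b""]
    if path in GONE:
        # Deleted page without a forwarding address.
        status, text = "410 Gone", "This page has been removed."
    elif path in LIVE:
        status, text = "200 OK", "Page content."
    else:
        # Unknown URL: the server cannot say whether it ever existed.
        status, text = "404 Not Found", "Page not found."
    body = text.encode("utf-8")
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() serves the application on port 8000. The plain-text bodies stand in for the custom error pages a production site would send to human visitors.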


Examples of 301 redirects

To ensure your redirects send a 301 response code to the user agent, you can copy and paste the code examples below. The first examples are for Apache's .htaccess files:

#1 301-redirects a page:
RedirectPermanent /directory/page.html http://www.example.com/other-directory/other-page.html

#2 Alternate syntax:
Redirect 301 /directory/page.html http://www.example.com/other-directory/other-page.html

#3 301-redirects all example.com/* requests to www.example.com/*:
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Please note, that with both Redirect(2) and RedirectPermanent(1) the first location parameter (source) is a URL relative to the Web server's root, and the second location parameter (target) is a fully qualified absolute URL. The third .htaccess example makes use of the mod_rewrite module and redirects all requests of URLs on example.com to the corresponding URL on www.example.com.
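
The logic of the third example can be mirrored in a few lines of Python, which may help to check which requests would be redirected before touching the server configuration. This is a sketch of the rule's intent, not of mod_rewrite itself, and the function name canonical_redirect is made up for illustration:

```python
import re

# Same pattern as the RewriteCond above; re.IGNORECASE stands in for [NC].
WWW_HOST = re.compile(r"^www\.example\.com", re.IGNORECASE)


def canonical_redirect(host, path):
    """Return the 301 target for a request, or None if the host is canonical.

    `path` is the requested path without its leading slash, which is what
    RewriteRule matches in a .htaccess file in the root directory.
    """
    if WWW_HOST.search(host):
        return None  # the negated condition fails, so no redirect happens
    return "http://www.example.com/" + path
```

For example, canonical_redirect("example.com", "directory/page.html") yields the www address of the same page, while canonical_redirect("www.example.com", "directory/page.html") yields None. The sketch ignores the query string, which mod_rewrite passes on unchanged when the substitution contains none.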

If you suffer from IIS, go to the "Home Directory" tab and click the option "Redirection to a URL". As "Redirect to" enter the destination, for example "http://www.example.com$S$Q", without a slash after ".com" because the path ($S placeholder) begins with a slash. The $Q placeholder represents the query string. Next check "Exact URL entered above" and "Permanent redirection for this resource", and submit. If you don't have both versions (with and without the "www" prefix) configured, create the missing one first.

Or you can use ASP to redirect page requests:

'VBScript:
Dim newLocation
newLocation = "http://www.example.com/other-directory/other-page.asp"
Response.Status = "301 Moved Permanently"
Response.AddHeader "Location", newLocation
Response.End

// JScript:
function RedirectPermanent(newLocation) {
  Response.Clear();
  Response.Status = "301 Moved Permanently";
  Response.AddHeader("Location", newLocation);
  Response.Flush();
  Response.End();
}
...Response.Buffer = true;...
...RedirectPermanent("http://www.example.com/other-directory/other-page.asp");

The ASP page script must be terminated after sending the 301 header, and you must not output any content, not even a single space, before the header. Everything after Response.End will not be executed.

The same goes for PHP:

$newLocation = "http://www.example.com/other-directory/other-page.php";
header("HTTP/1.1 301 Moved Permanently", TRUE, 301);
header("Location: $newLocation");
exit;


If you are on a free host and if you don't care whether your stuff gets banned by search engines or not, you can use META refreshes:

<META HTTP-EQUIV=Refresh CONTENT="0; URL=http://www.example.com/other-directory/other-page.htm">

or JavaScript:

window.location = "http://www.example.com/other-directory/other-page.htm";

and intrinsic event handlers:

<body onLoad="setTimeout(function () { location.href='http://www.example.com/other-directory/other-page.htm'; }, 0)">

to redirect. Again, do not use any client-side redirects if you're keen on search engine traffic, especially not the sneaky methods from the examples above. Google automatically discovers sneaky redirects and deletes all offending pages or even complete domains from its search index, mostly without a warning.


Checklist "Delete a page"

  • Delete the URL entry in the Google Sitemap, then resubmit the sitemap
  • Ensure the server responds with the correct HTTP error code
  • Remove or change all internal links pointing to the deleted page
  • Ask Webmasters of other sites linking to the deleted page to change or remove their links
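
The first step of the checklist can be scripted. The following Python sketch removes the entries of deleted pages from a sitemap's XML text with the standard library's xml.etree.ElementTree. It assumes the usual layout of a urlset element containing url entries with a loc child, and it reuses whatever namespace the file itself declares; the function name remove_urls is made up for illustration:

```python
import xml.etree.ElementTree as ET


def remove_urls(sitemap_xml, deleted):
    """Return sitemap XML text without the <url> entries listed in `deleted`.

    `deleted` is a set of fully qualified URLs, as written in <loc>.
    """
    root = ET.fromstring(sitemap_xml)
    # Tags look like "{namespace}urlset"; keep the "{namespace}" prefix.
    ns = root.tag[:root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    if ns:
        # Write the namespace back as the default one (no "ns0:" prefixes).
        ET.register_namespace("", ns[1:-1])
    for url in list(root.findall(ns + "url")):
        loc = url.find(ns + "loc")
        if loc is not None and (loc.text or "").strip() in deleted:
            root.remove(url)
    return ET.tostring(root, encoding="unicode")
```

The result omits the XML declaration, so add it again when writing the file, and validate the sitemap before resubmitting it. Removing the entries does not replace the second step of the checklist: the server still has to answer requests for the deleted URLs with the correct return code.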


Before you delete pages, consider an archive. Archiving (outdated) content under the current URL preserves existing traffic and comes with lots of other advantages. Archiving is an easy task: just change the page template and/or the navigation links. Smart content management systems (CMS) will archive a page with one mouse click. Changing the URL is a bad idea, because all incoming links become invalid.



Saturday, October 29, 2005


Author: Sebastian
Last Update: Friday, October 28, 2005 [DRAFT]

Copyright © 2004, 2005 by Smart IT Consulting · Reprinting except quotes along with a link to this site is prohibited