How to Make Use of Google SiteMaps

Scheduling batch jobs to generate RSS feeds and similar files like sitemap.xml is far too complex a procedure for such a simple task, and this approach is fault-prone. It's better to implement your sitemap generator as a dynamic XML file, that is, a script reflecting the current state of your web site on each request. After submitting a sitemap to Google, you don't know when Googlebot will find the time to crawl your web site. Most probably you'll release a lot of content changes between the resubmission and Googlebot's visit. Also, crawlers of other search engines may become interested in your XML sitemap in the future. There are other advantages too, so you really should ensure that your sitemap reflects the current state of your web site every time a web robot fetches it.

You can use any file name for your sitemap. Google accepts what you submit; 'sitemap.xml' is just a default. So you can go for 'sitemap.php', 'sitemap.asp', 'mysitemap.xhtml' or whatever suits your scripting language, as long as the content is valid XML. However, there are good reasons to stick with the default 'sitemap.xml'. Here is an example for Apache/PHP:

Configure your web server to parse .xml files for PHP, e.g. by adding this statement to the .htaccess file in your web root:

AddType application/x-httpd-php .htm .xml .rss

Now you can use PHP in all .php, .htm, .xml and .rss files; your sitemap.xml behaves like any other PHP script. Note: static XML files will now produce a PHP error, because PHP mistakes the XML version header '<?xml ... ?>' for a (short) PHP open tag.
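The usual workaround, sketched below, is to emit the XML declaration from PHP instead of writing it literally; this is exactly what the $xmlHeader variable in the script further down does:

<?php
// With short_open_tag enabled, a literal "<?xml ..." line would be
// parsed as PHP. Printing the declaration avoids the conflict:
print '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
?>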

You don't need XML software to produce the pretty simple XML of Google's sitemap protocol. The PHP example below should be easy to understand, even if you prefer another programming language. Error handling as well as elegant programming has been omitted to keep the hierarchical XML structure transparent and understandable.

$isoLastModifiedSite = "";
$newLine = "\n";
$indent = " ";
if (!$rootUrl) $rootUrl = "";

$xmlHeader = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>$newLine";

$urlsetOpen = "<urlset xmlns=\"\"
$urlsetValue = "";
$urlsetClose = "</urlset>$newLine";

// Escape special characters so the URL is valid inside XML.
function makeUrlString ($urlString) {
    return htmlentities($urlString, ENT_QUOTES, 'UTF-8');
}

// Convert a 'Y-m-d H:i:s' date-time to an ISO 8601 / W3C timestamp.
function makeIso8601TimeStamp ($dateTime) {
    if (!$dateTime) {
        $dateTime = date('Y-m-d H:i:s');
    }
    if (is_numeric(substr($dateTime, 11, 1))) {
        // date and time given, e.g. 2005-06-04T12:30:45+00:00
        $isoTS = substr($dateTime, 0, 10) ."T"
                 .substr($dateTime, 11, 8) ."+00:00";
    }
    else {
        // date only, e.g. 2005-06-04
        $isoTS = substr($dateTime, 0, 10);
    }
    return $isoTS;
}

// Build one complete <url> element for the given page.
function makeUrlTag ($url, $modifiedDateTime, $changeFrequency, $priority) {
    GLOBAL $newLine;
    GLOBAL $indent;
    GLOBAL $isoLastModifiedSite;
    $urlOpen = "$indent<url>$newLine";
    $urlValue = "";
    $urlClose = "$indent</url>$newLine";
    $locOpen = "$indent$indent<loc>";
    $locClose = "</loc>$newLine";
    $lastmodOpen = "$indent$indent<lastmod>";
    $lastmodClose = "</lastmod>$newLine";
    $changefreqOpen = "$indent$indent<changefreq>";
    $changefreqClose = "</changefreq>$newLine";
    $priorityOpen = "$indent$indent<priority>";
    $priorityClose = "</priority>$newLine";

    $urlTag = $urlOpen;
    $urlValue = $locOpen .makeUrlString("$url") .$locClose;
    if ($modifiedDateTime) {
        $urlValue .= $lastmodOpen .makeIso8601TimeStamp($modifiedDateTime) .$lastmodClose;
        if (!$isoLastModifiedSite) { // last modification of web site
            $isoLastModifiedSite = makeIso8601TimeStamp($modifiedDateTime);
        }
    }
    if ($changeFrequency) {
        $urlValue .= $changefreqOpen .$changeFrequency .$changefreqClose;
    }
    if ($priority) {
        $urlValue .= $priorityOpen .$priority .$priorityClose;
    }
    $urlTag .= $urlValue;
    $urlTag .= $urlClose;
    return $urlTag;
}
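For illustration, a call with made-up values (the URL and timestamp below are placeholders) yields one complete <url> entry:

// Placeholder URL and timestamp, for illustration only.
print makeUrlTag("http://www.example.com/index.htm",
                 "2005-06-04 12:30:45", "weekly", "0.8");
// Prints:
//  <url>
//   <loc>http://www.example.com/index.htm</loc>
//   <lastmod>2005-06-04T12:30:45+00:00</lastmod>
//   <changefreq>weekly</changefreq>
//   <priority>0.8</priority>
//  </url>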

Now fetch the URLs from your database. It's a good idea to have a boolean attribute to exclude particular pages from the sitemap, and an indexed date-time attribute storing the last modification. Your content management system should expose the attributes ChangeFrequency, Priority, PageInSitemap and perhaps even LastModified on the user interface. Example query: "SELECT pageUrl, pageLastModified, pagePriority, pageChangeFrequency from pages WHERE pages.pageSiteMap = 1 AND pages.pageActive = 1 AND pages.pageOffsite <> 1 ORDER BY pages.pageLastModified DESC". Then loop over the result set:

$urlsetValue .= makeUrlTag ($pageUrl, $pageLastModified, $pageChangeFrequency, $pagePriority);
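Put together, here is a minimal sketch of that loop, assuming a PDO connection in $db; the table and column names are the hypothetical ones from the example query above:

// Minimal sketch: fetch the pages and accumulate the <url> entries.
$stmt = $db->query("SELECT pageUrl, pageLastModified, pagePriority,
                           pageChangeFrequency
                    FROM pages
                    WHERE pageSiteMap = 1 AND pageActive = 1
                          AND pageOffsite <> 1
                    ORDER BY pageLastModified DESC");
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $urlsetValue .= makeUrlTag($row['pageUrl'],
                               $row['pageLastModified'],
                               $row['pageChangeFrequency'],
                               $row['pagePriority']);
}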

After the loop you can add a few templated pages or scripts which are not stored as content pages, whether they change on each page modification or not:

if (!$isoLastModifiedSite) { // last modification of web site
    $isoLastModifiedSite = makeIso8601TimeStamp(date('Y-m-d H:i:s'));
}
$urlsetValue .= makeUrlTag ("$rootUrl/what-is-new.htm", $isoLastModifiedSite, "daily", "1.0");

Now write the complete XML. When dealing with a large number of pages, you should print each <url> tag on its iteration, followed by a flush(). If you publish tens of thousands of pages, you should provide multiple sitemaps and a sitemap index. Each sitemap file that you provide must contain no more than 50,000 URLs and must be no larger than 10MB.
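For reference, a sitemap index is itself a small XML file listing the locations of your sitemap files. A sketch using the same 0.84 schema as above; the URLs and dates are placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
 <sitemap>
  <loc>http://www.example.com/sitemap-fresh.xml</loc>
  <lastmod>2005-06-04T12:30:45+00:00</lastmod>
 </sitemap>
 <sitemap>
  <loc>http://www.example.com/sitemap-archive.xml</loc>
  <lastmod>2005-05-28T00:00:00+00:00</lastmod>
 </sitemap>
</sitemapindex>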

header('Content-type: application/xml; charset="utf-8"', true);
print "$xmlHeader$urlsetOpen$urlsetValue$urlsetClose";

Google will process all <url> entries where the URL begins with the URL of the sitemap file. If your website is distributed over many domains, provide sitemaps per domain. Subdomains and the 'www' prefix are treated as separate domains: for instance, a URL on 'http://example.com/' is not valid in a sitemap located on 'http://www.example.com/'. The script's output should be something like

<?xml version="1.0" encoding="UTF-8" ?>
<urlset xmlns="" xmlns:xsi="" xsi:schemaLocation="">

Feel free to use and customize the code above. If you do so, put this comment into each source code file containing our code:

/*
 * Do not remove this header
 * This program is provided AS IS
 * Use this program at your own risk
 * Don't publish this code, link to instead
 */



On large sites it may be a good idea to run the script that queries the database on another machine, to avoid web server slowdowns. Also, using the sitemap index file creatively can help: reserve one or more dynamic sitemap files for fresh content, and provide static sitemaps, updated weekly or so, containing all URLs. The sitemap tag of the sitemap index offers a lastmod tag to tell Google which sitemaps were modified since the last download. Use this tag to avoid downloads of unchanged static sitemaps.

Author: Sebastian
Last Update: Saturday, June 04, 2005
