How to Make Use of Google SiteMaps · Index · Part 1 · 2 · 3 · 4 · 5 · 6 · 7 · 8 · 9 · 10 · 11 · Expand · Web Feed

As said before, it's not necessary to put every URL available on a web site into the sitemap. Google encourages webmasters to submit even images and movies, which doesn't make much sense without META data describing the content [1]. Google SiteMaps was launched to give webmasters an opportunity to tell Google which pages they consider valuable for search engine users. For example, if your contact page behaves dynamically depending on the referring page, you don't need to submit every permutation to Google. Also, don't bother submitting URLs excluded in your robots.txt. And, although it should be a no-brainer: don't submit doorway pages, duplicated content and the like. Chances are good that Google will ignore your sitemaps after a while if you cheat.

Concentrate your efforts on pages which are hard to spider, for example dynamic URLs with many arguments in the query string, pages linked from dynamic pages, and pages deeply buried in your linking hierarchy. If you're using session IDs, provide Google with clean URLs (with all randomly generated noise stripped). In the sitemap you can use long dynamic URLs of up to 2048 characters.
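The clean-up of session IDs can be sketched in a few lines of Python. This is only an illustration under assumptions: the parameter names in SESSION_PARAMS are hypothetical examples and depend on your application, and the function name clean_url is made up for this sketch. URLs that still exceed the length limit after cleaning are rejected.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; adjust them to your own application.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}
MAX_URL_LENGTH = 2048  # length limit mentioned in the text above


def clean_url(url):
    """Return the URL without session parameters, or None if it is too long."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    # The fragment is dropped as well, since it does not identify a page.
    cleaned = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
    return cleaned if len(cleaned) <= MAX_URL_LENGTH else None


print(clean_url("http://www.example.com/shop.php?cat=5&PHPSESSID=abc123&item=42"))
```

Matching parameter names case-insensitively is a design choice of this sketch, so that PHPSESSID and phpsessid are treated alike.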

Mass submissions of URLs are nothing new, but the ability to suggest how a search engine crawler should handle them is new and pioneering. Google's sitemap protocol defines three optional attributes of URLs: priority, change frequency and last modification. If you can't provide a particular attribute for a page (yet), skip it. A <url> tag containing the page location alone is perfectly valid. Add additional information where you can, but don't populate these tags with more or less useless values just because they are defined.
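To illustrate the point about optional attributes, here is a minimal Python sketch that builds a <url> element and simply leaves out every attribute that is not known. The function name build_url_element and the example.com URLs are assumptions made for this example, and the surrounding <urlset> container is omitted.

```python
import xml.etree.ElementTree as ET


def build_url_element(loc, lastmod=None, changefreq=None, priority=None):
    """Build a <url> element; attributes that are None are left out."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:
        ET.SubElement(url, "lastmod").text = lastmod
    if changefreq is not None:
        ET.SubElement(url, "changefreq").text = changefreq
    if priority is not None:
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return url


# Location only: still a valid entry.
minimal = build_url_element("http://www.example.com/")
# All three optional attributes provided.
full = build_url_element(
    "http://www.example.com/article.html",
    lastmod="2005-06-04",
    changefreq="weekly",
    priority=0.8,
)
print(ET.tostring(minimal, encoding="unicode"))
print(ET.tostring(full, encoding="unicode"))
```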

The most important tag is <lastmod>, which tells Google when a page was actually modified or created. This enables Googlebot to fetch fresh content in a targeted manner, probably long before it finds the very first link pointing to it by accident. By the way, changes of this attribute in the underlying database should trigger a sitemap resubmission. It seems to be important, and in the webmaster's own best interest, to avoid abusing <lastmod>. Minor template changes affecting a bunch of pages are no reason to report all pages based on the altered template as modified. Real modifications are changed wording, additional text information and brand new content.
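One possible way to keep <lastmod> honest is to fingerprint only the page's own text, without the template, and to move the date only when that fingerprint changes. The following Python sketch assumes a hypothetical per-page record (a dict with the keys "fingerprint" and "lastmod"); how the body text is separated from the template is left to your CMS.

```python
import hashlib
from datetime import date


def content_fingerprint(body_text):
    """Hash the page's own text, ignoring whitespace-only differences."""
    normalized = " ".join(body_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def updated_lastmod(record, new_body_text, today):
    """Return (record, changed); lastmod moves only if the content changed.

    `record` is a hypothetical dict with the keys "fingerprint" and "lastmod".
    """
    fingerprint = content_fingerprint(new_body_text)
    if fingerprint == record["fingerprint"]:
        return record, False
    return {"fingerprint": fingerprint, "lastmod": today.isoformat()}, True


record = {"fingerprint": content_fingerprint("Hello world"), "lastmod": "2005-05-01"}
record, changed = updated_lastmod(record, "Hello brave new world", date(2005, 6, 4))
print(record["lastmod"], changed)
```

The returned flag could serve as the trigger for a sitemap resubmission.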

The <priority> tag is meant as a hint to balance crawling capacities. Say a sitemap contains 10,000 modified URLs, but Googlebot's time slot scheduled for the web site in question allows fetching only 1,000 pages. In that case Googlebot should extract the 1,000 URLs ranked highest by priority, and probably by last modification, from the sitemap, fetch these pages, and return later to eat the remaining 9,000 pages.
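The scenario above can be modelled with a simple sort. This Python sketch only illustrates the idea; nothing is known here about how Googlebot really schedules its fetches, and the function name pick_crawl_batch and the entry layout are assumptions.

```python
def pick_crawl_batch(entries, budget):
    """Return up to `budget` entries, highest priority and newest first."""
    ordered = sorted(
        entries,
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        key=lambda entry: (entry["priority"], entry["lastmod"]),
        reverse=True,
    )
    return ordered[:budget]


entries = [
    {"loc": "/a", "priority": 0.5, "lastmod": "2005-06-01"},
    {"loc": "/b", "priority": 0.9, "lastmod": "2005-05-01"},
    {"loc": "/c", "priority": 0.9, "lastmod": "2005-06-03"},
    {"loc": "/d", "priority": 0.1, "lastmod": "2005-06-04"},
]
batch = pick_crawl_batch(entries, 2)
print([entry["loc"] for entry in batch])
```

With a budget of two, the two pages with priority 0.9 are picked, the more recently modified one first.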

Google says: 'Search engines use this information when selecting between URLs on the same site, so you can use this tag to increase the likelihood that your more important pages are present in a search index.' This statement made many site owners hope that they could influence rankings on Google's SERPs. That's wishful thinking. It simply means that Googlebot will possibly crawl high-priority URLs before low-priority ones.

Assign reasonable priorities from 0.0 to 1.0 to your pages. For example, a brand new article should get a higher priority than the more or less static home page. Priorities are interpreted relative to other pages on the same web site. The best advice is: honestly assign high priorities to frequently changed pages which are of great interest to your users, and low priorities to static stuff.

The <changefreq> tag seems to be meant as an educated guess, just a hint to the crawler. The list of valid values is short: "always", "hourly", "daily", "weekly", "monthly", "yearly" and "never". Irregular changes are not covered, so assign your best guess, or skip the tag and rely on <lastmod>. "Never" stands for archived content. Use "always" for frequently updated news feeds and other stuff triggering content changes on (nearly) every page view.
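A best guess for <changefreq> could be derived from a page's modification history. The Python sketch below averages the gaps between past modification dates; the day thresholds are arbitrary assumptions, not part of the protocol, and the function returns None when there is too little history, so the tag can be skipped.

```python
from datetime import date

VALID_CHANGEFREQ = (
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
)


def guess_changefreq(modification_dates):
    """Guess a <changefreq> value from past modification dates.

    Returns None when there is too little history to guess from.
    The day thresholds are arbitrary assumptions, not part of the protocol.
    """
    if len(modification_dates) < 2:
        return None
    ordered = sorted(modification_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    average_gap = sum(gaps) / len(gaps)
    if average_gap <= 1:
        return "daily"
    if average_gap <= 10:
        return "weekly"
    if average_gap <= 60:
        return "monthly"
    return "yearly"


history = [date(2005, 5, 1), date(2005, 5, 8), date(2005, 5, 15)]
print(guess_changefreq(history))
```

The values "always", "hourly" and "never" are deliberately not guessed here, since daily dates cannot reveal them; set those by hand for news feeds or archived content.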


[1] META data describing non-textual content means title/alt text in image elements, anchor text in links, and surrounding text, as well as META description tags. HTML pages get crawled more frequently than images or videos. Image and video URIs harvested during regular crawls get queued into their specific crawling schedules. Since there is a relation between descriptive META data and non-textual content, it makes sense to submit all kinds of content via sitemaps. It sure helps Google to keep its image and video search more current.

Author: Sebastian
Last Update: Saturday, June 04, 2005


Copyright © 2004, 2005 by Smart IT Consulting · Reprinting except quotes along with a link to this site is prohibited