How to Make Use of Google SiteMaps · Index · Part 1 · 2 · 3 · 4 · 5 · 6 · 7 · 8 · 9 · 10 · 11 · Expand · Web Feed


A few weeks after Google's launch of SiteMaps, more and more webmasters complain that their sites disappeared from Google's index shortly after a sitemap submission. Did Google trick innocent newbies and not-so-savvy webmasters into a very smart (but, being a beta version, still erroneous) spammer and scraper trap? Tired of countless attempts to abuse its search services, the Google empire strikes back! Seriously, Google launched SiteMaps to explore the 'hidden web' and to learn more about web site structures, including widely used 'helpers' like feeder pages and similar stuff.

Danny Sullivan asked Shiva Shivakumar, engineering director and technical lead for Google SiteMaps, "How will you prevent people from using this to spam the index in bulk?" He said, "We are always developing new techniques to manage index spam. All those techniques will continue to apply with the Google Sitemaps." An analysis of a few of the disappeared web sites and their sitemaps suggests that the causes for disappearing from Google's index can be found in Shiva Shivakumar's answer.

A few examples cannot prove that there is no bug in Google's new service causing removals of clean web sites. But before webmasters complain, they should make sure that their sites comply with Google's guidelines. Shit happens. Even experienced webmasters can fail.

One circumstance commonly applies to web sites wiped out from Google's index after sitemap-based deep crawls: the sitemaps were generated by third-party tools, which spider for links and/or collect URLs from the web server's file system. With large sitemaps, human review is limited, especially if the file names don't follow human-readable naming conventions and/or the query strings are not meaningful.
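To make a large generated sitemap reviewable by a human, it can help to collapse it into a short summary first. The following is a minimal sketch, not a tool from this guide: it assumes the sitemap is available as an XML string, reads every `loc` element regardless of the XML namespace, and counts URLs per directory and file extension. The function names `sitemap_urls` and `summarize` are made up for illustration.

```python
import posixpath
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse


def sitemap_urls(xml_text):
    """Return the text of every <loc> element, ignoring the XML namespace."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # Namespaced tags look like "{namespace}loc"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            urls.append(element.text.strip())
    return urls


def summarize(urls):
    """Count URLs per (directory, file extension) pair."""
    counts = Counter()
    for url in urls:
        path = urlparse(url).path or "/"
        directory = posixpath.dirname(path) or "/"
        extension = posixpath.splitext(path)[1] or "(none)"
        counts[(directory, extension)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sitemap content; example.com stands in for a real site.
    sample = (
        "<urlset>"
        "<url><loc>http://www.example.com/</loc></url>"
        "<url><loc>http://www.example.com/old/feed/p1.html</loc></url>"
        "<url><loc>http://www.example.com/old/feed/p2.html</loc></url>"
        "</urlset>"
    )
    for (directory, extension), count in summarize(sitemap_urls(sample)).most_common():
        print(count, directory, extension)
```

A directory nobody remembers, or an extension the site no longer uses, showing up with a high count is a reason to look at those pages before submitting the sitemap.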

Google applies its spam filters to unintentional spider food supplied in sitemaps too. Some scenarios of unintentional cheating:

  • Huge assorted links pages from spider traps, which were very popular in 1999/2000, were never deleted from the web server. The webmaster only removed the links from the home page.
  • A developer playing with a vendor's data feed many months ago generated zillions of interlinked product pages in a forgotten directory, all linking to the domain's home page. Google sees these as doorway pages.
  • A formerly spamming site was completely revamped and reindexed on a reinclusion request. The webmaster switched the HTML file name extension from .html to .htm in his development tool, kept the directory structure, and forgot to delete the old stuff on the web server. Unfortunately the sitemap generator submitted the spammy .html files, packed with hidden links and invisible text.
  • A bunch of rarely crawled printer friendly pages without a robots NOINDEX meta tag gets submitted via sitemap. The primary versions of these pages ranked well, thanks to lots of deep inbound links from other sites. For some odd reason the duplicate content filter likes the more or less unlinked printer friendly pages better. The printer friendly pages cannot be found by site:+unique-word-appearing-in-every-bottom-line searches, because they lack a bottom line containing the search term.
  • ...

If one of the mistakes above or a similar scenario applies, clean up your web server, remove the offending pages from Google's index, and send a reinclusion request to Google.
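Two of the scenarios above leave traces that a script can look for before a sitemap is submitted: pages that exist under both an .htm and an .html name, and pages that are listed in the sitemap but no longer linked from anywhere on the site. The sketch below illustrates both checks under stated assumptions: the list of linked URLs has to come from somewhere else (for example a link crawl starting at the home page), and the function names `extension_twins` and `not_linked` are hypothetical. It is a starting point for a manual review, not a spam detector.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlparse


def extension_twins(urls):
    """Group URLs that differ only in an .htm versus .html extension."""
    by_stem = defaultdict(set)
    for url in urls:
        parts = urlparse(url)
        stem, extension = posixpath.splitext(parts.path)
        if extension.lower() in (".htm", ".html"):
            by_stem[(parts.netloc.lower(), stem)].add(url)
    # Only stems that occur with more than one extension are suspicious.
    return sorted(sorted(group) for group in by_stem.values() if len(group) > 1)


def not_linked(sitemap, linked):
    """Return sitemap URLs that are absent from the set of linked URLs."""
    return sorted(set(sitemap) - set(linked))


if __name__ == "__main__":
    # Hypothetical data; example.com stands in for a real site.
    sitemap = [
        "http://www.example.com/products.htm",
        "http://www.example.com/products.html",
        "http://www.example.com/contact.htm",
    ]
    linked = [
        "http://www.example.com/products.htm",
        "http://www.example.com/contact.htm",
    ]
    print(extension_twins(sitemap))
    print(not_linked(sitemap, linked))
```

Every URL either check reports deserves a look in a browser: it may be a forgotten leftover that should be deleted from the server, or a legitimate page that simply needs a link or a robots NOINDEX meta tag.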






    Author: Sebastian
    Last Update: Saturday, June 04, 2005


    Copyright © 2004, 2005 by Smart IT Consulting · Reprinting except quotes along with a link to this site is prohibited · Contact · Privacy