
Cloaking per se is not penalized by search engines. Search engines consider intent, and they even encourage Webmasters to cloak for improved spider-friendliness. Here is a tiny guide to search engine friendly cloaking.

Hardcore cloaking is still a very effective way to generate huge amounts of very targeted search engine traffic. Search engines dislike it, because it skews their organic results. You'll find a "Do not cloak" statement in every engine's Webmaster guidelines. "Do not cloak" addresses so-called black hat SEO tactics.

Despite the fact that search engines cloak the hell out of their own pages, notwithstanding the fact that all major sites cloak and still rank fine on the SERPs, and ignoring good professional advice stating that some legitimate usability goodies aren't achievable without cloaking, many fearful site owners suffer from a cloaking paranoia hindering them from getting the most value out of their Web sites.

Cloaking is defined as "delivering one version of a Web page to Web robots, and another version to human users". Black hat cloaking means feeding crawlers a zillion keyword-optimized, content-rich pages, while human visitors get redirected to a sales pitch when they click the links on the SERPs. Examples of white hat cloaking are geo-targeting, internationalization, browser-specific page optimization, links and other content visible to registered users while robots and unregistered visitors get a sign-up form, crawler-friendly URLs with shortened query strings, and so on. Search engines do not penalize legitimate cloaking, and they cloak themselves:

Look at Google's home page as a spider and as a user, then compare the pages. The bare page delivered to bots contains the logo and the search form, and provides links to Google's advertising programs, business solutions, the about page, and currently a link to Hurricane Katrina Resources. Say you're located in Europe and you have a Google Mail account. The page served to your browser contains your GMail address, a link to your personalized home page, a link to your account settings and more above the logo. The Katrina link is missing, but you get a link "Go to Google YourCountryHere". Different content served to a robot and a user is cloaking, but the URL is not penalized (although Google once banned its own pages for prohibited cloaking), it shows a PageRank™ of 10, and it appears in search results.

Back to the fearful site owners' concerns. Cloaking per se is not penalized by search engines. Search engines consider intent, and they even encourage Webmasters to cloak for improved spider-friendliness. For example, a Google representative posted "... allowing Googlebot to crawl without requiring session ID's should not run afoul of Google's policy against cloaking. I encourage webmasters to drop session ID's when they can. I would consider it safe." at WebmasterWorld on Dec 4, 2002.

What do search engines consider allowed cloaking for spider-friendliness? A few examples:

Truncating session IDs and similar variable/value pairs in query strings, as described in this tutorial. If the script supposed to output a page discovers a session ID in its query string, and the user agent is a search engine crawler, it returns an HTTP header with a permanent redirect response (301) to its own URI without the session ID and other superfluous arguments, and quits. The crawler will then request the page from the URI provided in the 301 redirect header. Again the script identifies the user agent as a crawler and behaves differently than it would on a user request. It will not make use of cookies, because bots don't accept cookies. It will not start a session, and it will not prompt for 'missing' user-specific values. Instead it prints useful default values where suitable, and provides spider-friendly internal links without session IDs or similar user-dependent arguments.
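Since the example URLs on this page end in .php, here is what such a handler might look like in PHP. This is a minimal sketch only: the user agent patterns, the host name, and the PHPSESSID parameter name are illustrative assumptions, not a complete detection scheme.

    <?php
    // Minimal sketch: detect crawlers by user agent and 301 them to the
    // same URI without the session ID. The UA patterns and the parameter
    // name PHPSESSID are assumptions.
    function is_crawler() {
        $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
        return (bool) preg_match('/Googlebot|Slurp|msnbot/i', $ua);
    }

    if (is_crawler()) {
        if (isset($_GET['PHPSESSID'])) {
            $args = $_GET;
            unset($args['PHPSESSID']);        // drop the superfluous argument
            $query = http_build_query($args);
            $path  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
            header('HTTP/1.1 301 Moved Permanently');
            header('Location: http://www.domain.com' . $path . ($query ? '?' . $query : ''));
            exit;                             // quit; the crawler re-requests the clean URI
        }
        // crawler without session ID: no session_start(), render default values
    } else {
        session_start();                      // human visitors get their session as usual
    }
    ?>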

Reducing the number of query string arguments, that is, forcing search engines to fetch and index URIs with the absolute minimum number of variables necessary to output the page's content. For example, crawlers get redirected to URIs with a query string like "?node=17&page=32", whereas, depending on previous user actions, the query string in a browser's address bar might look like "?node=17&page=32&user=matt&stylesheet=whitebg&navigation=lefthanded&...".
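The same idea works as a whitelist. A minimal sketch, assuming 'node' and 'page' are the only variables the page really needs to render its content:

    <?php
    // Minimal sketch: for crawler requests, keep only whitelisted arguments
    // and 301 away everything user-specific. The whitelist and the UA
    // patterns are assumptions.
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    if (preg_match('/Googlebot|Slurp|msnbot/i', $ua)) {
        $keep = array('node', 'page');        // the bare minimum to render the content
        $args = array_intersect_key($_GET, array_flip($keep));
        if ($args != $_GET) {                 // user-specific arguments present
            $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
            header('HTTP/1.1 301 Moved Permanently');
            header('Location: http://www.domain.com' . $path
                 . ($args ? '?' . http_build_query($args) : ''));
            exit;
        }
    }
    ?>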

Stripping affiliate IDs and referrer identifiers, that is, hiding user tracking from search engines. If a site has traffic deals with other places and needs to count incoming traffic by referrer ID, or if a site provides an affiliate program, search engines will find links containing IDs in the query string during their crawling of the 'Net, and their spiders follow those links. At the destination site, crawlers get redirected to 'clean' URIs. E.g. a crawler requesting http://www.domain.com/?ref=471 or http://www.domain.com/landingpage.php?aff=472 gets redirected to http://www.domain.com/ or http://www.domain.com/landingpage.php respectively.
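In PHP this could look like the following sketch, assuming the 'ref' and 'aff' parameter names from the example URLs above; log_referrer_hit() is a hypothetical tracking stub, not a real routine:

    <?php
    // Minimal sketch: count the partner hit for human visitors, but 301
    // crawlers to the clean URI. log_referrer_hit() is a hypothetical stub.
    function log_referrer_hit($id) {
        error_log('referrer hit: ' . (int) $id);   // e.g. write to a database instead
    }

    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $is_crawler = (bool) preg_match('/Googlebot|Slurp|msnbot/i', $ua);

    if (isset($_GET['ref']) || isset($_GET['aff'])) {
        if ($is_crawler) {
            $args = $_GET;
            unset($args['ref'], $args['aff']);     // hide the tracking IDs from engines
            $query = http_build_query($args);
            $path  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
            header('HTTP/1.1 301 Moved Permanently');
            header('Location: http://www.domain.com' . $path . ($query ? '?' . $query : ''));
            exit;
        }
        $id = isset($_GET['ref']) ? $_GET['ref'] : $_GET['aff'];
        log_referrer_hit($id);                     // humans: record the incoming traffic
    }
    ?>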

Preventing search engines from indexing duplicate content. User-friendly Web sites offer multiple navigation layers, resulting in multiple pages providing the same content along different menu bars. The script detecting a crawler request knows the one indexable version per page and puts 'INDEX' in its robots META tag; otherwise it populates the tag with 'NOINDEX'. This handling is far more flexible, and more elegant, than hosting different scripts or aliases per navigation layer to control crawling via robots.txt.
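A sketch of that META tag handling, assuming a 'nav' argument selects the navigation layer and the version without it is the canonical one; since only robots obey the tag anyway, no user agent check is strictly necessary here:

    <?php
    // Minimal sketch: only the canonical navigation layer gets INDEX, every
    // duplicate layer gets NOINDEX. The 'nav' argument is an assumption.
    $canonical = !isset($_GET['nav']);    // version without a nav argument is the indexable one
    $robots = $canonical ? 'INDEX,FOLLOW' : 'NOINDEX,FOLLOW';
    echo '<meta name="robots" content="' . $robots . '">' . "\n";
    ?>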

There are lots of other good reasons for search engine friendly cloaking. As long as the intention behind the cloaking is not spamdexing, and the well-meant intention is obvious, search engines tolerate it. In some rare cases the intention to cloak is not obvious, for example on membership sites: inserting HTML comments that provide a short explanation and outline the different versions helps to pass the human reviews done by search engine staff (every once in a while competitors search for cloaking and report competing sites to the engines).
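Such a comment might read like this hypothetical wording:

    <!-- Note to search engine reviewers: robots and unregistered visitors
         receive this sign-up page; registered members see the full member
         content at the same URI. Nothing is cloaked for ranking purposes. -->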


Friday, September 16, 2005

Author: Sebastian