How to make cluttered page areas, such as blocks with ads, unsearchable. The class name robots-nocontent can be applied to everything not related to the page's main content.

Telling a search engine that particular page areas aren't related to a page's core content was a problem until Yahoo! introduced the "robots-nocontent" class name in May 2007. Perhaps other search engines will follow and support this mechanism too. Google has something similar for the AdSense crawler, called section targeting, but it puts the crawler directives in HTML comments instead of the class attribute.
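
For comparison, section targeting markup looks roughly like the sketch below. The comment tags are the ones Google documents for AdSense; the DIV contents are placeholders made up for illustration. Note that these comments influence ad targeting only, not what gets indexed for Web search.

```html
<!-- google_ad_section_start -->
<div> [core content the ads should be matched to] </div>
<!-- google_ad_section_end -->

<!-- google_ad_section_start(weight=ignore) -->
<div> [navigation, boilerplate and other text to ignore] </div>
<!-- google_ad_section_end -->
```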

Yahoo's implementation of a great idea is based on the draft of a flawed microformat, and it is somewhat awkward. Using a class name to apply crawler directives means a lot of work for Webmasters, and not only for those running static sites, because every affected template or page must be edited. Introducing a CSS-like syntax in robots.txt that applies crawler directives to existing class names and DOM-IDs would have been a far better approach. However, here is how it works:

<div class="css-class robots-nocontent"> [any X/HTML] </div>

The X/HTML class attribute is designed to hold multiple values in a space-delimited list. That means a class name cannot contain a space, and by the way it shouldn't contain characters other than a-z, 0-9, - and _. Until now the class attribute has been used for formatting purposes only, so multiple class names per X/HTML element are somewhat uncommon, even among CSS-savvy Web designers.
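
As a small illustration of the space-delimited list, the hypothetical element below keeps its two existing formatting classes (sidebar and small-print are invented names) and simply gets robots-nocontent appended. The formatting applied via CSS is not affected by the extra value.

```html
<!-- before: formatting classes only -->
<div class="sidebar small-print"> [any X/HTML] </div>

<!-- after: same formatting, plus the crawler directive -->
<div class="sidebar small-print robots-nocontent"> [any X/HTML] </div>
```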

The class attribute can be used with every X/HTML element in BODY. The predefined robots-nocontent class name takes effect on child nodes when it is assigned to a parent node. Say you have a P element which contains several A, B, EM and STRONG elements. When the P element is tagged with the robots-nocontent class name, the A, B, EM and STRONG elements within the paragraph inherit this attribute value. Hence an elegant implementation assigns class="robots-nocontent" to DIV or SPAN elements (or table rows [TR] and cells [TD]) that span a block of code and content which is not relevant to the page's message or core content. For example, on this page we could tell crawlers that the search box, the ads on the sidebars and the footer are not relevant to this article.
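
A minimal sketch of that inheritance, assuming invented contents and a placeholder example.com URL. The first block tags a single paragraph; the second shows one wrapper covering a whole page area, so the nested elements need no tagging of their own.

```html
<!-- Everything inside this paragraph inherits robots-nocontent -->
<p class="robots-nocontent">
  Sponsored: <a href="http://www.example.com/">visit our <b>partner</b></a>,
  <em>now</em> with <strong>free shipping</strong>.
</p>

<!-- One wrapper instead of tagging each nested element -->
<div class="robots-nocontent">
  <form action="/search"> [search box] </form>
  <table><tr><td> [sidebar ads] </td></tr></table>
</div>
```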

According to Yahoo's specs, the robots-nocontent class name marks the tagged page area as "unsearchable". That means words and phrases within a robots-nocontent block will not appear in text snippets on the SERPs, and they will not count toward the page's relevancy for search queries. So if absolutely-unique-string-on-the-whole-internet appears in a paragraph tagged with robots-nocontent, a search will not return the page when a searcher types exactly this phrase into the search box.

Links within a block tagged with the robots-nocontent class name will be followed and they should pass reputation, so robots-nocontent does not come with an implicit rel=nofollow! This allows tagging of navigational page elements like unrelated site-wide links at the very bottom with robots-nocontent. Crawlers will follow the links and index the link destination, but the anchor text is not counted as main content.
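
A sketch of such a site-wide link block, with a hypothetical footer class and made-up URLs. Because no rel="nofollow" is present, the links remain crawlable; only their anchor text is excluded from the page's main content.

```html
<div class="footer robots-nocontent">
  <!-- No rel="nofollow" here: links are followed and should pass reputation -->
  <a href="/contact.html">Contact</a> ·
  <a href="/privacy.html">Privacy</a> ·
  <a href="/sitemap.html">Site map</a>
</div>
```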

Look at your pages and decide which page areas are useful for human visitors but useless for indexing purposes. Advertisements certainly belong to this category. Repeated navigational text links and crawlable popup menus, where every page links to every page, should be tagged as unsearchable too. TOS excerpts or terms of shipping on e-commerce sites aren't relevant to search queries, and the same goes for quotes from content licenses or copyright notices.
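
Put together, a page skeleton along these lines might result. This is only a sketch; all class names other than robots-nocontent are invented, and the bracketed text stands in for real markup.

```html
<body>
  <div class="ads robots-nocontent"> [advertisements] </div>
  <div class="menu robots-nocontent"> [site-wide navigation, popup menus] </div>

  <div class="article">
    [main content: not tagged, so it stays searchable]
    <p class="robots-nocontent"> [excerpt from the terms of shipping] </p>
  </div>

  <div class="legal robots-nocontent"> [license quote, copyright notice] </div>
</body>
```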

Bear in mind that not all search engines support the robots-nocontent class name, and that some search engines may implement it differently from the inventor's specifications. We've seen that happen with rel=nofollow and other standards as well.



Author: Sebastian
Last Update: Monday, June 20, 2005



Copyright © 2004, 2005 by Smart IT Consulting · Reprinting except quotes along with a link to this site is prohibited