About RSS-Feeds and Search Engine Optimizing (SEO)


What is a duplicate content penalty?

Hardcore spammers use scripts to produce zillions of fraudulent doorway pages made up of slight variations of sentences and text snippets pulled from a database. Each of these doorway pages is optimized for a particular keyword phrase, but the text content visible to search engine users is duplicated over and over. Search engines consider this approach, and similar fraudulent techniques designed to trick their users into viewing useless pages, to be spam, and they penalize the spammers by banning their domains.

Although only unethical webmasters have to live in fear of duplicate content penalties, the term is often used as a synonym for duplicate content filters. Filtering duplicate content refers to a set of methods search engines use to optimize their search results in the best interest of their users.

What is a duplicate content filter and how does it work?

When a search engine crawler fetches a page and finds that it is an exact duplicate of another page in the search engine's index, it updates the 'lastCrawled' attribute if the URL matches and moves on. If the URL doesn't match, the fetched page gets trashed. For performance reasons the crawler compares pages with a heuristic method, so very similar but not identical pages may sometimes be discarded 'by mistake'.
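To illustrate the crawl-time check described above, here is a minimal sketch in Python. It assumes a toy in-memory index keyed by a content fingerprint; real crawlers use far fuzzier fingerprints (shingles, simhash and the like), which is exactly why near-identical pages can be thrown away 'by mistake'. The names `index_by_fingerprint` and `handle_fetched_page` are made up for this example.

```python
import hashlib
import time

# Toy in-memory index: content fingerprint -> {"url": ..., "lastCrawled": ...}
index_by_fingerprint = {}

def fingerprint(html: str) -> str:
    # Collapse whitespace and hash the result. A real engine would use a
    # fuzzier scheme (shingles, simhash), so near-duplicates collide too.
    normalized = " ".join(html.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def handle_fetched_page(url: str, html: str) -> str:
    fp = fingerprint(html)
    known = index_by_fingerprint.get(fp)
    if known is None:
        # Not a duplicate of anything indexed: hand it to the indexer.
        index_by_fingerprint[fp] = {"url": url, "lastCrawled": time.time()}
        return "queued for indexing"
    if known["url"] == url:
        # Exact duplicate of the same URL: just refresh lastCrawled.
        known["lastCrawled"] = time.time()
        return "lastCrawled updated"
    # Exact duplicate under a different URL: trash the fetched page.
    return "discarded as duplicate"
```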

Otherwise the crawler puts the fetched page in a queue for indexing. During the indexing process, highly sophisticated algorithms separate the text content from templates, navigation, and advertising. The search engine then has two versions of the page: a 'full' version and a 'core text' version. Further calculations are applied to both versions, and under some circumstances the indexing process may discard the page.
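The following sketch, again with invented helper names, shows roughly what separating the 'core text' from template, navigation, and advertising could look like, plus one plausible 'further calculation': a shingle-based similarity score that could cause a page to be dropped during indexing. It is not the actual algorithm of any search engine.

```python
import re

def split_versions(html: str, site_boilerplate: set) -> tuple:
    # Strip markup, then drop lines that also occur in the site's shared
    # template (navigation, ads, footer); we assume those lines were
    # collected beforehand from other pages of the same site.
    text = re.sub(r"<[^>]+>", " ", html)
    lines = [" ".join(line.split()) for line in text.splitlines()]
    lines = [line for line in lines if line]
    full_version = "\n".join(lines)
    core_version = "\n".join(line for line in lines if line not in site_boilerplate)
    return full_version, core_version

def shingles(text: str, k: int = 5) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def too_similar(core_a: str, core_b: str, threshold: float = 0.9) -> bool:
    # Jaccard overlap of word shingles: one way an indexer might decide
    # that a page duplicates an already indexed one and discard it.
    a, b = shingles(core_a), shingles(core_b)
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold
```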

If the page makes it into the search engine's index, real-time filters are used to avoid duplicates in the context of the user's search query. If a user searches for a press release, for example, it makes no sense to list every reprint on the SERP. Nevertheless, search engines regularly put reprints of an article or press release from different places on the SERP. Analyzing these obvious duplicates will tell you how much surrounding text, and how different a navigation, it takes to prevent a page from being caught in a duplicate content filter.
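A query-time filter can be sketched in the same spirit: walk the ranked results and show a page only if its core text is not a near-duplicate of something already selected. Reprints surrounded by enough unique text and a sufficiently different template survive the comparison; bare copies collapse into a single entry. The `filter_serp` function and the 0.9 threshold below are illustrative assumptions, not documented behaviour of any engine.

```python
def word_shingles(text: str, k: int = 5) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def filter_serp(ranked_results: list, threshold: float = 0.9) -> list:
    # ranked_results: list of (url, core_text) pairs, best match first.
    shown = []
    for url, core_text in ranked_results:
        candidate = word_shingles(core_text)
        is_duplicate = False
        for _, kept_text in shown:
            kept = word_shingles(kept_text)
            if candidate and kept and len(candidate & kept) / len(candidate | kept) >= threshold:
                is_duplicate = True
                break
        if not is_duplicate:
            shown.append((url, core_text))
    return shown
```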


This description of procedures used by search engines to filter duplicate content is extremely simplified.



Next Page: Do SEs Apply 'Duplicate Content Penalties' to Web Sites Publishing Syndicated RSS Feeds?

Previous Page: Can Syndicated RSS Content Improve Search Engine Rankings?





Author: Sebastian
Last Update: Monday, June 27, 2005



Copyright © 2004, 2005 by Smart IT Consulting · Reprinting is prohibited, except for quotes accompanied by a link to this site