Yahoo's site explorer is a great tool for folks keen on linkage data. Here is a quick rundown on its Web interface and the API. [12/06/2005: Y!SE was updated and greatly improved]

The Yahoo! Site Explorer (BETA) launched on September/30/2005. It's a nice tool that shows a site owner all indexed pages per domain and offers subdomain filters. Inbound links are counted per page and per site. The tool also links to the standard submit forms.

The number of inbound links seems to be far more accurate than the estimates available from linkdomain: and link: searches. Unfortunately, there is no simple way to exclude internal inbound links. So anyone who wants to check only 3rd party inbounds faces a painful procedure:
1. Export each result page to a TSV file, a tab-delimited format readable by Excel and other applications.
2. The export works per SERP with a maximum of 50 URLs, so one must delete the two header lines in each file and append file by file to produce one sheet.
3. Sorting the worksheet by the second column gives a list ordered by URL.
4. Deleting all URLs from one's own site gives the list of 3rd party inbounds.
5. Wait for the bugfix "exported data of all result pages are equal" (each exported data set contains the first 50 results, regardless of which result page the export link was clicked on).
Since December/06/2005 Yahoo provides a filter to exclude internal links (per domain and sub-domain).
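For exports made without that filter, steps 2 to 4 above could be scripted instead of done by hand in a spreadsheet. The following is only a minimal sketch: it assumes each export starts with two header lines and carries the URL in the second column, as described above, and that URLs include a scheme such as http://. The function names and the sample domain are made up for illustration.

```python
import csv
import io
from urllib.parse import urlsplit

HEADER_LINES = 2  # per the procedure above: two header lines per exported file


def merge_exports(tsv_texts):
    """Append the data rows of several TSV exports into one list of rows."""
    rows = []
    for text in tsv_texts:
        reader = csv.reader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
        for index, row in enumerate(reader):
            if index < HEADER_LINES or not row:
                continue  # skip the header lines and empty lines
            rows.append(row)
    return rows


def is_internal(url, own_domain):
    """True if the URL's host is the own domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    own = own_domain.lower()
    return host == own or host.endswith("." + own)


def third_party_inbounds(tsv_texts, own_domain, url_column=1):
    """Merge the exports, drop own-site URLs, and sort the rest by URL."""
    external = [
        row for row in merge_exports(tsv_texts)
        if len(row) > url_column and not is_internal(row[url_column], own_domain)
    ]
    return sorted(external, key=lambda row: row[url_column])
```

Called as third_party_inbounds([text_of_file_1, text_of_file_2], "example.com"), it returns the remaining rows ordered by URL. It cannot work around the export bug from step 5: if every file contains the same first 50 results, the merged list does too.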

The result pages are assorted lists of all URLs known to Yahoo. The ordering does not reflect the site's logical structure (defined by linkage), and not even the physical structure seems to influence the sort order. It looks like the first results are ordered by popularity, followed by an unordered list. The URL listings contain fully indexed pages, with known but not indexed URLs mixed in. The latter can be identified by the missing cached link.

Desired improvements:
1. A filter "with/without internal links".
2. An export function outputting the data of all result pages to one single file.
3. A filter "with/without" known but not indexed URLs.
4. Optional structural ordering on the result pages.
5. Operators like filetype: and -site:domain.com.
6. Removal of the 1,000 results limit.
7. Revisiting of submitted URL lists a la Google sitemaps.
8. [Added December/06/2005] Filtering of AdSense scraper sites like ODP and Wikipedia clones.

Overall, the site explorer is a great tool and an appreciated improvement. The most interesting part of the new toy is its API, which allows querying for up to 1,000 results (page data or link data) in batches of 50 to 100 results, returned in a simple XML format (max. 5,000 queries per IP address per day).
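To get a feel for what those limits mean in practice, here is a small sketch that plans the batches needed for one full listing and estimates how many full listings fit into the daily quota. It reads "batches of 50 to 100" as the allowed batch size range and uses zero-based offsets; both are assumptions, and the function names are hypothetical rather than part of the API.

```python
MAX_RESULTS = 1000        # results obtainable per query target, per the post
MIN_BATCH, MAX_BATCH = 50, 100
DAILY_QUERY_LIMIT = 5000  # queries per IP address per day, per the post


def plan_batches(max_results=MAX_RESULTS, batch_size=MAX_BATCH):
    """Return (offset, count) pairs covering max_results in batches.

    Offsets are zero-based here, which is an assumption; the real API
    may count from 1.
    """
    if not MIN_BATCH <= batch_size <= MAX_BATCH:
        raise ValueError("batch_size must be between 50 and 100")
    return [
        (offset, min(batch_size, max_results - offset))
        for offset in range(0, max_results, batch_size)
    ]


def full_listings_per_day(batch_size=MAX_BATCH):
    """How many complete 1,000-result listings fit into the daily quota."""
    queries_per_listing = len(plan_batches(MAX_RESULTS, batch_size))
    return DAILY_QUERY_LIMIT // queries_per_listing
```

With the largest batch size, one full listing takes 10 queries, so roughly 500 full listings per IP address per day; with batches of 50 it takes 20 queries, or about 250 listings.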

Good news for site and mass submission addicts: as of December/06/2005 Yahoo accepts RSS/ATOM feeds and HTML pages in addition to the already supported plain URL lists in text files. Unfortunately, those text files were dumped after the fetch triggered by a manual submission.

Monday, December 12, 2005


Author: Sebastian

Copyright © 2004, 2005 by Smart IT Consulting · Reprinting except quotes along with a link to this site is prohibited · Contact · Privacy