Search Engine Optimization: Is the number of slashes in your URLs important?

Submitted by Hannes Schmidt on Tue, 08/03/2004 - 14:59.

In this patent assigned to Google, the number of slashes in URLs is used to score the documents to which the URLs point. The existence of this patent led some webmasters to believe that they need to optimize the number of slashes in their URLs in order to rank well in Google, especially for hierarchical sites like directories.

My research on a small data set did not produce any evidence to support this hypothesis. The data set consists of the first 500 results returned by each of eight two-word "widget city" searches (entered without quotes, i.e. not as a phrase) against Google and Yahoo. In total the data set contains 8 x 2 x 500 = 8000 URLs. Each item in the data set is a tuple (r, c) where r is a number between 1 and 500 representing the rank of a URL in the result set of a search and c is the number of slashes in that URL. HTTP URLs contain at least one slash (not counting the two slashes in "http://").

I distilled the data set into three diagrams. The first diagram shows the tuple distribution for Google, cumulative for all eight searches. The second diagram shows the same for Yahoo, whereas the third one illustrates the combined data sets for both search engines. Each diagram contains one dot per tuple in the set.

If the above hypothesis were true and the number of slashes were a significant scoring factor, there should be a correlation between rank and number of slashes. This correlation would show up as a non-uniform distribution of dots along each horizontal line. For example, the density of dots on the lowest horizontal line, the one with tuples (r, 1), i.e. URLs with one slash, should decrease towards lower ranks (higher r numbers). No such non-uniform distribution can be observed. In fact, the distribution is surprisingly uniform.
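The check described above can be sketched in a few lines of Python. This is only an illustration, not the script used for the diagrams: the function names, the example URLs, the bucket size of 100 ranks, and the rule that a bare host name counts as one slash are all assumptions made for this sketch.

```python
from collections import Counter


def count_slashes(url):
    """Count slashes in a URL, not counting the two in the scheme separator."""
    _, sep, rest = url.partition("://")
    if not sep:
        rest = url  # no scheme present, count slashes in the whole string
    # Assumption: "http://example.com" is treated like "http://example.com/",
    # so every URL has at least one slash, as stated in the text.
    return max(1, rest.count("/"))


def to_tuples(result_urls):
    """Turn an ordered result list into (r, c) tuples, with r starting at 1."""
    return [(rank, count_slashes(url)) for rank, url in enumerate(result_urls, start=1)]


def rank_buckets(tuples, slashes, bucket_size=100):
    """Count the dots on one horizontal line (fixed slash count) per rank bucket."""
    counts = Counter((r - 1) // bucket_size for r, c in tuples if c == slashes)
    return dict(sorted(counts.items()))


def pearson(xs, ys):
    """Plain Pearson correlation coefficient; returns 0.0 if either side is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical result list; a real run would use the 500 URLs of one search.
    results = [
        "http://example.com/",
        "http://example.com/widgets/springfield/",
        "http://example.org/dir/widgets.html",
    ]
    tuples = to_tuples(results)
    ranks = [r for r, _ in tuples]
    slashes = [c for _, c in tuples]
    print(tuples)
    print(rank_buckets(tuples, slashes=1))
    print(pearson(ranks, slashes))
```

If slashes were a significant scoring factor, the bucket counts for a fixed slash count would fall or rise steadily from the first bucket to the last, and the correlation coefficient would be clearly different from zero.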

This leaves us with three possible explanations:

  1. The number of slashes is not a predominant scoring factor in either Google or Yahoo, hence its influence is hidden beneath the influence of other more important scoring factors, of which there are many.
  2. The number of slashes is not used by either Google or Yahoo to score pages.
  3. The data set is not representative.

The last of these is rather unlikely, especially considering that the data set is based on two different search engines. Yahoo used to employ Google's index, but earlier this year Yahoo started using its own engine and index. Also, if you think about it, why should a serious search engine algorithm assume that hierarchical sites like directories use slashes between categories in their URLs, just as file systems use slashes (or backslashes) to separate hierarchical folders in file paths? Some directories do use slashes (e.g. Yahoo directory and DMOZ), others don't (Jayde).

You may also look at it from the perspective of site technology like content management systems. The structure of information on one particular CMS-based site and hence the "structural depth" of its pages is not necessarily reflected in that site's URLs. The open-source content management system Typo3 does not use slashes in its URLs at all; others do, e.g. Zope/Plone and Drupal.

Structural depth is, however, reflected in the link structure of a site. As a rule of thumb, the deeper the page, the more clicks it takes to get there. Google's PageRank is largely determined by link structure. As another rule of thumb, the deeper the page, the lower its PageRank, assuming there are no external links to this page. In other words, a page's PageRank already reflects its structural depth. Why invent another scoring factor?
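The rule of thumb can be illustrated with a toy site and the simplified textbook form of PageRank, computed by power iteration. This is a sketch under stated assumptions, not Google's actual scoring: the seven-page site, the page names, the damping factor of 0.85, and the iteration count are all made up for illustration, and every page is assumed to have at least one outgoing link.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Assumption: every page has at least one outgoing link.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the same base share from the random jump.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        # Each page passes its damped rank on in equal parts to its link targets.
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site with no external links: the home page links to two
# category pages, each category links to two detail pages, and every page
# links back to the home page.
site = {
    "home": ["a", "b"],
    "a": ["home", "a1", "a2"],
    "b": ["home", "b1", "b2"],
    "a1": ["home"],
    "a2": ["home"],
    "b1": ["home"],
    "b2": ["home"],
}

if __name__ == "__main__":
    ranks = pagerank(site)
    for page in ("home", "a", "a1"):
        print(page, round(ranks[page], 4))
```

In this toy site the home page (depth 0) ends up with the highest value, the category pages (depth 1) with less, and the detail pages (depth 2) with the least, even though the URLs play no role in the calculation.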
