Actually, at this point this is just a patent application. It hasn’t been granted or assigned to Google yet, but that doesn’t mean some of the factors aren’t already in use or won’t be applied at some point.

Information retrieval based on historical data

There’s a thread about it at Search Engine Watch Forums:

Does New Google Patent Validate Sandbox Theory?

And randfish has posted a link in the thread to his analysis of the patent application:

Google’s Patent: Information Retrieval Based on Historical Data

In a nutshell, it’s mostly about tracking the history of a site/domain name and various types of effects it could have on scoring. There’s so much territory covered that it’s almost like the group of people responsible for the invention had a roundtable brainstorming session and put in everything they could think of.

While most of the discussion so far focuses on the age of sites and links, the application touches on a few other points that I find particularly interesting:

1. Section [0020] takes a fairly broad view of what constitutes a document. Included are web pages, forum posts, emails, etc. - and, most interesting, a whole web site treated as a single document unit.

2. Section [0037] mentions a minimum threshold number of pages being required to qualify as the inception date of a site. What this could mean is that one-page placeholders put up just to get a head start on getting a site into the index may accomplish that, but won’t count for the site as far as longevity factors are concerned.

3. Beginning with Section [0045], the focus is on new associated pages and an update frequency score, with mention made in [0046] and [0049] about the frequency of updating affecting scoring. Section [0050] is very meaty, with several important points made about new pages associated with the document, rate of increase and proportion.
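
To make points 2 and 3 a bit more concrete, here is a rough sketch of how a page-count threshold for an inception date and a "new pages" measure might be computed. This is purely my own illustration, not anything taken from the patent application: the function names, the default threshold of 5 pages, and the idea of using a fixed date window are all assumptions.

```python
from datetime import date


def inception_date(page_first_seen, min_pages=5):
    """Return the date on which the site first reached `min_pages` known pages.

    `page_first_seen` is a list of dates, one per page, giving when each page
    was first discovered. `min_pages` is a hypothetical threshold; the patent
    application does not give a value. Returns None if the site never
    reaches the threshold.
    """
    dates = sorted(page_first_seen)
    if len(dates) < min_pages:
        return None
    # The site "qualifies" on the day its min_pages-th page appears.
    return dates[min_pages - 1]


def new_page_stats(page_first_seen, window_start, window_end):
    """Return (number of new pages in the window, their share of known pages).

    A page is "new" if it was first seen between window_start and window_end
    inclusive. The share is taken over all pages known by window_end.
    """
    known = [d for d in page_first_seen if d <= window_end]
    new = [d for d in known if d >= window_start]
    proportion = len(new) / len(known) if known else 0.0
    return len(new), proportion


# Hypothetical site: a placeholder page in January, real content from June.
first_seen = [
    date(2004, 1, 1),
    date(2004, 6, 1),
    date(2004, 6, 2),
    date(2004, 6, 3),
    date(2004, 6, 4),
    date(2005, 3, 1),
]
site_inception = inception_date(first_seen, min_pages=5)
new_count, new_share = new_page_stats(first_seen, date(2005, 1, 1), date(2005, 3, 31))
```

In this made-up example the January placeholder gets the site noticed, but the inception date under a five-page threshold only lands in June, when the fifth page shows up. The second function gives the kind of count and proportion of new pages that a scoring scheme along the lines of [0050] could take as input, however it actually weighs them.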

All in all, those are the sections that initially catch my interest the most. A lot of the rest may be conjectural at this point, but that last one, which the patent application covers in more detail, seems the most likely to be implemented within what can be considered the “normal” realm of practice and practical application.
