Apr
6
Google’s Software Principles, the New Google Patent and Chicken Little
Filed Under Google
A minor uproar started in the SEO community with the publication of Google’s patent application:
Information retrieval based on historical data
Chicken Little is indeed on the run, and for many it may seem like the sky is falling, either on themselves or on others. It may well be falling for some, but not for the most part in the way it’s being presented. Here are a couple of discussions on the patent application:
WebmasterWorld thread:
Google Patent Details Many Criteria Used For Ranking Purposes
Search Engine Watch Forums thread:
Does New Google Patent Validate Sandbox Theory?
And here’s where it goes from there, which more than a few are thinking about:
Crazy Idea - Bookmarks Seen by SE’s
Now let’s look at what Google has said publicly about software practices. From Google Corporate, Google Software Principles:
SNOOPING
If an application collects or transmits your personal information such as your address, you should know. We believe you should be asked explicitly for your permission in a manner that is obvious and clearly states what information will be collected or transmitted. For more detail, it should be easy to find a privacy policy that discloses how the information will be used and whether it will be shared with third parties.
Collecting and utilizing bookmarks without disclosure could fall into that category of application behavior, and what it boils down to is that either we trust Google to adhere to their publicly stated standards, or we don’t. Personally, I do, if only by reasoning that “don’t be evil” has far more long-term value than sneaky, deceptive trickery could ever hope to accomplish.
There’s no shame or apology intended in admitting an undeniable bias toward Google, substantiated by at least one known black hat accusation of being guilty of “Google spin.” But sometimes, in spite of it all, and in spite of the fact that I personally spent untold effort over several years to protect black hats from exposure in public posts, a reality check is in order, albeit some are pitifully void of truth.
The folks at Google certainly are smart cookies and totally PR conscious (PR as in public relations, not PageRank); there’s no denying it. Somehow I don’t think they all of a sudden went off the deep end and put their heads up their behinds.
Apr
5
xan picks up steam on patents - ah, the purity of search scientists!
Filed Under Search Engines
Apparently the discussion on the new Google patent application at Search Engine Watch Forums has touched a nerve, and xan, one of my favorite people to read, has blogged about patents - at length.
I always enjoy how refreshing it is to read these search science people. There’s a purity of perspective and a candor about them that’s about a mile away from the accoutrements that generally accompany the usual suspects who post at forums from the SEO mentality and viewpoint.
xan’s blog is gourmet fodder for anyone who likes to feast on the high-calorie, weightier matters of search.
Apr
5
The company was acquired by Google a while back, and the CIRCA White Paper was removed from the site, but this seems to be the original patent at the U.S. Patent Office:
Meaning-based information organization and retrieval
Article at Resource Shelf about the acquisition, with a working link.
Apr
2
Actually, this is just a patent application at this point and hasn’t been granted or assigned to Google yet, but that’s no indication that some of the factors haven’t been, or won’t be, applied at some point.
There’s a thread about it at Search Engine Watch Forums.
And randfish has posted a link in the thread to his analysis of the contents of the patent application:
Google’s Patent: Information Retrieval Based on Historical Data
In a nutshell, it’s mostly about tracking the history of a site/domain name and various types of effects it could have on scoring. There’s so much territory covered that it’s almost like the group of people responsible for the invention had a roundtable brainstorming session and put in everything they could think of.
While most of the discussions are focusing on the age of sites and links, there are a couple of other points touched on that I find particularly interesting:
1. Section [0020] is interesting, with a fairly broad view of what constitutes a document. Included are web pages, forum posts, emails, etc. - and most interesting, looking at a web site as a document unit.
2. In Section [0037] mention is made of a minimum threshold number of pages being required to qualify as the inception date of a site. What this could mean is that one-page placeholders put up just to get a head start on getting a site into the index may accomplish that, but won’t qualify the site as far as longevity factors are concerned.
3. Beginning with Section [0045], the focus is on new associated pages and an update frequency score, with mention made in [0046] and [0049] about the frequency of updating affecting scoring. Section [0050] is very meaty, with several important points made about new pages associated with the document, rate of increase and proportion.
All in all, those are the sections that initially catch my interest the most. While a lot else may be conjectural at this point, it’s that last one, which the patent application covers in more detail, that seems most likely to be implemented within what can be considered the “normal” realm of practice and practical application.
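To make the threshold idea from point 2 a bit more concrete, here is a minimal Python sketch of how an inception date could be picked under that reading. This is only my own illustration, not anything taken from the patent application: the `inception_date` function, the `MIN_PAGES` value and the snapshot data are all hypothetical, and the application gives no specific number for the threshold.

```python
from datetime import date

# Hypothetical threshold; the patent application does not state a value.
MIN_PAGES = 5


def inception_date(snapshots, min_pages=MIN_PAGES):
    """Return the earliest crawl date on which the site had at least
    `min_pages` pages, or None if it never reached the threshold.

    `snapshots` is a list of (crawl_date, page_count) tuples, an
    assumed data shape used here purely for illustration.
    """
    for crawl_date, page_count in sorted(snapshots):
        if page_count >= min_pages:
            return crawl_date
    return None


# A site that sat as a one-page placeholder for months before launching.
snapshots = [
    (date(2004, 1, 10), 1),
    (date(2004, 3, 2), 1),
    (date(2004, 6, 15), 12),
    (date(2004, 9, 1), 30),
]

print(inception_date(snapshots))
```

Under these assumptions the placeholder months don’t count: the site gets into the index in January, but its “age” would only start from the June crawl, when it first crosses the threshold.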