Google Needs a New Search Algorithm
Written by Stone Tapes   

Google Search is important for every web publisher. It drives traffic to a site like no other entity on the web. It may have been reduced in importance by social media, but it is still the only (mostly) no-effort way of getting browsers to view your pages. You simply post the page, wait for it to be crawled by Google, and sit back and expect visitors. 




As the quality of the pages you are offering is bound to be high, probably the best on the web of their kind, you really can expect Google to direct relevant searchers to your doorstep. Or at least you could if Google's algorithm were good.

How Google actually performs a search is therefore important to all of us.

Back in the early days, when Google was just getting started, it had a good algorithm. Page and Brin invented PageRank, which is supposedly named after Page - although, when you think about it, what else could it have been called? PageRank ranks a web page in a very logical way - not by quality, but by the probability that a browser will arrive at the page simply by randomly clicking links. The idea is that, starting from a random page, the browser clicks a random link, then another random link on the page that appears, and so on. This might seem like a silly idea, but on reflection you should be able to see that the probability of reaching a page by randomly following links is a measure of how well connected that page is. As long as the connectedness of a page is correlated with its quality, everything works and PageRank is a meaningful quantity.
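The random-surfer idea can be sketched in a few lines of code. This is a minimal illustration, not Google's implementation: the three-page graph is invented for the example, and the damping factor of 0.85 is the value suggested in Brin and Page's 1998 paper, modelling a surfer who follows a link with probability 0.85 and jumps to a random page otherwise.

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to.
    Returns the probability of the random surfer being on each page."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform guess
    for _ in range(iterations):
        # Every page gets the "random jump" share, then rank flows
        # along outgoing links from every other page.
        new_rank = {p: (1 - d) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += d * share
            else:
                # A dangling page with no links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += d * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical three-page web: A links to B and C, B to C, C back to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
# C is linked to by both A and B, so it ends up with the highest rank.
```

Note that every iteration touches every link in the graph, which hints at why computing PageRank over billions of pages is such an expensive job.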

What was important about PageRank is that it gave the early Google search engine a mathematical rationale for being better. This wasn't just any search engine - Google was the one that did it right. It used PageRank, the ingredient the other search engines didn't have. They didn't have it because Stanford University owned the patent - Brin and Page did their work while students at Stanford - and Google was the only licensee.

So for a while Google had the magic spice of PageRank and the others just had to try to pretend that it didn't matter.

In truth, PageRank was a bit of a red herring. Google Search delivered a complete package. It was lightning fast - something it still prides itself on - and it was easy to use. Users adopted it because it seemed to work, or at least it seemed to work better than the other search engines.

PageRank suffered because SEO people started to find ways of spoofing it. Even today a common cry is "We can improve your PageRank". As a result, search engines had to go in for security by obscurity. Instead of publishing academic papers proclaiming how they worked, search engines had to hide their inner workings in an attempt to stop people from thinking up ways of getting better positions in search results.

This is where things started to go wrong.

Google attempts to cling to PageRank as something that separates it from the pack. It is still its magic ingredient, and the attention of the SEO community helps to keep it a prime asset. Among those who knew, PageRank was the reason you used Google and it was the way you attempted to manipulate Google's results.

However there were, and are, some really big problems with PageRank. The most important is that it takes a lot of computer power to work it out. As the web grew ever bigger, the task of computing PageRank also grew ever harder. More machines were needed, but even with Google's extensive hardware, computing PageRank was a task that took months.

PageRank was available as a measure of the importance of a page via the Google Toolbar, Webmaster Tools, and as part of the now defunct Google Directory. With the increasing difficulty of computing PageRank, the frequency of update dropped. The last update was in early 2010 and many speculate that there will never be another update. It seems likely that PageRank has been deprecated - even though Google will probably never admit it.

Meanwhile Google has been working hard to emphasize how clever its indexing and search algorithms are. It has implemented all sorts of tweaks and makes a fuss about having found ways to eliminate various types of spurious content from search results. The effort extends to trying to convince web owners that PageRank really wasn't the metric they were looking for. However, old habits die hard and most webmasters still get lots of spam emails promising to increase their PageRank.

The big problem for Google is that, while it can tweak its index and search algorithms, without a "big idea" like PageRank, Google Search is now like any other search facility. The playing field is level and the best man, or algorithm in this case, will win. There is no longer any reason to suppose that Google Search is superior.

If you spend a few minutes evaluating the results that Google returns, you will quickly come to the conclusion that things could be better. For example, look for a review of some product or other and you will be swamped with sites that are asking users to be the first to review the product. You can probably come up with your own example of how useless Google is at finding what you are looking for. As the web expands its range of offerings, and we come to depend on it for more and more things, search needs to get sharper.

What is the solution?

Perhaps someone will think up something like PageRank and do the job better - but this seems unlikely. PageRank was never a good way to do the job. Using links as a measure of what is important is clearly the wrong way to do it. At best it was a stopgap until something better was thought up. Google's current search algorithms are presumably based on probability models of how phrases go together - so-called Statistical Artificial Intelligence. This is the approach used by Google Translate and it works surprisingly well at a job that would normally need a full understanding of the sentences being translated.
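To make the phrase-probability idea concrete, here is a toy bigram model - the simplest kind of statistical language model. The corpus is a made-up one-line example, and real systems use enormous corpora plus smoothing to handle unseen word pairs, but the principle is the same: a phrase made of frequently co-occurring word pairs scores higher than one made of pairs never seen together.

```python
from collections import Counter

# Tiny made-up corpus purely for illustration.
corpus = "the quick brown fox jumps over the lazy dog the quick fox".split()

# Count adjacent word pairs and how often each word starts a pair.
pairs = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def phrase_probability(phrase):
    """Score a phrase as the product of conditional bigram
    probabilities P(word2 | word1)."""
    words = phrase.split()
    prob = 1.0
    for w1, w2 in zip(words, words[1:]):
        prob *= pairs[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0
    return prob

# "the quick" appears twice out of three occurrences of "the", so a
# phrase built from seen pairs gets a nonzero score...
print(phrase_probability("the quick fox"))
# ...while a phrase with an unseen pair ("the brown") scores zero.
print(phrase_probability("the brown dog"))
```

A model like this can tell you which phrasings are likely, but, as the next paragraph argues, it says nothing about whether a page is important or relevant.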

However, for search, such statistical AI really only just gets you off the starting blocks. There is no obvious way of using statistics to gauge a page's importance or relevance. What is needed is some real AI, and this is where Wolfram Alpha and Apple's Siri come in.

Wolfram Alpha is a search engine that tries to use full AI methods to understand what you want and retrieve it. Siri mostly adds a voice input/output function to Wolfram Alpha and other similar search facilities. The point is that currently Siri is something of a success and is slowly but surely taking traffic away from Google.

News that Google is working on a competitor to Siri for Android should come as no great shock - but where is the Google AI based search engine to go up against the likes of Wolfram Alpha?

My guess is that it exists, is top secret, and hiding in Google X Lab - Google's secret lab.

Whatever the truth, we need something better than Google is offering at the moment.


Related articles:

Search Engines

Iris - Siri for Android proves Apple doesn't have an edge

More Information

"The anatomy of a large-scale hypertextual Web search engine" Brin, Page 1998

Google X Lab

Wolfram Alpha




Last Updated ( Wednesday, 21 December 2011 )
