Old 04-22-2011, 12:53 AM  
AdultKing
Join Date: Jun 2003
Location: Outback Australia
Posts: 15,601
Quote:
Originally Posted by cam_girls
Google Yahoo etc. use economies of scale.

You can't search 5 billion pages in 0.02 seconds on 1 server. The query is split over the whole network, distributed processing.
Actually, you will find that the data sets are built in such a way that searching very large data sets requires relatively little processing power at query time. This all comes down to how the data set is built at the indexing level: clever algorithms let you search massive data sets very quickly on a single CPU, because the back-end systems have already done the hard work.
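To make that concrete, here is a minimal sketch of an inverted index (the documents and query are invented for illustration, not from any real engine). The expensive work happens once at indexing time; a query is then just a couple of dictionary lookups and a set intersection, which is cheap enough for a single CPU:

```python
from collections import defaultdict

# Toy document collection (illustrative only).
docs = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "quick thinking wins",
}

# Indexing pass: map each term to the set of documents containing it.
# This is the "hard work" done up front by the back-end systems.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query):
    """AND-query: intersect the posting sets of every query term."""
    postings = [index.get(term, set()) for term in query.split()]
    return set.intersection(*postings) if postings else set()

print(search("quick brown"))  # only doc 1 contains both terms
```

Real engines store postings in compressed, sorted on-disk structures rather than Python sets, but the principle is the same: the query-time cost is proportional to the posting lists touched, not the size of the whole collection.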

Quote:
It takes the same total computer power to handle 200 million queries a day, but by using parallel processing you get the results 1000 times quicker. Each server tackles under a million web pages.
That's an overly simplistic explanation. The data set is one complete entity. Preprocessing of the data set is what is key here; there are different approaches such as tokenization, term stemming, inverted indexing, term weighting, ranking, and so on.
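A hedged sketch of the preprocessing steps mentioned above: tokenization, a toy stemmer, and term weighting. Plain term frequency is used here for simplicity; production engines use schemes like TF-IDF or BM25, and a real stemmer such as Porter's, none of which is shown here.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def stem(term):
    """Crude suffix-stripping stemmer (illustrative only)."""
    for suffix in ("ing", "ed", "es", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term

def weight(text):
    """Term-frequency weights for one document."""
    return Counter(stem(t) for t in tokenize(text))

print(weight("Searching indexed searches"))  # 'search' counted twice
```

The point is that all of this runs once per document at indexing time; by the time a query arrives, terms are already normalized and weighted.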

While early reports on the Google platform say a single query on the index can consume the processing power of 1000 machines, that figure is misleading: the machines are built in such a way that you are really just dealing with component systems, each of which may itself be built from many "servers".
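The pattern behind that fan-out is usually called scatter-gather: the index is partitioned into shards, every shard answers the same query over its own slice, and a frontend merges the partial results. The shard contents and scores below are invented purely for illustration:

```python
# Each shard maps a term to (doc_id, score) postings for its slice
# of the index. Scores here are made up.
shards = [
    {"foo": [(1, 0.9), (2, 0.4)]},           # shard 0
    {"foo": [(7, 0.8)], "bar": [(8, 0.5)]},  # shard 1
]

def query_shard(shard, term):
    """One shard's answer for a term (empty if the term is absent)."""
    return shard.get(term, [])

def distributed_search(term, top_k=3):
    # Scatter: in a real cluster each shard is queried in parallel
    # over the network; here we just loop.
    partials = [query_shard(s, term) for s in shards]
    # Gather: merge the partial results and keep the best-scoring docs.
    merged = [hit for part in partials for hit in part]
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged[:top_k]

print(distributed_search("foo"))  # → [(1, 0.9), (7, 0.8), (2, 0.4)]
```

Whether you call the participants "1000 machines" or a handful of component systems is largely a matter of where you draw the box around each shard.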

There is a very good paper on using GPUs within a search architecture at http://koala.poly.edu/GPU.pdf