This is my attemp to tackle the greatest question of modern times: how many blade PCs does Google run? The initial guess in 2003 was 10,000 and then recently the consensus seems to be in the order of 100,000. But these have remained guesses at best. Here, I'll try and see if this number can be deduced, from various requirements (network BW, storage, CPU, etc..)
First, let's see how much bandwidth Google might need. Then, we'll divide this by how much BW a PC can
handle, to get the number of PCs needed.
I'll constrain the analysis to North-America.
Similar analysis can be done for other geographies.
Again, since search is the main Bread&Butter of Google, this analysis is restricted to search.
OK, first some numbers that'll help us. U.S. pop: 300 million. Internet penetration in US: 70%. Google market share: 30%.
So assuming that, on an average, a person using google does 20 searches a day and clicks through 1.5 pages of results for each search (supposedly 90% of users find the results they want on the first page, so I think an average of 1.5 might be reasonable). Each web-page served up by Google is approximately 25KB in size (you can check by saving the web-page). So, an average person needs: 20x1.5x25 = 750KB of data from Google in a day.
The bandwidth served per day out of Google is therefore, 0.3 (mkt-share) x 0.7 (internet-pop) x 300 million x 750KB
which is around 48 million MB. The bandwidth persecond is then: 4,375 Mbits/sec (note conversion to bits)
which is basically around 4Gbps! If Google has just one data-center for whole of US (highly unlikely), they'll need a OC96 pipe. You can calculate how much Google would need to pay for leasing at OC96 link :)
Now, finally, how many PC's are needed to throw out 4.375Gbps? Most PCs today come standard with 1Gbps network interfaces, so Google would need only 5 PCs! :) This, of course assumes that a PC can sustain peak throughput of 1 Gbps and that all bits put out reach the target. A more feasible number, I think, (without getting into the details for this assumption) would be more in the range of 1Mbps. In this case, Google would need around 5,000 PCs.
Well, analyzing the bandwidth requirement showed a surprising statictic but not something that can be used to accurately guess (!) the number of PCs Google has: anywhere between 5-5,000? Too vague. So time to look at another metric: Storage.
In bandwidth requirements, I have completely left out another part of the equation, the bandwidth needed by Google's crawlers. I'll get back to this later. Other things that make the bandwidth estimate conservative is that this only looked at search results web-pages. If you had image-search, news pages, maps, Gmail, etc. the bandwidth requirement will be much much higher.
Again, lets restrict the discussion to just web-pages. The analysis here will assume that Google is
audacious enough to store all web-pages in-memory (in RAM). Search queries follow the usual 80:20 rule,
i.e. 80% of queries are for 20% of pages, so maintaning a cache of the 20% of most requested pages should
work quite well, but lets assume Google stores all web-pages in RAM.
The current estimate of the total number of web-pages is around 4 billion. Lets assume, on an average, each web-page is 100KB. This is pure conjecture but it is something to start working with. Also, lets assume that each blade uses 2GB of RAM to store web-pages. So simple math suggests Google should have: (4 Billion * 100KB) / 2GB blades which is approximately 200,000 blades.
Why have only 2GB on each blade to store the web-pages? Well, assuming each blade has a maximum of 4GB storage (32 bit machines), the remaining 2GB is to do the "rest of the stuff", like maintaining indexes of their data, running page-rank algorithm, etc. And what about redundancy to these in-RAM web-pages. Well that is what the hard-drives on each of the blades are for. If each balde comes with a 60GB hard-drive, then Google has close to 11 Petabytes (note, not terabytes) of 'slow memory' that can be used for backup, Gmail :), etc.