Caffeine seems an apt name for Google's new search index technology. Processing hundreds of thousands of pages in parallel every second would give almost anyone the jitters.
Google's new search index, Caffeine, processes hundreds of thousands of pages in parallel every second
With the rapid growth of social media (blogs, tweets and the like) and the widening range of content being added to pages almost constantly (think video), content creators want their information to show up in search results sooner.
Caffeine aims to cut that lag by indexing information continuously and in parallel. Google claims the results are 50% fresher than those from its previous index, all while the index expands in both the volume and the variety of content it covers.
Here's an excerpt from Google's official blog:
With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before—no matter when or where it was published.
Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.
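The iPod comparison in the excerpt is easy to sanity-check with a little arithmetic. The sketch below is only a back-of-the-envelope check, not anything Google published: it takes the quoted figures at face value (rounding "nearly 100 million gigabytes" to exactly 100 million and "more than 40 miles" to exactly 40), and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope check of the iPod comparison quoted above.
# All inputs are the rounded figures from the excerpt; the names are illustrative.

TOTAL_STORAGE_GB = 100_000_000  # "nearly 100 million gigabytes" (rounded up)
IPOD_COUNT = 625_000            # "625,000 of the largest iPods"
STACK_MILES = 40                # "more than 40 miles" (rounded down)

METERS_PER_MILE = 1609.344


def implied_ipod_capacity_gb(total_gb=TOTAL_STORAGE_GB, count=IPOD_COUNT):
    """Storage each iPod would need to hold for the comparison to work."""
    return total_gb / count


def implied_ipod_length_cm(miles=STACK_MILES, count=IPOD_COUNT):
    """Length of each iPod if the stack is laid end-to-end over `miles`."""
    return miles * METERS_PER_MILE * 100 / count


if __name__ == "__main__":
    print(f"Implied capacity per iPod: {implied_ipod_capacity_gb():.0f} GB")
    print(f"Implied length per iPod:   {implied_ipod_length_cm():.1f} cm")
```

Under those assumptions each iPod works out to 160 GB and a little over 10 cm long, which is at least consistent with a 160 GB iPod classic, so the comparison hangs together.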
Near-real-time search will remain a moving target as the volume and variety of content grow over time. Along with its potential for utility, this push for speed will likely create new challenges for content producers (recalling certain content, for example) and search engines alike.

