A Search Engine Architecture Based on Collection Selection

Google Tech Talks
December 19, 2007

ABSTRACT

We present a distributed architecture for a Web search engine, based
on the concept of collection selection. We introduce a novel approach
to partitioning the document collection that greatly improves the
effectiveness of standard collection selection techniques (CORI), and
a new selection function that outperforms the state of the art. Our
technique is based on the novel query-vector (QV) document model,
built from the analysis of query logs, and on our strategy of
co-clustering queries and documents at the same time.
By suitably partitioning the documents in the collection, our system
is able to select the subset of servers containing the most relevant
documents for each query. Instead of broadcasting the query to every
server in the computing platform, the system polls only the most
relevant ones, reducing the average computing cost of solving a query.
We introduce a novel strategy to use the instant load at each server
to drive the query routing. Also, we describe a new approach to
caching, able to incrementally improve the quality of the stored
results. Our caching strategy is effective both in reducing
computing load and in improving result quality. The proposed
architecture, overall, presents a trade-off between computing cost and
result quality, and we show how to guarantee very precise results in
the face of a dramatic reduction in computing load. This means that, with
the same computing infrastructure, our system can serve more users,
more queries and more documents.
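
The routing idea in the abstract can be illustrated with a minimal sketch. It is not the speaker's selection function and not CORI: it assumes each server is summarized by plain term counts, scores servers by normalized term overlap with the query, and polls only the top k. The load rule (skip a server above max_load unless it is the best match) is a hypothetical policy chosen for illustration, and the names build_summaries, select_servers, k and max_load are all invented here.

```python
from collections import Counter


def build_summaries(partitions):
    """Map each server id to term counts over the documents it holds."""
    summaries = {}
    for server, docs in partitions.items():
        counts = Counter()
        for doc in docs:
            counts.update(doc.lower().split())
        summaries[server] = counts
    return summaries


def select_servers(query, summaries, load, k=2, max_load=0.8):
    """Return up to k servers to poll for the query, best match first.

    Scoring is a simple stand-in (assumption): the share of a server's
    term occurrences that match the query terms. Servers with no match
    are never polled. A server whose instantaneous load exceeds max_load
    is skipped unless it is the top-ranked one (hypothetical policy).
    """
    terms = query.lower().split()
    scored = []
    for server, counts in summaries.items():
        total = sum(counts.values()) or 1
        score = sum(counts[t] for t in terms) / total
        scored.append((score, server))
    # Highest score first; ties broken by server id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    chosen = []
    for rank, (score, server) in enumerate(scored):
        if score == 0:
            break  # remaining servers hold nothing relevant
        if rank == 0 or load.get(server, 0.0) <= max_load:
            chosen.append(server)
        if len(chosen) == k:
            break
    return chosen


if __name__ == "__main__":
    partitions = {
        "s1": ["python programming tutorial", "python language reference"],
        "s2": ["soccer world cup results", "soccer league table"],
        "s3": ["python snake habitat", "reptile care guide"],
    }
    summaries = build_summaries(partitions)
    load = {"s1": 0.2, "s2": 0.1, "s3": 0.95}
    print(select_servers("python tutorial", summaries, load))
```

In this toy setup the query reaches only the servers that hold matching documents, and a busy lower-ranked server is left out, which is the cost versus quality trade-off the abstract describes.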

Speaker: Diego Puppin

Duration : 0:33:1


3 Responses to “A Search Engine Architecture Based on Collection Selection”

  • AskASearchEngineGuru says:

    Good idea, however I find only one other approach to make it speedier.

  • vicaya says:

    Sorry, this strategy doesn’t work well with long-tail and personalized search loads. The indexing cost (I’d consider cluster selection an indexing phase) is much higher as well. For aggregate performance, a much simpler caching strategy (multiple document partitions, e.g. for different types or languages, plus a pre-computed/trained distributed query cache) can be built that matches or outperforms this complicated solution.

  • wildchildplasma says:

    The cruising capabilities of active data clouds, you mean?
    One day it’ll know the kind of stuff I want and I won’t even have to make entries all the time. (Standard unified ratings data).
    I’ll also be able to talk to a bot which will adapt its data personality so as to know me better.
