Integrating and Ranking Aggregated Content on the Web

The Theory and Practice of Aggregated Search and Whole-Page Composition

Jaime Arguello, Fernando Diaz*, and Milad Shokouhi

WWW 2012 Tutorial

Commercial information access providers increasingly incorporate content from a large number of specialized services created for particular information-seeking tasks. For example, an aggregated web search page may include results from image databases and news collections in addition to the traditional web search results; a news provider may dynamically arrange related articles, photos, comments, or videos on a given article page. These auxiliary services, known as verticals, include search engines that focus on a particular domain (e.g., news, travel, or sports), search engines that focus on a particular type of media (e.g., images, video, or audio), and APIs to highly targeted information (e.g., weather forecasts, map directions, or stock prices). The goal of content aggregation is to provide integrated access to all verticals within a single information context. Although content aggregation is related to classic work in distributed information retrieval, it has unique signals, techniques, and evaluation methods in the context of the web and other production information access systems.

In this tutorial, we present the core problems associated with content aggregation, which include sources of predictive evidence, sources of training data, relevance modeling, and evaluation. While much of the aggregation literature is in the context of web search, we also present material related to aggregation more generally. Furthermore, we present material from both academic and commercial perspectives and review solutions developed in both environments, providing a holistic view for researchers and a set of tools for different types of practitioners.

Read the full abstract.

See the tutorial slides.

*Primary Contact