How Does Aggregated Search Work?


The following is an excerpt from Phill Jones’ recent white paper The Top Trends in Knowledge and Information Management. Download the full paper here

 

The goal of aggregated search is to provide integrated search across multiple, heterogeneous sources. Broadly, two technology approaches have been used to achieve this: federated search and web-scale indexing. Those aren’t the only options, however.

Federated search

This approach passes the search query to multiple databases behind the scenes. Early versions used screen-scraping, which failed whenever a content provider changed its website. More recently, web technologies like APIs have made this approach more robust.

Advantages:

  • Compatible with a broader variety of information sources. Many proprietary content providers won’t allow content to be indexed.

Disadvantages:

  • Can be fragile: a change or outage at any one content provider can break or degrade results.
  • Search speed is limited by content provider systems.
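
The fan-out behind federated search can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the three source functions are hypothetical stand-ins for real provider APIs, and the in-memory title lists replace actual network calls.

```python
from concurrent.futures import ThreadPoolExecutor


def search_catalog(query):
    # Hypothetical stand-in for a call to one provider's search API.
    titles = ["Federated search basics", "Indexing at scale"]
    return [t for t in titles if query.lower() in t.lower()]


def search_journals(query):
    # Hypothetical stand-in for a second provider.
    titles = ["Enterprise search in practice", "Knowledge graphs"]
    return [t for t in titles if query.lower() in t.lower()]


def search_broken(query):
    # Simulates a provider that is down or has changed its interface.
    raise ConnectionError("provider unavailable")


SOURCES = {
    "catalog": search_catalog,
    "journals": search_journals,
    "broken": search_broken,
}


def federated_search(query, sources):
    """Send the same query to every source and merge whatever comes back."""
    results, failures = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Submit all queries first so the sources are searched in parallel.
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                for title in future.result():
                    results.append((name, title))
            except Exception:
                # One failing source should not take down the whole search.
                failures.append(name)
    return results, failures


results, failures = federated_search("search", SOURCES)
```

Because every query waits on the remote systems, the overall response time is bounded by the slowest source that answers, which is the speed limitation noted above.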

Web-scale indexing

The best-known web-scale indexing service is Google. This technique involves building a database (an index) of all the content that needs to be searchable. Queries run against that index, much as a reader looks up a term in the index of a book, and each result links back to the original content.

Advantages:

  • Fast and robust search creates a compelling user experience.
  • Computational techniques like indexed knowledge graphs can automatically surface connections between different sources.

Disadvantages:

  • Not all sources can be indexed. Google, for example, can’t index content behind a paywall.
  • Content needs to be regularly re-crawled to keep the index up to date.
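
The book-index analogy corresponds to what is commonly called an inverted index. The following is a toy sketch in Python, not a description of how any particular search engine works; the documents and function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical content that has already been crawled.
DOCUMENTS = {
    "doc1": "Federated search passes the query to many databases",
    "doc2": "An index maps each term to the documents that contain it",
    "doc3": "A knowledge graph connects objects and concepts",
}


def build_index(documents):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return matches


index = build_index(DOCUMENTS)
hits = search(index, "index documents")
```

The lookup never touches the original sources at query time, which is where the speed comes from. It also means the index only reflects the content as of the last crawl, hence the need to re-crawl regularly.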

A third way — fully aggregated search

Neither federated search nor web-scale indexing is a perfect solution for a commercial knowledge and information management environment. A fully integrated approach indexes content wherever possible and creates a knowledge graph of objects, concepts and connections. Although building a knowledge graph in real time is not computationally feasible, results that must be retrieved in real time can readily be mapped onto the pre-existing graph. This hybrid approach can provide the best of both worlds.
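
As a rough sketch of the hybrid idea, the Python below combines results from a local index with results fetched live, then annotates both against a graph that was built ahead of time. The graph contents, the sample records and the substring-based matching are all simplifying assumptions; a production system would use proper entity recognition and a real graph store.

```python
# Pre-built knowledge graph: concept -> related concepts (hypothetical data).
GRAPH = {
    "aspirin": {"pain relief", "anti-inflammatory"},
    "ibuprofen": {"pain relief", "anti-inflammatory"},
    "insulin": {"diabetes"},
}

# Content that could be indexed in advance (hypothetical).
INDEXED = [
    {"title": "Aspirin for chronic pain", "source": "index"},
    {"title": "Insulin storage guidelines", "source": "index"},
]

# Content only reachable through a live, federated query (hypothetical).
LIVE = [
    {"title": "Ibuprofen and pain relief outcomes", "source": "live"},
    {"title": "Insulin pricing report", "source": "live"},
]


def live_lookup(query):
    # Stand-in for a real-time call to a provider that cannot be indexed.
    return [r for r in LIVE if query.lower() in r["title"].lower()]


def map_to_graph(result, graph):
    """Attach known concepts and their neighbours to a single result."""
    title = result["title"].lower()
    concepts = sorted(c for c in graph if c in title)
    related = sorted(set().union(*(graph[c] for c in concepts)))
    return {**result, "concepts": concepts, "related": related}


def aggregated_search(query, indexed, live_fn, graph):
    """Merge indexed and live hits, then map each onto the existing graph."""
    q = query.lower()
    hits = [r for r in indexed if q in r["title"].lower()]
    hits += live_fn(query)
    return [map_to_graph(r, graph) for r in hits]


results = aggregated_search("pain", INDEXED, live_lookup, GRAPH)
```

The graph itself is never rebuilt at query time. Live results are only looked up against it, which keeps the real-time step cheap.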

Author: Ray Gilmartin

Ray Gilmartin was Director of Corporate Solutions for CCC, responsible for knowledge management products within the Corporate Business Unit. Before joining CCC, he served in several leadership roles at Akamai, Avid Technology, and HP after beginning his career in TV journalism roles at Hearst Broadcasting and the Christian Science Monitor.