What Does the Future Hold for Semantic Search in Science?


Many years ago, when I was learning about “command languages” used in online search tools such as DIALOG and BRS, I recall asking about certain strange (to me) groups of characters that made no natural-language sense to me, the first time I saw them. They appeared in only one of the databases we searched. My professor, a search language guru, explained: “Those are reserved codes used in the Oil & Gas industries. And nowhere else,” she intoned with the voice of wisdom and experience.

Since the advent of generalized search engines – such as Google, but in fact there were several good ones around a few years before that – everyone knows (or think they know) about the basics of searching for information via their computer and a browser.

Scientific search, by contrast, is a specialized corner of the search industry. When the literature of your subject discipline entails such aspects as a controlled vocabulary (possibly including taxonomies and ontologies) and other discipline-specific metadata, as a researcher and author you are probably better served by a semantic domain-aware search tool.

For Example: A screenshot from NLM’s PubChem search utility:

Note: Although I used an example from chemistry, one could as easily look to Physics or Geology. There are probably as many discipline-specific tools as there are scientific domains.

It follows that a search tool optimized for organic chemistry is not likely to work well at all when applied to questions in quantum astrophysics. They are not, as the idiom has it, “talking the same language.” One approach, however, that is useful across disciplines is that known as “semantic search.” Semantic search basically describes a cluster of techniques that enable the algorithms of a search utility to probe a dataset for concept correlations (words and phrases traveling together) and, from these, to draw reliable inferences about ‘hidden’ relationships lurking in the text. It is an interesting angle, I think, because it goes beyond the simple “type in a bunch of keywords and hope for the best” tactic that most of us use, and it applies those semantic techniques to get more relevant results, more quickly, when the datasets include predefined scientific concepts. It’s something I have been interested in (at a distance) for a while.

Some of these thoughts came to mind again recently when I read an announcement from my employer (CCC) and its partner SciBite which heralded the incorporation of semantic search into our product RightFind Navigate. Initially developed for use in the biomedical domain, this approach has opened to me possibilities  – for better-targeted concept searching and e-discovery across scientific domains – that seem huge.

While I’m not in contact with my old prof. anymore, I wish I were so we could sit down over coffee and discuss the brave new world these new tools may open.

Read more about Semantic Search & Enrichment:

Topic:

Author: Dave Davis

Dave Davis joined CCC in 1994 and currently serves as research analyst. He previously held directorships in both public libraries and corporate libraries and earned joint master’s degrees in Library and Information Sciences and Medieval European History from Catholic University of America. Dave is fascinated by copyright issues, content licensing and data. Also, rock and roll music.
Don't Miss a Post

Subscribe by email to the Velocity of Content blog