The following is an excerpt from Accessing and Analyzing Relevant Content in Today’s Information Chaos.
Semantic enrichment is what makes it possible to get relevant search results even when those results don’t use the same terminology as the query. For example, a query for “rare disease drug approval” would also return results about the Orphan Drug Act from the FDA. Google recognizes that “drug approval” relates to “government regulations.” It also knows that “orphan drug” and “rare disease” are associated concepts, even though the wording differs.
Compare this to another scenario. You’ve been asked to retrieve an important piece of information from an email. You scour your inbox but cannot recall the exact wording of the message or its subject line. Your email client’s text-based search is unlikely to return the right message unless you type the precise words, which leads to repeated search attempts and time lost hunting through emails.
The difference in the underlying algorithms between these two examples, Google and a basic email search, shows the power and utility of semantic enrichment in our daily lives. We’ve come to rely on search tools to include appropriate synonyms automatically.
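To make the contrast concrete, here is a minimal sketch, assuming a tiny hand-built concept map, of exact keyword matching versus concept-based matching. The concept names, term groupings, and documents are hypothetical and chosen only to mirror the example above. They are not taken from Google or any real vocabulary. A document matches semantically if it mentions any term tied to a concept that the query also mentions.

```python
# Hypothetical concept map: each concept lists surface terms that refer to it.
# These groupings are illustrative only, not drawn from any real vocabulary.
CONCEPTS = {
    "rare_disease": {"rare disease", "orphan drug", "orphan disease"},
    "drug_regulation": {"drug approval", "government regulations", "fda"},
}

# Hypothetical document titles.
DOCS = [
    "FDA guidance on the Orphan Drug Act",
    "Quarterly sales report",
    "Drug approval timelines for rare disease therapies",
]


def keyword_search(query, docs):
    """Return docs that contain the query string verbatim (case-insensitive)."""
    q = query.lower()
    return [d for d in docs if q in d.lower()]


def concepts_in(text):
    """Return the set of concept IDs whose terms appear in the text."""
    t = text.lower()
    return {cid for cid, terms in CONCEPTS.items() if any(term in t for term in terms)}


def semantic_search(query, docs):
    """Return docs that share at least one concept with the query."""
    q_concepts = concepts_in(query)
    return [d for d in docs if q_concepts & concepts_in(d)]


if __name__ == "__main__":
    query = "rare disease drug approval"
    print(keyword_search(query, DOCS))   # no document contains the full phrase
    print(semantic_search(query, DOCS))  # matches via "orphan drug" / "fda" etc.
```

The keyword search finds nothing because no title contains the whole phrase. The concept-based search returns the Orphan Drug Act item and the rare-disease timelines item. Production systems map terms to curated concept identifiers and rank the matches. Plain substring lookup is a simplification used here for brevity.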
Where it comes into play
To eliminate noise and provide relevant search results, information solutions must go beyond simple keyword matching and use search engines and algorithms that link concepts, topics, and associations to form a deeper understanding of a user’s intent.
For instance, a pharmacovigilance researcher may need to identify and list all potential Injection Site Reactions (ISRs) before an upcoming clinical trial. Searching published materials might surface traditional symptoms such as sore arm, redness, and inflammation. However, without integrating the company’s Adverse Event or Safety database, the search could miss less familiar reactions such as itching, eczema, and hives.
Tapping into both external and internal data sources requires biomedical vocabularies and ontologies (e.g., NIH’s MeSH [MeSH Browser, n.d.]) to semantically enrich and index the content. A search for “Injection Site Reactions” could then return known ISRs that have been published and catalogued, along with adverse events drawn from internal sources. A comprehensive solution would account for the company’s own ontology as well as the vocabularies specific to different organizations within the company.
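The ontology-driven expansion described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions. The vocabulary fragment, entry terms, company synonyms, source labels, and records are all hypothetical, and they are not actual MeSH content. A real system would load descriptors and entry terms from NLM’s MeSH data. The query is first mapped to a preferred concept. It is then expanded with narrower terms from the shared vocabulary and with in-house synonyms. Finally, it is matched against published and internal records together.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    source: str  # hypothetical labels, e.g. "published" or "internal_safety_db"
    text: str


# Hypothetical entry terms mapping query wordings to one preferred concept.
ENTRY_TERMS = {
    "injection site reaction": "injection site reaction",
    "injection site reactions": "injection site reaction",
}

# Hypothetical vocabulary fragment (NOT real MeSH content): narrower/related terms.
ONTOLOGY = {
    "injection site reaction": {"sore arm", "redness", "inflammation",
                                "itching", "eczema", "hives"},
}

# Hypothetical company-specific synonyms used by one internal team.
COMPANY_SYNONYMS = {
    "injection site reaction": {"isr", "local reaction"},
}


def expand(query):
    """Map the query to its preferred concept and collect all related terms."""
    q = query.lower()
    concept = ENTRY_TERMS.get(q, q)
    terms = {q, concept}
    terms |= ONTOLOGY.get(concept, set())
    terms |= COMPANY_SYNONYMS.get(concept, set())
    return terms


def search(query, records):
    """Return records (from any source) mentioning any expanded term as a whole word."""
    terms = expand(query)
    return [
        r for r in records
        if any(re.search(rf"\b{re.escape(t)}\b", r.text, re.IGNORECASE) for t in terms)
    ]


RECORDS = [
    Record("published", "Redness and inflammation reported after dosing"),
    Record("published", "Pharmacokinetics of compound X"),
    Record("internal_safety_db", "Patient developed hives at injection site"),
    Record("internal_safety_db", "Local reaction: itching, resolved in 2 days"),
    Record("internal_safety_db", "Headache reported"),
]

if __name__ == "__main__":
    for hit in search("Injection Site Reactions", RECORDS):
        print(hit.source, "|", hit.text)
```

Word-boundary matching keeps a short synonym such as "isr" from matching inside unrelated words. The single result list mixes published and internal records, which is the combined view the paragraph describes.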
Challenges and opportunities
While Google continues to evolve its search algorithms, biomedical research has its own set of challenges, as noted in the article “Dug: A Semantic Search Engine Leveraging Peer-Reviewed Knowledge to Span Biomedical Data Repositories” by Waldrop et al.: “Despite the practical utility of Google’s proprietary knowledge graph for general search, the provenance, depth, and quality of its biomedically relevant connections are not easily verifiable. There remains a need for a search tool capable of leveraging evidence-based biological connections to show researchers datasets useful for hypothesis generation or scientific support.”
This is where search moves beyond linking key terms to linking topics (topic co-occurrence). As with Dug, scientific communities and commercial entities are collaborating to improve semantic search. Building the dictionaries and structures needed to organize, link, and catalog scientific data will require standardization and sustained commitment.
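One simple way to picture topic co-occurrence is the following minimal sketch. It counts how often pairs of concepts are tagged on the same document and then ranks the concepts most often seen alongside a given one. The concept annotations are hypothetical and are assumed to come from an upstream enrichment step. This illustrates the general idea only and does not describe how Dug or any particular product computes its connections.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept annotations produced by an upstream enrichment step.
ANNOTATED_DOCS = [
    {"rare disease", "orphan drug", "fda"},
    {"rare disease", "orphan drug"},
    {"injection site reaction", "hives"},
    {"rare disease", "gene therapy"},
]


def cooccurrence(docs):
    """Count how many documents each unordered pair of concepts shares."""
    counts = Counter()
    for concepts in docs:
        # Sorting gives each pair a canonical order, so (a, b) and (b, a) merge.
        for a, b in combinations(sorted(concepts), 2):
            counts[(a, b)] += 1
    return counts


def related(concept, counts, min_count=1):
    """List concepts co-occurring with `concept`, most frequent first, then by name."""
    out = {}
    for (a, b), n in counts.items():
        if n < min_count:
            continue
        if a == concept:
            out[b] = n
        elif b == concept:
            out[a] = n
    return sorted(out.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    counts = cooccurrence(ANNOTATED_DOCS)
    print(related("rare disease", counts))
```

Raw pair counts favor very common concepts. Practical systems usually normalize them, for example with pointwise mutual information, before treating a pair as a meaningful link.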
Life science companies should look to software solutions that embed semantic enrichment to find relevant scientific concepts faster and to accelerate new discoveries.