Semantic search can help solve many intractable problems across functional areas and teams, including early-phase research, competitive intelligence, drug safety monitoring, and more.
But while semantic search is powerful, does it completely remove the need for manual indexing?
Let’s break it down.
Semantic search takes into account the researcher’s intent and the context in which terms appear within the corpus to improve the relevance of query results. Consider how the results differ when a search for “aids” refers to hearing aids versus Acquired Immunodeficiency Syndrome (AIDS). A well-functioning semantic search system can help disambiguate these concepts and determine which one the user intends to find with the literal query “aids”. Likewise, on the corpus side, semantic search can use the surrounding context of the source text to retrieve results that align with the user’s intent. Semantic search systems may use automated methods, such as named entity recognition or graph-based models, to achieve this.
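To make the disambiguation idea concrete, here is a deliberately simplified sketch. It uses a Lesk-style word-overlap heuristic, not the named entity recognition or graph-based models mentioned above, and the sense inventory and signature words are hypothetical. The same scoring is applied to the query (to infer intent) and to each document (to infer the sense of “aids” in context), and only documents whose sense matches the query’s are kept.

```python
import re

# Hypothetical sense inventory for the ambiguous term "aids".
# Labels and signature words are illustrative assumptions, not from a real ontology.
SENSES = {
    "hearing aid (device)": {
        "hearing", "device", "amplifier", "cochlear", "audiology",
        "ear", "loss", "deaf", "sound",
    },
    "AIDS (Acquired Immunodeficiency Syndrome)": {
        "hiv", "immunodeficiency", "antiretroviral", "virus",
        "infection", "cd4", "syndrome", "immune",
    },
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def disambiguate(context: str) -> str:
    """Pick the sense whose signature overlaps most with the context.
    Ties resolve to the first sense listed in SENSES."""
    tokens = tokenize(context)
    return max(SENSES, key=lambda sense: len(SENSES[sense] & tokens))


def semantic_filter(query: str, documents: list[str], term: str = "aids") -> list[str]:
    """Keep documents that mention `term` in the same sense the query intends."""
    intended = disambiguate(query)
    return [
        doc for doc in documents
        if term in tokenize(doc) and disambiguate(doc) == intended
    ]


docs = [
    "New hearing aids reduce background sound for users with hearing loss",
    "Antiretroviral therapy slowed progression from HIV infection to AIDS",
    "Cochlear implants versus hearing aids in children",
]
print(semantic_filter("aids for hearing loss", docs))   # first and third documents
print(semantic_filter("HIV patients with AIDS", docs))  # second document only
```

A keyword match on “aids” alone would return all three documents for either query; the context check is what separates them. Production systems replace the hand-written signatures with learned representations, but the principle of reading intent from context is the same.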
MEDLINE is an example of a manually indexed database: curators apply the hierarchical Medical Subject Headings (MeSH) to its records. These headings support information retrieval workflows for PubMed users. Researchers can run heading-based searches for MEDLINE records, browse the subject headings to locate articles of interest, and use the headings applied to an article as an indicator of its relevance when conducting keyword searches.
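Because the headings are hierarchical, a heading-based search can also return records indexed with narrower headings; PubMed, for example, includes the terms beneath a searched MeSH heading by default. The sketch below illustrates that idea with a toy hierarchy and hypothetical records. The heading names echo MeSH, but the tree is simplified and is not the actual MeSH structure or PubMed’s implementation.

```python
from collections import defaultdict

# Toy heading hierarchy (child -> parent). Simplified and hypothetical,
# not the real MeSH tree.
PARENT = {
    "Virus Diseases": None,
    "HIV Infections": "Virus Diseases",
    "Acquired Immunodeficiency Syndrome": "HIV Infections",
    "Ear Diseases": None,
    "Hearing Loss": "Ear Diseases",
    "Sensory Aids": None,
    "Hearing Aids": "Sensory Aids",
}

# Hypothetical records, each with the headings a curator assigned to it.
RECORDS = {
    "rec1": {"Acquired Immunodeficiency Syndrome"},
    "rec2": {"HIV Infections"},
    "rec3": {"Hearing Aids", "Hearing Loss"},
}

# Invert the parent map so we can walk downward from any heading.
CHILDREN = defaultdict(set)
for child, parent in PARENT.items():
    if parent is not None:
        CHILDREN[parent].add(child)


def explode(heading: str) -> set[str]:
    """Return the heading plus every heading beneath it in the hierarchy."""
    result = {heading}
    stack = [heading]
    while stack:
        for child in CHILDREN.get(stack.pop(), ()):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def heading_search(heading: str, explode_terms: bool = True) -> list[str]:
    """Return sorted IDs of records indexed with the heading or,
    when explode_terms is True, with any narrower heading."""
    wanted = explode(heading) if explode_terms else {heading}
    return sorted(rid for rid, heads in RECORDS.items() if heads & wanted)


print(heading_search("Virus Diseases"))                       # ['rec1', 'rec2']
print(heading_search("Virus Diseases", explode_terms=False))  # []
```

Note that the ambiguity from the “aids” example never arises here: a curator has already filed each record under either “Hearing Aids” or “Acquired Immunodeficiency Syndrome”, which is the value manual indexing adds.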
For now, the market still has room for both approaches, each meeting the needs of different use cases. A testament to this is the continued reliance on a system like PubMed for literature review and search at countless R&D organizations globally. Any new entrant to the market must clearly demonstrate value beyond this freely available resource.
Manual indexing will face downward pressure as the volume of research continues to grow. However, there will likely always be a place for high-quality, manually curated data sets, even as machines take on a growing share of data processing tasks.