Before You Begin: What Your Team Needs to Know About Text Mining

Text mining uses natural language processing (NLP) techniques to quickly analyze massive volumes of biomedical literature. It can transform your organization’s approach across the drug development pipeline – from early-phase drug discovery and clinical trial development to pharmacovigilance.

Imagine being able to give your laboratory scientists a head start by extracting candidate relationships between whole classes of concepts, such as genes and diseases, from your organization’s existing information resources, at scale and with confidence. Or envision supporting your pharmacovigilance team with highly precise NLP search strategies that reduce the time spent on false positives and let the team focus on tracking down meaningful issues. These benefits and more are the promise of a text mining initiative done right.

But before you begin, here are three things your team needs to understand about text mining:

1. Text mining relies on data that’s ready to be mined

Across the life sciences and pharmaceutical industries, information is often siloed and stored in multiple formats, reducing its usefulness. Structured or semi-structured content may follow several different schemas, requiring additional data cleansing and normalization work. Scientific literature may be licensed for only certain uses, and subscription agreements typically do not include permission to conduct text mining activities. Each of these is an obstacle to realizing benefits from your organization’s existing content investments and its text mining efforts.
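
To make the normalization work concrete, here is a minimal Python sketch of mapping records from two differently shaped sources into one common record. The schema names, field names, and the Article type are all hypothetical, chosen only for illustration; real content feeds will have their own structures and will likely need more cleansing than the whitespace cleanup shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Common record shape that downstream text mining would consume."""
    doc_id: str
    title: str
    body: str


# Hypothetical mappings: common field name -> field name used by each source.
FIELD_MAPS = {
    "schema_a": {"doc_id": "pmid", "title": "ArticleTitle", "body": "AbstractText"},
    "schema_b": {"doc_id": "id", "title": "title", "body": "full_text"},
}


def normalize(record: dict, schema: str) -> Article:
    """Map a source record onto the common shape and collapse stray whitespace."""
    mapping = FIELD_MAPS[schema]
    cleaned = {
        field: " ".join(str(record.get(source, "")).split())
        for field, source in mapping.items()
    }
    return Article(**cleaned)
```

Once every source passes through a function like this, the same mining pipeline can run over all of them without caring where a record came from. Missing fields become empty strings in this sketch; a production pipeline would probably want to flag them instead.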

Related Reading: The Benefits of Text Mining Full Text Instead of Abstracts

2. A strong relationship between bioinformaticians and information managers is key

Collaboration between informatics and information management teams can overcome these challenges. Information managers understand how scientific literature and other resources are consumed within the organization. They also manage external publisher relationships – ensuring a fit between licensed content and the information needs of the organization. Bioinformaticians and other informatics professionals understand data interoperability and the information architecture required to practically apply text mining to solve organizational problems.

3. Have a clear goal in mind – but be realistic about instant benefits

Like most organizational efforts, a text mining initiative is more likely to succeed if stakeholders agree on what success looks like at the outset. Identify an appropriate use case by looking for situations where internal teams struggle to synthesize findings from large amounts of data, have difficulty staying on top of current findings, or suffer from low signal-to-noise ratios in their information resources.

Even with the right use case, keep your expectations reasonable. It takes time and effort to source a proper text mining solution, conduct proofs of concept, evangelize internally, and ultimately scale your efforts across the organization. While you and your team might recognize this, your stakeholders need to share a similar understanding.

By working together on text mining programs, bioinformaticians and information managers can deliver real insights to the organization from relevant data, leading to enhanced drug discovery, more efficient literature monitoring, and fewer information dead ends.

Ready to improve your text mining results? Learn more about RightFind XML.

Author: Mike Iarrobino

Mike Iarrobino is CCC's product manager for content and rights workflow solutions RightFind® XML for Mining and RightFind Music. He has previously managed marketing technology and content discovery products at FreshAddress, Inc., and HCPro, Inc. He speaks at webinars and conferences on the topics of content discovery and data management, and loves to get into conversations about the nature of free will.