When people think about real world evidence, they generally think of using real world data to answer questions about drug effectiveness or population-level safety. But real world data can address many other applications.

If you think of real world data as any type of information gathered about drugs in non-trial settings, a whole world of possibilities opens.

  • Social media data can be used to understand how well packaging and formulations are working.
  • Customer call feeds can be analyzed for trends in drug switching, off-label use, or contra-indicated medications among concomitant drugs.
  • Full-text literature can be mined for information about epidemiology, disease prevalence, and more.

Text Mining transforms real world data to real world evidence

Many of these real world sources have free-text fields, and this is where text analytics and natural language processing (NLP) fit in. Organizations working with Linguamatics use text analytics to get actionable insight from real world data – and find valuable intelligence that can inform commercial business strategies.

Here is a look at two use cases where text mining has transformed real world data to real world evidence.


Use case 1: Evidence landscape from literature for drug economics

Understanding the potential for market access is essential for all pharma companies, and information to characterize the burden of disease and local standard of care in different countries across the globe is critical for any new drug launch. Companies need an assessment of the landscape of epidemiological data, health economics and outcomes information to inform the optimal commercial strategy.

Valuable data is published every month in scientific journals, abstracts, and conferences. One of Linguamatics’ Top 10 pharma customers decided to utilize text mining to extract, normalize, and visualize these data. They then used this structured data to generate a comprehensive understanding of the available evidence, thus establishing the market “gaps” they could address. Focusing on a particular therapeutic area of immunological diseases, the organization was able to develop precise searches with increased recall across these different data sources, including full-text literature.

Linguamatics I2E enables the use of ontologies to improve disease coverage, and to incorporate domain knowledge that improves the identification of particular geographical regions (for example, recognizing the adjectival form of a country, e.g. French as well as France, and its cities, e.g. Paris, Toulouse). I2E also extracts and normalizes numbers, which helps standardize epidemiological reports of disease incidence and prevalence. Searching within full-text papers can be noisy, so I2E allows searches to be targeted, for example by excluding certain parts of the document, such as the references.
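To make these three ideas concrete (region synonyms, number normalization, and skipping the reference list), here is a minimal sketch in plain Python. It does not use I2E or reflect its API; the tiny term list, the regular expressions, and the function names `strip_references` and `extract_findings` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mini-ontology: country names, adjectival forms, and cities
# all map to one canonical region label.
REGION_TERMS = {
    "france": "France", "french": "France", "paris": "France", "toulouse": "France",
    "germany": "Germany", "german": "Germany", "berlin": "Germany",
}
REGION_RE = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in REGION_TERMS) + r")\b", re.IGNORECASE
)

# Matches phrases such as "prevalence ... 4.4%" or "incidence ... 12 per 10,000".
RATE_RE = re.compile(
    r"\b(?P<measure>prevalence|incidence)\b[^.]*?"
    r"(?P<value>\d[\d,]*(?:\.\d+)?)\s*"
    r"(?:(?P<pct>%|percent)|per\s+(?P<denom>\d[\d,]*))",
    re.IGNORECASE,
)


def strip_references(text):
    """Drop everything from a stand-alone 'References' heading onward."""
    m = re.search(r"^\s*(references|bibliography)\s*$", text,
                  re.IGNORECASE | re.MULTILINE)
    return text[:m.start()] if m else text


def extract_findings(text):
    """Return one record per rate mention, normalized to cases per 100,000."""
    findings = []
    body = strip_references(text)
    # Naive sentence split; real pipelines use a proper sentence tokenizer.
    for sentence in re.split(r"(?<=[.!?])\s+", body):
        regions = sorted(
            {REGION_TERMS[m.group(1).lower()] for m in REGION_RE.finditer(sentence)}
        )
        for m in RATE_RE.finditer(sentence):
            value = float(m.group("value").replace(",", ""))
            if m.group("pct"):
                per_100k = value * 1000  # x% equals x * 1,000 per 100,000
            else:
                denom = float(m.group("denom").replace(",", ""))
                per_100k = value / denom * 100_000
            findings.append({
                "measure": m.group("measure").lower(),
                "regions": regions,
                "per_100k": per_100k,
            })
    return findings
```

Calling `extract_findings` on the text of a paper returns a list of dictionaries that can be loaded into a table or chart. Normalizing every rate to a common denominator is what makes reports from different papers comparable.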

I2E can provide the starting point for efficiently performing evidence-based systematic reviews over very large sets of scientific literature, enabling researchers to answer questions around commercial business decisions.

Use case 2: Gaining insights from medical science liaison professionals

Conversations between medical science liaison (MSL) professionals and patients or healthcare professionals (HCPs) can lead to valuable insights. The role of the MSL is to ensure the effective use, and success, of a pharmaceutical company’s drug. MSLs act as the therapy area experts for internal colleagues, and maintain good relationships with external experts, such as leading physicians, to educate and inform on new drugs and therapeutics.

Thierry Breyette, Novo Nordisk, presented at Linguamatics Text Mining Summit 2016 on “Generating actionable insights from real world data”. The figure shows a map of where particular topics are discussed, and what materials are used.

Top pharma company Novo Nordisk uses text mining to gain clinical insights from MSL interactions with HCPs. These interactions may be broad ranging, covering topics such as safety and efficacy, dosing, cost, special populations, indication, comparisons, and competitor products. MSLs may use approved slide decks, package inserts (PIs), factsheets, studies or publications to answer HCP questions. Linguamatics’ text mining platform I2E is used to structure these source files with custom ontologies (e.g. for material types, product, disease terminology variation, topics).

This analysis enables Novo Nordisk to better address what support HCPs may need in their interactions with patients, insurance providers, and other clinicians, and to invest in resource development accordingly.
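As a rough illustration of structuring MSL notes with custom ontologies, the sketch below tags free-text notes with a topic and a material type, then counts how often each pair occurs together. It is plain Python, not I2E and not Novo Nordisk's actual workflow; the term lists, the example labels, and the helper names `tag` and `summarize` are assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical ontologies: label -> surface terms found in free-text notes.
TOPIC_TERMS = {
    "safety": ["adverse event", "side effect", "safety"],
    "dosing": ["dose", "dosing", "titration"],
    "cost": ["cost", "price", "reimbursement"],
}
MATERIAL_TERMS = {
    "slide deck": ["slide deck", "slide"],
    "package insert": ["package insert"],
    "factsheet": ["factsheet", "fact sheet"],
    "publication": ["publication", "study", "studies"],
}


def _compile(terms_by_label):
    """Build one case-insensitive pattern per label, allowing a plural 's'."""
    return {
        label: re.compile(
            r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")s?\b",
            re.IGNORECASE,
        )
        for label, terms in terms_by_label.items()
    }


TOPIC_RE = _compile(TOPIC_TERMS)
MATERIAL_RE = _compile(MATERIAL_TERMS)


def tag(note, patterns):
    """Return the sorted labels whose pattern occurs in the note."""
    return sorted(label for label, pat in patterns.items() if pat.search(note))


def summarize(notes):
    """Count (topic, material) pairs that appear together in the same note."""
    counts = Counter()
    for note in notes:
        for topic in tag(note, TOPIC_RE):
            for material in tag(note, MATERIAL_RE):
                counts[(topic, material)] += 1
    return counts
```

Passing a list of note strings to `summarize` gives a `Counter` keyed by `(topic, material)`, which is the kind of structured output that could feed a map or heat map of which materials are used for which topics.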


Author: Jane Reed

Jane Reed joined Linguamatics in March 2014, as the head of life science strategy. She is responsible for developing the strategic vision for Linguamatics’ growing product portfolio and business development in the life science domain. Jane has extensive experience in life sciences informatics. She worked for more than 15 years in vendor companies supplying data products, data integration and analysis and consultancy to pharma and biotech – with roles at Instem, BioWisdom, Incyte, and Hexagen. Before moving into the life science industry, Jane worked in academia with post-docs in genetics and genomics.