How the information team impacts a global pharma company’s pipeline

A big challenge for major pharma companies these days is to systematically spot promising new ideas from universities and biotechs. The days when a pharma company could base its future on discovering a few new drugs are gone, so to grow a pipeline of new drugs, companies explore new therapy areas and technologies. And in these new areas, even the best companies need input from scientists around the globe. However, the sheer amount of new data coming out makes it impossible to go over everything manually. On a weekly basis a researcher in diabetes, nonalcoholic steatohepatitis (NASH) or heart disease could easily face 500 new publications, patents, grants, or start-ups that all look relevant on the surface.  

So how do you systematically screen and uncover the most relevant candidates for further review by scientists? 

“Can you build it?”

This was the challenge my colleagues and I were facing when the Head of Early Research contacted the information department a few years ago. “Can you build it? What do you need to get there?”

“Of course we can,” my director said, assembling a team of great information scientists with competencies in surveillance, natural language processing (NLP), information sources, and our therapy areas. To make it all come together, I was asked to manage the project, a huge – but exciting – challenge for an information professional who has spent most of his career at the intersection of information and IT.

Now, to create the systematic surveillance requested, we needed three things: 

  1. Content from a broad range of sources 
  2. An effective way to filter the content 
  3. An efficient way to share the relevant content 

When it comes to sources, obviously literature, database pipelines, and patents come to mind, but as the early bird catches the worm, conference presentations, tech transfer offices, and news about startups can be interesting additions to extract new ideas and insights.  

It’s not “just” a search

We wanted to create streams of information targeting specific groups of researchers – maybe even individuals. However, we could not “just” search traditionally because what would be our starting place? When we look for something new – potentially groundbreaking – we do not know the name of the company, the drug, the mode of action, or the gene target.  We just know the overall therapy area and we would want to look at anything in the early phases that can be considered novel.   

To help narrow down the results, a mix of NLP, artificial intelligence (AI), and human review was applied. 

  • Using a text mining tool, we could extract key concepts from the texts. The entity extraction pulls out gene names, drug names, and company names and helps normalize them according to ontologies. Suddenly, we had structured data that could be sorted, linked, and reviewed in Excel-like columns rather than thousands of bits of unstructured text.
  • AI helped determine how similar a new piece of text was to something previously deemed interesting. It compared the linguistic fingerprint of incoming articles to training sets. The method is far from perfect, but it still provides a useful indication. For example, later-stage clinical activities have a different fingerprint than the very early exploratory science we were looking for.
  • Human review is still very valuable: it adds the experience and common sense needed to override the AI when it’s wrong. An expert with extensive experience in the field will start to see patterns and can alert scientists to them.
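
For readers who want to picture how the three steps fit together, here is a minimal sketch in Python. It is not the tooling we used: the tiny ontology, the bag-of-words "fingerprint", the cosine similarity, the 0.3 threshold, and all function names are assumptions chosen only to keep the example short and self-contained.

```python
import math
import re
from collections import Counter

# Hypothetical mini-ontology: surface form -> (entity type, normalized name).
ONTOLOGY = {
    "glp-1": ("gene_target", "GLP1R"),
    "glp1r": ("gene_target", "GLP1R"),
    "acme bio": ("company", "Acme Bio"),
}


def extract_entities(text):
    """Return sorted (type, normalized name) pairs found in the text."""
    lowered = text.lower()
    found = set()
    for surface, entity in ONTOLOGY.items():
        # Match the surface form only when it is not part of a longer word.
        if re.search(r"(?<!\w)" + re.escape(surface) + r"(?!\w)", lowered):
            found.add(entity)
    return sorted(found)


def fingerprint(text):
    """Very rough 'linguistic fingerprint': lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count fingerprints."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_to_training(text, training_texts):
    """Highest similarity to any text previously deemed interesting."""
    fp = fingerprint(text)
    return max((cosine(fp, fingerprint(t)) for t in training_texts), default=0.0)


def triage(text, training_texts, reviewer_override=None, threshold=0.3):
    """Combine extraction, similarity, and an optional human decision."""
    score = similarity_to_training(text, training_texts)
    keep = score >= threshold if reviewer_override is None else reviewer_override
    return {"entities": extract_entities(text), "score": score, "keep": keep}


# Illustrative texts only; no real company or result is implied.
training = ["novel target identified in preclinical mouse models"]
early = "Acme Bio reports a novel GLP-1 agonist in preclinical mouse models"
late = "Phase 3 trial meets primary endpoint in patients"

early_result = triage(early, training)
late_result = triage(late, training)
overridden = triage(late, training, reviewer_override=True)
```

In this toy setup the early-stage text passes the threshold, the late-stage text does not, and the reviewer's decision wins whenever one is given. A real system would use a proper text mining tool and a trained model rather than word counts.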

The end result was highly curated newsletters with the most relevant opportunities. These were shared broadly, not just among core scientists but with anyone who was able to give input on the quality and feasibility of the ideas coming in. A few years later, the service has expanded to eight business areas with good feedback, and demand keeps growing. Now the challenge is how to scale it – big time.

Can you do double the work in half the time?

Once we had a working solution, we began turning our attention to scalability. The question became “Can we do double the work in half the time?” We thought we could, but only if we did it differently.

Human curation is expensive and limits the ability to scale into new areas as needed. We had already implemented a couple of machine learning algorithms in the process to help rank and extract key points from the unstructured text. But how could we get AI one step closer to human performance?


  • What if the algorithm was instantly aware that we have worked on this target before?  
  • What if it could see the similar drugs competitors have in the pipeline at this very moment?  
  • What if it could see if a piece of news makes a splash on social media?  
  • What if it could see the credibility of the research group behind the publication?    
  • What if we could see a timeline for the company or research group based on what has been picked up before? 

And what if this information were used to rank and present incoming data?

  • Would we be able to rank content more efficiently?  
  • Would researchers be able to review more content faster?
  • Would we make better decisions on what to dig deeper into? 
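
The "what if" questions above can be read as extra signals attached to each incoming item before it is ranked. The sketch below shows one way that could look. Everything in it is hypothetical: the field names, the weights, and the additive scoring are illustrative assumptions, and whether a signal such as "we have worked on this target before" should raise or lower an item's rank is a decision for the scientists, not the code.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    target: str
    source_group: str
    relevance: float  # score from the existing text-based ranking, 0..1


@dataclass
class Context:
    """Hypothetical lookups fed from internal and external systems."""
    internal_targets: set = field(default_factory=set)      # targets worked on before
    competitor_targets: set = field(default_factory=set)    # targets in competitor pipelines
    social_mentions: dict = field(default_factory=dict)     # title -> mention count
    group_credibility: dict = field(default_factory=dict)   # group -> 0..1
    history: dict = field(default_factory=dict)             # group -> earlier titles


# Assumed weights; sign and size would need tuning with the end users.
WEIGHTS = {
    "worked_on_before": 0.2,
    "competitor_activity": 0.2,
    "social_buzz": 0.1,
    "credibility": 0.3,
}


def context_signals(item, ctx):
    """Translate the 'what if' questions into numeric signals in 0..1."""
    return {
        "worked_on_before": 1.0 if item.target in ctx.internal_targets else 0.0,
        "competitor_activity": 1.0 if item.target in ctx.competitor_targets else 0.0,
        "social_buzz": min(ctx.social_mentions.get(item.title, 0) / 100, 1.0),
        "credibility": ctx.group_credibility.get(item.source_group, 0.5),
    }


def timeline(item, ctx):
    """Earlier pickups from the same company or research group."""
    return ctx.history.get(item.source_group, [])


def rank(items, ctx):
    """Sort items by text relevance plus weighted context signals."""
    scored = []
    for item in items:
        signals = context_signals(item, ctx)
        score = item.relevance + sum(WEIGHTS[k] * v for k, v in signals.items())
        scored.append((score, item, signals))
    return sorted(scored, key=lambda row: row[0], reverse=True)


# Illustrative data only.
ctx = Context(
    internal_targets={"T2"},
    group_credibility={"G2": 0.9},
    history={"G2": ["Earlier item from G2"]},
)
items = [
    Item("A", target="T1", source_group="G1", relevance=0.5),
    Item("B", target="T2", source_group="G2", relevance=0.5),
]
ranked = rank(items, ctx)
```

Both example items have the same text relevance, so the context alone decides the order. Returning the signals next to the score also lets a reviewer see why an item was ranked where it was.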

Tantalizingly, all the data needed for this already exists. However, there are practical barriers in terms of access, licensing, and different user interfaces, and checking each source manually is very time consuming. As a result, this value-adding information rarely comes into play and doesn’t help decision making.

What we should aim for is presenting any new piece of information with the context we already have available.  To get there, we need to link and integrate the incoming data to data from existing internal and external systems. Think of this as a fun but challenging job for information professionals, scientists, and developers in collaboration.  

The end result will speed up evaluation and open opportunities to present large amounts of data to scientists in dynamic ways according to their preferences. You might even consider building a profile around each scientist to learn about these preferences.

For the scientist, the value is having the latest opportunities that match their preferences served on a regular basis. And when they are served with enough context to make a more informed decision, we impact the core process of early discovery. In pharma, good decisions and time equal money both saved and gained, since we can then focus resources on the best possible opportunities rather than going into the lab with something that has already failed elsewhere.

Do you need surveillance too?

If you find yourself in a similar situation – looking for scalable surveillance that helps you effectively identify the most relevant candidates to fill your pipeline – my first suggestion would be not to build everything from scratch. Instead, you can evaluate new systems appearing in the marketplace. When looking for a solution, ask yourself these key questions:

  • What kind of content is key for your users?  
  • If you cannot find it all in one place: What kind of integrations with other data (internal or external systems) would you need?  
  • How automated should it be vs. how much noise can you live with in the alerts?  
  • What options do you have to deliver targeted information to key groups in your company?    

CCC’s RightFind Suite offers robust software solutions to fuel scientific research and simplify copyright anytime, anywhere, including personalized search across multiple sources of data for highly relevant discovery, and scientific articles to power AI discovery. CCC’s deep search solutions offer all the market intelligence you need, without the noise.

Author: Brian Michels Schurmann

Brian Michels Schurmann is an information architect and project manager based in Copenhagen, Denmark. He has worked for more than 20 years in the pharmaceutical industry creating and managing information solutions targeted to both scientists and a corporate audience. He has a background in information science.