Search and Discovery – Copyright Clearance Center
http://www.copyright.com

2 Real World Examples: Using Real World Data for Commercial Pharmaceutical Product Insights
http://www.copyright.com/blog/using-real-world-data-commercial-pharmaceutical-insights/
Tue, 17 Apr 2018 07:58:15 +0000

Here is a look at two pharmaceutical use cases where text mining has transformed real world data into real world evidence.

The post 2 Real World Examples: Using Real World Data for Commercial Pharmaceutical Product Insights appeared first on Copyright Clearance Center.

When people think about real world evidence, they generally think about using this data to address questions around drug effectiveness or population-level safety effects. But real world data can support many other applications.

If you think of real world data as any type of information gathered about drugs in non-trial settings, a whole world of possibilities opens.

  • Social media data can be used to understand how well packaging and formulations are working.
  • Customer call feeds can be analyzed for trends in drug switching, off-label use, or contra-indicated medications among concomitant drugs.
  • Full-text literature can be mined for information about epidemiology, disease prevalence, and more.

Text mining transforms real world data into real world evidence

Many of these real world sources have free-text fields, and this is where text analytics and natural language processing (NLP) fit in. At Linguamatics, we see organizations use text analytics to extract actionable insight from real world data – and find valuable intelligence that can inform commercial business strategies.
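As an illustration of the kind of free-text mining described above, here is a minimal sketch (not Linguamatics' actual technology) that pulls drug-switch and off-label mentions out of invented call-centre notes with simple patterns:

```python
import re

# Naive patterns for two of the trends mentioned above (illustrative only --
# real systems use full NLP pipelines, not single regular expressions).
SWITCH = re.compile(r"switched from (\w+) to (\w+)", re.I)
OFF_LABEL = re.compile(r"off-label use of (\w+)", re.I)

def mine_note(note):
    """Return (drug_switches, off_label_drugs) found in one free-text note."""
    return (SWITCH.findall(note), OFF_LABEL.findall(note))

# Hypothetical call-centre notes, invented for illustration.
print(mine_note("Patient switched from DrugA to DrugB due to nausea."))
# ([('DrugA', 'DrugB')], [])
print(mine_note("Caller asked about off-label use of DrugC for migraine."))
# ([], ['DrugC'])
```

The structure – free text in, typed facts out – is the point; a production system would add synonym handling, negation detection and disambiguation.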

Here is a look at two use cases where text mining has transformed real world data into real world evidence.

Related Reading: Pharma Turns to Real World Evidence to Overcome the Odds

Use case 1: Evidence landscape from literature for drug economics

Understanding the potential for market access is essential for all pharma companies, and information to characterize the burden of disease and local standard of care in different countries across the globe is critical for any new drug launch. Companies need an assessment of the landscape of epidemiological data, health economics and outcomes information to inform the optimal commercial strategy.

Valuable data is published every month in scientific journals, abstracts, and conference proceedings. One of Linguamatics' top-10 pharma customers decided to use text mining to extract, normalize, and visualize these data. They then used the structured output to generate a comprehensive understanding of the available evidence, thus establishing the market “gaps” they could address. Focusing on a particular therapeutic area within immunological diseases, the organization was able to develop precise searches with increased recall across these different data sources, including full-text literature.

Linguamatics I2E enables the use of ontologies to improve disease coverage and to incorporate domain knowledge that increases the identification of particular geographical regions (for example, recognising the adjectival form of a country, e.g. French as well as France, and its cities, e.g. Paris, Toulouse). I2E also extracts and normalizes numbers, which is useful for standardizing epidemiological reports of disease incidence and prevalence. Searching within full-text papers can be noisy, so I2E allows searches to be made specific and to exclude certain parts of a document, such as the references.
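The geographic normalisation and number handling described above can be sketched with a toy dictionary and pattern; the `GEO` table and example phrases are invented, and I2E's real ontologies are far richer:

```python
import re

# Toy lookup mapping adjectival and city-level mentions to a country name.
GEO = {
    "france": "France", "french": "France", "paris": "France",
    "toulouse": "France", "germany": "Germany", "german": "Germany",
}

def country_mentions(text):
    """Return the countries referenced anywhere in the text."""
    words = text.lower().replace(",", " ").split()
    return sorted({GEO[w] for w in words if w in GEO})

def prevalence_ratio(text):
    """Normalise a '1 in N' phrase to a proportion, e.g. '1 in 2,000' -> 0.0005."""
    m = re.search(r"1 in ([\d,]+)", text)
    return 1 / float(m.group(1).replace(",", "")) if m else None

print(country_mentions("A French cohort recruited in Paris and Toulouse"))  # ['France']
print(prevalence_ratio("an incidence of 1 in 2,000 live births"))           # 0.0005
```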

I2E can provide the starting point for efficiently performing evidence-based systematic reviews over very large sets of scientific literature, enabling researchers to answer questions around commercial business decisions.

Use case 2: Gaining insights from medical science liaison professionals

Conversations between medical science liaison (MSL) professionals and patients or healthcare professionals (HCPs) can lead to valuable insights. The role of the MSL is to ensure the effective use, and success, of a pharmaceutical company’s drug. MSLs act as the therapy area experts for internal colleagues, and maintain good relationships with external experts, such as leading physicians, to educate and inform on new drugs and therapeutics.

Thierry Breyette, Novo Nordisk, presented at Linguamatics Text Mining Summit 2016 on “Generating actionable insights from real world data”. The figure shows a map of where particular topics are discussed, and what materials are used.

Top pharma company Novo Nordisk uses text mining to gain clinical insights from MSL interactions with HCPs. These interactions may be broad ranging, covering topics such as safety and efficacy, dosing, cost, special populations, indication, comparisons, competitor products, etc. MSLs may use approved slide decks, package inserts (PIs), factsheets, studies or publications to answer HCP questions. Linguamatics’ text mining platform I2E is used to structure these source files with custom ontologies (e.g. for material types, product, disease terminology variation, topics).

This analysis enables Novo Nordisk to better address what support HCPs may need in their interactions with patients, insurance providers, and other clinicians, and to invest in resource development appropriately.

 

Interested in learning more? Keep exploring:

Drug Repurposing, Rare Diseases and Semantic Analytics
http://www.copyright.com/blog/drug-repurposing-rare-diseases-semantic-analytics/
Tue, 03 Apr 2018 07:05:09 +0000

Drug repurposing could help find cures for rare diseases faster, but trawling through research is a time-consuming and resource-heavy task.

The post Drug Repurposing, Rare Diseases and Semantic Analytics appeared first on Copyright Clearance Center.

Rare diseases affect around 6-7% of the population in the developed world (a rare disease being defined as one affecting fewer than 1 in 2,000 people in Europe, or fewer than 200,000 individuals in the US).

Because rare diseases by definition affect a relatively small population, the cost of developing brand-new drugs for them (known as orphan drugs) can be prohibitively expensive – yet legislation in the U.S. (the FDA Orphan Drug Act of 1983), Japan, Australia and Europe incentivises treatment development.

So what’s a pharmaceutical company to do?  Is there a more cost-effective way to reach cures faster?

Enter drug repurposing

On the surface, drug repurposing promises much – known safety profiles of existing drugs, a reduced development timeline and as a result, a significantly reduced cost to market (we’re talking bringing expenditure down from billions of dollars to millions of dollars here).

There’s still a large amount of research to trawl through, a time-consuming and resource-heavy task.  This is why drug companies are currently focusing on automated literature analysis.

Let’s look at the example of Arteriovenous Malformation (AVM), which has been in the news recently in the UK.  It’s a condition which affects hundreds of thousands of people across the world, causing abnormalities in blood vessels.  These abnormalities can result in dangerous complications and disfigurements.  Now, researchers have identified drugs which could target the underlying cause of the condition.

Take a look at this diagram, which simplifies the repurposing pipeline from this piece of research (https://www.jci.org/articles/view/98589):

A simplified repurposing pipeline

Once these genes were identified, the next step in this particular repurposing study was to screen for drugs that targeted the relevant proteins. In this case there were a number of candidate drugs that were already used in cancer therapy.

Here, the disease in question has been taken as the starting point, and faulty genes have been identified on the RAS/MAPK pathway, which controls cell growth.

A simplified repurposing pipeline part 2

In this case, we see that treatment of BRAF-mutant zebrafish with the BRAF inhibitor vemurafenib restored blood flow in AVM.

How could semantic analytics play a part?

Drug repurposing relies on making connections, but as mentioned earlier, this is not easy when you’re faced with millions of documents, all with unstructured text.

Semantic annotation 

Wouldn’t it be helpful if a computer could recognise key scientific information in unstructured text, such as scientific papers?  Of course, the answer is yes, but one of the main hurdles with this approach is getting the computer to do this quickly, whilst being able to process scientific synonyms and ambiguity.

Semantic Search

Building on this is semantic search: a tool which allows a researcher to find relevant information about their target. In this case, we're looking for drugs that inhibit BRAF. The search tool also picks up synonyms, ensuring that you don't miss out on potentially valuable data. Contrast this with a conventional search engine, where if you search for “drug”, you'll get results which mention the word “drug”. With a semantically enriched search engine, however, the computer knows that this actually means anything which is defined as a drug.

Related Reading: Semantic Search vs. Keyword Search

Extracting associations


And the results go beyond just highlighting individual entities, allowing you to extract information about relationships between entities, such as gene-phenotype or drug-target.  Extrapolate this over 28 million Medline abstracts, and you have an incredibly powerful tool.

Building a knowledge network

Image: Lopez-Pajares V et al 2013

These relationships can then be built into networks, providing you with a computer readable framework for searching the data and making new connections.

Labelling the entities in the text with unique identifiers allows you to take this a step further and map to other data systems, connecting related diseases, adverse events, pathways, and drug labels.

And of course, this method can be turned on its head to discover new information.  For example, you could compare diseases based on their phenotype profiles.  Once you know that two diseases are strongly related, if there's a drug which treats one of these conditions, you can hypothesise that you have a potential repurposing candidate on your hands for the other condition. This is a technique we've explored before.
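That comparison step can be sketched as follows; the disease names, phenotype sets, drug mapping and similarity threshold are all invented for illustration:

```python
# Diseases represented as sets of phenotype terms; similar pairs are scored,
# and a drug known for one disease becomes a repurposing hypothesis for the other.
profiles = {
    "disease_A": {"vascular malformation", "haemorrhage", "seizure"},
    "disease_B": {"vascular malformation", "haemorrhage", "headache"},
    "disease_C": {"joint pain", "rash"},
}
known_drugs = {"disease_A": ["drug_X"]}

def jaccard(a, b):
    """Set-overlap similarity between two phenotype profiles."""
    return len(a & b) / len(a | b)

def repurposing_hypotheses(target, threshold=0.4):
    """Drugs used for diseases whose phenotype profile resembles the target's."""
    hits = []
    for other, pheno in profiles.items():
        if other != target and jaccard(profiles[target], pheno) >= threshold:
            hits.extend(known_drugs.get(other, []))
    return hits

print(repurposing_hypotheses("disease_B"))  # ['drug_X']
```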

Ready to learn more? Listen to SciBite & CCC’s on-demand webinar: Exploring Drug Repurposing for Rare Diseases Through Semantic Analytics

*Editor’s Note: This blog post was originally published on SciBite’s blog on Feb. 28, 2018. 

What are Ontologies – And How Are They Built? An Interview with SciBite’s Founder Lee Harland
http://www.copyright.com/blog/what-are-ontologies-scibite-lee-harland/
Wed, 07 Feb 2018 10:20:36 +0000

What are ontologies, and how are they created? To answer this question, we spoke with our partners at SciBite in CCC’s Beyond the Book podcast series.

The post What are Ontologies – And How Are They Built? An Interview with SciBite’s Founder Lee Harland appeared first on Copyright Clearance Center.

Here at CCC, scientific ontologies are hugely important to the semantic search capabilities built into RightFind Insight.

But what exactly are ontologies, and how are they created? To answer this question, we spoke with our partners at SciBite in CCC’s Beyond the Book podcast series.

Listen here to the full interview with SciBite’s founder Lee Harland, or check out our summary below.


What is an ontology?

Oxford Dictionaries defines an ontology as:

A set of concepts and categories in a subject area or domain that shows their properties and the relations between them.

Essentially, an ontology’s purpose is to properly define something. In the case of life sciences organizations, ontologies could be created to categorize diseases, drugs, genotypes/phenotypes, mechanisms of action, and other biomedical concepts. Adding a layer of meaning to raw text in this way makes a document easier to synthesize and process further.


“For those who do study ontologies, there’s a very famous concept learned in their first year of university: the pizza ontology,” Lee said. “The idea is that pizzas are split up into bases and toppings, and how those relate to each other.  It’s really a conceptualization of a particular domain in a computer-readable format.”
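The pizza example can be written down as a tiny computer-readable structure. Real ontologies use formalisms such as OWL; this toy Python version only illustrates the idea of concepts, is-a relations and properties:

```python
# A minimal sketch of the "pizza ontology" idea: concepts, an is-a hierarchy,
# and a property linking pizzas to toppings. All entries are illustrative.
is_a = {
    "margherita": "pizza", "pepperoni_pizza": "pizza",
    "mozzarella": "topping", "pepperoni": "topping",
    "pizza": "food", "topping": "food",
}
has_topping = {
    "margherita": ["mozzarella"],
    "pepperoni_pizza": ["pepperoni", "mozzarella"],
}

def ancestors(concept):
    """Walk the is-a hierarchy from a concept up to the root."""
    chain = []
    while concept in is_a:
        concept = is_a[concept]
        chain.append(concept)
    return chain

print(ancestors("margherita"))         # ['pizza', 'food']
print(has_topping["pepperoni_pizza"])  # ['pepperoni', 'mozzarella']
```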

How are ontologies produced?

Ontologies are produced by the scientific community, and funded by both private and public money across the globe.  SciBite is in a unique space, in that the organization is both a consumer and a producer of ontologies.

“The ontologies we work with aren’t the result of one, two, three, four people,” Lee said. “They’re the result of thousands of experts, everyone contributing a tiny little bit of knowledge to an overall coherent map of a particular set of cells, tissues, diseases, etc. The power to be able to leverage that expertise in a computer-readable format is incredible.”

This collaborative process gets to the heart of why organizations are doing this research in the first place.

“I think the power is in the openness, the fact that they are done in the public domain, they are free to use by everybody,” Lee said. “It promotes data interoperability, and the ability to do these experiments.”

How can ontologies be applied to text?

When ontologies are applied to text, the result is a semantically-enriched text document.

Lee breaks down the concept with the example of a hedgehog. If you’re not a scientist in the life sciences realm, you’re likely to think of a hedgehog as a little, spiky animal. But to many scientists, hedgehog is the better-known name of a protein that’s critical in cell division, a major process involved in cancer.

“When you say hedgehog to a life scientist, particularly in molecular biology or human genetics, they’re much more likely to be thinking about the hedgehog gene or protein, and not the loveable animal,” Lee said.  “When you are trying to apply ontologies to text, and you see the word hedgehog, you’ve got to build systems that say right, OK, this could mean one of two things. I’m not going to annotate it as the hedgehog protein unless I really think it is, and similarly I’m not going to annotate as the hedgehog animal unless there’s something that tells me that it is the animal.  That’s disambiguation.”
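A crude version of that disambiguation step might look like the sketch below; the cue-word lists are invented, and real systems use trained models rather than keyword heuristics:

```python
# Only annotate "hedgehog" as the gene/protein when nearby context words
# support that reading, and as the animal when the context says so.
GENE_CUES = {"signalling", "signaling", "pathway", "protein", "gene", "cell"}
ANIMAL_CUES = {"spiky", "garden", "animal", "mammal"}

def disambiguate_hedgehog(sentence):
    """Return an annotation for 'hedgehog' based on surrounding cue words."""
    words = set(sentence.lower().replace(".", "").split())
    if words & GENE_CUES:
        return "hedgehog [protein]"
    if words & ANIMAL_CUES:
        return "hedgehog [animal]"
    return "hedgehog [ambiguous]"

print(disambiguate_hedgehog("The hedgehog signalling pathway drives cell division."))
# hedgehog [protein]
print(disambiguate_hedgehog("A spiky hedgehog lives in the garden."))
# hedgehog [animal]
```

Note the conservative fallback: with no evidence either way, the mention stays unannotated, exactly the behaviour Lee describes.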

Today, when organizations like SciBite apply ontologies to text, they’re providing the ability to search through a document, or thousands of documents, to find relevant terms, ultimately enhancing and accelerating the R&D process.

Ready to Learn More? Check out:

What R&D and Life Sciences Organizations Need to Know About IDMP
http://www.copyright.com/blog/rd-life-sciences-organizations-need-know-idmp/
Tue, 16 Jan 2018 08:31:17 +0000

Beset by delays and revisions, IDMP – the set of new international standards for identifying and describing “medicinal products” – is nonetheless being rolled out, month by month. Here’s what you need to know, with insights from Paul Milligan, Senior Product Manager at Linguamatics.

The post What R&D and Life Sciences Organizations Need to Know About IDMP appeared first on Copyright Clearance Center.

Complex doesn’t begin to describe IDMP – the new set of international standards for identifying and describing medicinal products that is currently being rolled out in Europe in phases, despite a plethora of delays and revisions. Short for “Identification of Medicinal Products,” IDMP is actually meant to streamline the tracking of medicinal products in a global market by standardizing descriptions of substances, including dose forms and units of measurement.

That may sound simple, but many life sciences and R&D organizations don’t have IDMP on their radar, and are not set up to meet these standards. In a 2017 survey of life sciences companies conducted during a Pistoia Alliance webinar:

  • 42% of respondents said they knew very little about IDMP
  • 25% said they had only a basic understanding of the upcoming global regulations

Just as surprising in our high-tech age, 40% said their regulatory and R&D divisions still use “unstructured paper and PDF-based reports” to exchange information on substances.

“The goal of IDMP is to make sure companies have a truly standardized description of their products. But most still store their data in all sorts of formats, files and data bases, which means an organization’s internal description of a product or substances may be very different from what the regulators want,” says Paul Milligan, senior product manager at Linguamatics, a text-mining software company based in Cambridge, England. (CCC and Linguamatics are partners—Linguamatics’ I2E software is integrated with CCC’s RightFind™ XML for Mining.)

The good news is, with foresight and the right systems in place, life sciences and R&D organizations will not only be able to comply with the new standards, but can reap benefits that translate into time saved, problems solved and the potential for more profits down the line.

I talked with Paul Milligan about what IDMP issues should be top of mind:

What are the problems life sciences and R&D organizations face when it comes to complying with IDMP standards?

Paul Milligan: The basic challenge is for companies to get their own internal data into a format that can then be shared with regulators. It’s not that companies haven’t been providing this information—they have. The problem is, the different sources of information necessary to meet the IDMP labeling standards have typically been siloed in different databases. That creates a challenge in terms of bringing the necessary pieces of information together in a timely fashion that makes sense with a company’s workflow.

Meeting IDMP standards is going to require a big push from pharmaceutical companies, biotechs and other stakeholders to break these silos down and find a systematic way of overcoming the technical barriers. The good news is, once that happens, it will be easier for everyone involved to learn from and explore the data, spotting new patterns, speeding regulatory submissions, and tracking adverse events.

What else do companies need to do, beyond gathering and standardizing the required information?

PM: The whole idea of IDMP is that a broad set of data elements needs to be tied to the product, such as manufacturer, indication, and adverse events, along with dosage strength and formulation. On a basic level, that means organizations will need to establish a scalable process where it’s easy to tell what information is going in and what is coming out, where they can extract information easily, and where everything is done systematically, so nothing is inadvertently omitted. After that, you have to be able to put the information into context, meaning that if you spot an adverse side effect somewhere, you’ll also want to know the drug that caused it, the dosage, and any information that can give meaning to the adverse event. These are prerequisites.

Is it possible for organizations to do this manually?

PM: It’s possible, but it would take a lot of people and time to gather the information, and there’s more chance of introducing human error.  Pulling out the required IDMP data elements from regulatory text sources can be very time-intensive, and of course it needs to be kept up-to-date with new information. Here’s an example: Let’s say you need to review the literature for any new mentions of adverse events. If you do a standard keyword search, you type in an adverse effect and a drug and then you have to wade through all the documents to find the relationship between these two terms, otherwise you won’t be able to tell if a particular drug is causing the adverse effect.

If you’re using a machine-based approach to information extraction, you can immediately say, “We’ve found this term and it’s being reported as an adverse event caused by this or that drug.” Text mining can be a powerful way to pull out the adverse events in the data without having to read every document—the machine is doing the initial info-grabbing and summarizing—and specific, relevant documents can be read later.
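The contrast Paul draws – keyword co-occurrence versus extracting an explicit relationship – can be sketched like this; the drug names, text and single pattern are invented for illustration, and real extraction uses full NLP rather than one regex:

```python
import re

# A toy "caused by" relation pattern; real systems use many patterns or
# trained relation-extraction models.
CAUSED = re.compile(r"(\w+) (?:caused|was associated with) (\w+)", re.I)

def keyword_hit(doc, drug, event):
    """Keyword search: both terms appear somewhere; the relation is unknown."""
    return drug in doc and event in doc

def extract_relation(sentence):
    """Relation extraction: returns (drug, adverse_event) or None."""
    m = CAUSED.search(sentence)
    return (m.group(1), m.group(2)) if m else None

doc = "Patients on drugX reported no issues. Separately, drugY caused nausea."
print(keyword_hit(doc, "drugX", "nausea"))       # True, but misleading
print(extract_relation("drugY caused nausea."))  # ('drugY', 'nausea')
```

The keyword search flags drugX and nausea together even though the text never links them; the extracted relation names the right drug.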

Are there any unlooked-for benefits that could result from the push to satisfy IDMP?

PM: Organizations will be able to identify potential problems with products earlier in the development process. Text mining software, for instance, can rapidly sift through the scientific literature on a particular drug, extracting relevant notes on patients from clinical trials and identifying any adverse events sooner rather than later. That’s going to save organizations time, effort and money so they can focus their attention on what really matters—developing drugs, designing trials and getting the products submitted to regulators.

Organizations don’t want to throw out processes that have been working for them for years. How can new and old systems be easily integrated?

PM: By definition, people who pay attention to regulatory processes are cautious—and no one wants to have to reinvent the wheel to meet these new requirements.

One way for pharmaceutical companies to approach IDMP would be to keep their normal team of reviewers looking at data and spotting errors, and add in a layer of automation for super-fast review cycles.

 

Keep Learning: 

Semantic Search vs. Keyword Search
http://www.copyright.com/blog/semantic-search-vs-keyword-search/
Tue, 14 Nov 2017 06:13:25 +0000

Ever tried searching for medical papers using a standard search engine?  Happy with the results you get?  Probably not.  There are serious limitations to using keyword search in the pharmaceutical industry.  Phil Verdemato from SciBite explains how they can be overcome with the power of semantic search.

The post Semantic Search vs. Keyword Search appeared first on Copyright Clearance Center.

Ever tried searching for medical papers using a standard search engine?  Happy with the results you get?  Probably not.  There are serious limitations to using keyword search in the pharmaceutical industry.

Imagine you were looking for papers featuring the enzyme type ‘GSK’. Using a generic search engine, you would get articles that mentioned ‘glycogen synthase kinase’, as well as articles about the company ‘GlaxoSmithKline’ – which is not particularly relevant to your search here.  However, a semantic search engine, powered by scientific vocabularies and a disambiguation system, will just focus on results featuring the protein, giving you context specificity.

If you needed even more accuracy and wanted to find a specific protein such as GSK3, you would be required to do a search for:

glycogen synthase kinase 3 alpha, GSK-3-A, GSK3A, alpha glycogen synthase kinase-3, glycogen synthase kinase-3A…

It’s a pretty long list of synonym derivatives, right?  A good semantic search system, on the other hand, does all this for you when it indexes, so that you don’t have to worry when searching.
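A toy sketch of that index-time normalisation, with an abridged synonym table (the canonical identifier and documents are invented for illustration):

```python
# Every synonym maps to one canonical identifier when documents are indexed,
# so any synonym typed at query time finds them all.
SYNONYMS = {
    "glycogen synthase kinase 3 alpha": "GSK3A",
    "gsk-3-a": "GSK3A",
    "gsk3a": "GSK3A",
    "alpha glycogen synthase kinase-3": "GSK3A",
}

index = {}  # canonical id -> set of doc ids

def add_to_index(doc_id, text):
    """Record the canonical concept for any synonym found in the text."""
    for syn, canonical in SYNONYMS.items():
        if syn in text.lower():
            index.setdefault(canonical, set()).add(doc_id)

def search(term):
    """Resolve the query term to its canonical id, then look it up."""
    return index.get(SYNONYMS.get(term.lower(), term), set())

add_to_index(1, "A study of GSK-3-A inhibitors")
add_to_index(2, "Glycogen synthase kinase 3 alpha in neurons")
print(search("gsk3a"))  # {1, 2} -- both docs, whichever synonym they used
```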

Transformative Data Integration

Having done this, you are then set up for better downstream data analysis because your conversion from unstructured to structured (typed) data is way more accurate.

You can then connect your enriched, structured data to databases and other systems, giving enhanced data connectivity across the organisation and speeding up analysis.

Group Level Searches

Great semantic search provides taxonomic relationships between its entities, so higher order searches are possible.  Let’s take the example of ‘Viagra’ – whose current use was found as an adverse effect during its trials for pulmonary hypertension.

I’d find a bunch of articles that would mention things like Viagra’s protein target, Phosphodiesterase 5A (PDE5A).  The image below shows how PDE5A and Phosphodiesterase 11A (PDE11A) were found in an article and where they sit in the taxonomy.


We can see that PDE5A sits in an enzyme taxonomy under the wider ‘Phosphodiesterase’ class. I could click on the ‘Phosphodiesterase’ class and get the system to search for anything under it:


You can see how PDE8B and PDE10A were identified in this way.
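Searching a parent class and matching any subclass can be sketched as a toy taxonomy lookup; the document contents are invented, and only the PDE class memberships reflect the example above:

```python
# Each entity belongs to a class; querying the class retrieves documents
# mentioning any of its members.
parent = {"PDE5A": "Phosphodiesterase", "PDE8B": "Phosphodiesterase",
          "PDE10A": "Phosphodiesterase", "BRAF": "Kinase"}

docs = {1: ["PDE5A"], 2: ["PDE8B", "BRAF"], 3: ["PDE10A"], 4: ["BRAF"]}

def search_class(cls):
    """Return ids of docs mentioning any entity under the given class."""
    members = {e for e, p in parent.items() if p == cls}
    return sorted(d for d, ents in docs.items() if members & set(ents))

print(search_class("Phosphodiesterase"))  # [1, 2, 3]
```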


This becomes incredibly useful, say if you’re interested in finding out which competitors have developed drugs for a target you’re working on.

What you’re looking for is a rich set of taxonomies covering areas such as diseases, drugs, protein classes and so on.

A good semantic search engine will actually embed the concepts (that is to say, entities such as “PDE5A”, entity classes, e.g. “gene”, or higher-level abstractions like “protein class”) within the plain text.  How is this useful?  Well, query time is really quick and extremely accurate, because you don’t have to do synonym expansion at search time.

In essence, you have far more control over the granularity of your searches than in generic search engines.  You could, for example, search for articles in Medline that mention any Orphan Disease:


That’s hard to do in one step in a generic search engine that doesn’t leverage life science taxonomy data.

Connections

Additionally, you could examine the co-occurrence data to get a feel for the landscape.  In this example, I could look at the indications commonly associated with documents mentioning PDE5A:


Here, we quickly see that Erectile Dysfunction and Pulmonary Hypertension are associated with PDE5A – and also how much time this view can save when working in drug repurposing.

You could also look at co-occurrences at the sentence level.  Sentence-level co-occurrences are stronger indicators of a real association between entities than document-level ones.  Why?  Because at the document level you might find entities in a keywords section that holds spurious and unrelated terms.
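The difference can be sketched with invented text:

```python
# A toy comparison of document-level vs sentence-level co-occurrence.
doc = ("PDE5A inhibitors treat erectile dysfunction. "
       "Keywords: rash, BRAF, oncology.")

def cooccur_document(text, a, b):
    """Both terms appear anywhere in the document."""
    return a in text and b in text

def cooccur_sentence(text, a, b):
    """Both terms appear within the same (naively split) sentence."""
    return any(a in s and b in s for s in text.split(". "))

# Document-level wrongly links PDE5A to a spurious keyword-section term:
print(cooccur_document(doc, "PDE5A", "rash"))                  # True
# Sentence-level keeps only the genuine association:
print(cooccur_sentence(doc, "PDE5A", "rash"))                  # False
print(cooccur_sentence(doc, "PDE5A", "erectile dysfunction"))  # True
```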


A comprehensive autocomplete index helps guide your searches.  A little bit more in depth than GSK the company or GSK the protein!

Fit-For-Science Search

But you’re not limited to entities and types that have already been curated.  You can build your own vocabularies or use plain text:


Note how Gilenya is searched for, but FTY720 (a synonym of the drug) is correctly identified.  See also how ‘Indication’ is an entity type and how ‘worldwide or global’ is a plain text query to identify documents that mention either word.

Remember that semantically enabled search is only as good as the vocabularies it’s built on.  An excellent vocabulary with a huge number of synonyms means that typing in the brand name of a drug also brings up papers associated with its clinical name.

And there you have it – pitted against the depth and breadth that semantic search offers, keyword search simply cannot compete in terms of accuracy, coverage, or efficiency.  Semantic search allows you to buy back valuable time that would otherwise be spent sifting through huge numbers of documents, and even to convert textual data into something you can integrate across your systems, thanks to entity recognition.

 

Ready to learn more? Check out:

Taking Semantic Search to Full Text [Upcoming Webinar]
http://www.copyright.com/blog/taking-semantic-search-full-text-upcoming-webinar/
Tue, 31 Oct 2017 07:43:12 +0000

How will you take your R&D program to the next level in 2018? One way to accelerate your research initiatives…

The post Taking Semantic Search to Full Text [Upcoming Webinar] appeared first on Copyright Clearance Center.

How will you take your R&D program to the next level in 2018?

One way to accelerate your research initiatives and inform critical business decisions is through the semantic enrichment of full-text articles.

Semantic enrichment describes the process of adding a layer of meaning to raw content. This enhancement of content with information about its meaning adds structure to unstructured information, enabling users to move quickly to more intelligence-rich information activities.

Semantic search can have an immediate impact across your organization, and taking it a step further with full-text scientific literature improves these outcomes by enabling access to more facts and relationships, secondary study findings and adverse event data.

Even though using abstracts seems like a reasonable approach, there are limitations to what can be discovered through that process.  Researchers need access to the full text of the articles to ensure they don’t miss vital data and undiscovered assertions that can lead to new discoveries.

This sounds well and good, but it doesn’t come without its challenges. Semantic enrichment projects can be resource intensive and can take time to demonstrate business value. Plus, obtaining full-text articles in a machine-readable format across multiple publishers can be a struggle.

Join us on 7 November for a Webinar: Taking Semantic Search to Full Text

CCC will be joined by SciBite on Tuesday, 7 November at 9:00 a.m. or 1:00 p.m. EST for a live webinar.

CCC’s Product Manager Mike Iarrobino, alongside SciBite founder Lee Harland, will discuss:

  • Content challenges facing R&D teams in the life sciences
  • Benefits of semantic enrichment of full-text content
  • Solutions that enable you to reduce manual and administrative overhead while adding value to information discovery and innovation initiatives

Need some background information before you attend the webinar? Check out:

5 Ways to Apply Semantic Search Across Your Organization
http://www.copyright.com/blog/5-ways-apply-semantic-search-across-organization/
Tue, 17 Oct 2017 07:04:46 +0000

Semantic search can have an immediate impact across your organization. Here are five common use cases.

The post 5 Ways to Apply Semantic Search Across Your Organization appeared first on Copyright Clearance Center.

Information managers must balance the needs of multiple internal constituencies to support information discovery. In R&D-intensive industries such as the life sciences and chemical manufacturing, semantic search can help – delivering value by giving us the ability to turn content into insight.

Semantic enrichment is the enhancement of content with information about its meaning, thereby adding structure to unstructured content. Semantic search builds on enriched content by matching the user’s query intent – not just the keywords they provide – to the relevant content, helping them quickly discover what they need.

The following illustrates how semantic search can have an immediate impact on five common use cases in life sciences and R&D organizations:

Early Phase Research

Researchers can discover interesting potential biomarkers and drug targets they hadn’t known to look for in advance. These initial results can be linked to supporting source content for further review prior to wet lab.

Competitive Intelligence

Competitor patent filings, often intended to hinder discovery, can be explored alongside non-patent literature (NPL) to provide a full picture of competitor strategy, claims, and prior art for patent landscaping or other purposes.

Pharmacovigilance

Literature monitoring for pharmacovigilance can become both more comprehensive and more precise through semantic searches that suggest links between adverse events and pharmacological substances, increasing the efficiency of these vital monitoring workflows.

Read more: Why Text Mining for Pharmacovigilance?

IDMP (Identification of Medicinal Products) Compliance

IDMP initiatives directed by the Food and Drug Administration (FDA) and European Medicines Agency (EMA) aim to standardize how information can be expressed about pharmacological products. Semantically enriched internal and external content can provide a fuller view of medicinal product attributes, supporting IDMP compliance.

Discovery of Chemical Compounds

Researchers can take advantage of well-established chemical ontologies to conduct more efficient semantic searches for chemicals, more easily identifying relevant chemical compounds, their properties, and their relationships.

Use Semantic Search to Uncover Scientific Meaning

R&D and information managers routinely use keyword search to find the information they need. While keyword search may satisfy researchers' basic needs, there are limitations that can affect productivity and slow the pace of discovery.

Learn here how semantic search can provide you with more comprehensive and relevant search results.

Understanding Text Mining: 4 Need-to-Know Terms and Their Definitions
http://www.copyright.com/blog/understanding-text-mining-4-need-know-terms-definitions/
Tue, 03 Oct 2017

Text mining offers many benefits, but the technology is complex. Discover the four terms your team needs to know to gain maximum insights.

The post Understanding Text Mining: 4 Need-to-Know Terms and Their Definitions appeared first on Copyright Clearance Center.

As the use of text mining becomes more widespread, now is the time for information managers to make sure they understand the basics.

Text mining, the process of deriving high-quality information from text materials using software, helps researchers identify patterns or relations between concepts that would otherwise be difficult to discern. The result is faster discovery and smarter decision-making.

Looking for a place to start? Here are four key text mining terms every information manager should know:

XML

Short for Extensible Markup Language, XML is an information exchange standard designed to improve usability, especially when the data is interpreted by software. In other words, it is a more readily machine-readable version of a document. XML tends to be the preferred input method for semantic or text and data mining technology, as well as other processing software.

When acquiring full-text articles, researchers are usually able to access only the PDF format, necessitating conversion into XML for text mining. This can be an arduous and error-prone process.
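To make this concrete, here is a minimal sketch in Python of why tagged structure matters: with XML, software can pull a specific section out of an article directly. The element names below are invented for illustration; real publisher schemas (such as JATS) are richer, but the principle is the same.

```python
import xml.etree.ElementTree as ET

# A toy full-text article in a hypothetical XML structure. Unlike text
# scraped from a PDF, each section is explicitly tagged, so software can
# target it directly.
article_xml = """
<article>
  <abstract>Drug X reduced symptoms in a small cohort.</abstract>
  <body>
    <sec label="methods">Patients received 10 mg daily.</sec>
    <sec label="results">Adverse events included headache.</sec>
  </body>
</article>
"""

root = ET.fromstring(article_xml)

# Pull out the abstract and the results section by tag, not by guesswork.
abstract = root.findtext("abstract").strip()
results = next(
    sec.text.strip() for sec in root.iter("sec") if sec.get("label") == "results"
)

print(abstract)  # Drug X reduced symptoms in a small cohort.
print(results)   # Adverse events included headache.
```

With a PDF, recovering the same two sentences would mean heuristics over extracted text; with XML, it is two deterministic lookups.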

Semantic enrichment

Semantic enrichment describes the process of adding a layer of meaning to raw content. This enhancement of content with information about its meaning thereby adds structure to unstructured information, making the content easier to synthesize and process further. For example, a scientific article can be enriched by adding in-line annotations or tags describing the genotypes/phenotypes, diseases, drugs, mechanisms of action, and other biomedical concepts mentioned within. Semantic enrichment is a key enabler of the various strategic initiatives undertaken by informatics and information management professionals.
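As a rough illustration of in-line annotation, here is a dictionary-based enrichment sketch in Python. The vocabulary, tag name, and matching strategy are all simplifications invented for this example; production systems draw on curated ontologies (MeSH, MedDRA, and the like) and far more robust entity recognition.

```python
import re

# A tiny, invented concept dictionary: surface term -> concept type.
VOCABULARY = {
    "aspirin": "drug",
    "ibuprofen": "drug",
    "migraine": "disease",
    "headache": "disease",
}

def enrich(text: str) -> str:
    """Wrap each known term in an in-line annotation tag."""
    def tag(match):
        term = match.group(0)
        return f'<entity type="{VOCABULARY[term.lower()]}">{term}</entity>'

    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, VOCABULARY)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(tag, text)

print(enrich("Aspirin is often used to treat migraine."))
# <entity type="drug">Aspirin</entity> is often used to treat
# <entity type="disease">migraine</entity>.
```

The enriched output now carries machine-readable structure: downstream software no longer needs to figure out that "Aspirin" is a drug, because the tag says so.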

White Paper: Semantic Enrichment & The Information Manager

TDM rights

Content is associated with a variety of rights. Information management professionals and librarians will be familiar with copyright licensing, reproduction rights organizations, and other frameworks and organizations that enable content consumers to use, share, and disseminate information while respecting copyright.

As may be expected, there are a number of copyright-sensitive acts that go hand-in-hand with the text and data mining (TDM) process. Content may be copied, stored, annotated or enriched, and otherwise scanned to produce a usable research output. In most cases, commercial TDM rights are not included in standard subscription agreements. Publishers may make a standard or special set of ‘TDM rights’ available as part of their subscription agreements, or as additional incremental rights.

Machine learning

Machine learning is one approach to synthesizing raw or semantically enriched content to yield insights.

Machines can be instructed to process information in many ways. One way is to apply strict rules that attempt to cover every instance that is likely to come up. For instance, one rule might be: when A is the input, B is always the output. But while this is simple in theory and easy for humans to understand, it can be difficult to maintain, scale, and capture value from this process in practice.

Machine learning is another way for machines to process information. In this case, the system is ‘trained’ by way of example, rather than given rules. For example, a system that is meant to classify images into either pictures of humans or pictures of cats would be given a set of images and told they are humans, and another set and told they are cats. From there, the system can move on to classifying other images, with feedback being given continually. It is through this feedback that the system is able to constantly adjust to improve its classification ability and yield greater insights.
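The difference between the two approaches can be sketched with a toy text classifier. Everything below – the rule, the training sentences, the labels – is invented for illustration; real systems use statistical models rather than raw word counts.

```python
from collections import Counter

def rule_based(text: str) -> str:
    # A strict hand-written rule: "when A is the input, B is the output".
    # Every case the rule's author didn't anticipate is misclassified.
    return "cat" if "whiskers" in text else "human"

# The learning alternative: count which words co-occur with which label
# in a handful of invented training examples.
training = [
    ("the cat sat purring with soft fur", "cat"),
    ("a kitten chased the yarn and purred", "cat"),
    ("the person wrote a long letter", "human"),
    ("a man walked to work and talked", "human"),
]

counts = {"cat": Counter(), "human": Counter()}
for text, label in training:
    counts[label].update(text.split())

def learned(text: str) -> str:
    """Pick the label whose training vocabulary overlaps the input most."""
    words = text.split()
    return max(counts, key=lambda label: sum(counts[label][w] for w in words))

print(rule_based("the kitten purred softly"))  # human -- the rule misses it
print(learned("the kitten purred softly"))     # cat -- generalized from examples
```

Feeding the learned model more labeled examples improves it without touching any code, which is exactly the maintenance advantage the rule-based approach lacks.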

Text mining and semantic enrichment are increasingly being used as data processing techniques to enable machine learning programs. Here are a few examples of how machine learning is helping the industry to evolve.

 

Want to learn more? Text mining enables researchers to deliver valuable insights based on relevant data. Find out about XML for Mining and more about what your team needs to know about text mining.

The post Understanding Text Mining: 4 Need-to-Know Terms and Their Definitions appeared first on Copyright Clearance Center.

]]>
http://www.copyright.com/blog/understanding-text-mining-4-need-know-terms-definitions/feed/ 0
Why Text Mining for Pharmacovigilance?
http://www.copyright.com/blog/why-text-mining-for-pharmacovigilance/
Tue, 08 Aug 2017

Literature monitoring is a key component of pharmacovigilance – and a special challenge. Here's how text mining can help.

The post Why Text Mining for Pharmacovigilance? appeared first on Copyright Clearance Center.

Finding relevant information in large volumes of unstructured text using conventional keyword search can be an arduous process.

Pharmacovigilance teams know this well – they are tasked with monitoring the effects of drugs licensed for use, a market valued at $1 billion in 2015 and predicted to exceed $8 billion by 2024. Literature monitoring is a key component of pharmacovigilance – and a special challenge. Faced with a range of spontaneous reporting systems, teams often waste time on false positives and dead ends.

Pharmacovigilance challenges

Biomedical literature can be a rich source of the signals that pharmacovigilance teams need to do their work. However, scientific journal articles are not designed with the special needs of these teams in mind – leaving potentially valuable information locked in unstructured research narratives and reducing the recall of literature screening approaches.

The underreporting of adverse drug reactions (ADRs) by healthcare professionals and patients is also a recognized issue.

Patients’ narratives about drugs and their side effects on social media represent an additional data source for postmarketing drug safety surveillance. A 2014 study conducted by Epidemico, which examined 6.9 million social media posts, discovered 4,401 tweets resembling an ADR.

The industry is also faced with the prospect of negative drug reactions that don’t feature in healthcare professional (HCP) reports but do appear in the literature.

Related Reading: What a Study of 15 Million Articles Can Teach us About Text Mining

Text mining and pharmacovigilance working together

Machine analysis can help address these challenges. The process, which uses natural language processing (NLP) techniques to swiftly analyze huge quantities of text, can transform every stage of the drug development journey. Algorithms can identify potential adverse drug reactions within a data set at scale, reducing false positives.

Text mining tools can also help teams fine-tune their queries and improve search strategy management. Keyword-based search strategies are often convoluted, messy, and overly specific, frequently including every possible synonym – brand name, substance name, pre-release name – as well as a whole range of adverse reactions. These searches can be difficult to update and maintain. Text mining, or a semantically enriched approach, can simplify those queries, making them more powerful and their results easier to interpret.
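The effect of that simplification can be sketched with a toy normalization step. The drug names and synonym table below are invented; real pharmacovigilance systems use curated drug dictionaries.

```python
# Invented synonym table: concept -> known aliases (brand, pre-release,
# and substance names).
SYNONYMS = {
    "drugx": ["drugx", "dx-101", "examplumab"],
}

# Invert to a lookup table: alias -> canonical concept.
CANONICAL = {
    alias: concept for concept, aliases in SYNONYMS.items() for alias in aliases
}

def normalize(tokens):
    """Map every token to its canonical concept where one is known."""
    return [CANONICAL.get(t.lower(), t.lower()) for t in tokens]

# A keyword strategy would OR together every alias; after normalization,
# the query is a single concept identifier.
doc = "Patients receiving DX-101 reported dizziness"
hit = "drugx" in normalize(doc.split())
print(hit)  # True -- the pre-release name still matches the concept
```

One concept identifier now covers the whole synonym family, so the saved search no longer breaks each time a new alias must be appended to a long OR-list.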

Need a refresher on semantic enrichment? Watch this on-demand webinar featuring SciBite

Looking to the future

A study by Elsevier looked at how pharmacovigilance teams not currently using text mining would like to incorporate it into their workflows. The results show that needs vary: some want to overcome taxonomy and indexing issues, others want to use it to mine multiple sources.

Whatever the objective, the industry is taking a more data-driven approach to pharmacovigilance. But the journey has only just begun. In the years to come we will see more advanced NLP algorithms and platforms adding pharmacovigilance-friendly value to data. Right now, though, text mining already means less time spent chasing false positives and less risk of missing vital information, which in turn means better patient care.

At CCC we’ve developed integrated solutions that make it simple to license, access, semantically enrich and index full-text XML articles from a wide range of scientific publishers. Learn more about RightFind XML for Mining here.

What a Study of 15 Million Scientific Articles Can Teach Us About Text Mining
http://www.copyright.com/blog/study-15-million-scientific-articles-text-mining/
Wed, 26 Jul 2017

Inside the largest comparative study of text mining abstracts vs. full text articles.

The post What a Study of 15 Million Scientific Articles Can Teach Us About Text Mining appeared first on Copyright Clearance Center.

Text mining allows for the rapid review and analysis of large volumes of biomedical literature, giving life science companies valuable insights to drive R&D and inform business decisions.

Given the easy accessibility of article abstracts through such databases as MEDLINE, many researchers use this summary information to identify a collection of articles (or “corpus”) for use in text mining rather than taking steps to obtain the full text of the articles. While abstracts provide some valuable pieces of information, there are limitations in using abstracts that can affect the quality of text mining results when compared to the results of mining a corpus of full-text content.

Some in the research community find abstracts to be good enough for their purposes. We’ve heard these defenses of mining abstracts over full text:

  • “More text means more room for false positives.”
  • “Abstracts are more easily accessible via biomedical databases.”
  • “We don’t have the time or resources to spend on additional data cleansing and normalization work for unstructured content.”

While there are kernels of truth in each of these objections, text mining full-text articles rather than abstracts has significant benefits. Now, new research from bioinformaticians at the University of Copenhagen and the Technical University of Denmark confirms that vital information goes undiscovered when abstracts are mined rather than full-text articles.

Inside the largest comparative study of text mining abstracts vs. full text articles

The study, released this month on bioRxiv, an online archive and distribution service for unpublished preprints in the life sciences, involved the analysis of more than 15 million full-text scientific documents and their abstracts published between 1823 and 2016. These articles, mainly in PDF format, comprised content published by Elsevier and Springer, along with the open-access subset of PMC.

The team compared their findings from the corpus of full-text articles to the corresponding results from the matching set of abstracts included in MEDLINE.

Here’s a look at some of the report’s main takeaways:

Full text outperformed MEDLINE abstracts in all benchmarked cases

To showcase the potential of text mining full-text articles, the team extracted published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system.

In every case, the results showed that mining the full-text article corpus outperformed the same analysis using abstracts only.
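As a rough sketch of what such an extraction involves, the snippet below pairs genes and diseases mentioned in the same sentence. The entity lists and text are invented, and the study used a full named entity recognition system rather than simple string matching, but the sketch shows why body text yields associations an abstract cannot.

```python
import re

# Invented entity dictionaries and text, for illustration only.
GENES = {"BRCA1", "TP53"}
DISEASES = {"breast cancer", "glioma"}

abstract = "We studied BRCA1 expression in tumour samples."
full_text = (
    "We studied BRCA1 expression in tumour samples. "
    "BRCA1 mutations were strongly associated with breast cancer. "
    "TP53 status was also assessed in glioma tissue."
)

def associations(text):
    """Pair every gene and disease mentioned in the same sentence."""
    pairs = set()
    for sentence in re.split(r"(?<=\.)\s+", text):
        genes = {g for g in GENES if g in sentence}
        diseases = {d for d in DISEASES if d in sentence.lower()}
        pairs.update((g, d) for g in genes for d in diseases)
    return pairs

print(associations(abstract))   # set() -- no association in the abstract alone
print(associations(full_text))  # pairs such as ('BRCA1', 'breast cancer')
```

The same gene appears in both texts, but only the full body of the article contains the sentences that link it to a disease.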

“Through rigorous benchmarking and comparison of a variety of biologically relevant associations, we have demonstrated that a substantial amount of relevant information is only found in the full body of text,” the report indicates.

This finding isn’t the first of its kind. Back in 2010, a study published in the Journal of Biomedical Informatics found that only 8% of the scientific claims made in full-text articles appeared in their abstracts.

The biggest gain in performance when using full text was seen in finding associations between diseases and genes

The main advantages of text mining full-text scientific articles are volume, information diversity, and the inclusion of secondary findings. Unsurprisingly, full-text articles contain more named entities and more connections between those entities.

In the case of these 15 million scientific articles, the biggest performance gain from mining full text came in finding associations between diseases and genes.

A common mineable format would produce higher quality results 

Despite the perceived benefits of mining abstracts mentioned above, bioinformaticians are aware that full-text articles are likely to yield more information and contain more relationships between named entities than abstracts. The problem isn’t lack of text mining awareness; it’s contending with multiple formats and inconsistent licensing terms.

XML is the preferred input format for text mining software. A markup language that encodes documents in a structure easily read by computers, XML is widely used so that programs can parse or display content appropriately.

The study suggests that if all articles had been available in a structured XML format, it would have “no doubt produced a higher quality corpus.”

In an interview with Science, co-author Lars Juhl Jensen said that converting full-text PDF articles into XML format is one of the reasons full-text mining isn’t typically done at scale.

“We probably spent more computational resources teasing the text out of PDFs and beating it into shape than we spent on the actual text mining,” Jensen said.

As information professionals begin to understand the benefits text mining can have across functions – early phase research, pharmacovigilance, IDMP compliance, and more – the desire to find a better way to mine full-text articles will become greater.

At CCC we’ve developed integrated solutions that make it simple to license, access, semantically enrich and index full-text XML articles from a wide range of scientific publishers. Learn more about RightFind XML for Mining here.
