Search and Discovery – Copyright Clearance Center

Scientific Search: 5 Key Concepts You Need to Know Tue, 18 Sep 2018 09:02:26 +0000 How can researchers cope with the deluge of data at their disposal and search more efficiently? Here’s a look at several key scientific search concepts.

The post Scientific Search: 5 Key Concepts You Need to Know appeared first on Copyright Clearance Center.

Think about the questions you type into Google. Chances are, you’re looking for instant answers to simple questions. You might look through a few different pages of results to confirm findings, but if the search engine has done its job correctly, you only need one result to be satisfied.

Now think about the types of questions researchers attempt to answer when they use search engines. The experience is far more complex than a simple query with an instant answer.

When a researcher is tasked with understanding all the genes involved in a disease or pathway, all the compounds that inhibit a target, or all the different ways that patients talk about a drug on the market, a casual scan of top results isn’t good enough. A comprehensive, systematic view of all the information that’s out there is the only way to make an accurate claim.

So how can researchers cope with the deluge of data at their disposal and search more efficiently? Here’s a look at several key scientific search concepts:

Aggregated Search

Aggregated search is designed to bring together multiple, dissimilar information sources. These may include structured or semi-structured data, such as feeds or APIs that provide company-, drug-, or clinical-trial-related information.

Aggregated search presents multiple information types to end users, enabling them to explore different types of content as well as visualizations, analytics, or extracted information. These act as signposts for users, helping them to explore the information and direct themselves to the most appropriate resources for their question.

Here’s an example:

Google illustrates this from a consumer search perspective: it displays location and commercial information alongside summary information boxes and the traditional list of web links, while also offering access to specific media types such as images and videos.
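The grouping idea can be sketched in a few lines of Python. This is a hypothetical illustration – the source names and records are invented:

```python
# Hypothetical sketch: an aggregated search page keeps each source type in
# its own group, instead of merging everything into one ranked list.
from dataclasses import dataclass, field

@dataclass
class AggregatedPage:
    groups: dict = field(default_factory=dict)  # source type -> results

    def add(self, source_type, results):
        self.groups.setdefault(source_type, []).extend(results)

# Each backend returns results in its own shape; they are shown side by side.
page = AggregatedPage()
page.add("web", ["Overview of EGFR inhibitors", "EGFR signalling review"])
page.add("clinical_trials", [{"trial_id": "T-001", "phase": "II"}])  # invented record
page.add("images", ["egfr_pathway_diagram.png"])
```

Each group can then carry its own visualizations or analytics, acting as the signposts described above.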

Personalized Search

Personalization means tailoring the user experience by leveraging signals collected through a user’s interaction with a system. More specifically, personalized search tailors the search experience by considering the user’s context in addition to the submitted query.

This can be accomplished through explicit data knowingly provided by the user or administrators, such as user profiles that include topics of interest or areas of specialty, or through implicit signals the user provides as they go about retrieving information – such as submitting queries, filtering, and clicking on results.

The goal of personalized search is to help users find what they need faster.
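Combining explicit and implicit signals can be sketched as a simple re-ranking step. The weights, topics, and scores below are invented for illustration and don’t reflect any particular product’s ranking formula:

```python
# Hypothetical sketch: re-rank results using explicit profile topics and
# implicit click-history topics.
def personalize(results, profile_topics, clicked_topics):
    """results: list of (title, topics, base_score); returns titles, best first."""
    def score(result):
        title, topics, base = result
        boost = 0.5 * len(topics & profile_topics)    # explicit signal
        boost += 0.25 * len(topics & clicked_topics)  # implicit signal
        return base + boost
    return [title for title, _, _ in sorted(results, key=score, reverse=True)]

results = [
    ("Statistics primer", {"statistics"}, 1.0),
    ("GSK3 inhibitors in oncology", {"oncology", "kinases"}, 0.9),
]
# An oncology specialist who clicks on kinase papers sees that result first.
ranked = personalize(results, profile_topics={"oncology"}, clicked_topics={"kinases"})
```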

Contextualized Search

Contextualized search is similar to personalized search but broader in scope.

Contextualization means that the system considers the context of an interaction (such as organization, location, and information about the user) to improve the quality of its output, whether that is a set of search results or the overall user experience.

Enterprise Search

This is search across an organization’s internal information, as contrasted with, for example, public web search.

Federated Search

Federated search technology has a long history. It is an approach to integrating information sources for information retrieval that relies on the system to take the user’s query and submit it to various underlying data sources. The federated search system then compiles the results from the different sources and presents them to the user in a single, unified relevance sorting.
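The fan-out-and-merge mechanic can be sketched in a few lines. The sources and scores below are invented stand-ins for real underlying search services:

```python
# Hypothetical sketch: fan one query out to several sources, then merge
# everything into a single relevance-sorted list.
def federated_search(query, sources):
    """sources: callables returning [(doc, score), ...] for the query."""
    merged = []
    for search in sources:
        merged.extend(search(query))
    # One unified sort across all sources is the defining trait here.
    return sorted(merged, key=lambda pair: pair[1], reverse=True)

# Invented stand-ins for real underlying search services.
literature = lambda q: [("Review of BRAF signalling", 0.80)]
patents = lambda q: [("BRAF inhibitor patent", 0.95), ("Assay kit patent", 0.40)]

hits = federated_search("BRAF", [literature, patents])
```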

One problem with federated search is that it presumes the underlying data is largely alike – all text, for example. This means that many rich sources of information and insight for R&D users – such as semi-structured drug pipeline data, competitive intelligence information, and other content – may not be included or effectively integrated in such systems.

A second problem is that the unified relevance sorting approach presents information all together. This may inhibit the user’s ability to explore different information types or get direct answers to questions.

The Future of Search

For R&D teams, the ability to seek (and, more importantly, find) information is central to success. Whether that information is internal or external, structured or unstructured, information management and informatics professionals need to work toward removing information roadblocks and creating a clear path to the content users seek.



Wondering how R&D teams use RightFind to search, access, share and collaborate on copyrighted materials? Contact us for more information.

Join CCC at #BioIT18 – Bio-IT World Conference & Expo in Boston Tue, 01 May 2018 07:43:32 +0000 Join CCC for a special presentation at Bio-IT World in Boston on May 16 - we'll be discussing ontology learning and personalization.

The post Join CCC at #BioIT18 – Bio-IT World Conference & Expo in Boston appeared first on Copyright Clearance Center.

Copyright Clearance Center (CCC) will be among 3,400+ life science, pharmaceutical, clinical, healthcare and IT professionals from over 35 countries at the Bio-IT World Conference & Expo ’18 on May 15-17 at the Seaport World Trade Center in Boston, MA. We invite you to visit CCC (booth #301) to talk about your team’s data and information integration challenges and how CCC solutions can help.

This year’s conference features over 280 technology and scientific presentations covering big data, smart data, cloud computing, trends in IT infrastructure, omics technologies, high-performance computing, data analytics, open source and precision medicine, from the research realm to the clinical arena.

Join Anna Lyubetskaya (Data Scientist, Engineering, CCC) on Wednesday, May 16 at 5:00 p.m. (Track 6: Bioinformatics) for her presentation, Ontology Learning and Personalization.

In this talk, Anna will discuss a framework that enables semi-supervised learning of ontologies through the best machine learning and distributed computing approaches. The talk will also cover issues inherent to data science: input data filtering and enrichment, robust iterative learning, cross-validation, rapid prototyping, and the transition between prototyping and production.

Bio-IT World ’18 conference hours:

  • Tuesday, May 15 (5 – 7 p.m.)
  • Wednesday, May 16 (9:45 a.m. – 6:30 p.m.)
  • Thursday, May 17 (9:45 a.m. – 1:55 p.m.)

Be sure to stop by the CCC booth (#301) and say hello.

We’ll be exhibiting throughout the conference, and we’re excited to talk to you about your data and information integration challenges and how solutions from CCC can help accelerate your most challenging initiatives to optimize all phases of drug discovery, development and commercialization.

Not attending the conference? Follow all the action using hashtag #BioIT18 and connect with Bio-IT World (@bioitworld) and CCC (@copyrightclear) on Twitter for up-to-the-minute dispatches from the conference.

2 Real World Examples: Using Real World Data for Commercial Pharmaceutical Product Insights Tue, 17 Apr 2018 07:58:15 +0000 Here is a look at two pharmaceutical use cases where text mining has transformed real world data into real world evidence.

The post 2 Real World Examples: Using Real World Data for Commercial Pharmaceutical Product Insights appeared first on Copyright Clearance Center.

When people think about real world evidence, they generally think about using this data to address questions around drug effectiveness or population-level safety effects. But real world data can address many other questions.

If you think of real world data as any type of information gathered about drugs in non-trial settings, a whole world of possibilities opens.

  • Social media data can be used to understand how well packaging and formulations are working.
  • Customer call feeds can be analyzed for trends in drug switching, off-label use, or contra-indicated medications among concomitant drugs.
  • Full-text literature can be mined for information about epidemiology, disease prevalence, and more.

Text mining transforms real world data into real world evidence

Many of these real world sources have free text fields, and this is where text analytics and natural language processing (NLP) fit in. Linguamatics customers use text analytics to extract actionable insight from real world data – and find valuable intelligence that can inform commercial business strategies.

Here is a look at two use cases where text mining has transformed real world data into real world evidence.

Related Reading: Pharma Turns to Real World Evidence to Overcome the Odds

Use case 1: Evidence landscape from literature for drug economics

Understanding the potential for market access is essential for all pharma companies, and information to characterize the burden of disease and local standard of care in different countries across the globe is critical for any new drug launch. Companies need an assessment of the landscape of epidemiological data, health economics and outcomes information to inform the optimal commercial strategy.

Valuable data is published every month in scientific journals, abstracts, and conferences. One of Linguamatics’ Top 10 pharma customers decided to utilize text mining to extract, normalize, and visualize these data. They then used this structured data to generate a comprehensive understanding of the available evidence, thus establishing the market “gaps” they could address. Focusing on a particular therapeutic area of immunological diseases, the organization was able to develop precise searches with increased recall across these different data sources, including full-text literature.

Linguamatics I2E enables the use of ontologies to improve disease coverage, and to incorporate domain knowledge to increase the identification of particular geographical regions (for example, enabling the use of the adjectival form of the country, e.g. French as well as France, and cities, e.g. Paris, Toulouse). I2E also extracts and normalizes numbers, which is useful to standardize epidemiological reports for incidence and prevalence of disease. Searching within full-text papers can be noisy, and I2E allows search to be specific, and to exclude certain parts of the document from a search, such as the references.
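The geography-expansion idea can be sketched as follows. To be clear, this is a hypothetical illustration, not the actual I2E API, and the lookup table is abbreviated:

```python
# Hypothetical sketch (not the actual I2E API): expand a country name to its
# adjectival form and major cities to improve recall in literature search.
GEO = {
    "France": {"adjective": "French", "cities": ["Paris", "Toulouse"]},
}

def expand_geography(country):
    entry = GEO.get(country, {})
    terms = [country, entry.get("adjective")] + entry.get("cities", [])
    return [t for t in terms if t]  # drop missing entries

terms = expand_geography("France")
```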

I2E can provide the starting point for efficiently performing evidence based systematic reviews over very large sets of scientific literature, enabling researchers to answer questions around commercial business decisions.

Use case 2: Gaining insights from medical science liaison professionals

Conversations between medical science liaison (MSL) professionals and patients or healthcare professionals (HCPs) can lead to valuable insights. The role of the MSL is to ensure the effective use, and success, of a pharmaceutical company’s drug. MSLs act as the therapy area experts for internal colleagues, and maintain good relationships with external experts, such as leading physicians, to educate and inform on new drugs and therapeutics.

Thierry Breyette, Novo Nordisk, presented at Linguamatics Text Mining Summit 2016 on “Generating actionable insights from real world data”. The figure shows a map of where particular topics are discussed, and what materials are used.

Top pharma company Novo Nordisk uses text mining to gain clinical insights from MSL interactions with HCPs. These interactions may be broad ranging, covering topics such as safety and efficacy, dosing, cost, special populations, indication, comparisons, competitor products, etc. MSLs may use approved slide decks, package inserts (PIs), factsheets, studies or publications to answer HCP questions. Linguamatics’ text mining platform I2E is used to structure these source files with custom ontologies (e.g. for material types, product, disease terminology variation, topics).

This analysis enables Novo Nordisk to better address what support HCPs may need in their interactions with patients, insurance providers, and other clinicians, and to invest in resource development appropriately.



Drug Repurposing, Rare Diseases and Semantic Analytics Tue, 03 Apr 2018 07:05:09 +0000 Drug repurposing could help find cures for rare diseases faster, but trawling through research is a time-consuming and resource-heavy task.

The post Drug Repurposing, Rare Diseases and Semantic Analytics appeared first on Copyright Clearance Center.

Rare diseases affect around 6-7% of the population in the developed world (a rare disease is defined as one affecting fewer than 1 in 2,000 people in Europe, or fewer than 200,000 individuals in the US).

Because each rare disease, by definition, serves a relatively small population of people, the cost of developing brand-new drugs for this audience (orphan drugs) can be prohibitively expensive – yet legislation in the U.S. (the FDA Orphan Drug Act, 1983), Japan, Australia and Europe incentivises treatment development.

So what’s a pharmaceutical company to do?  Is there a more cost-effective way to reach cures faster?

Enter drug repurposing

On the surface, drug repurposing promises much – known safety profiles of existing drugs, a reduced development timeline and, as a result, a significantly reduced cost to market (we’re talking about bringing expenditure down from billions of dollars to millions here).

There’s still a large amount of research to trawl through, however – a time-consuming and resource-heavy task.  This is why drug companies are currently focusing on automated literature analysis.

Let’s look at the example of Arteriovenous Malformation (AVM), which has been in the news recently in the UK.  It’s a condition which affects hundreds of thousands of people across the world, causing abnormalities in blood vessels.  These abnormalities can result in dangerous complications and disfigurements.  Now, researchers have identified drugs which could target the underlying cause of the condition.

Take a look at this diagram, which simplifies the repurposing pipeline from this piece of research:

A simplified repurposing pipeline

Here, the disease in question has been taken as the starting point, and faulty genes have been identified on the RAS/MAPK pathway, which controls cell growth.

Once these genes were identified, the next step in this particular repurposing study was to screen for drugs that targeted the relevant proteins. In this case there were a number of candidate drugs already used in cancer therapy.

A simplified repurposing pipeline, part 2

In this case, we see that treatment of AVM BRAF-mutant zebrafish with the BRAF inhibitor vemurafenib restored blood flow in AVM.

How could semantic analytics play a part?

Drug repurposing relies on making connections, but as mentioned earlier, this is not easy when you’re faced with millions of documents, all with unstructured text.

Semantic annotation 

Wouldn’t it be helpful if a computer could recognise key scientific information in unstructured text, such as scientific papers?  Of course, the answer is yes, but one of the main hurdles with this approach is getting the computer to do this quickly, whilst being able to process scientific synonyms and ambiguity.

Semantic Search

Building on this is semantic search: a tool which allows a researcher to find relevant information about their target.  In this case, we’re looking for drugs that inhibit BRAF.  The search tool also picks up synonyms, ensuring that you don’t miss out on potentially valuable data.  Contrast this with a conventional search engine: if you search for “drug”, you’ll get results which mention the word “drug”.  With a semantically enriched search engine, however, the computer knows that this actually means anything which is defined as a drug.

Related Reading: Semantic Search vs. Keyword Search

Extracting associations


And the results go beyond just highlighting individual entities, allowing you to extract information about relationships between entities, such as gene-phenotype or drug-target.  Extrapolate this over 28 million Medline abstracts, and you have an incredibly powerful tool.

Building a knowledge network

Image: Lopez-Pajares V et al 2013

These relationships can then be built into networks, providing you with a computer readable framework for searching the data and making new connections.

Labelling the entities in the text with unique identifiers allows you to take this a step further and map to other data systems, connecting related diseases, adverse events, pathways, and drug labels.

And of course, this method can be turned on its head to discover new information.  For example, you could compare diseases based on their phenotype profiles.  Once you know that two diseases are strongly related, if there’s a drug which treats one of these conditions, you can hypothesise that you have a potential repurposing candidate on your hands for the other condition. This is a technique we’ve explored before.
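One simple way to compare diseases by phenotype profile is a Jaccard overlap score. This is a minimal sketch: the disease names, phenotype sets, drug name, and threshold are all invented for illustration:

```python
# Hypothetical sketch: score disease similarity by phenotype overlap
# (Jaccard), then propose repurposing candidates for strong matches.
def jaccard(a, b):
    return len(a & b) / len(a | b)

phenotypes = {
    "disease_A": {"vascular malformation", "lesion", "bleeding"},
    "disease_B": {"vascular malformation", "lesion", "headache"},
}
known_drugs = {"disease_A": ["drug_X"]}  # placeholder: a drug treating disease_A

candidates = []
similarity = jaccard(phenotypes["disease_A"], phenotypes["disease_B"])
if similarity > 0.4:  # threshold chosen purely for illustration
    # Drugs for the related disease become repurposing hypotheses.
    candidates = known_drugs.get("disease_A", [])
```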

Ready to learn more? Listen to SciBite & CCC’s on-demand webinar: Exploring Drug Repurposing for Rare Diseases Through Semantic Analytics

*Editor’s Note: This blog post was originally published on SciBite’s blog on Feb. 28, 2018. 

What are Ontologies – And How Are They Built? An Interview with SciBite’s Founder Lee Harland Wed, 07 Feb 2018 10:20:36 +0000 What are ontologies, and how are they created? To answer this question, we spoke with our partners at SciBite in CCC’s Beyond the Book podcast series.

The post What are Ontologies – And How Are They Built? An Interview with SciBite’s Founder Lee Harland appeared first on Copyright Clearance Center.

Here at CCC, scientific ontologies are hugely important to the semantic search capabilities built into RightFind Insight.

But what exactly are ontologies, and how are they created? To answer this question, we spoke with our partners at SciBite in CCC’s Beyond the Book podcast series.

Listen here to the full interview with SciBite’s founder Lee Harland, or check out our summary below.


What is an ontology?

Oxford Dictionaries defines an ontology as:

A set of concepts and categories in a subject area or domain that shows their properties and the relations between them.

Essentially, an ontology’s purpose is to properly define something. In life sciences organizations, ontologies might be created to categorize diseases, drugs, genotypes/phenotypes, mechanisms of action, and other biomedical concepts. Adding this layer of meaning to raw text makes a document easier to synthesize and process further.


“For those who do study ontologies, there’s a very famous concept learned in their first year of university: the pizza ontology,” Lee said. “The idea is that pizzas are split up into bases and toppings, and how those relate to each other.  It’s really a conceptualization of a particular domain in a computer-readable format.”
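Lee’s pizza example can be sketched as a tiny computer-readable structure – here, a hypothetical Python dictionary with subclass (“is_a”) relations; real ontologies use richer formats such as OWL:

```python
# Hypothetical sketch of the pizza ontology idea: concepts and subclass
# ("is_a") relations in a computer-readable structure.
ontology = {
    "Pizza": {"is_a": "Food", "has": ["Base", "Topping"]},
    "Margherita": {"is_a": "Pizza", "toppings": ["Tomato", "Mozzarella"]},
    "Topping": {"is_a": "Food"},
}

def is_a(onto, child, ancestor):
    """Walk up the subclass chain to test whether child descends from ancestor."""
    while child in onto:
        parent = onto[child].get("is_a")
        if parent == ancestor:
            return True
        child = parent
    return False
```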

How are ontologies produced?

Ontologies are produced by the scientific community, and funded by both private and public money across the globe.  SciBite is in a unique space, in that the organization is both a consumer and a producer of ontologies.

“The ontologies we work with aren’t the result of one, two, three, four people,” Lee said. “They’re the result of thousands of experts, everyone contributing a tiny little bit of knowledge to an overall coherent map of a particular set of cells, tissues, diseases, etc. The power to be able to leverage that expertise in a computer-readable format is incredible.”

This collaborative process gets to the heart of why organizations are doing this research in the first place.

“I think the power is in the openness, the fact that they are done in the public domain, they are free to use by everybody,” Lee said. “It promotes data interoperability, and the ability to do these experiments.”

How can ontologies be applied to text?

When ontologies are applied to text, the result is a semantically-enriched text document.

Lee breaks down the concept with the example of a hedgehog. If you’re not a scientist in the life sciences realm, you’re likely to think of a hedgehog as a little, spiky animal. But to many scientists, hedgehog is the better-known name of a protein that’s critical in cell division, a major process involved in cancer.

“When you say hedgehog to a life scientist, particularly in molecular biology or human genetics, they’re much more likely to be thinking about the hedgehog gene or protein, and not the loveable animal,” Lee said.  “When you are trying to apply ontologies to text, and you see the word hedgehog, you’ve got to build systems that say right, OK, this could mean one of two things. I’m not going to annotate it as the hedgehog protein unless I really think it is, and similarly I’m not going to annotate as the hedgehog animal unless there’s something that tells me that it is the animal.  That’s disambiguation.”
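The disambiguation step Lee describes can be sketched as a simple cue-word check. Real systems use much richer context models; the cue lists below are invented for illustration:

```python
# Hypothetical sketch: annotate "hedgehog" as a protein or an animal only
# when context words support the reading; otherwise, do not annotate at all.
PROTEIN_CUES = {"gene", "protein", "signaling", "pathway", "cell"}
ANIMAL_CUES = {"spines", "garden", "mammal", "nocturnal"}

def disambiguate_hedgehog(sentence):
    words = set(sentence.lower().replace(".", "").split())
    if words & PROTEIN_CUES:
        return "PROTEIN"
    if words & ANIMAL_CUES:
        return "ANIMAL"
    return "AMBIGUOUS"  # no evidence either way: leave unannotated

label = disambiguate_hedgehog("Hedgehog signaling drives cell division.")
```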

Today, when organizations like SciBite apply ontologies to text, they’re providing the ability to search through a document, or thousands of documents, to find relevant terms, ultimately enhancing and accelerating the R&D process.


What R&D and Life Sciences Organizations Need to Know About IDMP Tue, 16 Jan 2018 08:31:17 +0000 Beset by delays and revisions, IDMP, the set of new international standards for identifying and describing “medicinal products,” is nonetheless being rolled out, month by month. Here’s what you need to know, with insights from Paul Milligan, Senior Product Manager at Linguamatics.

The post What R&D and Life Sciences Organizations Need to Know About IDMP appeared first on Copyright Clearance Center.

“Complex” doesn’t begin to describe IDMP, the new set of international standards for identifying and describing medicinal products that is currently being rolled out in Europe in phases, despite a plethora of delays and revisions. Short for “Identification of Medicinal Products,” IDMP is actually meant to streamline the tracking of medicinal products in a global market by standardizing descriptions of substances, including dose forms and units of measurement.

That may sound simple, but many life sciences and R&D organizations don’t have IDMP on their radar, and are not set up to meet these standards. In a 2017 survey of life sciences companies conducted during a Pistoia Alliance webinar:

  • 42% of respondents said they knew very little about IDMP
  • 25% said they had only a basic understanding of the upcoming global regulations

Just as surprising in our high-tech age, 40% said their regulatory and R&D divisions still use “unstructured paper and PDF-based reports” to exchange information on substances.

“The goal of IDMP is to make sure companies have a truly standardized description of their products. But most still store their data in all sorts of formats, files and databases, which means an organization’s internal description of a product or substances may be very different from what the regulators want,” says Paul Milligan, senior product manager at Linguamatics, a text-mining software company based in Cambridge, England. (CCC and Linguamatics are partners—Linguamatics’ I2E software is integrated with CCC’s RightFind™ XML for Mining.)

The good news is that, with foresight and the right systems in place, life sciences and R&D organizations will not only be able to comply with the new standards but can also reap benefits that translate into time saved, problems solved, and the potential for more profits down the line.

I talked with Paul Milligan about what IDMP issues should be top of mind:

What are the problems life sciences and R&D organizations face when it comes to complying with IDMP standards?

Paul Milligan: The basic challenge is for companies to get their own internal data into a format that can then be shared with regulators. It’s not that companies haven’t been providing this information—they have. The problem is, the different sources of information necessary to meet the IDMP labeling standards have typically been siloed in different databases. That creates a challenge in terms of bringing the necessary pieces of information together in a timely fashion that makes sense with a company’s workflow.

Meeting IDMP standards is going to require a big push from pharmaceutical companies, biotechs and other stakeholders to break these silos down and find a systematic way of overcoming the technical barriers. The good news is, once that happens, it will be easier for everyone involved to learn from and explore the data, spotting new patterns, speeding regulatory submissions, and tracking adverse events.

What else do companies need to do, beyond gathering and standardizing the required information?

PM: The whole idea of IDMP is that a broad set of data elements need to be tied in with the product, such as manufacturer, indication, adverse events, along with dosage strength and formulation. On a basic level, that means organizations will need to establish a scalable process where it’s easy to tell what information is going in and what is coming out, where they can extract information easily, and where everything is done systematically, so nothing is inadvertently omitted. After that, you have to be able to put the information into context, meaning that if you spot an adverse side effect somewhere, you’ll also want to know the drug that caused it, the dosage, and any information that can give meaning to the adverse event. These are prerequisites.

Is it possible for organizations to do this manually?

PM: It’s possible, but it would take a lot of people and time to gather the information, and there’s more chance of introducing human error.  Pulling out the required IDMP data elements from regulatory text sources can be very time-intensive, and of course it needs to be kept up-to-date with new information. Here’s an example: Let’s say you need to review the literature for any new mentions of adverse events. If you do a standard keyword search, you type in an adverse effect and a drug and then you have to wade through all the documents to find the relationship between these two terms, otherwise you won’t be able to tell if a particular drug is causing the adverse effect.

If you’re using a machine-based approach to information extraction, you can immediately say, “We’ve found this term and it’s being reported as an adverse event caused by this or that drug.” Text mining can be a powerful way to pull out the adverse events in the data without having to read every document—the machine is doing the initial info-grabbing and summarizing—and specific, relevant documents can be read later.
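The contrast Paul describes can be sketched in a few lines: instead of a document-level keyword match, require the drug and the event in one sentence joined by a causal verb before reporting a relationship. The drug names, events, and verb list below are invented for illustration:

```python
import re

# Hypothetical sketch: report a drug/adverse-event pair only when both
# appear in one sentence linked by a causal verb.
CAUSAL = re.compile(r"\b(caused|induced|associated with)\b")

def extract_relations(text, drugs, events):
    found = []
    for sentence in text.split("."):  # naive sentence splitting
        for drug in drugs:
            for event in events:
                if drug in sentence and event in sentence and CAUSAL.search(sentence):
                    found.append((drug, event))
    return found

text = ("DrugX induced severe headache in two patients. "
        "DrugY was well tolerated. Headache history was recorded.")
relations = extract_relations(text, drugs=["DrugX", "DrugY"], events=["headache"])
```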

Are there any unlooked-for benefits that could result from the push to satisfy IDMP?

PM: Organizations will be able to identify potential problems with products earlier in the development process. Text mining software, for instance, can rapidly sift through the scientific literature on a particular drug, extracting relevant notes on patients from clinical trials and identifying any adverse events sooner rather than later. That’s going to save organizations time, effort and money so they can focus their attention on what really matters—developing drugs, designing trials and getting the products submitted to regulators.

Organizations don’t want to throw out processes that have been working for them for years. How can new and old systems be easily integrated?

PM: By definition, people who pay attention to regulatory processes are cautious—and no one wants to have to reinvent the wheel to meet these new requirements.

One way for pharmaceutical companies to approach IDMP would be to have their normal team of reviewers looking at data and spotting errors, and to add a layer of automation for faster review cycles.



Semantic Search vs. Keyword Search Tue, 14 Nov 2017 06:13:25 +0000 Ever tried searching for medical papers using a standard search engine?  Happy with the results you get?  Probably not.  There are serious limitations to using keyword search in the pharmaceutical industry.  Phil Verdemato from SciBite explains how they can be overcome with the power of semantic search.

The post Semantic Search vs. Keyword Search appeared first on Copyright Clearance Center.

Ever tried searching for medical papers using a standard search engine?  Happy with the results you get?  Probably not.  There are serious limitations to using keyword search in the pharmaceutical industry.

Imagine you were looking for papers featuring the enzyme type ‘GSK’. Using a generic search engine, you would get articles that mentioned ‘glycogen synthase kinase’, as well as articles about the company ‘GlaxoSmithKline’ – which is not particularly relevant to your search here.  However, a semantic search engine, powered by scientific vocabularies and a disambiguation system, will just focus on results featuring the protein, giving you context specificity.

If you needed even more accuracy and wanted to find a specific protein such as GSK3, you would be required to do a search for:

glycogen synthase kinase 3 alpha, GSK-3-A, GSK3A, alpha glycogen synthase kinase-3, glycogen synthase kinase-3A…

It’s a pretty long list of synonym derivatives, right?  A good semantic search system, on the other hand, does all this for you when it indexes, so that you don’t have to worry when searching.
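At index time, that burden moves to the system: every surface form maps to one canonical ID. Here is a minimal, hypothetical sketch (the synonym table abbreviates the list above, and real systems use proper tokenization rather than substring matching):

```python
# Hypothetical sketch: map every synonym to one canonical ID at index time,
# so a single query for the canonical ID finds all surface forms.
SYNONYMS = {
    "glycogen synthase kinase 3 alpha": "GSK3A",
    "gsk-3-a": "GSK3A",
    "gsk3a": "GSK3A",
    "glycogen synthase kinase-3a": "GSK3A",
}

def index_document(doc_id, text, inverted_index):
    for surface, canonical in SYNONYMS.items():
        if surface in text.lower():
            inverted_index.setdefault(canonical, set()).add(doc_id)

idx = {}
index_document("doc1", "We assayed GSK-3-A activity.", idx)
index_document("doc2", "Glycogen synthase kinase 3 alpha was inhibited.", idx)
# Querying the canonical ID now retrieves both papers.
```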

Transformative Data Integration

Having done this, you are then set up for better downstream data analysis because your conversion from unstructured to structured (typed) data is way more accurate.

You can then connect your enriched, structured data to databases and other systems, giving enhanced data connectivity across the organisation and speeding up analysis.

Group Level Searches

Great semantic search provides taxonomic relationships between its entities, so higher order searches are possible.  Let’s take the example of ‘Viagra’ – whose current use was found as an adverse effect during its trials for pulmonary hypertension.

I’d find a bunch of articles that would mention things like Viagra’s protein target, Phosphodiesterase 5A (PDE5A).  The image below shows how PDE5A and Phosphodiesterase 11A (PDE11A) were found in an article and where they sit in the taxonomy.


We can see that PDE5A sits in an enzyme taxonomy under the wider ‘Phosphodiesterase’ class. I could click on the ‘Phosphodiesterase’ class and get the system to search for anything under it:


You can see how PDE8B and PDE10A were identified in this way.


This becomes incredibly useful if, say, you’re interested in finding out which competitors have developed drugs for a target you’re working on.

What you’re looking for is a rich set of taxonomies covering areas such as diseases, drugs, protein classes and so on.
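
The group-level expansion described above can be sketched as a small recursive walk over a taxonomy; the tree below is a simplified, illustrative fragment, not a curated enzyme ontology:

```python
# Illustrative parent -> children taxonomy fragment.
TAXONOMY = {
    "Phosphodiesterase": ["PDE5A", "PDE8B", "PDE10A", "PDE11A"],
    "Kinase": ["GSK3A"],
}

def expand(concept):
    """Return the concept plus everything beneath it in the taxonomy."""
    terms = [concept]
    for child in TAXONOMY.get(concept, []):
        terms.extend(expand(child))
    return terms
```

Clicking the ‘Phosphodiesterase’ class in the interface amounts to running the query over `expand("Phosphodiesterase")` – which is how PDE8B and PDE10A turn up without being named explicitly.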

A good semantic search engine will embed the concepts (that is, entities such as “PDE5A”, entity classes, e.g. “gene”, or higher-level abstractions like “protein class”) within the plain text.  How is this useful?  Query time becomes fast and extremely accurate, because synonym expansion has already been done for you at indexing.
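
A toy illustration of such inline concept embedding, with an invented tag syntax and concept identifiers:

```python
def annotate(text, entities):
    """Embed concept IDs and classes inline.

    entities: surface form -> (concept_id, entity_class); both the
    bracket syntax and the IDs here are invented for illustration.
    """
    for surface, (cid, klass) in entities.items():
        text = text.replace(surface, f"{surface}[{cid}|{klass}]")
    return text

doc = annotate("Sildenafil inhibits PDE5A.",
               {"PDE5A": ("ENZ:PDE5A", "protein")})
```

Once the concept identifier sits in the indexed text itself, a query for the identifier (or its class) matches directly, with no expansion step at query time.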

In essence, you have far more control over the granularity of your searches than in generic search engines.  You could, for example, search for articles in Medline that mention any Orphan Disease:


That’s hard to do in one step in a generic search engine that doesn’t leverage life science taxonomy data.


Additionally, you could examine the co-occurrence data to get a feel for the landscape.  In this example, I could look at the indications commonly associated with documents mentioning PDE5A:



Here, we quickly see that Erectile Dysfunction and Pulmonary Hypertension are associated with PDE5A – and also how much time this view can save when working in drug repurposing.

You could also look at co-occurrences at the sentence level.  Sentence-level co-occurrences are stronger indicators of a real association between entities than document-level ones.  Why? Because at the document level you might find entities in a keywords section that holds spurious and unrelated terms.
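
The difference between the two levels can be sketched with a toy co-occurrence check; the per-sentence entity sets below are hard-coded stand-ins for the output of an entity recognizer:

```python
# Entities found in each sentence of one (invented) document.
doc_sentences = [
    {"PDE5A", "Erectile Dysfunction"},   # a real in-sentence association
    {"Pulmonary Hypertension"},
    {"PDE11A"},                          # e.g. noise from a keywords list
]

def cooccurs(a, b, sentences, level="sentence"):
    """Check whether two entities co-occur at sentence or document level."""
    if level == "document":
        found = set().union(*sentences)
        return a in found and b in found
    return any(a in s and b in s for s in sentences)
```

Document-level counting links PDE5A to PDE11A merely because both appear somewhere in the document; sentence-level counting does not, which is why it is the stronger signal.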


A comprehensive autocomplete index helps guide your searches – going a little deeper than GSK the company versus GSK the protein!

Fit-For-Science Search

But you’re not limited to entities and types that have already been curated.  You can build your own vocabularies or use plain text:


Note how Gilenya is searched for, but FTY720 (a synonym of the drug) is correctly identified.  See also how ‘Indication’ is an entity type and how ‘worldwide or global’ is a plain text query to identify documents that mention either word.

Remember that semantically enabled search is only as good as the vocabularies it’s built on.  An excellent vocabulary with a huge number of synonyms means that typing in the brand name of a drug also brings up papers associated with its clinical name.

And there you have it – pitted against the depth and breadth that semantic search offers, keyword search simply cannot compete in terms of accuracy, comprehensiveness, or efficiency.  Semantic search allows you to buy back valuable time that would otherwise be spent sifting through huge numbers of documents, and even to convert textual data into something you can integrate across your systems, thanks to entity recognition.


Ready to learn more? Check out:

The post Semantic Search vs. Keyword Search appeared first on Copyright Clearance Center.

Taking Semantic Search to Full Text [Upcoming Webinar] Tue, 31 Oct 2017 07:43:12 +0000 How will you take your R&D program to the next level in 2018? One way to accelerate your research initiatives…

The post Taking Semantic Search to Full Text [Upcoming Webinar] appeared first on Copyright Clearance Center.

How will you take your R&D program to the next level in 2018?

One way to accelerate your research initiatives and inform critical business decisions is through the semantic enrichment of full-text articles.

Semantic enrichment describes the process of adding a layer of meaning to raw content. This enhancement of content with information about its meaning adds structure to unstructured information, enabling users to move quickly to more intelligence-rich information activities.

Semantic search can have an immediate impact across your organization, and taking it a step further with full-text scientific literature improves these outcomes by enabling access to more facts and relationships, secondary study findings and adverse event data.

Even though using abstracts seems like a reasonable approach, there are limitations to what can be discovered through that process.  Researchers need access to the full text of the articles to ensure they don’t miss vital data and undiscovered assertions that can lead to new discoveries.

This all sounds well and good, but it doesn’t come without its challenges. Semantic enrichment projects can be resource-intensive and can take time to demonstrate business value. Plus, obtaining full-text articles in a machine-readable format across multiple publishers can be a struggle.

Join us on 7 November for a Webinar: Taking Semantic Search to Full Text

CCC will be joined by SciBite on Tuesday, 7 November at 9:00 a.m. or 1:00 p.m. EST for a live webinar.

CCC’s Product Manager Mike Iarrobino, alongside SciBite founder Lee Harland, will discuss:

  • Content challenges facing R&D teams in the life sciences
  • Benefits of semantic enrichment of full-text content
  • Solutions that enable you to reduce manual and administrative overhead, while adding value to information discovery and innovation initiatives

Need some background information before you attend the webinar? Check out:


5 Ways to Apply Semantic Search Across Your Organization Tue, 17 Oct 2017 07:04:46 +0000 Semantic search can have an immediate impact across your organization. Here are five common use cases.

The post 5 Ways to Apply Semantic Search Across Your Organization appeared first on Copyright Clearance Center.

Information managers must balance the needs of multiple internal constituencies to support information discovery. In R&D-intensive industries such as the life sciences and chemical manufacturing, semantic search can help – delivering value by giving us the ability to turn content into insight.

Semantic enrichment is the enhancement of content with information about its meaning, thereby adding structure to unstructured content. Semantic search builds on enriched content by matching the user’s query intent – not just the keywords they provide – to the relevant content, helping them quickly discover what they need.

The following illustrates how semantic search can have an immediate impact on five common use cases in life sciences and R&D organizations:

Early Phase Research

Researchers can discover interesting potential biomarkers and drug targets they hadn’t known to look for in advance. These initial results can be linked to supporting source content for further review prior to wet-lab work.

Competitive Intelligence

Competitor patent filings, often intended to hinder discovery, can be explored alongside non-patent literature (NPL) to provide a full picture of competitor strategy, claims, and prior art for patent landscaping or other purposes.


Pharmacovigilance

Literature monitoring for pharmacovigilance can become both more comprehensive and more precise through semantic searches that suggest links between adverse events and pharmacological substances, increasing the efficiency of these vital monitoring workflows.

Read more: Why Text Mining for Pharmacovigilance?

IDMP (Identification of Medicinal Products) Compliance

IDMP initiatives directed by the Food and Drug Administration (FDA) and European Medicines Agency (EMA) aim to standardize how information can be expressed about pharmacological products. Semantically enriched internal and external content can provide a fuller view of medicinal product attributes, supporting IDMP compliance.

Discovery of Chemical Compounds

Researchers can take advantage of well-established chemical ontologies to conduct more efficient semantic searches for chemicals, more easily identifying relevant chemical compounds, their properties, and relationships.

Use Semantic Search to Uncover Scientific Meaning

R&D and information managers routinely use keyword search to find the information they need. While keyword search may satisfy the basic needs of researchers, there are limitations that can affect productivity and slow the pace of discovery.

Learn here how semantic search can provide you with more comprehensive and relevant search results.


Understanding Text Mining: 4 Need-to-Know Terms and Their Definitions Tue, 03 Oct 2017 06:03:26 +0000 Text mining offers many benefits, but the technology is complex. Discover the four terms your team needs to know to gain maximum insights.

The post Understanding Text Mining: 4 Need-to-Know Terms and Their Definitions appeared first on Copyright Clearance Center.

As the use of text mining becomes more widespread, now is the time for information managers to make sure they understand the basics.

Text mining, the process of deriving high-quality information from text materials using software, helps researchers identify patterns or relations between concepts that would otherwise be difficult to discern. The result is faster discovery and smarter decision-making.

Looking for a place to start? Here are four key text mining terms every information manager should know:


XML

Short for Extensible Markup Language, XML is an information exchange standard designed to improve usability, especially when the data is interpreted by software. In other words, it is a more readily machine-readable version of a document. XML tends to be the preferred input format for semantic or text and data mining technology, as well as other processing software.

When acquiring full-text articles, researchers are usually able to access only the PDF format, necessitating conversion into XML for text mining. This can be an arduous and error-prone process.

Semantic enrichment

Semantic enrichment describes the process of adding a layer of meaning to raw content. This enhancement of content with information about its meaning thereby adds structure to unstructured information, making the content easier to synthesize and process further. For example, a scientific article can be enriched by adding in-line annotations or tags describing the genotypes/phenotypes, diseases, drugs, mechanisms of action, and other biomedical concepts mentioned within. Semantic enrichment is a key enabler of the various strategic initiatives undertaken by informatics and information management professionals.
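
As a hedged sketch of what such inline annotation can look like, the snippet below parses a fragment of enriched XML with Python’s standard library; the tag names and attribute values are invented, not a real publisher or NLP schema:

```python
import xml.etree.ElementTree as ET

# A minimal, invented example of an enriched article fragment.
enriched = (
    '<p>Patients taking <entity type="drug">sildenafil</entity> reported '
    'improvement in <entity type="disease">pulmonary hypertension</entity>.</p>'
)

root = ET.fromstring(enriched)
# Pull out the annotated drug mentions by entity type.
drugs = [e.text for e in root.iter("entity") if e.get("type") == "drug"]
```

Because the annotations are machine-readable tags rather than free text, downstream tools can collect every drug or disease mention without re-running any language analysis.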

White Paper: Semantic Enrichment & The Information Manager

TDM rights

Content is associated with a variety of rights. Information management professionals and librarians will be familiar with copyright licensing, reproduction rights organizations, and other frameworks and organizations that enable content consumers to use, share, and disseminate information while respecting copyright.

As may be expected, there are a number of copyright-sensitive acts that go hand-in-hand with the text and data mining (TDM) process. Content may be copied, stored, annotated or enriched, and otherwise scanned to produce a useable research output. In most cases, commercial TDM rights are not included in standard subscription agreements. Publishers may make a standard or special set of ‘TDM rights’ available as part of their subscription agreements, or as additional incremental rights.

Machine learning

Machine learning is one approach to synthesizing raw or semantically enriched content to yield insights.

Machines can be instructed to process information in many ways. One way is to apply strict rules that attempt to cover every instance that is likely to come up. For instance, one rule might be: when A is the input, B is always the output. But while this is simple in theory and easy for humans to understand, it can be difficult to maintain, scale, and capture value from this process in practice.

Machine learning is another way for machines to process information. In this case, the system is ‘trained’ by way of example, rather than given rules. For example, a system that is meant to classify images into either pictures of humans or pictures of cats would be given a set of images and told they are humans, and another set and told they are cats. From there, the system can move on to classifying other images, with feedback being given continually. It is through this feedback that the system is able to constantly adjust to improve its classification ability and yield greater insights.
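
The contrast between hand-written rules and training by example can be sketched with a toy word-count classifier; the training data is invented, and real systems would use far richer features and models:

```python
from collections import Counter

def train(examples):
    """examples: list of (text, label). Learn word evidence per label
    from the examples themselves, rather than from hand-written rules."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def classify(text, counts):
    """Pick the label whose training words best match the input."""
    scores = {label: sum(c[w] for w in text.lower().split())
              for label, c in counts.items()}
    return max(scores, key=scores.get)

model = train([
    ("whiskers purring cat", "cat"),
    ("fur cat meow", "cat"),
    ("person walking talking", "human"),
    ("human speaking person", "human"),
])
```

Feeding the system corrected examples and retraining is the feedback loop described above: the model adjusts its word evidence rather than anyone editing a rule.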

Text mining and semantic enrichment are increasingly being used as data processing techniques to enable machine learning programs. Here are a few examples of how machine learning is helping the industry to evolve.


Want to learn more? Text mining enables researchers to deliver valuable insights based on relevant data. Find out about XML for Mining and more about what your team needs to know about text mining.

