Search and Discovery – Copyright Clearance Center http://www.copyright.com Rights Licensing Expert Fri, 17 Nov 2017 17:09:39 +0000 en-US hourly 1 http://www.copyright.com/wp-content/uploads/2016/05/cropped-ccc-favicon-32x32.png Search and Discovery – Copyright Clearance Center http://www.copyright.com 32 32 Semantic Search vs. Keyword Search http://www.copyright.com/blog/semantic-search-vs-keyword-search/ http://www.copyright.com/blog/semantic-search-vs-keyword-search/#respond Tue, 14 Nov 2017 06:13:25 +0000 http://www.copyright.com/?post_type=blog_post&p=14828 Ever tried searching for medical papers using a standard search engine?  Happy with results you get?  Probably not.  There are serious limitations to using keyword search in the pharmaceutical industry.  Phil Verdemato from SciBite explains how they can be overcome with the power of semantic search.

The post Semantic Search vs. Keyword Search appeared first on Copyright Clearance Center.

Ever tried searching for medical papers using a standard search engine?  Happy with results you get?  Probably not.  There are serious limitations to using keyword search in the pharmaceutical industry.

Imagine you were looking for papers featuring the enzyme type ‘GSK’. Using a generic search engine, you would get articles that mentioned ‘glycogen synthase kinase’, as well as articles about the company ‘GlaxoSmithKline’ – which are not particularly relevant to your search here.  However, a semantic search engine, powered by scientific vocabularies and a disambiguation system, will focus only on results featuring the protein, giving you context specificity.

If you needed even more accuracy and wanted to find a specific protein such as GSK3 alpha, you would have to search for:

glycogen synthase kinase 3 alpha, GSK-3-A, GSK3A, alpha glycogen synthase kinase-3, glycogen synthase kinase-3A…

It’s a pretty long list of synonyms and variants, right?  A good semantic search system, on the other hand, does all this for you when it indexes, so that you don’t have to worry when searching.
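As a rough illustration of what "doing this at index time" can mean, here is a minimal Python sketch. It assumes a tiny hand-made vocabulary that maps each synonym from the list above to one canonical ID. The ID `GSK3A`, the sample documents, and the function names are all hypothetical, and a real system would use far larger curated vocabularies plus disambiguation.

```python
import re
from collections import defaultdict

# Hypothetical mini-vocabulary: surface form (lower case) -> canonical entity ID.
SYNONYMS = {
    "glycogen synthase kinase 3 alpha": "GSK3A",
    "gsk-3-a": "GSK3A",
    "gsk3a": "GSK3A",
    "alpha glycogen synthase kinase-3": "GSK3A",
    "glycogen synthase kinase-3a": "GSK3A",
}


def build_index(docs):
    """Map each canonical entity ID to the set of document IDs that mention it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        lowered = text.lower()
        for surface, entity_id in SYNONYMS.items():
            # Require that the match is not embedded in a longer token.
            pattern = r"(?<![\w-])" + re.escape(surface) + r"(?![\w-])"
            if re.search(pattern, lowered):
                index[entity_id].add(doc_id)
    return index


def search(index, term):
    """Resolve the query term to its canonical ID, then look it up once."""
    entity_id = SYNONYMS.get(term.lower(), term)
    return sorted(index.get(entity_id, set()))


# Illustrative documents, not real abstracts.
DOCS = {
    "doc1": "Inhibition of GSK-3-A reduced tau phosphorylation.",
    "doc2": "We studied glycogen synthase kinase 3 alpha in mice.",
    "doc3": "GlaxoSmithKline announced quarterly results.",
}
INDEX = build_index(DOCS)
print(search(INDEX, "GSK3A"))
```

Because the synonyms are resolved when the documents are indexed, the query side stays a single lookup, whichever variant the searcher happens to type.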

Transformative Data Integration

Having done this, you are then set up for better downstream data analysis because your conversion from unstructured to structured (typed) data is way more accurate.

You can then connect your enriched, structured data to databases and other systems, giving enhanced data connectivity across the organisation and speeding up analysis.

Group Level Searches

Great semantic search provides taxonomic relationships between its entities, so higher order searches are possible.  Let’s take the example of ‘Viagra’ – whose current use was found as a side effect during its early trials as a treatment for angina.

Searching for Viagra, I’d find a bunch of articles that mention things like its protein target, Phosphodiesterase 5A (PDE5A).  The image below shows how PDE5A and Phosphodiesterase 11A (PDE11A) were found in an article and where they sit in the taxonomy.

scibite-semantic-enrichment-1

We can see that PDE5A sits in an enzyme taxonomy under the wider ‘Phosphodiesterase’ class. I could click on the ‘Phosphodiesterase’ class and get the system to search for anything under it:

scibite-semantic-enrichment-2

You can see how PDE8B and PDE10A were identified in this way.

scibite-semantic-enrichment-3

This becomes incredibly useful, say if you’re interested in finding out which competitors have developed drugs for a target you’re working on.

What you’re looking for is a rich set of taxonomies covering areas such as diseases, drugs, protein classes and so on.
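To make the idea of a class-level search concrete, here is a small Python sketch. It assumes documents have already been tagged with entity IDs and that the taxonomy is stored as a simple child-to-parent map. The taxonomy fragment, the document tags, and the function names are invented for illustration and are not taken from any particular product.

```python
# Hypothetical fragment of an enzyme taxonomy: child -> parent.
PARENT = {
    "PDE5A": "Phosphodiesterase",
    "PDE8B": "Phosphodiesterase",
    "PDE10A": "Phosphodiesterase",
    "PDE11A": "Phosphodiesterase",
    "Phosphodiesterase": "Enzyme",
    "GSK3A": "Kinase",
    "Kinase": "Enzyme",
}


def is_under(entity, ancestor):
    """Return True if `entity` is `ancestor` or sits anywhere below it."""
    node = entity
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT.get(node)  # None once we walk past the root
    return False


# Documents already tagged with entity IDs at index time (illustrative data).
TAGGED_DOCS = {
    "doc1": {"PDE5A", "PDE11A"},
    "doc2": {"PDE8B"},
    "doc3": {"GSK3A"},
    "doc4": {"PDE10A", "GSK3A"},
}


def search_class(ancestor):
    """Find every document that mentions any entity under the given class."""
    return sorted(
        doc_id
        for doc_id, entities in TAGGED_DOCS.items()
        if any(is_under(entity, ancestor) for entity in entities)
    )


print(search_class("Phosphodiesterase"))
```

Searching on the class name picks up documents that mention only PDE8B or PDE10A, even though the query never names them.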

A good semantic search engine will actually embed the concepts (that’s to say, entities such as “PDE5A”, entity classes, e.g. “gene”, or higher level abstractions like “protein class”) within the plain text.  How is this useful?  Well, query time is really quick and extremely accurate, all because you don’t have to do synonym expansion.

In essence, you have far more control over the granularity of your searches than in generic search engines.  You could, for example, search for articles in Medline that mention any Orphan Disease:

scibite-semantic-enrichment-4

That’s hard to do in one step in a generic search engine that doesn’t leverage life science taxonomy data.

Connections

Additionally, you could examine the co-occurrence data to get a feel for the landscape.  In this example, I could look at the indications commonly associated with documents mentioning PDE5A:

scibite-semantic-enrichment-5

Here, we quickly see that Erectile Dysfunction and Pulmonary Hypertension are associated with PDE5A – and also how much time this can save when working in drug repurposing.

You could also look at co-occurrences on a sentence level.  Sentence level co-occurrences are stronger indicators of a real association between entities than document level ones.  Why? Because at a document level you might find entities in a keywords section that holds spurious and unrelated terms.
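The difference between the two levels can be sketched in a few lines of Python. This is a simplified illustration only: it uses a naive sentence splitter and plain substring matching instead of real entity recognition, and the sample document and entity list are made up.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def cooccurrences(units, entities):
    """Count entity pairs that appear together in the same unit of text."""
    counts = Counter()
    for unit in units:
        lowered = unit.lower()
        present = sorted(e for e in entities if e.lower() in lowered)
        counts.update(combinations(present, 2))
    return counts


# Illustrative document with an unrelated term in its keywords line.
DOC = (
    "PDE5A inhibitors are used to treat erectile dysfunction. "
    "Keywords: asthma, imaging."
)
ENTITIES = ["PDE5A", "erectile dysfunction", "asthma"]

doc_level = cooccurrences([DOC], ENTITIES)
sentence_level = cooccurrences(split_sentences(DOC), ENTITIES)

print(doc_level[("PDE5A", "asthma")], sentence_level[("PDE5A", "asthma")])
```

At document level the keyword "asthma" is paired with PDE5A, while at sentence level only the PDE5A and erectile dysfunction pair survives.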

scibite-semantic-enrichment-6

A comprehensive autocomplete index helps guide your searches.  A little bit more in depth than GSK the company or GSK the protein!

Fit-For-Science Search

But you’re not limited to entities and types that have already been curated.  You can build your own vocabularies or use plain text:

scibite-semantic-enrichment-7

Note how Gilenya is searched for, but FTY720 (a synonym of the drug) is correctly identified.  See also how ‘Indication’ is an entity type and how ‘worldwide or global’ is a plain text query to identify documents that mention either word.

Remember that semantically enabled search is only as good as the vocabularies it’s built on.  An excellent vocabulary with a huge number of synonyms means that typing in the brand name of a drug also brings up papers that refer to it by its generic name.

And there you have it – pitted against the depth and breadth that semantic search offers, keyword search simply cannot compete in terms of accuracy, full awareness, or efficiency.  Semantic search allows you to buy back valuable time that would otherwise be spent sifting through huge numbers of documents, and even convert textual data into something you can integrate across your systems, thanks to entity recognition.

Ready to learn more? Check out:

http://www.copyright.com/blog/semantic-search-vs-keyword-search/feed/ 0
Taking Semantic Search to Full Text [Upcoming Webinar] http://www.copyright.com/blog/taking-semantic-search-full-text-upcoming-webinar/ http://www.copyright.com/blog/taking-semantic-search-full-text-upcoming-webinar/#respond Tue, 31 Oct 2017 07:43:12 +0000 http://www.copyright.com/?post_type=blog_post&p=14681 How will you take your R&D program to the next level in 2018? One way to accelerate your research initiatives… Read more

The post Taking Semantic Search to Full Text [Upcoming Webinar] appeared first on Copyright Clearance Center.

How will you take your R&D program to the next level in 2018?

One way to accelerate your research initiatives and inform critical business decisions is through the semantic enrichment of full-text articles.

Semantic enrichment describes the process of adding a layer of meaning to raw content. This enhancement of content with information about its meaning adds structure to unstructured information, enabling users to move quickly to more intelligence-rich information activities.

Semantic search can have an immediate impact across your organization, and taking it a step further with full-text scientific literature improves these outcomes by enabling access to more facts and relationships, secondary study findings and adverse event data.

Even though using abstracts seems like a reasonable approach, there are limitations to what can be discovered through that process.  Researchers need access to the full text of the articles to ensure they don’t miss vital data and undiscovered assertions that can lead to new discoveries.

This sounds well and good, but it doesn’t come without its challenges. Semantic enrichment projects can be resource intensive and can take time to demonstrate business value. Plus, obtaining full-text articles in a machine-readable format across multiple publishers can be a struggle.

Join us on 7 November for a Webinar: Taking Semantic Search to Full Text

CCC will be joined by SciBite on Tuesday, 7 November at 9:00 a.m. or 1:00 p.m. EST for a live webinar.

CCC’s Product Manager Mike Iarrobino, alongside SciBite founder Lee Harland, will discuss:

  • Content challenges facing R&D teams in the life sciences
  • Benefits of semantic enrichment of full-text content
  • Solutions that enable you to reduce manual and administrative overhead while adding value to information discovery and innovation initiatives

Need some background information before you attend the webinar? Check out:

http://www.copyright.com/blog/taking-semantic-search-full-text-upcoming-webinar/feed/ 0
5 Ways to Apply Semantic Search Across Your Organization http://www.copyright.com/blog/5-ways-apply-semantic-search-across-organization/ http://www.copyright.com/blog/5-ways-apply-semantic-search-across-organization/#respond Tue, 17 Oct 2017 07:04:46 +0000 http://www.copyright.com/?post_type=blog_post&p=14450 Semantic search can have an immediate impact across your organization. Here are five common use cases.

The post 5 Ways to Apply Semantic Search Across Your Organization appeared first on Copyright Clearance Center.

Information managers must balance the needs of multiple internal constituencies to support information discovery. In R&D-intensive industries such as the life sciences and chemical manufacturing, semantic search can help – delivering value by giving us the ability to turn content into insight.

Semantic enrichment is the enhancement of content with information about its meaning, thereby adding structure to unstructured content. Semantic search builds on enriched content by matching the user’s query intent – not just the keywords they provide – to the relevant content, helping them quickly discover what they need.

The following illustrates how semantic search can have an immediate impact on five common use cases in life sciences and R&D organizations:

Early Phase Research

Researchers can discover interesting potential biomarkers and drug targets they hadn’t known to look for in advance. These initial results can be linked to supporting source content for further review prior to wet lab.

Competitive Intelligence

Competitor patent filings, often intended to hinder discovery, can be explored alongside non-patent literature (NPL) to provide a full picture of competitor strategy, claims, and prior art for patent landscaping or other purposes.

Pharmacovigilance

Literature monitoring for pharmacovigilance can become both more comprehensive and more precise through semantic searches that suggest links between adverse events and pharmacological substances, increasing the efficiency of these vital monitoring workflows.

Read more: Why Text Mining for Pharmacovigilance?

IDMP (Identification of Medicinal Products) Compliance

IDMP initiatives directed by the Food and Drug Administration (FDA) and European Medicines Agency (EMA) aim to standardize how information can be expressed about pharmacological products. Semantically enriched internal and external content can provide a fuller view of medicinal product attributes, supporting IDMP compliance.

Discovery of Chemical Compounds

Researchers can take advantage of well-established chemical ontologies to conduct more efficient semantic searches for chemicals, more easily identifying relevant chemical compounds, their properties, and their relationships.

Use Semantic Search to Uncover Scientific Meaning

R&D and information managers routinely use keyword search to find the information they need. While keyword search may satisfy the basic needs of researchers, there are limitations that can affect productivity and slow the pace of discovery.

Learn here how semantic search can provide you with more comprehensive and relevant search results.

http://www.copyright.com/blog/5-ways-apply-semantic-search-across-organization/feed/ 0
Understanding Text Mining: 4 Need-to-Know Terms and Their Definitions http://www.copyright.com/blog/understanding-text-mining-4-need-know-terms-definitions/ http://www.copyright.com/blog/understanding-text-mining-4-need-know-terms-definitions/#respond Tue, 03 Oct 2017 06:03:26 +0000 http://www.copyright.com/?post_type=blog_post&p=14353 Text mining offers many benefits, but the technology is complex. Discover the four terms your team needs to know to gain maximum insights.

The post Understanding Text Mining: 4 Need-to-Know Terms and Their Definitions appeared first on Copyright Clearance Center.

As the use of text mining becomes more widespread, now is the time for information managers to make sure they understand the basics.

Text mining, the process of deriving high-quality information from text materials using software, helps researchers identify patterns or relations between concepts that would otherwise be difficult to discern. The result is faster discovery and smarter decision-making.

Looking for a place to start? Here are four key text mining terms every information manager should know:

XML

Short for Extensible Markup Language, XML is an information exchange standard designed to improve usability, especially when the data is interpreted by software. In other words, it is a more readily machine-readable version of a document. XML tends to be the preferred input method for semantic or text and data mining technology, as well as other processing software.

When acquiring full-text articles, researchers are usually able to access only PDF format, necessitating conversion into XML for text mining. This can be an arduous and error-prone process.
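To show why XML is convenient for software, here is a short Python sketch that pulls section headings and text out of a small article using the standard library's `xml.etree.ElementTree`. The element names (`article`, `sec`, `title`, `p`) and the sample content are assumptions made for this example. Real publisher XML follows its own schema and is usually much richer.

```python
import xml.etree.ElementTree as ET

# Made-up article; element names are assumed for illustration only.
ARTICLE_XML = """<article>
  <front><title>PDE5A inhibition in pulmonary hypertension</title></front>
  <body>
    <sec><title>Methods</title><p>Patients received sildenafil.</p></sec>
    <sec><title>Results</title><p>Pressure fell in the treated group.</p></sec>
  </body>
</article>"""


def sections(xml_text):
    """Return a mapping of section heading -> concatenated paragraph text."""
    root = ET.fromstring(xml_text)
    result = {}
    for sec in root.iter("sec"):
        heading = sec.findtext("title", default="")
        # itertext() also collects text nested inside inline markup.
        paragraphs = ["".join(p.itertext()).strip() for p in sec.findall("p")]
        result[heading] = " ".join(paragraphs)
    return result


for heading, text in sections(ARTICLE_XML).items():
    print(heading, "->", text)
```

With a PDF, the same step would first require recovering reading order, headings, and paragraph boundaries from page layout, which is where much of the effort and error comes from.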

Semantic enrichment

Semantic enrichment describes the process of adding a layer of meaning to raw content. This enhancement of content with information about its meaning thereby adds structure to unstructured information, making the content easier to synthesize and process further. For example, a scientific article can be enriched by adding in-line annotations or tags describing the genotypes/phenotypes, diseases, drugs, mechanisms of action, and other biomedical concepts mentioned within. Semantic enrichment is a key enabler of the various strategic initiatives undertaken by informatics and information management professionals.
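As a toy illustration of in-line annotation, the Python sketch below wraps recognized terms in tags that carry an entity type and a canonical ID. The vocabulary, the `entity` tag format, and the IDs are hypothetical choices for this example. Production enrichment relies on curated ontologies and context-aware disambiguation rather than a plain lookup.

```python
import re

# Hypothetical vocabulary: surface form (lower case) -> (entity type, canonical ID).
VOCAB = {
    "sildenafil": ("DRUG", "sildenafil"),
    "viagra": ("DRUG", "sildenafil"),
    "pulmonary hypertension": ("INDICATION", "pulmonary_hypertension"),
    "pde5a": ("GENE", "PDE5A"),
}

# Longest terms first so multi-word names win over any shorter overlap.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(term) for term in sorted(VOCAB, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def enrich(text):
    """Wrap every vocabulary match in an in-line annotation tag."""

    def tag(match):
        entity_type, entity_id = VOCAB[match.group(0).lower()]
        return f'<entity type="{entity_type}" id="{entity_id}">{match.group(0)}</entity>'

    return _PATTERN.sub(tag, text)


print(enrich("Viagra targets PDE5A."))
```

Note that the brand name and the substance name resolve to the same ID, so downstream tools can treat them as one concept while the original wording stays intact.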

White Paper: Semantic Enrichment & The Information Manager

TDM rights

Content is associated with a variety of rights. Information management professionals and librarians will be familiar with copyright licensing, reproduction rights organizations, and other frameworks and organizations that enable content consumers to use, share, and disseminate information while respecting copyright.

As may be expected, there are a number of copyright-sensitive acts that go hand-in-hand with the text and data mining (TDM) process. Content may be copied, stored, annotated or enriched, and otherwise scanned to produce a useable research output. In most cases, commercial TDM rights are not included in standard subscription agreements. Publishers may make a standard or special set of ‘TDM rights’ available as part of their subscription agreements, or as additional incremental rights.

Machine learning

Machine learning can be an approach to synthesize raw or semantically enriched content to yield insights.

Machines can be instructed to process information in many ways. One way is to apply strict rules that attempt to cover every instance that is likely to come up. For instance, one rule might be: when A is the input, B is always the output. But while this is simple in theory and easy for humans to understand, it can be difficult to maintain, scale, and capture value from this process in practice.

Machine learning is another way for machines to process information. In this case, the system is ‘trained’ by way of example, rather than given rules. For example, a system that is meant to classify images into either pictures of humans or pictures of cats would be given a set of images and told they are humans, and another set and told they are cats. From there, the system can move on to classifying other images, with feedback being given continually. It is through this feedback that the system is able to constantly adjust to improve its classification ability and yield greater insights.
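The training-by-example loop described above can be sketched with a perceptron, one of the simplest learning algorithms. This is only an illustration under strong assumptions: the two numeric features, the tiny training set, and the labels (1 for cat, 0 for human) are invented, and real image classifiers work on pixels with far larger models.

```python
def predict(weights, bias, features):
    """Return 1 (cat) if the weighted score is positive, else 0 (human)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(examples, epochs=20, learning_rate=0.1):
    """Adjust the weights a little whenever a labelled example is misclassified."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            # Feedback signal: 0 when correct, +1 or -1 when wrong.
            error = label - predict(weights, bias, features)
            if error:
                weights = [
                    w + learning_rate * error * x for w, x in zip(weights, features)
                ]
                bias += learning_rate * error
    return weights, bias


# Hypothetical features: (has_whiskers, walks_upright), each scored from 0 to 1.
EXAMPLES = [
    ((1.0, 0.0), 1),  # cat
    ((1.0, 0.1), 1),  # cat
    ((0.0, 1.0), 0),  # human
    ((0.1, 1.0), 0),  # human
]

WEIGHTS, BIAS = train(EXAMPLES)
print(predict(WEIGHTS, BIAS, (0.9, 0.2)), predict(WEIGHTS, BIAS, (0.2, 0.9)))
```

No rule of the form "when A is the input, B is always the output" is written anywhere. The decision boundary comes entirely from the labelled examples and the corrections applied after each mistake.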

Text mining and semantic enrichment are increasingly being used as data processing techniques to enable machine learning programs. Here are a few examples of how machine learning is helping the industry to evolve.

Want to learn more? Text mining enables researchers to deliver valuable insights based on relevant data. Find out about XML for Mining and more about what your team needs to know about text mining.

http://www.copyright.com/blog/understanding-text-mining-4-need-know-terms-definitions/feed/ 0
Why Text Mining for Pharmacovigilance? http://www.copyright.com/blog/why-text-mining-for-pharmacovigilance/ http://www.copyright.com/blog/why-text-mining-for-pharmacovigilance/#respond Tue, 08 Aug 2017 08:06:01 +0000 http://www.copyright.com/?post_type=blog_post&p=13910 Literature monitoring is a key component of pharmacovigilance – and a special challenge. Here's how text mining can help.

The post Why Text Mining for Pharmacovigilance? appeared first on Copyright Clearance Center.

Finding relevant information in large volumes of unstructured text using conventional keyword search can be an arduous process.

Pharmacovigilance teams know this well – they are tasked with monitoring the effects of drugs licensed for use. The pharmacovigilance market, valued at $1 billion in 2015, is predicted to exceed $8 billion by 2024. Literature monitoring is a key component of pharmacovigilance – and a special challenge. Faced with a range of spontaneous reporting systems, teams often waste time on false positives and dead ends.

Pharmacovigilance challenges

Biomedical literature can be a rich source of the signals that pharmacovigilance teams need to do their work. However, scientific journal articles are not designed with the special needs of these teams in mind – leaving potentially valuable information locked in unstructured research narratives and reducing the recall of literature screening approaches.

The underreporting of adverse drug reactions by healthcare professionals and patients is also a recognized issue.

Patients’ narratives of drugs and their side effects on social media represent an additional data source for postmarketing drug safety surveillance. A 2014 study conducted by Epidemico, which examined 6.9 million social media posts, discovered 4,401 tweets resembling an adverse drug reaction (ADR).

The industry is also faced with the prospect of negative drug reactions that don’t feature in healthcare professional (HCP) reports, but do appear in the literature.

Related Reading: What a Study of 15 Million Articles Can Teach us About Text Mining

Text mining and pharmacovigilance working together

Machine analysis can help address these challenges. Text mining, which uses natural language processing (NLP) techniques to swiftly analyze huge quantities of text, can transform every stage of the drug development journey. This means algorithms can identify potential adverse drug reactions within a data set at scale, reducing false positives.

Text mining tools can also help teams fine-tune their queries and improve search strategy management. Keyword-based search strategies can often be convoluted, messy, and overly specific, frequently including every possible synonym, such as brand name, substance name, and pre-release name, as well as a whole range of adverse reactions. These searches can be difficult to update and maintain. Text mining or a semantically enriched approach can help simplify those queries, making them more powerful and the results easier to interpret.

Need a refresher on semantic enrichment? Watch this on-demand webinar featuring SciBite

Looking to the future

A study by Elsevier looked at how pharmacovigilance teams not currently using text mining would like to incorporate it into their workflows. The results show that needs vary: some want to overcome taxonomy and indexing issues, others want to use it to mine multiple sources.

Whatever the objective, the industry is taking a more data-driven approach to pharmacovigilance. But the journey has only just begun. In the years to come we will see more advanced NLP, algorithms and platforms adding pharmacovigilance-friendly value to data. But right now, text mining means less time spent chasing false positives and less risk of missing vital information, which in turn means better patient care.

At CCC we’ve developed integrated solutions that make it simple to license, access, semantically enrich and index full-text XML articles from a wide range of scientific publishers. Learn more about RightFind XML for Mining here.

http://www.copyright.com/blog/why-text-mining-for-pharmacovigilance/feed/ 0
What a Study of 15 Million Scientific Articles Can Teach Us About Text Mining http://www.copyright.com/blog/study-15-million-scientific-articles-text-mining/ http://www.copyright.com/blog/study-15-million-scientific-articles-text-mining/#respond Wed, 26 Jul 2017 10:10:09 +0000 http://www.copyright.com/?post_type=blog_post&p=13736 Inside the largest comparative study of text mining abstracts vs. full text articles.

The post What a Study of 15 Million Scientific Articles Can Teach Us About Text Mining appeared first on Copyright Clearance Center.

Text mining allows for the rapid review and analysis of large volumes of biomedical literature, giving life science companies valuable insights to drive R&D and inform business decisions.

Given the easy accessibility of article abstracts through such databases as MEDLINE, many researchers use this summary information to identify a collection of articles (or “corpus”) for use in text mining rather than taking steps to obtain the full text of the articles. While abstracts provide some valuable pieces of information, there are limitations in using abstracts that can affect the quality of text mining results when compared to the results of mining a corpus of full-text content.

Some in the research community find abstracts to be good enough for their purposes. We’ve heard these defenses for mining abstracts over full text:

  • “More text means more room for false positives.”
  • “Abstracts are more easily accessible via biomedical databases.”
  • “We don’t have the time or resources to spend on additional data cleansing and normalization work for unstructured content.”

While there are kernels of truth to each of these arguments, text mining full-text articles over abstracts has significant benefits.  Now, new research from bioinformaticians at the University of Copenhagen and the Technical University of Denmark confirms that vital information goes undiscovered when abstracts are mined rather than full-text articles.

Inside the largest comparative study of text mining abstracts vs. full text articles

The study, released this month on bioRxiv, an online archive and distribution service for unpublished preprints in the life sciences, involved the analysis of more than 15 million full-text scientific documents and their abstracts published between 1823 and 2016. The corpus, mainly in PDF format, comprised articles published by Elsevier and Springer, and those in the Open-Access subset of PMC.

The team compared their findings from the corpus of full-text articles to the corresponding results from the matching set of abstracts included in MEDLINE.

Here’s a look at some of the report’s main takeaways:

Full text outperformed MEDLINE abstracts in all benchmarked cases

To showcase the potential of text mining full-text articles, the team extracted published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system.

In every case, the results showed that mining the full-text article corpus outperformed the same analysis using abstracts only.

“Through rigorous benchmarking and comparison of a variety of biologically relevant associations, we have demonstrated that a substantial amount of relevant information is only found in the full body of text,” the report indicates.

This finding isn’t the first of its kind. Back in 2010, a study published in the Journal of Biomedical Informatics found that only 8% of the scientific claims made in full-text articles were found in their abstracts.

The biggest gain in performance when using full text was seen in finding associations between diseases and genes

The main advantages of text mining full-text scientific articles are volume, information diversity and the inclusion of secondary findings. Unsurprisingly, full-text articles contain more named entities and connections between those entities.

In the case of these 15 million scientific articles, the biggest performance gain in mining full-text articles was the associations found between diseases and genes.

A common mineable format would produce higher quality results 

Despite the perceived benefits of mining abstracts mentioned above, bioinformaticians are aware that full-text articles are likely to yield more information and contain more relationships between named entities than abstracts. The problem isn’t lack of text mining awareness; it’s contending with multiple formats and inconsistent licensing terms.

XML is the preferred format used in text mining software. XML is a markup language used to encode documents in a format that is easily read by computers. It is used widely for encoding documents so that computer programs can parse or display the content appropriately.

The study suggests that if all articles had been available in a structured XML format, this would have “no doubt produced a higher quality corpus.”

In an interview with Science, co-author Lars Juhl Jensen said the difficulty of converting full-text PDF articles into XML is one of the reasons why full-text mining isn’t typically done at scale.

“We probably spent more computational resources teasing the text out of PDFs and beating it into shape than we spent on the actual text mining,” Jensen said.

As information professionals begin to understand the benefits text mining can have across functions – early phase research, pharmacovigilance, IDMP compliance, and more – the desire to find a better way to mine full-text articles will become greater.

At CCC we’ve developed integrated solutions that make it simple to license, access, semantically enrich and index full-text XML articles from a wide range of scientific publishers. Learn more about RightFind XML for Mining here.

http://www.copyright.com/blog/study-15-million-scientific-articles-text-mining/feed/ 0
Discuss your Research Challenges with CCC at Bio-IT World Conference & Expo ’17 May 23-25 in Boston http://www.copyright.com/blog/ccc-bio-world-boston/ http://www.copyright.com/blog/ccc-bio-world-boston/#respond Wed, 17 May 2017 08:30:40 +0000 http://www.copyright.com/?post_type=blog_post&p=12966 Join CCC at Bio-IT World in Boston this May to learn about accelerating research through full text semantic enrichment and data integration.

The post Discuss your Research Challenges with CCC at Bio-IT World Conference & Expo ’17 May 23-25 in Boston appeared first on Copyright Clearance Center.

CCC will be among 3,300 life science, pharmaceutical, clinical, healthcare and IT professionals from more than 40 countries at the Bio-IT World Conference & Expo ’17 on May 23-25 at the Seaport World Trade Center in Boston, MA. We invite you to visit CCC (booth #548) to talk about your R&D team’s information challenges and how CCC solutions can help.

This year’s conference features more than 200 technology and scientific presentations. It covers big data, smart data, cloud computing, trends in IT infrastructure, omics technologies, high-performance computing, data analytics, open source and precision medicine, from the research realm to the clinical arena.

Join CCC’s Anna Lyubetskaya (Data Scientist) and Mike Iarrobino (Product Manager) on Thursday, May 25 at 11:40 a.m. for their presentation, Accelerating Research through Full Text Semantic Enrichment and Data Integration.

In this talk, you’ll learn:

  • How to use a network environment where text and data are easily connected to make informed decisions
  • How to integrate expert knowledge computationally to address challenges resulting from an increased volume of information and novel facts present in unstructured text and experimental output.

Bio-IT World ’17 conference hours:

  • Tuesday, May 23 (4 – 7 p.m.)
  • Wednesday, May 24 (8 a.m. – 6:30 p.m.)
  • Thursday, May 25 (8 a.m. – 1:55 p.m.)

Not attending the conference? Follow all the action using hashtag #BioIT17. Connect with Bio-IT World (@bioitworld) and CCC (@copyrightclear) on Twitter for up-to-the-minute dispatches from the conference.

http://www.copyright.com/blog/ccc-bio-world-boston/feed/ 0
Before You Begin: What Your Team Needs to Know About Text Mining http://www.copyright.com/blog/what-you-need-to-know-text-mining/ http://www.copyright.com/blog/what-you-need-to-know-text-mining/#comments Tue, 02 May 2017 07:44:21 +0000 http://www.copyright.com/?post_type=blog_post&p=12784 You know the benefits of text mining – accelerated research and faster drug discovery – but is your content creating a barrier to its success?

The post Before You Begin: What Your Team Needs to Know About Text Mining appeared first on Copyright Clearance Center.

Text mining uses sophisticated natural language processing (NLP) techniques to quickly analyze massive volumes of biomedical literature. It can transform your organization’s approach across the drug development pipeline – from early phase drug discovery and clinical trial development to pharmacovigilance.

Imagine being able to give your laboratory scientists a head start by extracting candidate relationships between whole classes of concepts like genes and diseases from your organization’s existing information resources, at scale and with confidence. Or envision supporting your pharmacovigilance team with incredibly precise NLP search strategies, vastly reducing time spent on false positives and focusing their efforts on tracking down meaningful issues. These and more are the promise of a text mining initiative done right.

But before you begin, here are three things your team needs to understand about text mining:

1. Text mining relies on data that’s ready to be mined

Across the life sciences / pharmaceutical industry, information is often siloed and stored in multiple varying formats, reducing its usefulness. Structured or semi-structured content may be in multiple schemas, requiring additional data cleansing and normalization work. Scientific literature may be licensed for only certain uses, and subscription agreements typically do not include permission to conduct text mining activities. Each of these is an obstacle to realizing benefits from your organization’s existing content investments and its text mining efforts.

Related Reading: The Benefits of Text Mining Full Text Instead of Abstracts

2. A strong relationship between bioinformaticians and information managers is key

Collaboration between informatics and information management teams can overcome these challenges. Information managers understand how scientific literature and other resources are consumed within the organization. They also manage external publisher relationships – ensuring a fit between licensed content and the information needs of the organization. Bioinformaticians and other informatics professionals understand data interoperability and the information architecture required to practically apply text mining to solve organizational problems.

3. Have a clear goal in mind – but be realistic about instant benefits

Like most organizational efforts, a text mining initiative is more likely to succeed if stakeholders agree on what success looks like at the outset. Identify an appropriate use case by looking for situations where internal teams struggle to synthesize findings from large amounts of data, have difficulty staying on top of current findings, or suffer from low signal-to-noise ratios in their information resources.

Even with the right use case, keep your expectations reasonable. It takes time and effort to source a proper text mining solution, conduct proofs of concept, evangelize internally, and ultimately scale your efforts across the organization. While you and your team might recognize this, your stakeholders need to share a similar understanding.

By working together on text mining programs, bioinformaticians and information managers can deliver real insights to the organization from relevant data, leading to enhanced drug discovery, more efficient literature monitoring, and fewer information dead ends.

Ready to improve your text mining results? Learn more about XML for Mining.

The Benefits of Text Mining Full Text Instead of Abstracts http://www.copyright.com/blog/benefits-text-mining-full-text-instead-abstracts/ http://www.copyright.com/blog/benefits-text-mining-full-text-instead-abstracts/#respond Tue, 25 Apr 2017 08:00:41 +0000 http://www.copyright.com/?post_type=blog_post&p=12493 While abstracts provide some valuable information, researchers need access to full-text articles to get the best results from text mining projects.

The post The Benefits of Text Mining Full Text Instead of Abstracts appeared first on Copyright Clearance Center.

Many researchers use the summary information in article abstracts to compile a collection of records for text mining, rather than using full-text articles.

There are two reasons for this: abstracts are easily accessible via biomedical databases like PubMed, and they come in a format suitable for text mining: XML.

Even though using abstracts seems like an easy workaround, there are major benefits that come from mining full text. For example, abstracts often omit essential facts and relationships, secondary study findings, and adverse event data. While abstracts do provide some valuable information, researchers need access to full-text articles to get the best results from text mining projects.

Access More Facts

Full-text articles provide more information than abstracts. The difference is in both volume and type of information, including detailed descriptions of methods and protocols and the complete study results. Authors often include only their most important findings in the abstract, leaving secondary study findings, discoveries, observations and other critical insights only in the full-text article.

    1. Abstracts often exclude, or underrepresent, data. Given the size limitations of abstracts and their concise nature, results that are less relevant to, or out of scope of, the main idea are often left out. In some cases, critical information may reside in a footnote of the full text. By mining all of a given text, including bibliographic information, researchers can gain richer results that reveal vital patterns and information in the documents.
    2. New discoveries are more likely to be mentioned in the full text of articles before appearing in abstracts. Following initial publication of a new discovery in a particular journal, the research is often repeated and included in other publications. But there is a substantial delay between when that discovery appears in full articles and when that information appears in abstracts. In fact, it can take one to two years for discoveries to appear in the abstract of a subsequent article, according to a study conducted by Elsevier.
    3. Full-text articles are more likely to contain information on adverse events. Per a study published in BMC Medical Research Methodology, “Abstracts published in high impact factor medical journals underreport harm even when the articles provide information in the main body of the article.” This missing information can reduce the value of abstracts as the “raw material” to mine, especially in pharmacovigilance use cases, or when researchers want to make novel connections that haven’t been a major focus of the literature.

Uncover More Relationships

Full-text articles also contain more relationships between named entities than abstracts. According to a study published in the Journal of Biomedical Informatics, only 8% of the scientific claims made in full-text articles were found in their abstracts.

Another study, conducted by publisher Elsevier, compared the use of abstracts and full-text articles to derive relevant information about drugs and proteins that affect the progression of fibromyalgia. The study found 31 relationships in the literature by mining abstracts and an additional 53 relationships when the same search was run across the full-text articles, so abstracts alone surfaced only 31 of the 84 relationships found (about 37%).

While text mining article abstracts yields some information, there are limitations to what can be discovered through that process. To ensure that researchers don’t miss vital data, discoveries, and assertions, the full text of the article should be mined.

Ready to learn more? Download 3 Reasons to Consider Text Mining

What is Text Mining? And How is it Different from a Web Search? http://www.copyright.com/blog/text-mining-different-web-search/ http://www.copyright.com/blog/text-mining-different-web-search/#respond Wed, 19 Apr 2017 08:01:46 +0000 http://www.copyright.com/?post_type=blog_post&p=12508 If you’re ready to beef up your text mining knowledge, here’s your crash course in the basics.

The post What is Text Mining? And How is it Different from a Web Search? appeared first on Copyright Clearance Center.

If you find text mining to be a confusing concept, you’re not alone. The process of deriving high-quality information from text materials using software requires background knowledge before you can fully grasp how it works, and how it can benefit your team’s research and discovery efforts.

If you’re ready to beef up your text mining knowledge, here’s your crash course in the basics:

First, the definition: What is text mining?

Text mining is a process that derives high-quality information from text materials using software. It is used to extract assertions, facts and relationships from unstructured text (e.g., scholarly articles, internal documents, and more), and identify patterns or relations between items that would otherwise be difficult to discern.

Text analytics and semantic search are two concepts that are closely related to text mining.

Why? Text mining enables R&D teams to systematically and efficiently examine content to answer questions that ultimately guide business decisions and resource investments. The alternative would be to curate thousands of pieces of content (or more) manually. At today’s fast pace, this is unfeasible, unrealistic, and error-prone.

How? Text mining tools employ sophisticated software which uses natural language processing (NLP) algorithms to read and analyze text. There are two basic steps:

  1. The first step is identifying the entities an organization is interested in. In a biomedical setting, these might include genes, cell lines, proteins, small molecules, cellular processes, drugs, or diseases.
  2. The next step is analyzing sentences in which those key entities appear, to determine how they are related. A relationship is a connection between at least two named entities; for example, that gene BCL-2 is an independent predictor of breast cancer.
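
The two steps above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how any particular text mining product works: the `VOCABULARY` table is a hypothetical stand-in for a curated scientific vocabulary, and same-sentence co-occurrence stands in for the grammatical analysis that real NLP tools perform.

```python
import re
from itertools import combinations

# Hypothetical mini-vocabulary: surface form -> (entity type, canonical name).
VOCABULARY = {
    "bcl-2": ("gene", "BCL-2"),
    "bcl2": ("gene", "BCL-2"),
    "breast cancer": ("disease", "breast cancer"),
    "tamoxifen": ("drug", "tamoxifen"),
}


def find_entities(sentence):
    """Step 1: return the set of (type, name) entities mentioned in a sentence."""
    found = set()
    lowered = sentence.lower()
    for surface, entity in VOCABULARY.items():
        # Word boundaries stop a term from matching inside a longer token.
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(entity)
    return found


def candidate_relationships(text):
    """Step 2: pair up entities that appear together in the same sentence."""
    pairs = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        entities = sorted(find_entities(sentence))
        for left, right in combinations(entities, 2):
            pairs.append((left[1], right[1], sentence))
    return pairs


sample = (
    "BCL-2 is an independent predictor of breast cancer. "
    "Tamoxifen was also studied."
)
for left, right, evidence in candidate_relationships(sample):
    print(left, "<->", right, "| evidence:", evidence)
```

Only the first sentence yields a candidate pair, because it is the only one that mentions two known entities. A real system would go on to classify the type of relationship; this sketch merely flags sentences worth a closer look.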

Text mining can uncover relationships that might not have been found otherwise, unlocking previously hidden information to help researchers:

  • Identify and develop new hypotheses
  • Attain knowledge and improve understanding
  • Discover links between diseases and existing drugs to find new therapeutic uses
  • Detect potential safety issues early

The results of these types of projects can provide a greater understanding of the underlying biology behind specific diseases, show how they respond to certain drugs and support the target discovery process.

What Format Is Used in Text Mining Software?

XML is the preferred format used in text mining software. XML is a markup language used to encode documents in a format that is easily read by computers. It is used widely for encoding documents so that computer programs can parse or display the content appropriately.
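
To show why XML is convenient for software to read, here is a minimal sketch that uses Python's standard `xml.etree.ElementTree` module. The `<article>`, `<section>`, and `<p>` element names are invented for this example; real publisher XML follows its own schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup, invented for illustration only.
ARTICLE_XML = """
<article>
  <title>BCL-2 and breast cancer</title>
  <abstract>BCL-2 expression was measured.</abstract>
  <body>
    <section name="Methods"><p>Samples were collected from patients.</p></section>
    <section name="Results"><p>BCL-2 was an independent predictor.</p></section>
  </body>
</article>
"""

root = ET.fromstring(ARTICLE_XML.strip())


def sections(article):
    """Return {section name: text} for every section in the article."""
    result = {}
    for section in article.iter("section"):
        # itertext() gathers the text of the element and all of its children.
        text = "".join(section.itertext())
        result[section.get("name")] = " ".join(text.split())
    return result


parsed = sections(root)
print(root.findtext("title"))
print(parsed["Results"])
```

Because the markup labels each part of the document, a program can select just the title, the abstract, or the results section without guessing where one part ends and the next begins.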

Related Blog: The Benefits of Text Mining Full Text Instead of Abstracts

How Does Text Mining Differ from a Web Search?

Typical web searches may seem similar to text mining, but there are stark differences. Search is the retrieval of documents or other results based on certain search terms. Search engines such as Google, Yahoo or Bing are commonly used to conduct these types of searches, and your organization may also use an enterprise search solution. The output is typically a hyperlink to text or information residing elsewhere, along with a small amount of text that describes what is found at the other end of the link. The purpose is to find the entire existing work so that its content can be used.

In text mining, the researcher looks to analyze text. The goal is to extract useful information, not solely to find, link to, and retrieve documents that contain specific facts. Unlike with search, the output of text mining varies depending on how the researcher wishes to apply the results.

Search functionality helps users find the specific document(s) they are looking for, whereas text mining goes well beyond search to find particular facts and assertions in the literature in order to derive new value.
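
The difference in output can be made concrete with a small sketch. The two-document corpus and the single hand-written pattern below are assumptions for illustration; an actual text mining tool would rely on NLP rather than one regular expression.

```python
import re

# Hypothetical two-document corpus keyed by document ID.
DOCUMENTS = {
    "doc-1": "BCL-2 is an independent predictor of breast cancer.",
    "doc-2": "The cohort included patients with breast cancer.",
}

# One hand-written pattern standing in for real language analysis.
PREDICTOR = re.compile(r"(\S+) is an independent predictor of ([^.]+)\.")


def search(term):
    """Search: return the IDs of documents that contain the term."""
    return [
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if term.lower() in text.lower()
    ]


def mine_predictors():
    """Text mining: return (subject, object, source document) facts."""
    facts = []
    for doc_id, text in DOCUMENTS.items():
        match = PREDICTOR.search(text)
        if match:
            facts.append((match.group(1), match.group(2), doc_id))
    return facts


hits = search("breast cancer")
facts = mine_predictors()
print("search returns documents:", hits)
print("mining returns facts:", facts)
```

Search hands back both documents and leaves the reading to the user. The mining step hands back a structured assertion together with the document it came from, which can then be counted, filtered, or combined with other facts.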

Ready to learn more? Download 3 Reasons to Consider Text Mining
