Popular Available Data Sources and How Researchers Use Them: Exploring ClinicalTrials.gov

“Life moves pretty fast.”

Ferris Bueller made that pronouncement popular back in 1986. But today, in the world of R&D-intensive sciences, data is what’s moving quickly. It is abundant, ever increasing, essential, and constantly updating all around us. Identifying information in a timely fashion is a critical need, and there are many curated resources out there. How are people using them, how could they be improved, and what can we learn from their similarities and differences? In this series, we’ll look at some popular databases and strategies for how to use them effectively.

Clinical Trials

ClinicalTrials.gov is a registry of clinical trials, maintained by the National Library of Medicine (NLM) at the National Institutes of Health (NIH). With over 345,000 trials from over 215 countries, it is the largest database of its kind, and it draws an average of 3.5 million site visitors per month.

In preparation for updating ClinicalTrials.gov, the NLM recently completed a survey of users, and the results were released in April 2020. There were three broad areas surveyed:

Functionality, current and potential improvements
Information submission, including tools for internal consistency and accuracy
Data standards which support and enhance the use of this content

Functionality, current and potential improvements

The first topic led to some rich responses. Users offered suggestions found in other popular services, such as displaying lists of similar studies and guided query builders. Responders also requested specific data that could be made available for searching, including genetic mutation or biomarker, type of intervention, disease subtype, and inclusion or exclusion criteria. This last element, inclusion and exclusion criteria, came up in several different discussions. This makes sense, since eligibility criteria matter both for evaluating a data set and for recommending a patient for a specific trial. It also represents a huge opportunity that, at present, is not met by the standard biomedical ontologies.

Common suggestions for improvements included linking between ClinicalTrials.gov records and other sources, such as related PubMed citations and PubMed Central full-text articles with study findings. Other outside resources, including European or other international datasets, were also requested. (Aggregated search and discovery tools, like Copyright Clearance Center’s RightFind Navigate, were created in part to meet this need.)

What Are Users Searching For?

The final point discussed in the current usage section concerned the scope of user queries: responses were evenly split between those looking for a narrow range of studies and those looking for a wide one. The narrowly focused users were seeking studies on a specific disease, condition or design. Those looking for a broad range were seeking all studies in a country, or all studies involving a given factor or type of intervention.

Responses on topics 2 and 3 again raised the lack of standardized terms for eligibility criteria as a pain point, and discussed the use of standard ontologies. Increased usage of common ontologies, particularly those also used by electronic health record systems (e.g., RxNorm, SNOMED, LOINC), would increase the consistency and accuracy of information, while also allowing for additional mapping of free text to controlled terms and concepts. Given that many PubMed searchers query on gene/protein and disease terms, often using abbreviations, expanding the use of ontologies to support machine learning and semantic enrichment could improve the search experience for many users who search across the biomedical landscape.
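To make the idea of mapping free text to controlled concepts a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the synonym table, the function name map_to_concepts, and the "EX:" identifiers are all hypothetical placeholders, not real RxNorm, SNOMED, or LOINC codes, and a production system would rely on a full terminology service rather than a hand-written dictionary.

```python
import re

# Hypothetical synonym table: surface forms (lowercase) -> illustrative concept label.
# The "EX:" identifiers are placeholders, NOT real RxNorm, SNOMED, or LOINC codes.
SYNONYMS = {
    "heart attack": "EX:0001 myocardial infarction",
    "myocardial infarction": "EX:0001 myocardial infarction",
    "mi": "EX:0001 myocardial infarction",
    "high blood pressure": "EX:0002 hypertension",
    "hypertension": "EX:0002 hypertension",
    "htn": "EX:0002 hypertension",
}


def map_to_concepts(text):
    """Return the sorted list of concept labels whose synonyms appear in text."""
    lowered = text.lower()
    found = set()
    for surface, concept in SYNONYMS.items():
        # Word boundaries keep short abbreviations such as "mi"
        # from matching inside longer words such as "minimal".
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(concept)
    return sorted(found)


# Two differently worded eligibility criteria resolve to the same concepts.
criterion_a = "History of MI or HTN within 6 months"
criterion_b = "Prior heart attack; high blood pressure"
print(map_to_concepts(criterion_a) == map_to_concepts(criterion_b))
```

The point of the sketch is that once abbreviations and lay terms resolve to one shared concept, a search for either wording can retrieve both trials, which is the kind of consistency survey respondents were asking for.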

Through this series, we have looked at a range of information needs. The goals and strategies of those involved in clinical trials, potential drug-drug interaction research, clinical care, or systematic reviews vary widely. There are multiple resources, both public and private, designed to provide intelligent access to the most timely and relevant data. Life is moving fast. Biomedical sciences are committed to identifying and refining tools and best practices to advance our knowledge, in the 21st century and beyond.