Common Data Integration Challenges


In our previous post, we defined data integration and described how modern R&D organizations approach it. Simply put, data integration means combining data from different sources to provide a unified view of that data and easier access to it. Integration is the first step toward making scientific data findable, accessible, interoperable, and reusable, commonly referred to as the “FAIR” guiding principles for researchers and publishers.

However, before an organization can bring these principles to life, it first needs to understand and overcome the common challenges that arise when integrating data.

Data Integration Challenge #1: Data Silos

Information silos are common within organizations, because typically each department collects its own needed set of data on a purpose-built system. For instance, marketing teams collect materials like market research reports, competitive intel, and meeting notes, while research teams accumulate journal articles, grant funding info, and clinical trial results, and business development teams monitor the latest developments in patents and start-up companies. All this content is housed in a set of disjointed, incompatible internal and external repositories.

Integrating data can help free the flow of data and analytics across an organization. But this first challenge is compounded by the fact that the data can reside on a mix of newer cloud-based and older on-premises systems. In fact, many organizations even store recent data separately from historical data, further complicating integration efforts. Some data silos are years in the making, so it takes time and a solid strategy to deconstruct them.

Data Integration Challenge #2: Structured vs. Unstructured Data

Every organization generates and works from a combination of structured and unstructured data. As the name implies, structured data follows a predefined format, such as rows and columns within a relational database. This arrangement enables algorithms and software programs to make sense of the data.

Conversely, unstructured data has no defined organization, meaning it comes in many varieties and forms (for example, documents, emails, and photos). As a result, it’s more difficult for an algorithm to access and understand the meaning of the data in these various formats.

That brings us to the data integration challenge. Inconsistency between these two unlike types of data makes it challenging to unite them in a meaningful way. While a structured data field might appear under a column heading of “Company Name,” unstructured data from a series of meeting notes or competitive intel may contain variations on company names that will need to be parsed and assigned to the appropriate “Company Name” column.
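To make the company-name example concrete, here is a minimal sketch in Python of one way such variations might be mapped to a structured “Company Name” value. The company names, the suffix list, and the function names are all hypothetical, chosen only for illustration; a real pipeline would likely need fuzzier matching and a much larger reference list.

```python
import re

# Hypothetical canonical values, standing in for a structured "Company Name" column.
CANONICAL_NAMES = ["Acme Corporation", "Globex Inc", "Initech LLC"]

# Legal suffixes to ignore when comparing names (illustrative, not exhaustive).
SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd", "co", "company"}


def normalize(text):
    """Lowercase the text, strip punctuation, and drop legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


# Map each normalized name back to its canonical form.
LOOKUP = {normalize(name): name for name in CANONICAL_NAMES}


def find_companies(text):
    """Return the canonical names whose normalized form appears in the text."""
    padded = " " + normalize(text) + " "
    return sorted(name for key, name in LOOKUP.items() if f" {key} " in padded)


notes = "Met with ACME Corp. and Globex, Inc. about pricing."
print(find_companies(notes))
```

In this sketch, “ACME Corp.” and “Globex, Inc.” in the meeting notes both resolve to their canonical entries because matching happens on the normalized form rather than the raw string. Exact matching after normalization is a deliberately simple choice; misspellings or abbreviations would not be caught.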

How to Overcome These Data Integration Challenges

If you’d like to learn more about how RightFind Navigate can flexibly support nearly any environment, check out:

Author: Ray Gilmartin

Ray Gilmartin was Director of Corporate Solutions for CCC, responsible for knowledge management products within the Corporate Business Unit. Before joining CCC, he served in several leadership roles at Akamai, Avid Technology, and HP after beginning his career in TV journalism roles at Hearst Broadcasting and the Christian Science Monitor.