In May 2022, CCC acquired Ringgold, a well-known provider of Persistent Identifiers (PIDs) in the domain of scholarly communications. We recently sat down for a virtual chat with Laura Cox, now CCC’s Senior Director, Publishing Industry Data, to discuss PIDs, disambiguation, and the role each plays in improving scholarly communications.
DD: Before we dig in, could you share a little bit about your professional background? You have been active in the scholarly publishing community for some time.
LC: I have been involved in scholarly publishing for over 20 years. I even worked my school and university summers at scholarly publishers in the warehouse, stuffing renewal envelopes and such, so journals were part of my upbringing. After briefly working at Taylor & Francis, I spent over a decade as a consultant, at first working with my father and then setting up my own business in 2004, providing industry reports, market research and data analysis projects for a range of clients, from trade associations to publishers themselves. I sold my business to Ringgold in 2011. Since then, I have been involved in pretty much all of Ringgold’s business. I also sit on the ISNI Board as Treasurer. So, for more than a decade, my work has focused on persistent IDs.
DD: To start with, could you explain what the major functions of Persistent Identifiers (PIDs) are in the context of scholarly publishing? What problem(s) or issues were they created to address?
LC: The primary function PIDs serve is to disambiguate and deduplicate information about people, places, and things. PIDs have become a backbone of the metadata that tracks information about research – for example, who performs it; where it originates; who funds it; the output itself; and what organizations consume the content.
PIDs uniquely identify researchers (with ORCID and ISNI IDs), organizations (with Ringgold IDs, Funder IDs and ISNI IDs), and publications and articles (with DOIs, ISSNs and ISBNs), and they make sure that all of those different identifiers are enduring. There are also Grant IDs and Research Activity IDs and a whole host of other PIDs, such as ROR and DUNS, as well as taxonomies for identifying the type of role, such as CRediT for contributors and Ringgold’s organization classifications.
DD: Pre-internet librarians like me (MSLIS, 1982) worked with ISBNs and ISSNs. More recently, we have come to understand the useful work of the DOI and ORCID. I was not aware of most of those other examples!
LC: It is a growing landscape. The more we use data to inform decision making and make research more efficient, the more we need to persistently identify the various data elements through the process. That is where PIDs come in. I am sure we will think of more things to persistently identify.
DD: I see. How would bringing PIDs earlier into the workflow process improve data quality?
LC: Ideally, we would be able to track research progress from the grant application, through different iterations of the output, including supplementary materials, and on to submission and publication, and then provide analytics which support the grant process all over again. Everyone involved in scholarly communications will benefit from smoother processes and joined-up data for decision making.
DD: Ringgold is also known for its disambiguation services. Can you tell us more about those and the overall value they bring to the process of scholarly publishing?
LC: Sure. We provide data cleansing and mapping services, which we have historically called “auditing” (although not in a financial sense). That is, we deduplicate and disambiguate client data about global organizations involved in scholarly communications in all sorts of contexts – whether content consumers, author affiliations, payees, funders, or others – and then we map it to Ringgold’s PID and supply the rich metadata we hold about those organizations, including detailed hierarchies and classification metadata, to the client. They can use that for APC management, subscription management, author services, analytics and more. When clients use the wider database, it provides support for Master Data Management and removing silos between systems, and it improves data discovery, access and sharing.
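As an illustration of the deduplicate-and-map step described above, here is a minimal Python sketch. It is not Ringgold's actual method: the lookup table `KNOWN_VARIANTS`, the helper names `normalize` and `map_to_ids`, and the `ORG-…` identifiers are all hypothetical placeholders, and a real service would rely on much richer matching and curated metadata.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical lookup: normalized name variant -> organization identifier.
# The IDs are made-up placeholders, not real Ringgold IDs.
KNOWN_VARIANTS = {
    "university of example": "ORG-0001",
    "example university": "ORG-0001",
    "univ of example": "ORG-0001",
    "sample institute of technology": "ORG-0002",
}


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def map_to_ids(raw_names):
    """Group raw affiliation strings by matched ID; unmatched ones go under None."""
    grouped = defaultdict(list)
    for raw in raw_names:
        grouped[KNOWN_VARIANTS.get(normalize(raw))].append(raw)
    return dict(grouped)


records = [
    "University of Example",
    "Univ. of Example",
    "Example University ",
    "Sample Institute of Technology",
    "Unknown College",
]
result = map_to_ids(records)
```

In this sketch, the three spellings of the first organization collapse onto one identifier, while "Unknown College" lands under `None` and would need manual review or a fuzzier matching step.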
DD: Given your new role at CCC, can you give our readers a glimpse of what the near- and middle-term future looks like, from your perspective?
LC: We need to advocate for and support wider use of PIDs, particularly earlier in the research cycle and in the infrastructure that supports the whole research process, and we need to provide more analytics that enable data-driven decisions. The more we can identify the components of research, the more we can address change, including key aspects such as DEI efforts and addressing the UN’s Sustainable Development Goals.
DD: I have heard you talk about “plumbing for scholarly communications.” That is a thought-provoking way of framing it. Can you say more about how readers might conceptualize the different value-adds and roles in the publishing process?
LC: If we think about the world we all operate in, it is made up of systems and technical environments through which data passes and is augmented before it is pushed to the next system. PIDs provide the context for the data and enable it to flow through the systems in an interoperable and smooth process. It makes me think of water flowing through pipes: it is about removing the blockages and obstructions that prevent the water from flowing smoothly, so that we derive more value from it.
DD: Anything else you would like to share with our readers about improving data quality?
LC: I would say that we all need to build systems and processes with the data and PIDs at the forefront. Well-structured data with PIDs will improve the user experience and the whole workflow.