Scientists in any discipline need to play the role of a data scientist in the process of their research, according to an information professional at a science advocacy nonprofit with whom I spoke recently. When she was hired, one of her initial remits was to build out the subscriptions licensing portfolio and to look at external information as more of a strategic resource. She uses her role as the information clearinghouse to break down silos as she identifies groups working on similar projects, keep costs down by reusing data, and negotiate licensing agreements that best match the needs of her organization.

Now, as the researchers within her organization begin a research project on an issue they influence, and start focusing on their data needs, she is brought in as the information expert. She asks the questions that scientists often don’t consider: 

  • Do we have the license that allows you to do what you need?  
  • How many people will need access to it? 
  • How long do we need access? 

Relationship building over time – becoming part of researchers’ workflows

Her ability to serve effectively as a gatekeeper is due to her success at relationship building over time. As she described it to me, “I had worked with one director on several projects and we had built a strong relationship, so now she makes sure that all her scientists contact me for help at the beginning of every project. I have become part of their workflow, and they know to ask me for literature searches or external data sources. And, looking more broadly, I like to insert myself in the project planning and budgeting process. Before the start of each fiscal year, I send out a reminder to the project budget managers, reminding them of the resources we have available and asking how that fits with their next year’s portfolio.” She emphasized the need to “institutionalize our successes so that people understand the importance of bringing the library into their projects at the outset. The data scientists get better outcomes and less stress—they’re happier people—because the library handles the data acquisition and management and we see the continuum of data needs within the institution.”

Creating a big impact with contract and licensing negotiations

The info pro observed that some of the biggest impacts she has had while serving as the information procurement clearinghouse have been around contract and licensing negotiations. “We have had times when a team might think that they have identified the data that they want, and then we start the negotiation and discover that it’s too expensive, or the publisher can’t offer the right kind of licensing, or they couldn’t do what they thought they could do with the data. At that point, the team may decide to modify the project; I see that as a success, in that we’re not spending money on something we would not be able to use. And, honestly, I am using a basic reference interview technique from librarianship—asking them what the question is that they are trying to answer.”

She also noted that, a year ago, a reorganization moved the data analyst, who was responsible for the internal and external data analytics of the nonprofit, to the library. “Since the library serves as a procurement clearinghouse for external information, we have the perspective to start looking at how the data analyst can support our knowledge management and internal data infrastructure,” she said.

Where information science and data science intersect

I asked her whether she thought info pros had any blind spots with respect to working with data scientists and she answered that “sometimes we info pros aren’t viewed as having the relevant data analysis skill sets, so we twist ourselves in knots to position ourselves as a key resource. I do think it is important that we work with the data scientists enough to understand their motivations and their workflow. I have learned to not assume that I understand their perspective.”

Take data cleanup, for example. “I often have to put work in at the start of a project to make the data consistent and avoid ambiguities; I know that, if I don’t do that at the beginning, the project team will have more work to do on their end because the data input is bad. When I explain my process to them with references to resources that they understand like Jupyter Notebook rather than using librarian jargon, they are much more likely to see the value of working with me at the start of the project.”
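
The interview does not describe her actual cleanup steps. As a rough illustration of the kind of up-front consistency work she mentions, here is a minimal Python sketch that could be run in a Jupyter Notebook. The field names, the sample values, and the alias table are all invented for this example. It normalizes inconsistent spellings of a place name, converts counts stored as text into integers, drops the resulting duplicates, and stops on any value it does not recognize instead of guessing.

```python
# Illustrative only: the field names, sample values, and alias table below are
# assumptions made for this sketch, not details taken from the interview.
STATE_ALIASES = {"co": "Colorado", "colo.": "Colorado", "colorado": "Colorado"}


def clean_row(row):
    """Return a copy of row with a canonical state name and an integer count."""
    key = row["state"].strip().lower()
    if key not in STATE_ALIASES:
        # Surface ambiguous values for a human to resolve rather than guessing.
        raise ValueError(f"unrecognized state value: {row['state']!r}")
    return {
        "state": STATE_ALIASES[key],
        # Counts arrive as text, sometimes with thousands separators or spaces.
        "count": int(row["count"].replace(",", "").strip()),
    }


def clean_rows(rows):
    """Clean every row and drop exact duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for row in rows:
        item = clean_row(row)
        marker = (item["state"], item["count"])
        if marker not in seen:
            seen.add(marker)
            cleaned.append(item)
    return cleaned


# Hypothetical input with the kinds of inconsistencies described above.
RAW_ROWS = [
    {"state": " Colorado ", "count": "1,200"},
    {"state": "CO", "count": "1200"},
    {"state": "colo.", "count": "  950"},
]

if __name__ == "__main__":
    for item in clean_rows(RAW_ROWS):
        print(item)
```

Raising an error on unrecognized values is a deliberate choice in this sketch: it keeps ambiguities visible at the start of a project, where they are cheapest to resolve.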

In her experience, info pros recognize data scientists as being fellow “info nerds” and that can lead to mistaken assumptions. “Take data from the US Census Bureau, for example,” she said. “Librarians understand census data as an entity—we know how it is created, how it’s maintained, what kind of structure it has, what its limitations are, where the ambiguities are. Data scientists, on the other hand, are more focused on how they can transform the data in whatever tool they are using rather than about whether the data needs to be made consistent or whether the structure allows for a particular type of analysis. We have to remember that we look at information from a different, broader perspective and can see issues that the data scientists might not anticipate.”

As an info pro who operates primarily as a solo librarian, she finds this kind of collaboration exciting: it lets her find where information science and data science intersect and work out what her most strategic role can be in furthering the goals of her organization.

This is the second in a three-part series from Mary Ellen Bates around Info Pros in a Data Driven Enterprise. View the first blog post in this series here: 


Author: Mary Ellen Bates

Mary Ellen Bates is the principal of Bates Information Services Inc., providing business insights to strategic decision makers and consulting services to the information industry. Mary Ellen worked for over a decade in corporate and government information centers before launching her business in 1991. She received her MLIS from the University of California Berkeley and is based near Boulder, Colorado.