As we saw in my recent post, knowledge graphs are emerging as a graphical means of managing, and more succinctly expressing, the complex, interconnected relationships that result from processing a large volume of data from multiple sources. Although knowledge graphs are becoming a powerful explanatory tool for researchers as well as for scholarly and scientific publishers, they form but a single component of a larger trend toward linked data and enhanced data visualizations: tools and components deployed to get the most out of reported research results.
At least as far back as Elsevier’s Article 2.0 contest (2008–09), publishers have been thinking about, and working to anticipate, the next revolution in scientific publishing. While the advent of Open Access (OA) has had the most dramatic impact on publishing practices over the past dozen years, we can also readily discern a growing expectation (on the funder side, possibly even a requirement) that the underlying data be made accessible along with the article. In years to come, this data requirement may carry as much weight as the demand for OA provision. Together, these and other enhancements should ease peer review, simplify reproducibility studies, and benefit the scientific communication process overall.
Leading publishers, as well as other participants in the scholarly and scientific ecosystem, have begun fielding some innovative data visualizations. I’ll single out just a few examples:
- KnowLife, which describes itself as a “Knowledge Graph for Health and Sciences” (or a One-Stop Health Portal), was featured in a 2015 article entitled “KnowLife: a versatile approach for constructing a large knowledge graph for biomedical sciences.” Before discussing their methods, the authors point out that “Biomedical knowledge bases (KB’s) have become important assets in life sciences,” an observation that has certainly proven accurate in the years since.
- CCC partner SciBite released a very clear and helpful webinar in April 2020 entitled “Creating Knowledge Graphs from Literature.” At only 33 minutes, it is a concise and informative introduction to the topic.
- In November 2019, IOS Press published a special issue of the journal Data Science on “Scholarly Data Analysis (Semantics, Analytics, Visualization).” In their editorial introduction, the editors state: “The latest web technologies – the Semantic Web, Knowledge Graphs, Question Answering, Management Systems and Recommendation-based services – combined with Artificial Intelligence are expected to provide the required pillars for addressing the aforementioned issues.” That assessment seems entirely accurate to me. The entire issue is free to read.
- The Netherlands-based project Data2Semantics, with cooperation from several leading publishers, is pursuing a single critical research question along these same lines: “How to share, publish, interpret and reuse scientific data on the Web?” If the project can achieve “one-click semantic enrichment of scientific data,” that will indeed be quite a coup.
- Functional, scalable linked data is fundamental to the knowledge graph approach. In the current “year of research data” sponsored by the STM organization, the Scholix (Scholarly Link eXchange) project looks ambitious but also very promising. Its aim, in a large nutshell, is to build “ … [a] ‘wholesaler to wholesaler’ exchange framework, to be implemented by existing hubs or global aggregators of data-literature link information such as DataCite, CrossRef, OpenAIRE, or EMBL-EBI.”
- For those with a deeper and ongoing interest in these topics, Springer Nature publishes an entire journal on these themes, the Journal of Visualization. A recent article there examines the challenges of using visualization techniques in large online university classes. Springer has also published books specifically on knowledge graphs, including edited collections of contributions from various authors. Nature’s journal Scientific Data has a comparable focus.
My initial takeaway is that scholarly and scientific publishers, as a group, recognize the enormity of the challenge their readers face in confronting ever greater volumes of data, in any field. The common threads include finding context and making sense of what you are seeing; scanning the data for that Aha! moment when a previously unknown pattern emerges before your eyes; and moving the peer-review and reproducibility processes along more efficiently. As a historian by training, I try to stay away from forecasts; I’d rather forecast the past if I am allowed. But in the scientific world, more innovations like the ones highlighted above certainly have to happen. The urgency is easy to understand in the context of COVID-19 research: the stakes there are too high to accept unnecessary delays, and without wider use of improved visualization techniques, the sheer volume of data will simply be too overwhelming.