4 Keys to Modern Knowledge Engineering

After decades of research, knowledge engineering is resurfacing as an artificial intelligence-driven solution to collect, understand, and infer relationships from big data.

Knowledge engineering is a branch of AI concerned with creating rules or expert systems that describe or predict relationships in data, work that would normally be carried out by human experts.

When data is siloed, uncategorized and inaccessible, we can’t glean insights from it. The goal of knowledge engineering is to take data that has been integrated and evaluated, and turn the insights from this mass of information into knowledge.

But where do we begin, and what tools do we need to start? Here’s a simple look at four keys to modern knowledge engineering:

The semantic web

The semantic web enhances web technologies with formally represented, open ontologies that can unambiguously describe real-world domains in a way that is machine-readable, human-readable and interoperable.

The basic data structure for representing knowledge, or facts, is surprisingly simple and highly flexible: a combination of subject, predicate and object. These combinations are known as triples. A set of triples can be processed as a graph, and triples form the basis of Linked Open Data.
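As a rough illustration of the triple structure, the sketch below stores a few facts as (subject, predicate, object) tuples, indexes them as a graph, and matches them against a pattern. The `ex:` identifiers, the `Triple` type, and the `build_graph` and `match` helpers are names invented for this example; they do not come from any particular semantic web library.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Illustrative facts only; the "ex:" prefix is a made-up namespace.
triples = [
    Triple("ex:Ada_Lovelace", "ex:bornIn", "ex:London"),
    Triple("ex:London", "ex:capitalOf", "ex:United_Kingdom"),
    Triple("ex:Ada_Lovelace", "ex:occupation", "ex:Mathematician"),
]


def build_graph(facts):
    """Index triples as a graph: subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for fact in facts:
        graph[fact.subject].append((fact.predicate, fact.obj))
    return graph


def match(facts, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        fact for fact in facts
        if (subject is None or fact.subject == subject)
        and (predicate is None or fact.predicate == predicate)
        and (obj is None or fact.obj == obj)
    ]


graph = build_graph(triples)
born = match(triples, predicate="ex:bornIn")
```

Because every fact has the same three-part shape, new kinds of relationships can be added without changing the data structure, which is one way to read the flexibility described above.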

Open data sets

Open Data is data that is published with a license permitting free reuse. Governments that publish Open Government Data are taking steps towards greater transparency and, ultimately, accountability. Open data sets are proliferating and, when published using the principles of the semantic web, they are known as Linked Open Data or “LOD”. Linked Open Data sets, when interlinked, help establish broader connections between silos of human knowledge, because data from different sources can be connected and queried together.
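A minimal sketch of how interlinking enables queries across sources, assuming two hypothetical data sets that happen to use the same identifiers for cities. The `ex:` and `lib:` prefixes, the sample records, and the `branches_in_country` helper are all invented for illustration and do not describe any real published data set.

```python
# Hypothetical data set 1: geographic facts from one publisher.
city_data = {
    ("ex:Ottawa", "ex:locatedIn", "ex:Canada"),
    ("ex:Lyon", "ex:locatedIn", "ex:France"),
}

# Hypothetical data set 2: library branches from another publisher.
# It reuses the same city identifiers, which is what links the two sets.
library_data = {
    ("lib:branch_12", "lib:city", "ex:Ottawa"),
    ("lib:branch_40", "lib:city", "ex:Lyon"),
    ("lib:branch_41", "lib:city", "ex:Lyon"),
}

# Triples share one shape, so merging the sources is a plain set union.
merged = city_data | library_data


def branches_in_country(triples, country):
    """Two-step query: find cities in the country, then branches in those cities."""
    cities = {s for (s, p, o) in triples if p == "ex:locatedIn" and o == country}
    return sorted(s for (s, p, o) in triples if p == "lib:city" and o in cities)


french_branches = branches_in_country(merged, "ex:France")
```

Neither source can answer the question alone: the library data has no notion of countries, and the geographic data has no notion of branches. The shared city identifiers are what make the combined query possible.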

Machine learning

Machine learning is an approach to synthesizing raw or semantically enriched content to yield insights.

Machine learning processes can be effectively integrated into knowledge engineering pipelines using commonly available software frameworks that incorporate the mathematics and algorithms needed to perform deeper analysis than was possible before.

Machine learning approaches fall into two broad classes: supervised learning and unsupervised learning. Supervised algorithms learn from labelled training data, whereas unsupervised algorithms find structure in unlabelled input data. Constructing a knowledge engineering pipeline typically requires algorithms from both classes.
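To make the distinction concrete, here is a small sketch in plain Python that applies one algorithm from each class to the same one-dimensional data. The nearest-centroid classifier and the two-cluster k-means routine are simple stand-ins chosen for brevity, not the algorithms any particular pipeline uses, and the sample values and labels are made up.

```python
from statistics import mean


def fit_nearest_centroid(samples, labels):
    """Supervised: learn one centroid per label from labelled training data."""
    by_label = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(centroids, x):
    """Assign x the label of the closest learned centroid."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def two_means(samples, iterations=10):
    """Unsupervised: split unlabelled 1-D data into two clusters (k-means, k=2)."""
    lo, hi = min(samples), max(samples)
    if lo == hi:
        raise ValueError("need at least two distinct values to form two clusters")
    for _ in range(iterations):
        # Assign each point to the nearer centre, then recompute the centres.
        left = [x for x in samples if abs(x - lo) <= abs(x - hi)]
        right = [x for x in samples if abs(x - lo) > abs(x - hi)]
        lo, hi = mean(left), mean(right)
    return lo, hi


# Made-up measurements; the labels exist only for the supervised case.
samples = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["low", "low", "low", "high", "high", "high"]

centroids = fit_nearest_centroid(samples, labels)  # uses the labels
prediction = predict(centroids, 9.0)
cluster_centres = two_means(samples)               # never sees the labels
```

Both routines end up with similar group centres here, but only the supervised one can attach a meaningful name to each group, because only it was given labels.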

Training machine learning models can be highly time-consuming. Dedicated Graphics Processing Units (GPUs), which are well suited to the matrix operations at the heart of model training, can reduce the training time. Several cloud computing providers enable use of GPU infrastructure on a pay-as-you-go basis.

Cloud computing infrastructure

Cloud computing infrastructure, such as Amazon Web Services or Google Cloud Platform, has become a financially controllable investment. Cloud computing enables rapid prototyping and experimentation, thanks to the ability to temporarily acquire large amounts of computational resources for specific tasks.

The breadth of available cloud services has grown tremendously, from the initial ‘virtual private server’ offerings to rich platform services such as data streaming, graph databases and machine learning. These capabilities, plus the ability to rapidly scale infrastructure up and down, make the cloud the de facto choice when determining where to build a knowledge engineering pipeline.
