This is the second in a series of posts on the topic of Data Management. Read the first post of the series here.
Today we are focusing on the concept of data governance and its importance in furthering the aim of transforming data into an organizational asset.
Here is a simple definition for data governance that I find useful:
“Data governance is a multi-disciplinary approach to making and upholding standards that manage data at scale. It helps organizations assess data sources, quality, and security at all stages of the data pipeline…”
– Lauren Maffeo, “Designing Data Governance from the Ground Up” (Pragmatic Bookshelf, 2023)
An alternative definition is provided by Laura Sebastian-Coleman in “Navigating the Labyrinth”:
“Data Governance (DG) is defined as the exercise of authority and control (e.g., planning, monitoring, and enforcement) over the management of data assets.”
“Managing data at scale” goes back to data management’s whole ‘reason for being’ that I mentioned in my previous post.
Using that definitional framework, I’ll move to my main point for this post, with what may be a controversial statement: Every organization, regardless of its size, is doing some kind of data governance, whether it realizes it or not.
Often, the governance is *ad hoc* and informal, but no organization that stores any amount of data as part of its business can avoid some kind of data governance, no matter how casual it may seem to be. That data governance may be done by a single person trying to keep a dataset up-to-date and free of errors, but that is still a kind of data governance.
Here are some signs that you are already doing data governance work, but don’t recognize it as a formal discipline: staff maintain a list of customers and do their best to keep the list up to date; staff consult authoritative sources to validate data in spreadsheets; and staff use data to make decisions, but are unsure if the data is valid for the conclusions they draw with it.
What I want to discuss in this post is the need for an organization to have a formal data governance program, to replace the informal work it is already doing.
Data governance is the term we use to describe the process that shepherds all other data management efforts in an organization. It is foundational. Data governance sets the rules for all uses of data within an organization.
The following is a list of signs that your organization may need a formal data governance effort:
- You maintain more than one copy of the same data because employees from different teams do not necessarily discuss with each other what they are doing.
- Your data is full of errors and, while there is a desire to correct them, no one has a plan for how to improve data quality.
- Staff disagree on the meaning of important data elements used across the company.
With a formal data governance program, all the issues I mention above become solvable. In particular, a data governance program bolsters the company’s ability to use data effectively for decision making.
By building a data governance program, an organization systematically develops the tools it needs to identify bad data, trace data provenance, represent data licensing properly, and so on. Identifying the root causes of those data issues leads to programs that can remediate the related data problems effectively and permanently. An organization’s reputation is strengthened if customers, vendors, and partners recognize that the organization’s data is of high quality. There is nothing more embarrassing than client reports of data flaws in our applications.
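To make “tools needed to identify bad data” a little more concrete, here is a minimal sketch in Python of the kind of rule-based check a governance program might eventually formalize. Everything in it is an assumption for illustration: the customer records, the field names (`name`, `email`, `source`), the deliberately loose email pattern, and the `find_issues` helper are all hypothetical, and a real program would agree on its own rules first.

```python
import re
from collections import Counter

# Hypothetical customer records; the fields are assumptions for illustration.
records = [
    {"id": 1, "name": "Acme Corp", "email": "ops@acme.example", "source": "crm"},
    {"id": 2, "name": "Acme Corp", "email": "ops@acme.example", "source": "billing"},
    {"id": 3, "name": "", "email": "not-an-email", "source": "crm"},
    {"id": 4, "name": "Globex", "email": "info@globex.example", "source": None},
]

# A deliberately loose email shape check, not a full address validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def record_key(row):
    """Normalize name and email so near-identical copies compare equal."""
    return (row["name"].strip().lower(), row["email"].strip().lower())


def find_issues(rows):
    """Return (record id, issue) pairs for each rule a record breaks."""
    key_counts = Counter(record_key(row) for row in rows)
    issues = []
    for row in rows:
        if not row["name"].strip():
            issues.append((row["id"], "missing name"))
        if not EMAIL_PATTERN.match(row["email"]):
            issues.append((row["id"], "invalid email"))
        if not row["source"]:
            # No recorded origin means provenance cannot be traced.
            issues.append((row["id"], "unknown provenance"))
        if key_counts[record_key(row)] > 1:
            issues.append((row["id"], "possible duplicate"))
    return issues


for record_id, issue in find_issues(records):
    print(f"record {record_id}: {issue}")
```

The point of the sketch is not the code itself but the fact that each rule encodes a decision someone had to make: what counts as a duplicate, which fields are mandatory, and what a valid value looks like. Making those decisions explicit and shared is the governance work.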
Today more companies are looking for ways that machine learning (ML) can automate work and extend existing process automation programs. Such programs can succeed only if the data used to train the ML models is of high quality. Ensuring that an organization’s data quality is high is one of the primary goals of a data governance program.
The very nature of a data governance effort opens lines of communication among various users of an organization’s data. This has the effect of removing the barriers or silos between departments and business units.
One tenet of a data governance program is the establishment of a data governance council. This team’s composition should be cross-departmental and cross-disciplinary. Sitting on a data governance council is often the first time some members learn how other departments within the same company use the same (or similar) data.
Data governance can clearly help with problems like those we discussed in the preceding paragraphs. But data governance has a preventative function as well as a corrective one.
It is through data governance that an organization can ensure that its use of data conforms to various regulations. No company wants to violate one of the European Union’s [GDPR](https://gdpr.eu/what-is-gdpr/) requirements (just ask [Google](https://www.bbc.com/news/technology-46944696), for instance).
As bad as violating regulatory requirements can be, the loss of reputation that comes from data security violations is worse. Think of all the occasions when we have heard that a company where we shop has been hacked and we have had our personal data stolen. I know that when I get a letter from such a company offering to pay for a year of identity theft protection, I feel that the company did not handle my personal information properly. Data governance, along with robust security practices, can prevent such breaches.
But let us be real. Some data governance efforts miss the mark or fail entirely. As consultant Nicola Askham wrote, “[one] reason many data governance initiatives fail is a lack of support at a management level. If senior management does not buy into the benefits of data governance and only sees the associated costs, an initiative will almost never succeed.”
Data governance therefore is not only a tool for managing data in ways that optimize its value, but also an invaluable tool for managing the risks associated with all the data that companies collect today.