How Much Data Governance Is Enough Data Governance?
Executive Summary
This article discusses the different questions managers should address when planning a project to improve data governance and the usefulness of an organization's data.
Included are recommendations for specific actions that will support delivery of value in the short term while laying a foundation for future growth. I've boldfaced the actions I personally believe are the most important:
Focus initially on one and only one important problem, the data needed to describe and address it, and analytical deliverables that are clearly linked to the single problem.
Identify the systems and applications directly associated with the initial problem to be addressed. Determine what you need to know to understand how these systems and applications operate and interact. Determine how to manage them in order to deliver useful analytics quickly.
Do not initiate any efforts without establishing defined metrics for tracking both costs and business and operational effectiveness. Frequently ask questions like “How will we know if or when X happens?”
Try to support initial data governance efforts using available software tools. You can buy a dedicated tool later, once you better understand your requirements for managing data and metadata.
Engage with at least one influential stakeholder capable of expressing -- and understanding -- a vision for what improved data analytics can accomplish.
Define both effectiveness and efficiency measures based on how better data and analytics address the initially selected application or problem.
Focus initially on system and application dependencies that are controllable or static, e.g., by minimizing the number or complexity of "moving data targets" when starting out.
Emphasize a light management touch involving minimal bureaucracy, ceremony, and documentation. If appropriate, adopt an agile project management approach. Focus on constant communication and feedback as data-related deliverables are developed, tested, and evaluated.
Consider not only machine-to-machine but also human-to-machine and human-to-human data exchange for the problem or issue being addressed. Be sure to understand how data are transformed or changed as they are exchanged among different systems and participants.
In connection with the problem or process that will be targeted for initial data governance efforts, create and disseminate a formal project charter that addresses how decisions will be made regarding business requirements, initial project objectives, project constraints, the initial solution approach, who the key stakeholders are, and what their responsibilities will be in this initial project.
Don't just focus on the business processes that may need to be temporarily touched or changed in the initial project but also pay close attention to how better data will impact the organization's decisions and processes, even in the short term.
What is "data governance"?
Definitions of "data governance" vary. Here's mine:
Data Governance is the orchestration and management of all the systems, processes, and technologies that contribute to and maintain the quality, reliability, and usability of an organization's data and metadata.
Why improve data governance?
The article "Do Organizations Need Data Governance as a Service (DGaaS)?" described three "data application categories" that data governance systems, processes, and technologies must support:
Using data and metadata to help understand what was (e.g., by providing historical context for data provided to management)
Using data and metadata to help understand what is (e.g., using available data from inside and outside the organization to gain a better understanding of what’s happening now)
Using data and metadata to help understand what will happen (e.g., using predictive models to help compare and contrast different options and scenarios)
Important questions to ask when designing an improved data governance program include the following:
Even if we focus initially on only one of these application categories, what constitutes enough data governance to ensure the quality, reliability, and usability of data?
How can we make sure that the data governance processes associated with our data and metadata are both efficient and effective?
No ocean boiling
Starting "small and focused" is important. You don't want to set out to "boil the ocean" and become so engaged with complexity that you lose management's support along the way.
Initial overreaching -- taking on too many data-related challenges at the start -- is especially risky when data literacy in the organization is scarce and, at the same time, legacy systems and processes need to evolve as the organization pursues digital transformation and modernization.
You need to pick initial targets for improved data governance carefully. One implication of this "no ocean boiling" rule is that it may not be wise to start out by trying to inventory and model all the organization's data. Instead, focus on the data and metadata immediately involved with solving a critical problem and evolve from there.
Beware low hanging fruit
At the same time, you don't want to overemphasize "low hanging fruit" or short-term deliverables. Doing so runs the risk of "underwhelming" management or your other target audiences with the analytics you deliver. You don't want to hear, "This is all you got?"
This might be an issue in organizations where readily accessible pools of data already exist but don't readily relate to important challenges or problems. Using currently available data and readily available visualization tools, you can quickly develop analytical prototypes, proofs of concept, and dashboards. Software vendors understandably promote the ease of use of their analytical products, but these tools may not be the best place to start if they are not applied to the problems management really worries about.
You may find, for example, when you examine your data sources, that your problems extend beyond familiar issues like missing data or values caused by minor data collection or transformation errors. Your data quality and consistency problems may be more serious than you initially expected. Addressing them might require more time and resources than the quick reports and graphics you've promised and can deliver from available sources.
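One way to find out early whether your problems go beyond simple missing values is to profile the data against a few agreed validation rules. The following is a minimal sketch, assuming records arrive as Python dictionaries; the field names and rules are hypothetical and stand in for whatever your own sources contain.

```python
from typing import Callable

Record = dict[str, object]


def profile(records: list[Record],
            rules: dict[str, Callable[[object], bool]]) -> dict[str, dict[str, int]]:
    """Count missing and rule-violating values for each governed field."""
    report = {field: {"missing": 0, "invalid": 0} for field in rules}
    for rec in records:
        for field, is_valid in rules.items():
            value = rec.get(field)
            if value is None or value == "":
                report[field]["missing"] += 1
            elif not is_valid(value):
                # Present but not conforming: often the more serious quality signal.
                report[field]["invalid"] += 1
    return report


# Hypothetical rules, for illustration only.
rules = {
    "customer_id": lambda v: isinstance(v, str) and v.isdigit(),
    "state": lambda v: isinstance(v, str) and len(v) == 2 and v.isupper(),
}

records = [
    {"customer_id": "1001", "state": "VA"},
    {"customer_id": "", "state": "Virginia"},
    {"customer_id": "10A3", "state": None},
]

print(profile(records, rules))
```

Separating "missing" from "invalid" counts matters here: a high invalid count suggests inconsistent definitions or collection practices rather than gaps that can be filled quickly.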
Planning required
Either way, making better use of your data requires a planning effort and appropriate management. I've addressed data governance scoping -- deciding what's in and what's out in a data governance effort -- in the two-part series A Framework for Defining the Scope of Data Governance Strategy Projects (Part 1, Part 2). There I suggested that having a well-defined problem, application, or question was probably the key ingredient in setting reasonable bounds around an initial data governance effort intended to improve data analytics. Proper scoping, especially early on, is essential.
The topic of where to start is also important. Some suggest that hiring a chief data officer or data scientist at the outset is also a good idea. An example of this approach is described by Anirban Das in Reasons to hire a Director of Data Science before hiring your first data scientist.
The situation will drive whether it is more important to hire a data scientist with business qualifications or a data-savvy business manager to head the effort. Either way, it is important to realize that data governance in an organization is not just a technical exercise; it requires the participation of both management and the IT department. An approach that balances business and technology is critical.
Prioritized alignment
Whichever approach you take, you need to balance both tactical and strategic concerns. An overarching goal is to align data analysis and governance efforts with the needs of the organization given where it is now and where it is going. You need to prioritize when and how you address a range of technical as well as non-technical concerns.
Because of the way data permeates and flows through an organization you need to know where to "draw the line" especially when starting out. For example, once you select the problem area or issue you will be addressing with improved analytics, your planning assessment should address all three "data exchange" levels, as shown in Figure 1:
Figure 1. Data Exchange Levels
Machine-machine data exchanges inside and outside the organization are usually the most formally and rigorously controlled and governed, given the complex structures of most traditional databases and applications; otherwise they can't operate. In practical terms, machine-machine data exchanges often rely on standards set by outside organizations (e.g., industry data exchange standards). For example, a medical insurance company would rarely attempt to "invent" its own diagnostic codes but would instead defer to industry or governmental organizations.
Machine-human data exchanges occur when people enter data into systems or when they read and obtain data from systems. Humans need to understand what they are getting out as well as what they put in. Even when heavily automated, the processes associated with defining and maintaining data equivalence between input and output require careful governance in order to maintain consistency.
Human-human data exchanges are the most malleable and the most subject to error or misinterpretation, given the realities of human speech. Data definitions associated with accounting are different from the language used to talk with customers, and vice versa. Different groups use different words to refer to the same thing. Language changes over time, and even age affects the words people use (e.g., I still refer to DCA as "National Airport," which causes some younger folks to ask questions).
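A common way to make human-human terminology differences explicit is a shared business glossary that maps each group's local terms to one canonical term. The sketch below assumes a small in-memory glossary; the entries are illustrative (the DCA example comes from the paragraph above) and the function names are hypothetical, not part of any particular governance tool.

```python
from typing import Optional

# Hypothetical business glossary: canonical term -> local synonyms used by different groups.
GLOSSARY: dict[str, set[str]] = {
    "DCA": {"National Airport", "Reagan National"},
    "customer": {"client", "account holder"},
}


def build_lookup(glossary: dict[str, set[str]]) -> dict[str, str]:
    """Map every canonical term and synonym (case-insensitively) to its canonical term."""
    lookup: dict[str, str] = {}
    for canonical, synonyms in glossary.items():
        for term in {canonical, *synonyms}:
            key = term.strip().lower()
            # A term claimed by two canonical entries is a governance conflict to resolve.
            if key in lookup and lookup[key] != canonical:
                raise ValueError(f"{term!r} maps to both {lookup[key]!r} and {canonical!r}")
            lookup[key] = canonical
    return lookup


def to_canonical(term: str, lookup: dict[str, str]) -> Optional[str]:
    """Return the canonical term, or None so unknown terms can be routed to a data steward."""
    return lookup.get(term.strip().lower())


LOOKUP = build_lookup(GLOSSARY)
print(to_canonical("National Airport", LOOKUP))
print(to_canonical("Client", LOOKUP))
print(to_canonical("prospect", LOOKUP))
```

Returning None instead of guessing reflects a design choice: an unrecognized term is treated as a question for a data steward, not something the software should silently resolve.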
Tools
Software tools designed to support complex data governance processes that can address how data "flow" across machine-machine, human-machine, and human-human data exchanges are becoming more available and robust. For a list of software tools related to data governance see the table of contents for the research report Global Data Governance Software Market Size, Status and Forecast 2022. Another list is 30 top master data management products. Still, it might not be a good idea to purchase a data governance software tool before you figure out how -- and why -- you need the tool.
Data governance areas
Which areas need to be addressed when establishing a data governance program that supports improved data analytics? Consider the following ten "facets" of data governance identified by Mathematica in its white paper Holistic Data Governance: A Framework for Competitive Advantage:
Figure 2. Data Governance Facets
The individual "facets" identified in Figure 2 will be familiar to any consultant who has ever been engaged in "strategic alignment" or IT strategy projects in large organizations. These are traditional areas that planners, strategists, managers, and consultants need to address in the design of any initiative to support improved data analytics or digital transformation.
Whatever data dependent problem or decision you set out to address with improved data analytics, each of the above areas needs to be considered, even when you are eager to provide an early deliverable or prototype that shows management what better data analytics can do.
But which facets should you address initially when management is breathing down your neck to get a data analytics program underway? How do you design a project to deliver both useful analytics quickly and a sustainable, expandable data governance process?
Now or later?
Assume that you are using a collaborative tool such as SharePoint to develop a shared document describing your plan. Such a document should provide:
1. A description of the specific tasks and initiatives that need to be performed in order to deliver, as quickly as possible, a useful analytical product to management as a proof of value.
2. A description of how the work associated with these early deliverables will serve as the foundation for an enterprise-wide data governance and analysis program.
3. Documented plans that describe how the processes associated with (1) and (2) will be managed, communicated, and evaluated (i.e., who does what and when).
The standard categories in the Mathematica list provide an excellent starting point for addressing all three of these requirements. For each facet, the planner needs to address:
What needs to be done now (tactical).
How this relates to an eventual enterprise-wide effort (strategic).
How these efforts will all be managed.
Necessary but insufficient
For some organizations the above may not be enough. The Mathematica facet list, after all, is fundamentally a standard categorization of what needs to be done with any serious tech-reliant initiative, not just those that focus on data or analytics. Taking a comprehensive view of how data can be exploited may also require organizational and technical capabilities that are new or unfamiliar to the organization (e.g., to overcome shortcomings in staff and management data literacy). Some resistance may arise, for example, when data and metadata standardization require changes in how current systems, processes, and data-related communication or semantics are managed. ("You want us to do what with our data?")
Differences are bound to exist in how even basic data are described by different functions or departments, differences that need to be addressed when taking an enterprise view of data. Different departments may have different ways of referencing customer addresses, for example, differences that ripple through the databases and applications that these departments rely on for daily operational support. At the international level, different countries and cultures may have different family and housing structures that need to be addressed when comparing sales and demographic data.
Existing processes
Also, it's one thing to focus data analysis on making existing processes and systems more efficient. People are likely to understand why certain changes need to be made to improve efficiency as measured by traditional metrics such as throughput per resource unit or cost per transaction. Such metrics can be understood and justified in the context of currently understood processes and technologies. That relative ease of justification is one reason it may make sense to focus initial data governance improvements on current, well-understood processes, starting by upgrading how currently available data are analyzed and presented.
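Traditional efficiency metrics like these are simple ratios, which is part of why they are easy to justify. The sketch below compares a baseline period with a period after data improvements; the class, the choice of staff hours as the "resource unit," and all figures are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    """Hypothetical operating figures for one process over one reporting period."""
    total_cost: float   # total cost of the process in the period (any consistent currency)
    transactions: int   # number of transactions completed in the period
    staff_hours: float  # resource units consumed; staff hours are assumed here

    def cost_per_transaction(self) -> float:
        return self.total_cost / self.transactions

    def throughput_per_hour(self) -> float:
        return self.transactions / self.staff_hours


def percent_change(before: float, after: float) -> float:
    """Relative change from a baseline value, in percent."""
    return (after - before) / before * 100.0


# Illustrative numbers only: a baseline period and a period after the data improvements.
baseline = PeriodStats(total_cost=50_000.0, transactions=10_000, staff_hours=2_000.0)
pilot = PeriodStats(total_cost=48_000.0, transactions=12_000, staff_hours=2_000.0)

cpt_change = percent_change(baseline.cost_per_transaction(), pilot.cost_per_transaction())
tph_change = percent_change(baseline.throughput_per_hour(), pilot.throughput_per_hour())
print(f"Cost per transaction: {cpt_change:+.1f}%")
print(f"Throughput per staff hour: {tph_change:+.1f}%")
```

With these illustrative figures, cost per transaction falls from 5.00 to 4.00 (a 20 percent decrease) and throughput per staff hour rises from 5 to 6 (a 20 percent increase). The harder problem, discussed next, is that the benefits of better analytics are often not this easy to express.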
Future focus
It's quite another thing to sell management on what you hope will come out of better data analysis efforts, especially if needed management and governance efforts are complex or expensive.
Such uncertainty will always be a challenge, especially when data literacy is at a premium in the organization. This situation is similar to the challenges associated with justifying R&D expenditures involving uncertain outcomes.
Uncertainty regarding data and analytics is addressed in Risk and Uncertainty in Prioritizing Health Data Analysis Initiatives and in Risk, Uncertainty, and Managing Big Data Projects.
Basic questions
A data governance program should evolve by supporting initial data analytics initiatives while also laying a foundation for future, more comprehensive data governance operations.
In support of this evolutionary approach, the following are examples of governance-related questions and issues to address when planning initial data analytics efforts. These are associated -- loosely -- with the "facets" mentioned above. Addressing them will help ensure that initiatives associated with improved data analytics are managed efficiently and with an eye to creating a foundation for future growth. (Another "short list" of planning questions is here: Improving Data Program Management: Where to Start?)
The following list displays, for each planning area, the purpose, tactical (short-term) considerations, and strategic (long-term) considerations.
1. Alignment
Purpose: Make sure planned data governance efforts directly address problems or issues of importance to the organization.
Tactical: Focus initially on one important problem, the data needed to describe and address it, and analytical deliverables that are clearly linked to the problem.
Strategic: Make sure that how the organization is changing is considered in growing or expanding data governance efforts. As organizational goals and objectives change, data governance efforts should evolve as well.
2. Architecture
Purpose: Understand the technologies associated with managing data and metadata and how they are organized and interact.
Tactical: What systems and applications are directly associated with the initial problem to be addressed? What do we need to know about how these operate and interact? In the short term, how do we manage them in order to deliver useful analytics quickly?
Strategic: In the longer term, what do we need to know about how the organization's technical architecture is changing as the organization as a whole engages in digital transformation efforts? For example, as more systems and data are moved to the cloud, how will data governance be impacted?
3. Business Case
Purpose: Define the relationship between improved data governance efforts and their quantitative and qualitative impacts on the organization and how it accomplishes its goals and objectives.
Tactical: Define both effectiveness and efficiency measures based on how better data and analytics addresses the initially selected application or problem.
Strategic: Develop and implement processes and procedures for engaging with both technical and business staff as data analytics and data governance expand to address more problems and application areas. (This has organizational implications.)
4. Dependencies
Purpose: Understand how the people, processes, and technologies interact in applying data analytics to help solve corporate problems.
Tactical: Focus initially on dependencies that are controllable or static, e.g., by minimizing the number or complexity of "moving data targets" when starting out.
Strategic: Acknowledge that it will never be possible to control or predict all the process or system dependencies that impact how data are governed and analyzed. This interdependency should influence the structure of the governance processes introduced and how they relate to ongoing management.
5. Management
Purpose: Develop and implement the management initiatives required to build and sustain an ongoing data governance effort.
Tactical: Emphasize a "light touch" involving minimal bureaucracy, ceremony, and documentation. If appropriate, adopt an agile project management approach. Focus on constant communication and feedback as deliverables are developed, tested, and evaluated.
Strategic: Document as you go what is learned about managing the initial project and how this may need to evolve as the scope of data analysis efforts increases over time. Carefully consider how more complex efforts will be managed and how this "fits in" with existing management and oversight practices.
6. Measurement
Purpose: Define, track, and deliver the metrics that describe the costs, benefits, and effectiveness of improved data analytics and their supporting data governance program.
Tactical: Do not initiate any efforts without establishing defined metrics for tracking both costs and business and operational effectiveness.
Strategic: As with Management above, consider how ongoing cost and effectiveness measures of expanded data governance and analytics efforts will be tracked. Be prepared to document the changes that may be required in corporate oversight given the possible need to address cross-functional and cross-departmental data exchange. Such efforts may include consideration of how to overcome a lack of standardization in how data and metadata are managed, exchanged, and communicated.
7. People
Purpose: Secure sufficient human time and talent not only for data analytics work and technology but also for the management support needed to ensure efficiency, effectiveness, and sustainability.
Tactical: Consider not only machine-to-machine but also human-to-machine and human-to-human data exchange for the problem or issue being addressed. Be sure to understand how data are transformed or changed as they are exchanged among different systems and participants.
Strategic: Understand that over time complete standardization of data and metadata may be neither feasible nor desirable. Take this into account in defining the skills and staffing resources needed to perform necessary data stewardship and data modeling tasks as more application areas, systems, and data are addressed by the organization's data governance plan.
8. Policies
Purpose: Define and communicate how data will be governed, by whom, and for what reasons.
Tactical: In connection with the problem or process that will be targeted for initial data governance efforts, create and disseminate a project charter that addresses how decisions will be made regarding business requirements, initial project objectives, project constraints, the initial solution approach, who the key stakeholders are, and what their responsibilities will be in this initial project.
Strategic: Expand the initial project charter to address the addition of more application areas as the scope of data governance evolves through the organization.
9. Processes
Purpose: Identify the business processes that will be impacted by improvements in how data are analyzed and used in the organization.
Tactical: Don't just focus on the business processes that may need to be temporarily touched or changed in the initial project but also pay close attention to how better data will impact the organization's decisions and processes, even in the short term.
Strategic: As the scope of data governance efforts grows over time, make sure that efficiency and effectiveness measures for processes associated with data generation and processes associated with data utilization are synchronized.
10. Tools
Purpose: Identify what tools are needed to initiate and manage improved data and metadata governance efforts.
Tactical: Try to support initial data governance efforts using available software tools.
Strategic: Seriously consider that, as the number and complexity of data analytics applications and required data governance efforts increase, it may be useful to implement dedicated and flexible tools to support semantic analysis, data stewardship, metadata management, and collaboration.
11. Vision
Purpose: Identify who in the organization can articulate a vision for where the organization is going and secure the support and collaboration of those individuals.
Tactical: Engage with at least one person capable of expressing -- and understanding -- a vision for what improved data analytics can accomplish.
Strategic: Make sure that the people involved in ongoing data governance are in sync with those in the organization who possess and can articulate a strategic vision for how the organization is evolving.
Copyright (c) 2018 by Dennis D. McDonald.