Tuesday, February 18, 2014

The Forgotten Initiative (Data Management & Data Quality)

Can you believe that poor data management is the leading cause of many IT project failures? According to the Data Warehousing Institute, poor data quality costs businesses six hundred billion dollars annually. So, given that this is such a serious problem, why aren't corporations addressing it more aggressively? In my profession I face this topic almost every time I start a new project. I keep asking myself why, and I wanted to take a deeper look.

Before we dive into the meat of things, I thought I’d share a short “funny” video about data quality. Those of you who don’t live in this world every day will hopefully appreciate not only the humor but also the seriousness of the problem.


This blog will provide you with a brief introduction to data quality management and emphasize its importance to American businesses today.  

Specifically we will review:

  • What is Data Quality Management?
  • The obvious challenges with Data Quality
  • Define Data Quality
  • The Four Pillars of Data Quality Management (DQM)
  • How to get started

What is Data Quality Management (DQM)?

Data quality management is a form of administration that covers the establishment and deployment of roles, policies, responsibilities and processes for the acquisition, maintenance, disposition and distribution of data. For a data quality management initiative to succeed, a strong partnership between the technology groups and the business is required.

Information technology groups are in charge of building and controlling the entire environment, that is, the architecture, systems, technical facilities and databases. This overall environment acquires, maintains, disseminates and disposes of an organization's electronic data assets.

The obvious challenges with Data Quality

It’s not easy for corporations to implement a data quality management system, and the challenges aren't simple to overcome. Most require accountability, ownership, corporate structure and buy-in.

The obvious challenges include:


  • No ownership - The first and probably the biggest challenge is ownership. No single department or business unit is responsible for all of the data in an enterprise setting. Without ownership it’s nearly impossible to keep data up to date. 
  • Requires cross-functional teamwork - The lack of clear responsibility for data leads to the second major challenge: data quality work cuts across departments. Unlike most corporate functions, data quality management is treated as secondary and has no organizational structure behind it.
  • Recognizing that a data problem exists - Organizations are often in denial about their data quality problems, and it sometimes takes a major catastrophe to change that perception. In other words, corporations are not inclined to spend money fixing something that they don’t see as broken.
  • Requires discipline - The fourth major challenge is discipline. An effective data quality management program requires that responsibilities be assigned and that formal procedures be created and followed by everyone who handles data for the enterprise.
  • Requires financial and human resources - Data quality management software is generally inexpensive. The expense comes from the staff required to identify and correct data problems on an ongoing basis, work that is perceived to be manual and time-consuming.
  • ROI is often difficult to quantify - Data quality management efforts are difficult to fund because the cost of “no quality” is not documented. What “no data quality” could cost is unknown, so it is difficult for corporations to recognize it as a problem. As mentioned previously, it often takes a major disaster for leadership teams to recognize it.

Define Data Quality

Data quality doesn't necessarily mean zero defects. Quality means defining and enforcing valid requirements. Ultimately, it is the business that needs to set the quality requirements. The IT organization contributes by creating the processes that enforce them.

  • Determine who sets the requirements
  • Determine how requirements are set
  • Determine the degree of conformance that is needed
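
To make these three decisions concrete, here is a minimal Python sketch. It assumes a hypothetical customer record with `email` and `zip_code` fields, and the rules and the 95% threshold are invented for illustration rather than taken from any specific tool. The point is only that the business supplies the rules and the required degree of conformance, while the code measures the data against them.

```python
import re

# Hypothetical requirements set by the business: field name -> validity check.
RULES = {
    "email": lambda v: bool(v) and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "zip_code": lambda v: bool(v) and re.fullmatch(r"\d{5}", v) is not None,
}

# Hypothetical degree of conformance: 95% of records must pass every rule.
REQUIRED_CONFORMANCE = 0.95


def conformance(records, rules=RULES):
    """Return the fraction of records that satisfy every rule."""
    if not records:
        return 1.0  # nothing to violate the rules
    passing = sum(
        all(check(rec.get(field)) for field, check in rules.items())
        for rec in records
    )
    return passing / len(records)


def meets_requirement(records, threshold=REQUIRED_CONFORMANCE):
    """True when the measured conformance reaches the agreed threshold."""
    return conformance(records) >= threshold
```

Keeping the rules and the threshold in plain data structures, separate from the measuring code, mirrors the division of labor described above: the business can change what "valid" means without anyone rewriting the process.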

The Four Pillars of DQM

A four-phase process for achieving successful data quality management for any particular set of data follows.

  • Step 1: Data Profiling - Data profiling is the process of gaining an understanding of the existing data relative to the quality specifications. This process consists initially of looking at the actual data. Two elements we look at are data completeness and accuracy. The insight gained by data profiling can be used to determine how difficult it will be to use existing data.
  • Step 2: Data Quality - The data quality step builds on the information learned from data profiling to understand the causes of the problems. This is the validation stage, where we uncover the symptoms and inconsistencies in the data. Once a problem is identified, we are in a position to do something about it. The options are: exclude the data, accept the data, correct the data, or insert default values.
  • Step 3: Data Integration - Data about the same item or customer often exists in multiple databases. This data can take virtually any form (customer name and address data, product data, etc.). Data integration can be applied to resolve this problem. This is the process of data consolidation: the idea is to remove duplicate records of the same entity.
  • Step 4: Data Augmentation - Data augmentation is the last step for increasing the value of data. It entails incorporating additional external data (e.g. demographic, financial, operational) to gain further insights. The value of our data can be substantially increased if we understand it, ensure its quality, integrate it, and augment it.
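
The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the record layout (`name`, `email`, `zip_code`), the use of email as the matching key, the `UNKNOWN` default and the `demographics` lookup are all hypothetical. The profiling function measures completeness only, since checking accuracy needs a trusted source to compare against.

```python
from collections import Counter

REQUIRED_FIELDS = ("name", "email", "zip_code")  # hypothetical quality specification
UNKNOWN = "UNKNOWN"  # hypothetical default for missing values


def profile(records):
    """Step 1: completeness per field, as the fraction of non-empty values."""
    filled = Counter()
    for rec in records:
        for field in REQUIRED_FIELDS:
            if rec.get(field):
                filled[field] += 1
    total = len(records) or 1
    return {field: filled[field] / total for field in REQUIRED_FIELDS}


def validate(rec):
    """Step 2: exclude, accept, correct, or insert defaults for one record."""
    email = (rec.get("email") or "").strip().lower()  # correct case and spacing
    if "@" not in email:
        return None  # exclude: no usable matching key
    return {
        "email": email,
        "name": (rec.get("name") or "").strip() or UNKNOWN,  # insert default
        "zip_code": (rec.get("zip_code") or "").strip() or UNKNOWN,
    }


def integrate(records):
    """Step 3: consolidate records sharing an email, filling UNKNOWN gaps."""
    merged = {}
    for rec in records:
        target = merged.setdefault(rec["email"], dict(rec))
        for field, value in rec.items():
            if target.get(field, UNKNOWN) == UNKNOWN and value != UNKNOWN:
                target[field] = value
    return list(merged.values())


def augment(records, external):
    """Step 4: add external attributes looked up by zip code."""
    return [{**rec, **external.get(rec["zip_code"], {})} for rec in records]


raw = [
    {"name": "Ann Lee", "email": " Ann@Example.com ", "zip_code": ""},
    {"name": "", "email": "ann@example.com", "zip_code": "30301"},
    {"name": "Bob Ray", "email": None, "zip_code": "10001"},
]
demographics = {"30301": {"region": "Southeast"}}  # hypothetical external data

completeness = profile(raw)
clean = [r for r in map(validate, raw) if r is not None]
result = augment(integrate(clean), demographics)
```

In this sample, the two records for Ann are consolidated into one and the record without an email is excluded. Real matching is usually fuzzier than an exact key comparison, which is one reason the technology support discussed later matters.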

How to get started

The challenges in deploying a data quality management initiative are significant, but they are not insurmountable.

The initial efforts should encompass: 

  • Education - Support from the key stakeholders will be better if they understand data quality management and are committed to its deployment. The education needs to consist of a combination of theory concerning this subject, case studies from other organizations, and specific issues within your organization’s data. It will be much easier to gain support for the initiative if people recognize that the organization is either wasting money or is missing business opportunities due to data quality management deficiencies.
  • Stewardship - A steward is a person who is called upon to exercise responsible care over possessions entrusted to him or to her. In the case of the data steward, it is the person responsible for exercising care over the data assets of the organization. There are numerous responsibilities that need to be fulfilled.
  • Partnerships - Data quality management requires a concerted, cooperative effort by people throughout the organization, and partnerships are critical. These partnerships include the commonly recognized ones between the information technology groups and the business units. In addition, information technology groups must partner with each other and business groups must partner with one another.
  • Four-phase program - For each data subject area, you can then proceed through the four-phase approach previously described. Armed with information from the data profiling activities, you can identify the most significant opportunities for data quality improvements within a particular data subject area.
  • Technology support - The data quality management initiative will only be successful if it is pursued as a partnership and is supported by technology. Data quality management technology capable of supporting the functionality should be acquired to improve the effectiveness of the program and to significantly lower the effort required to manage data as an asset.

Conclusion

In the big scheme of things, corporations have concluded that data is integral to successful projects. However, unlike with tangible assets, it’s difficult to place a definitive value on data and to fund the activities required to manage it adequately. Deploying a data quality management program isn't easy, but the rewards are enormous. A disciplined approach to managing data as an important corporate asset will better position your company to improve the productivity of its information workers and to better serve its customers. To move forward, the key stakeholders must be educated, a stewardship function must be implemented, and appropriate technology must be acquired. With these in place, the four-phase program can be effectively pursued.

----------------------------------------------------------------------------------------------------------------------------------

References
[1] http://www2.sas.com/proceedings/sugi29/098-29.pdf  
[2] http://www.strikeiron.com/wp-content/uploads/2012/11/strikeiron-data-quality.pdf
[3] http://www.youtube.com/watch?v=E0dIu4dCnJE
[4] http://www.information-management.com/channels/data-management.html
[5] http://www.networkworld.com/newsletters/datacenter/2006/0410datacenter1.html
[6] http://www.techopedia.com/definition/28022/data-quality-management-dqm
[7] http://library.ahima.org/xpedio/groups/public/documents/ahima/bok1_049664.hcsp?dDocName=bok1_049664
[8] http://blogs.forrester.com/category/data_quality
[9] http://searchdatamanagement.techtarget.com/definition/data-quality