Tuesday, February 18, 2014

Improving Data Quality

The Problem

Data quality has been, and will continue to be, of utmost importance to organizations and their analysts, because an analysis is only as accurate as the data behind it. The problem is that data quality is getting worse. Worldwide, the share of inaccurate data has risen from 17 to 22 percent. In the U.S. specifically, organizations believe that up to 25 percent[1] of their data is inaccurate. This means that up to a quarter of the data companies use to make decisions could be misleading them. So why is this happening?

Three leading causes:

1. More Streams: Today companies are getting data from an ever-increasing number of sources. In addition to traditional web traffic, we now have access to mobile, video, social media, GPS and more. Gathering and making sense of all this data has led to quality issues.
2. More People: As Internet access expands globally, the streams of data mentioned above now come from hundreds of different countries. Dates are a simple example: different countries store and display dates differently, so knowing which country a date came from can make a huge difference when analyzing the data.
3. More Data: As more and more people use more devices for more things, there is inevitably going to be more data. Many companies don’t have the bandwidth or ability to make sure all of this new data is up to the necessary standard.
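To make the date example concrete, here is a minimal Python sketch. The country codes and the DATE_FORMATS mapping are assumptions made up for illustration; a real system would need to know the convention each source actually uses.

```python
from datetime import datetime

# Hypothetical mapping from a record's country code to its usual date layout.
DATE_FORMATS = {
    "US": "%m/%d/%Y",  # month first
    "GB": "%d/%m/%Y",  # day first
    "JP": "%Y/%m/%d",  # year first
}

def parse_local_date(raw, country):
    """Parse a date string using the convention of the country it came from."""
    return datetime.strptime(raw, DATE_FORMATS[country]).date()

# The same string means two different days depending on its origin.
us = parse_local_date("03/04/2014", "US")
gb = parse_local_date("03/04/2014", "GB")
print(us.isoformat())  # 2014-03-04
print(gb.isoformat())  # 2014-04-03
```

Storing the parsed value in one unambiguous format (such as ISO 8601) is one way to keep the country of origin from mattering later in the analysis.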

How are companies addressing this issue?

Here are a few of the many possible solutions for improving data quality:
Fix the data at the point of capture: The first thing a company should do is make sure the data it has control over is always as clean as possible. The easiest way to get good customer data is to have customers enter it correctly in the first place. To accomplish this, the company needs to know what format it wants the data in (e.g. does it want addresses broken up into individual pieces or stored as a single entry?). Once the business need is understood, validations can be put in place to ensure the format is correct.
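As a rough illustration of validating at the point of capture, the Python sketch below checks an address that is stored as individual pieces. The field names and the US-style rules are assumptions chosen for the example, not a recommended standard.

```python
import re

# Hypothetical rules for a sign-up form that stores US addresses as separate fields.
REQUIRED_FIELDS = ("street", "city", "state", "zip_code")
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")
STATE_PATTERN = re.compile(r"[A-Z]{2}")

def validate_address(form):
    """Return a list of problems; an empty list means the record can be saved."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not form.get(field, "").strip():
            errors.append(f"{field} is required")
    zip_code = form.get("zip_code", "").strip()
    if zip_code and not ZIP_PATTERN.fullmatch(zip_code):
        errors.append("zip_code must look like 12345 or 12345-6789")
    state = form.get("state", "").strip()
    if state and not STATE_PATTERN.fullmatch(state):
        errors.append("state must be a two-letter uppercase code")
    return errors

good = {"street": "1 Main St", "city": "Springfield", "state": "IL", "zip_code": "62701"}
bad = {"street": "1 Main St", "city": "", "state": "Illinois", "zip_code": "6270"}
print(validate_address(good))  # no problems found
print(validate_address(bad))   # one message per failed rule
```

Returning every problem at once, rather than stopping at the first, lets the form show the customer everything that needs fixing in a single pass.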
Conduct a source system data assessment: Running an analysis of the current database or source system can give a company an idea of the current state of its data. If there are missing or invalid values, there is a good chance that something is wrong at the point of entry or somewhere else in the process.
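A very small version of such an assessment can be sketched in Python as a count of missing and invalid values per column. The sample rows, the column name and the deliberately loose email pattern are all assumptions for illustration.

```python
import re

# Loose, illustrative email check; real validation rules would be stricter.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def profile_column(rows, column, is_valid):
    """Count missing and invalid values for one column of a list of dict rows."""
    missing = invalid = 0
    for row in rows:
        value = row.get(column)
        if value is None or str(value).strip() == "":
            missing += 1
        elif not is_valid(str(value).strip()):
            invalid += 1
    total = len(rows)
    pct_bad = round(100 * (missing + invalid) / total, 1) if total else 0.0
    return {"total": total, "missing": missing, "invalid": invalid, "pct_bad": pct_bad}

# Hypothetical extract from a customer table.
rows = [
    {"email": "ann@example.com"},
    {"email": ""},
    {"email": "bob-at-example.com"},
    {"email": None},
]
report = profile_column(rows, "email", lambda v: EMAIL_PATTERN.fullmatch(v) is not None)
print(report)
```

A high share of missing values points toward the entry form, while a high share of invalid values may point toward a transformation step further along in the process.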
Text Mining: For data where the input cannot be controlled (e.g. social media data), text mining can be a helpful tool. Text mining can parse through a company’s data and replace known variations of a word with a standardized term, which brings more of the data into the correct format.
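The replacement step described above can be sketched in Python as a simple dictionary-based normalization, which is far simpler than a full text mining tool. The STANDARD_TERMS mapping is a made-up example; in practice the variants would come from the company's own data.

```python
import re

# Hypothetical variant-to-standard mapping (keys are lowercase).
STANDARD_TERMS = {
    "nyc": "New York",
    "new york city": "New York",
    "big apple": "New York",
}

# Longest variants first, so "new york city" is tried before shorter variants.
_variants = sorted(STANDARD_TERMS, key=len, reverse=True)
_pattern = re.compile(
    r"\b(" + "|".join(re.escape(v) for v in _variants) + r")\b",
    re.IGNORECASE,
)

def standardize(text):
    """Replace each known variant with its standardized term."""
    return _pattern.sub(lambda m: STANDARD_TERMS[m.group(0).lower()], text)

post = "Loved my trip to NYC, but new york city traffic is rough."
print(standardize(post))
# Loved my trip to New York, but New York traffic is rough.
```

Matching on whole words and ignoring case keeps the sketch from rewriting fragments of unrelated words, though it will still miss misspellings and variants that are not in the mapping.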

Conclusion:

There are many data-related challenges for online companies. As tools improve, companies will be better able to handle their diverse streams of data. However, as more and more people access a company’s website, it is vital that organizations are smart about how they keep the quality of their data high.

*For more insights on how to improve data quality, go to http://dataqualitypro.com
_________________________________
[1]http://cdn.qas.com/us-marketing/whitepapers/unlock-the-power-of-data-2.pdf