Why is data quality always an afterthought?

This blog post was authored by Dan Sutherland, Senior Director, Enterprise Data and Analytics, and Sujay Mehta, Associate Director, Enterprise Data and Analytics, on Protiviti's technology insights blog.

For probably the umpteenth time, we reach for the phrase 'garbage in, garbage out' to summarise problems with data quality. It has indeed become a cliché. Various industry studies have quantified the high cost of bad data: poor data quality is estimated to cost organisations an average of $12 million yearly, and data teams waste 40% of their time troubleshooting data downtime, even at mature data organisations using advanced data stacks.

Data quality has always been a critical component of enterprise data governance, yet it remains an Achilles heel for CIOs, CCOs and CROs. In fact, it has become even more challenging to tackle with the prolific increase in data volume and types: structured, unstructured and semi-structured.

Data quality is not just a technology problem, and it never will be, because we rarely think about the quality of the data we source when implementing new business initiatives and technology. Technology is only an enabler; to get the most from it, we need to examine our business processes and look for opportunities to re-engineer or revamp them whenever we start a new technology project. Understanding these business processes means asking:

- What data do we need?
- Do we understand the sources of this data?
- Do we have control over these sources?
- Do we need to apply any transformations (i.e., changes) to this data?
- Most importantly, do our end users trust the data for their usage and reporting?

These questions sound basic and obvious. However, most organisations have trust issues with their data. End users rarely know the source of truth, so they build their own data fiefdoms, creating their own reports and maintaining their own dashboards. Eventually this produces multiple 'sources of truth', each a different version of the others, which causes sleepless nights whenever we need to submit a regulatory report, make an executive decision or prepare SEC filings. Not only does this waste valuable engineering time, it also costs precious revenue and diverts attention from initiatives that move the business's needle. It is a misuse of data scientists' core skills and adds costs and time that could be better spent on the organisation's business priorities.

Over time, data quality issues have become more extensive, complex and costly to manage. A survey conducted by Monte Carlo suggests that nearly half of all organisations most often measure data quality by the number of customer complaints their company receives, highlighting the ad hoc nature of this vital element of modern data strategy. Most organisations address the issue in a piecemeal fashion, which is a practical approach but requires tremendous effort to understand the data, document the lineage, identify data owners, identify key data elements (KDEs), maintain those KDEs and apply the data governance lifecycle to the data. No wonder this is only a tactical solution; sooner or later, we need to start another tactical project to resolve the issues caused by the previous one, and so on.
This means an endless cycle of massive spending on IT, frustration over the low return on investment from technology projects, and the purchase of new technology products that promise a total overhaul.

What is data quality management?

Data quality management (DQM) is the set of procedures, policies and processes an enterprise uses to maintain reliable data in a data warehouse as a system of record, golden record, master record or single version of the truth. First, the data must be cleansed using a structured workflow involving profiling, matching, merging, correcting and augmenting source data records. DQM workflows must also ensure the data's format, content, handling and management comply with all relevant standards and regulations.

So how do we tackle data quality proactively? There are a few options, ranging from the traditional approach to a real-time solution.

Traditional approach: data quality at the source

This is the traditional and, in most cases, the best approach to handling data quality. It involves:

- Identifying all the data sources (external and internal)
- Documenting the data quality requirements and rules
- Applying these rules at the source level (for external sources, applying them at the point where the data enters our environment)

Once quality is handled at the source level, we publish the data to end users through applications such as a data lake or a data warehouse, which becomes the 'system of insight' for everyone in the organisation.

Pros of this approach:

- It is the most reliable approach.
- It is a one-time, strategic solution.
- It helps optimise your business processes.

Cons of this approach:

- It requires a cultural shift to address data quality at the source level and to ensure this happens every time a new data source is added.
- It is possible only with executive sponsorship, i.e., a top-down decision-making approach that makes data quality an integral part of every employee's daily activities.
- Data owners must be ready to invest time and funding to implement data quality at the sources they are responsible for.

Implementation of a data quality management tool

Modern DQM tools automate profiling, monitoring, parsing, standardising, matching, merging, correcting, cleansing and enhancing data for delivery into enterprise data warehouses and other downstream repositories. The tools enable creating and revising data quality rules, and they support workflow-based monitoring and corrective actions, both automated and manual, in response to quality issues.

This approach involves working with business stakeholders to develop an overall data quality strategy and framework, then selecting and implementing the best tool for that framework. The implemented tool should be able to discover all data, profile it and find patterns. The tool then needs to be trained with data quality rules. Once it is trained to a satisfactory level, it starts applying the rules, which helps improve overall data quality. The training of the tool is perpetual: it keeps learning as you discover and input new rules. A simplified sketch of what such a rule-based check looks like follows the pros and cons below.

Pros of this approach:

- It is easy to implement and delivers quick results.
- There is no need to work separately on in-depth lineage documentation (the tool automates data lineage) or a governance methodology; we only need to define the DQ workflows so the tool can automate them.

Cons of this approach:

- Training the tool requires a good understanding of the data and the data quality requirements.
- There is a tendency to expect that everything will be automated; this is not the case.
- It is not a strategic solution and does not help with business process improvement.
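To make the idea of 'training a tool with rules' concrete, here is a minimal, vendor-neutral sketch of rule-based checks in Python. The dataset, column names and rules are illustrative assumptions, and the pandas-based implementation is ours rather than any particular product's; a commercial DQM tool would let you author, version and monitor such rules without writing code.

```python
# A minimal, illustrative sketch of rule-based data quality checks.
# The column names (customer_id, email, country) and the rules are
# hypothetical examples, not taken from any specific DQM product.
import pandas as pd

# Each rule maps a human-readable name to a function returning a boolean
# Series: True where a row passes, False where it fails.
RULES = {
    "customer_id is unique": lambda df: ~df["customer_id"].duplicated(keep=False),
    "email matches a basic pattern": lambda df: df["email"].fillna("").str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "country is populated": lambda df: df["country"].notna(),
}

def run_checks(df: pd.DataFrame) -> pd.DataFrame:
    """Apply every rule and return one summary row per rule."""
    summary = []
    for name, rule in RULES.items():
        passed = rule(df)
        summary.append({
            "rule": name,
            "rows_checked": len(df),
            "rows_failed": int((~passed).sum()),
        })
    return pd.DataFrame(summary)

if __name__ == "__main__":
    sample = pd.DataFrame({
        "customer_id": [1, 2, 2, 4],
        "email": ["a@example.com", "not-an-email", "b@example.com", None],
        "country": ["GB", "HK", None, "US"],
    })
    print(run_checks(sample))
```

The point of the sketch is the shape of the workflow: rules are declared once, applied on every load, and the pass/fail counts feed whatever monitoring or dashboarding the organisation already uses.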
Based on the above considerations, we believe the best approach is a combination of the traditional approach and the DQM tool approach:

- First, set up a business-driven data quality framework and an organisation responsible for supporting it.
- Second, define an enterprise DQ philosophy: "whoever creates the data owns the data." Surround this with guiding principles and appropriate incentives. Organise around domain-driven design and treat data as a product.
- Third, develop an architectural blueprint that treats good data and bad data separately, and deploy a robust real-time exception framework that notifies the data owner of data quality issues (a simplified sketch follows this list). The framework should include a real-time dashboard highlighting success and failure with clear, well-defined metrics. Bad data should never flow into the good data pipeline.
- Fourth, mandate adoption of this holistic DQ ecosystem for each existing domain, source and application within a reasonable timeframe, and for every new application going forward.
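As one possible reading of the 'good data / bad data' blueprint above, the sketch below partitions incoming records, quarantines failures and raises an exception notification. The record shape, the validation rule and the notification target are hypothetical; in practice the alert would go to whatever channel the organisation already uses (email, chat webhook, ticketing), and the exception dashboard would read from the quarantine store.

```python
# A minimal sketch of the "good data / bad data" split with an exception
# notification. All names here are illustrative placeholders for an
# organisation's own pipeline and alerting stack.
import logging
from typing import Callable, Iterable, List, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dq-exceptions")

def split_good_and_bad(records: Iterable[dict],
                       is_valid: Callable[[dict], bool]) -> Tuple[List[dict], List[dict]]:
    """Route each record either to the good pipeline or to quarantine."""
    good, bad = [], []
    for record in records:
        (good if is_valid(record) else bad).append(record)
    return good, bad

def notify_data_owner(owner: str, quarantined: List[dict]) -> None:
    """Stand-in for a real alert (email, chat webhook, ticket, etc.)."""
    if quarantined:
        log.warning("DQ exception for %s: %d record(s) quarantined", owner, len(quarantined))

if __name__ == "__main__":
    incoming = [
        {"order_id": 1, "amount": 120.0},
        {"order_id": 2, "amount": -5.0},  # violates the rule below
    ]
    good, bad = split_good_and_bad(incoming, lambda r: r["amount"] >= 0)
    notify_data_owner("orders-domain-owner@example.com", bad)
    # Only `good` continues into the curated warehouse or lake; `bad` stays
    # in a quarantine store (and on the exception dashboard) until the data
    # owner corrects it at the source.
```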
Data quality remains one of the foremost challenges for most organisations, and there is no guaranteed approach to solving it. One needs to consider various factors, such as the organisation's technology landscape, legacy architecture, existing data governance operating model, business processes and, most importantly, the organisational culture. The problem cannot be solved only with new technology or by adding more people; it requires a combination of business process re-engineering, a data-driven decision-making culture and the ability to use DQ tools optimally. It is not a one-time effort but a lifestyle change for the organisation.

To learn more about our data and analytics services, contact us.