Pragmatic Works Nerd News

Real-World Data Testing: Remediating Bad Data

Written by Tim Moolic | Aug 09, 2017

There's a lot of talk about the need for BI and data warehouse testing to identify and fix bad data. But what is the real purpose of that testing? Data testing exists to identify defects, and every defect it finds brings a new challenge: remediating the error. Who in your organization should be responsible for this?

In this installment of our Real-World Data Testing series, I discussed this issue with our consultant, Jessica Dzurek. Data testing can be a double-edged sword – you implement a test that performs as expected by finding a failure in your result, but now you must dig in and find out why it failed.

In a previous video/blog in this series, consultant Brad Gall pointed out that when companies begin BI and data warehouse testing, a crucial step is planning for the time required to remediate the defects found.

Here, we discuss how to create a process or plan for failures and defects.

Documentation is key to that plan. A method that Jessica uses on-site is a three-step defect process: review, assess and assign.

1. Review – When a failure occurs, you want to review it, identify the problem, and check to ensure there’s enough information from the tester, so the developer can understand what the problem is.

2. Assess – Dive into the failure and troubleshoot to determine if the problem is masking an underlying problem. Investigate and collect information to prepare for a root cause analysis later.

3. Assign – Meet with people from the development team, your QA group, an architect and a business stakeholder to talk through the defect. Be sure they understand the failure, then establish its severity and priority.
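As a rough illustration, the three steps above could be tracked with a small defect record. This is only a sketch under assumptions: the `Defect` class, its field names, and the `Stage` values are hypothetical and do not come from any particular testing tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Stage(Enum):
    IN_REVIEW = "in review"
    IN_ASSESSMENT = "in assessment"
    ASSIGNED = "assigned"


@dataclass
class Defect:
    # Hypothetical record for one test failure; names are illustrative only.
    test_name: str
    description: str                 # what the tester observed
    findings: List[str] = field(default_factory=list)  # notes for root cause analysis
    owner: str = ""
    severity: str = ""
    priority: str = ""
    stage: Stage = Stage.IN_REVIEW

    def review(self) -> bool:
        """Step 1: check that the tester gave enough information to act on."""
        has_enough_info = bool(self.test_name.strip()) and bool(self.description.strip())
        if has_enough_info:
            self.stage = Stage.IN_ASSESSMENT
        return has_enough_info

    def assess(self, note: str) -> None:
        """Step 2: record what troubleshooting turned up."""
        if self.stage is not Stage.IN_ASSESSMENT:
            raise ValueError("defect must pass review before assessment")
        self.findings.append(note)

    def assign(self, owner: str, severity: str, priority: str) -> None:
        """Step 3: record the outcome of the defect meeting."""
        if self.stage is not Stage.IN_ASSESSMENT or not self.findings:
            raise ValueError("defect needs assessment findings before assignment")
        self.owner = owner
        self.severity = severity
        self.priority = priority
        self.stage = Stage.ASSIGNED
```

The point of the guard clauses is simply that a defect cannot reach the assignment meeting without documented findings, which mirrors the order of the steps.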

This process gives you a definitive plan, but it will likely add time to your development cycle. For some failures, understanding the root cause may take hours. To manage this, you also need a resource plan in place. Define the roles and responsibilities of your team so developers and QA engineers are clear on where their responsibility ends.

You may want to set up a time box for the more time-consuming scenarios. For example, define that a QA engineer will spend up to four hours on an initial assessment and investigation of a test failure. They will then document their findings and present them at a defect meeting with developers, business stakeholders and architects to determine the next steps.
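A time box like this is easy to express as a simple check. The sketch below assumes the four-hour limit from the example; the function names `remaining_time` and `should_escalate` are hypothetical.

```python
from datetime import timedelta

# Four hours, taken from the example above; adjust to your own resource plan.
TIME_BOX = timedelta(hours=4)


def remaining_time(spent: timedelta, time_box: timedelta = TIME_BOX) -> timedelta:
    """Time left in the box, never less than zero."""
    return max(time_box - spent, timedelta(0))


def should_escalate(spent: timedelta, time_box: timedelta = TIME_BOX) -> bool:
    """True once the time box is used up and the findings should go to the defect meeting."""
    return spent >= time_box
```

For instance, a QA engineer who has logged an hour and a half has two and a half hours left before the failure is handed over.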

It's also important to have clear levels of severity defined: "this is something we can live with" vs. "there’s no moving forward until it’s fixed". A final thought: documentation and communication are key. Errors that make it into production are much more costly than those found in development.
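The two severity levels quoted above can be written down explicitly so everyone applies them the same way. This is a minimal sketch; the `Severity` names and the `can_move_forward` helper are hypothetical, and a real team may well define more than two levels.

```python
from enum import Enum
from typing import Iterable


class Severity(Enum):
    TOLERABLE = "this is something we can live with"
    BLOCKING = "there's no moving forward until it's fixed"


def can_move_forward(open_defects: Iterable[Severity]) -> bool:
    """Work may continue only if no open defect is blocking."""
    return all(severity is not Severity.BLOCKING for severity in open_defects)
```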

So, data testing takes time and planning, but it’s time well spent, because bad data affects all areas of your business. We have the tools, training and services available to help you integrate data testing. Join us on our mission to stop bad data.