
How-To: Data Quality Control

!!! abstract "Abstract" This guide offers guidelines for performing quality control on a dataset, to verify that the data has the expected content and quality.

**Use this guide to:**

- Conduct an efficient data quality control with Python tools in a Jupyter notebook.
- Improve existing data quality controls by applying the guidelines presented here.

**Assumptions:**

- The data to be controlled is tabular.
- You are familiar with Python tools and Jupyter notebooks.

Resources

Setup

Before beginning the analysis, check that everything works correctly.
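One way to run this check in the first notebook cell is sketched below. It only confirms that the packages you plan to use can be imported. The package list is an assumption (this guide does not prescribe specific libraries), so replace it with whatever your notebook needs.

```python
import importlib.util
import sys

# Hypothetical list of packages; adjust it to what your notebook actually uses.
REQUIRED_PACKAGES = ["pandas", "numpy"]


def find_missing_packages(packages):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing_packages(REQUIRED_PACKAGES)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
print("Missing packages:", missing or "none")
```

If the list of missing packages is not empty, install them before continuing so the analysis does not fail halfway through.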

Execution

1. Set goals and scope

Describe the control to be performed: state the goals of the analysis to set expectations, and define a scope that limits what will and will not be controlled.

!!! tip This documentation should guide you during the analysis so that you avoid unnecessary work, and it should help readers set expectations about the work they are reviewing.
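As a minimal sketch, the goals and scope can be written down as a small structure at the top of the notebook and rendered as Markdown. The goal text, the scope items, and the `render_scope` helper are all hypothetical and only show one possible shape for this documentation.

```python
# Hypothetical scope definition; every name and value below is illustrative.
control_scope = {
    "goal": "Verify that the orders table is complete and consistent",
    "in_scope": ["missing values", "duplicate keys", "column types"],
    "out_of_scope": ["pattern analysis", "insight extraction"],
}


def render_scope(scope):
    """Render the scope as Markdown text for the top of the notebook."""
    lines = [f"**Goal:** {scope['goal']}", "", "**In scope:**"]
    lines += [f"- {item}" for item in scope["in_scope"]]
    lines += ["", "**Out of scope:**"]
    lines += [f"- {item}" for item in scope["out_of_scope"]]
    return "\n".join(lines)


print(render_scope(control_scope))
```

Keeping the scope in one place makes it easy to check, before adding a new check, whether that check belongs to this control at all.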

2. Analyze data

Now it is time to analyze the data and run some checks on its content and quality. Here are some points for conducting this analysis:

!!! info To be continued soon…

!!! tip Ask for a peer review of this analysis, mainly to ensure that the control is correct and covers the needed aspects.

!!! tip Focus on controlling the quality of the content rather than analyzing patterns or extracting insights from the data (for those goals, conduct an Exploratory Data Analysis, or EDA).
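While the list of points is still pending, the sketch below shows a few checks that are commonly applied to tabular data with pandas: row count, missing values, duplicated rows, duplicated keys, and column types. It is an assumption about what a control might include, not the checklist of this guide. The `basic_quality_report` helper, the column names, and the toy data are hypothetical.

```python
import pandas as pd


def basic_quality_report(df, key_columns):
    """Summarize a few common content and quality checks for a tabular dataset."""
    return {
        "row_count": len(df),
        # Number of missing values in each column.
        "missing_per_column": df.isna().sum().to_dict(),
        # Rows that repeat an earlier row in every column.
        "duplicate_rows": int(df.duplicated().sum()),
        # Rows that repeat an earlier row in the key columns only.
        "duplicate_keys": int(df.duplicated(subset=key_columns).sum()),
        # Column types as strings, to compare against the expected types.
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


# Toy data, made up for illustration only.
df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "amount": [10.0, None, 5.5, 7.0],
})

report = basic_quality_report(df, key_columns=["id"])
for name, value in report.items():
    print(f"{name}: {value}")
```

Each entry in the report is a fact about the content of the data, which keeps the control focused on quality rather than on patterns or insights.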

3. Document results and conclusions

Once your analysis is exhaustive enough to be considered done, it is time to document the results.
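One possible way to document the results is a short summary table with one row per check. The sketch below builds such a table as Markdown text. The `results_to_markdown` helper and the example results are hypothetical, and the fields (`check`, `passed`, `notes`) are an assumption about what is worth recording.

```python
def results_to_markdown(results):
    """Build a Markdown table with one row per check result."""
    lines = ["| Check | Status | Notes |", "| --- | --- | --- |"]
    for result in results:
        status = "pass" if result["passed"] else "fail"
        lines.append(f"| {result['check']} | {status} | {result['notes']} |")
    return "\n".join(lines)


# Example results, made up for illustration only.
results = [
    {"check": "No duplicate keys", "passed": False, "notes": "1 repeated id"},
    {"check": "Expected column types", "passed": True, "notes": ""},
]

summary_table = results_to_markdown(results)
print(summary_table)
```

The printed text can be pasted into a Markdown cell next to your written conclusions.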

Follow-up

Quality control is a never-ending process, so follow up to ensure that the data sources you use have the quality intended for the solution.