3. Data Quality
Data quality is one of the main focus areas of the Tuva Project. Every healthcare dataset contains data quality issues, so you need a strong suite of data quality tests and tools to diagnose these issues and understand their impact on analytics.
The Tuva approach to data quality has three main components:
- Data Pipeline Tests: These are dbt tests that run at run time to determine whether the source data flowing through Tuva has data quality problems.
- Data Quality Metrics: These are statistics calculated on tables throughout the project, both intermediate and final, that tell us whether there are analytic issues with the data.
- Data Quality Dashboard: This is a dashboard that sits on top of the data quality results to make it easier to pinpoint issues.
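
To make the idea of a data quality metric concrete, here is a minimal Python sketch of one common statistic: the share of rows in which a column is missing. It is an illustration only and not Tuva's implementation, which is built with dbt. The function names (`null_rate`, `failed_checks`), the sample claim rows, and the thresholds are all hypothetical.

```python
from typing import Any


def null_rate(rows: list[dict[str, Any]], column: str) -> float:
    """Return the fraction of rows where `column` is absent, None, or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)


def failed_checks(
    rows: list[dict[str, Any]], thresholds: dict[str, float]
) -> dict[str, float]:
    """Return {column: observed null rate} for columns whose rate exceeds the allowed limit."""
    failures = {}
    for column, limit in thresholds.items():
        rate = null_rate(rows, column)
        if rate > limit:
            failures[column] = rate
    return failures


# Hypothetical sample rows standing in for a claims table.
claims = [
    {"claim_id": "C1", "diagnosis_code": "E11.9", "paid_amount": 120.0},
    {"claim_id": "C2", "diagnosis_code": None, "paid_amount": 80.0},
    {"claim_id": "C3", "diagnosis_code": "", "paid_amount": None},
    {"claim_id": "C4", "diagnosis_code": "I10", "paid_amount": 45.5},
]

# Hypothetical limits: the maximum tolerated null rate per column.
limits = {"claim_id": 0.0, "diagnosis_code": 0.25, "paid_amount": 0.25}

print(failed_checks(claims, limits))
```

In this sketch a column fails only when its null rate is strictly above its limit, so a column sitting exactly at the threshold passes. The same pattern extends to other statistics, such as duplicate keys or out-of-range values.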