By Louise de Leyritz from Castor (www.castordoc.com)
As data proliferates in modern organizations, the technologies we use to move it around have become more intricate. Data pipelines are now so complex that businesses struggle to identify the root cause of data issues, which leads to tremendous productivity losses.
This explains the explosion of data quality and observability tools (internal, open-source, and SaaS) over the past two years. The trend is not going to stop, and we'd rather bring visibility and structure to it sooner rather than later.
At CastorDoc, we believe the first step to structuring the Data Observability tools market is more transparency. For that reason, we put together a list of all the Observability tools we have heard of.
Get started with Data Observability Tools
Deeper dive into Data Observability
Deployment support: Does the solution support a SaaS deployment model, an open-source model, or both?
Monitoring framework: What approach does the solution use to investigate data? Does it use:
- A pipeline testing framework allowing data engineers to test binary statements. For example, whether all the values in a column are unique, or whether the schema matches a certain expression. When tests fail, the data is labeled as "bad". The data engineering team can thus diagnose poor quality data and take the necessary steps to resolve issues.
- An anomaly detection framework, where the solution scans data assets, collects statistics from them, and watches for changes in the behavior of these statistics. A minimal sketch contrasting the two approaches follows this list.
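The sketch below illustrates both approaches on a pandas DataFrame; the column names (`user_id`, `amount`), the history snapshots, and the threshold are hypothetical, not any vendor's actual API.

```python
import pandas as pd

df = pd.DataFrame({"user_id": [1, 2, 3, 3], "amount": [10.0, 12.5, 9.8, 250.0]})

# Pipeline testing: binary assertions. A failed test labels the batch as "bad".
def run_pipeline_tests(df: pd.DataFrame) -> dict:
    return {
        "user_id_is_unique": df["user_id"].is_unique,
        "amount_is_not_null": df["amount"].notna().all(),
        "schema_matches": list(df.columns) == ["user_id", "amount"],
    }

# Anomaly detection: compare today's statistics against their historical behavior.
def detect_anomalies(df: pd.DataFrame, history: list, z_threshold: float = 3.0) -> bool:
    current_mean = df["amount"].mean()
    baseline = pd.Series([snapshot["amount_mean"] for snapshot in history])
    # Flag the batch if today's mean deviates too far from the historical mean.
    return abs(current_mean - baseline.mean()) > z_threshold * baseline.std()

print(run_pipeline_tests(df))
print(detect_anomalies(df, history=[{"amount_mean": 11.0}, {"amount_mean": 10.5}, {"amount_mean": 11.2}]))
```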
Threshold setting: Are the alert thresholds automatically set by the solution based on patterns observed in your data, or can you set the thresholds for your metrics manually?
Interface type: Is there a no-code interface? Is there a command-line tool used to run tests on datasets to find invalid, missing, or unexpected data?
High cardinality support: Does the solution provide an easy way to add hundreds or thousands of data streams to be monitored? If the solution doesn't support high cardinality, monitoring must be enabled stream by stream, which takes more time.
Monitoring frequency: How often does the observability tool monitor the data assets? Can it provide the relevant insights in real-time and identify data issues just as they are happening, with a possibility to stop the pipeline and prevent bad data from loading?
Automated features: Does the solution offer any of the following automation features? (A sketch of the circuit-breaker and quarantine steps follows this list.)
- Automated threshold setting: The solution uses machine learning algorithms to detect patterns in the datasets. This way, thresholds are automatically defined, flagging unusual data points.
- Automated thresholds updating: As data changes, the alert thresholds are automatically updated based on data quality metrics forecast.
- Automated circuit breaker: Can the solution automatically stop the pipeline from running when it identifies bad data?
- Automatic filtering-out: Can the solution automatically filter out and quarantine bad data that has been identified, ensuring bad data doesn't go in downstream applications?
- Auto-resolution: When the solution detects bad data, can it automatically fix the issue?
- Dynamic pipeline runs: Can the solution run different versions of pipelines based on changing data inputs, parameters, or model scores? The purpose here is to improve pipeline performance.
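To make the circuit-breaker and quarantine ideas concrete, here is a minimal sketch; the null-rate rule, the quarantine file, and the load step are hypothetical placeholders, not any vendor's actual API.

```python
import pandas as pd

class BadDataError(Exception):
    """Raised to break the circuit and stop the pipeline run."""

def load_to_warehouse(df: pd.DataFrame) -> None:
    # Placeholder for the real load step (e.g. a warehouse COPY or INSERT).
    print(f"Loaded {len(df)} clean rows")

def quarantine_and_check(df: pd.DataFrame, max_bad_rate: float = 0.05) -> pd.DataFrame:
    # Filter out and quarantine rows with missing values so they never reach
    # downstream applications.
    bad_rows = df[df.isna().any(axis=1)]
    good_rows = df.dropna()
    bad_rows.to_csv("quarantined_rows.csv", index=False)  # hypothetical quarantine sink

    # Circuit breaker: stop the pipeline entirely if too much of the batch is bad.
    bad_rate = len(bad_rows) / max(len(df), 1)
    if bad_rate > max_bad_rate:
        raise BadDataError(f"{bad_rate:.1%} of rows failed checks; halting the pipeline")
    return good_rows

batch = pd.DataFrame({"user_id": [1, 2, None], "amount": [10.0, None, 9.5]})
try:
    load_to_warehouse(quarantine_and_check(batch))
except BadDataError as err:
    print(f"Pipeline stopped: {err}")
```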
Data sources: Which data sources can the tool connect to?
Integrations: Does the solution integrate with other tools, such as a data catalog?
Alert destinations: Which applications does the tool send alerts to when a threshold is reached? Slack, PagerDuty? Can alerting be customized to the notification system your team uses?
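As a minimal sketch of a custom alert destination, posting a threshold alert to a Slack incoming webhook looks roughly like this; the webhook URL and the message are placeholders.

```python
import json
import urllib.request

def send_slack_alert(message: str, webhook_url: str) -> None:
    # Slack incoming webhooks accept a JSON payload with a "text" field.
    payload = json.dumps({"text": message}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)

send_slack_alert(
    "Freshness alert: table `orders` has not been refreshed for 4 hours",
    webhook_url="https://hooks.slack.com/services/XXX/YYY/ZZZ",  # placeholder URL
)
```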
Security: Is your data in safe hands with the solution? Does the solution have access to your data or PII? Is the solution SOC 2 compliant?
Metrics categories tracked: Which metrics does the solution track? Data freshness, distribution, volume? (A sketch of how a few of these might be computed follows this list.)
- Freshness: Detect any glitch in the refresh schedule of your data. Say your data is meant to be refreshed every hour, but it hasn't been refreshed for 4 hours: a solution that monitors freshness will make you aware of it.
- Distribution: Metrics relating to distributions include the range of the data, the variance, the mean, the kurtosis, and many others. If any of these metrics reach a critical threshold, the observability solution sends an alert to the platform you use.
- Volume: A volume issue occurs when your data is incomplete. For example, you're expecting to find one million rows in your table, but you only have 100,000. Monitoring data volume will alert you when such a thing happens.
- Schema: Schema deals with changes that have been performed on the data. Has a table been added, removed, modified? That's all related to the schema.
- Format: Get alerted when your data is not in the expected format.
- Nulls and blanks: The solution alerts you when some cells are null when they shouldn't be.
- Custom metrics: Define the metrics you want to track yourself; the solution makes it possible.
- System: System metrics come from your pipeline executions, such as duration and resource utilization.
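To make these categories concrete, here is a minimal sketch of how freshness, volume, null, distribution, and schema metrics might be computed for a single table, assuming a pandas DataFrame with hypothetical `updated_at` and `amount` columns:

```python
from datetime import datetime, timezone
import pandas as pd

def collect_quality_metrics(df: pd.DataFrame) -> dict:
    now = datetime.now(timezone.utc)
    return {
        # Freshness: how long since the table last received new data.
        "minutes_since_last_update": (now - df["updated_at"].max()).total_seconds() / 60,
        # Volume: is the table as big as expected?
        "row_count": len(df),
        # Nulls and blanks: share of missing values per column.
        "null_rate": df.isna().mean().to_dict(),
        # Distribution: summary statistics whose drift can be alerted on.
        "amount_mean": df["amount"].mean(),
        "amount_std": df["amount"].std(),
        # Schema: the current column layout, to diff against the previous run.
        "schema": {col: str(dtype) for col, dtype in df.dtypes.items()},
    }

df = pd.DataFrame({
    "amount": [10.0, 12.5, None],
    "updated_at": pd.to_datetime(["2024-05-01 10:00", "2024-05-01 11:00", "2024-05-01 12:00"], utc=True),
})
print(collect_quality_metrics(df))
```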
Root cause analysis: Root cause analysis features allow your organization to identify the cause of a data issue. These include:
- Lineage: Data lineage covers where data originates, what happens to it, and where it moves over time. It is the process of understanding, recording, and visualizing data as it flows from data sources to consumption (see the sketch after this list).
- Correlation between metrics: Tracking correlations between metrics and events helps you pinpoint the cause of data issues.
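Lineage-based root cause analysis essentially walks a dependency graph upstream from the failing asset. A minimal sketch, with a hypothetical set of table dependencies:

```python
# Map each table to the upstream tables it is built from (hypothetical pipeline).
upstream = {
    "revenue_dashboard": ["orders_enriched"],
    "orders_enriched": ["orders_raw", "customers_raw"],
    "orders_raw": [],
    "customers_raw": [],
}

def upstream_candidates(failing_table: str) -> list:
    """Return every upstream asset that could be the root cause of an issue."""
    seen, stack = [], [failing_table]
    while stack:
        table = stack.pop()
        for parent in upstream.get(table, []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen

# If freshness breaks on the dashboard, these are the assets to inspect first.
print(upstream_candidates("revenue_dashboard"))
# ['orders_enriched', 'orders_raw', 'customers_raw']
```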
Name | Link | Deployment support | Monitoring framework | Threshold setting | Interface type | High cardinality support | Real-time data monitoring | Automated features | Data sources | Integrations | Alert destinations | Security | Metrics categories tracked | Root cause analysis | Community | GitHub |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bigeye |  | SaaS, On-premises | Anomaly detection, Pipeline testing | Automated, Manual | No-code | Yes |  | Automated threshold setting, Automated threshold updating, Automated circuit-breaker, Quarantine bad data | Main data warehouses, Main databases |  | Gmail, Slack, PagerDuty, APIs | Certified SOC 2 compliant, Data stays in your environment | Freshness, Outliers, Formats, Distribution, Volume, Custom metrics, Nulls & blanks |  |  |  |
Soda |  | Cloud, Open-source | Anomaly detection, Pipeline testing | Automated, Manual | No-code, Command-line tool | Yes |  | Automated threshold setting, Automated circuit-breaker, Quarantine bad data, Automated threshold updating | Main data warehouses, Main data lakes | Collibra, Tableau, Looker, Alation | Slack, E-mail, Webhooks for alerts & incidents | Data stays in your environment | Freshness, Volume, Formats, Schema, Custom metrics, Nulls & blanks | Automated failed row analysis |  |  |
Databand |  | SaaS, Open source core | Pipeline testing, Anomaly detection | Manual, Automated | Command-line tool, No-code | Yes |  | Automated threshold setting, Automated threshold updating | Main data warehouses, Main databases, Main data lakes |  | Slack, PagerDuty, OpsGenie, Custom | Data stays in your environment | Schema, Formats, Distribution, Custom metrics, Outliers, System, Freshness, Data ingestion rate | Lineage |  |  |
Monte Carlo |  | SaaS, Cloud | Anomaly detection, Pipeline testing | Automated, Manual | No-code, Command-line tool | Yes |  | Automated threshold setting, Automated threshold updating, Automated circuit-breaker, Quarantine bad data | Main data lakes, Main data warehouses, Main databases, 500+ | Looker, Tableau, Periscope, Chartio, Alation, Atlan, Amundsen, Dbt, Datadog, Mode, PowerBI, Prefect, Datahub | Slack, PagerDuty, Webhooks, Opsgenie, Custom, Teams, Mattermost | Certified SOC 2 compliant, HIPAA, GDPR, PCI, CCPA, Data stays in your environment | Freshness, Volume, Distribution, Schema, Outliers, Custom metrics, Correlation across metrics, Nulls & blanks, Formats | Lineage, Correlation across metrics |  |  |
Cito |  | SaaS, Cloud, VPC | Anomaly detection | Automated, Manual | No-code, API | Yes |  | Automated threshold setting, Automated threshold updating | Main data warehouses | Tableau, Dbt, Mode, PowerBI, Looker, Metabase | Slack, E-mail | Data stays in your environment | Nulls & blanks, Schema, Freshness, Outliers, Distribution, Custom metrics, Formats, Volume | Lineage, SQL code accessibility, Column-level lineage |  |  |
Great Expectations |  | Open-source, Cloud product coming soon | Pipeline testing, Anomaly detection | Automated, Manual | Command-line tool, Python notebooks, Python library | Yes |  | Automated threshold setting, Automated threshold updating, Automated circuit-breaker, Quarantine bad data, Auto-resolution | Main data warehouses, Main data lakes, Main databases | Atlan, Dbt, Dagster, Astronomer, Prefect, Pandas, Kedro, Flyte, Datahub, Marquez | Slack, PagerDuty, Opsgenie, E-mail | Data stays in your environment | Nulls & blanks, Correlation across metrics, Multivariate feature checks, Outliers, Freshness, Custom metrics, Volume, Distribution, Schema |  |  |  |
Sifflet |  | SaaS, On-premises | Anomaly detection, Pipeline testing | Automated, Manual | No-code, API | Yes |  | Automated threshold setting, Automated threshold updating | Main data warehouses, SQL Server | Tableau, Looker, Datadog, Dbt | Slack, Gmail, APIs, PagerDuty | Data stays in your environment | Freshness, Volume, Outliers, Formats, Distribution, Schema, Lineage, Nulls & blanks, Custom metrics | Lineage |  |  |
Validio |  | SaaS, Deployed in the customer cloud environment | Anomaly detection, Pipeline testing | Automated, Manual | No-code | Yes |  | Automated threshold setting, Automated threshold updating, Automated circuit-breaker, Auto-resolution | Main data warehouses, Main databases |  | Slack, PagerDuty, E-mail | Data stays in your environment | Freshness, Distribution, Outliers, Volume, Data ingestion rate, Schema, Formats, Multivariate feature checks |  |  |  |
Lightup |  | SaaS, Managed on-prem, Fully on-prem | Testing, Anomaly detection | Automated, Manual | No-code, SQL, API | Yes |  | Automated threshold setting, Automated threshold updating | Main data warehouses, Main databases, Main data lakes |  | Slack, Teams, PagerDuty, E-mail, APIs, Mattermost, Webhooks, Flock | ISAE 3000 compliant, Data stays in your environment, Certified SOC 2 compliant | Volume, Freshness, Schema, Distribution, Formats, Correlation across metrics, Custom metrics | Correlation across metrics, Lineage |  |  |
Lantern |  | SaaS | Anomaly detection | Automated | No-code | Yes |  | Automated threshold setting | Main data warehouses |  | Slack, E-mail |  | Distribution, Volume |  |  |  |
Metaplane |  | SaaS, VPC | Anomaly detection | Automated, Manual | No-code | Yes |  | Automated threshold setting, Automated threshold updating | Main data warehouses, Main databases | Dbt, Looker, Tableau, Mode, PowerBI | Slack, PagerDuty, Opsgenie, Teams | Certified SOC 2 compliant, Data stays in your environment | Freshness, Outliers, Distribution, Volume, Schema, Custom metrics, Nulls & blanks, Formats | Lineage, Correlation across metrics |  |  |
Datafold |  | SaaS | Anomaly detection | Automated | No-code | Yes |  | Automated threshold setting | Main data warehouses |  | Slack, PagerDuty, E-mail, Webhooks |  | Freshness, Outliers, Distribution |  |  |  |
Acceldata |  |  | Pipeline testing |  |  |  |  |  | Main data warehouses, Main data lakes |  |  |  | Distribution, Schema | Correlation across metrics |  |  |
Anomalo |  | SaaS, Deployed in the customer cloud environment | Anomaly detection | Automated |  |  |  | Automated threshold setting | Main data warehouses |  |  |  |  |  |  |  |
Marquez |  | Open-source | Testing | Manual | Command-line tool |  |  |  |  | Amundsen |  |  |  | Lineage |  |  |