More data, more tools, more people = more data catalogs

Companies are rolling out analytics to more of their employees. Regardless of data literacy, most departments in large companies now use data. As a result, there is a growing need to improve trust in, and understanding of, data resources and infrastructure.

This explains the explosion of data catalogs (internal, open-source, and SaaS) over the past five years. The trend is not going to stop, so we would rather bring visibility and structure to it sooner rather than later.

At Coalesce Catalog, we believe the first step toward structuring the data catalog market is more transparency. For that reason, we have put together a list of all the catalog tools we have heard of.

💡 This list is still exploratory and may contain errors. Please reach out to us if you notice anything wrong: xavier.deboisredon@coalesce.io

📢 In-depth analysis and evolution: read the full breakdown by generation and market analysis of data catalogs here


Feature definition

For this benchmark, we're assuming that core capabilities like search, extraction, documentation, tagging, and discovery are standard fare in data catalogs. Instead, we're focusing on the standout features that can truly distinguish one catalog from another. Let us know if you feel that we are missing a feature. You will find a definition of each feature below:

Role-Based Access Controls: A system that grants or restricts data access based on a user's role within an organization.

Metadata Analytics: The analysis of aggregated data about other data, to uncover patterns and insights. This can include reports about unused data assets, for example.

Metadata Bulk Edit: The capability to make changes to metadata attributes across multiple data assets simultaneously.

Automated PII Tagging: The process of automatically identifying and marking personally identifiable information within datasets.

Social Data Discovery: A feature that enables users to explore and interact with data assets that colleagues or various teams within the organization are actively using and endorsing.

Column Lineage: The tracking of data's origin and transformations at the column level within databases or data models.

Definition Propagation: The automatic updating and synchronization of data definitions across multiple data assets and systems.

Personalized Views: Customized displays of data or interfaces tailored to individual user preferences or roles. This allows different roles to only see the information that is relevant to them.

Chrome Extension: A small software program that can be installed in the Chrome browser to extend its functionality. This allows users to access the data catalog without having to switch tooling.

Two-Way Sync: The continuous synchronization of data between two locations, ensuring that each reflects the most recent version. This allows every tool to be a source of truth for documentation: whether stakeholders check the documentation in dbt, the data catalog, BI tools, etc., the definitions will always agree. Instead of having a single source of truth, all your tools act as a source of truth thanks to the two-way sync.

Slack Integration: The ability to connect and let users interact with the data catalog through Slack.

Teams Integration: The ability to connect and let users interact with the data catalog through Teams.

Natural Language Search: Search functionalities enhanced by artificial intelligence to provide more accurate and context-aware results.

AI Documentation: The automatic generation, enhancement, or maintenance of documentation using artificial intelligence.

AI for SQL: AI technologies applied to SQL for optimizing queries, generating code, or interpreting natural language requests into SQL commands.

AI Assistant: An AI-powered tool that provides users with assistance in various tasks through natural language interaction.

Business Glossary: A centralized repository of business terms and definitions, often linked to data assets for clarity and consistency.

Knowledge Map: A visual representation or framework that organizes and displays the relationships and flows between different metrics and KPIs. A sort of data lineage, but for metrics.

Advanced Tag Management: The ability to create, assign, manage, and search for tags within a data catalog, facilitating better organization and retrieval of data assets.

Advanced Search Filtering: Enhanced search capabilities that allow users to narrow down search results using multiple criteria and filters, improving the relevance of search outcomes.

Table Popularity & Frequent Users: Metrics that track and display the usage frequency of data tables and identify the most active users, providing insights into the most valuable and frequently accessed data assets.

Rich Text: The capability to format text within the data catalog's user interface, allowing for better presentation and readability of data documentation and metadata.

API-Based Ingestion: The process of importing metadata and other relevant data into the data catalog using application programming interfaces (APIs), enabling automation and integration with other systems.

On-Premise Metadata Extractor: A tool or service that extracts metadata from various data sources within an on-premises environment, as opposed to cloud-based sources.

SQL Editor: A built-in feature that allows users to write, edit, and execute SQL queries directly within the data catalog, facilitating data exploration and management.

Data Quality Integrations: Connections between the data catalog and data quality tools or services, enabling the assessment and monitoring of data quality within the catalog.

Policy and Workflow: Features that enable the creation, management, and enforcement of data governance policies and the automation of data-related workflows.

Multi-Tenant Infrastructure: An architecture that allows multiple customers or user groups (tenants) to use the same data catalog instance while keeping each tenant's data isolated and secure.
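To make one of the definitions above more concrete, here is a minimal sketch of how automated PII tagging might work. It is not how any catalog in this benchmark implements the feature. The rule set `PII_RULES`, the function `tag_pii_columns`, and the `min_match_ratio` threshold are all hypothetical names chosen for illustration. The sketch assumes the catalog can read column names and a handful of sample values, and it tags a column when either its name or most of its sampled values look like PII.

```python
import re

# Hypothetical rules: tag -> (pattern for column names, pattern for values).
# Real tools typically use far richer detection (dictionaries, ML classifiers).
PII_RULES = {
    "email": (
        re.compile(r"e[-_]?mail", re.IGNORECASE),
        re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    ),
    "phone": (
        re.compile(r"phone|mobile", re.IGNORECASE),
        re.compile(r"^\+?[\d\s().-]{7,20}$"),
    ),
}


def tag_pii_columns(columns, min_match_ratio=0.8):
    """Return {column_name: [tags]} for columns that look like PII.

    `columns` maps a column name to a list of sampled values.
    A tag is applied when the column name matches the name pattern,
    or when at least `min_match_ratio` of the non-null samples match
    the value pattern.
    """
    tags = {}
    for name, samples in columns.items():
        values = [str(v) for v in samples if v is not None]
        found = set()
        for tag, (name_re, value_re) in PII_RULES.items():
            name_hit = bool(name_re.search(name))
            matches = sum(1 for v in values if value_re.match(v))
            value_hit = bool(values) and matches / len(values) >= min_match_ratio
            if name_hit or value_hit:
                found.add(tag)
        if found:
            tags[name] = sorted(found)
    return tags
```

Calling `tag_pii_columns` on a dictionary of sampled columns returns only the columns that triggered a rule, which a catalog could then surface as suggested tags for a data steward to confirm.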

This is an attempt at classifying the tools on the market. If anything seems wrong, if the feature list seems off, or if you don't see your data catalog and want to have it placed, please reach out: xavier.deboisredon@coalesce.io

Feature	Classification	Collibra	Alation	Atlan	Coalesce Catalog	Informatica	Data World	Dataedo	OvalEdge	Purview	Octopai	Acryl	Secoda	Select Star	Metaphor
Role-Based Access Controls	Data Governance
Metadata Analytics	Data Governance
Metadata Bulk Edit	Data Governance
Automated PII Tagging	Data Governance
Advanced Tag Management	Data Governance
Policy and Workflow	Data Governance
Multi-Tenant Infrastructure	Data Governance
Social Data Discovery	Data Discovery
Advanced Search Filtering	Data Discovery
Table Popularity & Frequent Users	Data Discovery
Column Lineage	Data Lineage
Cross Platform Lineage (ETL → Data Warehouse → BI tools)	Data Lineage
Definition Propagation	Data Lineage
Personalized Views	User Experience
Chrome Extension	User Experience
Rich Text	User Experience
SQL Editor	User Experience
Two-Way Sync	Integrations
Slack Integration	Integrations
API-Based Ingestion	Integrations
On-Premise Metadata Extractor	Integrations
Data Quality Integrations	Integrations
Natural Language Search	AI features
AI Documentation	AI features
AI for SQL	AI features
AI Assistant	AI features
Business Glossary	Knowledge Management
Knowledge Map	Knowledge Management

More Resources

Data Catalog Pricing Guide: Data Catalog Pricing Guide

Data Catalog Template: Data Catalog Excel Template

Data Catalog RFI template: Data Catalog Request for Information template

Data Catalog ROI calculator: Data Catalog ROI calculator

F.A.Q


Do You Need a Data Catalog?

Additional comparisons and benchmark resources

How to Make Your Data Catalog Successful

There are only 2 goals that matter when it comes to measuring the success of a data catalog: 1) adoption, and 2) customer satisfaction. If you nail these two, you are successful. I'm the co-creator of the leading open-source data catalog, Amundsen, which is used by 35+ companies including Instacart, Square, Brex, Asana, and many more.

towardsdatascience.com


The Ultimate Guide to Evaluating a Data Catalog - CastorDoc Blog

Make informed decisions when choosing a data catalog with CastorDoc's comprehensive evaluation guide.

www.castordoc.com


https://towardsdatascience.com/defining-data-ownership-3fbe95fd0125

In the first paragraph of a post I had written earlier this month, I referred to data engineers as producers of data. Someone immediately replied something to the extent of, "You lost me at the first sentence. Data Engineers can't be data owners."

towardsdatascience.com

Castor Data Catalog

Castor Data Catalog is a platform that enables organizations to quickly and easily find, understand, and use their data.

www.castordoc.com

Collibra vs. Alation: Points of Difference Between the Two

It is the data that drives the businesses. Managing it and getting the best out of it is of great importance in any organization. Collibra and Alation are two such platforms helping in removing the barriers for managing data efficiently.

wisdomplexus.com


Data Discovery for Business Intelligence

Dashboards and reports are the lingua franca in the world of business. Simple as they may seem, behind each KPI dashboard are data analysts who are responsible for keeping dashboards working, accurate, and fresh. For small teams with a handful of data analysts, building dashboards is easy.

towardsdatascience.com

