Catalog of Data Modeling Tools

By Louise de Leyritz from Castor (www.castordoc.com)

The raw data companies collect is usually messy and unusable for analysis. It has to be transformed before it can support value-generating analysis.

This explains the recent explosion of data transformation tools (internal, open-source, and SaaS). The trend is not going to stop, so we would rather bring visibility and structure to the market sooner than later.


At Castor, we believe the first step to structuring the data transformation tools market is more transparency. For that reason, we put together a list of all the data modeling tools we have heard of.

Get started on Data Modeling Tools

💡 This list is still exploratory and may contain errors or lack information. Please reach out to us if you notice anything wrong: louise@castordoc.com.

📢 In-depth analysis and evolution: Read the full breakdown by generation and market analysis of data transformation here


Deeper dive into SQL Editors

What does each column in the benchmark below mean?

Deployment: Is the tool SaaS or open-source?

Classification: Is the tool exclusively used for transforming data (such as dbt) or is the transformation part of a larger offering? For example, ETL tools transform data, but they also take care of the extract and loading steps.

Security: This criterion notes whether the solution complies with a specific regulation or standard, such as GDPR or HIPAA.

Language: What is the scripting language used for data transformations? Scala, Python, SQL? Is the solution no-code?

Community: Is there a community built around the solution? Communities tend to be especially important with open-source tools, as they provide a great amount of support.

Data sources supported: Where does the solution run its transformations? Does it support transformations in data warehouses? In databases?

Data quality checks: Test data quality with assertions that check for uniqueness or null values, or write a custom assertion in SQL to check any property of your data.
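To make the idea of SQL assertions concrete, here is a minimal sketch in Python. It is not taken from any tool in this catalog: the `orders` table, its sample rows, and the `run_assertions` helper are hypothetical, and SQLite stands in for a real warehouse. Each assertion is written as a query that returns the offending rows, so an empty result means the check passes.

```python
import sqlite3

# Hypothetical sample data: one duplicate order_id, one NULL customer_id,
# and one negative amount. SQLite is only a stand-in for a warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES
        (1, 10, 25.0),
        (2, 11, 40.0),
        (2, 12, 15.0),
        (3, NULL, -5.0);
    """
)

# Each assertion selects the rows that violate the rule.
ASSERTIONS = {
    "order_id is unique":
        "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1",
    "order_id is not null":
        "SELECT order_id FROM orders WHERE order_id IS NULL",
    "customer_id is not null":
        "SELECT order_id FROM orders WHERE customer_id IS NULL",
    # A custom assertion: any property expressible in SQL can be checked.
    "amount is non-negative":
        "SELECT order_id FROM orders WHERE amount < 0",
}


def run_assertions(connection, assertions):
    """Return a dict mapping each assertion name to its offending rows."""
    return {
        name: connection.execute(sql).fetchall()
        for name, sql in assertions.items()
    }


results = run_assertions(conn, ASSERTIONS)
for name, rows in results.items():
    status = "PASS" if not rows else f"FAIL, offending rows: {rows}"
    print(f"{name}: {status}")
```

Writing assertions as "select the violations" queries keeps the pass condition uniform (zero rows) and gives the failing rows for free when a check breaks.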

Version control: Track changes and restore earlier versions of datasets.

Real-time query validation: The solution validates compiled queries against BigQuery in real time, so users can identify issues before running queries.
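The general idea of validating a query before running it can be sketched as follows. This is an illustration under stated assumptions, not how any listed tool does it: SQLite stands in for BigQuery, the `orders` table and the `validate_query` helper are hypothetical, and the trick used here is that prefixing a statement with `EXPLAIN` makes SQLite compile it (checking syntax and table or column names) without executing the query itself.

```python
import sqlite3

# Hypothetical schema; SQLite is a stand-in for the real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")


def validate_query(connection, sql):
    """Return None if the query compiles, otherwise the error message.

    EXPLAIN asks SQLite to compile the statement and describe its plan,
    so syntax errors and unknown tables or columns surface here without
    the underlying query being run.
    """
    try:
        connection.execute("EXPLAIN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)


candidates = [
    "SELECT order_id, amount FROM orders",
    "SELECT missing_col FROM orders",
    "SELEC order_id FROM orders",
]
for sql in candidates:
    error = validate_query(conn, sql)
    print(f"{sql!r}: {'OK' if error is None else 'INVALID (' + error + ')'}")
```

A check like this catches typos and schema mismatches cheaply; it does not say anything about whether the query returns the data you intended.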

Real-time data transformation: Run SQL searches, aggregations, and joins as data is generated.
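As a rough illustration of what transforming data "as it is generated" means, the sketch below updates an aggregate on every incoming event instead of recomputing it in a batch. The event shape (`user`, `amount`) and the `running_totals` function are hypothetical, and a Python generator stands in for a real event stream; the tools in this catalog expose this kind of logic through SQL.

```python
from collections import defaultdict


def running_totals(events):
    """Yield the per-user totals after each incoming event.

    The aggregate is updated incrementally, so every event is processed
    once, at the moment it arrives.
    """
    totals = defaultdict(int)
    for event in events:
        totals[event["user"]] += event["amount"]
        yield dict(totals)  # snapshot of the aggregate so far


# Hypothetical stream of purchase events (amounts in cents).
events = [
    {"user": "alice", "amount": 500},
    {"user": "bob", "amount": 250},
    {"user": "alice", "amount": 125},
]

snapshots = list(running_totals(events))
for snapshot in snapshots:
    print(snapshot)
```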

Benchmark data transformation tools

| Name | Website | Deployment | Classification | Security | Language | Features | Community | Data sources supported |
|---|---|---|---|---|---|---|---|---|
| Dataform 🇬🇧 | | SaaS | Transformation only | Hosted on Google Cloud Platform | No-code, SQL | | | Redshift, BigQuery, Snowflake, Azure SQL, PostgreSQL |
| Modlr 🇦🇺 | | SaaS | Transformation part of larger offering | | No-code | Dashboard customisation | | |
| Trifacta 🇺🇸 | | SaaS | Transformation only | SOC 2 compliant | SQL, No-code | Data visualization | | Redshift, Snowflake, BigQuery |
| Rudderstack 🇺🇸 | | Open-source | Transformation part of larger offering | | SQL | Real-time transformations | | Redshift, BigQuery, Snowflake, PostgreSQL, ClickHouse |
| Matillion 🇬🇧 | | SaaS | Transformation part of larger offering | SOC 2 compliant, HIPAA, CSA STAR | SQL, No-code | | | Redshift, BigQuery, Snowflake |
| Easymorph 🇨🇦 | | SaaS | Transformation part of larger offering, ETL tool | | No-code | 120 built-in transforms | | |
| Paxata 🇺🇸 | | SaaS | Transformation part of larger offering | | No-code | | | |
| Rockset 🇺🇸 | | SaaS | Transformation only | | SQL | Real-time transformations | | MongoDB, DynamoDB, PostgreSQL, MySQL, S3, GCS, Kafka |
| Beam | | Open-source | Transformation part of larger offering | | SQL | | | |
| dbt 🇺🇸 | | Open-source | Transformation only | SOC 2 compliant | SQL | Git integrations, version control, logging, modularity, reference one data model within another | | Redshift, PostgreSQL, Snowflake |
| Mara | | Open-source | | | | | | |
| Zillion | | Open-source | | | | | | |
| Glue 🇺🇸 | | SaaS | Transformation part of larger offering | | Scala, Python | | | |
| Airflow | | Open-source | Transformation part of larger offering | | Python | Quality checks | | |
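
The dbt entry lists "reference one data model within another" as a feature. The sketch below illustrates why that matters: once models name their upstream models explicitly, a valid run order can be derived automatically. This is not dbt's implementation. The model names and SQL are hypothetical, the regular expression only handles the simple single-quoted form `{{ ref('model_name') }}`, and the ordering relies on Python's standard `graphlib` module (Python 3.9+).

```python
import re
from graphlib import TopologicalSorter

# Simplified pattern: matches only {{ ref('model_name') }} with single quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\('(\w+)'\)\s*\}\}")

# Hypothetical models: name -> SQL that may reference other models.
models = {
    "top_customers":
        "SELECT * FROM {{ ref('customer_orders') }} WHERE n_orders > 10",
    "customer_orders":
        "SELECT c.id, COUNT(*) AS n_orders "
        "FROM {{ ref('stg_orders') }} o "
        "JOIN {{ ref('stg_customers') }} c ON o.customer_id = c.id "
        "GROUP BY c.id",
    "stg_orders": "SELECT * FROM raw_orders",
    "stg_customers": "SELECT * FROM raw_customers",
}


def build_order(model_sql):
    """Return the model names in an order where upstream models come first."""
    # Map each model to the set of models it references (its predecessors).
    graph = {
        name: set(REF_PATTERN.findall(sql))
        for name, sql in model_sql.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = build_order(models)
print(order)
```

Any order that places each model after everything it references is valid, so the exact position of independent models (here the two `stg_` models) is not significant.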