By Louise de Leyritz from Castor (www.castordoc.com)
The raw data that companies collect is usually messy and unusable for analysis. It has to be transformed before it can support analysis that generates value.
This explains the recent explosion of data transformation tools (internal, open-source, and SaaS). The trend shows no sign of stopping, so we would rather bring visibility and structure to it sooner than later.
At Castor, we believe the first step in structuring the data transformation tools market is more transparency. For that reason, we put together a list of all the data modeling tools we have heard of.
Deployment: Is the tool SaaS or open-source?
Classification: Is the tool exclusively used for transforming data (such as dbt) or is the transformation part of a larger offering? For example, ETL tools transform data, but they also take care of the extract and loading steps.
Security: This criterion notes whether the solution complies with specific regulations or standards such as GDPR or HIPAA.
Language: What is the scripting language used for data transformations? Scala, Python, SQL? Is the solution no-code?
Community: Is there a community built around the solution? Communities tend to be especially important with open-source tools, as they provide a great amount of support.
Data sources supported: Where does the solution run transformations? Does it support transformations in data warehouses? In databases?
Add data quality checks: Test data quality with assertions that check for uniqueness or null values, or write a custom assertion in SQL to check any property of your data.
Version control: Easily track changes and restore previous versions of datasets.
Real-time query validation: The solution validates compiled queries against BigQuery in real time, so users can identify issues before running queries.
Real-time data transformation: Run SQL searches, aggregations, and joins as data is generated.
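To make the data quality checks feature more concrete, here is a minimal sketch of SQL assertions, written in Python against an in-memory SQLite database. It follows one common convention, which is an assumption here and not a description of any specific tool in the list: each assertion is a query that returns the rows violating a rule, so an empty result means the check passes. The `orders` table, its columns, and the `failing_rows` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def failing_rows(conn, assertion_sql):
    """Run an assertion query and return the rows that violate the rule."""
    return conn.execute(assertion_sql).fetchall()


# Hypothetical sample data: one duplicated order_id and one missing email.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, customer_email TEXT);
    INSERT INTO orders VALUES
        (1, 'a@example.com'),
        (2, NULL),
        (2, 'c@example.com');
    """
)

# Each assertion selects the offending rows; zero rows means the check passes.
assertions = {
    # Uniqueness check: any order_id that appears more than once.
    "order_id is unique": (
        "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1"
    ),
    # Null check: any row with a missing email.
    "customer_email is not null": (
        "SELECT order_id FROM orders WHERE customer_email IS NULL"
    ),
    # Custom assertion: any property expressible in SQL.
    "order_id is positive": (
        "SELECT order_id FROM orders WHERE order_id <= 0"
    ),
}

results = {name: failing_rows(conn, sql) for name, sql in assertions.items()}

for name, rows in results.items():
    status = "PASS" if not rows else f"FAIL ({len(rows)} violating rows)"
    print(f"{name}: {status}")
```

With this sample data, the uniqueness and null checks fail and the custom check passes. Having each assertion return the offending rows, instead of a plain true or false, makes a failure easier to debug.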
Name | Website | Deployment | Classification | Security | Language | Features | Community | Data sources supported |
---|---|---|---|---|---|---|---|---|
Dataform 🇬🇧 | | SaaS | Transformation only | Hosted on Google Cloud Platform | No-code, SQL | | | Redshift, BigQuery, Snowflake, Azure SQL, PostgreSQL |
Modlr 🇦🇺 | | SaaS | Transformation part of larger offering | | No-code | Dashboard customisation | | |
Trifacta 🇺🇸 | | SaaS | Transformation only | SOC 2 compliant | SQL, No-code | Data visualization | | Redshift, Snowflake, BigQuery |
Rudderstack 🇺🇸 | | Open-source | Transformation part of larger offering | | SQL | Real-time transformations | | Redshift, BigQuery, Snowflake, PostgreSQL, ClickHouse |
Matillion 🇬🇧 | | SaaS | Transformation part of larger offering | SOC 2 compliant, HIPAA, CSA STAR | SQL, No-code | | | Redshift, BigQuery, Snowflake |
Easymorph 🇨🇦 | | SaaS | Transformation part of larger offering, ETL tool | | No-code | 120 built-in transforms | | |
Paxata 🇺🇸 | | SaaS | Transformation part of larger offering | | No-code | | | |
Rockset 🇺🇸 | | SaaS | Transformation only | | SQL | Real-time transformations | | MongoDB, DynamoDB, PostgreSQL, MySQL, S3, GCS, Kafka |
Beam | | Open-source | Transformation part of larger offering | | SQL | | | |
dbt 🇺🇸 | | Open-source | Transformation only | SOC 2 compliant | SQL | Git integrations, version control, logging, modularity, reference one data model within another | | Redshift, PostgreSQL, Snowflake |
Mara | | Open-source | | | | | | |
Zillion | | Open-source | | | | | | |
Glue 🇺🇸 | | SaaS | Transformation part of larger offering | | Scala, Python | | | |
Airflow | | Open-source | Transformation part of larger offering | | Python | Quality checks | | |