Catalog of Data Storage Tools

As data proliferates in modern organizations, the technologies we use to store it have evolved considerably.

This explains the explosion of data storage and data warehousing tools over the past few years. The trend is not going to stop, so we would rather bring visibility and structure to this market sooner than later.

At Castor, we believe the first step toward structuring the data storage tools market is more transparency. For that reason, we put together a list of all the data storage tools we have heard of.

💡 This list is still exploratory and may contain errors or lack information. Please reach out to us if you notice anything wrong: louise@castordoc.com

Get started with data storage tools

📢 In-depth analysis and evolution: read the full breakdown by generation and market analysis of data storage tools here

Deep dive in data storage tools

What does each column in the benchmark below mean?

Deployment support: Does the solution support a SaaS deployment model, an open-source model, or both?

Solution: Is the solution modern or legacy? Please refer to the article to understand how we differentiate the two.

Target market: Does the solution cater to enterprise clients, or does it offer a more affordable model, more suited to mid-market organizations? "Universal" solutions are adapted to both enterprise and mid-market clients.

Core use case: Again, to fully understand this criterion, please refer to the article Cloud Data Warehousing: The Past, Present, and Future. The aim of this criterion is to distinguish between pure data warehouses and solutions focusing on real-time analytics.

Support for Standard SQL: Does the solution support Standard SQL for querying the warehouse? Today, most solutions do, as SQL is the most widespread database language, but there are still some exceptions.

Support for semi-structured data: Does the solution support semi-structured data like Avro, JSON and XML?
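To make "semi-structured" concrete, here is a minimal Python sketch. The event records, field names, and the `flatten` helper are all hypothetical and not taken from any tool in this list. It only illustrates why such data is harder to store than fixed tables: each record is valid JSON, but the set of fields differs from one record to the next.

```python
import json

# Hypothetical event records: each line is valid JSON, but the fields vary.
raw_events = [
    '{"user": "ana", "action": "login"}',
    '{"user": "bo", "action": "purchase", "item": {"sku": "A-12", "price": 9.5}}',
]


def flatten(record, prefix=""):
    """Flatten nested objects into dotted column names, e.g. item.sku."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


rows = [flatten(json.loads(line)) for line in raw_events]

# The union of all columns seen across records; a record that lacks a column
# would simply hold a NULL there once loaded into a table.
columns = sorted({column for row in rows for column in row})
```

A warehouse with native semi-structured support typically lets you query such nested fields directly, without writing a flattening step like this one by hand.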

Decoupled storage/compute: This means that what you pay to store data is separate from what you pay to run queries on it. This not only brings cost benefits but also makes cloud data warehouses more performant, with the ability to run hundreds of queries concurrently.
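The cost side of decoupling can be sketched with a toy model. The prices below are made-up numbers chosen for easy arithmetic, not the pricing of any vendor in this list, and `DecoupledPricing` is a hypothetical name. The point is only that storage and compute are billed as two independent terms.

```python
from dataclasses import dataclass


@dataclass
class DecoupledPricing:
    """Toy pricing model where storage and compute are billed separately."""

    storage_per_tb_month: float  # hypothetical price per TB stored per month
    compute_per_hour: float      # hypothetical price per hour of query compute

    def monthly_cost(self, stored_tb: float, compute_hours: float) -> float:
        storage = stored_tb * self.storage_per_tb_month
        compute = compute_hours * self.compute_per_hour
        return storage + compute


pricing = DecoupledPricing(storage_per_tb_month=20.0, compute_per_hour=2.0)

# Lots of data, rarely queried: the bill is dominated by storage.
archive_heavy = pricing.monthly_cost(stored_tb=100, compute_hours=10)

# Little data, queried constantly: the bill is dominated by compute.
query_heavy = pricing.monthly_cost(stored_tb=1, compute_hours=500)
```

With a coupled architecture, the archive-heavy workload would have to pay for compute capacity sized to its data volume even though it barely runs queries.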

Data storage benchmark and key features

| Name | Deployment | Solution | Target market | Core use case | Support for Standard SQL | Supports semi-structured data | Decoupled storage/compute |
|---|---|---|---|---|---|---|---|
| Snowflake | SaaS | Modern | Universal | Cloud data warehousing | Yes | Yes | Yes |
| Amazon Redshift | SaaS | Modern | Universal | Cloud data warehousing | Yes | No | Yes |
| Google BigQuery | SaaS | Modern | Universal | Cloud data warehousing | Yes | Yes | Yes |
| Databricks | SaaS | Modern | Enterprise | Cloud data warehousing | Yes | Yes | Yes |
| Firebolt | SaaS | Modern | Universal | Cloud data warehousing | Yes | Yes | Yes |
| ClickHouse | Open-source | Modern | Universal | Real-time analytics | Yes | No | No |
| Apache Pinot | Open-source, SaaS | Modern | Universal | Real-time analytics | Yes | Yes | No |
| Apache Druid | Open-source | Modern | Universal | Real-time analytics | Yes | Yes | No |
| Materialize | SaaS | Modern | Universal | Real-time analytics | Yes | No | No |
| Azure Synapse Analytics | SaaS | Modern | Enterprise | Cloud data warehousing | No | No | No |
| Yellowbrick | SaaS | Modern | Enterprise | Cloud data warehousing, Real-time analytics | Yes | Yes | Yes |
| SAP HANA | SaaS | Legacy | Enterprise | Cloud data warehousing | Yes | Yes | No |
| Oracle | SaaS | Legacy | Enterprise | Cloud data warehousing | Yes | Yes | No |
| IBM Db2 | SaaS | Legacy | Enterprise | Cloud data warehousing | Yes | Yes | No |
| Vertica | SaaS | Legacy | Enterprise | Cloud data warehousing | Yes | Yes | No |
| StarTree (Managed Apache Pinot) | Open-source, SaaS | Modern | Universal | Real-time analytics | Yes | Yes | No |
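If you want to shortlist tools programmatically, the benchmark can be encoded as plain records and filtered on the columns described earlier. The sketch below is only an illustration: it copies four rows from the benchmark, and the `Tool` type and `shortlist` function are hypothetical names, not part of any Castor product.

```python
from typing import NamedTuple, Optional


class Tool(NamedTuple):
    name: str
    deployment: tuple        # one or more of "SaaS", "Open-source"
    core_use_case: tuple     # one or more core use cases
    semi_structured: bool
    decoupled: bool


# A subset of the benchmark rows, transcribed by hand.
TOOLS = [
    Tool("Snowflake", ("SaaS",), ("Cloud data warehousing",), True, True),
    Tool("ClickHouse", ("Open-source",), ("Real-time analytics",), False, False),
    Tool("Apache Pinot", ("Open-source", "SaaS"), ("Real-time analytics",), True, False),
    Tool("Yellowbrick", ("SaaS",),
         ("Cloud data warehousing", "Real-time analytics"), True, True),
]


def shortlist(tools, use_case: str, deployment: Optional[str] = None):
    """Return the names of tools matching a use case and, optionally, a deployment model."""
    return [
        tool.name
        for tool in tools
        if use_case in tool.core_use_case
        and (deployment is None or deployment in tool.deployment)
    ]


real_time = shortlist(TOOLS, "Real-time analytics")
open_source_real_time = shortlist(TOOLS, "Real-time analytics", deployment="Open-source")
```

Storing deployment and core use case as tuples keeps rows such as Apache Pinot and Yellowbrick, which list two values in one cell, easy to match.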

Additional Comparison and Benchmark Resources

💡 An in-depth comparison between Snowflake, Redshift, and BigQuery

🏠 An in-depth comparison between ClickHouse, PostgreSQL, and TimescaleDB

🍷 A detailed comparison between ClickHouse, Druid, and Pinot