As companies collect more and more data, that data tends to end up scattered across many different sources. Hence the need to move data from these sources into the data warehouse.
This explains the explosion of ETL/ELT tools (internal, open-source, and SaaS) over the past few years. The trend is not going to stop, so we would rather bring visibility and structure to the market sooner than later.
At Castor, we believe the first step toward structuring the ETL tools market is more transparency. For that reason, we put together a list of all the ETL/ELT tools we have heard of. More context on this analysis and the state of the ETL tools ecosystem here.
Get started with ETL and EL-T tools
Dive deeper into ETL and EL-T tools
Processing: Does the solution offer batch processing, stream processing, or both?
Number of data sources: From how many applications can the ETL tool export data?
Observability: Does the tool let you see how your syncs are performing overall? Can you easily identify when a sync fails and why? Can you get an alert when a sync fails?
Transformation: Which kinds of transformation does the solution offer? SQL, dbt, or both?
Custom connector: Can the solution build and maintain a custom data connector upon customer request?
Modeling: How is the data warehouse queried using the solution? Is it pure SQL, or does the solution have an easy mode or no-code features such as drag and drop?
Incremental synchronization: When synchronizing the sources with the destination, can the solution synchronize only the data that has been modified since the last export, or does it send all the data in the segment each time?
Community: Is there an online community around the tool? How helpful and knowledgeable are the responses found in this community?
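To make the incremental synchronization criterion concrete, here is a minimal sketch of a cursor-based sync. It assumes every source row carries an `updated_at` timestamp and an `id` primary key; the function name `incremental_sync` and the field names are hypothetical, and real tools keep their cursor state in their own ways.

```python
from datetime import datetime


def incremental_sync(source_rows, destination, last_synced_at):
    """Copy only rows modified after last_synced_at and return the new cursor."""
    new_cursor = last_synced_at
    for row in source_rows:
        if row["updated_at"] > last_synced_at:
            # Upsert by primary key: insert new rows, overwrite changed ones.
            destination[row["id"]] = row
            if row["updated_at"] > new_cursor:
                new_cursor = row["updated_at"]
    return new_cursor


# Hypothetical source table: one row older than the cursor, one newer.
source = [
    {"id": 1, "name": "Ada", "updated_at": datetime(2023, 1, 1)},
    {"id": 2, "name": "Grace", "updated_at": datetime(2023, 1, 5)},
]
destination = {}

# Only row 2 was modified after the previous sync, so only row 2 is copied.
cursor = incremental_sync(source, destination, datetime(2023, 1, 2))
```

A full sync would simply copy every row on every run. Note that a timestamp cursor like this one cannot see rows that were hard-deleted in the source.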
Features:
Capture deletes: Data deleted in the source can still be accessed in the destination, where it is marked as "deleted".
Custom data: The solution replicates your custom data. Custom data includes custom objects, tables, and fields that you have configured in the source system to better suit your business needs.
Re-sync: Re-sync all your data from scratch.
Data blocking: Prevent some tables or columns from replicating to your destination. Make sure only relevant data is synced to the destination, and that PII remains protected.
History: The solution lets you see how your data changed over time.
Import API: Data sent to the Import API is processed and sent to your destination through the solution, like data from any other integration. You can pull in data from any REST API.
Automatic scaling: High-availability infrastructure that can process billions of records every day.
Change data capture: Change data capture (CDC) is a process that captures changes made in a database and ensures that those changes are replicated to a destination such as a data warehouse.
Failed syncs: Do you get a notification when a sync fails?
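Several of the features above (change data capture, capture deletes, data blocking) fit together naturally. The sketch below replays a stream of change events against a destination table. The event shape (`op`, `id`, `row`), the `_deleted` flag, and the blocked `email` column are all assumptions made for illustration, not the format of any particular tool.

```python
# Hypothetical PII column that must never reach the destination (data blocking).
BLOCKED_COLUMNS = {"email"}


def apply_change(destination, event):
    """Apply one change event to the destination table (a dict keyed by id)."""
    key = event["id"]
    if event["op"] == "delete":
        # Capture deletes: keep the row, but flag it as deleted.
        if key in destination:
            destination[key]["_deleted"] = True
        return
    # Inserts and updates: drop blocked columns, then upsert the row.
    row = {k: v for k, v in event["row"].items() if k not in BLOCKED_COLUMNS}
    row["_deleted"] = False
    destination[key] = row


# Hypothetical change stream captured from a source database.
events = [
    {"op": "insert", "id": 1, "row": {"id": 1, "name": "Ada", "email": "ada@example.com"}},
    {"op": "update", "id": 1, "row": {"id": 1, "name": "Ada L.", "email": "ada@example.com"}},
    {"op": "delete", "id": 1},
]

warehouse = {}
for event in events:
    apply_change(warehouse, event)
```

After replaying the three events, the row is still in `warehouse` with its latest name, flagged as deleted, and without the blocked `email` column.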
Name | Deployment | Pricing | Process used | Optimized for | Processing | Out-of-the-box connectors | Transformations | Features | Security | Setup time |
---|---|---|---|---|---|---|---|---|---|---|
Fivetran | Proprietary, Cloud-based | Volume-based | ELT | Extraction, Loading | Batch, Stream/real-time | 150+, Create custom/REST API | dbt, SQL | History, Capture deletes, Data blocking | GDPR compliant, HIPAA compliant | 5 min - 2 hrs |
Stitch | Proprietary | Volume-based | ELT, ETL | Transformation, Loading, Extraction | Batch, Stream/real-time | 130+ | SQL, Python, Java, GUI | Import API, Automatic scaling, Support for paying customers, Extract data from any source using Singer, Easy scheduling, Enterprise, No code | GDPR compliant, HIPAA compliant, SOC2 compliant | |
Airbyte | Open-source | To be announced | ELT | Extraction, Loading | Batch, Stream/real-time | 110+, Create custom/REST API | dbt | History, Support within 2-3 hrs, Build a new connector in under two hours, Normalized schemas, Easy scheduling | | |
Xplenty | Proprietary, Cloud-based | Connector-based | ETL, ELT | Extraction, Loading, Transformation | Stream/real-time, Batch | 140+ | GUI, No-code | Import API | SOC2 compliant, GDPR compliant, HIPAA compliant, CCPA | |
Popsink | SaaS | | | | | 24 | | | | |
Rivery | Proprietary, SaaS | Volume-based | ETL, ELT | Extraction, Loading, Transformation | Stream/real-time | 150+, Create custom/REST API | SQL | Change data capture | GDPR compliant, HIPAA compliant, SOC2 compliant | |
Alooma | Proprietary | Contact only | ETL, ELT | Extraction, Loading, Transformation | Stream/real-time, Batch | | GUI, No-code | Automatic scaling, Data visualization | SOC2 compliant, HIPAA compliant, GDPR compliant, OAuth 2.0 | 10 min |
Improvado | Proprietary | Contact only | ETL | Extraction, Loading, Transformation | | 200+ | GUI, No-code | Import API, Normalized schemas | | |
Segment | SaaS, Proprietary | Company size | ETL | Extraction, Loading | Batch, Stream/real-time | 40+ | GUI, No-code | | | |
Rudderstack | SaaS, Open-source, Enterprise VPC | Volume-based | ETL | Extraction, Loading, Transformation | Batch, Stream/real-time | 150+, Create custom | JavaScript | Event stream, Transformations, Reverse ETL, Event replay, SSO, Single tenant, Grafana dashboards | SOC2 compliant, GDPR compliant, HIPAA compliant, CCPA compliant | Minutes |
Portable | Proprietary, Cloud-based | Connector-based | ELT | Extraction, Loading | Batch | 250+ | N/A | History, Capture deletes, Support for paying customers, Automatic scaling, Enterprise, No code | | 5-30 min |
Meltano | Open-source | Free | ELT | Extraction, Loading | Batch, Stream/real-time | 30+ | dbt | | | Minutes |
Hevo | Proprietary | Events-based | ELT | Extraction, Loading, Transformation | Stream/real-time | 100+ | SQL | Automatic scaling, 24/7 live support | GDPR compliant, SOC2 compliant, HIPAA compliant | Minutes |
Meroxa | PaaS, Open-source | Events-based | ELT | Extraction, Loading, Transformation | Stream/real-time | 10+ | | Change data capture | | |
Panoply | SaaS, Proprietary | Volume-based | ELT | | | 50+ | | | GDPR compliant, SOC2 compliant | |
Matillion | Proprietary | Contact only | ETL | Extraction, Loading, Transformation | Batch | 100+ | GUI, No-code | Automatic scaling | | Few days |
Singer | Open-source | Free | ETL | | Batch | 100+ | | | | |
Apache Camel | Open-source | Free | | | Stream/real-time | | | | | |
Logstash | Open-source | Free | ETL, ELT | | Stream/real-time | 200+ | | | | |
Safe | | | ETL | Extraction, Loading, Transformation | Batch | 450+ | No-code | | | |
IBM DataStage | Proprietary | Contact only | ETL | Extraction, Transformation, Loading | Batch | | | | | |
Informatica | Proprietary | Contact only | ETL | Extraction, Loading, Transformation | Batch | | | | HIPAA compliant, SOC2 compliant, SOC 3 compliant | |
Talend | Open-source, On-premises | | | | | | | | | |
Mulesoft | Proprietary | | | | | | | | | |
PipelineWise | Proprietary | | | | | | | Integrations with Singer | | |
Oracle | Proprietary | | | | | | | | | |
Pentaho | Proprietary | | | | | | | | | |
Skyvia | Proprietary | | | | | | | | | |
SAP Data Services | Proprietary | | | | | | | | | |
FlyData | | Volume-based | ETL | | Stream/real-time | | | | GDPR compliant | 30 min |
*This is a brief attempt at classifying the tools on the market. If anything seems wrong, if the feature list seems off, or if you don't see your ETL tool and want it added, please reach out: louise@castordoc.com