To be honest, I wouldn’t blame you for reading the title twice in order to spot the difference. You did so, didn’t you?
Before we answer this question, let’s have a quick look at some base concepts first.
The base concepts
ETL or ELT describes the process of moving data from one or multiple source systems into a (centralised) target data system. That central data system enables users, other processes, or applications to reuse the combined data for reporting, analysis, and mining, or to feed other applications that require data.
Both acronyms describe the same process and the same three operations:

E = Extract / T = Transform / L = Load

The order of the letters determines the sequence in which the operations occur.
ETL
- Extract the data from the source system(s).
- Transform the data.
- Load the data into the target system.
ELT, in turn, performs the exact same operations, but in a different order.
ELT
- Extract the data.
- Load the data into the target system.
- Transform the data.
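The difference in ordering can be sketched in a few lines of Python. This is only an illustration: `extract`, `transform`, and `load` are hypothetical stand-ins for real pipeline steps, not part of any actual tool.

```python
# Minimal sketch of the ETL vs ELT step ordering.
# extract/transform/load are invented stand-ins for real pipeline steps.

def extract():
    """Pull raw rows from a source system."""
    return [{"city": "Antwerp", "sales": 120}, {"city": "Ghent", "sales": 80}]

def transform(rows):
    """Example transformation: keep only rows above a sales threshold."""
    return [r for r in rows if r["sales"] >= 100]

def load(rows, target):
    """Append rows to the (in-memory) target system."""
    target.extend(rows)
    return target

# ETL: the transformation happens *before* the target sees the data.
etl_target = load(transform(extract()), [])

# ELT: the target receives every raw row; transformation happens afterwards.
elt_target = load(extract(), [])
elt_view = transform(elt_target)

print(len(etl_target))  # 1 — the ETL target only holds the filtered data
print(len(elt_target))  # 2 — the ELT target kept every raw row
```

Note how, in the ELT case, the raw rows remain available in the target even after the transformed view is produced.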
ETL has been the standard and proven way of centralising data over the last few decades.
So why change a winning team?
In the ETL process, the Transform step comes before the Load step. That means the owner of the process must decide upfront which transformations will happen: the data is already prepared, cleansed, filtered, enriched, or aggregated before being loaded into the centralised data system.
The main reasons for doing these transformation steps upfront are (1) the storage capacity and (2) the processing power of the ETL engine. They allowed us to populate our centralised data system in an optimised format and within an acceptable timeframe.
Nevertheless, this sensible approach comes with a downside. Because we reduce the data before storing it in our centralised data system, we potentially lose data and information along the way. Data that seems insignificant now can be critical in tomorrow's analysis.
With great data comes great analytics
During the past decade, we’ve seen data analytics move from standard reporting into ad-hoc analytics. Data is no longer used to feed predefined reports but is consumed by a variety of tools, technologies, and users. Scorecards, dashboards, metrics, self-service analytics, predictive analytics and so on… they all want a piece of your data and they all want it in their own unpredictable way.
So how on earth will we be able to decide what selection of data to store, what kind of enrichment to apply, or what our transformations will look like? We simply can't.
Enter the cloud
But what if we don't have to make that decision upfront? What if we don't have to worry about the storage capacity or the processing power needed to consume this massive and diverse pool of data? What if we can just push all of our data into that centralised data system and worry about the consumption later?
Cloud data systems are the answer to today's biggest data challenges: volume capacity and processing power. Because of their cloud nature and scalability, both become virtually unlimited, making them perhaps the only affordable option left in today's data landscape.
So instead of bargaining over which data makes it into the centralised data system and which doesn't, we try to store as much as possible. More data means better insights, and better insights mean better decisions.
Enter ELT
Without being aware of it, we are already applying ELT: we Extract and Load the data first, and transform it later depending on its purpose and consumer(s).
Example of ELT visualisation
Besides that critical aspect of data completeness, ELT comes with a few other advantages.
- Minimal impact on the Extract and Load steps. As we don't need to worry about any transformations, we can grab the data in its raw form and push it directly to the centralised data system. Whenever a new source appears, we set up a separate flow without impacting the existing ones.
- The same data can be used for different purposes without replication. The same data in our centralised data system can serve both an aggregated view for high-performance reporting and its most granular form for predictive modelling.
Other advantages:
- Use the power of the centralised data system to transform the data.
- Faster loading into the central data system, since the transformation step is skipped.
- Loading and transformation are decoupled.
- Failed transformations do not break the data pipeline.
- …
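To make this decoupling concrete, here is a minimal sketch of an ELT-style flow, using SQLite as a stand-in for the centralised (cloud) data system; the table and column names are invented for illustration. The raw data is loaded untouched, and the aggregation happens later, inside the target, using its own SQL engine.

```python
import sqlite3

# SQLite stands in here for the centralised (cloud) data system.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_sales (city TEXT, amount REAL)")

# E + L: push the data in its raw form, with no transformation yet.
rows = [("Antwerp", 120.0), ("Antwerp", 40.0), ("Ghent", 80.0)]
con.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)

# T: later, use the target system's own engine to transform.
# An aggregated view serves high-performance reporting...
con.execute("""
    CREATE VIEW sales_by_city AS
    SELECT city, SUM(amount) AS total_sales
    FROM raw_sales
    GROUP BY city
""")

# ...while the same raw table stays available, untouched and granular,
# for other consumers such as predictive modelling.
print(con.execute("SELECT * FROM sales_by_city ORDER BY city").fetchall())
# → [('Antwerp', 160.0), ('Ghent', 80.0)]
```

Because the view is computed inside the target system, adding or changing a transformation never touches the Extract and Load flow, and a failed transformation leaves the raw data intact.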
Now, is ELT better than ETL?
To cycle back to the question that started this data journey: is ELT better than ETL?
I think it is fair to conclude that ELT is the approach to moving data into a centralised data system that fits today's data and analytics challenges more closely, and that makes optimal use of the technical solutions out there. Want to learn more about ELT? Don't hesitate to check out Fivetran.
Refocus your time with the help of Fivetran.
Whilst data pipelining is an important process in any Modern Data Stack, in isolation it produces zero value.
The value your data holds only surfaces through analytics and insight. By cutting down the time and effort required to operate pipelines, you can reallocate that capacity to the things that matter most to your organisation: seeing and understanding your data.
We'll gladly show you how Fivetran will save you time and resources, make your data engineer's life easier and finally provide a stable and scalable solution to data pipelining. You can always start a free trial, just get in touch with us and we'll hook you up!
Bjorn Cornelis
Consultancy Director
Biztory