LS LOGICIEL SOLUTIONS

ETL to ELT Migration: Real Examples & Use Cases

Definition

ETL to ELT migration is the shift from transforming data before it lands in the warehouse to transforming it after it lands. In the traditional ETL model (extract, transform, load), you pulled data from sources, reshaped it in a separate processing layer, and loaded only the clean, final result into the warehouse. In ELT (extract, load, transform), you load the raw data into the warehouse first and transform it there using the warehouse's own compute. Only the order of the letters changes, but the consequences run deep: cost, tooling, team structure, and how trustworthy your data feels all shift.
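To make the difference concrete, here is a minimal Python sketch that uses the standard library's in-memory SQLite database as a stand-in for a cloud warehouse. The source rows, table names, and cleaning rules are hypothetical. The ETL function reshapes rows in application code and loads only the result; the ELT function loads the rows untouched and does the same reshaping in SQL, so the raw table is still there afterward.

```python
import sqlite3

# Hypothetical rows as they might arrive from a source system (amounts in cents).
SOURCE_ROWS = [
    ("o1", "  Alice ", 1250),
    ("o2", "BOB", 990),
    ("o3", "carol", 4000),
]


def run_etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code first, then load only the cleaned result."""
    cleaned = [(oid, name.strip().lower(), cents / 100) for oid, name, cents in SOURCE_ROWS]
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows as-is, then transform with SQL inside the database."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", SOURCE_ROWS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               lower(trim(customer)) AS customer,
               amount_cents / 100.0 AS amount
        FROM raw_orders
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    run_elt(conn)
    print(conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall())
    # Only the ELT path leaves the untouched source rows behind in raw_orders.
    print(conn.execute("SELECT * FROM raw_orders ORDER BY order_id").fetchall())
```

Both paths produce the same cleaned table here; the difference is where the work runs and whether the raw rows survive. In a real stack, the `CREATE TABLE ... AS SELECT` in the ELT path is roughly the kind of statement a dbt table model runs in the warehouse.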

The change was driven by the economics of cloud data warehouses. ETL existed because warehouses used to be expensive and limited, so you did not want to waste their capacity on heavy transformation or fill them with raw data you might not need. You transformed elsewhere, on cheaper infrastructure, and loaded only the polished result. Cloud warehouses like Snowflake, BigQuery, and Databricks flipped that. They are powerful, they scale on demand, and storage is cheap, so it became reasonable to load everything raw and transform inside the warehouse where the compute is.

By 2026 ELT is the default for new analytics stacks, and the tooling reflects it. Ingestion tools like Fivetran, Airbyte, and Stitch focus purely on extract and load, getting raw data into the warehouse with minimal fuss. Transformation happens separately, most commonly with dbt, which runs SQL transformations inside the warehouse and treats them as version-controlled, tested code. The clean separation between getting data in and shaping it once it is in is the structural signature of the ELT approach.

What teams underestimate is that this is not just a technical swap; it is a change in where complexity lives and who owns it. ETL concentrated transformation logic in a specialized tool managed by data engineers. ELT moves it into SQL in the warehouse, which opens it up to analysts and changes the team dynamics, the cost profile, and the governance challenges. A migration done as a pure tooling replacement, without rethinking ownership and cost, tends to recreate the old problems in a new tool.

This page covers why teams migrate, what genuinely changes, the costs and traps that surprise people, and how to move from ETL to ELT without losing the trust in your data that took years to build. The specific tools keep shifting. The underlying trade, load first and transform in the warehouse instead of before it, is the durable idea.

Key Takeaways

  • ETL transforms data before loading; ELT loads raw data into the warehouse and transforms it there using the warehouse's compute.
  • The shift was driven by cheap cloud warehouse storage and on-demand compute, which made loading everything raw and transforming in place practical.
  • ELT separates ingestion (Fivetran, Airbyte) from transformation (dbt), which changes tooling, cost structure, and who owns the logic.
  • Keeping raw data in the warehouse is a major benefit: you can re-transform without re-extracting and audit back to source.
  • The main risks are warehouse compute cost, transformation sprawl, and governance, all of which need deliberate management rather than hoping the tool handles them.

Why Teams Migrate

The headline reason is flexibility from keeping raw data. In ETL, you decide upfront what transformation to apply, and the raw data is gone once you have loaded the transformed result. If you later need a different shape, or discover a bug in the transformation, you have to re-extract from the source, which may be slow, rate-limited, or no longer hold the historical data. In ELT, the raw data sits in the warehouse, so you can re-transform it any way you need without touching the source again. This single property removes a whole class of painful situations.
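The re-transformation benefit can be sketched the same way, again with SQLite standing in for the warehouse and with hypothetical table names and data. The first version of a transformation has a bug (it groups currency codes case-sensitively); because the raw events are still in the database, the fix is simply to rebuild the derived table from them, with no second trip to the source.

```python
import sqlite3
from typing import Dict


def load_raw(conn: sqlite3.Connection) -> None:
    """Land hypothetical raw events exactly as the source sent them."""
    conn.execute("CREATE TABLE raw_events (event_id INTEGER, amount_cents INTEGER, currency TEXT)")
    conn.executemany(
        "INSERT INTO raw_events VALUES (?, ?, ?)",
        [(1, 500, "usd"), (2, 700, "USD"), (3, 300, "eur")],
    )


def rebuild_revenue(conn: sqlite3.Connection, select_sql: str) -> Dict[str, int]:
    """Drop and rebuild the derived table from raw data; the source is never re-queried."""
    conn.execute("DROP TABLE IF EXISTS revenue_by_currency")
    # select_sql is a trusted constant defined below, never user input.
    conn.execute(f"CREATE TABLE revenue_by_currency AS {select_sql}")
    return dict(conn.execute("SELECT currency, total FROM revenue_by_currency"))


# First version: treats "usd" and "USD" as different currencies (the bug).
BUGGY_SQL = "SELECT currency, SUM(amount_cents) AS total FROM raw_events GROUP BY currency"
# Fixed version: normalizes the currency code before grouping.
FIXED_SQL = (
    "SELECT upper(currency) AS currency, SUM(amount_cents) AS total "
    "FROM raw_events GROUP BY upper(currency)"
)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_raw(conn)
    print(rebuild_revenue(conn, BUGGY_SQL))
    print(rebuild_revenue(conn, FIXED_SQL))
```

In ETL, the equivalent fix would mean re-extracting the events from the source, assuming the source still holds them.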

The second driver is using the warehouse's compute instead of a separate transformation layer. Cloud warehouses are very good at large-scale SQL processing, and pushing transformation into them means you are not maintaining and scaling a separate processing system. The warehouse scales on demand for the transformation work the same way it does for queries. This consolidates infrastructure and lets the team work in SQL, which far more people know than the proprietary languages of legacy ETL tools.

Speed of development is a real and often underrated reason. ETL pipelines built in heavyweight tools tend to be slow to change, requiring specialized skills and careful coordination. ELT transformations written as SQL in dbt can be developed, tested, and deployed quickly by anyone who knows SQL, with version control and testing baked in. Teams migrate partly because the iteration speed on new data products goes up substantially, which matters when the business keeps asking for new questions to be answered.

The ecosystem pull reinforces all of this. Because ELT is now the default, the best new tooling, the most active communities, and the largest hiring pool all assume the ELT pattern. Staying on a legacy ETL stack increasingly means working against the grain of where the industry's tooling and talent are. Teams migrate not only for the direct benefits but because the modern data stack is built around ELT and fighting that costs more every year.

What Actually Changes

The most concrete change is that transformation logic moves from a specialized ETL tool into SQL in the warehouse. Logic that used to live in a visual pipeline tool or proprietary scripting now lives in version-controlled SQL files. This is generally a gain: the logic becomes readable, testable, and reviewable like any other code. But it means rewriting existing transformations rather than porting them, because the paradigm is different. The migration is partly a rewrite, and pretending otherwise sets up a bad estimate.

Ownership shifts, sometimes uncomfortably. ETL transformation was usually owned by data engineers who managed the specialized tool. ELT transformation in SQL is accessible to analysts and analytics engineers, which broadens who can contribute. This is the intent of the analytics engineering movement, but it also means deciding who owns what, how changes are reviewed, and how to keep quality up when more people can touch the logic. Migrations that ignore this end up with either a bottleneck on the old engineers or a free-for-all where everyone writes overlapping transformations.

The cost profile changes shape. ETL costs lived in the separate transformation infrastructure; ELT costs move into warehouse compute, where every transformation run consumes resources that show up on the warehouse bill. This is fine, and often cheaper overall, but the cost is visible and it scales with how much transformation you do and how efficiently you do it. Teams that move to ELT without watching warehouse compute can be surprised by a bill that climbs as transformations multiply, because what used to be the fixed cost of a separate system is now usage-based and easy to grow.
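Watching which transformations drive the bill can start very simply. The sketch below assumes you can export a run log of (model name, elapsed seconds) pairs and that a flat hourly compute rate is a fair approximation; both the log format and the rate are hypothetical. Real warehouse billing varies by platform (some charge for the time compute is provisioned, others for data scanned), so treat per-model figures like these as a rough ranking, not an invoice.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def cost_by_model(runs: Iterable[Tuple[str, float]], cost_per_hour: float) -> List[Tuple[str, float]]:
    """Attribute approximate compute cost to each model, most expensive first."""
    totals: Dict[str, float] = defaultdict(float)
    for model, elapsed_seconds in runs:
        # Sum across every run: a cheap model that runs constantly can outspend one big nightly job.
        totals[model] += elapsed_seconds / 3600 * cost_per_hour
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical run log: (model name, elapsed seconds for one run).
    runs = [("fct_orders", 1800), ("stg_users", 60), ("fct_orders", 1800), ("dim_customers", 300)]
    for model, cost in cost_by_model(runs, cost_per_hour=4.0):
        print(f"{model}: {cost:.2f}")
```

Summing per model rather than looking at single runs is the point of the design: frequency of runs matters as much as the cost of each one.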

Data freshness and scheduling get rethought. ETL pipelines often ran on rigid batch schedules tied to the transformation infrastructure. ELT decouples ingestion from transformation, so you can load continuously and transform on a separate cadence, which gives more flexibility but requires deciding how fresh each data product needs to be and orchestrating the transformation runs accordingly. The separation is powerful but it is one more thing to design rather than something the tool decides for you.
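One way to make "how fresh does each product need to be" an explicit decision, rather than a side effect of the schedule, is to record a staleness budget per data product and check it. The `DataProduct` fields, product names, and thresholds below are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class DataProduct:
    name: str
    max_staleness: timedelta       # hypothetical freshness target for this product
    last_transformed_at: datetime  # when its transformation last finished successfully


def stale_products(products: List[DataProduct], now: datetime) -> List[str]:
    """Return the names of products whose latest transformation is older than allowed."""
    return [p.name for p in products if now - p.last_transformed_at > p.max_staleness]


if __name__ == "__main__":
    now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
    products = [
        DataProduct("finance_daily", timedelta(hours=24),
                    datetime(2026, 1, 1, 0, 0, tzinfo=timezone.utc)),
        DataProduct("ops_dashboard", timedelta(minutes=15),
                    datetime(2026, 1, 1, 11, 30, tzinfo=timezone.utc)),
    ]
    print(stale_products(products, now))
```

An orchestrator could use the same comparison to decide which transformations to trigger, so a dashboard with a 15-minute target is rebuilt often while a daily finance product is not.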

The Costs and Traps

Warehouse compute cost is the trap people hit first. Every dbt run, every transformation, every rebuild consumes warehouse compute, and as the number of models grows, so does the cost of running them. A team that does not monitor this can watch the warehouse bill climb steadily as transformations accumulate and run more often than they need to. The fix is the same discipline as any cost management: incremental models that only process new data instead of rebuilding everything, sensible scheduling, and watching which transformations are expensive. ELT does not make cost go away; it relocates it somewhere very easy to grow.
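The incremental idea is easiest to see in a stripped-down form. This sketch, again using SQLite as a stand-in with hypothetical table and column names, keeps a high-water mark: it looks up the latest `loaded_at` already in the target and transforms only raw rows newer than that. dbt's incremental materialization is the usual production tool for this pattern; the code below is not how dbt implements it, just the core logic. It assumes `loaded_at` is an ISO-8601 string that only increases, so string comparison orders it correctly; late-arriving rows with older timestamps would be skipped, which real incremental models typically handle with a lookback window or a merge on a unique key.

```python
import sqlite3


def incremental_refresh(conn: sqlite3.Connection) -> int:
    """Append only raw rows newer than the target's high-water mark; return rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders_clean "
        "(order_id INTEGER PRIMARY KEY, loaded_at TEXT, amount REAL)"
    )
    # Newest loaded_at already processed; '' sorts before any ISO-8601 timestamp.
    (high_water,) = conn.execute(
        "SELECT COALESCE(MAX(loaded_at), '') FROM orders_clean"
    ).fetchone()
    cur = conn.execute(
        """
        INSERT INTO orders_clean (order_id, loaded_at, amount)
        SELECT order_id, loaded_at, amount_cents / 100.0
        FROM raw_orders
        WHERE loaded_at > ?
        """,
        (high_water,),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (order_id INTEGER, loaded_at TEXT, amount_cents INTEGER)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, "2026-01-01T00:00:00", 1000), (2, "2026-01-01T01:00:00", 2500)],
    )
    print(incremental_refresh(conn))  # first run processes both rows
    conn.execute("INSERT INTO raw_orders VALUES (3, '2026-01-01T02:00:00', 400)")
    print(incremental_refresh(conn))  # second run processes only the new row
```

Each run's work, and therefore its compute cost, tracks the volume of new data rather than the full size of the table.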

Transformation sprawl is the second trap, and it follows directly from making transformation accessible. When anyone who knows SQL can add a model, the number of models grows fast, and without discipline you get hundreds of overlapping transformations, duplicated logic, and tables nobody is sure they can delete. The same openness that makes ELT productive makes it prone to mess. The cure is the modeling discipline of treating transformations as reviewed, tested, documented code with clear ownership, not letting the low barrier to adding a model turn into a low barrier to adding chaos.
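Review and conventions do most of the work against sprawl, but a small automated check can back them up. The sketch below assumes model metadata is available as simple records; in a dbt project it would come from the model SQL and its YAML properties, but the `Model` structure here is hypothetical. It flags models with no owner, description, or tests, plus models whose SQL is identical to another's once whitespace and case are normalized. It only catches exact copies; spotting logic that is merely similar would need something smarter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Model:
    name: str
    sql: str
    owner: Optional[str] = None
    description: str = ""
    tests: List[str] = field(default_factory=list)


def lint_models(models: List[Model]) -> List[str]:
    """Flag models missing ownership, documentation, or tests, and exact SQL duplicates."""
    problems: List[str] = []
    first_seen: Dict[str, str] = {}  # normalized SQL -> first model that used it
    for m in models:
        if not m.owner:
            problems.append(f"{m.name}: no owner")
        if not m.description:
            problems.append(f"{m.name}: no description")
        if not m.tests:
            problems.append(f"{m.name}: no tests")
        # Collapse whitespace and case so trivially reformatted copies still match.
        normalized = " ".join(m.sql.lower().split())
        if normalized in first_seen:
            problems.append(f"{m.name}: same SQL as {first_seen[normalized]}")
        else:
            first_seen[normalized] = m.name
    return problems
```

Run as a required check on pull requests, a lint like this could keep new models from merging without an owner and at least one test.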

Governance gets harder before it gets better. In ETL, the small number of people who managed the transformation tool were a natural control point. In ELT, with more contributors and raw data sitting in the warehouse, you have to think deliberately about who can access what, how sensitive raw data is protected, and how you maintain a single source of truth for key metrics. Loading raw data first means sensitive data lands in the warehouse before any transformation can mask or filter it, which is a real privacy and security consideration that the ETL model handled implicitly.
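One common mitigation is to keep raw tables locked down and expose sensitive columns to most users only through a masked view. The sketch below shows the shape of that with SQLite and a Python function registered as a SQL function; the table, column, and view names are hypothetical, and cloud warehouses provide their own masking and column-level access features that would replace this in practice. The view only helps if access to `raw_customers` itself is restricted, which SQLite cannot express but warehouse grants can. Note also that an unsalted hash of an email is pseudonymization, not anonymization: anyone who can guess an address can hash it and match it.

```python
import hashlib
import sqlite3
from typing import Optional


def sha256_hex(value: Optional[str]) -> Optional[str]:
    """Deterministic SHA-256 digest, so masked values can still be joined and counted."""
    return None if value is None else hashlib.sha256(value.encode("utf-8")).hexdigest()


def create_masked_view(conn: sqlite3.Connection) -> None:
    # Register the Python function so SQL on this connection can call it.
    conn.create_function("sha256_hex", 1, sha256_hex, deterministic=True)
    conn.execute(
        """
        CREATE VIEW IF NOT EXISTS customers_masked AS
        SELECT customer_id,
               sha256_hex(lower(email)) AS email_hash,
               country
        FROM raw_customers
        """
    )
```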

The migration itself is a trap if treated as a big-bang cutover. Replacing an entire ETL system at once, while the business depends on its outputs daily, is high risk: if the new pipeline produces different numbers, trust evaporates and people revert to the old reports. The safer path is incremental: run old and new in parallel, validate that the new ELT outputs match the old ETL outputs for each data product, and cut over one piece at a time once each is verified. Skipping the parallel-run validation is how migrations quietly corrupt numbers and lose the team's confidence in the warehouse.

Migrating Without Losing Trust

Trust in data is fragile and slow to rebuild, so the migration has to protect it deliberately. The single most important practice is parallel running with reconciliation: keep the ETL pipeline producing its outputs while you build the ELT equivalent, and compare the two outputs for each data product until they match or until you understand and accept every difference. Only then do you cut over. This is slower and less satisfying than a clean switch, but it is the difference between a migration people trust and one that triggers a quiet exodus to spreadsheets.
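The reconciliation step can be sketched as a comparison of the two pipelines' outputs for one data product, keyed on the product's grain. The function below is a hypothetical helper, not part of any tool: it reports keys missing from either side and any column values that differ, with a small tolerance for floating-point noise in numeric columns. It only compares columns present in the old output, so columns the new pipeline adds are ignored.

```python
from typing import Any, Dict, List


def reconcile(old_rows: List[Dict[str, Any]], new_rows: List[Dict[str, Any]],
              key: str, tolerance: float = 0.0) -> Dict[str, list]:
    """Compare legacy ETL output with new ELT output for one data product."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    report: Dict[str, list] = {
        "missing_in_new": sorted(old.keys() - new.keys()),
        "extra_in_new": sorted(new.keys() - old.keys()),
        "mismatched": [],
    }
    for k in sorted(old.keys() & new.keys()):
        for col, old_val in old[k].items():
            new_val = new[k].get(col)
            numeric = isinstance(old_val, (int, float)) and isinstance(new_val, (int, float))
            # Numbers may differ by rounding; everything else must match exactly.
            differs = abs(old_val - new_val) > tolerance if numeric else old_val != new_val
            if differs:
                report["mismatched"].append((k, col, old_val, new_val))
    return report
```

An empty report for a product, repeated over several runs, is a reasonable bar for cutting that product over; every non-empty entry is a difference to explain and then either fix or explicitly accept.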

Migrate by value and risk, not all at once. Start with a data product that is important enough to matter but contained enough to validate cleanly, prove the new approach there, and build confidence and patterns before tackling the gnarlier pipelines. Trying to move everything simultaneously maximizes risk and means that if anything goes wrong, everything is in question at once. An incremental migration lets you learn, build reusable patterns, and limit the blast radius of any mistake to one data product at a time.

Bring the institutional knowledge along. Legacy ETL pipelines usually encode years of accumulated business logic, edge-case handling, and quiet fixes that are not documented anywhere except the pipeline itself. A rewrite that does not capture that knowledge will reproduce the bugs the old pipeline learned to avoid. Before rewriting a transformation, understand what it actually does and why, including the weird parts, so the new version preserves the hard-won correctness rather than starting the learning over. The undocumented edge cases are exactly where rewrites go wrong.
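A lightweight way to carry that knowledge forward is to write down the edge cases you find in the old pipeline as input and expected-output pairs, taken from what the legacy pipeline actually produced, and check the rewrite against them. Everything below is hypothetical: the record fields, the rules (refunds net to zero, test orders excluded, a missing amount treated as zero), and the function names stand in for whatever your own legacy logic turns out to do. In a dbt project the same cases could be expressed as tests on the rewritten model.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]

# Hypothetical edge cases captured from the legacy pipeline's actual output:
# (input record, value the legacy pipeline produced for it).
LEGACY_CASES: List[Tuple[Record, float]] = [
    ({"amount": 100.0, "status": "complete"}, 100.0),
    ({"amount": 100.0, "status": "refunded"}, 0.0),                   # refunds net to zero
    ({"amount": 100.0, "status": "complete", "is_test": True}, 0.0),  # test orders excluded
    ({"amount": None, "status": "complete"}, 0.0),                    # missing amount -> zero
]


def new_net_revenue(order: Record) -> float:
    """Rewritten transformation that must reproduce the legacy behavior above."""
    if order.get("is_test") or order.get("status") == "refunded":
        return 0.0
    return float(order.get("amount") or 0.0)


def characterization_failures(fn: Callable[[Record], float],
                              cases: List[Tuple[Record, float]]) -> List[Tuple[Record, float, float]]:
    """Return (input, legacy value, new value) for every case the rewrite gets wrong."""
    failures = []
    for record, expected in cases:
        actual = fn(record)
        if actual != expected:
            failures.append((record, expected, actual))
    return failures
```

A naive rewrite that just sums `amount` would fail the refund and test-order cases, which is exactly the kind of quiet fix these cases exist to protect.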

Set up the cost and quality guardrails before you scale, not after. Establish the conventions (incremental models, testing, documentation, ownership, and cost monitoring) while the number of transformations is small and habits are easy to set. Retrofitting discipline onto a sprawling pile of transformations after the fact is far harder than building it in from the start. The teams that migrate well treat the migration as the moment to establish good practice, because the ELT stack will faithfully scale whatever habits you start with, good or bad.

The Tooling Stack That Replaces ETL

The modern ELT stack splits into clear layers, which is itself part of the appeal. Ingestion tools like Fivetran, Airbyte, and Stitch handle only extract and load: they connect to your sources, pull the data, and land it raw in the warehouse, with managed connectors that handle the tedious work of source APIs and schema changes. By doing one job well, they replace the extract-and-load portion of a monolithic ETL tool with something simpler and more reliable, and they free you from maintaining brittle custom extraction code.

The warehouse itself is the engine, and the choice among Snowflake, BigQuery, Databricks, and their competitors shapes the rest of the stack. These platforms provide the cheap storage that makes loading raw data viable and the on-demand compute that makes in-warehouse transformation practical. The whole ELT pattern depends on the warehouse being powerful and elastic enough to absorb both the raw data and the transformation work, which is exactly the capability cloud warehouses brought and older systems lacked.

Transformation is where dbt dominates, turning the transform step into version-controlled SQL with testing, documentation, and lineage. This is the layer that most directly replaces the transformation logic that used to live in the proprietary ETL tool, and it is the piece that brings software engineering discipline to data transformation. The combination of a managed ingestion tool plus dbt on a cloud warehouse is the canonical modern data stack, and it is what most teams migrating from ETL are migrating to.

Orchestration ties the pieces together and is the layer teams most often underestimate. Something has to decide when ingestion runs, when transformations run, in what order, and what happens when a step fails, and tools like Airflow, Dagster, and Prefect, or the scheduling built into the other tools, handle this. In a monolithic ETL tool, orchestration was baked in; in the unbundled ELT stack it becomes a separate concern you have to design. Getting orchestration right, with sensible dependencies, retries, and alerting, is part of what makes the assembled stack as dependable as the integrated tool it replaced.
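The core of what an orchestrator adds can be sketched in a few lines: run steps in dependency order, retry transient failures, and stop before downstream steps run on bad inputs. This uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the task names and retry settings are hypothetical, and real orchestrators such as Airflow, Dagster, and Prefect add scheduling, parallelism, run history, and alerting on top of this.

```python
import time
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 deps: Dict[str, Set[str]],
                 max_retries: int = 2,
                 backoff_seconds: float = 0.0) -> List[str]:
    """Run tasks in dependency order; retry failures, then stop so downstream steps never run."""
    completed: List[str] = []
    # deps maps each task to the set of tasks that must finish before it.
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    for name in TopologicalSorter(deps).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                if attempt == max_retries:
                    # A real orchestrator would send an alert here.
                    raise RuntimeError(f"{name} failed after {attempt + 1} attempts") from exc
                time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff
    return completed
```

Raising instead of continuing is the deliberate choice: a transformation that runs on top of a failed ingestion produces numbers that look fine and are wrong.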

Best Practices

  • Run the old ETL and new ELT pipelines in parallel and reconcile their outputs before cutting over, to protect trust in the data.
  • Migrate one data product at a time by value and risk, rather than attempting a big-bang cutover of the whole system.
  • Monitor and control warehouse compute from the start, using incremental models and sensible scheduling, since ELT relocates cost into easy-to-grow warehouse usage.
  • Treat transformations as reviewed, tested, documented code with clear ownership to prevent the sprawl that ELT's low barrier invites.
  • Capture the undocumented business logic in legacy pipelines before rewriting, so the new version preserves hard-won correctness.

Common Misconceptions

  • "ELT is just ETL with the steps reordered." In reality, it changes cost structure, ownership, governance, and tooling in ways that go well beyond the order of operations.
  • "Migrating to ELT is mainly a tooling swap." It is partly a rewrite and a change in who owns transformation logic.
  • "ELT is automatically cheaper." It relocates cost into warehouse compute, which is easy to grow and needs active management.
  • "Loading raw data first has no downside." It puts sensitive data in the warehouse before any masking, which is a real governance consideration.
  • "You can cut over from ETL to ELT in one move." Safe migrations run in parallel and validate one data product at a time.

Frequently Asked Questions (FAQs)

What is the actual difference between ETL and ELT?

In ETL you transform data in a separate processing layer before loading the clean result into the warehouse. In ELT you load the raw data into the warehouse first and transform it there using the warehouse's own compute. The practical consequences are that ELT keeps your raw data available for re-transformation, uses the warehouse for processing instead of a separate system, and moves transformation logic into SQL that more people can work with.

Why has ELT become the default?

Cloud data warehouses changed the economics. They scale compute on demand and store data cheaply, so loading everything raw and transforming in place became practical, whereas on older, more expensive warehouses it was wasteful. Keeping raw data gives flexibility to re-transform without re-extracting, the warehouse handles the heavy SQL processing, and transformation in SQL lets analysts contribute. Modern tooling and the talent pool both assume ELT, which reinforces it as the default for new stacks.

Is ELT always cheaper than ETL?

Not automatically. ELT moves cost into warehouse compute, where every transformation run consumes resources that show up on the warehouse bill and scale with how much transforming you do. It is often cheaper overall and consolidates infrastructure, but only with active cost management: incremental models that process just new data, sensible scheduling, and attention to expensive transformations. Teams that migrate and ignore warehouse compute frequently see the bill climb as their transformations multiply.

What is the biggest risk when migrating from ETL to ELT?

Losing trust in the data through a rushed cutover. If the new ELT pipeline produces different numbers than the old ETL one and you have already switched, people stop trusting the warehouse and revert to old reports or spreadsheets. The mitigation is running both in parallel and reconciling outputs for each data product until they match or every difference is understood, then cutting over incrementally. Trust is slow to rebuild, so protecting it during migration is essential.

How does ELT change who works on data transformation?

ETL transformation usually sat with data engineers managing a specialized tool. ELT moves transformation into SQL, typically with dbt, which analysts and analytics engineers can contribute to. This broadens participation and speeds development, but it requires deciding ownership, review processes, and quality standards so the openness does not turn into sprawl. The shift is the foundation of the analytics engineering role, and managing the new team dynamics is part of doing the migration well.

What role does dbt play in ELT?

dbt is the common transformation layer for ELT. It runs SQL transformations inside the warehouse and treats them as version-controlled code with built-in testing, documentation, and lineage. It is what makes the transform step of ELT disciplined rather than a pile of ad hoc SQL, enabling tests that catch broken models, reviews through pull requests, and a clear dependency graph. Most modern ELT stacks pair an ingestion tool for extract and load with dbt for transform.

Do I lose anything by keeping raw data in the warehouse?

You gain flexibility but take on a governance responsibility. Raw data lands before any transformation can mask or filter it, so sensitive fields are in the warehouse in their original form, which is a privacy and security consideration the ETL model handled implicitly by transforming first. You need deliberate access controls and handling for sensitive raw data. The flexibility of re-transforming without re-extracting is usually worth it, but the governance work is real and should not be skipped.

How should I sequence an ETL to ELT migration?

Incrementally, by value and risk. Pick a data product important enough to matter but contained enough to validate cleanly, build its ELT version, run it in parallel with the old ETL output, reconcile until they agree, then cut over. Establish conventions and cost guardrails while the footprint is small, capture the undocumented logic in legacy pipelines before rewriting, and move one product at a time. This limits risk, builds reusable patterns, and keeps the team's trust in the data intact throughout.

What does a typical modern ELT stack look like?

A managed ingestion tool such as Fivetran or Airbyte to extract and load raw data, a cloud data warehouse such as Snowflake, BigQuery, or Databricks as the engine, dbt for version-controlled SQL transformation in the warehouse, and an orchestrator such as Airflow, Dagster, or Prefect to schedule and sequence the runs. Each layer does one job, which makes the stack flexible and each piece replaceable. The orchestration layer is the one teams most often underestimate, because a monolithic ETL tool used to handle scheduling and failure recovery internally.