ELT stands for Extract, Load, Transform. It's a data integration pattern where raw data is extracted from source systems and loaded into a data warehouse as-is, then transformed inside the warehouse using SQL. ELT inverts the classic ETL (Extract, Transform, Load) pattern, which processes data before loading.
ELT became practical and economical with cloud warehouses like Snowflake, BigQuery, and Redshift that offer elastic compute and low storage costs. Instead of building expensive ETL infrastructure to process data before loading, you load raw data and transform in the warehouse. The warehouse's elastic compute scales to handle large transformation jobs. You pay only for compute consumed, making ELT cheaper at scale than maintaining separate processing infrastructure.
ELT is now the dominant pattern for modern data stacks. It's simpler operationally, more flexible, and preserves raw data for auditing and debugging. The trade-off is that data quality checks happen downstream instead of upfront, requiring discipline in transformation and clear governance.
ELT wasn't invented recently, but it became practical with the rise of cloud warehouses. Before cloud, data warehouses had fixed resources and fixed costs. A terabyte of storage or a compute cluster cost the same whether you used it or not. You wanted to be careful about what you stored and to minimize expensive warehouse compute. This made ETL economical: transform data on a cheap processing server before loading into an expensive warehouse.
Cloud warehouses changed everything. Snowflake, BigQuery, and Redshift offer elastic compute that scales on demand. A transformation job that uses 100 compute units for 3 minutes costs about the same as one that uses 10 units for 30 minutes: you're billed for the total compute you consume, not for provisioned capacity. Storage is also cheap, a few cents per gigabyte per month. This shift in economics made storing raw data and transforming in the warehouse cheaper than building separate ETL infrastructure. A team might save $50K/year by eliminating a dedicated ETL processing server, and it also sheds the operational overhead of maintaining that server.
Cost is only part of the story. Cloud warehouses also provided flexibility that ETL couldn't match. Raw data can be transformed in multiple ways for different use cases without re-extraction. Transformations can run frequently (even continuously) because warehouse compute scales to handle them. This enabled rapid iteration and experimentation that ETL's batch-focused model made impractical.
Extract pulls raw data from source systems: databases, SaaS applications, APIs, files, or streaming systems. Extraction is typically done by tools like Fivetran or Airbyte. Extract aims to get data into the warehouse as quickly and reliably as possible, with minimal processing. The extracted data retains the source's structure, column names, and data types (possibly mapped for compatibility). Change data capture is often used to make extraction efficient: only new or changed records are extracted, not the entire table every time.
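To make the idea concrete, here is a minimal sketch of watermark-based incremental extraction, assuming the source table exposes an updated_at column; the table and column names are illustrative, and tools like Fivetran or Airbyte handle this (or log-based change data capture) for you.

```sql
-- Hypothetical incremental extraction: pull only rows changed since the last run.
-- Assumes the source table has an updated_at column; real extractors often use
-- log-based CDC instead of queries like this.
SELECT id, name, email, updated_at
FROM source_db.customers
WHERE updated_at > '2024-06-01 00:00:00'   -- watermark from the previous successful extraction
ORDER BY updated_at;
```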
Load lands raw data in the warehouse, typically in staging tables. Staging tables are usually per-source: staging_salesforce_accounts, staging_database_customers. Load might include light processing: renaming columns for clarity, casting data types to match warehouse conventions, flattening nested JSON. But the key principle is that raw data stays raw. No business logic is applied. No validation beyond type checking happens. The goal is to preserve data exactly as it came from the source, creating an audit trail and reference point for debugging.
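A hypothetical staging definition might look like the sketch below: renames and type casts only, with no business logic. Table and column names are invented, and the JSON accessor syntax is Snowflake-style; other warehouses differ.

```sql
-- Staging sketch: light processing only, raw data stays raw underneath.
CREATE OR REPLACE VIEW staging_salesforce_accounts AS
SELECT
    Id                                            AS account_id,      -- rename for clarity
    Name                                          AS account_name,
    CAST(AnnualRevenue AS NUMERIC)                AS annual_revenue,  -- type cast only
    raw_payload:billing_address.country::VARCHAR AS billing_country,  -- flatten nested JSON (Snowflake syntax)
    CAST(CreatedDate AS TIMESTAMP)                AS created_at
FROM raw.salesforce_accounts;
```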
Transform applies business logic and creates analytics-ready data. Using SQL in the warehouse, raw staging data is cleaned, validated, and restructured. Base tables apply core business logic: calculating metrics, joining related data, handling slowly changing dimensions. Mart tables aggregate for specific use cases: financial reporting, product analytics, marketing dashboards. Transformation is where dbt shines, providing structure, testing, and documentation. Transformation queries run on the warehouse's elastic compute, scaling automatically. Multiple transformation layers provide flexibility and auditability.
Before dbt, writing transformation SQL was either messy (long monolithic scripts) or vendor-specific (ETL tool languages nobody else used). dbt solved this by providing a framework for writing transformations as modular, reusable, testable SQL files. You write a SELECT statement defining your transformation; dbt creates a table or view in the warehouse. dbt manages dependencies between transformations, ensuring they run in the right order. It provides macros (reusable functions), templating, and variable management.
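A minimal dbt model might look like the following sketch; the model and column names are hypothetical, but config() and ref() are standard dbt constructs.

```sql
-- models/orders.sql (hypothetical): the whole file is just a SELECT.
{{ config(materialized='table') }}            -- dbt creates a table (or view) from this query

SELECT
    order_id,
    customer_id,
    order_total / 100.0 AS order_total_usd,   -- illustrative cleanup rule
    created_at
FROM {{ ref('stg_orders') }}                  -- ref() declares the dependency, so dbt builds stg_orders first
WHERE status != 'cancelled'
```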
dbt also standardized testing and documentation. You define expectations in YAML: column must not be null, values must be in a certain range, relationships must exist. dbt runs these tests when models refresh, catching regressions before bad data reaches downstream systems. dbt auto-generates documentation from your code and tests, so stakeholders can understand what each table contains and why. This reduced the need for separate documentation that gets outdated.
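A test definition for the hypothetical orders model above might look like this; not_null, unique, accepted_values, and relationships are built-in dbt tests, while the model and column names are illustrative.

```yaml
# models/schema.yml (hypothetical)
version: 2

models:
  - name: orders
    description: "Cleaned orders, one row per order."
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
      - name: customer_id
        tests:
          - relationships:          # every order must reference an existing customer
              to: ref('base_customers')
              field: customer_id
```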
dbt democratized transformation. Before dbt, building and maintaining complex transformation pipelines required specialized engineering skills; dbt made the work accessible to analysts and junior engineers who know SQL. A dbt project is version-controlled code that multiple people can collaborate on. This turned ELT transformation from a specialized domain into a team practice. Modern data teams write dbt models the way software engineers write code: modular, tested, documented, reviewed.
ELT organizations structure warehouses in layers, each serving a purpose. The staging layer contains raw data extracted from sources. Staging tables are named systematically (stg_* or raw_*) and documented clearly. Staging is the source of truth and audit trail. If data is wrong downstream, you trace back to staging to compare. Staging data is intentionally minimal: maybe renamed columns and type casting, but no business logic. Staging stays raw so you can always reference original source data.
The base (or intermediate) layer applies core business logic. A base_customers table might combine staging data from multiple sources, handle slowly changing dimensions, join in reference data, and calculate basic metrics. Base tables are the building blocks. They're not specific to one use case; they're reusable foundations. Most analysts and downstream systems query base tables, not raw staging data.
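A base model might look like the following sketch, combining two hypothetical staging sources and joining reference data; the date function syntax is Snowflake-style.

```sql
-- models/base/base_customers.sql (hypothetical)
SELECT
    sf.account_id                                   AS customer_id,
    sf.account_name                                 AS customer_name,
    db.email,
    countries.region,                               -- reference data joined in
    db.signup_date,
    DATEDIFF('day', db.signup_date, CURRENT_DATE)   AS customer_age_days  -- basic metric
FROM {{ ref('stg_salesforce_accounts') }} AS sf
LEFT JOIN {{ ref('stg_database_customers') }} AS db
    ON db.crm_account_id = sf.account_id
LEFT JOIN {{ ref('stg_reference_countries') }} AS countries
    ON countries.country_code = db.country_code
```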
The mart layer aggregates and shapes data for specific use cases. A revenue_mart might contain pre-aggregated metrics for the finance team. A product_analytics_mart contains metrics for product decisions. Marts are optimized for specific queries: they contain joins and aggregations that are expensive to recalculate repeatedly. Marts are often the final layer that business users query through BI tools. This layering provides separation of concerns: engineers manage staging and base layers, analysts can define marts for their domains.
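A mart built on those base tables might pre-aggregate like this sketch; base_orders and base_customers are hypothetical models, and the DATE_TRUNC syntax is warehouse-dependent.

```sql
-- models/marts/revenue_mart.sql (hypothetical): pre-aggregated for finance dashboards.
SELECT
    DATE_TRUNC('month', o.created_at)  AS revenue_month,
    c.region,
    SUM(o.order_total_usd)             AS total_revenue,
    COUNT(DISTINCT o.customer_id)      AS paying_customers
FROM {{ ref('base_orders') }}    AS o
JOIN {{ ref('base_customers') }} AS c
    ON c.customer_id = o.customer_id
GROUP BY 1, 2
```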
ELT is simpler operationally. One platform (the warehouse) handles everything: storage and transformation. You don't manage separate processing servers. You don't have complex ETL tool configurations. dbt and Airflow are simpler than enterprise ETL platforms. ELT is cheaper. Cloud warehouse compute is elastic and billed per use. You don't pay for a 24/7 processing server that's idle most of the time. Storage is cheap. The combined cost of warehouse + dbt is often less than maintaining separate ETL infrastructure.
ELT is more flexible. The same raw data can be transformed multiple ways for different use cases without re-extraction. Transformations can run frequently (even continuously) because warehouse compute scales automatically. Debugging is easier because raw data is available for investigation. ELT is lower operational overhead: fewer systems, less specialized knowledge required.
The trade-offs are real. Data quality checks in ELT happen downstream instead of upfront. Bad data can enter the warehouse. You need comprehensive tests and clear governance to catch issues. Your warehouse stores more raw data, requiring more storage and documentation about sources. Query performance might suffer if you query raw data before transforming. ETL ensures quality upfront and provides a cleaner warehouse, but with higher operational complexity. For most modern organizations, ELT's benefits outweigh the trade-offs. However, compliance-heavy industries and on-prem systems often prefer ETL's upfront quality control.
Without clear conventions, ELT warehouses become messy. Raw data, transformed data, and intermediate tables all sit together. Users get confused about which tables are safe to query. Some query raw data and get wrong results. Some accidentally join tables from different refresh cycles. Clear naming conventions are essential. Prefixes like stg_*, base_*, and mart_* make it obvious what each table is. Documentation (which dbt provides) helps users understand what they're querying. But this requires discipline and must be enforced during code review.
Data quality is harder to guarantee upfront in ELT. Bad data can silently flow through staging into base and mart tables. You need comprehensive tests at multiple layers. dbt testing helps, but it requires discipline. Teams often skip tests and regret it when bugs reach production. Testing should be automated and enforced: tests run automatically when models refresh, and failures block the pipeline. Quality tools like Great Expectations add monitoring and alerting for unexpected data patterns.
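Beyond YAML-defined tests, dbt also supports singular tests: a plain SQL file that returns the rows violating an expectation, failing the run if any come back. A hypothetical example:

```sql
-- tests/assert_no_negative_order_totals.sql (hypothetical singular dbt test)
-- If this query returns any rows, the test fails and can block downstream models.
SELECT
    order_id,
    order_total_usd
FROM {{ ref('orders') }}
WHERE order_total_usd < 0
```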
Transformation explosion is another challenge. Without clear governance, teams accumulate hundreds of transformation models, each with unclear purpose. Complex lineage and circular dependencies emerge. A table might depend on another table that depends on the first, creating complexity and fragility. Governance means establishing clear data ownership, defining which transformations are allowed, and requiring code review. dbt's lineage visualization helps you understand relationships, but governance requires people and processes, not just tools.
ELT stands for Extract, Load, Transform. It's a data integration pattern where raw data is extracted from source systems, loaded into a data warehouse as-is, and then transformed inside the warehouse using SQL. ELT inverts the order of the classic ETL (Extract, Transform, Load) pattern.
Extract pulls raw data from sources without processing. Load lands the raw data in the warehouse exactly as it came from the source. Transform happens afterward, using SQL queries in the warehouse to clean, validate, and restructure data into analytics-ready form.
ELT became practical and economical with cloud warehouses (Snowflake, BigQuery, Redshift) that offer elastic compute and low storage costs. It's now the dominant pattern for modern data stacks because it's simpler, cheaper, and more flexible than traditional ETL approaches.
Cloud warehouses made ELT economically viable by making compute abundant and cheap. Before cloud, warehouses had fixed resources and fixed costs. You wanted to transform before loading to minimize expensive warehouse compute. Cloud warehouses scale compute elastically and charge only for what you consume.
Snowflake and BigQuery charge only for compute consumed. If transformation runs for 30 minutes, you pay for 30 minutes. If it runs for 5 minutes, you pay for 5 minutes. This changes the economics: instead of building and maintaining expensive ETL infrastructure, you load raw data and transform in the warehouse. A team might save tens of thousands annually by eliminating a separate ETL server and its operational overhead.
Cloud warehouses also provide flexibility. Raw data can be transformed in multiple ways for different purposes without re-extraction. Transformations can run frequently because warehouse compute scales automatically. This enables rapid iteration and experimentation that ETL's batch-focused model couldn't match.
dbt (data build tool) is the de facto standard for the 'T' in ELT. It lets you write transformations as modular, testable SQL files instead of monolithic scripts or proprietary tool languages. dbt manages dependencies between transformations, ensures they run in the right order, and tests the results.
You write SELECT statements defining your transformed data; dbt creates tables or views in the warehouse. dbt provides macros (reusable SQL functions), testing frameworks, and documentation generation. It's made SQL-based transformation accessible to analysts and junior engineers, not just specialists. Before dbt, writing transformation logic required either raw SQL (complex and hard to maintain) or proprietary ETL tool languages.
dbt democratized ELT and made transformation code maintainable like software code. Teams version control dbt projects, review changes, and test code before merging. This turned transformation from a specialized domain into a team practice.
ELT is simpler operationally (one platform handles everything) and cheaper (cloud compute is elastic). You also preserve raw data and can transform flexibly. The trade-offs are that data quality checks happen downstream instead of upfront, so bad data can enter the warehouse and might need cleaning. Your warehouse stores more raw data, which uses storage and might require more documentation about source data.
Query performance might be affected if you're querying raw data before transforming. ETL ensures data quality upfront and provides a cleaner warehouse, but requires maintaining separate processing infrastructure and has higher operational complexity. For most modern organizations, the benefits of ELT (simplicity, cost, flexibility) outweigh the trade-offs (downstream quality handling).
However, compliance-heavy industries sometimes prefer ETL for its quality guarantees and centralized control. The choice depends on your infrastructure, industry, and organizational preferences. Most large modern organizations use both: ELT for cloud analytics, ETL for on-prem or compliance-heavy systems.
ELT stores raw data alongside transformed data in the same warehouse. This can confuse users: which tables are raw? Which are ready for analysis? Without clear naming conventions (e.g., staging_*, base_*, mart_*) and documentation, the warehouse becomes a mess. Users might accidentally query raw data and get wrong results.
Quality standards are less clear: in ETL, the warehouse stores only validated data. In ELT, raw data is present, so you need discipline in transformation to ensure clean downstream tables. Data lineage becomes important: you need to track which raw tables feed which transforms to ensure data accuracy and enable debugging. Tools like dbt help by documenting and testing transformations, but governance requires discipline and clear standards.
Without governance, transformation explosion can occur: hundreds of models with unclear purpose, complex lineage, circular dependencies. Preventing this requires people and processes, not just tools: clear data ownership, defined standards for which transformations are allowed, and code review.
Staging is the raw data layer in ELT. When data is extracted and loaded, it lands in staging tables with minimal processing, perhaps renamed columns for clarity, but no business logic. Staging tables keep the data exactly as it came from the source. On top of staging, you build base tables and marts through transformation.
Staging is the source of truth for raw data, essential for auditing and debugging. If you discover that a downstream table has wrong data, you can trace back to staging and compare to understand where the logic broke. Staging also preserves the original data for compliance: some regulations require keeping raw data for a certain period.
The key principle is 'raw data stays raw.' Once data is extracted and loaded into staging, it should never be changed. Instead of modifying raw data, you create new tables on top of it. This preserves the audit trail and flexibility to create new transformations without re-extracting.
Choose ELT for cloud-based data warehouses (Snowflake, BigQuery, Redshift) and modern data stacks. ELT is simpler, cheaper, and more flexible. It's the default choice for most organizations today. Choose ELT when you want to preserve raw data for auditing and debugging, or when the same source data needs multiple transformation paths.
Choose ELT when you have good SQL resources in your organization or can use dbt to manage transformation code. Choose ETL for on-premise warehouses where compute is fixed and expensive. Choose ETL when you have strict compliance requirements that mandate validated data at the load boundary. Choose ETL when transformations are complex and better expressed in Python or other languages than SQL.
Choose ETL when you need centralized data quality control and want the warehouse to contain only validated data. Most large modern organizations use both: ELT for cloud analytics, ETL for on-prem or compliance-heavy systems.
'Raw data stays raw' is a core principle in ELT: once data is extracted and loaded, it should never be changed. Raw data is the source of truth, the audit trail, the reference point for debugging. If you modify raw data, you lose the ability to audit or understand what actually came from the source system.
Instead of modifying raw data, you create new tables on top of it: staging tables add light processing (renaming, type casting), base tables apply business logic, mart tables aggregate for specific use cases. This layering preserves the audit trail and flexibility. When a downstream table produces wrong results, you can trace back through the layers to find where the logic broke.
This principle enables confidence in data. Users trust that raw data is exactly what came from the source and hasn't been altered. Transformations are explicit and visible, making it easy to understand and debug logic.
dbt is the standard transformation tool for ELT, providing SQL templating, testing, documentation, and dependency management. Fivetran and Airbyte handle extraction and loading of raw data. Snowflake, BigQuery, and Redshift are the primary cloud warehouse platforms. Apache Airflow orchestrates the workflow: triggers Fivetran/Airbyte to extract, then dbt to transform.
Some organizations use Prefect or Dagster for orchestration instead of Airflow. Great Expectations or Soda handle data quality testing. The typical ELT stack is Fivetran (extract/load) + dbt (transform) + Snowflake (warehouse) + Airflow (orchestration) + Great Expectations (quality). This is the modern default. Legacy ETL tools like Informatica or Talend don't fit well in ELT workflows.
The beauty of ELT is that each tool focuses on one job well. You can swap components: use Airbyte instead of Fivetran, Prefect instead of Airflow, Soda instead of Great Expectations. The ecosystem is flexible.
A common ELT structure has layers: staging (raw data, renamed but unprocessed), base or intermediate (business logic applied, cleaned), and mart (aggregated for specific use cases or reporting). Some add a landing layer before staging. Schema naming helps clarity: stg_* for staging, base_* or fct_/dim_* for business-ready tables.
Each layer has a purpose and audience. Raw/staging is for engineers and analysts doing deep investigation. Base/intermediate tables are for analysts building reports. Marts are for business users in BI tools. This layering provides flexibility: you can change mart definitions without touching raw data. You can debug by tracing through layers. You can add new marts on top of base tables without re-processing raw data.
Most organizations also create a separate schema per source (a salesforce schema containing stg_salesforce_accounts and stg_salesforce_contacts, for example) plus shared base and mart schemas. This makes it obvious where data comes from and prevents accidental mixing of raw data from different sources.
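In dbt, those per-source schemas are typically declared as sources, which makes provenance explicit in lineage and documentation. A hypothetical definition, using the schema and table names above:

```yaml
# models/sources.yml (hypothetical)
version: 2

sources:
  - name: salesforce
    schema: salesforce          # per-source schema in the warehouse
    tables:
      - name: stg_salesforce_accounts
      - name: stg_salesforce_contacts
  - name: app_database
    schema: app_database
    tables:
      - name: stg_database_customers
```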