ELT stands for Extract, Load, Transform. It's a data integration pattern where raw data is extracted from source systems and loaded into a data warehouse as-is, then transformed inside the warehouse using SQL. ELT inverts the classic ETL (Extract, Transform, Load) pattern, which processes data before loading.
ELT became practical and economical with cloud warehouses like Snowflake, BigQuery, and Redshift that offer elastic compute and low storage costs. Instead of building expensive ETL infrastructure to process data before loading, you load raw data and transform in the warehouse. The warehouse's elastic compute scales to handle large transformation jobs. You pay only for compute consumed, making ELT cheaper at scale than maintaining separate processing infrastructure.
ELT is now the dominant pattern for modern data stacks. It's simpler operationally, more flexible, and preserves raw data for auditing and debugging. The trade-off is that data quality checks happen downstream instead of upfront, requiring discipline in transformation and clear governance.
ELT wasn't invented recently, but it became practical with the rise of cloud warehouses. Before cloud, data warehouses had fixed resources and fixed costs. A terabyte of storage or a compute cluster cost the same whether you used it or not. You wanted to be careful about what you stored and to minimize expensive warehouse compute. This made ETL economical: transform data on a cheap processing server before loading into an expensive warehouse.
Cloud warehouses changed everything. Snowflake, BigQuery, and Redshift offer elastic compute that scales on demand. A transformation job that uses 100 compute units for 3 minutes costs about the same as one that uses 10 units for 30 minutes: you're billed for the total compute you consume, not for provisioned capacity. Storage is also cheap, a few cents per gigabyte per month. This shift in economics made storing raw data and transforming in the warehouse cheaper than building separate ETL infrastructure. A team might save $50K/year by eliminating a dedicated ETL processing server, and it also sheds the operational overhead of maintaining that server.
Cost is only part of the story. Cloud warehouses also provided flexibility that ETL couldn't match. Raw data can be transformed in multiple ways for different use cases without re-extraction. Transformations can run frequently (even continuously) because warehouse compute scales to handle them. This enabled rapid iteration and experimentation that ETL's batch-focused model made impractical.
Extract pulls raw data from source systems: databases, SaaS applications, APIs, files, or streaming systems. Extraction is typically done by tools like Fivetran or Airbyte. Extract aims to get data into the warehouse as quickly and reliably as possible, with minimal processing. The extracted data retains the source's structure, column names, and data types (possibly mapped for compatibility). Change data capture is often used to make extraction efficient: only new or changed records are extracted, not the entire table every time.
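To make the idea concrete, here is a minimal sketch of watermark-based incremental extraction, assuming the source table exposes an updated_at column; the table and column names are illustrative, and tools like Fivetran or Airbyte handle this (or log-based change data capture) for you.

```sql
-- Hypothetical incremental extraction: pull only rows changed since the last run.
-- Assumes the source table has an updated_at column; real extractors often use
-- log-based CDC instead of queries like this.
SELECT id, name, email, updated_at
FROM source_db.customers
WHERE updated_at > '2024-06-01 00:00:00'   -- watermark from the previous successful extraction
ORDER BY updated_at;
```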
Load lands raw data in the warehouse, typically in staging tables. Staging tables are usually per-source: staging_salesforce_accounts, staging_database_customers. Load might include light processing: renaming columns for clarity, casting data types to match warehouse conventions, flattening nested JSON. But the key principle is that raw data stays raw. No business logic is applied. No validation beyond type checking happens. The goal is to preserve data exactly as it came from the source, creating an audit trail and reference point for debugging.
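A hypothetical staging definition might look like the sketch below: renames and type casts only, with no business logic. Table and column names are invented, and the JSON accessor syntax is Snowflake-style; other warehouses differ.

```sql
-- Staging sketch: light processing only, raw data stays raw underneath.
CREATE OR REPLACE VIEW staging_salesforce_accounts AS
SELECT
    Id                                            AS account_id,      -- rename for clarity
    Name                                          AS account_name,
    CAST(AnnualRevenue AS NUMERIC)                AS annual_revenue,  -- type cast only
    raw_payload:billing_address.country::VARCHAR AS billing_country,  -- flatten nested JSON (Snowflake syntax)
    CAST(CreatedDate AS TIMESTAMP)                AS created_at
FROM raw.salesforce_accounts;
```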
Transform applies business logic and creates analytics-ready data. Using SQL in the warehouse, raw staging data is cleaned, validated, and restructured. Base tables apply core business logic: calculating metrics, joining related data, handling slowly changing dimensions. Mart tables aggregate for specific use cases: financial reporting, product analytics, marketing dashboards. Transformation is where dbt shines, providing structure, testing, and documentation. Transformation queries run on the warehouse's elastic compute, scaling automatically. Multiple transformation layers provide flexibility and auditability.
Before dbt, writing transformation SQL was either messy (long monolithic scripts) or vendor-specific (ETL tool languages nobody else used). dbt solved this by providing a framework for writing transformations as modular, reusable, testable SQL files. You write a SELECT statement defining your transformation; dbt creates a table or view in the warehouse. dbt manages dependencies between transformations, ensuring they run in the right order. It provides macros (reusable functions), templating, and variable management.
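A minimal dbt model might look like the following sketch; the model and column names are hypothetical, but config() and ref() are standard dbt constructs.

```sql
-- models/orders.sql (hypothetical): the whole file is just a SELECT.
{{ config(materialized='table') }}            -- dbt creates a table (or view) from this query

SELECT
    order_id,
    customer_id,
    order_total / 100.0 AS order_total_usd,   -- illustrative cleanup rule
    created_at
FROM {{ ref('stg_orders') }}                  -- ref() declares the dependency, so dbt builds stg_orders first
WHERE status != 'cancelled'
```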
dbt also standardized testing and documentation. You define expectations in YAML: column must not be null, values must be in a certain range, relationships must exist. dbt runs these tests when models refresh, catching regressions before bad data reaches downstream systems. dbt auto-generates documentation from your code and tests, so stakeholders can understand what each table contains and why. This reduced the need for separate documentation that gets outdated.
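A test definition for the hypothetical orders model above might look like this; not_null, unique, accepted_values, and relationships are built-in dbt tests, while the model and column names are illustrative.

```yaml
# models/schema.yml (hypothetical)
version: 2

models:
  - name: orders
    description: "Cleaned orders, one row per order."
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
      - name: customer_id
        tests:
          - relationships:          # every order must reference an existing customer
              to: ref('base_customers')
              field: customer_id
```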
dbt democratized transformation. Before dbt, building and maintaining complex transformation pipelines required specialized engineering skills; dbt made the work accessible to analysts and junior engineers who know SQL. A dbt project is version-controlled code that multiple people can collaborate on. This turned ELT transformation from a specialized domain into a team practice. Modern data teams write dbt models the way software engineers write code: modular, tested, documented, reviewed.
ELT organizations structure warehouses in layers, each serving a purpose. The staging layer contains raw data extracted from sources. Staging tables are named systematically (stg_* or raw_*) and documented clearly. Staging is the source of truth and audit trail. If data is wrong downstream, you trace back to staging to compare. Staging data is intentionally minimal: maybe renamed columns and type casting, but no business logic. Staging stays raw so you can always reference original source data.
The base (or intermediate) layer applies core business logic. A base_customers table might combine staging data from multiple sources, handle slowly changing dimensions, join in reference data, and calculate basic metrics. Base tables are the building blocks. They're not specific to one use case; they're reusable foundations. Most analysts and downstream systems query base tables, not raw staging data.
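A base model might look like the following sketch, combining two hypothetical staging sources and joining reference data; the date function syntax is Snowflake-style.

```sql
-- models/base/base_customers.sql (hypothetical)
SELECT
    sf.account_id                                   AS customer_id,
    sf.account_name                                 AS customer_name,
    db.email,
    countries.region,                               -- reference data joined in
    db.signup_date,
    DATEDIFF('day', db.signup_date, CURRENT_DATE)   AS customer_age_days  -- basic metric
FROM {{ ref('stg_salesforce_accounts') }} AS sf
LEFT JOIN {{ ref('stg_database_customers') }} AS db
    ON db.crm_account_id = sf.account_id
LEFT JOIN {{ ref('stg_reference_countries') }} AS countries
    ON countries.country_code = db.country_code
```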
The mart layer aggregates and shapes data for specific use cases. A revenue_mart might contain pre-aggregated metrics for the finance team. A product_analytics_mart contains metrics for product decisions. Marts are optimized for specific queries: they contain joins and aggregations that are expensive to recalculate repeatedly. Marts are often the final layer that business users query through BI tools. This layering provides separation of concerns: engineers manage staging and base layers, analysts can define marts for their domains.
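A mart built on those base tables might pre-aggregate like this sketch; base_orders and base_customers are hypothetical models, and the DATE_TRUNC syntax is warehouse-dependent.

```sql
-- models/marts/revenue_mart.sql (hypothetical): pre-aggregated for finance dashboards.
SELECT
    DATE_TRUNC('month', o.created_at)  AS revenue_month,
    c.region,
    SUM(o.order_total_usd)             AS total_revenue,
    COUNT(DISTINCT o.customer_id)      AS paying_customers
FROM {{ ref('base_orders') }}    AS o
JOIN {{ ref('base_customers') }} AS c
    ON c.customer_id = o.customer_id
GROUP BY 1, 2
```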
ELT is simpler operationally. One platform (the warehouse) handles everything: storage and transformation. You don't manage separate processing servers. You don't have complex ETL tool configurations. dbt and Airflow are simpler than enterprise ETL platforms. ELT is cheaper. Cloud warehouse compute is elastic and billed per use. You don't pay for a 24/7 processing server that's idle most of the time. Storage is cheap. The combined cost of warehouse + dbt is often less than maintaining separate ETL infrastructure.
ELT is more flexible. The same raw data can be transformed multiple ways for different use cases without re-extraction. Transformations can run frequently (even continuously) because warehouse compute scales automatically. Debugging is easier because raw data is available for investigation. ELT is lower operational overhead: fewer systems, less specialized knowledge required.
The trade-offs are real. Data quality checks in ELT happen downstream instead of upfront. Bad data can enter the warehouse. You need comprehensive tests and clear governance to catch issues. Your warehouse stores more raw data, requiring more storage and documentation about sources. Query performance might suffer if you query raw data before transforming. ETL ensures quality upfront and provides a cleaner warehouse, but with higher operational complexity. For most modern organizations, ELT's benefits outweigh the trade-offs. However, compliance-heavy industries and on-prem systems often prefer ETL's upfront quality control.
Without clear conventions, ELT warehouses become messy. Raw data, transformed data, and intermediate tables all sit together. Users get confused about which tables are safe to query. Some query raw data and get wrong results. Some accidentally join tables from different refresh cycles. Clear naming conventions are essential. Prefixes like stg_*, base_*, and mart_* make it obvious what each table is. Documentation (which dbt provides) helps users understand what they're querying. But this requires discipline and must be enforced during code review.
Data quality is harder to guarantee upfront in ELT. Bad data can silently flow through staging into base and mart tables. You need comprehensive tests at multiple layers. dbt testing helps, but it requires discipline. Teams often skip tests and regret it when bugs reach production. Testing should be automated and enforced: tests run automatically when models refresh, and failures block the pipeline. Quality tools like Great Expectations add monitoring and alerting for unexpected data patterns.
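Beyond YAML-defined tests, dbt also supports singular tests: a plain SQL file that returns the rows violating an expectation, failing the run if any come back. A hypothetical example:

```sql
-- tests/assert_no_negative_order_totals.sql (hypothetical singular dbt test)
-- If this query returns any rows, the test fails and can block downstream models.
SELECT
    order_id,
    order_total_usd
FROM {{ ref('orders') }}
WHERE order_total_usd < 0
```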
Transformation explosion is another challenge. Without clear governance, teams accumulate hundreds of transformation models, each with unclear purpose. Complex lineage and circular dependencies emerge. A table might depend on another table that depends on the first, creating complexity and fragility. Governance means establishing clear data ownership, defining which transformations are allowed, and requiring code review. dbt's lineage visualization helps you understand relationships, but governance requires people and processes, not just tools.
ELT stands for Extract, Load, Transform. It's a data integration pattern where raw data is extracted from source systems, loaded into a data warehouse as-is, and then transformed inside the warehouse using SQL. ELT inverts the order of the classic ETL (Extract, Transform, Load) pattern.
Extract pulls raw data from sources without processing. Load lands the raw data in the warehouse exactly as it came from the source. Transform happens afterward, using SQL queries in the warehouse to clean, validate, and restructure data into analytics-ready form.
ELT became practical and economical with cloud warehouses (Snowflake, BigQuery, Redshift) that offer elastic compute and low storage costs. It's now the dominant pattern for modern data stacks because it's simpler, cheaper, and more flexible than traditional ETL approaches.
Cloud warehouses made ELT economically viable by making compute abundant and cheap. Before cloud, warehouses had fixed resources and fixed costs. You wanted to transform before loading to minimize expensive warehouse compute. Cloud warehouses scale compute elastically and charge only for what you consume.
Snowflake and BigQuery charge only for compute consumed. If transformation runs for 30 minutes, you pay for 30 minutes. If it runs for 5 minutes, you pay for 5 minutes. This changes the economics: instead of building and maintaining expensive ETL infrastructure, you load raw data and transform in the warehouse. A team might save tens of thousands annually by eliminating a separate ETL server and its operational overhead.
Cloud warehouses also provide flexibility. Raw data can be transformed in multiple ways for different purposes without re-extraction. Transformations can run frequently because warehouse compute scales automatically. This enables rapid iteration and experimentation that ETL's batch-focused model couldn't match.
dbt (data build tool) is the de facto standard for the 'T' in ELT. It lets you write transformations as modular, testable SQL files instead of monolithic scripts or proprietary tool languages. dbt manages dependencies between transformations, ensures they run in the right order, and tests the results.
You write SELECT statements defining your transformed data; dbt creates tables or views in the warehouse. dbt provides macros (reusable SQL functions), testing frameworks, and documentation generation. It's made SQL-based transformation accessible to analysts and junior engineers, not just specialists. Before dbt, writing transformation logic required either raw SQL (complex and hard to maintain) or proprietary ETL tool languages.
dbt democratized ELT and made transformation code maintainable like software code. Teams version control dbt projects, review changes, and test code before merging. This turned transformation from a specialized domain into a team practice.
ELT is simpler operationally (one platform handles everything) and cheaper (cloud compute is elastic). You also preserve raw data and can transform flexibly. The trade-offs are that data quality checks happen downstream instead of upfront, so bad data can enter the warehouse and might need cleaning. Your warehouse stores more raw data, which uses storage and might require more documentation about source data.
Query performance might be affected if you're querying raw data before transforming. ETL ensures data quality upfront and provides a cleaner warehouse, but requires maintaining separate processing infrastructure and has higher operational complexity. For most modern organizations, the benefits of ELT (simplicity, cost, flexibility) outweigh the trade-offs (downstream quality handling).
However, compliance-heavy industries sometimes prefer ETL for its quality guarantees and centralized control. The choice depends on your infrastructure, industry, and organizational preferences. Most large modern organizations use both: ELT for cloud analytics, ETL for on-prem or compliance-heavy systems.
ELT stores raw data alongside transformed data in the same warehouse. This can confuse users: which tables are raw? Which are ready for analysis? Without clear naming conventions (e.g., staging_*, base_*, mart_*) and documentation, the warehouse becomes a mess. Users might accidentally query raw data and get wrong results.
Quality standards are less clear: in ETL, the warehouse stores only validated data. In ELT, raw data is present, so you need discipline in transformation to ensure clean downstream tables. Data lineage becomes important: you need to track which raw tables feed which transforms to ensure data accuracy and enable debugging. Tools like dbt help by documenting and testing transformations, but governance requires discipline and clear standards.
Without governance, transformation explosion can occur: hundreds of models with unclear purpose, complex lineage, circular dependencies. Preventing this requires people and processes, not just tools: clear data ownership, defined standards for which transformations are allowed, and code review.
Staging is the raw data layer in ELT. When data is extracted and loaded, it lands in staging tables with minimal processing, perhaps renamed columns for clarity, but no business logic. Staging tables keep the data exactly as it came from the source. On top of staging, you build base tables and marts through transformation.
Staging is the source of truth for raw data, essential for auditing and debugging. If you discover that a downstream table has wrong data, you can trace back to staging and compare to understand where the logic broke. Staging also preserves the original data for compliance: some regulations require keeping raw data for a certain period.
The key principle is 'raw data stays raw.' Once data is extracted and loaded into staging, it should never be changed. Instead of modifying raw data, you create new tables on top of it. This preserves the audit trail and flexibility to create new transformations without re-extracting.
Choose ELT for cloud-based data warehouses (Snowflake, BigQuery, Redshift) and modern data stacks. ELT is simpler, cheaper, and more flexible. It's the default choice for most organizations today. Choose ELT when you want to preserve raw data for auditing and debugging, or when the same source data needs multiple transformation paths.
Choose ELT when you have good SQL resources in your organization or can use dbt to manage transformation code. Choose ETL for on-premise warehouses where compute is fixed and expensive. Choose ETL when you have strict compliance requirements that mandate validated data at the load boundary. Choose ETL when transformations are complex and better expressed in Python or other languages than SQL.
Choose ETL when you need centralized data quality control and want the warehouse to contain only validated data. Most large modern organizations use both: ELT for cloud analytics, ETL for on-prem or compliance-heavy systems.
'Raw data stays raw' is a core principle in ELT: once data is extracted and loaded, it should never be changed. Raw data is the source of truth, the audit trail, the reference point for debugging. If you modify raw data, you lose the ability to audit or understand what actually came from the source system.
Instead of modifying raw data, you create new tables on top of it: staging tables add light processing (renaming, type casting), base tables apply business logic, mart tables aggregate for specific use cases. This layering preserves the audit trail and flexibility. When a downstream table produces wrong results, you can trace back through the layers to find where the logic broke.
This principle enables confidence in data. Users trust that raw data is exactly what came from the source and hasn't been altered. Transformations are explicit and visible, making it easy to understand and debug logic.
dbt is the standard transformation tool for ELT, providing SQL templating, testing, documentation, and dependency management. Fivetran and Airbyte handle extraction and loading of raw data. Snowflake, BigQuery, and Redshift are the primary cloud warehouse platforms. Apache Airflow orchestrates the workflow: triggers Fivetran/Airbyte to extract, then dbt to transform.
Some organizations use Prefect or Dagster for orchestration instead of Airflow. Great Expectations or Soda handle data quality testing. The typical ELT stack is Fivetran (extract/load) + dbt (transform) + Snowflake (warehouse) + Airflow (orchestration) + Great Expectations (quality). This is the modern default. Legacy ETL tools like Informatica or Talend don't fit well in ELT workflows.
The beauty of ELT is that each tool focuses on one job well. You can swap components: use Airbyte instead of Fivetran, Prefect instead of Airflow, Soda instead of Great Expectations. The ecosystem is flexible.
A common ELT structure has layers: staging (raw data, renamed but unprocessed), base or intermediate (business logic applied, cleaned), and mart (aggregated for specific use cases or reporting). Some add a landing layer before staging. Schema naming helps clarity: stg_* for staging, base_* or fct_/dim_* for business-ready tables.
Each layer has a purpose and audience. Raw/staging is for engineers and analysts doing deep investigation. Base/intermediate tables are for analysts building reports. Marts are for business users in BI tools. This layering provides flexibility: you can change mart definitions without touching raw data. You can debug by tracing through layers. You can add new marts on top of base tables without re-processing raw data.
Most organizations also create a separate schema per source (a salesforce schema containing stg_salesforce_accounts and stg_salesforce_contacts, for example) plus shared base and mart schemas. This makes it obvious where data comes from and prevents accidental mixing of raw data from different sources.
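In dbt, those per-source schemas are typically declared as sources, which makes provenance explicit in lineage and documentation. A hypothetical definition, using the schema and table names above:

```yaml
# models/sources.yml (hypothetical)
version: 2

sources:
  - name: salesforce
    schema: salesforce          # per-source schema in the warehouse
    tables:
      - name: stg_salesforce_accounts
      - name: stg_salesforce_contacts
  - name: app_database
    schema: app_database
    tables:
      - name: stg_database_customers
```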