LS LOGICIEL SOLUTIONS

ETL: Real Examples & Use Cases

Definition

ETL stands for Extract, Transform, Load: the pattern of pulling data out of a source system, applying transformations on a separate processing layer, and loading the transformed results into a target system. The transformations happen before the load, so the target system receives data already in the shape downstream consumers want. Real examples reveal which workloads still genuinely call for ETL ordering, what the legacy ETL tools look like in production, and how the rise of cloud warehouses pushed many workloads toward the ELT inversion that the next page covers.

The pattern dominated for decades when the target system (typically a warehouse) lacked the compute to handle transformations efficiently. A dedicated transformation server (Informatica, Talend, DataStage, SSIS) read from source systems, ran complex transformations through specialized engines, and wrote the results to the warehouse. The warehouse stored the modeled output and served queries; transformation lived elsewhere. The architecture worked because warehouse compute was expensive and scarce while transformation engines could be sized independently.

The category in 2026 has a split identity. Legacy ETL platforms (Informatica, IBM DataStage, Microsoft SSIS, Talend) still run in many enterprises with significant historical investment, particularly in financial services, insurance, healthcare, and manufacturing. New builds usually pick ELT, but some workloads still genuinely benefit from ETL ordering: legacy targets that cannot absorb raw data, regulatory contexts that require pre-load validation, or sources too sensitive to land raw in the target.

What separates ETL from ELT is when the transformation happens relative to the load. ETL transforms before loading. ELT loads first, transforms in the target. The choice is not theological; it is an engineering decision about where the transformation work runs and what state the target sees. The ETL ordering still wins for specific cases even as the overall trend favors ELT.
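The difference in ordering can be sketched with an in-memory SQLite database standing in for the target. This is only an illustration: the table names, the sample rows, and the cleaning rule are all hypothetical, and a real warehouse would use its own loader and SQL dialect.

```python
import sqlite3

SOURCE_ROWS = [("  Alice ", "42.50"), ("BOB", "17.00")]  # hypothetical source extract


def clean(row):
    """Transformation: normalize the name and parse the amount."""
    name, amount = row
    return name.strip().title(), float(amount)


def run_etl(conn):
    # ETL: transform outside the target, then load only the finished rows.
    conn.execute("CREATE TABLE customers (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [clean(r) for r in SOURCE_ROWS])


def run_elt(conn):
    # ELT: load raw rows first, then transform with the target's own SQL engine.
    conn.execute("CREATE TABLE raw_customers (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_customers VALUES (?, ?)", SOURCE_ROWS)
    conn.execute(
        "CREATE TABLE customers AS "
        "SELECT upper(substr(trim(name), 1, 1)) || lower(substr(trim(name), 2)) AS name, "
        "CAST(amount AS REAL) AS amount FROM raw_customers"
    )


def tables(conn):
    """List the tables the target ends up holding."""
    return sorted(r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type='table'"))
```

Both functions produce the same customers table. The difference is what else the target holds: after run_elt the raw, untransformed rows also sit in the target, which is exactly the state ETL ordering avoids.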

This page surveys real ETL implementations across enterprise data integration, legacy warehouse loading, and the specific use cases where ETL ordering still makes sense in modern stacks. Tooling evolves; the pattern's strengths and weaknesses are stable enough to evaluate against any new workload.

Key Takeaways

  • ETL extracts data from sources, transforms it on a separate processing layer, and loads the results into a target.
  • The pattern dominated the warehouse era when target systems lacked compute for in-place transformation.
  • Legacy ETL platforms (Informatica, DataStage, Talend, SSIS) still run extensive enterprise workloads.
  • New builds usually prefer ELT, but ETL ordering still fits specific cases like legacy targets and pre-load validation requirements.
  • Modern ETL tooling has converged with broader data integration platforms rather than remaining a distinct category.

Legacy ETL Platforms in Production

Informatica PowerCenter and its cloud successor, Informatica Intelligent Data Management Cloud (IDMC), are among the most widely deployed enterprise ETL platforms. Customers include thousands of large enterprises across financial services, insurance, healthcare, and manufacturing. The platform handles complex source-to-target mappings, supports many connector types, and provides the governance features mature enterprises depend on. The vendor has pivoted toward cloud and AI-assisted data integration, while on-premise PowerCenter installations continue to run substantial production loads.

IBM DataStage (formerly InfoSphere DataStage) serves similar enterprise customers, particularly in banking, insurance, and government. The platform's parallel processing engine handles very large data volumes and has been refined over decades. Migration off DataStage is a multi-year project for the customers who try it; many simply continue running their existing implementations.

Microsoft SQL Server Integration Services (SSIS) is the default for Microsoft-aligned enterprises. SSIS packages run countless production ETL jobs for SQL Server warehouses. The tool's tight integration with SQL Server and Visual Studio makes it the natural choice for teams already committed to Microsoft tooling. Azure Data Factory has displaced some SSIS workloads, but SSIS persists where teams have invested heavily in existing packages.

Talend (acquired by Qlik) has offered both open-source and commercial ETL tools. The open-source Talend Open Studio saw broad adoption before it was discontinued in early 2024; the commercial platform competes with Informatica in enterprise deployments. Adoption has been particularly strong in European markets and at mid-market enterprises.

Oracle Data Integrator (ODI) serves Oracle-heavy environments. The product's ELT-style execution (pushing transformation into the Oracle database) blurs the ETL/ELT line, but the product is classically positioned as an ETL platform. Many Oracle Exadata customers run ODI for their warehouse loading.

Smaller specialized ETL tools serve specific niches: Pentaho Data Integration (Kettle) for open-source workflows, Matillion for cloud warehouse ETL, and Stitch (now part of Talend) for SaaS ingestion. The category overall has commoditized as the major platforms have absorbed the patterns.

Use Cases Where ETL Ordering Still Wins

Legacy targets that cannot absorb raw data. Mainframe systems, legacy operational stores, and old warehouses often lack the compute to handle in-target transformation. The ETL pattern shapes data outside the target so the target only sees what it can handle.

Regulatory contexts requiring pre-load validation. Some compliance regimes require that data be validated, sanitized, or de-identified before it enters the target system. The pre-load transformation provides a control point that auditors can verify. After-the-fact transformation in the target makes the audit story harder.

Sensitive sources that should not land raw in the target. PII often needs masking or tokenization before it is loaded into a target where many users have access. ETL applies the protection before the load; ELT would require careful access control to prevent exposure between load and transformation.
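As a rough sketch of pre-load protection, the snippet below replaces sensitive fields with deterministic keyed hashes before any row reaches the loader. The field names and the key are hypothetical, and keyed hashing is only one option: it is one-way pseudonymization, so a team that needs to recover the original values would use a token vault or format-preserving encryption instead.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # assumption: the real key comes from a secrets manager
PII_FIELDS = ("email", "ssn")  # hypothetical sensitive columns


def tokenize(value: str) -> str:
    """Deterministic keyed hash: the same input gives the same token, so joins still work."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def protect(record: dict) -> dict:
    """Return a copy of the record with PII fields replaced by tokens."""
    return {
        key: tokenize(value) if key in PII_FIELDS and value is not None else value
        for key, value in record.items()
    }
```

Calling protect on every extracted record before the load step means the target never stores the raw email or SSN, which is the control point the ETL ordering provides.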

Data quality enforcement at the boundary. Pre-load validation rejects bad data before it pollutes the target. The pattern fits operational integration use cases where bad data must not propagate. The rejected data goes to a dead-letter queue or quarantine for investigation.
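A minimal sketch of boundary validation follows, assuming a hypothetical order record with order_id, amount, and order_date fields. The rules are illustrative only; a production pipeline would usually load them from configuration or a rules engine.

```python
from datetime import date


def validate(record: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the record is loadable."""
    errors = []
    if not record.get("order_id"):
        errors.append("missing order_id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    try:
        date.fromisoformat(str(record.get("order_date")))
    except ValueError:
        errors.append("order_date is not an ISO date")
    return errors


def split_batch(records):
    """Partition a batch into loadable rows and quarantined rows with reasons."""
    good, quarantined = [], []
    for record in records:
        errors = validate(record)
        if errors:
            quarantined.append({"record": record, "errors": errors})
        else:
            good.append(record)
    return good, quarantined
```

Only the first list goes to the loader. The second would be written to whatever quarantine or dead-letter location the team uses, with the reasons attached so the investigation does not have to rediscover them.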

Specialized transformation engines with capabilities the target lacks. ETL platforms historically had transformation primitives (lookups, complex joins, business rule engines) that simple warehouses could not match. Modern warehouses have closed much of this gap, but for some legacy or specialized targets the ETL engine still does work the target cannot.

How ETL Pipelines Are Structured

The extract phase pulls data from source systems through connectors. The ETL platform maintains hundreds of connector types for databases, APIs, file formats, mainframe data, and SaaS sources. Connectors handle authentication, pagination, change detection, and the format conversion from source-native to the platform's internal representation.

The transform phase applies business logic on the platform's processing engine: joins across sources, lookups against reference tables, aggregations and calculations, format conversions, and business rule applications. The transformation graph can be complex; enterprise ETL implementations sometimes have transformations with hundreds of stages.

The load phase writes results to the target. The target receives data already in the shape it needs; no further transformation happens in the target. Loaders handle batch sizing, parallelism, error handling, and the specific protocols of each target type.
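The batch sizing and error handling a loader performs can be sketched as follows, using SQLite from the Python standard library as a stand-in target. The sales table and its columns are hypothetical; real loaders typically use the target's bulk-load protocol rather than row inserts.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def load(conn, rows, batch_size=500):
    """Insert rows batch by batch; each batch commits or rolls back as a unit."""
    loaded, failed_batches = 0, []
    for number, chunk in enumerate(batched(rows, batch_size)):
        try:
            with conn:  # commits on success, rolls back on exception
                conn.executemany("INSERT INTO sales (id, amount) VALUES (?, ?)", chunk)
            loaded += len(chunk)
        except sqlite3.Error as exc:
            failed_batches.append((number, str(exc)))
    return loaded, failed_batches
```

Committing per batch keeps a single bad batch from discarding the whole load, at the cost of the target briefly holding a partial result. Whether that trade-off is acceptable depends on the target and on what downstream consumers read during the load.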

Orchestration coordinates the phases and handles dependencies. Source A must extract before Target B can load. Failed extracts trigger retries. Successful loads trigger downstream pipeline jobs. The orchestration layer is the operational heart of the ETL system and where most of the operational complexity lives.
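The dependency and retry behavior described above can be sketched with graphlib from the Python standard library (Python 3.9 or later). The task names and the retry count are hypothetical, and a real orchestrator adds scheduling, backoff, and alerting that this sketch leaves out.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, max_attempts=3):
    """Run tasks in dependency order, retrying failures and skipping anything
    downstream of a task that never succeeded.

    tasks: name -> zero-argument callable
    dependencies: name -> set of task names that must succeed first
    """
    status = {}
    for name in TopologicalSorter(dependencies).static_order():
        if any(status.get(dep) != "ok" for dep in dependencies.get(name, ())):
            status[name] = "skipped"
            continue
        for _attempt in range(max_attempts):
            try:
                tasks[name]()
                status[name] = "ok"
                break
            except Exception:
                status[name] = "failed"  # retried until attempts run out
    return status
```

For example, with dependencies such as {"transform": {"extract_a"}, "load_b": {"transform"}}, a transient failure in extract_a is retried, while a permanent failure marks transform and load_b as skipped instead of running them against missing input.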

Metadata tracking captures what ran, what it produced, and how the lineage connects across the pipeline. The ETL platforms have rich metadata layers that support governance, impact analysis, and operational troubleshooting. The metadata is often the most valuable artifact the platform produces beyond the actual data movement.

Operational Patterns

Batch windows define when ETL jobs run. Most enterprise ETL runs in overnight batch windows when source systems have lower load and target systems can absorb the writes without affecting daytime users. The window is finite; pipelines that overrun cause cascading delays for everything downstream.

Restart points let pipelines resume from intermediate states after failures. A complex ETL job that fails halfway through does not need to re-extract from the source; it resumes from the last completed stage. The pattern matters because some extracts are expensive and re-running them stresses source systems.
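A restart point can be as simple as a small state file written after each completed stage. The sketch below assumes a linear pipeline whose stage outputs are JSON-serializable; the stage names and the checkpoint path are hypothetical, and enterprise platforms persist this state in their own repositories.

```python
import json
from pathlib import Path


def run_with_checkpoints(stages, checkpoint_path):
    """Run (name, func) stages in order, skipping stages a previous run completed.

    Each func takes the previous stage's output and returns JSON-serializable data.
    """
    path = Path(checkpoint_path)
    state = json.loads(path.read_text()) if path.exists() else {"done": [], "output": None}
    for name, func in stages:
        if name in state["done"]:
            continue  # finished in an earlier run; reuse the saved output
        state["output"] = func(state["output"])
        state["done"].append(name)
        path.write_text(json.dumps(state))  # persist after every completed stage
    return state["output"]
```

If the transform stage crashes, the next invocation with the same checkpoint file skips the extract and feeds the saved extract output straight into the transform. The checkpoint file has to be deleted or rotated once a run completes so the next scheduled run starts fresh.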

Change data capture techniques (not always called CDC in ETL contexts) identify what changed since the last run. Watermarks track the last processed timestamp; high-watermark queries pull only new or modified records. The pattern reduces extract volume dramatically compared to full reloads.
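A high-watermark query can be sketched as below, assuming a hypothetical orders table with an updated_at column stored as an ISO 8601 string (so that string comparison matches time order). The watermark itself would be persisted between runs by the pipeline.

```python
import sqlite3


def extract_incremental(conn, last_watermark):
    """Pull only rows modified after the stored watermark and return the new watermark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark only to the newest timestamp actually seen.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark
```

One caveat with this design: the strict comparison can miss a row committed later with the same timestamp as the watermark, and the approach never sees hard deletes. Teams that need either usually overlap the window slightly and deduplicate, or move to log-based change capture.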

Error handling routes bad records to quarantine for investigation. The pipeline continues processing good records rather than halting on the first bad one. The pattern is essential at scale where some bad records are inevitable and stopping the pipeline on every one would mean it never completes.

Reconciliation jobs verify that loads produced expected results. Typical checks are row counts, sum checks, and key matching against the source. The verifications catch data quality issues the transformation did not anticipate. The pattern is mature in regulated industries where load-correctness has to be provable.
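The three checks named above can be sketched in a few lines. The id and amount column names are hypothetical, and the sketch assumes a one-to-one load; where the transformation legitimately aggregates or filters rows, the expected counts and sums have to be derived from the transformation rules instead.

```python
import math


def reconcile(source_rows, target_rows, key="id", amount="amount"):
    """Compare row counts, an amount checksum, and key sets between source and target."""
    source_keys = {r[key] for r in source_rows}
    target_keys = {r[key] for r in target_rows}
    source_sum = math.fsum(r[amount] for r in source_rows)
    target_sum = math.fsum(r[amount] for r in target_rows)
    return {
        "row_count_match": len(source_rows) == len(target_rows),
        "sum_match": math.isclose(source_sum, target_sum, abs_tol=1e-9),
        "missing_in_target": sorted(source_keys - target_keys),
        "unexpected_in_target": sorted(target_keys - source_keys),
    }
```

The returned dictionary is the kind of evidence that can be stored alongside the run metadata, so that load correctness is recorded per run instead of inferred from the absence of errors.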

Migration Patterns

Enterprises migrating off legacy ETL usually move to one of three destinations: a cloud warehouse with ELT (Snowflake plus dbt is the most common pattern), a modern ETL/ELT platform (Matillion, Fivetran plus dbt, Coalesce), or custom Python or Spark pipelines on managed orchestrators (Airflow, Dagster, Prefect).

The migration is rarely fast. Years of accumulated transformations encode business logic that is not easily ported. Some transformations have no clear modern equivalent and need to be re-architected. Vendor-specific features used heavily in the old platform have no portable replacement.

Coexistence patterns are common during migration. New pipelines build in the modern platform; legacy pipelines continue running on the old platform. The two platforms coexist for years. Eventually the legacy platform shrinks to the few pipelines that have not been migrated.

Re-architecting rather than porting often produces better results. Direct translation from Informatica to dbt loses the benefits of the modern platform. Re-architecting from business requirements to the modern platform takes longer but produces maintainable pipelines.

Vendor pricing and roadmap changes have accelerated migrations. Informatica, IBM, and Microsoft have all been adjusting their pricing and product directions to retain customers; some migrations are driven by these changes as much as by technical need.

Common Failure Modes

Batch window overruns that cascade into business-hour disruptions. The pipeline takes longer than allotted; downstream pipelines start late; reports are delayed; users complain. The fix is window monitoring, performance budgets per pipeline, and refactoring of the worst offenders.
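Window monitoring with per-pipeline performance budgets might look like the sketch below. The pipeline names, budgets, and window end are hypothetical; in practice the run times would come from the platform's run metadata.

```python
from datetime import datetime, timedelta


def check_budgets(runs, budgets, window_end):
    """Flag pipelines that exceeded their runtime budget or finished after the window closed.

    runs: name -> (start, end) datetimes; budgets: name -> timedelta
    """
    alerts = []
    for name, (start, end) in runs.items():
        duration = end - start
        if duration > budgets[name]:
            alerts.append(f"{name}: ran {duration}, budget {budgets[name]}")
        if end > window_end:
            alerts.append(f"{name}: finished {end - window_end} after window close")
    return alerts
```

Tracking the budget separately from the window matters because a pipeline can blow its budget while still finishing inside the window. That is the early signal that lets the team refactor before the overrun starts cascading.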

Source system stress from heavy extract jobs. The ETL pulls so hard from production databases that operational queries slow down. The fix is read replicas, throttled extracts, or CDC-based incremental extraction that avoids full scans.

Brittle transformations that break with any source change. A new column appears; an existing column changes type; the transformation fails. The fix is defensive transformation design plus source contracts that prevent uncoordinated changes.

Lost lineage when complex transformations span multiple stages. The team cannot trace a target column back to its source through twenty transformations. The fix is investment in the platform's metadata features and discipline about documentation.

Knowledge concentration in a few specialists. Legacy ETL platforms require specialized skill; only a few people know the system; their departure creates risk. The fix is documentation, cross-training, and gradual migration toward more accessible modern tooling.

Best Practices

  • Use ETL ordering when pre-load validation, transformation, or de-identification is required for governance reasons.
  • Design pipelines with restart points so failures do not require full re-runs of expensive extracts.
  • Implement change data capture or watermark patterns to avoid full reloads when sources are large.
  • Build reconciliation jobs that verify load correctness; do not assume the pipeline ran correctly just because it did not error.
  • Plan migration off legacy ETL platforms as multi-year programs; do not expect quick translations to produce maintainable results.

Common Misconceptions

  • "ETL is obsolete." The pattern still fits specific use cases, even though ELT has displaced it for most modern warehouse loading.
  • "ETL and ELT are interchangeable." The ordering matters for what intermediate states the target sees and where transformation logic lives.
  • "Cloud warehouses eliminated the need for ETL platforms." Legacy enterprises still run substantial ETL workloads that cannot easily be replaced.
  • "All ETL is batch." Many ETL platforms now support micro-batch and streaming-style operations.
  • "ETL is just data movement." The transformation logic and governance metadata are usually the more valuable parts of the platform.

Frequently Asked Questions (FAQs)

When does ETL still beat ELT?

When pre-load validation, transformation, or de-identification is required for governance. When the target cannot handle raw data or transformation workloads. When sensitive sources should not land in the target without protection applied first. Most other modern workloads do better with ELT.

Should I migrate off my legacy ETL platform?

Probably eventually, but the timing and approach matter. The cost of staying includes vendor lock-in, skill scarcity, and the gap between legacy capabilities and what modern platforms enable. The cost of migrating includes years of effort and the risk of breaking working production. Make the decision with eyes open on both sides.

What modern platform replaces legacy ETL?

The most common replacements are cloud warehouses plus dbt for transformations, ingestion tools like Fivetran or Airbyte for the extract-and-load part, and orchestrators like Airflow or Dagster for coordination. The combination is more modular than a single legacy platform but provides equivalent functionality.

Can I do ETL on a cloud warehouse?

Yes. The ordering does not require a separate transformation server. You can extract from sources, transform in a staging area, and load into final targets all within the same cloud warehouse. The pattern fits when the warehouse has both the compute for transformation and the access control for separating staging from final targets.

How do I handle source schema changes in ETL?

With defensive design and producer contracts. The transformation should not assume the source schema is fixed; it should handle additions gracefully and fail loudly on removals or type changes that would break downstream logic. Producer contracts make schema changes coordinated rather than surprising.
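The "handle additions gracefully, fail loudly on removals or type changes" rule can be sketched as a schema check that runs before the transformation. The expected schema below is a hypothetical contract; a real pipeline would read the observed schema from the source's catalog or from the extracted file's metadata.

```python
EXPECTED = {"id": int, "email": str, "amount": float}  # hypothetical source contract


class SchemaError(Exception):
    """Raised when a source change would break downstream logic."""


def check_schema(observed: dict, expected: dict = EXPECTED) -> list[str]:
    """Return newly added columns; raise on removed columns or changed types."""
    removed = sorted(expected.keys() - observed.keys())
    retyped = sorted(
        col for col in expected.keys() & observed.keys() if observed[col] is not expected[col]
    )
    if removed or retyped:
        raise SchemaError(f"removed={removed} retyped={retyped}")
    return sorted(observed.keys() - expected.keys())  # additions are tolerated but reported
```

Reporting additions instead of silently ignoring them gives the team a prompt to decide whether the new column belongs in the contract.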

What is the role of orchestration in ETL?

Orchestration coordinates the extract, transform, and load phases plus dependencies between pipelines. Legacy ETL platforms include orchestration; modern decoupled stacks use separate orchestrators (Airflow, Dagster, Prefect, Argo). The orchestration layer is where most operational complexity lives in either approach.

How do I handle data quality in ETL?

With validation rules in the transformation stage that reject bad data before it loads. Bad records go to a quarantine or dead-letter location for investigation. The pattern is mature in regulated industries; the validation runs as part of every ETL job execution.

What about real-time ETL?

ETL is traditionally batch; real-time variants exist but blur into streaming data integration patterns. The ordering (transform before load) still applies; the latency just gets shorter. Stream processing frameworks (Kafka Streams, Flink, Spark Structured Streaming) handle this pattern when it is needed.

Where is ETL heading?

Toward continued legacy installations in enterprises with large existing investments. Toward absorption of the pattern into broader data integration platforms rather than remaining a distinct category. Toward AI-assisted pipeline development that reduces the specialist skill requirements of legacy ETL tools. The pattern itself is not going away; the bright-line distinction between ETL and other integration approaches is fading.