The Format War Has Largely Settled
The open table format conversation in 2022 was loud. Three formats competed for the lakehouse standard position. By 2026, the conversation is quieter because the formats have differentiated into recognizable use cases. The choice between them is more nuanced and less philosophical than the early debates suggested.
A principal data engineer at a streaming media company described her team's 2024 evaluation to me this way: "We thought we were picking the best format. We were actually picking the format that fit how our team operates. Once we understood that, the choice was obvious." Her team picked Iceberg. A different team at a different company would have picked Delta or Hudi for equally legitimate reasons.
The formats are not interchangeable, and they are not strictly better or worse than each other. They fit different workload patterns and team operations. Knowing the patterns is more useful than knowing which format won which benchmark.

What Each Format Actually Is
Apache Iceberg emerged from Netflix's data platform work. It is now an Apache top-level project with broad community contribution. Iceberg is engine-agnostic by design. Spark, Trino, Flink, Snowflake, BigQuery, and several others read and write Iceberg tables. The format's strength is portability across query engines.
Iceberg's catalog architecture supports multiple catalog implementations (Hive, AWS Glue, REST, Polaris). Tables are referenced through the catalog, which maintains the current state. The format handles schema evolution, partition evolution, and time travel through metadata files.
Delta Lake originated at Databricks and is now also open-source. Delta is tightly integrated with Spark but has been extending to other engines over time. The format's strength is the depth of Spark ecosystem support and the maturity that comes from operating at Databricks scale.
Delta's transaction log structure handles ACID properties through ordered log entries. Schema evolution and time travel work through log replay. The format has specific features for change data feed and merge operations that fit common analytical patterns.
Apache Hudi emerged from Uber's data platform. It is also Apache top-level and open-source. Hudi's strength is upsert and incremental processing. The format was designed for use cases where data is constantly being updated rather than appended.
Hudi has two table types: Copy-on-Write for read-optimized workloads and Merge-on-Read for write-optimized workloads. The choice affects performance characteristics significantly. Hudi also has built-in features for clustering, compaction, and cleaning that fit operational data pipelines.
When Iceberg Fits
Iceberg fits when engine portability matters. Organizations that use multiple query engines (Spark for batch, Trino for ad-hoc, Flink for streaming) benefit from a single format that all engines read consistently.
Iceberg also fits when the data architecture has many catalogs or might evolve to add more. The pluggable catalog architecture handles multi-catalog scenarios cleanly. Organizations on AWS that use Glue, on Databricks that use Unity Catalog, and operating cross-cloud that need REST catalogs all benefit from Iceberg's catalog flexibility.
Iceberg fits when partition evolution matters. The format handles changing partition schemes over time without rewriting historical data. Platforms that have outgrown their original partitioning benefit from this without the heavy migrations that other formats require.
Iceberg fits less well when the team is heavily Spark-centric and uses Databricks. Delta's tighter Databricks integration produces a smoother experience for that specific stack. Choosing Iceberg in pure-Databricks environments adds friction without clear benefit unless multi-engine portability is on the roadmap.
The Prior Authorization AI Trap
What the 16x denial rate finding means for engineering teams building PA automation.
When Delta Fits
Delta fits when the team operates primarily on Databricks. The platform's integration is deep. Features ship in Delta first. Operational tooling assumes Delta. The path of least resistance produces good outcomes.
Delta fits when change data feed is core to the workload. The format's built-in change data capture support handles incremental processing cleanly. Workflows that depend on CDC patterns (downstream syncs, real-time analytics, audit) benefit from this.
Delta fits when the team needs Liquid Clustering or other Databricks-specific advanced features. The features are real and useful. They work well within the Databricks ecosystem.
Delta fits less well when engine portability is a near-term requirement. Other engines support Delta but with less depth than they support Iceberg. The Databricks-first nature is a feature for Databricks users and a limitation for multi-engine architectures.
When Hudi Fits
Hudi fits when upsert patterns dominate. Workloads where data is constantly being updated rather than appended (customer profiles, account state, slowly-changing dimensions) benefit from Hudi's design for this case. The other formats handle upserts; Hudi handles them with less operational overhead.
Hudi fits when the workload needs fine-grained operational control. The format exposes more configuration than Iceberg or Delta. Teams that want to tune compaction, clustering, and indexing carefully have more levers in Hudi.
Hudi fits when the read pattern can tolerate Merge-on-Read overhead. The Merge-on-Read table type handles high-write workloads efficiently but adds read-time merge cost. For workloads where writes dominate over reads, the trade-off works.
Hudi fits less well when the team wants minimal operational tuning. The format's flexibility produces more decisions. Teams without Hudi-specific expertise sometimes find the defaults harder to operate than Iceberg or Delta defaults.
What Actually Drives the Choice
Three factors drive the choice in practice.
The first is existing platform commitment. Teams on Databricks usually pick Delta. Teams on AWS heavily using Glue and Athena often pick Iceberg. Teams with specific upsert-heavy workloads sometimes pick Hudi. The platform fit matters more than format benchmarks.
The second is engine diversity. Teams that operate one query engine can pick any format. Teams that operate multiple engines benefit from Iceberg's portability.
The third is workload write patterns. Append-heavy workloads run fine on any format. Upsert-heavy workloads benefit from Hudi or Delta's specific support. Workloads with evolving partition schemes benefit from Iceberg's partition evolution.
Benchmark performance differs across the three on specific workloads, but the differences are usually small enough that the operational and platform fit factors dominate.
What the 2026 Landscape Looks Like
By 2026, the three formats have settled into roughly even adoption with workload-specific patterns. Iceberg adoption has accelerated through Snowflake's Iceberg-native tables, AWS native support, and the broader catalog ecosystem. Delta remains dominant on Databricks and continues to lead in that environment. Hudi remains preferred for specific upsert-heavy use cases, particularly in CDC-heavy data lakehouse architectures.
The formats are also becoming more interoperable. Apache XTable (formerly OneTable) provides format conversion, allowing organizations to maintain one underlying storage while different engines see different formats. The tool reduces the cost of format choice over time.
The choice still matters because operating efficiently on any format requires format-specific expertise. The choice does not matter as a competitive or strategic decision in the way the 2022 framing suggested.
Clinical AI Hallucination
Why 91.8% of clinicians have encountered medical AI hallucinations, the three structural failure modes.
What Logiciel Does Here
Logiciel works with data engineering teams selecting or evaluating table formats for new platforms or considering migration between formats. The work is typically structured around workload assessment, platform fit, and operational reality rather than around format ideology.
The Data Lake vs Data Warehouse vs Lakehouse framework covers the broader storage architecture decisions. The Data Pipelines Explained framework covers the pipeline patterns that interact with format choice.
A 30-minute working session is enough to assess your workload and platform fit against the three formats.
Frequently Asked Questions
Should I migrate from one format to another?
Only with clear workload justification. Migration costs are real and benefits depend on specifics. Most teams running an existing format successfully should not migrate for fashion. Migration justified by platform changes, workload changes, or capability needs sometimes makes sense.
What about Apache XTable for interoperability?
XTable enables reading the same underlying storage as multiple formats. Useful for organizations that have made format commitments and need to support engines with different format preferences. Not a replacement for picking the right primary format; an extension that adds flexibility.
How does this interact with vendor catalogs?
The three formats handle catalog integration differently. Iceberg has the most pluggable catalog architecture. Delta integrates tightly with Unity Catalog. Hudi has its own catalog patterns. Match the catalog choice to the format choice.
What about performance differences?
Real but usually small relative to query and workload optimization. Format choice is rarely the bottleneck once basic tuning is in place. The format choice matters more for operational characteristics than for query speed.
Which format will dominate by 2028?
All three will likely remain. The formats serve different patterns, and the patterns persist. Iceberg has the most momentum on engine adoption; Delta has the largest current deployment base; Hudi serves specific workload niches. No format is going away. Sources: - Apache Iceberg, official documentation 2024 - Delta Lake, official documentation 2024 - Apache Hudi, official documentation 2024