LS LOGICIEL SOLUTIONS

Data Lineage Tools That Show You What Actually Happens to Your Data

Column-level. Auto-discovered. Cross-language. Queryable.

Manual lineage diagrams are wrong by next Tuesday. Logiciel's data lineage tools auto-derive column-level lineage from query logs, dbt manifests, Python code, and BI tools — so you have a queryable, always-fresh lineage graph when audit, change, or compliance comes knocking.

See Logiciel in Action

Your lineage is wrong. You just don't know which 30% yet.

What lineage really looks like in most teams:

  • A Lucidchart diagram from 18 months ago that everyone references and nobody updates. Diagram-based lineage decays the moment it's published, and relying on it for impact analysis is a structural risk that audit teams notice.
  • When something breaks, lineage is reconstructed by Slacking three engineers — adding hours to MTTR. The cost of broken lineage shows up as longer incidents.
  • Audit prep means a manual quarterly lineage exercise that everyone hates — significant engineering capacity diverted from value-creating work, every quarter.

If you're shopping lineage tools, you've already felt the pain

Teams here typically need:

  • Column-level lineage — not table-level. Column granularity is the difference between actionable impact analysis and educated guessing; table-level lineage isn't enough at scale.
  • Cross-language lineage — SQL, Python, Spark, dbt — in one graph, so lineage doesn't stop at the dbt boundary.
  • Auto-derived from runtime, not manually declared — the only approach that stays current at modern data-team velocity.

What you get with Logiciel

Lineage that's actually accurate.

  • Auto-derived from query logs, dbt manifests, and Python instrumentation — lineage stays current as code changes, where manual approaches don't.
  • Column-level — answers 'what depends on this column?' in milliseconds, the question behind every change and every incident.
  • Cross-language — SQL, Python, Spark, and dbt unified in one graph, so lineage doesn't break at language boundaries.
  • Queryable API — feed lineage into impact analysis, audit reports, and change management, making it actionable rather than just visible.
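
Conceptually, the impact-analysis question 'what depends on this column?' is a reachability query over a directed graph of column-to-column edges. A minimal, self-contained Python sketch of that idea (the column names, graph shape, and function names are illustrative, not Logiciel's actual API):

```python
from collections import deque

# Hypothetical column-level lineage edges: upstream column -> downstream columns.
# In practice these edges would be auto-derived from query logs and dbt manifests.
LINEAGE = {
    "raw.orders.amount": ["staging.stg_orders.amount_usd"],
    "staging.stg_orders.amount_usd": [
        "marts.fct_revenue.gross_revenue",
        "marts.fct_refunds.refund_amount",
    ],
    "marts.fct_revenue.gross_revenue": ["dashboards.exec_kpis.mrr"],
}

def downstream(column):
    """Breadth-first walk: every column affected by a change to `column`."""
    seen, queue = set(), deque([column])
    while queue:
        for child in LINEAGE.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)

impacted = downstream("raw.orders.amount")
```

A graph query of this shape is why column-level impact analysis can return in milliseconds: it touches only the affected subgraph, not the whole estate.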

Where this fits: industries we serve in the US

FinTech & Financial Services

Trading data, risk models, regulatory reporting — sub-second SLAs and audit-ready governance.

PropTech & Real Estate

Listing data, transaction pipelines, geospatial analytics — multi-source consolidation.

Healthcare & Life Sciences

EHR integration, claims pipelines, clinical analytics — HIPAA-aware infrastructure.

B2B SaaS

Product analytics, customer 360, usage-based billing — embedded and operational data.

eCommerce & Marketplaces

Inventory, pricing, order, and customer pipelines — real-time and high-throughput.

Construction & Industrial Tech

IoT, project, and supply-chain data — operational analytics on hybrid stacks.

Engagement models that fit your stage

Dedicated Pod

Embedded data engineering pod aligned to your sprint cadence — typically 3–6 engineers + a US lead.

Staff Augmentation

Senior data engineers, architects, and SMEs slotted into your team to unblock specific work.

Project-Based Delivery

Fixed-scope, milestone-driven engagements with clear deliverables and outcomes.

From first call to first production pipeline

Discover

We map your stack, workloads, team, and constraints in a working session — not an RFP response.

Architect

Reference architecture grounded in your reality, with capacity, cost, and migration plans.

Build

Iterative implementation with weekly demos, code reviews, and your team in the loop.

Operate

Managed operations or knowledge transfer — your choice. Both with US-aligned coverage.

Optimize

Continuous tuning of cost, performance, and reliability against measurable SLAs.

Lineage capabilities

Auto-Derived Lineage

From query logs, dbt manifests, Python instrumentation, BI tools.

Column-Level Granularity

Track lineage at the column, not just the table.

Impact Analysis

What breaks if I change this column? Who consumes it?

Cross-Language Lineage

SQL, dbt, Python, Spark — in one graph.

Lineage API

Programmatic access for downstream tools.

Audit Reports

Auto-generated lineage reports for SOX, GDPR, HIPAA audits.

Questions buyers ask before they book

How do you derive lineage, and how accurate is it?

We parse query logs (Snowflake, Databricks, BigQuery, Redshift query history), dbt manifests, Airflow DAGs, Python instrumentation, BI tool metadata (Looker LookML, Tableau workbooks, Mode notebooks), and reverse-ETL configurations to derive column-level lineage automatically. No manual declaration required; lineage updates continuously as new query logs arrive (typically within 5-15 minutes). For non-SQL transformations (Python, Spark), SDK instrumentation captures lineage at runtime with 1-3 lines of code per script. Accuracy is typically 95%+ for SQL-heavy stacks, lower for Python-heavy without instrumentation. Lineage is queryable via API for impact analysis, audit reports, change management, and AI agents needing governed access.

How fresh is the lineage?

Updated in near-real-time as new query logs and pipeline runs arrive — typically within 5-15 minutes of execution. There's no scheduled batch job to maintain, no manual refresh button, no 'lineage was current as of last Tuesday' staleness. For customers with high-frequency pipelines (streaming, sub-hourly batch), lineage updates continuously; for daily-batch customers, lineage is fresh by morning. Stale lineage is flagged and surfaced in the UI so you know if a data source isn't being queried (potential candidate for retirement) or if instrumentation is missing on a known pipeline. Freshness is one of the structural advantages versus manual lineage tools.

Do you generate audit-ready reports for frameworks like SOX, GDPR, and HIPAA?

Yes — pre-built lineage reports for SOX, GDPR, HIPAA, BCBS 239, EU AI Act, and other major frameworks. Reports include data flow diagrams, control mappings, evidence collection, and exception documentation aligned to specific framework requirements. For SOX, we generate IT general controls evidence including data flow attestation, change management lineage, and access control mapping. For GDPR, we map personal data flows for Article 30 records of processing activities. For BCBS 239, we provide cross-border data flow attestation. Reports are auditor-aligned (we've passed Big Four audits) and regenerable on demand — eliminating the typical 4-week pre-audit scramble.

How is pricing structured?

Per asset tier — predictable at scale, with unlimited users and no per-seat penalties. An 'asset' is a managed table, view, model, dashboard, or pipeline that Logiciel lineages. Mid-market customers (5,000-20,000 assets) typically pay $30-70K ARR for lineage as part of the broader catalog/governance tier. Enterprise tiers (100,000+ assets, advanced audit reports, dedicated TAM) start at $150K ARR. Lineage is included in the catalog/governance tier, not a separate SKU — this matters because lineage value compounds with catalog and quality investments. Free tier covers first 500 assets indefinitely for teams getting started.

How do you handle Python and Spark transformations?

Python and Spark transformations are supported via SDK instrumentation — typically 1-3 lines of code per script to capture lineage at runtime. The SDK wraps DataFrame and Spark operations to track inputs and outputs at the column level. For dbt-Python models, lineage is derived automatically from the model definitions. Declarative APIs cover the rest — for cases where instrumentation is impractical (closed-source systems, third-party scripts), you can declare lineage manually with versioned manifests. We don't require declarative lineage for everything (which engineers never maintain), and we don't break when code changes (which manual lineage diagrams always do). For Python-heavy customers, instrumentation coverage is the difference between accurate lineage and theater.
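
As an illustration of what runtime instrumentation can look like, here is a minimal Python sketch of a lineage-capturing decorator. The names (`track_lineage`, `CAPTURED`) and the decorator shape are hypothetical, not Logiciel's actual SDK; a real SDK would ship events to a lineage backend rather than append to a list:

```python
import functools

CAPTURED = []  # stand-in for events shipped to a lineage backend

def track_lineage(inputs, outputs):
    """Hypothetical SDK decorator: a single line on a transform records
    its column-level inputs and outputs each time it runs."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            result = fn(*args, **kwargs)
            CAPTURED.append({"job": fn.__name__, "inputs": inputs, "outputs": outputs})
            return result
        return inner
    return wrap

@track_lineage(
    inputs=["raw.events.user_id", "raw.events.ts"],
    outputs=["analytics.sessions.user_id", "analytics.sessions.session_start"],
)
def build_sessions(rows):
    # ...actual sessionization logic would go here...
    return rows

build_sessions([])
```

The point of the decorator pattern is the '1-3 lines per script' claim: instrumentation is declared once at the transform boundary, and lineage is captured on every run without further maintenance.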

Can analysts and business users explore lineage without engineering help?

Yes — self-serve lineage views for analysts and stewards, embedded directly in the catalog and BI tool integrations. Analysts hover over a Looker dimension and see column-level lineage to source systems; stewards investigate a quality alert and see all downstream consumers immediately. Lineage views are scoped to the user's permissions (sensitive lineage is hidden from users without access). For business stakeholders, simplified lineage diagrams (table-level, system-level) reduce visual complexity while preserving traceability. For technical users, full column-level lineage with transformation logic is available. The UX adapts to the audience without forcing one view to fit all users.

Do you support OpenLineage?

We support and extend OpenLineage. Compatible with Marquez, Datakin (Astronomer), and other OpenLineage consumers. If you've invested in OpenLineage instrumentation, Logiciel ingests those events alongside our own discovery and instrumentation. OpenLineage is a useful standard for cross-tool interoperability; Logiciel adds the consumption layer (column-level resolution, audit reports, anomaly-aware routing, governance integration) that bare OpenLineage doesn't provide. For customers building on OpenLineage today, Logiciel is complementary; for customers without OpenLineage, our SDK and auto-discovery provide equivalent coverage without requiring upstream tool changes.
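
For reference, an OpenLineage run event is plain JSON with `eventType`, `eventTime`, `run`, `job`, and input/output datasets, per the OpenLineage specification. A minimal sketch constructing one by hand (the namespaces and producer URI are placeholders; in practice you would use the openlineage-python client and POST to your collector):

```python
import json
import uuid
from datetime import datetime, timezone

# Minimal OpenLineage RunEvent, built as a plain dict.
# Namespaces and the producer URI below are illustrative placeholders.
event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "dbt", "name": "marts.fct_revenue"},
    "inputs": [{"namespace": "snowflake://acct", "name": "staging.stg_orders"}],
    "outputs": [{"namespace": "snowflake://acct", "name": "marts.fct_revenue"}],
    "producer": "https://example.com/my-pipeline",
}
payload = json.dumps(event)  # ready to POST to an OpenLineage-compatible endpoint
```

Events of this shape are what OpenLineage consumers such as Marquez ingest, and what a consumption layer resolves into a queryable, column-level graph.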

Get a lineage audit of your stack

We'll connect to your warehouse, dbt project, and BI tool. In 24 hours you'll have a column-level lineage report — and a clear list of where your current lineage diagrams are wrong.