Most pipeline breakage starts the same way. A producer changes a column, and nobody downstream finds out until a dashboard breaks or a model run fails.
The difference between a contract and a wiki page is where the check runs. A wiki page describes the data. A contract refuses to ship a change that violates it.
The common pattern: a consumer finds the broken data in production, files a ticket, and the producer fixes it days later.
The approach that works: the producer's own pipeline blocks the bad change at commit, before it ever reaches a consumer.
Copy the fillable contract.yaml into a file beside the code that produces the dataset, replace every value in angle brackets, and check it into version control. The contract lives with the producer, not in a separate catalog that drifts out of date.
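One way to make sure no template value slips into version control is a small pre-commit check. The sketch below assumes contract.yaml has already been parsed into nested dicts and lists (for example by a YAML library); the field names in the sample are hypothetical and not taken from the actual template.

```python
import re
from typing import Any, List

# Matches a value that is still a template placeholder such as "<owning-team>".
PLACEHOLDER = re.compile(r"^<[^<>]+>$")


def unfilled_placeholders(node: Any, path: str = "") -> List[str]:
    """Return the dotted paths of values that still look like <placeholders>."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            found.extend(unfilled_placeholders(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(unfilled_placeholders(value, f"{path}[{index}]"))
    elif isinstance(node, str) and PLACEHOLDER.match(node.strip()):
        found.append(path)
    return found


# Hypothetical contract contents, as they might look after parsing.
contract = {
    "dataset": "orders_daily",
    "owner": "<owning-team>",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "<field-name>", "type": "<type>"},
    ],
}

print(unfilled_placeholders(contract))
# ['owner', 'fields[1].name', 'fields[1].type']
```

A non-empty result is a reasonable signal to fail the commit until every placeholder has been replaced.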
Settle the one question that drives most contract disputes: was that change allowed without notice, or not? Additive changes ship freely. Anything that can break a consumer needs notice and a parallel run. The version number carries the rule, so consumers pin to a major version and move on their own schedule.
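The versioning rule can be sketched in a few lines. This assumes a semver-style `major.minor.patch` string, which the text implies but does not spell out; the function names are illustrative.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'major.minor.patch' into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def next_version(current: str, breaking: bool) -> str:
    """Breaking changes bump the major version; additive changes bump the minor."""
    major, minor, _ = parse(current)
    if breaking:
        return f"{major + 1}.0.0"
    return f"{major}.{minor + 1}.0"


def satisfies_pin(published: str, pinned_major: int) -> bool:
    """A consumer pinned to a major version accepts any release within it."""
    return parse(published)[0] == pinned_major


print(next_version("1.4.2", breaking=False))  # 1.5.0
print(next_version("1.4.2", breaking=True))   # 2.0.0
print(satisfies_pin("1.5.0", pinned_major=1))  # True
print(satisfies_pin("2.0.0", pinned_major=1))  # False
```

Under this scheme a consumer pinned to major version 1 keeps reading 1.x while 2.0.0 runs in parallel, and migrates when ready.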
Run schema checks in CI on every pull request and data validation inside the producer job before publish. A check that runs after publish only tells a consumer they already have bad data. A check that runs before publish stops the bad data from ever leaving.
A contract you do not enforce is a comment in a wiki.
A data contract is a written agreement between the team that produces a dataset and the teams that consume it. It pins down the schema, the meaning of each field, how fresh and complete the data will be, who owns it, and what happens when any of that has to change.
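To make those parts concrete, here is a minimal sketch of a contract as a typed structure. Every attribute name is hypothetical and chosen only to mirror the list above; the real contract.yaml may be laid out differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    name: str
    type: str
    description: str        # the meaning of the field
    nullable: bool = False


@dataclass(frozen=True)
class DataContract:
    dataset: str
    version: str
    owner: str                      # who owns the dataset
    fields: List[Field]             # the schema
    max_staleness_hours: int        # how fresh the data will be
    min_completeness: float         # share of expected rows that must arrive
    breaking_change_notice_days: int = 30  # what happens when it has to change


contract = DataContract(
    dataset="orders_daily",
    version="1.0.0",
    owner="orders-team",
    fields=[Field("order_id", "string", "Unique identifier of an order")],
    max_staleness_hours=24,
    min_completeness=0.99,
)

print(contract.owner, contract.breaking_change_notice_days)
```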
Additive changes are safe and ship freely: adding a nullable field, adding an allowed value to an enum, or relaxing a constraint. Breaking changes can hurt a consumer: renaming or removing a field, changing a type, tightening a constraint, or changing the grain or primary key. Breaking changes require 30 days' notice and a parallel version.
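A schema diff along these lines could run as the CI check. The sketch assumes a schema is a dict mapping field names to `type`, `nullable`, and optional `enum` entries, which is an illustrative shape rather than the pack's format. It does not cover grain or primary key changes, and it treats a new non-nullable field as needing review, which is an assumption beyond the rules stated above.

```python
from typing import Any, Dict, List

Schema = Dict[str, Dict[str, Any]]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List changes between two schema versions that are not purely additive."""
    problems: List[str] = []
    for name, old_field in old.items():
        new_field = new.get(name)
        if new_field is None:
            problems.append(f"{name}: removed or renamed")
            continue
        if new_field["type"] != old_field["type"]:
            problems.append(
                f"{name}: type changed from {old_field['type']} to {new_field['type']}"
            )
        # Nullable to required is a tightened constraint.
        if old_field.get("nullable", False) and not new_field.get("nullable", False):
            problems.append(f"{name}: no longer nullable")
        old_enum, new_enum = old_field.get("enum"), new_field.get("enum")
        if new_enum is not None:
            if old_enum is None:
                problems.append(f"{name}: enum constraint added")
            elif not set(old_enum) <= set(new_enum):
                problems.append(f"{name}: enum value removed")
    for name, new_field in new.items():
        if name not in old and not new_field.get("nullable", False):
            problems.append(f"{name}: new field is not nullable")
    return problems


old = {
    "order_id": {"type": "string"},
    "status": {"type": "string", "enum": ["open", "closed"]},
    "amount": {"type": "decimal"},
}
new = {
    "order_id": {"type": "string"},
    "status": {"type": "string", "enum": ["open", "closed", "refunded"]},
    "amount": {"type": "float"},
    "coupon": {"type": "string", "nullable": True},
}

print(breaking_changes(old, new))
# ['amount: type changed from decimal to float']
```

The added enum value and the nullable `coupon` field pass, while the type change on `amount` is flagged. In CI, a non-empty list would fail the pull request unless the major version was bumped.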
This pack is for Heads of Data, data platform owners, and the senior data engineers who own the pipelines that produce shared datasets. If a producer can change a column today and a consumer only finds out when something breaks, it is built for you.
A catalog describes data after the fact. A contract is enforced in the producer's pipeline before the data ships. The contract file is the single source of truth for both the CI schema check and the producer-side data validation, so the description and the enforcement never drift apart.
Enforcement happens in two places, and you want both. Schema checks run in CI on every pull request to catch a breaking change while it is still a code review comment. Data validation runs inside the producer job before publish, so a load that fails a quality, volume, or freshness rule is quarantined instead of consumed.
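The producer-side half might look like the gate below: a minimal sketch, assuming a batch is a list of row dicts and that the thresholds come from the contract. The function names, thresholds, and sample rows are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def validate_batch(
    rows: List[Dict[str, Any]],
    loaded_at: datetime,
    now: datetime,
    min_rows: int,
    max_age: timedelta,
    required: List[str],
) -> List[str]:
    """Return the volume, freshness, and quality rules this batch fails."""
    failures: List[str] = []
    if len(rows) < min_rows:
        failures.append(f"volume: {len(rows)} rows, expected at least {min_rows}")
    if now - loaded_at > max_age:
        failures.append(f"freshness: batch is older than {max_age}")
    missing = sum(1 for row in rows for name in required if row.get(name) is None)
    if missing:
        failures.append(f"quality: {missing} null values in required fields")
    return failures


def route(failures: List[str]) -> str:
    """Publish only a clean batch; anything else goes to quarantine."""
    return "quarantine" if failures else "publish"


rows = [
    {"order_id": "a1", "amount": 10},
    {"order_id": None, "amount": 5},
]
failures = validate_batch(
    rows,
    loaded_at=datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    now=datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc),
    min_rows=2,
    max_age=timedelta(hours=6),
    required=["order_id"],
)

print(failures)         # ['quality: 1 null values in required fields']
print(route(failures))  # quarantine
```

Because the gate runs before publish, the batch with a null `order_id` never reaches a consumer.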