BLUEPRINT

Data Contract Template Pack

Most pipeline breakage starts the same way. A producer changes a column, and nobody downstream finds out until a dashboard breaks or a model run fails.

Download White Paper

A Contract Is Not Documentation. It Is a Build Gate.

The difference is where the check runs. A wiki page describes the data. A contract refuses to ship a change that violates it.

The common pattern: a consumer finds the broken data in production, files a ticket, and the producer fixes it days later.
The approach that works: the producer's own pipeline blocks the bad change at the pull request, before it ever reaches a consumer.

The Numbers That Make This a Board-Level Conversation

7 sections

Every contract in this pack pins down seven things: schema, semantics, SLAs, ownership, quality expectations, versioning, and change policy.

30 days

The notice period the template assigns to any breaking change, such as renaming a field, changing a type, or shifting the grain.

2 enforcement points

Schema checks in CI on every pull request, plus producer-side data validation before publish.
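
The seven sections listed above lend themselves to a mechanical completeness check. The sketch below assumes the contract file has already been parsed into a Python dict; the section key names are hypothetical stand-ins, not the template's actual field names.

```python
# Hypothetical top-level keys, one per section every contract must pin down.
REQUIRED_SECTIONS = (
    "schema",
    "semantics",
    "slas",
    "ownership",
    "quality",
    "versioning",
    "change_policy",
)


def missing_sections(contract: dict) -> list:
    """Return the required sections that are absent or empty, in a fixed order."""
    return [name for name in REQUIRED_SECTIONS if not contract.get(name)]


# A draft that has filled in only the schema and the owner.
draft = {
    "schema": {"order_id": "string", "amount": "decimal"},
    "ownership": {"team": "orders-platform"},
}
print(missing_sections(draft))  # ['semantics', 'slas', 'quality', 'versioning', 'change_policy']
```

Treating an empty section the same as a missing one means a contract cannot pass by carrying seven headings with nothing under them.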

The Three Moves Every Head of Data Needs

Write the contract next to the code

Copy the fillable contract.yaml into a file beside the code that produces the dataset, replace every value in angle brackets, and check it into version control. The contract lives with the producer, not in a separate catalog that drifts out of date.
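
A small check can confirm that no angle-bracket value was left unfilled before the contract is merged. This is a minimal sketch that assumes placeholders look like `<owner-team>`; the sample keys are hypothetical and not taken from the template itself.

```python
import re

# Matches angle-bracket placeholders such as <owner-team> or <hours>.
PLACEHOLDER = re.compile(r"<[^<>\n]+>")


def unfilled_placeholders(text: str) -> list:
    """Return (line number, placeholder) pairs still present in the contract text."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            found.append((lineno, match.group(0)))
    return found


# Hypothetical excerpt of a half-filled contract.yaml.
sample = """\
dataset: orders
owner: <owner-team>
freshness_sla_hours: <hours>
"""
for lineno, token in unfilled_placeholders(sample):
    print(f"line {lineno}: {token} still needs a value")
```

A pattern this simple would also flag a literal value such as `amount < 10 and fee > 0`, so tighten it if real contract values contain angle brackets.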

Classify every change before it ships

Settle the one question that drives most contract disputes: was that change allowed without notice, or not? Additive changes ship freely. Anything that can break a consumer needs notice and a parallel run. The version number carries the rule, so consumers pin to a major version and move on their own schedule.
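
Once a change is classified, the version number and the notice period follow from it. The sketch below assumes MAJOR.MINOR.PATCH versioning and counts the 30-day notice in calendar days from the announcement; both are assumptions, and the function names are hypothetical.

```python
from datetime import date, timedelta

# The template's notice period for breaking changes (assumed: calendar days).
BREAKING_NOTICE = timedelta(days=30)


def plan_release(current: str, breaking: bool, announced_on: date) -> tuple:
    """Return (next version, earliest ship date) for a classified change.

    Breaking changes bump MAJOR and wait out the notice period;
    additive changes bump MINOR and can ship on the announcement date.
    """
    major, minor, _patch = (int(part) for part in current.split("."))
    if breaking:
        return f"{major + 1}.0.0", announced_on + BREAKING_NOTICE
    return f"{major}.{minor + 1}.0", announced_on


def accepts(version: str, pinned_major: int) -> bool:
    """A consumer pinned to a major version takes any release within it."""
    return int(version.split(".")[0]) == pinned_major


version, ship_on = plan_release("2.3.1", breaking=True, announced_on=date(2024, 3, 1))
print(version, ship_on)  # 3.0.0 2024-03-31
print(accepts(version, pinned_major=2))  # False: a 2.x consumer stays put until it migrates
```

During the notice period the producer runs 2.x and 3.0.0 in parallel, so a consumer pinned to major version 2 keeps working while it schedules the move.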

Enforce in the producer's pipeline, not downstream

Run schema checks in CI on every pull request and data validation inside the producer job before publish. A check that runs after publish only tells a consumer they already have bad data. A check that runs before publish stops the bad data from ever leaving.
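
The CI half of the enforcement can be as small as a comparison between the contracted schema and the schema the pull request would produce, with a non-zero exit code failing the build. This sketch assumes both schemas are available as column-to-type mappings; how the produced schema is obtained (a dry run, a model definition) depends on the stack and is not shown.

```python
import sys


def schema_violations(contract_schema: dict, produced_schema: dict) -> list:
    """List every way the produced schema departs from the contracted one."""
    problems = []
    for column, expected in contract_schema.items():
        actual = produced_schema.get(column)
        if actual is None:
            problems.append(f"missing column {column!r}")
        elif actual != expected:
            problems.append(f"column {column!r} is {actual}, contract says {expected}")
    # Columns the contract does not list are not flagged here; whether an
    # addition is allowed is a change-classification question.
    return problems


def ci_gate(contract_schema: dict, produced_schema: dict) -> int:
    """Report violations and return a process exit code; non-zero fails the PR."""
    problems = schema_violations(contract_schema, produced_schema)
    for problem in problems:
        print(f"contract violation: {problem}", file=sys.stderr)
    return 1 if problems else 0


contract = {"order_id": "string", "amount": "decimal"}
produced = {"order_id": "string", "amount": "float", "channel": "string"}
exit_code = ci_gate(contract, produced)
# In a CI step the script would end with sys.exit(exit_code).
```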

Catch Breakage at the Source, Not in a Postmortem

A contract you do not enforce is a comment in a wiki.

Frequently Asked Questions

What is a data contract?

A data contract is a written agreement between the team that produces a dataset and the teams that consume it. It pins down the schema, the meaning of each field, how fresh and complete the data will be, who owns it, and what happens when any of that has to change.

What counts as a breaking change versus an additive one?

Additive changes are safe and ship freely: adding a nullable field, adding an allowed value to an enum, or relaxing a constraint. Breaking changes can hurt a consumer: renaming or removing a field, changing a type, tightening a constraint, or changing the grain or primary key. Breaking changes require 30 days' notice and a parallel version.
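
These rules can be mechanized by diffing the old and new schema. The sketch below is one possible encoding under stated assumptions: the `Field` record and `classify` function are hypothetical, a rename shows up as a removal because a diff alone cannot tell them apart, and a new required (non-nullable) field is treated as breaking because only nullable additions are listed as additive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    dtype: str
    nullable: bool
    allowed_values: Optional[frozenset] = None  # enum values; None means unconstrained


def classify(old: dict, new: dict, old_key: tuple, new_key: tuple) -> tuple:
    """Return ("breaking" or "additive", reasons) for a change from old to new."""
    reasons = []
    # Changing the primary key usually means the grain changed too.
    if old_key != new_key:
        reasons.append(f"primary key changed from {old_key} to {new_key}")
    for name, before in old.items():
        after = new.get(name)
        if after is None:
            # A diff cannot tell a rename from a removal; both break consumers.
            reasons.append(f"field {name!r} removed or renamed")
            continue
        if after.dtype != before.dtype:
            reasons.append(f"field {name!r} type changed from {before.dtype} to {after.dtype}")
        if before.nullable and not after.nullable:
            reasons.append(f"field {name!r} became non-nullable")
        # Dropping enum values, or constraining a free field, tightens it.
        # Adding values or removing the constraint entirely relaxes it.
        if after.allowed_values is not None and (
            before.allowed_values is None
            or not before.allowed_values <= after.allowed_values
        ):
            reasons.append(f"field {name!r} allowed values were narrowed")
    for name, after in new.items():
        # Assumption: only nullable additions are additive; a required one is not.
        if name not in old and not after.nullable:
            reasons.append(f"new field {name!r} is required")
    return ("breaking" if reasons else "additive"), reasons


status = Field("string", False, frozenset({"open", "closed"}))
old = {"order_id": Field("string", False), "status": status}

# Additive: a new enum value plus a new nullable field.
widened = {
    "order_id": Field("string", False),
    "status": Field("string", False, frozenset({"open", "closed", "refunded"})),
    "channel": Field("string", True),
}
print(classify(old, widened, ("order_id",), ("order_id",)))

# Breaking: order_id renamed to id, which also changes the primary key.
renamed = {"id": Field("string", False), "status": status}
verdict, reasons = classify(old, renamed, ("order_id",), ("id",))
print(verdict, reasons)
```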

Who is this template pack for?

Heads of Data, data platform owners, and the senior data engineers who own the pipelines that produce shared datasets. If a producer can change a column today and a consumer only finds out when something breaks, this pack is built for you.

How is a data contract different from a data catalog or documentation?

A catalog describes data after the fact. A contract is enforced in the producer's pipeline before the data ships. The contract file is the single source of truth for both the CI schema check and the producer-side data validation, so the description and the enforcement never drift apart.

Where should I enforce the contract?

In two places, and you want both. Schema checks run in CI on every pull request to catch a breaking change while it is still a code review comment. Data validation runs inside the producer job before publish, so a load that fails a quality, volume, or freshness rule is quarantined instead of consumed.
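
The producer-side half might look like the sketch below: run volume, freshness, and quality rules on a batch, and either publish it or quarantine it with the reasons. It assumes a batch arrives as a list of dicts; the thresholds and the `publish` and `quarantine` callables are hypothetical, and "quality" is reduced to a required-field null check for brevity.

```python
from datetime import datetime, timedelta, timezone


def validate_batch(rows, loaded_at, *, required, min_rows, max_age, now):
    """Return every rule the batch fails; an empty list means it may publish."""
    failures = []
    # Volume: a load far smaller than usual often signals an upstream failure.
    if len(rows) < min_rows:
        failures.append(f"volume: {len(rows)} rows, expected at least {min_rows}")
    # Freshness: the data must be younger than the SLA allows.
    age = now - loaded_at
    if age > max_age:
        failures.append(f"freshness: data is {age} old, SLA is {max_age}")
    # Quality: every required field must be populated on every row.
    for column in required:
        nulls = sum(1 for row in rows if row.get(column) is None)
        if nulls:
            failures.append(f"quality: {nulls} null value(s) in {column!r}")
    return failures


def publish_or_quarantine(rows, loaded_at, publish, quarantine, **rules):
    """Publish only a batch that passes; otherwise hand it to quarantine."""
    failures = validate_batch(rows, loaded_at, **rules)
    if failures:
        quarantine(rows, failures)
        return False
    publish(rows)
    return True


now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
rules = {"required": ["order_id"], "min_rows": 1, "max_age": timedelta(hours=6), "now": now}
published, quarantined = [], []

batch = [{"order_id": "a1", "amount": 10}, {"order_id": None, "amount": 5}]
ok = publish_or_quarantine(
    batch,
    loaded_at=now - timedelta(hours=2),
    publish=published.append,
    quarantine=lambda rows, failures: quarantined.append(failures),
    **rules,
)
print(ok, quarantined)
```

Collecting every failure instead of stopping at the first gives the producer the full picture in one quarantine record.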