Data Contracts: How They Reduce Engineering Friction

Friction causes engineers to spend a lot of their time on coordination and data-pipeline issues. Data contracts are one way to reduce it.

A common source of friction, especially as data is used in more ways and in more places, is the rapid growth of data-pipeline and metric development work. Teams don’t see the resulting issues coming, and because they have to deal with them anyway, they spend an excessive amount of time coordinating with each other instead of delivering the product.

Data issues rarely arise because engineers lack skills or have bad intent. Generally, data friction arises because there is no agreed-upon contract between those who produce the data and those who consume it.

For more than a decade, Application Programming Interfaces (APIs) have addressed this problem in software engineering through explicit contractual agreements. Similarly, “Data Contracts” have emerged as a leading way to manage data-related agreements in data engineering.

A Data Contract defines, as a minimum, the following:

  • Schema and Field Definitions
  • Data Types and Constraints
  • Expected Freshness and Availability
  • Ownership and Contact Points
  • Change Management Rules
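
To make the elements listed above concrete, here is a minimal sketch of a contract written as plain Python dataclasses. The class names, field names, dataset name, team name, and e-mail address are all hypothetical and chosen only for illustration. Real teams may prefer YAML, JSON Schema, or a dedicated tool.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class FieldSpec:
    """One column in the contract: its name, type, and basic constraints."""
    name: str
    dtype: str              # e.g. "string", "float", "timestamp"
    nullable: bool = False
    description: str = ""


@dataclass(frozen=True)
class DataContract:
    """Hypothetical container for the contract elements listed above."""
    dataset: str
    version: str
    fields: tuple[FieldSpec, ...]     # schema, field definitions, types
    max_staleness: timedelta          # expected freshness
    owner_team: str                   # ownership
    contact: str                      # contact point
    breaking_change_notice_days: int  # change management rule

    def field_names(self) -> list[str]:
        return [f.name for f in self.fields]


# Illustrative contract for an imaginary "orders_daily" dataset.
orders_contract = DataContract(
    dataset="orders_daily",
    version="1.0",
    fields=(
        FieldSpec("order_id", "string", description="Unique order identifier"),
        FieldSpec("amount_usd", "float", description="Order total in US dollars"),
        FieldSpec("coupon_code", "string", nullable=True),
    ),
    max_staleness=timedelta(hours=24),
    owner_team="checkout-data",
    contact="checkout-data@example.com",
    breaking_change_notice_days=30,
)

print(orders_contract.field_names())
```

Keeping the contract as data, rather than prose, means the same object can later drive validation checks and documentation.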

Essentially, the Data Contract indicates what a Data Consumer can rely on. It provides assurance to Data Consumers that there will not be changes made to the schema or data without notification.

A Data Contract does not lock data permanently. Rather, it allows changes to be made knowingly and predictably.

Many organisations’ data pipelines have evolved organically: teams add or remove fields, rename columns, and adjust logic to accommodate new features. With a data contract, both the upstream changes and the downstream consumers that depend on them are covered by explicit obligations. When data contracts exist between consumers and producers, engineers know which changes in producers’ data may or may not cause issues with their analyses, which makes responsibilities clearer and improves coordination. Because engineers can identify which changes are safe without checking multiple locations or waiting until something actually breaks, they can move from reactively tracking down issues to proactively improving their processes over time.

Knowing which producers are changing their data, when they are changing it, and what may happen as a result leads to fewer cases where new inaccuracies cause consumers to lose trust in the data and in their own analyses. Data contracts also reduce wasted time by giving engineers clearer expectations and definite timeframes for completing their analytical reports.

Overall, the primary purpose of data contracts is to make processes more predictable and reduce wasted time. They clarify what data solutions can be expected to deliver, and they build confidence both in the data solutions engineers create and in the communication between producers and consumers.

Data Ownership and Responsibilities

Identifying the Owner Team: Each contract defines which team owns and therefore maintains the contract, and provides a way for changes to be communicated to others.

When ownership of a contract is clear, issues are typically resolved more quickly than when ownership is unclear or ambiguous.

Change Management Guidelines:

A contract describes how changes to it are introduced, for example:

  • Keeping changes backward compatible is a common approach
  • If a change is not backward compatible, a new version of the contract should be created with a notice of the breaking change
  • Communicating deprecation timelines helps avoid unwanted breakages

Having guidelines on how to manage changes in a contract helps eliminate unnecessary surprises.
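
The change rules above can be sketched as a small compatibility check. This is an illustration under stated assumptions, not a standard algorithm: schemas are represented as simple `{column: type}` dictionaries, added columns are assumed to be backward compatible, and removed or retyped columns are assumed to be breaking. The function names and the MAJOR.MINOR version format are hypothetical.

```python
def classify_change(old: dict[str, str], new: dict[str, str]) -> str:
    """Compare two schemas given as {column_name: type} mappings.

    Returns "breaking" if a column was removed or changed type,
    "compatible" if columns were only added, and "none" if identical.
    """
    removed = old.keys() - new.keys()
    retyped = {c for c in old.keys() & new.keys() if old[c] != new[c]}
    if removed or retyped:
        return "breaking"
    if new.keys() - old.keys():
        return "compatible"
    return "none"


def next_version(current: str, change: str) -> str:
    """Bump a MAJOR.MINOR version string according to the change type."""
    major, minor = (int(part) for part in current.split("."))
    if change == "breaking":
        return f"{major + 1}.0"
    if change == "compatible":
        return f"{major}.{minor + 1}"
    return current


v1 = {"order_id": "string", "amount": "float"}
v2 = {"order_id": "string", "amount": "float", "currency": "string"}
v3 = {"order_id": "string", "amount_usd": "float", "currency": "string"}

print(classify_change(v1, v2))                       # compatible
print(next_version("1.0", classify_change(v1, v2)))  # 1.1
print(next_version("1.1", classify_change(v2, v3)))  # 2.0
```

Note that a rename (`amount` to `amount_usd`) shows up as a removal plus an addition, so this sketch classifies it as breaking.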

When to Use Data Contracts:

Data contracts are not needed everywhere. They provide the most value where consumers depend heavily on the data.

Examples of high-value use cases would be:

  • Core datasets that are shared across multiple teams
  • Metrics that feed directly into dashboards used by executives
  • Features that are utilized by machine learning models
  • Data shared with external partners or customers
  • Pipelines that help drive critical business processes

Using contracts selectively avoids unnecessary contract-management overhead.

Data Contracts Create Velocity in Product and Platform Delivery:

Although the word “contract” may imply restriction, a data contract can also increase the velocity of the product or platform. Once a contract is established:

  • The team that owns and maintains it can implement changes confidently, knowing the impact of those changes
  • Other teams using the contract can plan their work with confidence and without fear of significant impact from any changes
  • Validation and testing of the contract will normally be automated
  • Communication from one team to another about the contract will be predictable and consistent

This reduces the number of emergency fixes and improves the quality of releases. Velocity increases not by stopping change, but by making change safer.

How Data Contracts Improve AI Reliability:

AI systems lose reliability when the schema or meaning of the data they were trained on changes, even slightly.

Data contracts provide:

  • Better definitions for features
  • Better visibility into data drift
  • Better requirements for enforcing data freshness and completeness
  • Better data lineage for better explainability

For organisations that invest in AI, data contracts are among the most important building blocks for reliability.
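
As one illustration of a completeness requirement for model features, the sketch below measures how often each feature is filled in and compares it with a contracted minimum. The function names, feature names, sample rows, and thresholds are invented for the example.

```python
def completeness(rows: list[dict], column: str) -> float:
    """Fraction of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def check_completeness(rows: list[dict], thresholds: dict[str, float]) -> dict[str, float]:
    """Return {column: observed} for columns below their contracted minimum."""
    failures = {}
    for column, minimum in thresholds.items():
        observed = completeness(rows, column)
        if observed < minimum:
            failures[column] = observed
    return failures


# Made-up feature rows and thresholds, for illustration only.
feature_rows = [
    {"user_id": 1, "age": 34, "country": "DE"},
    {"user_id": 2, "age": None, "country": "FR"},
    {"user_id": 3, "age": 51, "country": None},
    {"user_id": 4, "age": None, "country": "US"},
]
thresholds = {"user_id": 1.0, "age": 0.9, "country": 0.7}

failures = check_completeness(feature_rows, thresholds)
print(failures)  # {'age': 0.5}
```

A check like this can run before training or scoring, so that a feature that quietly becomes sparse is caught before it affects model behaviour.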

Myths About Data Contracts

Despite the benefits of Data Contracts, many organisations have misunderstandings about them.

Myth: Data Contracts Slow Down Teams

In practice, data contracts reduce the amount of unplanned work and interruptions for teams.

Myth: Businesses Need Heavy Tooling to Implement Data Contracts

Data contracts can begin with basic documentation of what data is required and what validation rules should be in place.

Myth: Data Contracts Create Rigidity and Do Not Allow for Flexibility

Data contracts allow for and manage changes; they do not stop changes from occurring.

Myth: Data Contracts Are Only Beneficial for Larger Organisations

Smaller teams tend to have closer dependencies, so data contracts can be even more beneficial for them.

Addressing these myths will promote adoption.

How Data Contracts Fit Within a Modern Data Stack

Data contracts are conceptually tool-independent but integrate with most modern tools and platforms.

Common tools and platforms used when working with data contracts are:

  • Schema registry tools
  • Data observability tools
  • Continuous integration (CI) pipelines for data
  • Data transformation frameworks
  • Metadata catalogues

The important factor is not the tool, but the integration of data contracts into delivery workflows.

Practical Path to Adopting Data Contracts

Organisations can begin implementing data contracts incrementally:

  • Choose one critical dataset: pick a critical dataset that has many consumers.
  • Document what your expectations are today: capture the expected schema, meaning, and usage.
  • Clearly define who owns what: state who is responsible for any changes to the dataset.
  • Create basic validation checks: apply freshness and schema stability checks.
  • Clearly communicate any changes: establish expectations for versioning and notice periods.

Following this incremental approach will provide quick value and build momentum for your organisation.
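
The basic validation step in this path could look like the sketch below, which checks freshness and schema stability for one batch of rows. The function name, column names, and the 24-hour staleness limit are assumptions made for the example. Extra columns are deliberately ignored here, on the assumption that additive changes are backward compatible.

```python
from datetime import datetime, timedelta, timezone


def validate_batch(rows, expected_types, last_updated, max_staleness, now=None):
    """Return a list of human-readable contract violations (empty if none)."""
    now = now or datetime.now(timezone.utc)
    problems = []

    # Freshness: the dataset must have been updated recently enough.
    if now - last_updated > max_staleness:
        problems.append(f"stale: last updated {last_updated.isoformat()}")

    # Schema stability: every expected column is present with the expected type.
    for i, row in enumerate(rows):
        missing = expected_types.keys() - row.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        for column, expected in expected_types.items():
            if column in row and not isinstance(row[column], expected):
                problems.append(f"row {i}: {column} is not {expected.__name__}")
    return problems


# Illustrative run with a fixed clock so the result is reproducible.
expected = {"order_id": str, "amount_usd": float}
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": "A1", "amount_usd": 19.99},
    {"order_id": "A2", "amount_usd": "free"},
]
problems = validate_batch(
    rows,
    expected,
    last_updated=now - timedelta(hours=30),
    max_staleness=timedelta(hours=24),
    now=now,
)
for problem in problems:
    print(problem)
```

In this run the batch is reported as stale (30 hours old against a 24-hour limit) and the second row is flagged because `amount_usd` is a string. A function like this can be called from a CI job or at the start of a pipeline run.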

Avoid Overengineering Data Contracts

A primary risk when implementing data contracts is overengineering. You should avoid the following common practices:

  • Creating contracts for every dataset upfront
  • Adding an approval process to every minor modification
  • Writing specification documents that are too complicated
  • Building tooling before considering the behavioural aspect

Data contracts should be simple and pragmatic. They should also change as they are used.

Organisational Signals That Data Contracts Are Needed

As organisations evolve, leaders should watch for signs that data contracts could reduce friction, for instance:

  • Frequent breakage of downstream pipelines
  • Repeated inquiries regarding the definition of metrics
  • Inconsistent behaviour of machine learning models
  • Engineers who are reluctant to modify upstream systems
  • An increase in time devoted to coordination

These signals point to hidden dependencies that data contracts can clarify.

Measuring the Impact of Data Contracts

The operational bottlenecks caused by missing data contracts are usually easy to see. The primary metrics for evaluating the operational impact of data contracts are:

  • Number of data-related incidents
  • Time to recover from incidents
  • Time spent debugging dependency issues
  • Developer confidence in the data at the time of release
  • Level of clear ownership of shared datasets

Improvement in these metrics indicates reduced friction and increased trust.
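
The first two metrics are simple to compute once incidents are logged with open and resolve times. The sketch below uses made-up incident records and hypothetical field names, purely to show the arithmetic.

```python
from datetime import datetime
from statistics import mean

# Made-up incident log, for illustration only.
incidents = [
    {"opened": datetime(2024, 3, 1, 9, 0), "resolved": datetime(2024, 3, 1, 13, 0)},
    {"opened": datetime(2024, 3, 8, 14, 0), "resolved": datetime(2024, 3, 8, 16, 0)},
    {"opened": datetime(2024, 3, 20, 8, 0), "resolved": datetime(2024, 3, 20, 11, 0)},
]


def incident_summary(records):
    """Count incidents and average the hours between opening and resolution."""
    hours = [
        (r["resolved"] - r["opened"]).total_seconds() / 3600 for r in records
    ]
    return {
        "count": len(records),
        "mean_hours_to_recover": mean(hours) if hours else 0.0,
    }


print(incident_summary(incidents))
# {'count': 3, 'mean_hours_to_recover': 3.0}
```

Comparing this summary for the periods before and after a contract is introduced gives a rough, first-order view of its effect.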

Logiciel’s Perspective

Logiciel Solutions sees data contracts as an effective way to scale team collaboration without hindering delivery. Through our AI-focused engineering teams, Logiciel helps organisations identify where data contracts will have the greatest impact in reducing friction across data platforms and improving trust between data-producing and data-consuming teams. When expectations are made clear, teams can be more efficient and face fewer unexpected hurdles.

Extended FAQs

Are data contracts the same as APIs?
Not exactly. They serve a similar purpose, but data contracts govern data rather than service calls.
Do all data contracts require versioning?
Breaking changes require a new version, but not every update does; non-breaking updates can be treated as ‘minor’ versions.
Who's responsible for creating and maintaining data contracts?
Typically, data contracts are created and owned by the producer of the data, with input from its consumers.
Can data contracts be enforced automatically?
Yes. Most teams can validate data contracts in continuous integration (CI) or at runtime within their pipelines.
Will data contracts eliminate the need for governance?
No. Data contracts complement governance by giving both data producers and data consumers a clearer understanding of what is expected.
