Building a Trustworthy Data Layer for AI and Analytics

An organization that builds a trustworthy data layer gives its analytics and AI initiatives an effective foundation.

Analytics helps people make decisions based on the data an organization collects, and AI increasingly automates the analysis of that data. If people do not trust the data, they stop using analytics and AI tools. Three developments make trust in analytics and AI vital:

As companies use more data across more systems, trust in that data becomes more tenuous.
Data users now extend well beyond data analysts to include engineers, product managers, C-suite executives, and consumers of products.
With AI systems in the loop, an error in the data does not stay put: automation can propagate it through downstream data at machine speed.

In summary, trust in the data layer is the primary factor in getting people and companies to adopt analytics and AI and to scale those capabilities successfully.

Consequences of an Untrustworthy Data Layer

When the company perceives the data layer as untrustworthy, a predictable set of consequences follows:

1. Low usage of analytics.

Teams and managers often build dashboards and share them, but the dashboards are not used to their full potential: they rarely drive changes, improvements, or decisions.

2. AI pilot projects that never reach production.

Machine learning models are often developed in a lab environment and perform well there, but they fail when exposed to real-world data.

3. Ongoing manual validation.

Data analysts and engineers spend their time checking data for accuracy instead of using that time to deliver actionable insights that create value for the company.

4. Executive skepticism.

When there are concerns about how data is used and how outputs are produced from it, executive leaders become wary of making decisions based on analysis of that data.

5. Shadow systems.

Teams build their own pipelines and metrics alongside the established company data system, further fragmenting the organization's data systems.

These conditions compound one another. The longer trust in the data layer erodes, the more complex and costly recovery becomes.

Attributes of a Trustworthy Data Layer

Trustworthy data layers exhibit a similar set of attributes regardless of the technology used.

Clear Data Ownership

Every critical dataset and metric in the organization has an individual who is accountable for it. Ownership includes defining what the data means, defining its expected quality, and communicating any change to that definition or to the data itself.

Without ownership, issues linger unresolved and trust erodes.
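The following minimal sketch shows what ownership can look like in practice. It assumes a simple in-memory registry; the `DatasetOwnership` and `OwnershipRegistry` names and the `notify` callback are hypothetical, not a specific catalog product's API. Each critical dataset records an accountable owner, its definition, and its quality expectation. Only the owner may change the definition, and every change is announced to subscribers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DatasetOwnership:
    """Hypothetical ownership record for one critical dataset or metric."""
    name: str
    owner: str                # the individual accountable for this dataset
    definition: str           # what the dataset or metric means
    quality_expectation: str  # e.g. "refreshed daily, no null order ids"
    subscribers: List[str] = field(default_factory=list)  # who must hear about changes


class OwnershipRegistry:
    """Hypothetical in-memory registry; notify(recipient, message) is an assumed hook."""

    def __init__(self, notify: Callable[[str, str], None]) -> None:
        self._records: Dict[str, DatasetOwnership] = {}
        self._notify = notify

    def register(self, record: DatasetOwnership) -> None:
        # Refuse to register a critical dataset without an accountable owner.
        if not record.owner:
            raise ValueError(f"{record.name} must have an accountable owner")
        self._records[record.name] = record

    def owner_of(self, name: str) -> str:
        return self._records[name].owner

    def change_definition(self, name: str, new_definition: str, changed_by: str) -> None:
        record = self._records[name]
        # Only the owner may redefine the dataset.
        if changed_by != record.owner:
            raise PermissionError(f"only {record.owner} may redefine {name}")
        old_definition = record.definition
        record.definition = new_definition
        # Communicate the change to every subscriber instead of changing it silently.
        for subscriber in record.subscribers:
            self._notify(subscriber, f"{name} redefined: '{old_definition}' -> '{new_definition}'")


# Usage: collect notifications in a list instead of a real messaging channel.
sent = []
registry = OwnershipRegistry(lambda to, msg: sent.append((to, msg)))
registry.register(DatasetOwnership("revenue", "alice", "gross bookings", "refreshed daily", ["finance", "sales"]))
registry.change_definition("revenue", "net bookings", changed_by="alice")
```

In a real deployment, `notify` would post to whatever channel the team already uses. The point is that a definition change cannot happen silently or without its owner.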

Consistent Definitions and Models

Each metric, entity, or measurement is defined once and reused everywhere. For example, “revenue”, “active users”, and “churn” mean the same thing in every data store.

When everyone refers to the same definitions, decisions involve far less debate than when each team works from its own version.
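One way to make “defined once and reused everywhere” concrete is a small metric registry that refuses duplicate definitions, so every dashboard calls the same function. This is a minimal sketch under stated assumptions. The event shape, the 28-day window for “active users”, and the `metric` and `compute` helpers are all hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta
from typing import Any, Callable, Dict, Iterable

# Hypothetical event shape: {"user_id": str, "event_date": date}
Event = Dict[str, Any]

_METRICS: Dict[str, Callable[..., int]] = {}


def metric(name: str):
    """Register a metric definition exactly once; a second definition is an error."""
    def decorator(fn: Callable[..., int]) -> Callable[..., int]:
        if name in _METRICS:
            raise ValueError(f"metric '{name}' is already defined")
        _METRICS[name] = fn
        return fn
    return decorator


ACTIVE_WINDOW_DAYS = 28  # illustrative assumption, not a standard definition


@metric("active_users")
def active_users(events: Iterable[Event], as_of: date) -> int:
    """Distinct users with at least one event in the trailing window ending on as_of."""
    start = as_of - timedelta(days=ACTIVE_WINDOW_DAYS - 1)
    return len({e["user_id"] for e in events if start <= e["event_date"] <= as_of})


def compute(name: str, *args: Any) -> int:
    # Every consumer (finance dashboard, product dashboard, AI feature) goes through here.
    return _METRICS[name](*args)
```

Because a second `@metric("active_users")` raises an error, a team that wants a different meaning has to negotiate a change to the shared definition rather than quietly fork it.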

Strong Data Quality Signals

Data should be continuously monitored for freshness, completeness, and validity throughout its lifecycle, so that issues are found and resolved promptly.

Quality checks do not need to be complicated; they just need to be visible and reliable.
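Each of the three checks named above can be sketched in a few lines. This is a minimal illustration with several assumptions: rows arrive as Python dictionaries, and the dataset owner supplies the thresholds (maximum age, tolerated null ratio, and the validity rule). The function names are hypothetical rather than taken from any particular data-quality tool. The `publish` step keeps the results visible.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class CheckResult:
    check: str
    passed: bool
    detail: str


def check_freshness(last_loaded: datetime, max_age: timedelta, now: datetime) -> CheckResult:
    """Fresh if the latest load is no older than max_age."""
    age = now - last_loaded
    return CheckResult("freshness", age <= max_age, f"age={age}, max={max_age}")


def check_completeness(rows: List[Row], column: str, max_null_ratio: float) -> CheckResult:
    """Complete if the share of missing values in `column` stays within tolerance."""
    if not rows:
        return CheckResult("completeness", False, "no rows loaded")
    nulls = sum(1 for r in rows if r.get(column) is None)
    ratio = nulls / len(rows)
    return CheckResult("completeness", ratio <= max_null_ratio, f"{column} null ratio={ratio:.0%}")


def check_validity(rows: List[Row], column: str, is_valid: Callable[[Any], bool]) -> CheckResult:
    """Valid if every non-missing value in `column` satisfies the owner's rule."""
    bad = [r[column] for r in rows if r.get(column) is not None and not is_valid(r[column])]
    return CheckResult("validity", not bad, f"{len(bad)} invalid {column} value(s)")


def publish(results: List[CheckResult]) -> str:
    # Keep results visible: one line per check, readable by any data consumer.
    return "\n".join(f"[{'PASS' if r.passed else 'FAIL'}] {r.check}: {r.detail}" for r in results)
```

For example, a table whose `amount` column holds `10, None, -5, 3` passes a 30% completeness tolerance but fails a "non-negative" validity rule. The published report shows both results side by side.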


Transparent Lineage and Context

Every team that uses the data should understand where it came from, how it was transformed, and what it is used for.

Lineage builds confidence with those working with the data and makes debugging much easier.
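Lineage can be modeled as a graph in which each dataset records its direct inputs and the transformation that produced it. The sketch below is a hypothetical in-memory `LineageGraph`, not a specific lineage tool. In it, `upstream` answers "where did this come from?" and `downstream` answers "what breaks if this is wrong?". Those two questions are what make debugging easier.

```python
from collections import deque
from typing import Dict, List, Set


class LineageGraph:
    """Hypothetical lineage store: each dataset records its direct inputs and how it was produced."""

    def __init__(self) -> None:
        self._inputs: Dict[str, List[str]] = {}
        self._transform: Dict[str, str] = {}

    def add(self, dataset: str, inputs: List[str], transform: str) -> None:
        self._inputs[dataset] = list(inputs)
        self._transform[dataset] = transform
        for source in inputs:
            self._inputs.setdefault(source, [])  # sources with no recorded inputs are origins

    def explain(self, dataset: str) -> str:
        """How the dataset was produced (origins have no transformation)."""
        return self._transform.get(dataset, "source system")

    def upstream(self, dataset: str) -> Set[str]:
        """Everything the dataset ultimately derives from: where it came from."""
        seen: Set[str] = set()
        queue = deque(self._inputs.get(dataset, []))
        while queue:
            node = queue.popleft()
            if node not in seen:
                seen.add(node)
                queue.extend(self._inputs.get(node, []))
        return seen

    def downstream(self, dataset: str) -> Set[str]:
        """Everything that would be affected if the dataset were wrong: what it is used for."""
        affected: Set[str] = set()
        queue = deque([dataset])
        while queue:
            node = queue.popleft()
            for candidate, inputs in self._inputs.items():
                if node in inputs and candidate not in affected:
                    affected.add(candidate)
                    queue.append(candidate)
        return affected
```

For example, if `orders_raw` feeds `orders_clean`, which feeds `revenue_daily` and then an executive dashboard, `downstream("orders_raw")` lists every asset to re-check after a bad load.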

Controlled Access to Sensitive Data

Access to sensitive data is controlled and logged, so unauthorized access can be prevented and every access attempt is recorded for review.

Trust covers not only the accuracy of metrics and data but also the security of the data itself.
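The following minimal sketch shows controlled, logged access, assuming role-based grants per dataset. `SensitiveDataGuard`, the grant table, and the `loader` callable are hypothetical. Access is denied by default, and every attempt, allowed or not, is written to an audit log before any data is returned.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Set


class SensitiveDataGuard:
    """Hypothetical gatekeeper for sensitive datasets with an audit trail."""

    def __init__(self, grants: Dict[str, Set[str]]) -> None:
        self._grants = grants  # dataset -> roles allowed to read it
        self.audit_log: List[Dict[str, Any]] = []  # a real system would persist this append-only

    def read(self, user: str, role: str, dataset: str, loader: Callable[[], Any]) -> Any:
        # Deny by default: unknown datasets have no allowed roles.
        allowed = role in self._grants.get(dataset, set())
        # Log every attempt before deciding, so denied attempts are visible too.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "dataset": dataset,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{user} ({role}) may not read {dataset}")
        return loader()
```

Logging before the permission decision is a deliberate choice. An auditor can then see denied attempts as well as successful reads.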

Why Building Trust in the Data Layer Enables Business Analytics Growth

Trust issues typically surface first in the analytics function.

When data is trusted:

Analysts spend less time reconciling the numbers

Reports are reused

Self-service analytics grows

Leadership discussions center on taking action

When trust is diminished:

Analytics organizations become a bottleneck

Stakeholders repeatedly request custom reports

Insights arrive late or are ignored

Creating trust in the data layer will enable the data function to transition from a support role to a strategic capability in the organization.

Common Points of Failure in AI and Analytics Systems

AI and analytics systems are expected to reach the conclusions a human would reach from the same information, and they can fail at many points. Common examples include:

Training data that differs in quality or type from the input the system receives in production.

No maintained system or process for defining features in the first place.

Changes in a system's behavior that are communicated only after they have happened.

Models built on a small subset of the available input data (e.g., feedback from a single field rather than an entire product line).

No tracking of input lineage once an ML model is live and in use.
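The first failure point above, training data that does not match production input, can be caught with a basic comparison run before and after deployment. The sketch below is a simplified illustration, not a full drift-detection method. It assumes rows are dictionaries and flags three things: columns missing in production, null-rate shifts beyond a hypothetical 5-percentage-point tolerance, and string categories never seen in training. The function names are hypothetical.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def null_rate(rows: Sequence[Row], column: str) -> float:
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def skew_report(train: Sequence[Row], prod: Sequence[Row], null_tolerance: float = 0.05) -> List[str]:
    """List differences between training data and production input (simplified)."""
    issues: List[str] = []
    train_cols = set().union(*(r.keys() for r in train))
    prod_cols = set().union(*(r.keys() for r in prod))
    # Features the model was trained on but production no longer supplies.
    for col in sorted(train_cols - prod_cols):
        issues.append(f"missing in production: {col}")
    for col in sorted(train_cols & prod_cols):
        # Missing values appearing (or disappearing) in production.
        shift = null_rate(prod, col) - null_rate(train, col)
        if abs(shift) > null_tolerance:
            issues.append(f"null rate shift in {col}: {shift:+.0%}")
        # Categorical values the model never saw during training.
        seen = {r[col] for r in train if isinstance(r.get(col), str)}
        unseen = {r[col] for r in prod if isinstance(r.get(col), str)} - seen
        if unseen:
            issues.append(f"unseen categories in {col}: {sorted(unseen)}")
    return issues
```

An empty report does not prove the model is safe, but a non-empty one is a concrete, explainable reason to pause before trusting its outputs.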

Common Mistakes When Building a Trusted Data Layer

Companies building a trusted data layer for AI and analytics commonly make the following mistakes:

Forcing analytics and AI onto different toolsets and technologies. Separate stacks create complexity and redundancy, and they forfeit the opportunity to share tools.

Building a single common data layer without distinguishing the many paths through which data is consumed and the trade-offs each involves (e.g., speed vs. clarity). Instead, clearly define what each environment contains (raw, curated, consumption) and provide an interface for viewing the data.

Treating trust in models as something built once. In fact, trust must be built continuously, which requires disciplined feedback loops.
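The advice to define what each environment contains can be made enforceable by letting data move from raw to curated to consumption only when each tier's checks pass. This is a hypothetical sketch. The `Tier` names mirror the three environments mentioned above. `DataLayer`, the gate functions, and the `catalog` view are illustrative assumptions; `catalog` stands in for the interface used to view the data.

```python
from enum import Enum
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]
Check = Callable[[Rows], bool]


class Tier(Enum):
    RAW = 1          # data as landed from source systems
    CURATED = 2      # cleaned, conformed, and checked
    CONSUMPTION = 3  # shaped for dashboards, reports, and models


class DataLayer:
    """Hypothetical tiered layer: a table advances one tier only if that tier's gates pass."""

    def __init__(self, gates: Dict[Tier, List[Check]]) -> None:
        self._gates = gates  # checks required to enter each tier
        self._tier: Dict[str, Tier] = {}

    def land(self, table: str) -> None:
        self._tier[table] = Tier.RAW

    def promote(self, table: str, rows: Rows) -> Tier:
        current = self._tier[table]
        if current is Tier.CONSUMPTION:
            raise ValueError(f"{table} is already in the consumption tier")
        target = Tier(current.value + 1)
        failed = [getattr(c, "__name__", repr(c)) for c in self._gates.get(target, []) if not c(rows)]
        if failed:
            raise ValueError(f"cannot promote {table} to {target.name}: failed {failed}")
        self._tier[table] = target
        return target

    def catalog(self) -> Dict[str, str]:
        """A simple view of which environment each table currently lives in."""
        return {table: tier.name for table, tier in sorted(self._tier.items())}


# Hypothetical example gates.
def has_rows(rows: Rows) -> bool:
    return bool(rows)


def no_null_ids(rows: Rows) -> bool:
    return all(r.get("id") is not None for r in rows)
```

With this design, consumers can read the catalog to know whether a table has passed curation, instead of guessing from its name.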

Use Cases That Rely on a Trustworthy Data Layer

There are several high-value use cases where the establishment of trust is essential.

Executive Decision Making

Executive leaders depend on precise and consistent metrics to correctly allocate and deploy resources.

Trusted AI-based Personalization and Recommendation Systems

Stable and clearly defined features are essential to trusted AI-based personalization and recommendation systems.

Regulatory Reporting

To comply with regulatory requirements, organizations must establish and maintain a complete history of data failures.

Customer-facing Analytical Deliverables

Errors in customer-facing analytical reports directly affect credibility with customers and customer retention.

Cross-product Insight Development

When organizations agree on common definitions, they can compare insights across products and make effective decisions.

Overall, trust determines whether an organization's implementation succeeds.

Signals of Trust in the Data Layer

Trust is subjective, but it produces observable signals that organizations can watch as trust develops and track on an ongoing basis.

Positive signals that an organization has achieved an effective level of trust include:

Reduced time taken to validate metrics
Increased reuse of common data resources across business units
Increased speed of data incident resolution
Greater reliance on Artificial Intelligence outputs
Fewer conflicting dashboards in use

By monitoring these signals over an extended period, organizations gain a directional view of how their level of trust is developing.
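Tracking these signals directionally can be as simple as comparing a recent period with an earlier one and checking whether each signal moved the way it should. The signal names and desired directions below are hypothetical proxies for the list above. For example, a falling rate of overridden AI outputs stands in for growing reliance on them. The half-versus-half comparison is a deliberately crude trend measure.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical signal names, each paired with the direction that indicates growing trust.
DESIRED_DIRECTION = {
    "metric_validation_hours": "down",
    "shared_dataset_reuse": "up",
    "incident_resolution_hours": "down",
    "ai_output_override_rate": "down",
    "conflicting_dashboards": "down",
}


def change(series: List[float]) -> float:
    """Mean of the later half minus mean of the earlier half of the periods."""
    half = len(series) // 2
    if half == 0:
        raise ValueError("need at least two periods")
    return mean(series[half:]) - mean(series[:half])


def trust_direction(history: Dict[str, List[float]]) -> Dict[str, str]:
    """Label each tracked signal as improving, worsening, or flat."""
    result: Dict[str, str] = {}
    for signal, series in history.items():
        delta = change(series)
        if delta == 0:
            result[signal] = "flat"
        else:
            moved_down = delta < 0
            wants_down = DESIRED_DIRECTION[signal] == "down"
            result[signal] = "improving" if moved_down == wants_down else "worsening"
    return result
```

For instance, validation time falling from 10 and 9 hours to 6 and 5 hours reads as "improving". Reuse of shared datasets slipping from 3 to 2 reads as "worsening".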

Practical Pathway to Building a Trustworthy Data Layer

Organizations can follow a practical pathway to build a trustworthy data layer:

1. Identify Data Assets with High Business Impact
2. Clearly Define Ownership for Data Administration
3. Create a Standardized Definition and Model Set for Data Resources
4. Provide Clear Visibility into Data Quality Checks and Balances for Data Resources
5. Document the Data Flow from Origin to Use and the Story that Connects the Two
6. Continuously Review and Improve Steps 1-5

Working through these steps incrementally lets organizations make progress without creating excessive operational hurdles.

Scenarios Requiring the Highest Level of Trust

The following scenarios require the greatest degree of organizational focus and support:

Scaling the Application of Artificial Intelligence
Shared Data Across Multiple Teams
Supporting the Reporting Requirements of Executive and External Stakeholders
Engaging in Regulated Markets
Rebuilding Data Credibility

In each of these five scenarios, trust must be established before the project can succeed.

Perspective of Logiciel Solutions

Logiciel Solutions helps organizations build trustworthy data infrastructure by designing platforms that deliver valuable insights in a timely way with fewer resources. With a trustworthy data infrastructure, analytics can scale quickly, and AI can move from experimentation to real-time production with confidence.

Extended FAQs

What is the difference between a trustworthy data layer and data governance?

Data governance supports the establishment of trust, but a trustworthy data layer also depends on additional criteria such as data quality, usability, and adoption.

Is it feasible for a startup to focus on establishing trust from the very beginning?

Yes. Establishing lightweight standards for trust early eliminates costly rework later on.

Will modern data processing tools by themselves provide trust in data?

No. Ownership and the processes for using the data are the primary drivers of trust. Tools support those processes but do not create trust on their own.

How will trust affect the adoption of AI systems?

If trust is missing, organizations will ignore or override AI outputs.

Who owns trust?

Trust is a shared responsibility: members of the data team build it, executive leaders put it into practice, and end users maintain it.
