Introducing Pact

February 12, 2026

AI made data work faster — but it didn’t change what data production is

If you’ve built or operated real data systems, the AI era has already delivered genuine convenience:

  • You can draft SQL and documentation faster.
  • You can explore schemas and datasets with less friction.
  • You can turn questions into first queries much more quickly.

That’s real progress.

But if you’re the person on-call for dashboards, reporting, and downstream consumers—AI hasn’t fundamentally changed the part that hurts most:

The data production chain is still a long, operational system.

It still looks like:

sources → ingestion/CDC/streams → modeling → transformations → orchestration → serving → governance/quality → incident response

AI improves the interface to this chain. It doesn’t remove the chain. And it doesn’t solve the two problems that determine whether your data platform is trustworthy:

  1. Semantic correctness — does the metric match the business definition?
  2. Operational reliability — does the system produce it consistently, within cost and time constraints?

Pact exists because those fundamentals remain unchanged.


The core problem: “shipping metrics” is not the same as “writing SQL”

A common mistake (especially in product demos) is assuming that if we can generate SQL, we’ve solved analytics.

In production, SQL is not the product. Reliable metrics are the product.

Here’s the uncomfortable truth users live with:

A SQL query can be perfectly correct as a query—and still be wrong as a metric.

Not because SQL is “bad,” but because business metrics depend on semantics that aren’t guaranteed by query correctness alone:

  • Grain: Is “active” measured per user, per device, per session, per day?
  • Time semantics: Which timezone? What is a “business day”? Are we using event time or ingestion time? How do we handle late events?
  • Exclusions: Bots, internal accounts, test traffic, refunds/cancellations—these are business rules, not syntax.
  • Identity stitching: user_id vs device_id vs anonymous IDs—what counts as a single user?
  • Join policy: Fanout and dedup decisions can change results without changing whether the SQL “makes sense.”
  • Versioned definitions: The definition of “conversion” changes, and the system needs to know which definition produced which numbers.

If you’ve ever had a “why did revenue drop?” incident where everything was green, you already understand this.

So the bottleneck is not “text-to-SQL.” The bottleneck is turning business meaning into something the system can operate on deterministically—and keeping it correct over time.
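
To make the gap concrete, here is a deliberately toy Python sketch (the data and rules are invented for illustration): the same events produce three defensible "active users" numbers depending on grain and exclusions, and only a governed definition says which one is the metric.

```python
# Hypothetical illustration: the same events, three defensible "active users" numbers.
# Which one is *the* metric is a business decision, not a property of the query.
events = [
    # (user_id, device_id, is_bot, is_internal)
    ("u1", "d1", False, False),
    ("u1", "d2", False, False),   # same user, second device
    ("u2", "d3", False, True),    # internal test account
    ("u3", "d4", True,  False),   # bot traffic
]

# Grain: per device, no exclusions.
active_devices = len({d for _, d, _, _ in events})                      # 4

# Grain: per user, no exclusions.
active_users = len({u for u, _, _, _ in events})                        # 3

# Grain: per user, excluding bots and internal accounts (a business rule).
active_users_clean = len({u for u, _, bot, internal in events
                          if not bot and not internal})                 # 1

print(active_devices, active_users, active_users_clean)  # 4 3 1
```

All three computations are "correct" as queries. Without an explicit, versioned definition, any of them can quietly become the number on the dashboard.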


Why LLMs alone won’t build (or operate) your pipelines

LLMs are great at generating text and code. But pipeline production is not a code-generation problem. It’s an infrastructure + orchestration + control-loop problem.

To run production, you need capabilities that don’t come from prompts:

  • multi-engine scheduling and execution
  • retries, idempotency, replay keys
  • backfills and rollbacks
  • cost control and resource isolation
  • lineage and evidence for “why this number exists”
  • governance boundaries and approvals
  • operational observability and incident response

Without that substrate, an agent can only suggest. It can’t operate.
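
To ground one of those bullets, here is a hypothetical sketch (not Pact's API) of what idempotency and replay keys look like operationally: each partition run is identified by a key, retries skip completed keys, and a replay deliberately re-executes one.

```python
from datetime import date

# Hypothetical sketch: an idempotent task keyed by (task_name, partition).
# Retries and backfills re-run the same key; completed keys are skipped
# unless explicitly replayed.
_completed: set[tuple[str, str]] = set()

def run_partition(task_name: str, partition: date, replay: bool = False) -> None:
    key = (task_name, partition.isoformat())
    if key in _completed and not replay:
        print(f"skip {key}: already materialized")
        return
    # ... do the actual work: read inputs, write outputs atomically ...
    _completed.add(key)
    print(f"done {key}")

run_partition("orders_daily", date(2026, 2, 11))                # done
run_partition("orders_daily", date(2026, 2, 11))                # skipped on retry
run_partition("orders_daily", date(2026, 2, 11), replay=True)   # explicit replay
```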

This is where many “AI for data” narratives break down: they assume intelligence replaces infrastructure. In reality, intelligence only becomes useful when infrastructure provides safe actuation and reliable feedback.


Where durable value actually lives

If you’re a user, you don’t buy “AI for data.” You buy outcomes:

  • Do my metrics stay correct as definitions evolve?
  • Does the production chain keep running smoothly as the world changes?
  • Can I explain any number quickly—with evidence—and recover safely when something breaks?

From a product perspective, that means the real value isn’t in generating SQL. It’s in building a system that reliably turns intent into production results.

Convenience layers (chat, SQL suggestions) are useful, but they’re easy to replicate and easy to bundle. What’s harder to copy is a platform that continuously closes the loop between intent and production reality:

  • When upstream schemas drift, data arrives late, or dependencies fail, the system doesn’t just explain—it reconciles.
  • When definitions change, the system doesn’t just document—it versions, enforces, and proves what produced each result.
  • When costs or freshness drift, the system doesn’t just alert—it re-plans and executes within policy.

That’s where durable value lives: not in “helping you write queries,” but in shipping correct metrics reliably, with traceability and stability as first-class guarantees.

The idea behind Pact: contracts + reconciliation

Pact is built on a principle that has worked across system design for decades:

Users declare what they want. The platform continuously reconciles it into the actual state.

In data terms:

  • Users declare metric meaning and constraints.
  • The platform compiles that intent into pipelines, execution plans, and governance.
  • The system observes outcomes (history, lineage, cost) and continuously reconciles drift with safe actions.
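
As a generic, hypothetical sketch of that pattern (no Pact API implied): the user owns the desired state, and the system acts only on the difference between desired and observed.

```python
# Hypothetical sketch of the declare-and-reconcile pattern in the abstract:
# the user owns "desired", the platform owns everything after that.
def reconcile_once(desired: dict, read_actual, apply_change) -> None:
    actual = read_actual()
    diff = {k: v for k, v in desired.items() if actual.get(k) != v}
    if diff:
        apply_change(diff)   # only the divergent part is acted on, with evidence

state = {"freshness": "late"}
reconcile_once(
    desired={"freshness": "on_time"},
    read_actual=lambda: state,
    apply_change=lambda diff: state.update(freshness="on_time"),  # stand-in for a safe action
)
print(state)  # {'freshness': 'on_time'}
```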

Pact in one sentence

Pact is an agent-native data platform where users define metrics and constraints as contracts, and the platform compiles and operates the entire data production chain with traceability, reliability, and cost control.

The agent is the entry point. But unlike a simple chat UI, Pact is a complete, agent-native data platform designed to make the agent operational.


Not smoke and mirrors: Pact is built on a real production platform

To be explicit: Pact is not a thin assistant layer.

Under the hood, Pact is backed by a complete set of platform services built from the ground up:

  • cloud-native orchestration across multiple engines
  • ingestion support for streaming and batch synchronization
  • transformation support (including Spark SQL) and warehouse modeling workflows
  • resource allocation and scheduling integrated with cloud provider primitives
  • full execution history and statistics captured end-to-end
  • field-level lineage persisted (e.g., in a graph model) to support traceability and impact analysis

This foundation matters because it provides what AI needs to be reliable: a safe, observable control loop with evidence.


What users declare: the minimum interface that makes the system correct

In the initial stage (0→1), Pact does not ask users to hand-build pipelines.

Instead, Pact asks for what only humans can reliably provide: business semantics and constraints—captured as structured, versioned contracts.

1) Metric Contract (definition correctness)

A metric contract captures meaning explicitly:

  • grain (entity + aggregation level)
  • time semantics (timezone, business day, lateness policy)
  • required filters/exclusions
  • join policy / fanout constraints
  • computation logic (reference SQL is allowed, but the definition is treated as a governed contract)

This is the “semantic truth” the platform can enforce.
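
As an illustration only (the field names here are hypothetical, not Pact's actual schema), a metric contract might be captured like this:

```python
from dataclasses import dataclass, field

# Hypothetical shape of a metric contract; Pact's real schema may differ.
@dataclass(frozen=True)
class MetricContract:
    name: str
    version: int
    grain: str                        # entity + aggregation level, e.g. "user/day"
    timezone: str                     # time semantics
    lateness_policy: str              # how late events are handled
    exclusions: list[str] = field(default_factory=list)
    join_policy: str = "no_fanout"    # fanout/dedup constraint
    reference_sql: str | None = None  # allowed, but the contract is the truth

daily_active_users = MetricContract(
    name="daily_active_users",
    version=3,
    grain="user/day",
    timezone="UTC",
    lateness_policy="watermark_2h",
    exclusions=["bots", "internal_accounts", "test_traffic"],
    reference_sql="SELECT ... FROM events WHERE ...",
)
```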

2) Production Intent (how the system should produce data)

This describes how data is produced and refreshed:

  • sources (RDBMS tables, Kafka topics, files)
  • batch/stream/CDC approach
  • cadence, dependencies, incremental/backfill strategy
  • target layers and modeling conventions

3) Constraints (operational truth targets)

Because production must respect reality:

  • freshness goals (e.g., “ready by 08:30”)
  • cost budgets (per run/day/tenant)
  • resource and isolation policies
  • governance requirements (traceability, audit, approvals)

4) Quality Gates (data health, separate from definition)

Quality is not the metric definition. It’s the evaluation of current state: freshness, volume anomalies, null spikes, drift, etc.

Keeping definition and quality separate avoids endless confusion—and makes remediation actionable.
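
Continuing the same hypothetical sketch, production intent, constraints, and quality gates can be declared separately, each explicit and versionable (again, the names are illustrative, not Pact's schema):

```python
from dataclasses import dataclass

# Hypothetical declarations, kept separate on purpose: how data is produced,
# what operational limits it must respect, and how its health is judged.
@dataclass(frozen=True)
class ProductionIntent:
    sources: list[str]              # e.g. ["mysql.orders", "kafka.events"]
    mode: str                       # "batch", "stream", or "cdc"
    cadence: str                    # e.g. "hourly"
    backfill_strategy: str = "incremental"

@dataclass(frozen=True)
class Constraints:
    freshness_deadline: str         # e.g. "08:30"
    daily_cost_budget_usd: float
    requires_approval: bool = True

@dataclass(frozen=True)
class QualityGate:
    check: str                      # e.g. "null_rate(user_id) < 0.01"
    severity: str = "block"         # block vs. warn

dau_production = ProductionIntent(
    sources=["mysql.orders", "kafka.events"],
    mode="cdc",
    cadence="hourly",
)
dau_constraints = Constraints(freshness_deadline="08:30", daily_cost_budget_usd=50.0)
dau_gates = [QualityGate("freshness <= 2h"), QualityGate("row_count_drift < 20%", "warn")]
```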


Deterministic compilation: contracts in, pipelines out

Once Pact has those contracts, the platform does the part that LLMs alone can’t do reliably:

It compiles intent into production deterministically.

That means Pact can generate and operate:

  • ingestion tasks (streaming and batch synchronization)
  • hierarchical warehouse structures (bronze/silver/gold or ODS/DWD/DWS/ADS)
  • transformation jobs (including Spark SQL task plans)
  • DAG workflows (including DAG-of-DAG patterns)
  • resource plans tuned to constraints and informed by execution history

The key boundary is:

  • the agent proposes edits to contracts
  • the platform compiles, validates, executes
  • the system records evidence (history + lineage)
  • reconciliation happens based on truth, not guesses

This is how you avoid “AI demo correctness” and get production-grade behavior.
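
A deliberately tiny, hypothetical sketch of that compile step, reusing the MetricContract sketched earlier (and treating exclusions as boolean flag columns purely for brevity): the same contract always yields the same SQL and the same DAG nodes, and the contract version travels with the plan as evidence.

```python
# Hypothetical compile step: contract in, plan out, deterministic by construction.
def compile_metric(contract: MetricContract) -> dict:
    exclusions = " AND ".join(f"NOT {e}" for e in contract.exclusions) or "TRUE"
    entity = contract.grain.split("/")[0]           # e.g. "user" from "user/day"
    sql = (
        f"SELECT {entity}_id, COUNT(*) AS value\n"
        f"FROM events\n"
        f"WHERE {exclusions}\n"
        f"GROUP BY 1"
    )
    return {
        "dag": ["ingest_events", f"transform_{contract.name}", f"publish_{contract.name}"],
        "spark_sql": sql,
        "contract_version": contract.version,        # evidence: which definition produced this
    }

print(compile_metric(daily_active_users)["spark_sql"])
```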


Closed-loop reconciliation: the part most stacks still don’t give you

Pact is not a generator. It’s a reconciler.

  1. Declare desired state: metric contracts + production intent + constraints.

  2. Plan: compile workflows and resource plans.

  3. Execute: run ingestion and transformations, publish outputs.

  4. Observe: record execution history, costs, runtimes, and field-level lineage.

  5. Evaluate: did we meet definition, freshness, budget, and quality gates?

  6. Reconcile: if reality diverges, Pact can propose and (when permitted) execute safe actions such as rerun, backfill, rollback, or resource re-planning, always with traceability.

From a user perspective, this is the difference between “an assistant” and “a system you can trust.”
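
Stubbed out in the same hypothetical style, the loop is literally a loop: each cycle records evidence, only the reconcile step acts, and only within policy.

```python
# Hypothetical end-to-end cycle; every function is a stand-in for a platform service.
def plan(desired):   return {"dag": ["ingest", "transform", "publish"], **desired}
def execute(p):      return {"ready_at": "09:10", "cost_usd": 42.0, "rows": 1_000_000}
def observe(run):    return {**run, "lineage": ["events -> dwd_events -> dau_v3"]}

def evaluate(obs, desired):
    issues = []
    if obs["ready_at"] > desired["freshness_deadline"]:   # "HH:MM" strings compare lexically
        issues.append("freshness missed")
    if obs["cost_usd"] > desired["cost_budget_usd"]:
        issues.append("budget exceeded")
    return issues

def reconcile(issues):
    remedies = {"freshness missed": "re-plan resources", "budget exceeded": "reduce parallelism"}
    return [remedies[i] for i in issues]                  # proposed, auditable actions

desired = {"freshness_deadline": "08:30", "cost_budget_usd": 50.0}   # 1. declare
obs = observe(execute(plan(desired)))                                # 2-4. plan, execute, observe
issues = evaluate(obs, desired)                                      # 5. evaluate
print(reconcile(issues) if issues else "in spec, no action needed")  # 6. reconcile
```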


A concrete 0→1 example

Start state:

  • MySQL/PostgreSQL tables (users, orders, payments)
  • behavioral events as JSON in Kafka
  • no agreed metric definitions, no stable warehouse model, pipelines are manual

With Pact:

  1. Connect sources: Pact discovers schemas and event shapes.

  2. Minimal expert input (only what can’t be inferred): Pact asks domain owners to confirm:

    • what is an “active user”?
    • what counts as “conversion”?
    • timezone/business day rules
    • exclusions (bots/internal/test/refunds)
    • identity stitching rules
  3. Contracts become the system’s source of truth: these definitions are versioned and enforceable.

  4. Pact compiles production: it generates the warehouse layers, pipelines, Spark SQL transforms, orchestration DAGs, and resource plans.

  5. Operations become closed-loop: when late events arrive, when costs drift, or when upstream changes break assumptions:

    • the system can explain impact via lineage and run history
    • it can propose remediation with an auditable plan
    • and it can execute safely under policy

The practical outcome is what users actually want:

You spend less time fighting pipelines and more time deciding what metrics mean and how they should be used.


What Pact is—and isn’t

Pact is a platform product, not a new storage engine.

  • Pact does not replace cloud compute or storage.

  • Pact integrates existing cloud primitives and execution engines.

  • Pact provides the product layer where durable value lives:

    • semantic contracts (definitions that don’t drift silently)
    • deterministic compilation (pipelines generated and validated)
    • traceability (lineage + execution evidence)
    • cost-aware planning (constraints as first-class)
    • closed-loop operations (safe remediation)

Why Pact matters now

AI made it easier to write SQL and query data. That’s valuable, but it doesn’t solve production correctness.

The next step is not “more AI.” The next step is operationalizing AI by pairing it with:

  • explicit contracts,
  • deterministic compilers,
  • and closed-loop reconciliation driven by real system evidence.

That’s the idea behind Pact: a platform where the agent can genuinely operate—not just suggest.