Starter page — expand with a sample eval report and the harness interface.
The distinctive thing about building context through an interview is that the measurement comes
for free. Every disambiguation the analyst makes (“active client means X, not Y”) is at once a
context entry and a labeled eval pair. Building context is harvesting ground truth.
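To make the dual role concrete, here is a minimal sketch of one interview answer stored once and read two ways. The `Disambiguation` class, its field names, and the example values are all hypothetical, chosen for illustration; they are not the harness's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disambiguation:
    """One clarification captured during the interview (hypothetical shape)."""
    term: str      # the ambiguous phrase, e.g. "active client"
    meaning: str   # the definition the analyst chose
    question: str  # a question whose answer depends on that choice
    expected: str  # the answer that follows from the chosen meaning

    def as_context_entry(self) -> str:
        # The form handed to the agent as context.
        return f"{self.term}: {self.meaning}"

    def as_eval_pair(self) -> tuple[str, str]:
        # The form the eval scores against: (question, expected answer).
        return (self.question, self.expected)


# Made-up example values, for illustration only.
d = Disambiguation(
    term="active client",
    meaning="a client with at least one invoice in the last 90 days",
    question="How many active clients are in the sample data?",
    expected="3",
)
context_entry = d.as_context_entry()
eval_pair = d.as_eval_pair()
```

The same record yields both artifacts, which is why no separate labeling pass is needed.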
The local eval delta
After you define a domain, run the delta to see how much the context helped:
"Run the eval delta on session-financials."
The open-source eval_harness/ runs your agent against the harvested pairs twice, once with the
context and once without, and reports the difference in accuracy: a concrete number you can show.
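The computation behind that number can be sketched in a few lines. This is an illustration under stated assumptions, not the harness's interface: the `Agent` signature, `accuracy`, `eval_delta`, the exact-match scoring rule, and the toy agent are all hypothetical.

```python
from typing import Callable, Optional, Sequence

# Assumed agent signature: (question, context or None) -> answer string.
Agent = Callable[[str, Optional[str]], str]
Pair = tuple[str, str]  # (question, expected answer)


def accuracy(agent: Agent, pairs: Sequence[Pair], context: Optional[str]) -> float:
    """Fraction of pairs answered correctly (case-insensitive exact match)."""
    if not pairs:
        raise ValueError("need at least one eval pair")
    correct = sum(
        1
        for question, expected in pairs
        if agent(question, context).strip().lower() == expected.strip().lower()
    )
    return correct / len(pairs)


def eval_delta(agent: Agent, pairs: Sequence[Pair], context: str) -> dict:
    """Score the same agent on the same pairs, with and without context."""
    with_ctx = accuracy(agent, pairs, context)
    without_ctx = accuracy(agent, pairs, None)
    return {
        "with_context": with_ctx,
        "without_context": without_ctx,
        "delta": with_ctx - without_ctx,
    }


def toy_agent(question: str, context: Optional[str]) -> str:
    # Stand-in for a real agent: it only gets the "active" question
    # right when it has been given context.
    if "active" in question:
        return "3" if context else "7"
    return "5"


pairs = [
    ("How many active clients are there?", "3"),
    ("How many clients churned?", "5"),
]
report = eval_delta(toy_agent, pairs, "active client: invoiced in the last 90 days")
```

Here the toy agent scores 1.0 with context and 0.5 without, so the delta is 0.5. A real run would likely need a more forgiving comparison than exact string match.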
The harness reads ACF, dbt models and docs, or raw markdown, normalizes them, and measures the
delta the same way, so you can evaluate context you already have, not only context written as ACF.
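As a rough sketch of what normalizing could look like, the functions below reduce two of those sources to one list of entries. Everything here is an assumption for illustration: the function names, the `term`/`definition`/`source` entry shape, and the "- term: definition" markdown convention are hypothetical. The dbt input is taken as an already-parsed schema file (models with `name`, `description`, and `columns`). ACF is left out because its layout is not described on this page.

```python
import re


def from_markdown(text: str) -> list[dict]:
    """Treat each '- term: definition' bullet as one entry (assumed convention)."""
    entries = []
    for line in text.splitlines():
        match = re.match(r"^\s*[-*]\s+(.+?):\s+(.+)$", line)
        if match:
            entries.append({
                "term": match.group(1).strip(),
                "definition": match.group(2).strip(),
                "source": "markdown",
            })
    return entries


def from_dbt_schema(schema: dict) -> list[dict]:
    """Collect model and column descriptions from a parsed dbt schema file."""
    entries = []
    for model in schema.get("models", []):
        if model.get("description"):
            entries.append({
                "term": model["name"],
                "definition": model["description"],
                "source": "dbt",
            })
        for column in model.get("columns", []):
            # Columns without a description carry no context, so skip them.
            if column.get("description"):
                entries.append({
                    "term": f'{model["name"]}.{column["name"]}',
                    "definition": column["description"],
                    "source": "dbt",
                })
    return entries


markdown_doc = "# Glossary\n- active client: invoiced in the last 90 days\n"
dbt_schema = {
    "models": [{
        "name": "clients",
        "description": "One row per client",
        "columns": [
            {"name": "is_active", "description": "True if invoiced in the last 90 days"},
            {"name": "id"},
        ],
    }]
}
entries = from_markdown(markdown_doc) + from_dbt_schema(dbt_schema)
```

Once every source is reduced to the same entry shape, the with-and-without comparison does not need to know where the context came from.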
From one-shot to continuous
The one-shot, run-locally eval delta is free. Continuous re-evaluation, drift detection, and
observability across a team are the hosted product; see enterprise evaluation.