# ADR-007: Metric-aware Clarifier with first-class metric catalog
Status: Accepted. Source: `prd.md` §19.
## Context
ADR-005 (Clarifier) and ADR-006 (custom variant) shipped widgets with a single `metric_label: string` field on every spec. Two real-Bedrock baseline runs (see `docs/plans/active/test-clarifier-codegen-on-alerts-feed.md`, Phase 0 FINDINGS) made the cost of that shortcut concrete:
- The user could not see what a widget actually measures — the renderer printed a label, not a definition. There was no way to answer "what does 'cost avoided' mean here?" without reading code.
- The custom path asked the user to paste a TypeScript interface for `Props`. Holistics, ThoughtSpot, and Lightdash all converged on the opposite UX — pick from a semantic catalog or describe the metric in plain English; never ask for a type signature. The TS-paste UX was a developer-ergonomics shortcut, not a product decision.
- With every widget defining its own `metric_label` string, two widgets that meant the same thing ("Cost avoided" vs "$ avoided MTD") rendered as if they were different metrics. There was no governance, no owner, no formula.
- Bedrock baseline runs surfaced a hidden bug: the data-path `WidgetSpec` was fed to Bedrock with a top-level `oneOf` discriminator, which Anthropic's tool-use API rejects (`tools.0.custom.input_schema.type: Field required`). The synthesizer was silently falling back to `MockLlm` on every Bedrock call — a regression invisible without a real-credentials run. Fixing this is a prerequisite to honest metric synthesis.
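The shape mismatch behind the last bullet can be sketched in a few lines. The schemas below are illustrative stand-ins (not the project's real `WidgetSpec` JSON Schema), and `passes_tool_use_validation` only approximates the failing check — the point is that a root-level `oneOf` has no `"type"` key, which is what Anthropic's validation requires:

```python
# Illustrative only: a discriminated union serialized with a top-level oneOf
# has no "type" key at the schema root, so tool-use validation rejects it
# with "input_schema.type: Field required".
union_schema = {
    "oneOf": [
        {"type": "object", "properties": {"kind": {"const": "kpi"}}},
        {"type": "object", "properties": {"kind": {"const": "chart"}}},
    ]
}

# A flat per-variant schema declares "type": "object" at the root and passes.
kpi_schema = {
    "type": "object",
    "properties": {"kind": {"const": "kpi"}, "metric": {"type": "object"}},
    "required": ["kind", "metric"],
}

def passes_tool_use_validation(schema: dict) -> bool:
    """Rough approximation of the failing check: the root must declare a type."""
    return schema.get("type") == "object"

print(passes_tool_use_validation(union_schema))  # False
print(passes_tool_use_validation(kpi_schema))    # True
```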
## Decision
- `metrics_catalog` is a first-class table (`db/init.sql`). Every widget's `spec.metric` is a `MetricDefinition` block (`backend/app/metrics/schemas.py`) with `metric_id`, `name`, `label`, `definition`, `formula`, `entity`, `unit`, `default_filter`, `default_refresh_seconds`, `owner`, `version`. The legacy `metric_label: string` is removed from `KpiSpec`, `ChartSpec`, `TableSpec`, and `CustomSpec`. Ten dashboard-derived metrics are seeded on first boot (`backend/app/metrics/seed.py`).
- The Clarifier resolves metrics before asking questions. A new `metricMatcher` node (`backend/app/widgets/nodes/metric_matcher.py`) runs immediately after `intentExtractor` and tries an exact-name match on `intent.metric_id_guess`, then an `ilike` fallback over `name`/`label`/`definition`. A high-confidence match writes `state.catalog_match` and `state.metric_draft`, which `gapDetector` treats as the metric gap being closed — so the user is not asked to pick something the system already knows.
- Universal-core gaps come first. `gapDetector` (`backend/app/widgets/nodes/gap_detector.py`) checks `metric` and `time_window` before any variant-specific gap (`value_format`, `chart_kind`, columns, layout, accent…). The metric question is a single-select rendered from the live catalog (with `hint` populated from each metric's `definition`) plus a "Define a new metric" escape hatch that gathers `name | label | definition | formula | entity | unit | default_filter` as plain-English sub-questions. No TypeScript interfaces, ever.
- Custom widgets use a two-stage synthesis. Bedrock Haiku 4.5 reliably drops deeply-nested fields when asked to emit the full `CustomSpec` envelope. `specSynthesizer` (`backend/app/widgets/nodes/spec_synthesizer.py`) now asks the LLM for a flat `ComponentSpec` only (TSX source + metadata + props inferred from `intent.custom_examples`), then deterministically wraps it with the resolved `metric` block, a derived `data_intent`, and a derived `mock_data` payload. Code-generation calls explicitly pass `max_tokens=4096` so React components don't truncate.
- Per-variant schemas at the Bedrock boundary. The data path dispatches on `intent.type` and feeds Bedrock the `KpiSpec`/`ChartSpec`/`TableSpec` JSON Schema directly — flat objects that pass Anthropic tool-use validation. The `WidgetSpec` discriminated union remains the single source of truth for the database and the frontend; we only flatten at the LLM boundary.
- Every widget surfaces its definition in the UI. A new `<MetricInfoBadge>` (`frontend/src/widgets/MetricInfoBadge.tsx`) renders a small `i` icon on each tile (preview and dashboard rail). Clicking it pops the full `MetricDefinition` — name, definition, formula, entity, unit, default filter, owner, `metric_id` (or "one-off"). The Spec JSON viewer (`frontend/src/widgets/SpecJsonView.tsx`) gains a "Metric" tab, default-selected when a metric is present.
- Atomic catalog promotion at persist. `POST /v1/widgets` (`backend/app/widgets/routes.py`, `_validate_and_promote_metric`) validates that `spec.metric` is internally consistent, and if `metric_id` is null (a one-off metric authored via "Define a new metric") it inserts the metric into `metrics_catalog` and rewrites the spec with the resulting catalog `metric_id` in the same transaction as the widget insert. There is no path by which a persisted widget references a metric that does not exist in the catalog.
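The matcher's resolution order (exact id, then fuzzy scan) can be sketched without the database. `MetricDef` and the in-memory `CATALOG` below are stand-ins for the real model and SQLAlchemy query, and the `ilike` fallback is approximated with a case-insensitive substring test:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MetricDef:
    """Stand-in for the real MetricDefinition model (subset of fields)."""
    metric_id: str
    name: str
    label: str
    definition: str

CATALOG = [
    MetricDef("cost_avoided_mtd", "Cost avoided", "$ avoided MTD",
              "Estimated spend prevented by auto-remediation, month to date"),
]

def match_metric(guess: Optional[str], catalog: list[MetricDef]) -> Optional[MetricDef]:
    """Exact metric_id match first, then a case-insensitive scan over
    name/label/definition, approximating the SQL `ilike` fallback."""
    if guess is None:
        return None
    for m in catalog:
        if m.metric_id == guess:
            return m
    needle = guess.lower()
    for m in catalog:
        if (needle in m.name.lower()
                or needle in m.label.lower()
                or needle in m.definition.lower()):
            return m
    return None

print(match_metric("Cost Avoided", CATALOG).metric_id)  # cost_avoided_mtd
```

A hit here closes the metric gap before `gapDetector` runs, which is why a user asking for a known metric never sees the catalog picker.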
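The deterministic second stage of the custom-widget synthesis can be sketched as a plain wrapping function. The dict shapes and the derived `data_intent`/`mock_data` contents below are illustrative assumptions, not the real `ComponentSpec`/`CustomSpec` models:

```python
def wrap_component_spec(component: dict, metric: dict) -> dict:
    """Stage 2 of the two-stage synthesis: take the flat ComponentSpec the
    LLM returned and wrap it deterministically, so the LLM never has to
    emit the deeply-nested envelope it tends to truncate."""
    return {
        "kind": "custom",
        "metric": metric,        # resolved catalog metric, attached verbatim
        "component": component,  # TSX source + metadata, exactly as returned
        # Illustrative derivations; the real fields are richer.
        "data_intent": {"entity": metric["entity"], "unit": metric["unit"]},
        "mock_data": [{"value": 0, "label": metric["label"]}],
    }

spec = wrap_component_spec(
    {"source": "export function Tile() { return null; }", "props": ["value"]},
    {"metric_id": "cost_avoided_mtd", "label": "$ avoided MTD",
     "entity": "remediation", "unit": "usd"},
)
print(spec["kind"], spec["metric"]["metric_id"])
```

The design point is that only the flat inner object crosses the LLM boundary; everything nested is assembled in deterministic code.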
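The atomic-promotion guarantee is essentially "both inserts or neither". A minimal sketch, using sqlite3 in place of the real Postgres connection and simplified column sets:

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metrics_catalog (metric_id TEXT PRIMARY KEY, name TEXT, label TEXT);
CREATE TABLE widgets (widget_id TEXT PRIMARY KEY, spec TEXT);
""")

def persist_widget(conn: sqlite3.Connection, spec: dict) -> str:
    """Insert the widget and, if its metric is a one-off (metric_id is None),
    promote that metric into metrics_catalog in the same transaction."""
    widget_id = str(uuid.uuid4())
    with conn:  # one transaction: both rows commit or neither does
        metric = spec["metric"]
        if metric["metric_id"] is None:
            metric["metric_id"] = str(uuid.uuid4())
            conn.execute(
                "INSERT INTO metrics_catalog (metric_id, name, label) VALUES (?, ?, ?)",
                (metric["metric_id"], metric["name"], metric["label"]),
            )
        conn.execute(
            "INSERT INTO widgets (widget_id, spec) VALUES (?, ?)",
            (widget_id, json.dumps(spec)),
        )
    return widget_id

wid = persist_widget(conn, {"metric": {
    "metric_id": None, "name": "Cost avoided", "label": "$ avoided MTD"}})
stored = json.loads(conn.execute(
    "SELECT spec FROM widgets WHERE widget_id = ?", (wid,)).fetchone()[0])
print(stored["metric"]["metric_id"] is not None)  # True
```

Because the catalog insert and the widget insert share a transaction, a failure in either leaves no dangling reference — the invariant the last Decision bullet states.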
## Consequences
- The user can answer "what is this widget actually showing?" for every widget, without leaving the dashboard. This was previously impossible.
- A new widget that asks for "Cost avoided this month" resolves to the same `metric_id` as an existing tile asking the same question; the Clarifier no longer fragments the catalog.
- The custom path's UX no longer leaks the implementation language to the user. Examples-driven inference (`intent.custom_examples`) replaces the TS interface paste; props are inferred from example rows in `_custom_synth`.
- The hidden Bedrock-fallback bug is fixed. A baseline-run script and an extension to `scripts/verify-acceptance.sh` round-trip a real Bedrock spec → persist → catalog row, so the regression cannot recur silently.
- Bedrock latency for the data path rose modestly (per-variant prompt + metric block in context); `widget_llm_timeout_s` was raised from 4.0s to 8.0s in `backend/app/settings.py`.
- `db/init.sql` gains the `metrics_catalog` table and an idempotent seed; `make demo-reset` keeps the seed but truncates `widgets`. No destructive migration was needed for hackathon-scope deployments — the table is created on `lifespan` startup if absent.
- The frontend gains one new API client (`frontend/src/widgets/api-metrics.ts`) and one new dependency-free component (`MetricInfoBadge`); no new npm packages.
- ADR-006's `metric_label` lesson is now obsolete — captured as a `SUPERSEDED` entry pointing at this ADR.
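The create-if-absent plus idempotent-seed startup step can be sketched as follows — sqlite3 stands in for Postgres (where `INSERT OR IGNORE` would be `ON CONFLICT DO NOTHING`), and the seed rows are illustrative, not the real ten dashboard-derived metrics:

```python
import sqlite3

# Illustrative seed rows; the real seed ships ten dashboard-derived metrics.
SEED = [
    ("cost_avoided_mtd", "Cost avoided", "$ avoided MTD"),
    ("mttr_minutes", "MTTR", "Mean time to remediate (min)"),
]

def ensure_catalog(conn: sqlite3.Connection) -> None:
    """Idempotent startup step: create the table if absent and (re)apply
    the seed. Safe to call on every lifespan startup and demo reset."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics_catalog "
        "(metric_id TEXT PRIMARY KEY, name TEXT, label TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO metrics_catalog (metric_id, name, label) "
        "VALUES (?, ?, ?)",
        SEED,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
ensure_catalog(conn)
ensure_catalog(conn)  # second boot: no-op, no duplicates
print(conn.execute("SELECT COUNT(*) FROM metrics_catalog").fetchone()[0])  # 2
```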
## Cross-references
- ADR-005, ADR-006 — supersedes the `metric_label: string` field they introduced.
- ADR-PROTO-002 — Part C anchors SQL gen to `metrics_catalog`, building on this ADR.
- Lessons-learned: Bedrock tool-use rejects top-level `oneOf` schemas; Haiku 4.5 silently drops deeply-nested fields.
- Implementation: `backend/app/metrics/`, `backend/app/widgets/nodes/metric_matcher.py`.