# ADR-007: Metric-aware Clarifier with first-class metric catalog
Status: Accepted. Source: `prd.md` §19.
## Context
ADR-005 (Clarifier) and ADR-006 (custom variant) shipped widgets with a single `metric_label: string` field on every spec. Two real-Bedrock baseline runs (see `docs/plans/active/test-clarifier-codegen-on-alerts-feed.md`, Phase 0 FINDINGS) made the cost of that shortcut concrete:
- The user could not see what a widget actually measures — the renderer printed a label, not a definition. There was no way to answer "what does 'cost avoided' mean here?" without reading code.
- The custom path asked the user to paste a TypeScript interface for `Props`. Holistics, ThoughtSpot, and Lightdash all converged on the opposite UX — pick from a semantic catalog or describe the metric in plain English; never ask for a type signature. The TS-paste UX was a developer-ergonomics shortcut, not a product decision.
- With every widget defining its own `metric_label` string, two widgets that meant the same thing ("Cost avoided" vs "$ avoided MTD") rendered as if they were different metrics. There was no governance, no owner, no formula.
- Bedrock baseline runs surfaced a hidden bug: the data-path `WidgetSpec` was fed to Bedrock with a top-level `oneOf` discriminator, which Anthropic's tool-use API rejects (`tools.0.custom.input_schema.type: Field required`). The synthesizer was silently falling back to `MockLlm` on every Bedrock call — a regression invisible without a real-credentials run. Fixing this is a prerequisite to honest metric synthesis.
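The shape mismatch behind the last bullet can be sketched in a few lines. The schemas below are illustrative stand-ins (not the project's real `WidgetSpec` JSON Schema), and `passes_tool_use_validation` only approximates the failing check — the point is that a root-level `oneOf` has no `"type"` key, which is what Anthropic's validation requires:

```python
# Illustrative only: a discriminated union serialized with a top-level oneOf
# has no "type" key at the schema root, so tool-use validation rejects it
# with "input_schema.type: Field required".
union_schema = {
    "oneOf": [
        {"type": "object", "properties": {"kind": {"const": "kpi"}}},
        {"type": "object", "properties": {"kind": {"const": "chart"}}},
    ]
}

# A flat per-variant schema declares "type": "object" at the root and passes.
kpi_schema = {
    "type": "object",
    "properties": {"kind": {"const": "kpi"}, "metric": {"type": "object"}},
    "required": ["kind", "metric"],
}

def passes_tool_use_validation(schema: dict) -> bool:
    """Rough approximation of the failing check: the root must declare a type."""
    return schema.get("type") == "object"

print(passes_tool_use_validation(union_schema))  # False
print(passes_tool_use_validation(kpi_schema))    # True
```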
## Decision
- `metrics_catalog` is a first-class table (`db/init.sql`). Every widget's `spec.metric` is a `MetricDefinition` block (`backend/app/metrics/schemas.py`) with `metric_id`, `name`, `label`, `definition`, `formula`, `entity`, `unit`, `default_filter`, `default_refresh_seconds`, `owner`, `version`. The legacy `metric_label: string` is removed from `KpiSpec`, `ChartSpec`, `TableSpec`, and `CustomSpec`. Ten dashboard-derived metrics are seeded on first boot (`backend/app/metrics/seed.py`).
- The Clarifier resolves metrics before asking questions. A new `metricMatcher` node (`backend/app/widgets/nodes/metric_matcher.py`) runs immediately after `intentExtractor` and tries an exact-name match on `intent.metric_id_guess`, then an `ilike` fallback over `name`/`label`/`definition`. A high-confidence match writes `state.catalog_match` and `state.metric_draft`, which `gapDetector` treats as the metric gap being closed — so the user is not asked to pick something the system already knows.
- Universal-core gaps come first. `gapDetector` (`backend/app/widgets/nodes/gap_detector.py`) checks `metric` and `time_window` before any variant-specific gap (`value_format`, `chart_kind`, columns, layout, accent…). The metric question is a single-select rendered from the live catalog (with `hint` populated from each metric's `definition`) plus a "Define a new metric" escape hatch that gathers `name | label | definition | formula | entity | unit | default_filter` as plain-English sub-questions. No TypeScript interfaces, ever.
- Custom widgets use a two-stage synthesis. Bedrock Haiku 4.5 reliably drops deeply-nested fields when asked to emit the full `CustomSpec` envelope. `specSynthesizer` (`backend/app/widgets/nodes/spec_synthesizer.py`) now asks the LLM for a flat `ComponentSpec` only (TSX source + metadata + props inferred from `intent.custom_examples`), then deterministically wraps it with the resolved `metric` block, a derived `data_intent`, and a derived `mock_data` payload. Code-generation calls explicitly pass `max_tokens=4096` so React components don't truncate.
- Per-variant schemas at the Bedrock boundary. The data path dispatches on `intent.type` and feeds Bedrock the `KpiSpec`/`ChartSpec`/`TableSpec` JSON Schema directly — flat objects that pass Anthropic tool-use validation. The `WidgetSpec` discriminated union remains the single source of truth for the database and the frontend; we only flatten at the LLM boundary.
- Every widget surfaces its definition in the UI. A new `<MetricInfoBadge>` (`frontend/src/widgets/MetricInfoBadge.tsx`) renders a small `i` icon on each tile (preview and dashboard rail). Clicking it pops the full `MetricDefinition` — name, definition, formula, entity, unit, default filter, owner, `metric_id` (or "one-off"). The Spec JSON viewer (`frontend/src/widgets/SpecJsonView.tsx`) gains a "Metric" tab, default-selected when a metric is present.
- Atomic catalog promotion at persist. `POST /v1/widgets` (`backend/app/widgets/routes.py`, `_validate_and_promote_metric`) validates that `spec.metric` is internally consistent, and if `metric_id` is null (a one-off metric authored via "Define a new metric") it inserts the metric into `metrics_catalog` and rewrites the spec with the resulting catalog `metric_id` in the same transaction as the widget insert. There is no path by which a persisted widget references a metric that does not exist in the catalog.
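The matcher's resolution order (exact id, then fuzzy scan) can be sketched without the database. `MetricDef` and the in-memory `CATALOG` below are stand-ins for the real model and SQLAlchemy query, and the `ilike` fallback is approximated with a case-insensitive substring test:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MetricDef:
    """Stand-in for the real MetricDefinition model (subset of fields)."""
    metric_id: str
    name: str
    label: str
    definition: str

CATALOG = [
    MetricDef("cost_avoided_mtd", "Cost avoided", "$ avoided MTD",
              "Estimated spend prevented by auto-remediation, month to date"),
]

def match_metric(guess: Optional[str], catalog: list[MetricDef]) -> Optional[MetricDef]:
    """Exact metric_id match first, then a case-insensitive scan over
    name/label/definition, approximating the SQL `ilike` fallback."""
    if guess is None:
        return None
    for m in catalog:
        if m.metric_id == guess:
            return m
    needle = guess.lower()
    for m in catalog:
        if (needle in m.name.lower()
                or needle in m.label.lower()
                or needle in m.definition.lower()):
            return m
    return None

print(match_metric("Cost Avoided", CATALOG).metric_id)  # cost_avoided_mtd
```

A hit here closes the metric gap before `gapDetector` runs, which is why a user asking for a known metric never sees the catalog picker.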
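The deterministic second stage of the custom-widget synthesis can be sketched as a plain wrapping function. The dict shapes and the derived `data_intent`/`mock_data` contents below are illustrative assumptions, not the real `ComponentSpec`/`CustomSpec` models:

```python
def wrap_component_spec(component: dict, metric: dict) -> dict:
    """Stage 2 of the two-stage synthesis: take the flat ComponentSpec the
    LLM returned and wrap it deterministically, so the LLM never has to
    emit the deeply-nested envelope it tends to truncate."""
    return {
        "kind": "custom",
        "metric": metric,        # resolved catalog metric, attached verbatim
        "component": component,  # TSX source + metadata, exactly as returned
        # Illustrative derivations; the real fields are richer.
        "data_intent": {"entity": metric["entity"], "unit": metric["unit"]},
        "mock_data": [{"value": 0, "label": metric["label"]}],
    }

spec = wrap_component_spec(
    {"source": "export function Tile() { return null; }", "props": ["value"]},
    {"metric_id": "cost_avoided_mtd", "label": "$ avoided MTD",
     "entity": "remediation", "unit": "usd"},
)
print(spec["kind"], spec["metric"]["metric_id"])
```

The design point is that only the flat inner object crosses the LLM boundary; everything nested is assembled in deterministic code.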
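The atomic-promotion guarantee is essentially "both inserts or neither". A minimal sketch, using sqlite3 in place of the real Postgres connection and simplified column sets:

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metrics_catalog (metric_id TEXT PRIMARY KEY, name TEXT, label TEXT);
CREATE TABLE widgets (widget_id TEXT PRIMARY KEY, spec TEXT);
""")

def persist_widget(conn: sqlite3.Connection, spec: dict) -> str:
    """Insert the widget and, if its metric is a one-off (metric_id is None),
    promote that metric into metrics_catalog in the same transaction."""
    widget_id = str(uuid.uuid4())
    with conn:  # one transaction: both rows commit or neither does
        metric = spec["metric"]
        if metric["metric_id"] is None:
            metric["metric_id"] = str(uuid.uuid4())
            conn.execute(
                "INSERT INTO metrics_catalog (metric_id, name, label) VALUES (?, ?, ?)",
                (metric["metric_id"], metric["name"], metric["label"]),
            )
        conn.execute(
            "INSERT INTO widgets (widget_id, spec) VALUES (?, ?)",
            (widget_id, json.dumps(spec)),
        )
    return widget_id

wid = persist_widget(conn, {"metric": {
    "metric_id": None, "name": "Cost avoided", "label": "$ avoided MTD"}})
stored = json.loads(conn.execute(
    "SELECT spec FROM widgets WHERE widget_id = ?", (wid,)).fetchone()[0])
print(stored["metric"]["metric_id"] is not None)  # True
```

Because the catalog insert and the widget insert share a transaction, a failure in either leaves no dangling reference — the invariant the last Decision bullet states.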
## Consequences
- The user can answer "what is this widget actually showing?" for every widget, without leaving the dashboard. This was previously impossible.
- A new widget that asks for "Cost avoided this month" resolves to the same `metric_id` as an existing tile asking the same question; the Clarifier no longer fragments the catalog.
- The custom path's UX no longer leaks the implementation language to the user. Examples-driven inference (`intent.custom_examples`) replaces the TS interface paste; props are inferred from example rows in `_custom_synth`.
- The hidden Bedrock-fallback bug is fixed. A baseline-run script and an extension to `scripts/verify-acceptance.sh` round-trip a real Bedrock spec → persist → catalog row, so the regression cannot recur silently.
- Bedrock latency for the data path rose modestly (per-variant prompt + metric block in context); `widget_llm_timeout_s` was raised from 4.0s to 8.0s in `backend/app/settings.py`.
- `db/init.sql` gains the `metrics_catalog` table and an idempotent seed; `make demo-reset` keeps the seed but truncates `widgets`. No destructive migration was needed for hackathon-scope deployments — the table is created on `lifespan` startup if absent.
- The frontend gains one new API client (`frontend/src/widgets/api-metrics.ts`) and one new dependency-free component (`MetricInfoBadge`); no new npm packages.
- ADR-006's `metric_label` lesson is now obsolete — captured as a `SUPERSEDED` entry pointing at this ADR.
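The create-if-absent plus idempotent-seed startup step can be sketched as follows — sqlite3 stands in for Postgres (where `INSERT OR IGNORE` would be `ON CONFLICT DO NOTHING`), and the seed rows are illustrative, not the real ten dashboard-derived metrics:

```python
import sqlite3

# Illustrative seed rows; the real seed ships ten dashboard-derived metrics.
SEED = [
    ("cost_avoided_mtd", "Cost avoided", "$ avoided MTD"),
    ("mttr_minutes", "MTTR", "Mean time to remediate (min)"),
]

def ensure_catalog(conn: sqlite3.Connection) -> None:
    """Idempotent startup step: create the table if absent and (re)apply
    the seed. Safe to call on every lifespan startup and demo reset."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics_catalog "
        "(metric_id TEXT PRIMARY KEY, name TEXT, label TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO metrics_catalog (metric_id, name, label) "
        "VALUES (?, ?, ?)",
        SEED,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
ensure_catalog(conn)
ensure_catalog(conn)  # second boot: no-op, no duplicates
print(conn.execute("SELECT COUNT(*) FROM metrics_catalog").fetchone()[0])  # 2
```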
## Cross-references
- ADR-005, ADR-006 — supersedes the `metric_label: string` field they introduced.
- ADR-PROTO-002 — Part C anchors SQL gen to `metrics_catalog`, building on this ADR.
- Lessons-learned: Bedrock tool-use rejects top-level `oneOf` schemas; Haiku 4.5 silently drops deeply-nested fields.
- Implementation: `backend/app/metrics/`, `backend/app/widgets/nodes/metric_matcher.py`.