Add Widget Clarifier — Implementation Notes

Companion to prd.md ADR-005, ADR-006, ADR-007, and §5.1 row 15.

This document is the source of truth for the SSE contract, the LangGraph topology, and the WidgetSpec schema. The frontend hook (frontend/src/widgets/useWidgetClarifier.ts) and the FastAPI router (backend/app/widgets/routes.py) must stay aligned with what is documented here.

Lineage

The pattern is a deliberate port of the Spine Clarifier from the sister sdlc-agent-swarms repo:

packages/agents-clarifier/src/graph/clarifier-graph.ts — node names + topology
packages/agents-clarifier/src/graph/state.ts — annotation reducers
packages/agents-clarifier/src/run.ts — interrupt + resume protocol
packages/dashboard/src/app/api/clarifier/route.ts + respond/route.ts — SSE event taxonomy
packages/dashboard/src/lib/hooks/use-clarifier-stream.ts — client phases

The differences from the source are intentional and small:

| Source (Spine Clarifier) | Here (Add Widget Clarifier) |
| --- | --- |
| TypeScript LangGraph | Python LangGraph |
| Zod typed artifacts (`PRD`, `EnrichedRequirement`, `FeaturePlan`) | Pydantic discriminated `WidgetSpec` (`kpi` \| `chart` \| `table` \| `custom` per ADR-006) |
| Terminal phase = `complete` | Terminal phase = `preview`; an explicit "Add to dashboard" click then persists the widget |
| Optional RAG via Voyage + Qdrant + Cohere | Inline catalog only (no RAG in v1) |
| `interruptBefore: ['storyWriter', 'escalationGate']` | `interrupt_before=['specSynthesizer']` |
| `claude-sonnet-4` via Anthropic SDK | `claude-sonnet-4` via Bedrock + boto3 (ADR-002) |

LangGraph topology

flowchart LR
    Start([__start__]) --> Ctx[contextLoader]
    Ctx --> Intent[intentExtractor]
    Intent --> Match[metricMatcher]
    Match --> Gap{gapDetector}
    Gap -->|universal + variant gaps| Q[questionPrioritizer]
    Q -.HITL pause.-> Wait((interrupt_before<br/>specSynthesizer))
    Wait -->|update_state + invoke None| Synth[specSynthesizer]
    Gap -->|no gaps| Synth
    Synth --> Crit{critic}
    Crit -->|valid| Done([END])
    Crit -->|invalid<br/>and round less than max| Update[specUpdater]
    Update --> Gap
    Crit -->|invalid<br/>and out of rounds| DoneErr([END with error])

metricMatcher (ADR-007) runs immediately after intentExtractor. It resolves intent.metric_id_guess against metrics_catalog (exact-name, then ilike on name/label/definition). On a high-confidence hit it writes state.catalog_match and state.metric_draft; the gap detector then treats the metric gap as already closed and skips that question. The graph can advance straight to specSynthesizer if the catalog already covers everything — in which case /clarify returns questions: [] and the frontend resumes via /respond with an empty answers array.
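As a rough illustration of the two-pass lookup described above, the sketch below matches against an in-memory catalog. The `CatalogRow` type, the `match_metric` name, and the rule that only a unique fuzzy hit counts as "high confidence" are assumptions made for the example; the real node queries `metrics_catalog` and may score matches differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogRow:
    # Hypothetical, trimmed-down stand-in for a metrics_catalog row.
    name: str
    label: str
    definition: str


def match_metric(guess: Optional[str], catalog: list[CatalogRow]) -> Optional[CatalogRow]:
    """Exact name first, then a case-insensitive substring pass (ilike-style)."""
    needle = (guess or "").strip().lower()
    if not needle:
        return None
    # Pass 1: exact match on the snake_case catalog name.
    for row in catalog:
        if row.name.lower() == needle:
            return row
    # Pass 2: substring match on name / label / definition.
    # Underscores become spaces when searching the prose fields.
    words = needle.replace("_", " ")
    hits = [
        row
        for row in catalog
        if needle in row.name.lower()
        or words in row.label.lower()
        or words in row.definition.lower()
    ]
    # Assumption: only an unambiguous hit is treated as high confidence.
    return hits[0] if len(hits) == 1 else None
```

A `None` result corresponds to the case where the metric gap stays open and the user is asked to pick a metric.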

Source: backend/app/widgets/graph.py.

The graph is compiled once per process and reused (cached with functools.lru_cache) because InMemorySaver is process-local: per-request instances would lose state between the /clarify and /respond SSE round trips.
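The caching idea can be sketched without LangGraph itself. `CompiledGraphStub`, `get_graph`, `clarify`, and `respond` below are hypothetical stand-ins, not the names used in graph.py; the point is only that both requests must see the same cached instance. Note that `lru_cache` builds the object on the first call, so a module that wants it ready at import would call the getter once at import time.

```python
from functools import lru_cache


class CompiledGraphStub:
    """Hypothetical stand-in for a compiled graph plus its in-memory checkpointer."""

    def __init__(self) -> None:
        # thread_id -> saved state; plays the role of the process-local saver.
        self.checkpoints: dict[str, dict] = {}


@lru_cache(maxsize=1)
def get_graph() -> CompiledGraphStub:
    # Built on the first call, then the same instance is returned
    # for the life of the process.
    return CompiledGraphStub()


def clarify(thread_id: str, raw_input: str) -> None:
    # First request: state is checkpointed under the thread id.
    get_graph().checkpoints[thread_id] = {"raw_input": raw_input}


def respond(thread_id: str) -> dict:
    # Second request: works only because it reads from the same cached instance.
    return get_graph().checkpoints[thread_id]
```

If `get_graph` built a fresh object per request, `respond` would raise `KeyError` for every thread.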

State schema

Source: backend/app/widgets/state.py.

from typing import Annotated, Any, TypedDict

# WidgetIntent, Gap, Question, HumanResponse and the _append reducer
# are defined or imported elsewhere in state.py.

class WidgetClarifierState(TypedDict, total=False):
    raw_input: str
    context: dict[str, Any]
    intent: WidgetIntent | None
    gaps: list[Gap]
    questions: list[Question]
    human_responses: Annotated[list[HumanResponse], _append]   # appending reducer
    spec_draft: dict[str, Any] | None
    spec: dict[str, Any] | None
    round: int                          # synthesizer attempts so far
    max_rounds: int                     # default = settings.widget_clarifier_max_rounds (2)
    error: str | None
    thread_id: str

Only human_responses uses an appending reducer; everything else is last-write-wins, matching the Spine Clarifier's annotation defaults.
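To make the difference between the two merge behaviours concrete, here is a minimal sketch. The body of `_append` is a guess at what the real reducer does, and `apply_update` is a hypothetical helper that imitates how an annotated reducer is applied; LangGraph performs this merge internally.

```python
from typing import Annotated, Any, Optional, TypedDict, get_type_hints


def _append(existing: Optional[list], new: Optional[list]) -> list:
    # Reducer: concatenate instead of overwrite; tolerate a missing side.
    return (existing or []) + (new or [])


class DemoState(TypedDict, total=False):
    human_responses: Annotated[list[dict[str, Any]], _append]
    round: int


def apply_update(state: dict, update: dict) -> dict:
    """Hypothetical merge step mimicking how annotated reducers are applied."""
    hints = get_type_hints(DemoState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] exposes its extras via __metadata__; plain types do not.
        reducers = getattr(hints.get(key), "__metadata__", ())
        if reducers:
            merged[key] = reducers[0](merged.get(key), value)
        else:
            merged[key] = value  # last-write-wins
    return merged
```

Two successive updates therefore accumulate answers while `round` simply takes the latest value.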

WidgetIntent carries a mode: "data" | "custom" flag (ADR-006). The synthesizer node body branches on this; the topology is unchanged. WidgetIntent also carries metric_id_guess (snake_case catalog name or null), time_window (e.g. last_7_days, MTD), and — for the custom path — custom_examples: list[dict] (1-3 example rows) instead of the deprecated data_shape TypeScript-interface field (ADR-007).

gapDetector checks universal-core gaps (metric, time_window) first. The metric gap is auto-satisfied when metricMatcher produced a high-confidence match. Variant-specific gaps come second:

| Variant | Variant-specific gaps |
| --- | --- |
| `kpi` | `value_format`, `accent` |
| `chart` | `chart_kind`, `dimensions` |
| `table` | `columns` |
| `custom` | `custom_examples` (if missing), `layout`, `accent` |
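The ordering rule (universal gaps first, then the variant's own) might look roughly like this. The intent is modelled as a plain dict with a `metric` key purely for illustration, and `detect_gaps` is a hypothetical name; the real node works on `WidgetIntent` and produces `Gap` objects.

```python
from typing import Any, Optional

UNIVERSAL_GAPS = ("metric", "time_window")
VARIANT_GAPS = {
    "kpi": ("value_format", "accent"),
    "chart": ("chart_kind", "dimensions"),
    "table": ("columns",),
    "custom": ("custom_examples", "layout", "accent"),
}


def detect_gaps(intent: dict[str, Any], catalog_match: Optional[dict] = None) -> list[str]:
    gaps: list[str] = []
    # Universal core first.
    for field in UNIVERSAL_GAPS:
        if field == "metric" and catalog_match is not None:
            continue  # auto-satisfied by a high-confidence catalog match
        if not intent.get(field):
            gaps.append(field)
    # Then whatever the chosen variant still needs.
    for field in VARIANT_GAPS.get(intent.get("type"), ()):
        if not intent.get(field):
            gaps.append(field)
    return gaps
```

An empty list corresponds to the "no gaps" edge that goes straight to specSynthesizer.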

questionPrioritizer builds the metric question from the live catalog (single-select, with hint populated from each row's definition) and appends a "Define a new metric" option. Selecting it triggers plain-English sub-questions for name/label/definition/formula/entity/unit/default_filter; the user is never asked for a type signature.

SSE contract

All SSE bodies are JSON. Frames follow the standard event: <name>\ndata: <json>\n\n shape.
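A minimal sketch of writing and reading one frame in that shape follows. `sse_frame` and `parse_sse_frame` are illustrative helpers, not functions from routes.py, and the parser handles only the `event:` and `data:` fields used here.

```python
import json
from typing import Any


def sse_frame(event: str, data: dict[str, Any]) -> str:
    # json.dumps without indent emits no raw newlines,
    # so a single data: line is enough.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def parse_sse_frame(frame: str) -> tuple[str, dict[str, Any]]:
    event, payload = "message", []
    for line in frame.strip().split("\n"):
        name, _, value = line.partition(": ")
        if name == "event":
            event = value
        elif name == "data":
            payload.append(value)
    # Multiple data: lines, if present, are joined with newlines.
    return event, json.loads("\n".join(payload))
```

Round-tripping a `stage` event through both helpers returns the original event name and payload.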

`POST /v1/widgets/clarify`

Start a new session.

Request body

{ "raw_input": "Show cost avoided by region for the last 7 days as a chart" }

Stream

event: stage
data: {"stage": "started", "thread_id": "0bbe6dc1..."}

event: stage
data: {"stage": "contextLoader", "thread_id": "0bbe6dc1..."}

event: stage
data: {"stage": "intentExtractor", "thread_id": "0bbe6dc1..."}

event: stage
data: {"stage": "metricMatcher", "thread_id": "0bbe6dc1..."}

event: stage
data: {"stage": "gapDetector", "thread_id": "0bbe6dc1..."}

event: stage
data: {"stage": "questionPrioritizer", "thread_id": "0bbe6dc1..."}

event: stage
data: {"stage": "__interrupt__", "thread_id": "0bbe6dc1..."}

event: result
data: {
  "thread_id": "0bbe6dc1...",
  "interrupted": true,
  "next": ["specSynthesizer"],
  "questions": [{ "id": "q_chart:metric_label", "field": "metric_label", "prompt": "...", "kind": "free_text", "options": [], "required": true }],
  "gaps": [...],
  "intent": { "type": "chart", "chart_kind": "line", "dimensions": ["region"], "entity": "outcomes" },
  "spec": null,
  "spec_draft": null,
  "round": 0,
  "max_rounds": 2,
  "error": null,
  "raw_input": "Show cost avoided by region for the last 7 days as a chart"
}

`POST /v1/widgets/clarify/respond`

Resume an interrupted run.

Request body

{
  "thread_id": "0bbe6dc1...",
  "answers": [
    {
      "question_id": "q_chart:metric_label",
      "answer": "Cost avoided in USD",
      "selected_option": null
    }
  ]
}

Or to abort:

{ "thread_id": "0bbe6dc1...", "abandon": true }

Stream: same shape as /clarify. The terminal result event has interrupted: false and one of the following:

spec populated (a Pydantic-validated WidgetSpec),
error populated (e.g. a validation failure), or
abandoned: true (if the user cancelled).

Event taxonomy

| Event name | When | Data shape |
| --- | --- | --- |
| `stage` | After each LangGraph node executes (and on `__interrupt__`). | `{ stage: string, thread_id: string }` |
| `result` | At the end of every `/clarify` and `/respond` request. | Full snapshot, as shown in the examples above. |
| `error` | On any unhandled exception inside the runner. | `{ message: string, thread_id?: string, kind?: string }` (`kind` is `builder_unavailable` for Bedrock failures, ADR-008) |

Persistence endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| `POST` | `/v1/widgets` | Persist a `WidgetSpec` after the user clicks Add to dashboard. Atomically promotes one-off metrics into `metrics_catalog` (ADR-007). |
| `GET` | `/v1/widgets` | List persisted widgets where `placement = 'rail'` (used by `MyWidgetsRail`). |
| `PATCH` | `/v1/widgets/{widget_id}` | Update `placement` to `dismissed` (the X button on each rail card). |
| `GET` | `/v1/metrics` | List the metric catalog (used by the Clarifier prompt and the frontend). |
| `GET` | `/v1/metrics/{metric_id}` | Read a single catalog entry. |
| `POST` | `/v1/metrics` | Create a new catalog entry directly (used by the "Define a new metric" sub-flow when the user wants to register without a widget). |

Schema: backend/app/widgets/schemas.py StoredWidget, backend/app/metrics/schemas.py MetricDefinition / CatalogMetric.

The widgets and metrics_catalog tables are created by db/init.sql; defensive CREATE TABLE IF NOT EXISTS calls run on app startup so dev DBs that booted before the DDL was added pick up the tables without manual migration. On first boot the catalog is seeded with 10 dashboard-derived metrics (backend/app/metrics/seed.py); the seed is idempotent on name.

Atomic metric promotion at persist (ADR-007)

POST /v1/widgets calls _validate_and_promote_metric before persisting. The request is rejected if spec.metric is missing or internally inconsistent (e.g. metric_id references a catalog row that doesn't exist). When spec.metric.metric_id is null (a one-off metric authored via "Define a new metric"), the metric is inserted into metrics_catalog and the spec is rewritten with the resulting metric_id, all in the same transaction as the widget insert. There is no path by which a persisted widget references a metric that does not exist in the catalog.
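The single-transaction guarantee can be illustrated with SQLite from the standard library. Everything here is a simplified assumption: the two-column tables, the `init_schema` and `persist_widget` names, and the use of `sqlite3` in place of the real database; only the shape of the logic follows the description above.

```python
import json
import sqlite3
import uuid


def init_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical two-column tables; the real DDL lives in db/init.sql.
    conn.execute("CREATE TABLE IF NOT EXISTS metrics_catalog (metric_id TEXT PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute("CREATE TABLE IF NOT EXISTS widgets (widget_id TEXT PRIMARY KEY, spec TEXT NOT NULL)")
    conn.commit()


def persist_widget(conn: sqlite3.Connection, spec: dict) -> str:
    metric = spec.get("metric")
    if not metric or not metric.get("name"):
        raise ValueError("spec.metric is missing or incomplete")
    with conn:  # one transaction: commit on success, roll back on any exception
        metric_id = metric.get("metric_id")
        if metric_id is None:
            # One-off metric: promote it into the catalog first.
            metric_id = str(uuid.uuid4())
            conn.execute(
                "INSERT INTO metrics_catalog (metric_id, name) VALUES (?, ?)",
                (metric_id, metric["name"]),
            )
            spec = {**spec, "metric": {**metric, "metric_id": metric_id}}
        else:
            row = conn.execute(
                "SELECT 1 FROM metrics_catalog WHERE metric_id = ?", (metric_id,)
            ).fetchone()
            if row is None:
                raise ValueError(f"unknown metric_id: {metric_id}")
        widget_id = str(uuid.uuid4())
        conn.execute(
            "INSERT INTO widgets (widget_id, spec) VALUES (?, ?)",
            (widget_id, json.dumps(spec)),
        )
    return widget_id
```

If the widget insert fails after the metric insert, the `with conn:` block rolls both back, so the catalog never gains an orphaned metric.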

`WidgetSpec` JSON Schema (summary)

The discriminator is type. See backend/app/widgets/schemas.py for the authoritative Pydantic models — what follows is a reading guide.

Every variant carries a metric: MetricDefinition block (ADR-007):

"metric": {
  "metric_id": "e6181adf-4ccb-4f59-99b4-68e16ac31b0b",  // null for one-off metrics; promoted on persist
  "name": "cost_avoided_mtd",                           // snake_case catalog identifier
  "label": "Cost avoided (MTD)",                        // user-facing
  "definition": "Total dollars saved by remote-fix outcomes month-to-date.",
  "formula": "SUM(outcomes.cost_avoided_usd) WHERE month = current_month",
  "entity": "outcomes",
  "unit": "currency",
  "default_filter": { "window": "MTD" },
  "default_refresh_seconds": 60,
  "owner": "jane.smith@asurion.com",
  "version": 1
}

`kpi`

{
  "type": "kpi",
  "title": "Avg repair completion time",
  "metric": { "...MetricDefinition..." },
  "value_format": "duration_minutes",            // number | currency | percent | duration_minutes
  "accent": "violet",                             // indigo | emerald | violet | amber | rose | sky
  "show_sparkline": true,
  "show_delta": true,
  "mock_data": {
    "value": 47.2,
    "delta_pct": -0.085,
    "delta_label": "vs prior 7 days",
    "sparkline": [52, 51, 49, 50, 48, 47, 46, 47.2]
  },
  "data_intent": {
    "entity": "outcomes",
    "metric": "avg(repair_minutes)",
    "dimensions": [],
    "filters": { "window": "last_7_days" },
    "refresh_seconds": 60
  }
}

`chart`

{
  "type": "chart",
  "chart_kind": "line",                            // line | bar | area
  "title": "Cost avoided by region (last 7 days)",
  "metric": { "...MetricDefinition..." },
  "x_axis": { "label": "Day", "field": "day" },
  "y_axis": { "label": "Cost avoided ($)", "field": "cost_avoided" },
  "series": [
    { "label": "TX", "field": "tx", "accent": "indigo" },
    { "label": "CA", "field": "ca", "accent": "emerald" }
  ],
  "mock_data": [
    { "day": "Mon", "tx": 14000, "ca": 12000, "cost_avoided": 26000 }
  ],
  "data_intent": { ... }
}

Every field referenced by x_axis / y_axis / series must exist in every row of mock_data — enforced by the synthesizer prompt (backend/app/widgets/prompts/spec_synthesizer.md).
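The document states this rule is enforced by the prompt; a programmatic check of the same invariant could be sketched as below. `missing_chart_fields` is a hypothetical helper, not something known to exist in validators.py.

```python
def missing_chart_fields(spec: dict) -> list[tuple[int, str]]:
    """Return (row_index, field) pairs for every referenced field absent from a mock_data row."""
    fields = [spec["x_axis"]["field"], spec["y_axis"]["field"]]
    fields += [series["field"] for series in spec.get("series", [])]
    return [
        (index, field)
        for index, row in enumerate(spec.get("mock_data", []))
        for field in fields
        if field not in row
    ]
```

An empty list means every axis and series field is present in every row.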

`table`

{
  "type": "table",
  "title": "Top regions by open issues",
  "metric": { "...MetricDefinition..." },
  "columns": [
    { "label": "Region", "field": "region", "kind": "text" },
    { "label": "Open issues", "field": "count", "kind": "number" },
    { "label": "Trend", "field": "trend", "kind": "percent" }
  ],
  "sort_default": "count",
  "mock_data": [
    { "region": "TX", "count": 412, "trend": 0.08 }
  ],
  "data_intent": { ... }
}

kind controls renderer formatting: currency, percent, number, datetime, or badge, with text as the default.

`custom` (ADR-006)

{
  "type": "custom",
  "title": "Alerts & Activity Feed",
  "metric": { "...MetricDefinition..." },
  "component": {
    "display_name": "GeneratedAlertsFeed",
    "props_interface": "interface Props { alerts: Alert[]; nowIso: string }",
    "tsx_source": "interface Props { ... }\nexport function GeneratedAlertsFeed({ alerts, nowIso }: Props) { return ( ... ); }\n",
    "imports_used": [],
    "tailwind_classes_used": ["rounded-lg", "border", "..."],
    "assumptions": ["Severity icons use first letter (! / i)"],
    "severity_color_map": { "critical": "rose", "warning": "amber", "info": "sky" }
  },
  "mock_data": { "alerts": [ ... ], "nowIso": "2026-05-05T22:00:00Z" },
  "data_intent": { ... }
}

component.tsx_source MUST contain ZERO import statements — the renderer evaluates it in a sealed scope where the only injected globals are React and Icon (per ADR-010, the lucide-react wrapper at frontend/src/components/icons.tsx; see frontend/src/widgets/CustomWidgetRenderer.tsx). The synthesizer prompt (backend/app/widgets/prompts/component_synthesizer.md) enforces this; the persistence endpoint (POST /v1/widgets) re-checks via backend/app/widgets/validators.py and rejects with HTTP 422 on failure.

mock_data keys MUST match the inline Props interface — this is what the renderer passes to the generated component.
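Both rules for the custom variant (no import statements, and mock_data keys matching the Props interface) lend themselves to simple static checks. The sketch below is an assumption about how such checks might be written, not the contents of validators.py; the regex-based Props parsing only copes with flat interfaces like the example above.

```python
import re

# An import keyword at the start of a line, followed by a binding,
# a string literal, or a dynamic import call.
_IMPORT_RE = re.compile(r"^\s*import\b\s*[\w{*'\"(]", re.MULTILINE)


def has_import(tsx_source: str) -> bool:
    return bool(_IMPORT_RE.search(tsx_source))


def props_keys(props_interface: str) -> set[str]:
    # Take the text between the outermost braces and collect `name:` / `name?:` keys.
    body = re.search(r"\{(.*)\}", props_interface, re.DOTALL)
    return set(re.findall(r"(\w+)\??\s*:", body.group(1))) if body else set()


def mock_data_matches_props(props_interface: str, mock_data: dict) -> bool:
    return set(mock_data) == props_keys(props_interface)
```

A production check would more likely parse the TSX properly; the regexes are only meant to show which properties are being verified.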

LLM behavior

| Setting | Default | Effect |
| --- | --- | --- |
| `builder_mode` | `live` | ADR-008. The canonical switch. `live` requires Bedrock: `intentExtractor` and `specSynthesizer` MUST reach Bedrock, and failures raise `BuilderModeError` and surface as SSE `kind: builder_unavailable`. `offline` routes both nodes to `MockLlm` and shows an "Offline mode" pill in the dashboard header. |
| `use_bedrock` | `True` | Legacy co-equal opt-out. Either env var alone is enough to flip the resolved mode to `offline`; `resolved_builder_mode()` in `backend/app/settings.py` ORs them. `MockLlm` is reached only when the resolved mode is `offline`. |
| `aws_region` | `us-east-1` | Passed to `boto3.client("bedrock-runtime")`. |
| `bedrock_model_id` | `us.anthropic.claude-sonnet-4-20250514-v1:0` | Inference profile id (matches PRD §10.5). |
| `widget_llm_timeout_s` | `8.0` | Wall-clock budget per Bedrock call. In `live` mode (default), a timeout or transport error raises `BuilderModeError` and surfaces as an SSE `event: error` with `kind: builder_unavailable`. The modal shows an error banner; there is no silent MockLlm substitution (ADR-008). In `offline` mode (`make up-offline`), `MockLlm` is the backend from the start. Raised from 4.0s to 8.0s in ADR-007 once per-variant data-path schemas added a metric block to every prompt. |
| `widget_clarifier_max_rounds` | `2` | After this many synthesizer attempts the graph ends with the last error. |
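The OR-ing of the two switches reduces to a few lines. `resolve_mode` below is a hypothetical free function taking both settings as arguments; the real `resolved_builder_mode()` presumably reads them from the settings object.

```python
def resolve_mode(builder_mode: str = "live", use_bedrock: bool = True) -> str:
    # Either opt-out alone is enough to go offline.
    offline = builder_mode == "offline" or not use_bedrock
    return "offline" if offline else "live"
```

With the defaults the resolved mode is `live`; flipping either argument alone yields `offline`.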

Bedrock is forced to return JSON via tool use:

{
  "tools": [{ "name": "emit_payload", "input_schema": <json_schema> }],
  "tool_choice": { "type": "tool", "name": "emit_payload" }
}

For the data path of specSynthesizer, the input_schema is the JSON Schema for the per-variant Pydantic model — KpiSpec, ChartSpec, or TableSpec — selected by intent.type. We do not pass the top-level WidgetSpec discriminated union because Anthropic's tool-use API rejects schemas that root in oneOf instead of a flat type: "object" (see lessons-learned: Bedrock tool-use rejects top-level oneOf schemas). The discriminated union remains the source of truth for the database and the frontend; we flatten only at the LLM boundary.
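Selecting a flat per-variant schema and wrapping it in the tool-use request might look like the sketch below. The schemas are hand-trimmed stand-ins (the real ones come from the Pydantic models), and `build_tool_request` is a hypothetical helper name.

```python
from typing import Any

# Hypothetical stand-ins for the JSON Schemas of KpiSpec / ChartSpec / TableSpec.
VARIANT_SCHEMAS: dict[str, dict[str, Any]] = {
    "kpi": {"type": "object", "properties": {"title": {"type": "string"}}, "required": ["title"]},
    "chart": {"type": "object", "properties": {"chart_kind": {"type": "string"}}, "required": ["chart_kind"]},
    "table": {"type": "object", "properties": {"columns": {"type": "array"}}, "required": ["columns"]},
}


def build_tool_request(intent_type: str) -> dict[str, Any]:
    schema = VARIANT_SCHEMAS[intent_type]  # KeyError for variants without a data-path schema
    # Guard against accidentally passing a union-rooted schema.
    if schema.get("type") != "object" or "oneOf" in schema:
        raise ValueError("tool input_schema must be a flat object schema")
    return {
        "tools": [{"name": "emit_payload", "input_schema": schema}],
        "tool_choice": {"type": "tool", "name": "emit_payload"},
    }
```

The returned dict mirrors the fragment shown earlier and would be merged into the rest of the Bedrock request body.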

For the custom path, ADR-007 introduced a two-stage synthesis to work around Haiku 4.5 silently dropping deeply-nested fields:

Stage 1 (LLM): _custom_synth asks Bedrock for a flat ComponentSpec only — TSX source, props interface (inferred from intent.custom_examples), imports_used, tailwind_classes_used, assumptions, severity_color_map. The schema is the JSON Schema from pydantic.TypeAdapter(ComponentSpec).json_schema(). The call uses max_tokens=4096 (default 1024 truncates React components).
Stage 2 (Python): the node deterministically wraps the ComponentSpec in a CustomSpec envelope. metric is the resolved block from metricMatcher / user answers; data_intent is derived from MetricDefinition (_derive_data_intent); mock_data is derived from intent.custom_examples (_derive_mock_data).

The custom path still uses a longer Bedrock timeout (20s vs the global 8s default) because TSX bodies are larger than discriminated configs.

Custom-path renderer (ADR-006, ADR-010)

frontend/src/widgets/CustomWidgetRenderer.tsx compiles tsx_source at render time using @babel/standalone (presets: typescript + react) and evaluates the resulting JS via new Function("React", "Icon", code)(React, Icon). The injected scope contains exactly two globals — React and Icon (a kebab-case wrapper around lucide-react defined at frontend/src/components/icons.tsx, ADR-010); defensive import-stripping runs first.

The Icon global accepts <Icon name="alert-triangle" className="..." /> where name is any kebab-case Lucide icon name. The curated catalog (~80 names listed in the synthesizer prompt) is the LLM's strong default; any of the ~1500 Lucide icons resolves at runtime via dynamic PascalCase lookup, with a <HelpCircle/> + console.warn fallback for truly unknown names. The no-imports static check is unchanged.
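The lookup logic is simple enough to sketch, here in Python for consistency with the other examples even though the real wrapper is TypeScript. `kebab_to_pascal`, `resolve_icon`, and the `registry` mapping are illustrative names, with `warnings.warn` standing in for `console.warn`.

```python
import warnings
from typing import Any, Mapping

FALLBACK_ICON = "HelpCircle"


def kebab_to_pascal(name: str) -> str:
    # "alert-triangle" -> "AlertTriangle"
    return "".join(part.capitalize() for part in name.split("-") if part)


def resolve_icon(name: str, registry: Mapping[str, Any]) -> Any:
    pascal = kebab_to_pascal(name)
    if pascal in registry:
        return registry[pascal]
    # Unknown name: warn and fall back instead of failing the render.
    warnings.warn(f"unknown icon name: {name!r}")
    return registry[FALLBACK_ICON]
```

The registry here plays the role of the lucide-react export map.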

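Failures land in one of two error cards: a red "Compile failed" card, labelled with the stage that failed (compile, lookup, or factory), or the React error-boundary "Render failed" card: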

| Failure | UI |
| --- | --- |
| Babel `transform` throws (parse error, banned syntax) | "Compile failed (compile)" red card |
| No `export function\|const ComponentName` line found | "Compile failed (lookup)" red card |
| `new Function(...)` throws | "Compile failed (factory)" red card |
| Component throws while rendering | React error-boundary "Render failed" card |

The persistence endpoint (POST /v1/widgets) runs run_static_checks on spec.component for every type == "custom" payload. Failures return HTTP 422 with a list of failed checks, not the raw payload.

Pipeline visualization

The WidgetBuilderModal right panel includes a Pipeline tab (alongside the existing Preview tab) that renders a ReactFlow DAG of the 8-node LangGraph topology. The graph animates in real time as SSE stage events arrive.

Source: frontend/src/widgets/pipeline/.

Graph state tracking

The useWidgetClarifier hook (frontend/src/widgets/useWidgetClarifier.ts) exposes three additional fields for the graph:

| Field | Type | Purpose |
| --- | --- | --- |
| `activeNode` | `string \| null` | Node currently executing (pulsing blue dot) |
| `completedNodes` | `ReadonlySet<string>` | Nodes that have finished (green dot) |
| `interruptedAt` | `string \| null` | Node where the HITL gate paused execution (amber dot) |

Only the 8 real graph nodes are tracked (GRAPH_NODES set). Synthetic stages (started, answers_merged, __interrupt__) are excluded.

When a stage SSE event arrives for a graph node, the hook marks it completed and advances activeNode to the next node in the linear flow via the NEXT_NODE lookup table. This is a best-guess approximation — conditional edges (e.g. gapDetector skipping to specSynthesizer when no gaps exist) may briefly show the wrong next node, but the result event at stream end provides the definitive state.
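The bookkeeping can be sketched as a small pure function, written in Python to match the other examples although the hook itself is TypeScript. The contents of `NEXT_NODE` are inferred from the flowchart earlier in this document and are an assumption, as are the `GraphProgress` and `on_stage` names; `interruptedAt` handling is left out.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed linear "happy path" successor for each of the 8 graph nodes.
NEXT_NODE: dict[str, Optional[str]] = {
    "contextLoader": "intentExtractor",
    "intentExtractor": "metricMatcher",
    "metricMatcher": "gapDetector",
    "gapDetector": "questionPrioritizer",
    "questionPrioritizer": "specSynthesizer",
    "specSynthesizer": "critic",
    "critic": None,
    "specUpdater": "gapDetector",
}
GRAPH_NODES = frozenset(NEXT_NODE)


@dataclass
class GraphProgress:
    active_node: Optional[str] = None
    completed_nodes: set[str] = field(default_factory=set)


def on_stage(progress: GraphProgress, stage: str) -> GraphProgress:
    # Synthetic stages (started, answers_merged, __interrupt__) are ignored.
    if stage not in GRAPH_NODES:
        return progress
    return GraphProgress(
        active_node=NEXT_NODE[stage],  # best guess; conditional edges may differ
        completed_nodes=progress.completed_nodes | {stage},
    )
```

Feeding the stage names from a stream through `on_stage` in order reproduces the dot states described in the table above.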

Auto-switch behavior

The ViewToggle auto-switches based on clarifier phase:

running with no spec → Pipeline tab activates (unless the user manually toggled).
Spec arrives → Preview tab activates, user override resets.
User clicks a tab → Override is set; auto-switch is suppressed until spec arrives.

E2E tests

Playwright E2E tests live at frontend/e2e/pipeline-graph.spec.ts. Run with cd frontend && npm run test:e2e (requires a running stack via make up-offline).

What's deferred

These are explicitly out of scope for v1 of the Add Widget Clarifier (see plan §"Out of scope"):

Live data binding for chat-built widgets — handled by a follow-up story using the data_intent block as the contract.
Drag-to-reorder, resize, multi-page layouts. The rail is a single grid.
Editing an existing widget — v1 supports create + dismiss only.
RAG / "evolution" mode of the Spine Clarifier. No Qdrant / Voyage in v1.
Multi-user authz — single-user demo per PRD §14.