Add Widget Clarifier: Implementation Notes
Companion to prd.md ADR-005, ADR-006, ADR-007, and §5.1 row 15.
This document is the source of truth for the SSE contract, the LangGraph
topology, and the WidgetSpec schema. The frontend hook
(frontend/src/widgets/useWidgetClarifier.ts)
and the FastAPI router (backend/app/widgets/routes.py)
must stay aligned with what is documented here.
Lineage
The pattern is a deliberate port of the Spine Clarifier from the
sister sdlc-agent-swarms repo:
- packages/agents-clarifier/src/graph/clarifier-graph.ts: node names + topology
- packages/agents-clarifier/src/graph/state.ts: annotation reducers
- packages/agents-clarifier/src/run.ts: interrupt + resume protocol
- packages/dashboard/src/app/api/clarifier/route.ts + respond/route.ts: SSE event taxonomy
- packages/dashboard/src/lib/hooks/use-clarifier-stream.ts: client phases
The differences from the source are intentional and small:
| Source (Spine Clarifier) | Here (Add Widget Clarifier) |
|---|---|
| TypeScript LangGraph | Python LangGraph |
| Zod typed artifacts (PRD, EnrichedRequirement, FeaturePlan) | Pydantic discriminated WidgetSpec (kpi \| chart \| table \| custom per ADR-006) |
| Terminal phase = complete | Terminal phase = preview, then explicit Add to dashboard click persists |
| Optional RAG via Voyage + Qdrant + Cohere | Inline catalog only (no RAG in v1) |
| interruptBefore: ['storyWriter', 'escalationGate'] | interrupt_before=['specSynthesizer'] |
| claude-sonnet-4 via Anthropic SDK | claude-sonnet-4 via Bedrock + boto3 (ADR-002) |
LangGraph topology
flowchart LR
Start([__start__]) --> Ctx[contextLoader]
Ctx --> Intent[intentExtractor]
Intent --> Match[metricMatcher]
Match --> Gap{gapDetector}
Gap -->|universal + variant gaps| Q[questionPrioritizer]
Q -.HITL pause.-> Wait((interrupt_before<br/>specSynthesizer))
Wait -->|update_state + invoke None| Synth[specSynthesizer]
Gap -->|no gaps| Synth
Synth --> Crit{critic}
Crit -->|valid| Done([END])
Crit -->|invalid<br/>and round less than max| Update[specUpdater]
Update --> Gap
Crit -->|invalid<br/>and out of rounds| DoneErr([END with error])
metricMatcher (ADR-007) runs immediately after intentExtractor. It
resolves intent.metric_id_guess against metrics_catalog (exact-name,
then ilike on name/label/definition). On a high-confidence hit
it writes state.catalog_match and state.metric_draft; the gap
detector then treats the metric gap as already closed and skips that
question. The graph can advance straight to specSynthesizer if the
catalog already covers everything — in which case /clarify returns
questions: [] and the frontend resumes via /respond with an empty
answers array.
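
The two-pass lookup described above can be sketched as follows. This is a minimal in-memory illustration, not the code in graph.py: the CatalogRow shape, the match_metric name, and the rule that only a single unambiguous fuzzy hit counts as "high confidence" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogRow:
    # Hypothetical subset of a metrics_catalog row.
    name: str
    label: str
    definition: str


def match_metric(guess: Optional[str], catalog: list[CatalogRow]) -> Optional[CatalogRow]:
    """Exact-name lookup first, then a case-insensitive substring pass."""
    if not guess:
        return None
    # Pass 1: exact match on the snake_case catalog name.
    for row in catalog:
        if row.name == guess:
            return row
    # Pass 2: ilike-style match on name / label / definition.
    # Underscores are treated as spaces so "error_rate" can hit "Error rate".
    needle = guess.replace("_", " ").lower()
    hits = [
        row for row in catalog
        if any(needle in field.replace("_", " ").lower()
               for field in (row.name, row.label, row.definition))
    ]
    # Assumption: only an unambiguous hit is treated as high confidence.
    return hits[0] if len(hits) == 1 else None
```

A None result would leave the metric gap open, so the gap detector would still ask the metric question.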
Source: backend/app/widgets/graph.py.
The graph is compiled once at import time and reused (functools.lru_cache)
because InMemorySaver is process-local and per-request instances would
lose state between SSE roundtrips.
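
To illustrate why a single cached instance matters, here is a small sketch with a stand-in class in place of the real compiled graph. StubCompiledGraph and get_graph are hypothetical names; the only point is that functools.lru_cache hands every caller the same object, so state saved during /clarify is still there when /respond arrives. Note that in this sketch the build happens on the first call rather than strictly at import.

```python
from functools import lru_cache
from typing import Optional


class StubCompiledGraph:
    """Stand-in for a compiled graph that owns a process-local checkpointer."""

    def __init__(self) -> None:
        self.checkpoints: dict[str, dict] = {}

    def save(self, thread_id: str, state: dict) -> None:
        self.checkpoints[thread_id] = state

    def load(self, thread_id: str) -> Optional[dict]:
        return self.checkpoints.get(thread_id)


@lru_cache(maxsize=1)
def get_graph() -> StubCompiledGraph:
    # Built once per process; later calls return the cached instance.
    return StubCompiledGraph()
```

Building a fresh StubCompiledGraph per request would give each request an empty checkpoints dict, which is the failure mode the paragraph above describes.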
State schema
Only human_responses uses an appending reducer; everything else is
last-write-wins, matching the Spine Clarifier's annotation defaults.
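
As a rough illustration of the two merge behaviours, the sketch below uses the Annotated[list, operator.add] idiom for the appending key and plain annotations for last-write-wins keys. The ClarifierState class and the apply_update helper are assumptions written for this example; the real state class lives in the backend and the merge itself is done by LangGraph.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class ClarifierState(TypedDict, total=False):
    # Appending reducer: each update is concatenated onto the existing list.
    human_responses: Annotated[list, operator.add]
    # No reducer annotation: the newest write replaces the old value.
    catalog_match: dict
    metric_draft: dict


def apply_update(state: dict, update: dict) -> dict:
    """Merge one node's output into state, honouring per-key reducers."""
    hints = get_type_hints(ClarifierState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] hints expose their extras via __metadata__.
        reducer = getattr(hints.get(key), "__metadata__", (None,))[0]
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return merged
```

Two consecutive updates that both write human_responses end up with both entries, while a second write to metric_draft simply overwrites the first.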
WidgetIntent carries a mode: "data" | "custom" flag (ADR-006). The
synthesizer node body branches on this; the topology is unchanged.
WidgetIntent also carries metric_id_guess (snake_case catalog name
or null), time_window (e.g. last_7_days, MTD), and — for the
custom path — custom_examples: list[dict] (1-3 example rows) instead
of the deprecated data_shape TypeScript-interface field (ADR-007).
gapDetector checks universal-core gaps (metric, time_window)
first. The metric gap is auto-satisfied when metricMatcher produced
a high-confidence match. Variant-specific gaps come second:
| Variant | Variant-specific gaps |
|---|---|
| kpi | value_format, accent |
| chart | chart_kind, dimensions |
| table | columns |
| custom | custom_examples (if missing), layout, accent |
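
The ordering rule (universal gaps first, then variant gaps) can be sketched as a small pure function. The gap names come from the table above; the detect_gaps signature and the flat "known" dict are assumptions for illustration only.

```python
UNIVERSAL_GAPS = ("metric", "time_window")
VARIANT_GAPS = {
    "kpi": ("value_format", "accent"),
    "chart": ("chart_kind", "dimensions"),
    "table": ("columns",),
    "custom": ("custom_examples", "layout", "accent"),
}


def detect_gaps(variant: str, known: dict, catalog_match: bool = False) -> list[str]:
    """Return missing fields, universal-core gaps before variant-specific ones."""
    gaps = []
    for field in UNIVERSAL_GAPS:
        # A high-confidence catalog match closes the metric gap automatically.
        if field == "metric" and catalog_match:
            continue
        if not known.get(field):
            gaps.append(field)
    for field in VARIANT_GAPS[variant]:
        if not known.get(field):
            gaps.append(field)
    return gaps
```

An empty result corresponds to the "no gaps" edge in the topology, where the graph proceeds straight to specSynthesizer.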
questionPrioritizer builds the metric question from the live catalog
(single-select with hint populated from each row's definition) and
appends a "Define a new metric" option. Selecting it triggers
plain-English sub-questions for name/label/definition/formula/
entity/unit/default_filter — the user is never asked for a
type signature.
SSE contract
All SSE bodies are JSON. Frames follow the standard event: <name>\ndata: <json>\n\n shape.
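
For reference, the frame shape above can be produced and consumed with a few lines of Python. This is a sketch under the assumption that every frame carries exactly one event line and one single-line JSON data line, as described; format_sse and parse_sse are illustrative names, not functions from the router.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one frame as `event: <name>\\ndata: <json>\\n\\n`."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def parse_sse(stream: str) -> list[tuple[str, dict]]:
    """Split a buffered stream into (event name, decoded JSON) pairs."""
    frames = []
    for block in stream.split("\n\n"):
        event, payload = None, None
        for line in block.splitlines():
            if line.startswith("event: "):
                event = line[len("event: "):]
            elif line.startswith("data: "):
                payload = json.loads(line[len("data: "):])
        # Skip blank trailing blocks and anything incomplete.
        if event is not None and payload is not None:
            frames.append((event, payload))
    return frames
```

A round trip such as parse_sse(format_sse("stage", {"stage": "contextLoader", "thread_id": "t-1"})) yields one ("stage", {...}) pair.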
POST /v1/widgets/clarify
Start a new session.
Request body
Stream
POST /v1/widgets/clarify/respond
Resume an interrupted run.
Request body
Or to abort:
Stream — same shape as /clarify. The terminal result event will have
interrupted: false and either:
- spec populated (Pydantic-validated WidgetSpec), or
- error populated (e.g. validation failure) plus interrupted: false, or
- abandoned: true (if the user cancelled).
Event taxonomy
| event name | when | data shape |
|---|---|---|
| stage | After each LangGraph node executes (and on __interrupt__). | { stage: string, thread_id: string } |
| result | At the end of every /clarify and /respond request. | Full snapshot (see above). |
| error | On any unhandled exception inside the runner. | { message: string, thread_id?: string } |
Persistence endpoints
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/widgets | Persist a WidgetSpec after the user clicks Add to dashboard. Atomically promotes one-off metrics into metrics_catalog (ADR-007). |
| GET | /v1/widgets | List persisted widgets where placement = 'rail' (used by MyWidgetsRail). |
| PATCH | /v1/widgets/{widget_id} | Update placement to dismissed (the X button on each rail card). |
| GET | /v1/metrics | List the metric catalog (used by the Clarifier prompt and the frontend). |
| GET | /v1/metrics/{metric_id} | Read a single catalog entry. |
| POST | /v1/metrics | Create a new catalog entry directly (used by the "Define a new metric" sub-flow when the user wants to register without a widget). |
Schema: backend/app/widgets/schemas.py StoredWidget,
backend/app/metrics/schemas.py MetricDefinition / CatalogMetric.
The widgets and metrics_catalog tables are created by
db/init.sql; defensive CREATE TABLE IF NOT EXISTS
calls run on app startup so dev DBs that booted before the DDL was
added pick up the tables without manual migration. On first boot the
catalog is seeded with 10 dashboard-derived metrics
(backend/app/metrics/seed.py); the
seed is idempotent on name.
Atomic metric promotion at persist (ADR-007)
POST /v1/widgets calls _validate_and_promote_metric before
persisting. The block is rejected if spec.metric is missing or
internally inconsistent (e.g. metric_id references a catalog row that
doesn't exist). When spec.metric.metric_id is null (a one-off metric
authored via "Define a new metric"), the metric is inserted into
metrics_catalog and the spec is rewritten with the resulting
metric_id in the same transaction as the widget insert. There is
no path by which a persisted widget references a metric that does not
exist in the catalog.
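
The single-transaction guarantee can be illustrated with the standard-library sqlite3 module standing in for the real database. Everything here is an assumption made for the sketch: the column names, the make_demo_db and persist_widget helpers, and the use of SQLite itself. Only the behaviour (validate, promote if metric_id is null, insert the widget, all in one transaction) follows the description above.

```python
import json
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in with hypothetical columns for both tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metrics_catalog "
        "(id INTEGER PRIMARY KEY, name TEXT UNIQUE, definition TEXT)"
    )
    conn.execute(
        "CREATE TABLE widgets (id INTEGER PRIMARY KEY, "
        "metric_id INTEGER NOT NULL REFERENCES metrics_catalog(id), spec TEXT)"
    )
    return conn


def persist_widget(conn: sqlite3.Connection, spec: dict) -> int:
    """Insert the widget, promoting a one-off metric in the same transaction."""
    metric = spec.get("metric")
    if not metric:
        raise ValueError("spec.metric is missing")
    # `with conn` commits on success and rolls back if anything raises.
    with conn:
        metric_id = metric.get("metric_id")
        if metric_id is None:
            cur = conn.execute(
                "INSERT INTO metrics_catalog (name, definition) VALUES (?, ?)",
                (metric["name"], metric["definition"]),
            )
            metric_id = cur.lastrowid
            # Rewrite the spec so it points at the newly promoted metric.
            spec = {**spec, "metric": {**metric, "metric_id": metric_id}}
        elif conn.execute(
            "SELECT 1 FROM metrics_catalog WHERE id = ?", (metric_id,)
        ).fetchone() is None:
            raise ValueError(f"metric_id {metric_id} is not in the catalog")
        cur = conn.execute(
            "INSERT INTO widgets (metric_id, spec) VALUES (?, ?)",
            (metric_id, json.dumps(spec)),
        )
        return cur.lastrowid
```

Because both inserts share one transaction, a failure in the widget insert would also undo the metric promotion, which is what rules out orphaned or dangling references.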
WidgetSpec JSON Schema (summary)
The discriminator is type. See backend/app/widgets/schemas.py
for the authoritative Pydantic models — what follows is a reading guide.
Every variant carries a metric: MetricDefinition block (ADR-007):
kpi
chart
Every field referenced by x_axis / y_axis / series must exist in
every row of mock_data — enforced by the synthesizer prompt
(backend/app/widgets/prompts/spec_synthesizer.md).
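
Although the rule is enforced through the prompt, the same invariant is easy to check in code. The sketch below assumes x_axis, y_axis, and series each hold either one field name or a list of field names; that shape is a guess, since the chart schema is not reproduced here, and missing_chart_fields is a hypothetical helper.

```python
def missing_chart_fields(spec: dict) -> dict[int, list[str]]:
    """Map row index -> referenced fields absent from that mock_data row."""
    referenced: list[str] = []
    for key in ("x_axis", "y_axis", "series"):
        value = spec.get(key)
        if value is None:
            continue
        # Accept either a single field name or a list of them (assumption).
        referenced.extend([value] if isinstance(value, str) else value)
    problems: dict[int, list[str]] = {}
    for index, row in enumerate(spec.get("mock_data", [])):
        missing = [name for name in referenced if name not in row]
        if missing:
            problems[index] = missing
    return problems
```

An empty dict means every referenced field is present in every row.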
table
kind controls renderer formatting (currency, percent, number,
datetime, badge, default text).
custom (ADR-006)
component.tsx_source MUST contain ZERO import statements — the
renderer evaluates it in a sealed scope where the only injected globals
are React and Icon (per ADR-010, the lucide-react wrapper at
frontend/src/components/icons.tsx;
see frontend/src/widgets/CustomWidgetRenderer.tsx).
The synthesizer prompt
(backend/app/widgets/prompts/component_synthesizer.md)
enforces this; the persistence endpoint (POST /v1/widgets) re-checks
via backend/app/widgets/validators.py
and rejects with HTTP 422 on failure.
mock_data keys MUST match the inline Props interface — this is what
the renderer passes to the generated component.
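
A no-imports check of the kind described above might look like the following. This is a sketch, not the contents of validators.py: find_imports and both regular expressions are assumptions, and the dynamic-import pattern is deliberately conservative, so it can also flag the text "import(" inside a string or comment.

```python
import re

# Static forms: `import x from "y"`, `import { x } from "y"`, `import "y"`.
STATIC_IMPORT = re.compile(r"^\s*import\b\s*[\w{*\"']")
# Dynamic form: `import("y")` anywhere on the line.
DYNAMIC_IMPORT = re.compile(r"\bimport\s*\(")


def find_imports(tsx_source: str) -> list[str]:
    """Return every line of tsx_source that looks like an import."""
    offending = []
    for line in tsx_source.splitlines():
        if STATIC_IMPORT.match(line) or DYNAMIC_IMPORT.search(line):
            offending.append(line.strip())
    return offending
```

An endpoint could treat a non-empty result as one failed check and respond with HTTP 422, in line with the behaviour described above.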
LLM behavior
| Setting | Default | Effect |
|---|---|---|
| builder_mode | live | ADR-008. The canonical switch: live requires Bedrock (intentExtractor and specSynthesizer MUST reach Bedrock; failures raise BuilderModeError and surface as SSE kind: builder_unavailable). offline routes both nodes to MockLlm and shows an "Offline mode" pill in the dashboard header. |
| use_bedrock | True | Legacy co-equal opt-out. Either env var alone is enough to flip resolved mode to offline; resolved_builder_mode() in backend/app/settings.py ORs them. MockLlm is reached only when resolved mode is offline. |
| aws_region | us-east-1 | Passed to boto3.client("bedrock-runtime"). |
| bedrock_model_id | us.anthropic.claude-sonnet-4-20250514-v1:0 | Inference profile id (matches PRD §10.5). |
| widget_llm_timeout_s | 8.0 | Wall-clock budget per Bedrock call. In live mode (default), a timeout or transport error raises BuilderModeError and surfaces as an SSE event: error with kind: builder_unavailable. The modal shows an error banner; there is no silent MockLlm substitution (ADR-008). In offline mode (make up-offline), MockLlm is the backend from the start. Raised from 4.0s → 8.0s in ADR-007 once per-variant data-path schemas added a metric block to every prompt. |
| widget_clarifier_max_rounds | 2 | After this many synthesizer attempts the graph ends with the last error. |
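
The OR semantics between builder_mode and use_bedrock can be sketched as below. The Settings dataclass and the function signature are assumptions (the real resolved_builder_mode() lives in backend/app/settings.py and may read its inputs differently); the defaults mirror the table.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Defaults copied from the table above.
    builder_mode: str = "live"        # "live" or "offline"
    use_bedrock: bool = True          # legacy opt-out
    aws_region: str = "us-east-1"
    widget_llm_timeout_s: float = 8.0
    widget_clarifier_max_rounds: int = 2


def resolved_builder_mode(settings: Settings) -> str:
    """Either switch alone is enough to resolve to offline."""
    offline = settings.builder_mode == "offline" or not settings.use_bedrock
    return "offline" if offline else "live"
```

With both settings at their defaults the resolved mode is live; flipping either one yields offline.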
Bedrock is forced into JSON via tool-use:
For the data path of specSynthesizer, the input_schema is the JSON
Schema for the per-variant Pydantic model — KpiSpec, ChartSpec,
or TableSpec — selected by intent.type. We do not pass the
top-level WidgetSpec discriminated union because Anthropic's tool-use
API rejects schemas that root in oneOf instead of a flat type:
"object" (see lessons-learned: Bedrock tool-use rejects top-level
oneOf schemas). The discriminated union remains the source of truth
for the database and the frontend; we flatten only at the LLM boundary.
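
The flatten-at-the-boundary idea can be sketched as a lookup from intent.type to a per-variant schema, plus a guard that refuses anything rooted in oneOf. The schemas below are tiny placeholders, not the real KpiSpec / ChartSpec / TableSpec output, and the tool name emit_widget_spec is invented for the example. The dict layout follows the toolConfig shape of the Bedrock Converse API, which is an assumption about how the call is made here.

```python
# Placeholder per-variant schemas; the real ones come from the Pydantic models.
VARIANT_SCHEMAS = {
    "kpi": {"type": "object", "properties": {"value_format": {"type": "string"}}},
    "chart": {"type": "object", "properties": {"chart_kind": {"type": "string"}}},
    "table": {"type": "object", "properties": {"columns": {"type": "array"}}},
}


def build_tool_config(schema: dict, tool_name: str = "emit_widget_spec") -> dict:
    """Wrap a flat object schema as a forced tool-use configuration."""
    if schema.get("type") != "object" or "oneOf" in schema:
        raise ValueError("tool input schema must root in a flat type: object")
    return {
        "tools": [{
            "toolSpec": {
                "name": tool_name,
                "description": "Return the widget spec as structured JSON.",
                "inputSchema": {"json": schema},
            }
        }],
        # Forcing the named tool is what pins the model to JSON output.
        "toolChoice": {"tool": {"name": tool_name}},
    }


def tool_config_for(intent_type: str) -> dict:
    return build_tool_config(VARIANT_SCHEMAS[intent_type])
```

Passing a union schema such as {"oneOf": [...]} raises immediately, which surfaces the problem before any request leaves the process.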
For the custom path, ADR-007 introduced a two-stage synthesis to work around Haiku 4.5 silently dropping deeply-nested fields:
- Stage 1 (LLM): _custom_synth asks Bedrock for a flat ComponentSpec only: TSX source, props interface (inferred from intent.custom_examples), imports_used, tailwind_classes_used, assumptions, severity_color_map. The schema is the JSON Schema from pydantic.TypeAdapter(ComponentSpec).json_schema(). The call uses max_tokens=4096 (default 1024 truncates React components).
- Stage 2 (Python): the node deterministically wraps the ComponentSpec in a CustomSpec envelope. metric is the resolved block from metricMatcher / user answers; data_intent is derived from MetricDefinition (_derive_data_intent); mock_data is derived from intent.custom_examples (_derive_mock_data).
The custom path still uses a longer Bedrock timeout (20s vs the global 8s default) because TSX bodies are larger than discriminated configs.
Custom-path renderer (ADR-006, ADR-010)
frontend/src/widgets/CustomWidgetRenderer.tsx
compiles tsx_source at render time using @babel/standalone (presets:
typescript + react) and evaluates the resulting JS via
new Function("React", "Icon", code)(React, Icon). The injected scope
contains exactly two globals — React and Icon (a kebab-case wrapper
around lucide-react defined at
frontend/src/components/icons.tsx,
ADR-010); defensive import-stripping runs first.
The Icon global accepts <Icon name="alert-triangle" className="..." />
where name is any kebab-case Lucide icon name. The curated catalog
(~80 names listed in the synthesizer prompt) is the LLM's strong default;
any of the ~1500 Lucide icons resolves at runtime via dynamic
PascalCase lookup, with a <HelpCircle/> + console.warn fallback for
truly unknown names. The no-imports static check is unchanged.
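
The name resolution itself happens in the TypeScript wrapper; the Python sketch below only models the kebab-case to PascalCase lookup and the fallback. The registry argument stands in for the set of exported Lucide component names, and both function names are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def kebab_to_pascal(name: str) -> str:
    """alert-triangle -> AlertTriangle."""
    return "".join(part.capitalize() for part in name.split("-") if part)


def resolve_icon(name: str, registry: set[str], fallback: str = "HelpCircle") -> str:
    """Return the component name to render, falling back for unknown icons."""
    key = kebab_to_pascal(name)
    if key in registry:
        return key
    # Mirrors the console.warn in the renderer.
    logger.warning("Unknown icon %r, using %s", name, fallback)
    return fallback
```

For example, "alert-triangle" resolves to AlertTriangle when that name is in the registry, and an unrecognised name resolves to HelpCircle.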
Failures land in one of two error cards:
| Failure | UI |
|---|---|
| Babel transform throws (parse error, banned syntax) | "Compile failed (compile)" red card |
| No export function\|const ComponentName line found | "Compile failed (lookup)" red card |
| new Function(...) throws | "Compile failed (factory)" red card |
| Component throws while rendering | React error-boundary "Render failed" card |
The persistence endpoint (POST /v1/widgets) runs run_static_checks
on spec.component for every type == "custom" payload. Failures
return HTTP 422 with a list of failed checks, not the raw payload.
Pipeline visualization
The WidgetBuilderModal right panel includes a Pipeline tab
(alongside the existing Preview tab) that renders a ReactFlow DAG
of the 8-node LangGraph topology. The graph animates in real time as
SSE stage events arrive.
Source: frontend/src/widgets/pipeline/.
Graph state tracking
The useWidgetClarifier hook
(frontend/src/widgets/useWidgetClarifier.ts)
exposes three additional fields for the graph:
| Field | Type | Purpose |
|---|---|---|
| activeNode | string \| null | Node currently executing (pulsing blue dot) |
| completedNodes | ReadonlySet<string> | Nodes that have finished (green dot) |
| interruptedAt | string \| null | Node where the HITL gate paused execution (amber dot) |
Only the 8 real graph nodes are tracked (GRAPH_NODES set). Synthetic
stages (started, answers_merged, __interrupt__) are excluded.
When a stage SSE event arrives for a graph node, the hook marks it
completed and advances activeNode to the next node in the linear
flow via the NEXT_NODE lookup table. This is a best-guess
approximation — conditional edges (e.g. gapDetector skipping to
specSynthesizer when no gaps exist) may briefly show the wrong next
node, but the result event at stream end provides the definitive
state.
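
The stage-handling rule can be modelled in a few lines. The hook itself is TypeScript; this Python sketch only mirrors the logic, and the exact contents of NEXT_NODE are an assumption (in particular, mapping specUpdater back to gapDetector and leaving critic without a successor).

```python
from dataclasses import dataclass
from typing import Optional

GRAPH_NODES = (
    "contextLoader", "intentExtractor", "metricMatcher", "gapDetector",
    "questionPrioritizer", "specSynthesizer", "critic", "specUpdater",
)

# Best-guess successor per node; conditional edges are not modelled.
NEXT_NODE = {
    "contextLoader": "intentExtractor",
    "intentExtractor": "metricMatcher",
    "metricMatcher": "gapDetector",
    "gapDetector": "questionPrioritizer",
    "questionPrioritizer": "specSynthesizer",
    "specSynthesizer": "critic",
    "specUpdater": "gapDetector",
}


@dataclass(frozen=True)
class GraphProgress:
    active_node: Optional[str] = None
    completed_nodes: frozenset = frozenset()


def on_stage(progress: GraphProgress, stage: str) -> GraphProgress:
    """Mark a real node completed and advance the active-node guess."""
    if stage not in GRAPH_NODES:
        # Synthetic stages (started, answers_merged, __interrupt__) are ignored.
        return progress
    return GraphProgress(
        active_node=NEXT_NODE.get(stage),
        completed_nodes=progress.completed_nodes | {stage},
    )
```

Feeding the stages started, contextLoader, intentExtractor in order leaves metricMatcher as the active node with two completed nodes.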
Auto-switch behavior
The ViewToggle auto-switches based on clarifier phase:
- running with no spec → Pipeline tab activates (unless the user manually toggled).
- Spec arrives → Preview tab activates, user override resets.
- User clicks a tab → Override is set; auto-switch is suppressed until spec arrives.
E2E tests
Playwright E2E tests live at
frontend/e2e/pipeline-graph.spec.ts.
Run with cd frontend && npm run test:e2e (requires a running stack
via make up-offline).
What's deferred
These are explicitly out of scope for v1 of the Add Widget Clarifier (see plan §"Out of scope"):
- Live data binding for chat-built widgets, handled by a follow-up story using the data_intent block as the contract.
- Drag-to-reorder, resize, multi-page layouts. The rail is a single grid.
- Editing an existing widget — v1 supports create + dismiss only.
- RAG / "evolution" mode of the Spine Clarifier. No Qdrant / Voyage in v1.
- Multi-user authz — single-user demo per PRD §14.