
What's Mocked in the Prototype

Q&A defense for stakeholder review. Per PRD v2.1 §C.7. Pair with architecture.md (the system view) and PRD v2.1 §B (the production target). The principle is non-negotiable: never claim something is real when it isn't. Every demo question gets an honest answer that points at the Part B section where the real version is specified.

Quick reference

The 1-day prototype slice (PRD v2.1 §C, docs/plans/active/part-c-databricks-prototype.md) deliberately mocks or defers everything not on the critical path of the data-binding story. The honest accounting:

| Element on screen | Real or Mocked | Honest answer if asked |
| --- | --- | --- |
| Existing v1 KPI tiles (Active Issues, Claims, etc.) | Synthetic (Postgres seed) | "v1 demo data, unchanged from hackathon. Routes via backend: postgres per config/metric_routing.yaml." |
| Cost Avoided (MTD) | Postgres by default; flips to Databricks during §C.6.1 | "Default fast path is the v1 Postgres synthetic figure (~$1.41M). The §C.6.1 demo moment is one YAML edit (backend: postgres → databricks) + make up; the same KPI tile re-renders against l3_asurion.ev_claim (~$375k trailing-30d), and the SourceBadge swaps green→purple. Both paths return the same single-row [{value: N}] shape — no renderer change. See demo-queries.md § Metric 4." |
| Claim Volume (Asurion) | REAL (Databricks SQL) | "Live query against Asurion's Databricks SQL Warehouse, hitting workspace.l3_asurion.ev_claim seeded from the canonical data dictionary." |
| Claims by Product (Asurion) | REAL (Databricks SQL) | "Same. Joined to ev_product_catalog per the dictionary's approved join_map.csv." |
| Claim Status Mix (Asurion) | REAL (Databricks SQL) | "Same." |
| AI recommendation rationale | Bedrock LLM (real call) | "Real Bedrock; a deterministic scorer drives the decision per ADR-002." |
| Custom widget code generation | Bedrock LLM (real call) | "Real Bedrock; sandboxed by @babel/standalone for TSX (ADR-006). Iframe sandbox is Phase 2 — see the ADR-006 evolution path." |
| Kafka in the architecture diagram | DIAGRAM ONLY | "Production transport. Phase 2 work, designed in PRD §B.4. The prototype proves the more uncertain piece (data binding); Kafka is straightforward production wiring per ADR-PROTO-001." |
| Bronze / Silver / Gold pipeline | DIAGRAM ONLY | "Designed in PRD §B.5. The data engineer's existing tables in workspace.asurion_prototype are our 'Bronze' for the v1 mirror; workspace.l3_asurion is the dictionary-shaped slice." |
| Mosaic AI Model Serving | Not present | "Trained models are PRD §B.6. Today's decisions are rules + Bedrock." |
| Multi-tenant security | Not present | "Single-user demo. iframe sandbox is PRD §B.7.2." |
| Vector DB for dictionary similarity | Not present | "ADR-PROTO-004 — the entire dictionary fits in Claude's 200K context window. Vector DB becomes a scale concern post-prototype." |
| API Gateway + Lambda for SQL gen | Not present | "ADR-PROTO-003 — the SQL generator runs in FastAPI. ~50 lines of integration vs ~200+ for Lambda packaging." |
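The §C.6.1 routing flip in the table above is a single config lookup. A minimal sketch, assuming a plausible shape for config/metric_routing.yaml — the metrics/backend key names and metric IDs here are assumptions, not the file's actual schema:

```python
# Minimal sketch of the metric-routing flip. The key names (metrics,
# backend) and metric IDs are an ASSUMED shape for
# config/metric_routing.yaml, not its actual schema.

# What the YAML might parse to before the §C.6.1 flip:
ROUTING = {
    "metrics": {
        "cost_avoided_mtd": {"backend": "postgres"},       # flip to "databricks" live
        "claim_volume_asurion": {"backend": "databricks"},
    }
}

def resolve_backend(metric_id: str, config: dict = ROUTING) -> str:
    # One lookup decides which store serves the tile; the renderer
    # never changes because both backends return the same
    # single-row [{"value": N}] shape.
    return config["metrics"][metric_id]["backend"]
```

After the one-line YAML edit plus make up, the same lookup returns "databricks" and the SourceBadge swap follows from that alone.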

Detail by element

Kafka (DIAGRAM ONLY)

Status. Annotated in the architecture diagram with "Phase 2 — production transport". Not built.

Why deferred. Sub-second transport is invisible in a 90-second demo. Hours saved go to Databricks integration, where they produce visible demo value. Per ADR-PROTO-001.

What's actually running today. Direct event ingest into Postgres via POST /v1/events. The dashboard receives updates over pubsub:dashboard (Redis), which the WebSocket at /v1/dashboard/stream mirrors out.
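The ingest path above can be sketched without any of the real infrastructure. The route names (POST /v1/events, /v1/dashboard/stream) and the pubsub:dashboard channel come from this doc; the real code uses FastAPI, Postgres, and Redis, so the in-memory queue below is purely illustrative:

```python
# Dependency-free sketch of today's ingest path; an asyncio.Queue
# stands in for Redis pubsub:dashboard, and the Postgres insert is elided.
import asyncio
import json

SUBSCRIBERS: list[asyncio.Queue] = []  # stand-in for pubsub:dashboard

async def post_v1_events(event: dict) -> None:
    # POST /v1/events: persist the event (Postgres insert elided),
    # then fan it out on the pubsub:dashboard channel.
    for q in SUBSCRIBERS:
        await q.put(json.dumps(event))

async def dashboard_stream() -> str:
    # /v1/dashboard/stream: each WebSocket client subscribes and
    # mirrors pubsub messages out; one message shown here.
    q: asyncio.Queue = asyncio.Queue()
    SUBSCRIBERS.append(q)
    return await q.get()

async def main() -> dict:
    SUBSCRIBERS.clear()                 # fresh run
    stream = asyncio.create_task(dashboard_stream())
    await asyncio.sleep(0)              # let the subscriber register
    await post_v1_events({"metric": "claim_volume", "value": 1})
    return json.loads(await stream)

print(asyncio.run(main()))
```

Swapping the in-memory queue for a Kafka producer later is exactly the "straightforward production wiring" ADR-PROTO-001 describes: only post_v1_events changes.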

Where the production version is specified. PRD v2.1 §B.4. Schema, partition strategy, retention — fully designed.

Bronze / Silver / Gold lakehouse (DIAGRAM ONLY)

Status. Designed in PRD §B.5. Not built as Bronze/Silver/Gold; the prototype uses the data engineer's existing tables directly.

What's actually running today.

  • workspace.asurion_prototype.* — v1 mirror tables seeded by make seed-databricks. Health-checked at /v1/databricks/health.
  • workspace.l3_asurion.* — dictionary-shaped tables seeded by make seed-databricks-l3 (Prompt 2). Backs the 3 Databricks-routed demo metrics.

Both schemas coexist in the same Free Trial workspace.

Where the production version is specified. PRD v2.1 §B.5. Per ADR-PROTO-001's note: "the data engineer's existing tables are our Bronze for the prototype."

Mosaic AI Model Serving (NOT PRESENT)

Status. Not present in the prototype. Bedrock is the only LLM surface.

What's actually running today. Direct Anthropic Claude calls via Bedrock from backend/app/widgets/llm.py (Clarifier) and (Phase 2 of Part C) backend/app/sql_gen/generator.py (SQL gen — not yet in code).

Where the production version is specified. PRD v2.1 §B.6. Trained models, gateway, rate limiting, guardrails.

Multi-tenant security / iframe sandbox (NOT PRESENT)

Status. Single-user demo. Custom widgets compile via @babel/standalone in-process per ADR-006.

Why deferred. Security hardening doesn't change the demo arc. The TSX-evaluation contract is stable; the sandbox swap-out is mechanical.

Where the production version is specified. PRD v2.1 §B.7.2 (iframe sandbox); §B.6 (multi-tenant). Per ADR-006 evolution path.

Vector DB for dictionary RAG (NOT PRESENT)

Status. Not present. The full dictionary loads into memory at boot via backend/app/sql_gen/dictionary_loader.py; per-request, the SQL generator (Prompt 3) calls DataDictionary.subset_for_tables(...) to scope to the relevant tables and splices the slice into the Bedrock prompt.
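The subsetting step above is simple enough to sketch. DataDictionary and subset_for_tables are named in this doc; the internal field layout (tables mapped to column lists) is an assumption for illustration:

```python
# Hedged sketch of the in-memory dictionary subsetting path.
# DataDictionary / subset_for_tables come from the doc; the field
# layout below is an ASSUMPTION, not the real class.
from dataclasses import dataclass

@dataclass
class DataDictionary:
    tables: dict[str, list[str]]  # table name -> column names

    def subset_for_tables(self, wanted: set[str]) -> "DataDictionary":
        # Scope the dictionary to just the tables a request touches,
        # so only that slice is spliced into the Bedrock prompt.
        return DataDictionary(
            tables={t: cols for t, cols in self.tables.items() if t in wanted}
        )

full = DataDictionary(tables={
    "ev_claim": ["claim_id", "cost_avoided_usd"],
    "ev_product_catalog": ["product_id", "product_name"],
    "ev_policy": ["policy_id"],
})
subset = full.subset_for_tables({"ev_claim", "ev_product_catalog"})
```

No retrieval index is involved: the whole structure sits in memory from boot, and scoping is a dictionary comprehension, which is why ADR-PROTO-004 could defer the vector DB entirely.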

Why deferred. Per ADR-PROTO-004: the full Asurion data dictionary (~52 tables, 1663 columns, 138 joins, 42 KPIs) fits comfortably in Claude's 200K context window. Vector DB setup + ingestion + retrieval tuning is ~1 day by itself.

When this matters. Vector DB becomes a scale concern post-prototype, when the dictionary outgrows context. Has its own future ADR.

API Gateway + Lambda for SQL gen (NOT PRESENT)

Status. SQL generation runs as a module inside backend/app/sql_gen/ (the package this session shipped). No new AWS infrastructure.

Why. Per ADR-PROTO-003: the existing FastAPI process already has a Bedrock client (ADR-005) and runs in the same container as the widget data resolver. ~50 lines of integration vs ~200+ for Lambda packaging. Cold-start latency eliminated.

When this matters. If a future use case needs the SQL generator from a different runtime (e.g. a Slack bot), it can be lifted to its own service later. Not a 1-day concern.

Widget governance workflow (NOT PRESENT)

Status. All widgets stay at private visibility. The metrics_catalog carries governance_status and approved_by columns reserved for the §B.5.4 governance flow, but no review pipeline exists.

Where the production version is specified. PRD v2.1 §B.5.4. The lineage columns are forward-compatible.

SQL generation safety (REAL, but anchored)

Worth calling out separately because it's the piece most likely to draw skepticism.

The fear. "LLM-generated SQL" sounds like every freeform NL-to-SQL system that hallucinates joins.

What we actually do. Per ADR-PROTO-002: SQL generation is anchored to a metrics_catalog entry. The LLM's job is bounded — take the catalog entry's source_query template, fill in dimensions and filters from the data_intent, return validated SQL. Free-text generation is rejected at the resolver layer (no metric_id → no Databricks query). The dictionary's join_map.csv is the allowlist; the safety layer (Prompt 3) rejects joins outside it.
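The allowlist check above can be sketched in a few lines. The join_map.csv column names and the regex-based join extraction are assumptions for illustration; the real safety layer is Prompt 3 work and would parse SQL properly:

```python
# Hedged sketch of the join-allowlist check. The csv row shape and the
# naive regex extraction are ASSUMPTIONS; a real validator parses SQL.
import csv
import io
import re

JOIN_MAP_CSV = """left_table,right_table
ev_claim,ev_product_catalog
"""

def allowed_joins(join_map_csv: str) -> set[frozenset]:
    rows = csv.DictReader(io.StringIO(join_map_csv))
    return {frozenset((r["left_table"], r["right_table"])) for r in rows}

def joins_in_sql(sql: str) -> set[frozenset]:
    # Naive for illustration: pair the FROM table with each JOINed table.
    tables = re.findall(r"(?:from|join)\s+(\w+)", sql, flags=re.I)
    return {frozenset((tables[0], t)) for t in tables[1:]}

def validate(sql: str, allowlist: set[frozenset]) -> bool:
    # Any join pair outside the dictionary's allowlist rejects the SQL.
    return joins_in_sql(sql).issubset(allowlist)

allow = allowed_joins(JOIN_MAP_CSV)
ok = validate("SELECT * FROM ev_claim JOIN ev_product_catalog ON 1=1", allow)
bad = validate("SELECT * FROM ev_claim JOIN ev_policy ON 1=1", allow)
```

The structural point stands regardless of implementation detail: the allowlist comes from the dictionary, not from the LLM, so a hallucinated join fails validation before it ever reaches Databricks.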

What this buys. SQL quality is dramatically higher than freeform NL-to-SQL. Safety is structural, not bolted on.

The trade-off. Adding a new Databricks metric requires a metrics_catalog row first — by design, not by accident. The "ad hoc Databricks query" use case is not supported in the prototype.

Demo discipline

The risk register in PRD v2.1 §C.9 calls out the specific Q&A scenarios. The mitigation in every case is: have this doc open in a tab and quote it.

| Risk | Mitigation |
| --- | --- |
| "Why isn't Kafka working?" | Quote ADR-PROTO-001 + PRD §B.4. "Diagram-only by design; Phase 2 work." |
| "Where's the vector DB?" | Quote ADR-PROTO-004. "Whole dictionary fits in context; vector DB is a scale concern." |
| "Is this real LLM-generated SQL?" | Yes, but anchored to metrics_catalog. Quote ADR-PROTO-002. |
| "Are these numbers real?" | Yes for the 3 Databricks-routed metrics; synthetic for the 10 Postgres metrics. Cost Avoided (MTD) is real on whichever side the routing currently points at — the SourceBadge tells you which. Show the SourceBadge on each tile. |
| "Why does the Cost Avoided number jump when you flip routing?" | The Postgres path is v1 synthetic data (outcomes table, MTD window). The Databricks path is l3_asurion.ev_claim.cost_avoided_usd (trailing-30d, Asurion-shaped seed). Different data, different windows, both real for their respective backends. The point of the §C.6.1 moment is precisely that the audience can SEE that swap happen live. |

Cross-references

  • PRD v2.1 §C.7 — the source of truth for this table.
  • PRD v2.1 §B — the production target every "DIAGRAM ONLY" entry points at.
  • adrs/ — ADR-PROTO-001..005 cited above.
  • demo-queries.md — the actual SQL behind the three "REAL" rows.
  • data-dictionary.md — what makes the SQL generation safe.