
What's Mocked in the Prototype

Q&A defense for stakeholder review. Per PRD v2.1 §C.7. Pair with architecture.md (the system view) and PRD v2.1 §B (the production target). The principle is non-negotiable: never claim something is real when it isn't. Every demo question gets an honest answer that points at the Part B section where the real version is specified.

Quick reference

The 1-day prototype slice (PRD v2.1 §C, docs/plans/active/part-c-databricks-prototype.md) deliberately mocks or defers everything not on the critical path of the data-binding story. The honest accounting:

| Element on screen | Real or Mocked | Honest answer if asked |
| --- | --- | --- |
| Existing v1 KPI tiles (Active Issues, Claims, etc.) | Synthetic (Postgres seed) | "v1 demo data, unchanged from hackathon. Routes via backend: postgres per config/metric_routing.yaml." |
| Cost Avoided (MTD) | Postgres by default; flips to Databricks during §C.6.1 | "Default fast path is the v1 Postgres synthetic figure (~$1.41M). The §C.6.1 demo moment is one YAML edit (backend: postgres → databricks) + make up; the same KPI tile re-renders against l3_asurion.ev_claim (~$375k trailing-30d), and the SourceBadge swaps green→purple. Both paths return the same single-row [{value: N}] shape — no renderer change. See demo-queries.md § Metric 4." |
| Claim Volume (Asurion) | REAL (Databricks SQL) | "Live query against Asurion's Databricks SQL Warehouse, hitting workspace.l3_asurion.ev_claim seeded from the canonical data dictionary." |
| Claims by Product (Asurion) | REAL (Databricks SQL) | "Same. Joined to ev_product_catalog per the dictionary's approved join_map.csv." |
| Claim Status Mix (Asurion) | REAL (Databricks SQL) | "Same." |
| AI recommendation rationale | Bedrock LLM (real call) | "Real Bedrock; a deterministic scorer drives the decision per ADR-002." |
| Custom widget code generation | Bedrock LLM (real call) | "Real Bedrock; sandboxed by @babel/standalone for TSX (ADR-006). Iframe sandbox is Phase 2 — see the ADR-006 evolution path." |
| Kafka in the architecture diagram | DIAGRAM ONLY | "Production transport. Phase 2 work, designed in PRD §B.4. The prototype proves the more uncertain piece (data binding); Kafka is straightforward production wiring per ADR-PROTO-001." |
| Bronze / Silver / Gold pipeline | DIAGRAM ONLY | "Designed in PRD §B.5. The data engineer's existing tables in workspace.asurion_prototype are our 'Bronze' for the v1 mirror; workspace.l3_asurion is the dictionary-shaped slice." |
| Mosaic AI Model Serving | Not present | "Trained models are PRD §B.6. Today's decisions are rules + Bedrock." |
| Multi-tenant security | Not present | "Single-user demo. iframe sandbox is PRD §B.7.2." |
| Vector DB for dictionary similarity | Not present | "ADR-PROTO-004 — the entire dictionary fits in Claude's 200K context window. Vector DB becomes a scale concern post-prototype." |
| API Gateway + Lambda for SQL gen | Not present | "ADR-PROTO-003 — the SQL generator runs in FastAPI. ~50 lines of integration vs ~200+ for Lambda packaging." |
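The §C.6.1 routing flip in the table above is a single config lookup. A minimal sketch, assuming a plausible shape for config/metric_routing.yaml — the metrics/backend key names and metric IDs here are assumptions, not the file's actual schema:

```python
# Minimal sketch of the metric-routing flip. The key names (metrics,
# backend) and metric IDs are an ASSUMED shape for
# config/metric_routing.yaml, not its actual schema.

# What the YAML might parse to before the §C.6.1 flip:
ROUTING = {
    "metrics": {
        "cost_avoided_mtd": {"backend": "postgres"},       # flip to "databricks" live
        "claim_volume_asurion": {"backend": "databricks"},
    }
}

def resolve_backend(metric_id: str, config: dict = ROUTING) -> str:
    # One lookup decides which store serves the tile; the renderer
    # never changes because both backends return the same
    # single-row [{"value": N}] shape.
    return config["metrics"][metric_id]["backend"]
```

After the one-line YAML edit plus make up, the same lookup returns "databricks" and the SourceBadge swap follows from that alone.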

Detail by element

Kafka (DIAGRAM ONLY)

Status. Annotated in the architecture diagram with "Phase 2 — production transport". Not built.

Why deferred. Sub-second transport is invisible in a 90-second demo. Hours saved go to Databricks integration, where they produce visible demo value. Per ADR-PROTO-001.

What's actually running today. Direct event ingest into Postgres via POST /v1/events. The dashboard receives updates over pubsub:dashboard (Redis), which the WebSocket at /v1/dashboard/stream mirrors out.
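The ingest path above can be sketched without any of the real infrastructure. The route names (POST /v1/events, /v1/dashboard/stream) and the pubsub:dashboard channel come from this doc; the real code uses FastAPI, Postgres, and Redis, so the in-memory queue below is purely illustrative:

```python
# Dependency-free sketch of today's ingest path; an asyncio.Queue
# stands in for Redis pubsub:dashboard, and the Postgres insert is elided.
import asyncio
import json

SUBSCRIBERS: list[asyncio.Queue] = []  # stand-in for pubsub:dashboard

async def post_v1_events(event: dict) -> None:
    # POST /v1/events: persist the event (Postgres insert elided),
    # then fan it out on the pubsub:dashboard channel.
    for q in SUBSCRIBERS:
        await q.put(json.dumps(event))

async def dashboard_stream() -> str:
    # /v1/dashboard/stream: each WebSocket client subscribes and
    # mirrors pubsub messages out; one message shown here.
    q: asyncio.Queue = asyncio.Queue()
    SUBSCRIBERS.append(q)
    return await q.get()

async def main() -> dict:
    SUBSCRIBERS.clear()                 # fresh run
    stream = asyncio.create_task(dashboard_stream())
    await asyncio.sleep(0)              # let the subscriber register
    await post_v1_events({"metric": "claim_volume", "value": 1})
    return json.loads(await stream)

print(asyncio.run(main()))
```

Swapping the in-memory queue for a Kafka producer later is exactly the "straightforward production wiring" ADR-PROTO-001 describes: only post_v1_events changes.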

Where the production version is specified. PRD v2.1 §B.4. Schema, partition strategy, retention — fully designed.

Bronze / Silver / Gold lakehouse (DIAGRAM ONLY)

Status. Designed in PRD §B.5. Not built as Bronze/Silver/Gold; the prototype uses the data engineer's existing tables directly.

What's actually running today.

  • workspace.asurion_prototype.* — v1 mirror tables seeded by make seed-databricks. Health-checked at /v1/databricks/health.
  • workspace.l3_asurion.* — dictionary-shaped tables seeded by make seed-databricks-l3 (Prompt 2). Backs the 3 Databricks-routed demo metrics.

Both schemas coexist in the same Free Trial workspace.

Where the production version is specified. PRD v2.1 §B.5. Per ADR-PROTO-001's note: "the data engineer's existing tables are our Bronze for the prototype."

Mosaic AI Model Serving (NOT PRESENT)

Status. Not present in the prototype. Bedrock is the only LLM surface.

What's actually running today. Direct Anthropic Claude calls via Bedrock from backend/app/widgets/llm.py (Clarifier) and (Phase 2 of Part C) backend/app/sql_gen/generator.py (SQL gen — not yet in code).

Where the production version is specified. PRD v2.1 §B.6. Trained models, gateway, rate limiting, guardrails.

Multi-tenant security / iframe sandbox (NOT PRESENT)

Status. Single-user demo. Custom widgets compile via @babel/standalone in-process per ADR-006.

Why deferred. Security hardening doesn't change the demo arc. The TSX-evaluation contract is stable; the sandbox swap-out is mechanical.

Where the production version is specified. PRD v2.1 §B.7.2 (iframe sandbox); §B.6 (multi-tenant). Per ADR-006 evolution path.

Vector DB for dictionary RAG (NOT PRESENT)

Status. Not present. The full dictionary loads into memory at boot via backend/app/sql_gen/dictionary_loader.py; per-request, the SQL generator (Prompt 3) calls DataDictionary.subset_for_tables(...) to scope to the relevant tables and splices the slice into the Bedrock prompt.
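The subsetting step above is simple enough to sketch. DataDictionary and subset_for_tables are named in this doc; the internal field layout (tables mapped to column lists) is an assumption for illustration:

```python
# Hedged sketch of the in-memory dictionary subsetting path.
# DataDictionary / subset_for_tables come from the doc; the field
# layout below is an ASSUMPTION, not the real class.
from dataclasses import dataclass

@dataclass
class DataDictionary:
    tables: dict[str, list[str]]  # table name -> column names

    def subset_for_tables(self, wanted: set[str]) -> "DataDictionary":
        # Scope the dictionary to just the tables a request touches,
        # so only that slice is spliced into the Bedrock prompt.
        return DataDictionary(
            tables={t: cols for t, cols in self.tables.items() if t in wanted}
        )

full = DataDictionary(tables={
    "ev_claim": ["claim_id", "cost_avoided_usd"],
    "ev_product_catalog": ["product_id", "product_name"],
    "ev_policy": ["policy_id"],
})
subset = full.subset_for_tables({"ev_claim", "ev_product_catalog"})
```

No retrieval index is involved: the whole structure sits in memory from boot, and scoping is a dictionary comprehension, which is why ADR-PROTO-004 could defer the vector DB entirely.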

Why deferred. Per ADR-PROTO-004: the full Asurion data dictionary (~52 tables, 1663 columns, 138 joins, 42 KPIs) fits comfortably in Claude's 200K context window. Vector DB setup + ingestion + retrieval tuning is ~1 day by itself.

When this matters. Vector DB becomes a scale concern post-prototype, when the dictionary outgrows context. Has its own future ADR.

API Gateway + Lambda for SQL gen (NOT PRESENT)

Status. SQL generation runs as a module inside backend/app/sql_gen/ (the package this session shipped). No new AWS infrastructure.

Why. Per ADR-PROTO-003: the existing FastAPI process already has a Bedrock client (ADR-005) and runs in the same container as the widget data resolver. ~50 lines of integration vs ~200+ for Lambda packaging. Cold-start latency eliminated.

When this matters. If a future use case needs the SQL generator from a different runtime (e.g. a Slack bot), it can be lifted to its own service later. Not a 1-day concern.

Widget governance workflow (NOT PRESENT)

Status. All widgets stay at private visibility. The metrics_catalog carries governance_status and approved_by columns reserved for the §B.5.4 governance flow, but no review pipeline exists.

Where the production version is specified. PRD v2.1 §B.5.4. The lineage columns are forward-compatible.

SQL generation safety (REAL, but anchored)

Worth calling out separately because it's the piece most likely to draw skepticism.

The fear. "LLM-generated SQL" sounds like every freeform NL-to-SQL system that hallucinates joins.

What we actually do. Per ADR-PROTO-002: SQL generation is anchored to a metrics_catalog entry. The LLM's job is bounded — take the catalog entry's source_query template, fill in dimensions and filters from the data_intent, return validated SQL. Free-text generation is rejected at the resolver layer (no metric_id → no Databricks query). The dictionary's join_map.csv is the allowlist; the safety layer (Prompt 3) rejects joins outside it.
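The allowlist check above can be sketched in a few lines. The join_map.csv column names and the regex-based join extraction are assumptions for illustration; the real safety layer is Prompt 3 work and would parse SQL properly:

```python
# Hedged sketch of the join-allowlist check. The csv row shape and the
# naive regex extraction are ASSUMPTIONS; a real validator parses SQL.
import csv
import io
import re

JOIN_MAP_CSV = """left_table,right_table
ev_claim,ev_product_catalog
"""

def allowed_joins(join_map_csv: str) -> set[frozenset]:
    rows = csv.DictReader(io.StringIO(join_map_csv))
    return {frozenset((r["left_table"], r["right_table"])) for r in rows}

def joins_in_sql(sql: str) -> set[frozenset]:
    # Naive for illustration: pair the FROM table with each JOINed table.
    tables = re.findall(r"(?:from|join)\s+(\w+)", sql, flags=re.I)
    return {frozenset((tables[0], t)) for t in tables[1:]}

def validate(sql: str, allowlist: set[frozenset]) -> bool:
    # Any join pair outside the dictionary's allowlist rejects the SQL.
    return joins_in_sql(sql).issubset(allowlist)

allow = allowed_joins(JOIN_MAP_CSV)
ok = validate("SELECT * FROM ev_claim JOIN ev_product_catalog ON 1=1", allow)
bad = validate("SELECT * FROM ev_claim JOIN ev_policy ON 1=1", allow)
```

The structural point stands regardless of implementation detail: the allowlist comes from the dictionary, not from the LLM, so a hallucinated join fails validation before it ever reaches Databricks.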

What this buys. SQL quality is dramatically higher than freeform NL-to-SQL. Safety is structural, not bolted on.

The trade-off. Adding a new Databricks metric requires a metrics_catalog row first — by design, not by accident. The "ad hoc Databricks query" use case is not supported in the prototype.

Demo discipline

The risk register in PRD v2.1 §C.9 calls out the specific Q&A scenarios. The mitigation in every case is: have this doc open in a tab and quote it.

| Risk | Mitigation |
| --- | --- |
| "Why isn't Kafka working?" | Quote ADR-PROTO-001 + PRD §B.4. "Diagram-only by design; Phase 2 work." |
| "Where's the vector DB?" | Quote ADR-PROTO-004. "Whole dictionary fits in context; vector DB is a scale concern." |
| "Is this real LLM-generated SQL?" | Yes, but anchored to metrics_catalog. Quote ADR-PROTO-002. |
| "Are these numbers real?" | Yes for the 3 Databricks-routed metrics; synthetic for the 10 Postgres metrics. Cost Avoided (MTD) is real on whichever side the routing currently points at — the SourceBadge tells you which. Show the SourceBadge on each tile. |
| "Why does the Cost Avoided number jump when you flip routing?" | The Postgres path is v1 synthetic data (outcomes table, MTD window). The Databricks path is l3_asurion.ev_claim.cost_avoided_usd (trailing-30d, Asurion-shaped seed). Different data, different windows, both real for their respective backends. The point of the §C.6.1 moment is precisely that the audience can SEE that swap happen live. |

Cross-references

  • PRD v2.1 §C.7 — the source of truth for this table.
  • PRD v2.1 §B — the production target every "DIAGRAM ONLY" entry points at.
  • adrs/ — ADR-PROTO-001..005 cited above.
  • demo-queries.md — the actual SQL behind the three "REAL" rows.
  • data-dictionary.md — what makes the SQL generation safe.