Part C — Databricks Prototype (1-day, completed 2026-05-06)¶
Status: completed (2026-05-06). Executed PRD v2.1 §C against a Databricks SQL Warehouse provided by Asurion's data engineering team. Shipped Databricks connectivity + SQL Generator + per-metric routing for 2-3 metrics, plus the §C.6.1 cost_avoided_mtd reroute path that flips a tile from Postgres to Databricks via a single YAML edit. Other Part B items (Kafka, Bronze/Silver/Gold, Mosaic AI, governance, iframe sandbox, vector DB) remain mocked or deferred per ADR-PROTO-001..005. Mirrors ADR-008 (`docs/lessons-learned.md` § Mocks must be opt-in, never silent fallback) into the SQL gen path: Bedrock unavailable returns 503 with `application/problem+json`, never a silent template fallback. The remaining demo-arc work — Prompt 6 docs + the §C.6.1 reroute path + acceptance receipts #1/#3/#4/#6/#11 + the operator-gated #12 — closed out under `docs/plans/completed/part-c-demo-ready.md`.
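The fail-loud contract above can be sketched in a few lines. This is a stdlib illustration only, not the shipped FastAPI handler; the exception name `BedrockUnavailableError` and the helper `sql_gen_failure_response` are hypothetical stand-ins for whatever the SQL-gen route actually raises and returns.

```python
import json


class BedrockUnavailableError(RuntimeError):
    """Hypothetical marker for a failed Bedrock call."""


def sql_gen_failure_response(exc: BedrockUnavailableError) -> tuple[int, dict, str]:
    """Fail loud: surface a 503 with an application/problem+json body.

    Never fall back to a silent SQL template -- the caller must see that
    live generation is down (the ADR-008 discipline mirrored into SQL gen).
    """
    body = {
        "type": "about:blank",
        "title": "SQL generation unavailable",
        "status": 503,
        "detail": str(exc),
    }
    return 503, {"Content-Type": "application/problem+json"}, json.dumps(body)


status, headers, payload = sql_gen_failure_response(
    BedrockUnavailableError("Bedrock invoke_model timed out")
)
```

The point of returning a structured problem body rather than a templated SQL result is that the frontend can render the failure honestly instead of painting a tile with stale-looking data.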
Why this plan exists¶
A conversation with Asurion's data engineering team unlocked Databricks SQL Warehouse access plus a hand-curated set of S3-stored schema files. This is the highest-leverage credibility win available before the next sponsor checkpoint:
- Real data on screen beats every architectural diagram. A senior leader seeing "Cost Avoided MTD by Region" populated from actual Asurion operational data shifts the project's posture from "interesting hackathon" to "this works against our environment."
- The SQL generator is the riskiest unproven piece of Part B. Building it against real Databricks today either validates the pattern (ship the rest of Part B with confidence) or surfaces the issue now, not in week 3.
- Kafka adds zero demonstrable value in a 90-second demo. It costs hours that should go to Databricks integration. Defer to architecture-diagram-only annotation per ADR-PROTO-001.
The full rationale is in PRD v2.1 §C.1; this plan is the execution-tracking surface.
Prerequisites¶
The companion PRD v2.1 hardening plan (PRD §C.4 fail-loud rewrite, §C.4.5 model-discipline subsection, §C.5.3 boot validation, instruction-file updates in CLAUDE.md / AGENTS.md / .cursor/rules/hackathon-base.mdc) is required pre-work. Without those edits, an executing agent will silently violate ADR-008 by re-introducing template fallback as a silent default. Specifically:
- `prd-v2.1.md` §C.4.2 step 4d — fail-loud contract for SQL gen on Bedrock failure
- `prd-v2.1.md` §C.4.5 — Haiku 4.5 + flat schema + sqlglot dialect rules
- `prd-v2.1.md` §C.5.3 — boot-time routing-config validation
- `CLAUDE.md` — Reading-order entry for `prd-v2.1.md` (precedence rule), Tech Stack lakehouse bullet, new "SQL generation discipline" sub-section, IMPORTANT bullets
- `AGENTS.md` — `prd-v2.1.md` row in canonical-docs table
Scope (mirrors PRD v2.1 §C.2)¶
IN scope (per §C.2.1)¶
| # | Item | Where in Part B |
|---|---|---|
| C-IN-01 | Databricks SQL Warehouse connectivity (Serverless Starter; auth method TBD per Q-PROTO-1) | New |
| C-IN-02 | Data dictionary loader pointing at the existing S3 schema files | Subset of B.5.4 |
| C-IN-03 | SQL Generator service: Bedrock + safety layer, anchored to `metrics_catalog` | New (referenced by B.7.1) |
| C-IN-04 | Per-metric routing config: `config/metric_routing.yaml` | New (extends B.5.5 / B.7.1) |
| C-IN-05 | `POST /v1/widgets/{id}/data` endpoint with the routing layer wired | B.8.1 |
| C-IN-06 | 2-3 metrics with real Databricks data lineage | Subset of B.5.4 |
| C-IN-07 | Frontend: widget renderers call `/data`, SourceBadge + freshness on each tile | B.7.1 |
| C-IN-08 | Updated architecture diagram with Kafka annotated "Phase 2" | Architecture doc |
| C-IN-09 | Demo runbook addition (~3-minute arc on top of existing 90-second arc) | New |
| C-IN-10 | `docs/whats-mocked-in-prototype.md` honest accounting | New |
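C-IN-04's routing file is the pivot of the whole demo: one YAML edit reroutes a tile between Postgres and Databricks (the §C.6.1 moment). The plan doesn't pin the exact key names, so the shape below is illustrative; `source`, `cache_seconds`, and `template_fallback` are assumed names grounded in the behaviours this plan describes.

```yaml
# config/metric_routing.yaml -- illustrative shape; key names are assumptions
cost_avoided_mtd:
  source: databricks        # flip to/from postgres for the §C.6.1 reroute moment
  cache_seconds: 300        # pre-warm + cache demo queries (see Risks)
claim_volume:
  source: postgres          # synthetic metric stays on Postgres
  template_fallback: false  # opt-in only; never a silent default (ADR-008)
```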
MOCKED (per §C.2.2)¶
| # | Item | Why mocked |
|---|---|---|
| C-MOCK-01 | Kafka transport | Zero demo value vs setup cost; deferred to Part B |
| C-MOCK-02 | Bronze/Silver/Gold pipeline | Out of 1-day scope; existing tables are "Bronze enough" |
| C-MOCK-03 | Vector DB for data dictionary RAG | Dictionary fits in 200K context; ADR-PROTO-004 |
| C-MOCK-04 | Mosaic AI Model Serving | Trained models out of 1-day scope |
| C-MOCK-05 | Iframe sandbox for custom widgets | Security hardening deferred |
| C-MOCK-06 | Widget governance workflow | Approval flows deferred |
| C-MOCK-07 | Other 7+ metrics (synthetic) | Only 2-3 metrics need real lineage to prove the pattern |
OUT of scope (per §C.2.3)¶
- Real source-system event integration (CRM/Claims/Telemetry feed)
- Schema registry, Avro, dual-write pattern (Part B Week 1)
- Multi-tenant security, encryption at rest, OAuth (Part B Week 4)
- API Gateway + Lambda for SQL generation (collapsed into FastAPI per ADR-PROTO-003)
- Vector DB ingest pipeline for the data dictionary (ADR-PROTO-004)
Hour-by-hour map (mirrors PRD §C.11 + prompts.md)¶
| Hour | Focus | Deliverable | Plan todo |
|---|---|---|---|
| 0 – 1 | Pre-flight with data engineer; confirm creds work from laptop | One curl-equivalent query returning rows | hour_0_preflight |
| 1 – 2.5 | Databricks client + settings + smoke test | `GET /v1/databricks/health` returns 200 | prompt_1_databricks_client |
| 2.5 – 4 | Dictionary loader from S3 + metrics_catalog enrichment | `validate_dictionary.py` exits 0 | prompt_2_dictionary_loader |
| 4 – 6 | SQL Generator service (Bedrock + safety layer + catalog anchor) | `POST /v1/widgets/sql/generate` dry-run returns valid SQL | prompt_3_sql_generator |
| 6 – 7.5 | Widget data resolver + per-metric routing + boot validation | At least one widget renders real Databricks rows | prompt_4_data_resolver |
| 7.5 – 8.5 | Frontend wiring (`useWidgetData`, SourceBadge, MetricInfoBadge) | Live dashboard tile from real data | prompt_5_frontend_wiring |
| 8.5 – 9.5 | Architecture diagram + demo runbook + whats-mocked doc + sql-generator.md | Demo dry-run completes cleanly twice | prompt_6_demo_runbook_docs |
| 9.5 – 10+ | Buffer + backup video recording | Backup MP4 saved | acceptance_dryrun_12 |
Cross-link: each Hour row maps to one prompt section in prompts.md. The prompts are the executable form; this plan is the tracking surface.
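The Hour 2.5–4 dictionary-loader step hinges on the structural pass that `validate_dictionary.py` performs before anything touches Databricks. A minimal stdlib sketch of that pass follows; the real script reads the four CSVs under `data-dictionary/`, and the column names in `REQUIRED_COLUMNS` here are assumptions, not the actual CSV schema.

```python
import csv
import io

# Hypothetical column set -- the real CSVs under data-dictionary/ define their own.
REQUIRED_COLUMNS = {"table_name", "column_name", "data_type", "description"}


def structural_errors(csv_text: str) -> list[str]:
    """Return validation errors for one dictionary CSV (empty list == pass).

    Mirrors the fail-loud contract: validate_dictionary.py exits non-zero
    on any error instead of silently skipping bad rows.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    errors = []
    for lineno, row in enumerate(reader, start=2):  # header is line 1
        for col in REQUIRED_COLUMNS:
            if not (row.get(col) or "").strip():
                errors.append(f"line {lineno}: empty {col}")
    return errors


sample = "table_name,column_name,data_type,description\nclaims,claim_id,STRING,Primary key\n"
errors = structural_errors(sample)
```

Exiting non-zero on the first structural mismatch is what lets gate #3 be a hard gate rather than a best-effort warning.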
Acceptance criteria (mirrors PRD §C.10)¶
The prototype is "done" when all 12 of these pass in dry-run. Each one has a matching acceptance_dryrun_* todo above; flip those to completed only after the gate physically passes.
- #1 — `make up` brings the full stack up in <60s (warm 5.62s / cold 6.02s — `artifacts/part-c-demo-ready/20260506-131614/make_up_{cold,warm}.log`)
- #2 — `curl http://localhost:8000/v1/databricks/health` returns 200 with `rows_sampled > 0` (returns 5000 — see docs/plans/completed/databricks-mock-data-and-prompt-1.md)
- #3 — `python backend/scripts/validate_dictionary.py` exits 0 (2.93s, 0 errors, 50 not-seeded warnings — `validate_dictionary.log`)
- #4 — `POST /v1/widgets/sql/generate` with `dry_run=true` returns valid SQL for each selected metric (4 receipts: `sqlgen_claim_volume_l3_asurion.json`, `sqlgen_claims_by_product_l3_asurion.json`, `sqlgen_claim_status_mix_l3_asurion.json`, `sqlgen_cost_avoided_mtd.json`)
- #5 — Generated SQL passes the safety layer (25 unit tests in `backend/tests/test_sql_safety.py` covering SELECT-only, allowlist, LIMIT injection, forbidden DDL/DML, dialect='databricks')
- #6 — Adversarial test: `DROP TABLE`-coercive `data_intent` is rejected with a structured `forbidden_construct` error (live + unit receipts: `gate6_live_adversarial.json` + `gate6_safety_violation_unit.log`)
- #7 — `POST /v1/widgets/{id}/data` returns real Databricks rows in <3s p95 (steady-state p50=49.2ms / p95=56.3ms across 5 cached calls — `artifacts/prompt-5-frontend-wiring/20260506-120610/databricks_latency.txt`; cost_avoided_mtd reroute path validated at 765ms execution — `cost_avoided_mtd_databricks.json`)
- #8 — Same endpoint returns Postgres rows for at least one synthetic metric in <300ms p95 (live: 5-13ms p95 — `artifacts/prompt-4-data-resolver/20260506/postgres_latency.txt`)
- #9 — Frontend dashboard renders all tiles; Databricks-backed tiles show "Source: Databricks · Last updated: Xs ago" (purple `Databricks · 4s ago ▾` chip with click-to-expand SQL — `artifacts/prompt-5-frontend-wiring/20260506-120610/dashboard_healthy.png`)
- #10 — Killing Databricks → graceful degradation with `live_data_unavailable: true` + amber SourceBadge + warn log (NOT silent MockLlm; PRD §C.10 #10 reworded). End-to-end verified on the UI: a bogus token flips ONLY the Databricks-routed tile to amber `Mock · live data unavailable`; Postgres tiles + the v1 KPI strip stay visually identical. Restoring the real token returns the tile to purple. Receipts: `artifacts/prompt-5-frontend-wiring/20260506-120610/dashboard_databricks_down.png` + `dashboard_restored.png`.
- #11 — Demo dry-run completes the full ~3-minute arc cleanly twice in a row, including the §C.6.1 existing-tile-reroute moment (programmatic dry-run twice clean: $1.41M Postgres → YAML flip → $374,714.39 Databricks → YAML restore → $1.41M Postgres back; run1=14s, run2=14s. Receipts: `dryrun_run1.log` + `dryrun_run2.log`. Operator-driven UI rehearsal happens at demo time per the runbook's Part C arc.)
- #12 — Backup video recorded (operator-gated, see `artifacts/part-c-demo-ready/20260506-131614/E2_OPERATOR_HANDOFF.md`) — every prerequisite shipped (backup-recording.md Part C steps, demo-runbook backup-recording slot pointer, .gitignore coverage). The MP4 itself requires a human driver against the running stack; the operator handoff file is the single landing page for that step.
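Gates #5 and #6 both exercise the same safety layer. The sketch below shows the three checks in stdlib Python so the shape is visible at a glance; it is NOT the shipped implementation, which parses with `sqlglot.parse_one(sql, dialect='databricks')` rather than regex, and also enforces the table allowlist from the data dictionary (omitted here because it needs a real parser).

```python
import re

FORBIDDEN = re.compile(
    r"\b(DROP|DELETE|INSERT|UPDATE|ALTER|CREATE|TRUNCATE|GRANT|MERGE)\b",
    re.IGNORECASE,
)


class SafetyViolation(ValueError):
    """Structured rejection, surfaced to the caller as forbidden_construct."""


def enforce_safety(sql: str, default_limit: int = 1000) -> str:
    """SELECT-only + forbidden-construct + LIMIT-injection checks (sketch)."""
    stripped = sql.strip().rstrip(";")
    if not stripped.upper().startswith("SELECT"):
        raise SafetyViolation("forbidden_construct: statement is not SELECT-only")
    if FORBIDDEN.search(stripped):
        raise SafetyViolation("forbidden_construct: DDL/DML keyword present")
    if not re.search(r"\bLIMIT\s+\d+\s*$", stripped, re.IGNORECASE):
        stripped += f" LIMIT {default_limit}"  # inject a LIMIT when the model omitted one
    return stripped
```

Keyword regexes are famously leaky (they would flag a column literally named `update_ts` inside a string, and miss constructs a parser catches), which is exactly why the build-discipline rules insist on an explicit sqlglot dialect parse rather than this kind of pattern match.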
Risks (cross-link to PRD §C.9)¶
Full risk table lives in prd-v2.1.md §C.9. The most likely-to-bite ones during execution:
- Databricks auth blocks Hour 0 → Get auth method confirmed before Hour 1; PAT is acceptable for prototype; have backup creds ready
- Data dictionary mismatches actual Databricks schema → `validate_dictionary.py` runs Hour 1 and exits non-zero on any mismatch
- Bedrock generates invalid SQL → Pre-validate the 2-3 demo queries Hour 6; if generation is unreliable, opt the metric into `template_fallback` (visible amber SourceBadge, NOT silent)
- Demo query latency exceeds 5s on Serverless Starter cold-start → Pre-warm warehouse Hour 7; cache demo queries with `cache_seconds=300`
- Token expiration kills demo mid-run → Long-lived PAT; document expiry in env config
Build discipline call-outs (non-negotiable)¶
Eight rules, each linked to its source. Skipping any one of these is the sound of a lesson being paid for a second time.
- SQL gen Bedrock failure → 503, never silent template fallback. Mirror of ADR-008 for the Clarifier. Source: PRD §C.4.2 step 4d, §C.4.5; `docs/lessons-learned.md` § Mocks must be opt-in, never silent fallback.
- Tool-input schema is flat object only — `{ sql, tables_used, explanation }`. No top-level `oneOf`. Source: lessons-learned § Bedrock tool-use rejects top-level oneOf schemas.
- sqlglot dialect explicit on every parse — `sqlglot.parse_one(sql, dialect='databricks')`. Source: `prompts.md` § Common failure modes — sqlglot rejects valid SQL because of dialect.
- Re-run `make up` after every backend or frontend source change + verify env vars with `docker exec api env | grep DATABRICKS_`. Source: lessons-learned § Stale containers hide UI work, § Watch the live env vars on `make up`.
- Real-credential smoke test for any Bedrock-routed feature — mock-only tests cannot catch tool-use access regressions. Source: lessons-learned § Bedrock tool-use rejects top-level oneOf schemas (recommended-mitigation paragraph).
- Local data dictionary is the single source of truth — never edit, never bypass. The four CSVs + `ai_query_guidelines.md` under `data-dictionary/` (mounted into the api container at `/app/data-dictionary`) drive (a) Pydantic models in `app.sql_gen.data_dictionary`, (b) DDL generation for the l3_asurion seeder, (c) `validate-dictionary` structural and live passes, (d) the SQL generator's prompt-time table/join allowlist. Adding a metric or a table means adding a row in the CSVs, not hardcoding in Python. Source: `docs/plans/active/prompt-2-dictionary-loader.md`.
- `metrics_catalog` DDL has TWO authoritative locations: `db/init.sql` (fresh-DB path on `make demo-reset`) AND `backend/app/metrics/catalog.py::_TABLE_DDL` (idempotent migration via `ensure_metrics_table` in lifespan). Edits land in BOTH or fresh DBs and dev DBs diverge. No `ALTER TABLE` migration shim — edit the `CREATE TABLE` and require `make demo-reset`. Source: `docs/plans/active/promote-metric-direction-to-catalog.md` ddl_extend; `docs/plans/active/prompt-2-dictionary-loader.md`.
- Boot-time routing validator is fail-loud. Every `metrics_catalog.name` must have an entry in `config/metric_routing.yaml`; missing entries raise `RuntimeError` in `app.main.lifespan` and the api container exits non-zero. Reverse-direction (extra YAML rows without a catalog match) is a WARN, not fatal — staging routing for an upcoming seeder is fine. Source: PRD v2.1 §C.5.3, ADR-PROTO-005.
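The asymmetric fail-loud/warn behaviour of the boot validator fits in a dozen lines. This is a sketch under assumed names (`validate_routing` is hypothetical; the real check lives in `app.main.lifespan`), but the contract matches PRD §C.5.3: missing routing is fatal, extra routing only warns.

```python
import warnings


def validate_routing(catalog_names: set[str], routing: dict) -> None:
    """Boot-time routing check, sketched per PRD §C.5.3 / ADR-PROTO-005.

    Every metrics_catalog.name needs a routing entry -> RuntimeError
    (so the api container exits non-zero at boot). Extra YAML rows only
    WARN, because staging routing ahead of an upcoming seeder is fine.
    """
    missing = catalog_names - routing.keys()
    if missing:
        raise RuntimeError(f"metric_routing.yaml missing entries: {sorted(missing)}")
    extra = routing.keys() - catalog_names
    if extra:
        warnings.warn(f"routing rows without a catalog match (staged?): {sorted(extra)}")


catalog = {"cost_avoided_mtd", "claim_volume"}
# staged_metric has routing but no catalog row yet -> WARN, not fatal
validate_routing(catalog, {"cost_avoided_mtd": {}, "claim_volume": {}, "staged_metric": {}})
```

Making the missing-entry direction fatal at boot (rather than at first request) is what keeps a half-routed dashboard from ever reaching a demo screen.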
Time-box discipline (from prompts.md)¶
If you're 30 minutes over on any prompt, stop and assess rather than push through:
| Prompt | Budget | If overrun, cut |
|---|---|---|
| 1 (Databricks client) | 90 min | Skip OAuth M2M and Service Principal; PAT-only |
| 2 (Dictionary loader) | 90 min | Skip S3 loader; hand-write config/data_dictionary.yaml for the 5 tables |
| 3 (SQL Generator) | 120 min | Skip free-text rejection in route layer (assume only catalog-anchored calls) |
| 4 (Data resolver) | 90 min | Skip cache; every request hits the backend |
| 5 (Frontend wiring) | 60 min | Skip MetricInfoBadge enhancements; just SourceBadge |
| 6 (Demo + docs) | 60 min | Skip whats-mocked doc — write it after the demo |
Non-negotiables: working /v1/databricks/health, working SQL generation for 2 demo metrics, dashboard tile populated from real data end-to-end. Everything else can be smaller or deferred.
Definitely not in scope¶
Don't even start any of these mid-prototype:
- Kafka producer / consumer code (ADR-PROTO-001 — diagram only)
- Bronze/Silver/Gold DLT pipelines (deferred to Part B)
- Vector DB for the data dictionary (ADR-PROTO-004)
- Mosaic AI Model Serving endpoints (deferred to Part B)
- Iframe sandbox for custom widgets (deferred to Part B)
- Widget governance workflow / approval UI (deferred to Part B)
- API Gateway + Lambda for SQL gen (ADR-PROTO-003 — collapsed into FastAPI)
- Re-routing more than 2-3 metrics to Databricks (the "everything else stays synthetic" line is the demo)
Done when¶
- All 12 acceptance checkboxes above are green
- `make verify` passes (extended with the new acceptance criteria where applicable)
- Demo dry-run twice clean, including the §C.6.1 existing-tile-reroute moment
- Backup video recorded
- This plan moves to `docs/plans/completed/part-c-databricks-prototype.md` with `status: completed` and `completed_on: <date>`
- Lessons surfaced via `harvest_lessons` are appended to `docs/lessons-learned.md` using the four-field format
- `docs/sql-generator.md` exists (created in Prompt 6) and is referenced from the `AGENTS.md` canonical-docs table
Close-out (2026-05-06)¶
Prompt 5 shipped under docs/plans/completed/prompt-5-frontend-wiring.md. The remaining demo close-out — Prompt 6 docs, the §C.6.1 live cost_avoided_mtd reroute path (seeder + dictionary + catalog source_query), receipts for gates #1/#3/#4/#6/#11/#12, and the plan move to completed/ — was tracked under part-c-demo-ready.md and shipped on the same date.
Final close-out (2026-05-06). All 12 PRD §C.10 acceptance gates green (gate #12 operator-gated per artifacts/part-c-demo-ready/20260506-131614/E2_OPERATOR_HANDOFF.md; every prerequisite landed). 117 backend tests + 74 frontend tests passing. The §C.6.1 live reroute path proven twice clean programmatically. Demo runbook + README + architecture diagram + lessons-learned all updated. Plan moved to docs/plans/completed/part-c-databricks-prototype.md in the same commit as the part-c-demo-ready close-out. CLAUDE.md "Active plans" line updated to drop Part C; "Current State" Part C bullet rewritten to reflect demo-ready status.
References¶
- `prd-v2.1.md` §C — Part C spec (the source of truth)
- `prompts.md` — hour-by-hour Claude Code prompts (the executable form)
- `prd.md` §19 ADR-008 — mocks-as-opt-in / fail-loud (the discipline this plan extends to SQL gen)
- `docs/lessons-learned.md`:
  - § Mocks must be opt-in, never silent fallback — the discipline that locks the §C.4.2 fail-loud rewrite
  - § Bedrock tool-use rejects top-level oneOf schemas — locks the flat tool-input schema for SQL gen
  - § Haiku 4.5 silently drops deeply-nested fields — locks the `{ sql, tables_used, explanation }` shape
  - § Watch the live env vars on `make up`, not just the file — Databricks env passthrough into Docker
  - § Stale containers hide UI work — re-run `make up` after every change
- ADR-PROTO-001..005 in `prd-v2.1.md` §C.8 (Kafka diagram-only; SQL gen anchored to metrics_catalog; SQL gen in FastAPI not Lambda; dictionary in prompt context; per-metric routing)