# Architecture
Top-level architecture of the Asurion Command Center hackathon stack. Read this first; subsystem-specific docs (widget-builder.md, sql-generator.md, data-dictionary.md) take you deeper. Source files are cited inline so you can verify any claim against the code.
## What this stack is
A real-time NPS-impact command center for support ops, built as a hackathon MVP. Five moving parts, all wired through a single FastAPI process:
- A Postgres datastore seeded with synthetic dashboard data (issue sessions, claims, devices, alerts, recommendations), plus a first-class `metrics_catalog` table (ADR-007) and a JSONB `dashboard_layouts` doc per user (ADR-009).
- A Redis pub/sub channel (`pubsub:dashboard`) that the WebSocket endpoint at `/v1/dashboard/stream` mirrors out to connected browsers.
- The FastAPI app at `backend/app/main.py` — ingest, dashboard, decisions, widgets, metrics, demo admin, plus a Databricks health endpoint.
- The React frontend at `frontend/` — dashboard tiles, the Add Widget Clarifier modal, the SourceBadge / MetricInfoBadge surfaces.
- Two outbound integrations:
    - AWS Bedrock for the Add Widget Clarifier (LangGraph-driven, ADR-005) and the Part C SQL generator (PRD v2.1 §C.4, in flight).
    - Databricks SQL Warehouse for the Part C live-data path. The personal workspace runs both `workspace.asurion_prototype.*` (the v1 mirror, hit by `/v1/databricks/health`) and `workspace.l3_asurion.*` (the dictionary-shaped tables Prompt 2 seeded — see data-dictionary.md).
```mermaid
flowchart LR
    browser["React UI<br/>(:3080)"]
    api["FastAPI<br/>(:8000)<br/>app.main"]
    pg[("Postgres<br/>cmdcenter")]
    rd[("Redis<br/>pubsub:dashboard")]
    bedrock["AWS Bedrock<br/>(Anthropic Claude)"]
    dbx_v1[("Databricks<br/>workspace.asurion_prototype")]
    dbx_l3[("Databricks<br/>workspace.l3_asurion<br/>(dictionary-shaped)")]
    dict["data-dictionary/<br/>(4 CSVs + guidelines)"]
    routing["config/metric_routing.yaml"]
    kafka["Kafka<br/>Phase 2 — production transport<br/>(ADR-PROTO-001, diagram only)"]
    browser -->|REST| api
    browser -->|WebSocket| api
    api -->|SQL| pg
    api -->|pub/sub| rd
    api -->|tool-use| bedrock
    api -->|/health probe| dbx_v1
    api -->|live SQL Part C| dbx_l3
    dict -.->|ro mount| api
    routing -.->|ro mount| api
    kafka -.->|deferred| api
    classDef phase2 stroke-dasharray:5 5,stroke:#888,color:#666;
    class kafka phase2;
```
The dashed kafka node is a placeholder per ADR-PROTO-001 — production replaces the direct POST /v1/events ingest path with Kafka, but the prototype proves the more uncertain piece (live data binding) and keeps event transport synchronous. See whats-mocked-in-prototype.md § Kafka (DIAGRAM ONLY).
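The synchronous path the diagram keeps can be reduced to a few lines. This is a hedged sketch, not the real `backend/app` code: `ingest_event`, the stub clients, and the event shape are all illustrative; only the channel name `pubsub:dashboard` and the persist-then-publish order come from the docs above.

```python
import json
from dataclasses import dataclass, field

# Channel the WebSocket endpoint mirrors (per the architecture diagram).
DASHBOARD_CHANNEL = "pubsub:dashboard"

def ingest_event(db, redis_client, event):
    """Hypothetical reduction of the direct POST /v1/events path:
    persist the event, then fan it out to dashboard subscribers.
    Production swaps this direct call for a Kafka producer (ADR-PROTO-001)."""
    db.insert("events", event)                                   # 1. write to Postgres
    redis_client.publish(DASHBOARD_CHANNEL, json.dumps(event))   # 2. notify the WS mirror

# Stand-in clients so the sketch runs without Postgres or Redis.
@dataclass
class FakeDb:
    rows: list = field(default_factory=list)
    def insert(self, table, row):
        self.rows.append((table, row))

@dataclass
class FakeRedis:
    published: list = field(default_factory=list)
    def publish(self, channel, payload):
        self.published.append((channel, payload))

db, rds = FakeDb(), FakeRedis()
ingest_event(db, rds, {"type": "nps_update", "score": 42})
```

Because the write and the publish happen in one request handler, a Kafka swap-in later only has to replace the second call, which is why the prototype can defer the transport without re-proving the data binding.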
## Boot order — app.main.lifespan
Every gate runs inside one engine.connect() block so the routing validator observes the seeded rows. Source: backend/app/main.py lines 34-53.
```mermaid
sequenceDiagram
    autonumber
    participant uvi as uvicorn
    participant lifespan as app.main.lifespan
    participant pg as Postgres
    participant routing as routing.validate_routing_against_catalog
    uvi->>lifespan: startup
    lifespan->>pg: _ensure_widgets_table
    lifespan->>pg: ensure_metrics_table (DDL + lineage ALTERs)
    lifespan->>pg: ensure_dashboard_layouts_table
    lifespan->>pg: seed_metrics_if_empty (13 rows)
    lifespan->>pg: seed_if_empty (synthetic dashboard data)
    lifespan->>routing: validate_routing_against_catalog(conn)
    routing-->>lifespan: ok or RuntimeError (ADR-008 fail-loud)
    lifespan-->>uvi: yield
```
Failure semantics: if any step raises, the lifespan never yields and Docker keeps the api container in a restart loop. The operator sees the full traceback in docker compose logs api. There is no silent recovery path — every gate is fail-loud per ADR-008. The classic example is the routing validator catching a metrics_catalog row without a corresponding entry in config/metric_routing.yaml; see sql-generator.md § Boot validator.
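The validator's contract can be illustrated with a minimal sketch. The argument shapes here are assumptions (the real `validate_routing_against_catalog` reads `metrics_catalog` rows and `config/metric_routing.yaml` through a DB connection); only the fail-loud behavior is from ADR-008.

```python
def validate_routing_against_catalog(catalog_metric_ids, routing_rules):
    """Illustrative ADR-008 boot gate: every catalog metric must have a
    routing entry, otherwise startup aborts and Docker restarts the api
    container. Input shapes are assumed, not the real signature."""
    missing = set(catalog_metric_ids) - set(routing_rules)
    if missing:
        # Fail loud: never fall back to a default route silently.
        raise RuntimeError(
            f"metric_routing.yaml is missing entries for: {sorted(missing)}")

# A fully-routed catalog passes without raising.
validate_routing_against_catalog({"nps_7d"}, {"nps_7d": {"source": "postgres"}})
```

Raising instead of logging is the point: the traceback in `docker compose logs api` is the operator interface, so a misrouted metric can never boot into a half-working dashboard.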
## Service topology — docker-compose.yml
| Service | Image | Role | Notes |
|---|---|---|---|
| `db` | `postgres:16-alpine` | Source of truth for issue sessions, claims, devices, widgets, metric catalog, dashboard layouts | `db/init.sql` runs once on first boot. The lifespan migrations (`ensure_*_table`) handle additive schema changes for warm dev DBs. |
| `redis` | `redis:7-alpine` | `pubsub:dashboard` fan-out for the WebSocket | No persistence required. |
| `api` | Built from `backend/Dockerfile` | The FastAPI app + LangGraph Clarifier + Databricks client + dictionary loader | Mounts `./data-dictionary:/app/data-dictionary:ro` and `./config:/app/config:ro` for Part C. |
| `web` | Built from `frontend/Dockerfile` | The React UI | Talks to `api` via relative URLs in the production-mode Vite build. |
The api container also has read-write access to ${HOME}/.aws so boto3 can refresh SSO credentials on demand — :ro was tried and silently breaks Bedrock auth (lessons-learned § Stale containers hide UI work).
## API surface
Routers live in backend/app/. All paths are prefixed with /v1 except /health. The full curated OpenAPI spec lives at api/openapi.yaml; the live one is always at http://localhost:8000/openapi.json.
| Router | Mount | Owner doc |
|---|---|---|
| `app.main` | top-level routes (`/health`, `/v1/databricks/health`, `/v1/dashboard/state`, `/v1/decisions/next-best-action`, `/v1/feedback/outcome`, `/v1/admin/reset-demo`, WebSocket `/v1/dashboard/stream`) | this doc |
| `app.widgets.routes` | `/v1/widgets/*` (Add Widget Clarifier SSE + persist) | widget-builder.md |
| `app.metrics.routes` | `/v1/metrics/*` (catalog CRUD) | this doc + ADR-007 |
| `app.dashboard.routes` | `/v1/dashboard/layout/*` (drag-reorder layout per ADR-009) | this doc + ADR-009 |
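The core of the `/v1/dashboard/stream` route is a mirror loop: forward every `pubsub:dashboard` message to the connected socket. A hedged reduction with stand-in pubsub and websocket objects (the real handler lives in `app.main` and uses actual Redis and WebSocket clients):

```python
import asyncio

async def mirror_dashboard(pubsub, websocket):
    """Illustrative stream handler: relay each pub/sub message to the browser.
    Object shapes are assumptions, not the real client APIs."""
    async for message in pubsub.listen("pubsub:dashboard"):
        await websocket.send_text(message)

# Minimal stand-ins so the loop can run without Redis or a browser.
class FakePubSub:
    def __init__(self, messages):
        self._messages = messages
    async def listen(self, channel):
        for m in self._messages:
            yield m

class FakeWebSocket:
    def __init__(self):
        self.sent = []
    async def send_text(self, text):
        self.sent.append(text)

ws = FakeWebSocket()
asyncio.run(mirror_dashboard(FakePubSub(['{"tile": "nps"}']), ws))
```

The server never pushes state it computed in-request; everything the browser sees arrives via the channel, which is what makes the later Kafka-fed version a transport swap rather than a redesign.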
## Configuration surface

### Env vars (declared in docker-compose.yml)
| Group | Key vars | Purpose |
|---|---|---|
| Database | `DATABASE_URL`, `REDIS_URL` | Container-internal; never exposed externally. |
| Bedrock | `BUILDER_MODE` (`live`\|`offline`), `USE_BEDROCK`, `AWS_PROFILE`, `AWS_REGION`, `BEDROCK_MODEL_ID` | ADR-008. Default `live` — Bedrock failures surface as 503, never a silent MockLlm. |
| Databricks | `DATABRICKS_HOST`, `DATABRICKS_HTTP_PATH`, `DATABRICKS_TOKEN`, `DATABRICKS_CATALOG`, `DATABRICKS_SCHEMA`, `DATABRICKS_HEALTH_TABLE`, `DATABRICKS_QUERY_TIMEOUT_S`, `DATABRICKS_POOL_SIZE` | Part C. Default catalog `workspace`, schema `asurion_prototype`. |
| Dictionary / routing | `DATA_DICTIONARY_ROOT` (`/app/data-dictionary`), `METRIC_ROUTING_PATH` (`/app/config/metric_routing.yaml`) | Part C. Mounted read-only into the container. |
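Reading the Databricks group with its documented defaults is a one-liner per key. A minimal sketch (the helper name is an assumption, not the actual backend config module; only the `workspace` / `asurion_prototype` defaults come from the table above):

```python
import os

def databricks_settings(env=None):
    """Illustrative settings reader: fall back to the defaults the env-var
    table documents. Function name is hypothetical."""
    env = os.environ if env is None else env
    return {
        "catalog": env.get("DATABRICKS_CATALOG", "workspace"),
        "schema": env.get("DATABRICKS_SCHEMA", "asurion_prototype"),
    }
```

With an empty environment this yields the personal-workspace defaults, which is why the v1 mirror works out of the box while `l3_asurion` requires explicit overrides.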
### Mounted config
- `data-dictionary/` — four CSVs + `ai_query_guidelines.md`. The canonical dictionary, single source of truth. See data-dictionary.md.
- `config/metric_routing.yaml` — per-metric routing rules consumed by the boot validator. See sql-generator.md § Routing.
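Since the dictionary arrives as plain CSVs on a read-only mount, loading it needs nothing beyond the standard library. A hedged sketch (the file name `tables.csv` and the row columns are invented for illustration; the real file set is documented in data-dictionary.md):

```python
import csv
import tempfile
from pathlib import Path

def load_dictionary(root):
    """Illustrative loader: read every CSV under DATA_DICTIONARY_ROOT into
    memory, keyed by file stem. File names and row shapes are assumptions."""
    tables = {}
    for path in sorted(Path(root).glob("*.csv")):
        with path.open(newline="") as fh:
            tables[path.stem] = list(csv.DictReader(fh))
    return tables

# Demo against a throwaway directory standing in for the ro mount.
tmp = tempfile.mkdtemp()
Path(tmp, "tables.csv").write_text("name,grain\nissue_sessions,session\n")
dictionary = load_dictionary(tmp)
```

Loading the whole dictionary eagerly is cheap at four files, and it matches ADR-PROTO-004's choice of prompt-context injection over a vector DB: the loader's output is small enough to ship to the model directly.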
## Postgres schema
Defined in db/init.sql. The metrics_catalog table has a deliberate dual source of truth: its CREATE TABLE is duplicated in backend/app/metrics/catalog.py's `_TABLE_DDL` so the lifespan migration path agrees with the fresh-DB path. Edits must land in BOTH places, or fresh DBs and warm dev DBs diverge. This convention is documented in docs/lessons-learned.md § Dual-DDL source of truth.
## ADRs that shape this architecture
The full set lives under docs/adrs/. The ones that most directly shape this top-level view:
- ADR-001 — Postgres + Redis Compose, deferring the production data path.
- ADR-008 — Mocks-as-opt-in / fail-loud. Drives the lifespan boot gates, the Bedrock 503 path, and the routing validator.
- ADR-007 — `metrics_catalog` as a first-class table plus a per-widget `MetricDefinition` block. Why widgets carry an embedded metric instead of a string label.
- ADR-009 — Drag-reorder layout via a single JSONB `dashboard_layouts` doc.
- ADR-PROTO-001..005 — Part C decisions: Kafka diagram-only, SQL gen anchored to the catalog, SQL gen in FastAPI not Lambda, dictionary in prompt context (not a vector DB), per-metric routing.
## Forward-looking — Part B and beyond
prd-v2.1.md Part B describes the 4-week production sprint. The headline shifts:
- Kafka transport replaces direct event ingest (ADR-V2-001).
- Bronze / Silver / Gold lakehouse layers in Databricks replace the synthetic Postgres for the analytical surface.
- A Mosaic AI gateway fronts Bedrock with rate limiting and guardrails.
- A vector DB joins the dictionary loader for similarity-aware metric matching.
- An iframe sandbox replaces `@babel/standalone` in-process compilation (ADR-006 evolution).
Part C (the slice in flight) deliberately mocks or defers all of those — see whats-mocked-in-prototype.md for the line-by-line accounting.
## Operations

### Local stack

### TechDocs preview
This site is rendered by Backstage TechDocs. To preview locally, see Phase 4 of the rollout plan: make docs-serve runs the spotify/techdocs container against the repo's mkdocs.yml.
### Publish path
Out of scope for the hackathon — the plan ships the manifests so an operator can register the repo with a Backstage instance, but no CI publish action is wired. See the rollout plan's Risks + open questions section.