Demo Queries

Reference for the four Databricks-aware metrics that power the live data binding demo. The first three are Databricks-routed by default; the fourth (cost_avoided_mtd) is Postgres-routed by default and exists here so the §C.6.1 reroute moment has both sides documented side-by-side. Each entry shows the metrics_catalog source row, the parameter binding, the generated SQL, and the expected output shape. Sourced from backend/app/metrics/seed.py and config/metric_routing.yaml. Pair with sql-generator.md and data-dictionary.md.

Setup

Before any of these will return data:

make seed-databricks-l3   # populates workspace.l3_asurion.ev_claim + ev_product_catalog
make up                   # starts the api with the routing validator green

The validator at backend/app/sql_gen/routing.py checks that every metric below has a routing entry — see sql-generator.md § Boot validator.
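
A minimal sketch of what that boot check amounts to, assuming a SQLAlchemy session and an illustrative validate_routing() helper — the real implementation lives in backend/app/sql_gen/routing.py and may differ:

# Hypothetical sketch of the boot-time routing check (names are illustrative,
# not the actual backend/app/sql_gen/routing.py code).
import yaml
from sqlalchemy import text

def validate_routing(session, routing_path="config/metric_routing.yaml") -> None:
    with open(routing_path) as fh:
        routing = yaml.safe_load(fh) or {}
    catalog_names = {
        row[0] for row in session.execute(text("SELECT name FROM metrics_catalog"))
    }
    unmapped = sorted(catalog_names - routing.keys())
    if unmapped:
        # Matches the startup error described under "When something goes wrong" below.
        raise RuntimeError(f"Unmapped metric_ids in metrics_catalog: {unmapped}")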

Routing context

The first three (Databricks-routed) metrics share the same routing shape:

<metric_name>:
  backend: databricks
  sql_template: source_query
  cache_seconds: 300

Why cache_seconds: 300 — Serverless Starter cold-starts are real (~5-10s). 300s caches across a single demo while staying fresh enough to feel live.

Why sql_template: source_query — the data resolver pulls the SQL template from metrics_catalog.source_query for the row. No inline override.
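
As a rough sketch (illustrative names only, not the actual resolver API), the lookup amounts to:

# Illustrative routing lookup, assuming the YAML shape above.
# resolve_template() and its signature are hypothetical.
from sqlalchemy import text

def resolve_template(session, routing: dict, metric_name: str) -> tuple[str, str, int]:
    entry = routing[metric_name]                      # e.g. {"backend": "databricks", ...}
    assert entry["sql_template"] == "source_query"    # no inline override in this prototype
    sql = session.execute(
        text("SELECT source_query FROM metrics_catalog WHERE name = :n"),
        {"n": metric_name},
    ).scalar_one()
    return entry["backend"], sql, entry.get("cache_seconds", 300)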

Parameter binding

Every Databricks-routed query uses a single parameter, {time_window_start}. The data resolver computes it from the metric's default_filter.window (currently trailing_30d):

| Window | Bound value (UTC) |
| --- | --- |
| trailing_24h | now() - INTERVAL '24 HOURS' |
| trailing_7d | now() - INTERVAL '7 DAYS' |
| trailing_30d (default) | now() - INTERVAL '30 DAYS' |
| trailing_90d | now() - INTERVAL '90 DAYS' |

The actual binding renders as a Databricks-compatible TIMESTAMP literal (e.g. TIMESTAMP '2026-04-06 04:50:00') at the SQL string level. Prompt 3's safety layer validates that the rendered SQL is SELECT-only and touches only allowlisted tables before execution.
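
A compact sketch of that binding-plus-guard step, assuming a hypothetical render_sql() helper (the real code is the Prompt 3/4 data resolver and safety layer; the allowlist below is illustrative):

# Illustrative sketch: compute the window bound, render the TIMESTAMP literal,
# then apply a SELECT-only / allowlist guard. Helper names are hypothetical.
from datetime import datetime, timedelta, timezone
import re

WINDOWS = {"trailing_24h": 1, "trailing_7d": 7, "trailing_30d": 30, "trailing_90d": 90}
ALLOWED_TABLES = {"l3_asurion.ev_claim", "l3_asurion.ev_product_catalog"}

def render_sql(template: str, window: str = "trailing_30d") -> str:
    start = datetime.now(timezone.utc) - timedelta(days=WINDOWS[window])
    literal = f"TIMESTAMP '{start:%Y-%m-%d %H:%M:%S}'"   # e.g. TIMESTAMP '2026-04-06 04:50:00'
    sql = template.replace("{time_window_start}", literal)
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements may be executed")
    tables = set(re.findall(r"(?:FROM|JOIN)\s+([\w.]+)", sql, flags=re.IGNORECASE))
    if not tables <= ALLOWED_TABLES:
        raise ValueError(f"non-allowlisted tables: {tables - ALLOWED_TABLES}")
    return sql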

Metric 1 — Claim Volume (Asurion)

Catalog row. name=claim_volume_l3_asurion, entity=ev_claim, source_schema=l3_asurion, source_table=ev_claim. Definition per dictionary kpi_catalog.csv row "Claim Volume".

Tile shape. Single-value KPI. Formats as integer count.

Generated SQL (window = trailing_30d):

SELECT COUNT(DISTINCT claim_id) AS claim_count
FROM l3_asurion.ev_claim
WHERE claim_started_dttm_utc >= TIMESTAMP '2026-04-06 04:50:00'

Expected output shape.

{
  "rows": [{"claim_count": 1842}],
  "metadata": {"source": "databricks", "cached_until": "..."}
}

Single row, single integer column. Frontend KpiTile renders the count.

What gets shown on the SourceBadge. Databricks · l3_asurion.ev_claim.

Metric 2 — Claims by Product (Asurion)

Catalog row. name=claims_by_product_l3_asurion, entity=ev_claim. Groups by product_catalog_name — the canonical product dimension per the dictionary's column_dictionary.csv (the column product_id is marked unstable and must not be used for grouping).

Tile shape. Bar chart, top-N by count, descending.

Generated SQL (window = trailing_30d):

SELECT product_catalog_name, COUNT(DISTINCT claim_id) AS claim_count
FROM l3_asurion.ev_claim
WHERE claim_started_dttm_utc >= TIMESTAMP '2026-04-06 04:50:00'
GROUP BY product_catalog_name
ORDER BY 2 DESC LIMIT 1000

Expected output shape.

{
  "rows": [
    {"product_catalog_name": "iPhone 15 Pro", "claim_count": 287},
    {"product_catalog_name": "Galaxy S24 Ultra", "claim_count": 211},
    {"product_catalog_name": "Pixel 8", "claim_count": 158},
    ...
  ],
  "metadata": {"source": "databricks", "cached_until": "..."}
}

The seeder fixes the catalog at 500 product names drawn from a ~20-model pool, so the top 5-10 names carry most of the claim volume and the long tail is sparse — exactly the shape a real Asurion product dimension would have.
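
A toy illustration of that skew, with a hypothetical pick_product() helper (the authoritative logic is in backend/app/metrics/seed.py):

# Illustrative only: a Zipf-ish weighting so the first handful of catalog names
# soak up most claims while the remaining ~490 form the sparse long tail.
import random

def pick_product(catalog: list[str]) -> str:
    weights = [1.0 / (rank + 1) for rank in range(len(catalog))]
    return random.choices(catalog, weights=weights, k=1)[0]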

Note on the LIMIT 1000. This is a defensive ceiling. The data resolver enforces a separate per-tile row limit (Prompt 4); the SQL-level cap exists so a Databricks-side glitch can't return a million rows by accident.

What gets shown on the SourceBadge. Databricks · l3_asurion.ev_claim (no join — this metric does NOT use ev_product_catalog; product_catalog_name is denormalised onto ev_claim per the dictionary).

Metric 3 — Claim Status Mix (Asurion)

Catalog row. name=claim_status_mix_l3_asurion, entity=ev_claim. Groups by claim_status_code. KPI catalog default visualisation: stacked bar.

Tile shape. Bar chart (or stacked bar with a future status-axis pivot — current renderer ships horizontal bars).

Generated SQL (window = trailing_30d):

SELECT claim_status_code, COUNT(DISTINCT claim_id) AS claim_count
FROM l3_asurion.ev_claim
WHERE claim_started_dttm_utc >= TIMESTAMP '2026-04-06 04:50:00'
GROUP BY claim_status_code

Expected output shape.

{
  "rows": [
    {"claim_status_code": "RESOLVED", "claim_count": 1014},
    {"claim_status_code": "IN_PROGRESS", "claim_count": 332},
    {"claim_status_code": "PENDING_PARTS", "claim_count": 129},
    {"claim_status_code": "OPEN", "claim_count": 187},
    {"claim_status_code": "CANCELLED", "claim_count": 92},
    {"claim_status_code": "DENIED", "claim_count": 88}
  ],
  "metadata": {"source": "databricks", "cached_until": "..."}
}

Status weighting in the seeder is intentional (RESOLVED ~55%, IN_PROGRESS ~18%, others single-digit %) so the chart isn't pathologically uniform — it tells a real-feeling story about ops workload.
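
A sketch of that weighting, with illustrative numbers rounded from the percentages above (the exact weights live in backend/app/metrics/seed.py and may differ):

# Illustrative status weighting; not the actual seed.py values.
import random

STATUS_WEIGHTS = {
    "RESOLVED": 0.55,
    "IN_PROGRESS": 0.18,
    "OPEN": 0.09,
    "PENDING_PARTS": 0.07,
    "CANCELLED": 0.06,
    "DENIED": 0.05,
}

def pick_status() -> str:
    return random.choices(list(STATUS_WEIGHTS), weights=list(STATUS_WEIGHTS.values()), k=1)[0]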

What gets shown on the SourceBadge. Databricks · l3_asurion.ev_claim.

Metric 4 — Cost avoided (MTD) — the §C.6.1 reroute path

Why this metric is special. It is the only metric the prototype runs through both backends. The default routing in config/metric_routing.yaml is backend: postgres — the v1 fast path. The §C.6.1 demo moment flips that single line to backend: databricks, runs make up (≤7s warm rebuild), and the same KPI tile re-renders against l3_asurion.ev_claim with no other code change. The audience sees the SourceBadge swap green→purple while the value updates from the Postgres synthetic figure to the Databricks-derived one. Both paths return the same single-row [{value: N}] shape so the renderer code path is unchanged.

Catalog row. name=cost_avoided_mtd, entity=outcomes (the Postgres-side entity for backwards compatibility with v1 widgets), source_schema=l3_asurion, source_table=ev_claim. The Postgres path uses the hardcoded WHERE in backend/app/widgets/data_resolver.py::_POSTGRES_QUERIES; the Databricks path uses the source_query column in metrics_catalog.

Tile shape. Single-value KPI. Currency-formatted ($1.41M / $375k).

Postgres path (default)

Hardcoded SQL (in _POSTGRES_QUERIES):

SELECT COALESCE(SUM(cost_avoided), 0)::float AS value
FROM outcomes
WHERE recorded_at >= date_trunc('month', now())

Live response (2026-05-06 pre-validation). value=1410000.0, source=postgres, execution_ms=5, live_data_unavailable=false. Receipt: artifacts/part-c-demo-ready/20260506-131614/cost_avoided_mtd_postgres.json.

Databricks path (after on-stage YAML flip)

metrics_catalog.source_query template:

SELECT COALESCE(SUM(cost_avoided_usd), 0) AS value
FROM l3_asurion.ev_claim
WHERE claim_started_dttm_utc >= {time_window_start}

Bedrock-generated SQL (window = trailing_30d):

SELECT COALESCE(SUM(cost_avoided_usd), 0) AS value
FROM l3_asurion.ev_claim
WHERE claim_started_dttm_utc >= current_timestamp() - INTERVAL 30 DAYS
LIMIT 5000

Live response (2026-05-06 pre-validation). value=374714.39, source=bedrock, execution_ms=765 (warm; first call after a cold warehouse adds 5-10s), live_data_unavailable=false. Generated SQL is preserved on the response so the SpecJsonView "Generated SQL" tab can show it. Receipt: artifacts/part-c-demo-ready/20260506-131614/cost_avoided_mtd_databricks.json.

The reroute itself is one YAML edit:

   cost_avoided_mtd:
-    backend: postgres
+    backend: databricks
     sql_template: source_query
     cache_seconds: 60

Followed by make up (warm rebuild ≤7s; the boot validator re-checks routing). The KPI tile re-fetches on the next useWidgetData refresh, the SourceBadge flips green→purple, and the value drops from the synthetic $1.41M to the Databricks-derived $375k. Restoring the YAML to postgres + make up swaps it back.
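
Conceptually, the flip works because the resolver branches on the routing backend and nothing else. A sketch under assumed names (the real branch lives in backend/app/widgets/data_resolver.py; fetch_metric, run_postgres, run_databricks and catalog_source_query are hypothetical stand-ins):

# Illustrative dispatch: one routing key decides which engine runs the metric.
_POSTGRES_QUERIES = {"cost_avoided_mtd": "<v1 hardcoded SQL, see the Postgres path above>"}

def run_postgres(sql: str) -> list[dict]: ...       # execute against Postgres
def run_databricks(sql: str) -> list[dict]: ...     # execute against the SQL warehouse
def catalog_source_query(name: str) -> str: ...     # read metrics_catalog.source_query

def fetch_metric(metric_name: str, routing: dict) -> dict:
    entry = routing[metric_name]
    if entry["backend"] == "postgres":
        rows = run_postgres(_POSTGRES_QUERIES[metric_name])
    else:
        rows = run_databricks(catalog_source_query(metric_name))
    # Both paths return the same [{"value": N}] row shape, so the tile renderer is untouched.
    return {"rows": rows, "metadata": {"source": entry["backend"]}}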

What gets shown on the SourceBadge (after the flip): Databricks · l3_asurion.ev_claim (purple). Click expands the generated SQL.

Pre-warming for the demo. Hit curl http://localhost:8000/v1/widgets/{cost_avoided_mtd_widget_id}/data -d '{"refresh":true}' ~30s before the §C.6.1 moment so Bedrock + the Serverless Starter warehouse are warm; the on-stage flip then lands in the 700ms-1s p95 band.

Verifying live

After make up, hit each metric's data endpoint (Prompt 4 will mount this; until then verify by direct catalog inspection):

# Confirm the Databricks-aware catalog rows exist with source_query populated
docker exec 2026-hackathon-db-1 psql -U cmdcenter -d cmdcenter -c "
  SELECT name, source_schema, source_table, LENGTH(source_query) AS query_len
  FROM metrics_catalog
  WHERE source_schema = 'l3_asurion';
"

# Confirm the routing validator passed (no error in api logs)
docker compose logs api | grep -E "routing|metric"

# Confirm the seeded tables have rows
make validate-dictionary

When something goes wrong

| Symptom | Most likely cause | Fix |
| --- | --- | --- |
| RuntimeError: Unmapped metric_ids in metrics_catalog: [...] at api startup | A metrics_catalog row exists without a routing entry | Add to config/metric_routing.yaml or TRUNCATE metrics_catalog and rebuild — see lessons-learned.md § demo-reset chicken-and-egg. |
| Validator complains about l3_asurion.ev_claim not found | The Databricks tables haven't been seeded | make seed-databricks-l3. |
| Query returns 0 rows | time_window_start is older than the seeded data, or the seeded data hasn't loaded | The seeder generates timestamps within the last 90 days, so trailing_30d should always have data. Re-run make seed-databricks-l3 --reset. |
| Query latency > 5s | Serverless Starter cold-start | The first query of the demo pays the cold tax; the rest hit the 300s cache. Pre-warm with curl /v1/databricks/health 30s before showtime. |

Cross-references