SQL Generation

When a user drags fields onto a card, Semaphor translates the domain template into SQL. Understanding this pipeline helps you write better templates.

The Translation Pipeline

Template Definition (YAML/JSON)
        |
        v
  Domain Resolver
  - Resolves calculated metrics/dimensions
  - Expands nested references
  - Converts named tokens to indexed tokens
        |
        v
  Card Configuration
  - Complete field definitions with expressions
  - All metadata needed for query generation
        |
        v
  Query Builder
  - Auto-join resolution (BFS path finding)
  - Table aliasing (t0, t1, t2...)
  - Token qualification (tokens → SQL)
  - Security policy injection (RCLS/CLS/TLS)
        |
        v
  Executable SQL

A key architectural decision: resolution happens at design time, not query time. When a user selects a domain metric for a card, the full expression and all input fields are resolved and saved into the card configuration. At query time, there's no domain lookup -- the card already has everything needed.

Single-Dataset Query

The simplest case: one dataset, one metric, one dimension.

User selects: orders.amount (SUM) grouped by orders.order_date

Generated SQL:

SELECT
  t0.order_date AS order_date,
  SUM(t0.amount) AS amount
FROM orders AS t0
GROUP BY t0.order_date
ORDER BY order_date

The orders table gets alias t0. The dimension goes into SELECT and GROUP BY. The metric gets wrapped in its modeled aggregate, SUM() in this example.

Auto-Join Query

When fields come from multiple datasets, Semaphor uses BFS to find the shortest join path through your relationships.

User selects: orders.amount (SUM) grouped by customers.segment

Generated SQL:

SELECT
  t1.segment AS segment,
  SUM(t0.amount) AS amount
FROM orders AS t0
LEFT JOIN customers AS t1 ON t0.customer_id = t1.id
GROUP BY t1.segment
ORDER BY segment

The query builder detects fields from two datasets, finds the relationship via BFS, and applies the LEFT JOIN from defaultJoinType. Multi-hop joins chain automatically (e.g., orders -> stores -> countries gives t0, t1, t2).

The maximum join depth is 2 hops. Fields that require 3+ hops will appear grayed out in the field explorer.

Nested Metric Expansion

Using the profit_margin metric from Calculated Metrics, here's how the resolution chain works:

Start with profit_margin: {profit} / NULLIF({revenue}, 0) * 100
{profit} expands to {revenue} - {cost}
{revenue} expands to SUM(t0.amount)
{cost} expands to SUM(t0.cost)

Generated SQL:

SELECT
  ((SUM(t0.amount) - SUM(t0.cost)) / NULLIF(SUM(t0.amount), 0) * 100) AS profit_margin
FROM orders t0

Four metrics resolved into a single SQL expression.

Calculated Dimension in GROUP BY

Calculated dimensions get their tokens replaced without aggregation:

User selects: order_month dimension, orders.amount (SUM) metric

Generated SQL:

SELECT
  DATE_TRUNC('month', t0.order_date) AS order_month,
  SUM(t0.amount) AS revenue
FROM orders AS t0
GROUP BY DATE_TRUNC('month', t0.order_date)
ORDER BY order_month

The expression appears in both SELECT and GROUP BY.

Cross-Dataset Metrics: Aggregate-Then-Join

When a calculated metric references fields from different datasets, a naive JOIN would cause metric inflation (fan-out). Semaphor solves this with separate CTEs.

User selects: net_revenue grouped by order_month (from the complete example)

Generated SQL:

WITH returns_agg AS (
  SELECT
    DATE_TRUNC('month', t0.return_date) AS order_month,
    SUM(t0.amount) AS net_revenue__field_0
  FROM returns AS t0
  GROUP BY DATE_TRUNC('month', t0.return_date)
),
orders_agg AS (
  SELECT
    DATE_TRUNC('month', t0.order_date) AS order_month,
    SUM(t0.amount) AS net_revenue__field_1
  FROM orders AS t0
  GROUP BY DATE_TRUNC('month', t0.order_date)
)
SELECT
  COALESCE(returns_agg.order_month, orders_agg.order_month) AS order_month,
  orders_agg.net_revenue__field_1 - COALESCE(returns_agg.net_revenue__field_0, 0) AS net_revenue
FROM returns_agg
FULL OUTER JOIN orders_agg ON returns_agg.order_month = orders_agg.order_month
ORDER BY order_month

Each dataset gets its own CTE that aggregates independently. The grain mapping tells Semaphor to use return_date for the returns CTE. A FULL OUTER JOIN ensures months that exist in only one dataset still appear.

Fan-Out Prevention

Even within a single metric, fan-out occurs when aggregating from one dataset but grouping by a joined one-to-many dataset. Semaphor pre-aggregates in a CTE:

Example: SUM(orders.amount) grouped by line_items.category

WITH base_agg AS (
  SELECT
    t0.order_id AS order_id,
    SUM(t0.amount) AS order_total
  FROM orders AS t0
  GROUP BY t0.order_id
)
SELECT
  t1.category AS category,
  SUM(t0.order_total) AS order_total
FROM base_agg AS t0
LEFT JOIN line_items AS t1 ON t0.order_id = t1.order_id
GROUP BY t1.category
ORDER BY category

For AVG metrics, Semaphor decomposes into SUM/COUNT in the base CTE and recomputes the average in the outer query to ensure mathematical correctness.

SQL Dialect Differences

Semaphor generates dialect-specific SQL based on your connectionType:

Operation	PostgreSQL	BigQuery	Snowflake
Date truncation	`DATE_TRUNC('month', col)`	`DATE_TRUNC(col, MONTH)`	`DATE_TRUNC('month', col)`
String concat	`CONCAT(a, b)` or `a \|\| b`	`CONCAT(a, b)`	`CONCAT(a, b)`
Column quoting	`"column"`	`column`	`"COLUMN"`

Use standard SQL functions in expressions. Semaphor handles dialect-specific translation automatically where possible. For database-specific functions, ensure your expression matches the dialect of your connection.

SQL Generation

On this page