SQL Generation
Understand how semantic domain templates translate to executable SQL queries
When a user drags fields onto a card, Semaphor translates the domain template into SQL. Understanding this pipeline helps you write better templates.
The Translation Pipeline
Template Definition (YAML/JSON)
|
v
Domain Resolver
- Resolves calculated metrics/dimensions
- Expands nested references
- Converts named tokens to indexed tokens
|
v
Card Configuration
- Complete field definitions with expressions
- All metadata needed for query generation
|
v
Query Builder
- Auto-join resolution (BFS path finding)
- Table aliasing (t0, t1, t2...)
- Token qualification (tokens → SQL)
- Security policy injection (RCLS/CLS/TLS)
|
v
Executable SQLA key architectural decision: resolution happens at design time, not query time. When a user selects a domain metric for a card, the full expression and all input fields are resolved and saved into the card configuration. At query time, there's no domain lookup -- the card already has everything needed.
Single-Dataset Query
The simplest case: one dataset, one metric, one dimension.
User selects: orders.amount (SUM) grouped by orders.order_date
Generated SQL:
SELECT
t0.order_date AS order_date,
SUM(t0.amount) AS amount
FROM orders AS t0
GROUP BY t0.order_date
ORDER BY order_dateThe orders table gets alias t0. The dimension goes into SELECT and GROUP BY. The metric gets wrapped in SUM().
Auto-Join Query
When fields come from multiple datasets, Semaphor uses BFS to find the shortest join path through your relationships.
User selects: orders.amount (SUM) grouped by customers.segment
Generated SQL:
SELECT
t1.segment AS segment,
SUM(t0.amount) AS amount
FROM orders AS t0
LEFT JOIN customers AS t1 ON t0.customer_id = t1.id
GROUP BY t1.segment
ORDER BY segmentThe query builder detects fields from two datasets, finds the relationship via BFS, and applies the LEFT JOIN from defaultJoinType. Multi-hop joins chain automatically (e.g., orders -> stores -> countries gives t0, t1, t2).
The maximum join depth is 2 hops. Fields that require 3+ hops will appear grayed out in the field explorer.
Nested Metric Expansion
Using the profit_margin metric from Calculated Metrics, here's how the resolution chain works:
- Start with
profit_margin:{profit} / NULLIF({revenue}, 0) * 100 {profit}expands to{revenue} - {cost}{revenue}expands toSUM(t0.amount){cost}expands toSUM(t0.cost)
Generated SQL:
SELECT
((SUM(t0.amount) - SUM(t0.cost)) / NULLIF(SUM(t0.amount), 0) * 100) AS profit_margin
FROM orders t0Four metrics resolved into a single SQL expression.
Calculated Dimension in GROUP BY
Calculated dimensions get their tokens replaced without aggregation:
User selects: order_month dimension, orders.amount (SUM) metric
Generated SQL:
SELECT
DATE_TRUNC('month', t0.order_date) AS order_month,
SUM(t0.amount) AS revenue
FROM orders AS t0
GROUP BY DATE_TRUNC('month', t0.order_date)
ORDER BY order_monthThe expression appears in both SELECT and GROUP BY.
Cross-Dataset Metrics: Aggregate-Then-Join
When a calculated metric references fields from different datasets, a naive JOIN would cause metric inflation (fan-out). Semaphor solves this with separate CTEs.
User selects: net_revenue grouped by order_month (from the complete example)
Generated SQL:
WITH returns_agg AS (
SELECT
DATE_TRUNC('month', t0.return_date) AS order_month,
SUM(t0.amount) AS net_revenue__field_0
FROM returns AS t0
GROUP BY DATE_TRUNC('month', t0.return_date)
),
orders_agg AS (
SELECT
DATE_TRUNC('month', t0.order_date) AS order_month,
SUM(t0.amount) AS net_revenue__field_1
FROM orders AS t0
GROUP BY DATE_TRUNC('month', t0.order_date)
)
SELECT
COALESCE(returns_agg.order_month, orders_agg.order_month) AS order_month,
orders_agg.net_revenue__field_1 - COALESCE(returns_agg.net_revenue__field_0, 0) AS net_revenue
FROM returns_agg
FULL OUTER JOIN orders_agg ON returns_agg.order_month = orders_agg.order_month
ORDER BY order_monthEach dataset gets its own CTE that aggregates independently. The grain mapping tells Semaphor to use return_date for the returns CTE. A FULL OUTER JOIN ensures months that exist in only one dataset still appear.
Fan-Out Prevention
Even within a single metric, fan-out occurs when aggregating from one dataset but grouping by a joined one-to-many dataset. Semaphor pre-aggregates in a CTE:
Example: SUM(orders.amount) grouped by line_items.category
WITH base_agg AS (
SELECT
t0.order_id AS order_id,
SUM(t0.amount) AS order_total
FROM orders AS t0
GROUP BY t0.order_id
)
SELECT
t1.category AS category,
SUM(t0.order_total) AS order_total
FROM base_agg AS t0
LEFT JOIN line_items AS t1 ON t0.order_id = t1.order_id
GROUP BY t1.category
ORDER BY categoryFor AVG metrics, Semaphor decomposes into SUM/COUNT in the base CTE and recomputes the average in the outer query to ensure mathematical correctness.
SQL Dialect Differences
Semaphor generates dialect-specific SQL based on your connectionType:
| Operation | PostgreSQL | BigQuery | Snowflake |
|---|---|---|---|
| Date truncation | DATE_TRUNC('month', col) | DATE_TRUNC(col, MONTH) | DATE_TRUNC('month', col) |
| String concat | CONCAT(a, b) or a || b | CONCAT(a, b) | CONCAT(a, b) |
| Column quoting | "column" | `column` | "COLUMN" |
Use standard SQL functions in expressions. Semaphor handles dialect-specific translation automatically where possible. For database-specific functions, ensure your expression matches the dialect of your connection.