SQL Generation
Understand how semantic domain templates translate to executable SQL queries
When a user drags fields onto a card, Semaphor translates the domain template into SQL. Understanding this pipeline helps you write better templates.
The Translation Pipeline
A key architectural decision: resolution happens at design time, not query time. When a user selects a domain metric for a card, the full expression and all input fields are resolved and saved into the card configuration. At query time, there's no domain lookup -- the card already has everything needed.
Single-Dataset Query
The simplest case: one dataset, one metric, one dimension.
User selects: orders.amount (SUM) grouped by orders.order_date
Generated SQL:
The orders table gets alias t0. The dimension goes into SELECT and GROUP BY. The metric gets wrapped in SUM().
Auto-Join Query
When fields come from multiple datasets, Semaphor uses BFS to find the shortest join path through your relationships.
User selects: orders.amount (SUM) grouped by customers.segment
Generated SQL:
The query builder detects fields from two datasets, finds the relationship via BFS, and applies the LEFT JOIN from defaultJoinType. Multi-hop joins chain automatically (e.g., orders -> stores -> countries gives t0, t1, t2).
The maximum join depth is 2 hops. Fields that require 3+ hops will appear grayed out in the field explorer.
Nested Metric Expansion
Using the profit_margin metric from Calculated Metrics, here's how the resolution chain works:
- Start with
profit_margin:{profit} / NULLIF({revenue}, 0) * 100 {profit}expands to{revenue} - {cost}{revenue}expands toSUM(t0.amount){cost}expands toSUM(t0.cost)
Generated SQL:
Four metrics resolved into a single SQL expression.
Calculated Dimension in GROUP BY
Calculated dimensions get their tokens replaced without aggregation:
User selects: order_month dimension, orders.amount (SUM) metric
Generated SQL:
The expression appears in both SELECT and GROUP BY.
Cross-Dataset Metrics: Aggregate-Then-Join
When a calculated metric references fields from different datasets, a naive JOIN would cause metric inflation (fan-out). Semaphor solves this with separate CTEs.
User selects: net_revenue grouped by order_month (from the complete example)
Generated SQL:
Each dataset gets its own CTE that aggregates independently. The grain mapping tells Semaphor to use return_date for the returns CTE. A FULL OUTER JOIN ensures months that exist in only one dataset still appear.
Fan-Out Prevention
Even within a single metric, fan-out occurs when aggregating from one dataset but grouping by a joined one-to-many dataset. Semaphor pre-aggregates in a CTE:
Example: SUM(orders.amount) grouped by line_items.category
For AVG metrics, Semaphor decomposes into SUM/COUNT in the base CTE and recomputes the average in the outer query to ensure mathematical correctness.
SQL Dialect Differences
Semaphor generates dialect-specific SQL based on your connectionType:
| Operation | PostgreSQL | BigQuery | Snowflake |
|---|---|---|---|
| Date truncation | DATE_TRUNC('month', col) | DATE_TRUNC(col, MONTH) | DATE_TRUNC('month', col) |
| String concat | CONCAT(a, b) or a || b | CONCAT(a, b) | CONCAT(a, b) |
| Column quoting | "column" | `column` | "COLUMN" |
Use standard SQL functions in expressions. Semaphor handles dialect-specific translation automatically where possible. For database-specific functions, ensure your expression matches the dialect of your connection.