Semaphor
Semantic Domains

Template Guide

Step-by-step tutorial for writing semantic domain templates by hand or programmatically

Semantic domains can be built through the UI or by writing YAML/JSON templates directly. This guide walks you through building a complete domain template from scratch, one concept at a time.

When to use template editing:

  • Bulk configuration of many fields or datasets
  • Version-controlled domains stored in Git
  • CI/CD pipelines that deploy domain changes automatically
  • Copying configurations between environments
  • AI agents or scripts generating domains programmatically

How to access the Code Editor:

  1. Open your domain in the Domain Editor
  2. Click the Code Editor tab
  3. Toggle between YAML and JSON format
  4. Edit the template and click Save

Minimal Template

The smallest valid domain template:

schemaVersion: "2.0"
datasets: []

All other top-level fields are optional and default to empty. The full skeleton:

schemaVersion: "2.0"
datasets: []
relationships: []
calculatedMetrics: {}
calculatedDimensions: {}
grainMappings: []

Step 1: Writing Datasets

Datasets represent your data sources. There are two types:

Physical Datasets

A physical dataset maps directly to a database table or view:

datasets:
  - name: orders
    label: "Orders"
    description: "Customer order transactions"
    type: physical
    connectionId: "conn_abc123"
    connectionType: PostgreSQL
    schema: public
    table: orders
    primaryKey:
      - order_id
    fields:
      identifiers: []
      dimensions: []
      metrics: []

Virtual Datasets

A virtual dataset uses a custom SQL query instead of a table:

datasets:
  - name: monthly_revenue
    label: "Monthly Revenue"
    type: virtual
    connectionId: "conn_abc123"
    connectionType: PostgreSQL
    sql: |
      SELECT
        DATE_TRUNC('month', order_date) AS month,
        SUM(amount) AS revenue,
        COUNT(*) AS order_count
      FROM orders
      GROUP BY DATE_TRUNC('month', order_date)
    fields:
      dimensions:
        - name: month
          label: "Month"
          dataType: date
      metrics:
        - name: revenue
          label: "Revenue"
          dataType: number
          sourceField: revenue
        - name: order_count
          label: "Order Count"
          dataType: number
          sourceField: order_count

Use virtual datasets for pre-aggregated data, complex transformations, or combining tables with custom logic.

Dataset Field Reference

FieldTypeRequiredDescription
namestringYesUnique identifier (snake_case)
labelstringYesUser-friendly display name
descriptionstringNoHelp text for users
typephysical | virtualYesTable-backed or SQL-backed
connectionIdstringYesID of the data connection
connectionTypeenumYesDatabase type
catalogstringNoDatabase catalog (BigQuery, Snowflake)
schemastringNoDatabase schema name
tablestringConditionalTable name (required for physical)
sqlstringConditionalSQL query (required for virtual)
primaryKeystring[]NoPrimary key columns (important for fan-out detection)
fieldsobjectYesContains identifiers, dimensions, metrics arrays

Supported connectionType values: PostgreSQL, MySQL, MSSQL, BigQuery, Redshift, Snowflake, clickhouse, S3, S3Tables, GoogleSheets, FileUpload, API


Step 2: Defining Fields

Every dataset has three field buckets: identifiers, dimensions, and metrics. Each serves a distinct role in query generation.

Identifiers

Identifiers are primary and foreign key fields used for joins:

identifiers:
  - name: order_id
    label: "Order ID"
    dataType: number
    type: primary
 
  - name: customer_id
    label: "Customer ID"
    dataType: number
    type: foreign
FieldTypeRequiredDescription
namestringYesColumn name in the database
labelstringNoDisplay name
descriptionstringNoHelp text
dataTypeenumYesstring, number, boolean, date, datetime, json, geo
typeprimary | foreignYesPrimary keys enable fan-out detection
hiddenbooleanNoHide from end users
tagsstring[]NoCategorization tags

Always mark primary keys with type: primary. This enables Semaphor's fan-out prevention, which avoids metric inflation when joining one-to-many relationships.

Dimensions

Dimensions are groupable fields that appear in GROUP BY clauses:

dimensions:
  - name: order_date
    label: "Order Date"
    dataType: date
    granularity: day
 
  - name: status
    label: "Status"
    dataType: string
 
  - name: region
    label: "Region"
    dataType: string
    isDisplayField: true
FieldTypeRequiredDescription
namestringYesColumn name in the database
labelstringNoDisplay name
descriptionstringNoHelp text
dataTypeenumYesstring, number, boolean, date, datetime, json, geo
granularityenumNoTime granularity: day, week, month, quarter, year, hour, minute, second
isDisplayFieldbooleanNoMark as the default display label for this dataset
dateFormatstringNoCustom date display format
customFormatstringNoCustom format string
hiddenbooleanNoHide from end users
tagsstring[]NoCategorization tags

Metrics

Metrics are aggregatable fields that produce numeric results:

metrics:
  - name: amount
    label: "Revenue"
    dataType: number
    sourceField: amount
 
  - name: quantity
    label: "Quantity"
    dataType: number
    sourceField: quantity
 
  - name: discounted_total
    label: "Discounted Total"
    dataType: number
    expression: "amount * (1 - discount_rate)"
FieldTypeRequiredDescription
namestringYesUnique metric name
labelstringNoDisplay name
descriptionstringNoHelp text
dataTypeenumYesUsually number
sourceFieldstringConditionalPhysical column name. Required if no expression
expressionstringConditionalSQL expression. Required if no sourceField
formatstringNoDisplay format hint
unitstringNoUnit label (e.g., "USD", "kg")
isDefaultbooleanNoMark as the default metric for this dataset
hiddenbooleanNoHide from end users
filtersarrayNoDefault filters applied to this metric
tagsstring[]NoCategorization tags

Every metric must have either sourceField or expression. If neither is provided, validation will fail.

Complete Fields Example

Putting it all together for an orders dataset:

datasets:
  - name: orders
    label: "Orders"
    type: physical
    connectionId: "conn_abc123"
    connectionType: PostgreSQL
    schema: public
    table: orders
    primaryKey: [order_id]
    fields:
      identifiers:
        - name: order_id
          label: "Order ID"
          dataType: number
          type: primary
        - name: customer_id
          label: "Customer ID"
          dataType: number
          type: foreign
      dimensions:
        - name: order_date
          label: "Order Date"
          dataType: date
          granularity: day
        - name: status
          label: "Status"
          dataType: string
        - name: channel
          label: "Sales Channel"
          dataType: string
      metrics:
        - name: amount
          label: "Revenue"
          dataType: number
          sourceField: amount
        - name: quantity
          label: "Quantity"
          dataType: number
          sourceField: quantity

Step 3: Defining Relationships

Relationships define how datasets join together. They enable automatic join resolution -- when a user selects fields from multiple datasets, Semaphor finds the shortest join path automatically.

relationships:
  - name: orders_to_customers
    sourceDataset: orders
    sourceFields: [customer_id]
    targetDataset: customers
    targetFields: [id]
    cardinality: many_to_one
    defaultJoinType: LEFT
    isAutoJoin: true
    discoveredBy: user
    isActive: true

Relationship Field Reference

FieldTypeRequiredDescription
idstringNoUnique ID (auto-generated if omitted)
namestringYesDescriptive name (e.g., orders_to_customers)
sourceDatasetstringYesDataset name where the join starts
sourceFieldsstring[]YesColumn(s) in the source dataset
targetDatasetstringYesDataset name being joined to
targetFieldsstring[]YesColumn(s) in the target dataset
cardinalityenumNoone_to_one, one_to_many, many_to_one, many_to_many. Defaults to one_to_one
defaultJoinTypeenumYesINNER, LEFT, RIGHT, FULL
isAutoJoinbooleanYesMust be true for automatic join resolution
joinPrioritynumberNoHigher priority used first when multiple paths exist
discoveredByenumYesuser for hand-authored, ai for AI-discovered, fk_constraint for database FK
confidenceenumNohigh, medium, low
isActivebooleanYesSet to false to disable without deleting
descriptionstringNoExplanation of the relationship

Cardinality Guide

CardinalityMeaningExample
one_to_oneEach source row matches exactly one target rowusers to user_profiles
one_to_manyEach source row can match multiple target rowsorders to line_items
many_to_oneMultiple source rows match one target roworders to customers
many_to_manyBoth sides can have multiple matchesstudents to courses (via bridge table)

Setting cardinality correctly is important. Semaphor uses it for fan-out detection -- preventing metric inflation that happens when aggregating across one-to-many joins.

Star Schema Example

A classic fact-and-dimension star schema with 3 relationships:

relationships:
  - name: orders_to_customers
    sourceDataset: orders
    sourceFields: [customer_id]
    targetDataset: customers
    targetFields: [id]
    cardinality: many_to_one
    defaultJoinType: LEFT
    isAutoJoin: true
    discoveredBy: user
    isActive: true
 
  - name: orders_to_products
    sourceDataset: orders
    sourceFields: [product_id]
    targetDataset: products
    targetFields: [id]
    cardinality: many_to_one
    defaultJoinType: LEFT
    isAutoJoin: true
    discoveredBy: user
    isActive: true
 
  - name: orders_to_stores
    sourceDataset: orders
    sourceFields: [store_id]
    targetDataset: stores
    targetFields: [id]
    cardinality: many_to_one
    defaultJoinType: LEFT
    isAutoJoin: true
    discoveredBy: user
    isActive: true

Composite Key Joins

For joins on multiple columns, list all fields:

relationships:
  - name: sales_to_targets
    sourceDataset: sales
    sourceFields: [region, product_id, quarter]
    targetDataset: targets
    targetFields: [region, product_id, quarter]
    cardinality: many_to_one
    defaultJoinType: LEFT
    isAutoJoin: true
    discoveredBy: user
    isActive: true

Multi-Hop Joins

Semaphor supports up to 2-hop joins. For example, if you define:

  • orders -> stores (1 hop)
  • stores -> countries (1 hop)

Then users can query orders.amount grouped by countries.name (2 hops). Semaphor's BFS resolver will find the path: orders -> stores -> countries.


Step 4: Creating Calculated Metrics

Calculated metrics define reusable business KPIs using token-based expressions. They are stored in calculatedMetrics as a record where the key is the metric name.

Expression Syntax

Expressions use {input_name} tokens as placeholders for field references. The tokens are replaced with fully-qualified SQL at query time.

calculatedMetrics:
  average_order_value:
    label: "Average Order Value"
    expression: "{total_revenue} / NULLIF({order_count}, 0)"
    inputs:
      total_revenue:
        dataset: orders
        field: amount
        aggregate: SUM
      order_count:
        dataset: orders
        field: order_id
        aggregate: COUNT
    format:
      type: currency
      decimals: 2
      currency: USD

Two Input Types

Column reference -- points to a physical column in a dataset:

inputs:
  total_revenue:
    dataset: orders     # Dataset name
    field: amount       # Column name
    aggregate: SUM      # Aggregation function

Metric reference -- points to another calculated metric (enables nesting):

inputs:
  profit:
    metric: profit      # Name of another calculated metric

Aggregation Rules

There are two places aggregation can be specified, and the rules are:

1. Expression wraps the token in an aggregate function:

# The SUM is in the expression -- no aggregate needed on the input
revenue:
  expression: "SUM({amount})"
  inputs:
    amount: { dataset: orders, field: amount }

2. Expression uses bare tokens:

# The tokens are bare -- each input MUST specify an aggregate
avg_order_value:
  expression: "{total_revenue} / NULLIF({order_count}, 0)"
  inputs:
    total_revenue: { dataset: orders, field: amount, aggregate: SUM }
    order_count: { dataset: orders, field: order_id, aggregate: COUNT }

If a token has no aggregate in either place, validation fails with error E010.

Supported aggregates: SUM, COUNT, AVG, MIN, MAX, MEDIAN, DISTINCT

Nesting Calculated Metrics

Calculated metrics can reference other calculated metrics. This is how you build layered KPIs:

calculatedMetrics:
  # Layer 1: Base metrics
  revenue:
    label: "Revenue"
    expression: "SUM({amount})"
    inputs:
      amount: { dataset: orders, field: amount }
 
  cost:
    label: "Cost"
    expression: "SUM({cost_amount})"
    inputs:
      cost_amount: { dataset: orders, field: cost }
 
  # Layer 2: Derived from base metrics
  profit:
    label: "Profit"
    expression: "{revenue} - {cost}"
    inputs:
      revenue: { metric: revenue }
      cost: { metric: cost }
 
  # Layer 3: Derived from derived metrics
  profit_margin:
    label: "Profit Margin %"
    expression: "{profit} / NULLIF({revenue}, 0) * 100"
    inputs:
      profit: { metric: profit }
      revenue: { metric: revenue }

At query time, Semaphor recursively expands nested references. The profit_margin metric resolves to:

((SUM(t0.amount) - SUM(t0.cost)) / NULLIF(SUM(t0.amount), 0) * 100)

Maximum nesting depth is 10 levels. Circular references (A references B which references A) are detected and rejected with error E001.

Calculated Metric Field Reference

FieldTypeRequiredDescription
labelstringYesDisplay name
descriptionstringNoHelp text
expressionstringYesSQL expression with {input} tokens
inputsobjectYesMap of input name to column or metric reference
metricTypeenumNobase, derived, calculated. Defaults to derived
aggregationStrategyenumNodefault, symmetric_aggregate, aggregate_then_join, weighted
formatobjectNoDisplay formatting (see Format Spec below)
tagsstring[]NoCategorization tags
filtersarrayNoDefault filters applied to this metric

Format Spec:

format:
  type: currency        # number, currency, percentage, date, string
  decimals: 2           # 0-10
  currency: USD         # For currency type
  prefix: "$"           # Optional prefix
  suffix: ""            # Optional suffix
  thousandsSeparator: true

Step 5: Creating Calculated Dimensions

Calculated dimensions are row-level computed fields that appear in GROUP BY clauses. Unlike calculated metrics, they produce non-aggregated values -- they transform each row before any grouping happens.

calculatedDimensions:
  order_month:
    label: "Order Month"
    expression: "DATE_TRUNC('month', {order_date})"
    inputs:
      order_date: { dataset: orders, field: order_date }
    dataType: date

Input Types

Column reference -- points to a physical column:

inputs:
  order_date: { dataset: orders, field: order_date }

Dimension reference -- points to another calculated dimension:

inputs:
  order_month: { dimension: order_month }

Calculated dimensions cannot reference metrics. If you try, validation fails with error E007.

Common Patterns

Date truncation:

order_quarter:
  label: "Order Quarter"
  expression: "DATE_TRUNC('quarter', {order_date})"
  inputs:
    order_date: { dataset: orders, field: order_date }
  dataType: date

CASE WHEN bucketing:

order_size:
  label: "Order Size"
  expression: |
    CASE
      WHEN {amount} > 1000 THEN 'Large'
      WHEN {amount} > 100 THEN 'Medium'
      ELSE 'Small'
    END
  inputs:
    amount: { dataset: orders, field: amount }
  dataType: string

Combining text fields:

full_name:
  label: "Full Name"
  expression: "CONCAT({first_name}, ' ', {last_name})"
  inputs:
    first_name: { dataset: customers, field: first_name }
    last_name: { dataset: customers, field: last_name }
  dataType: string

Handling nulls:

region_clean:
  label: "Region"
  expression: "COALESCE({region}, 'Unknown')"
  inputs:
    region: { dataset: customers, field: region }
  dataType: string

Calculated Dimension Field Reference

FieldTypeRequiredDescription
labelstringYesDisplay name
descriptionstringNoHelp text
expressionstringYesSQL expression with {input} tokens
inputsobjectYesMap of input name to column or dimension reference
dataTypeenumNostring, number, boolean, date, datetime
grainMappingobjectNoMaps dimension to columns in other datasets (see Grain Mappings)

Step 6: Grain Mappings

Grain mappings tell Semaphor how a calculated dimension maps to columns in other datasets. They are needed when a calculated metric references fields from multiple datasets and you group by a calculated dimension.

The problem: If you define order_month as DATE_TRUNC('month', orders.order_date), Semaphor needs to know what column in the returns dataset represents the same time grain.

grainMappings:
  - sourceDimension: order_month        # The calculated dimension
    targetDataset: returns              # The other dataset
    targetColumn: return_date           # The column to DATE_TRUNC in that dataset

When are grain mappings needed?

Only when all three conditions are true:

  1. You have a calculated metric that spans multiple datasets (cross-dataset metric)
  2. You're grouping by a calculated dimension
  3. The dimension's time grain needs to be applied to columns in other datasets

Example: Net Revenue (orders.amount - returns.amount) grouped by Order Month requires a grain mapping so Semaphor knows to use DATE_TRUNC('month', returns.return_date) for the returns dataset CTE.

Without the grain mapping, Semaphor wouldn't know how to align the returns data to the same monthly grain as orders.


Complete Example: E-Commerce Domain

Here's a complete domain template combining all concepts -- 3 datasets, 3 relationships, nested calculated metrics, calculated dimensions, and a grain mapping:

schemaVersion: "2.0"
 
datasets:
  - name: orders
    label: "Orders"
    type: physical
    connectionId: "conn_1"
    connectionType: PostgreSQL
    schema: public
    table: orders
    primaryKey: [order_id]
    fields:
      identifiers:
        - name: order_id
          dataType: number
          type: primary
        - name: customer_id
          dataType: number
          type: foreign
      dimensions:
        - name: order_date
          label: "Order Date"
          dataType: date
          granularity: day
        - name: status
          label: "Status"
          dataType: string
      metrics:
        - name: amount
          label: "Revenue"
          dataType: number
          sourceField: amount
        - name: cost
          label: "Cost"
          dataType: number
          sourceField: cost
 
  - name: customers
    label: "Customers"
    type: physical
    connectionId: "conn_1"
    connectionType: PostgreSQL
    schema: public
    table: customers
    primaryKey: [id]
    fields:
      identifiers:
        - name: id
          dataType: number
          type: primary
      dimensions:
        - name: segment
          label: "Customer Segment"
          dataType: string
        - name: region
          label: "Region"
          dataType: string
 
  - name: returns
    label: "Returns"
    type: physical
    connectionId: "conn_1"
    connectionType: PostgreSQL
    schema: public
    table: returns
    fields:
      identifiers: []
      dimensions:
        - name: return_date
          label: "Return Date"
          dataType: date
      metrics:
        - name: amount
          label: "Return Amount"
          dataType: number
          sourceField: amount
 
relationships:
  - name: orders_to_customers
    sourceDataset: orders
    sourceFields: [customer_id]
    targetDataset: customers
    targetFields: [id]
    cardinality: many_to_one
    defaultJoinType: LEFT
    isAutoJoin: true
    discoveredBy: user
    isActive: true
 
calculatedMetrics:
  revenue:
    label: "Total Revenue"
    expression: "SUM({amount})"
    inputs:
      amount: { dataset: orders, field: amount }
 
  cost:
    label: "Total Cost"
    expression: "SUM({cost_amount})"
    inputs:
      cost_amount: { dataset: orders, field: cost }
 
  profit:
    label: "Profit"
    expression: "{revenue} - {cost}"
    inputs:
      revenue: { metric: revenue }
      cost: { metric: cost }
 
  profit_margin:
    label: "Profit Margin %"
    expression: "{profit} / NULLIF({revenue}, 0) * 100"
    inputs:
      profit: { metric: profit }
      revenue: { metric: revenue }
    format:
      type: percentage
      decimals: 1
 
  net_revenue:
    label: "Net Revenue"
    description: "Revenue minus returns"
    expression: "{total_sales} - COALESCE({total_returns}, 0)"
    inputs:
      total_sales: { dataset: orders, field: amount, aggregate: SUM }
      total_returns: { dataset: returns, field: amount, aggregate: SUM }
 
calculatedDimensions:
  order_month:
    label: "Order Month"
    expression: "DATE_TRUNC('month', {order_date})"
    inputs:
      order_date: { dataset: orders, field: order_date }
    dataType: date
 
  order_size:
    label: "Order Size"
    expression: |
      CASE
        WHEN {amount} > 1000 THEN 'Large'
        WHEN {amount} > 100 THEN 'Medium'
        ELSE 'Small'
      END
    inputs:
      amount: { dataset: orders, field: amount }
    dataType: string
 
grainMappings:
  - sourceDimension: order_month
    targetDataset: returns
    targetColumn: return_date

How Templates Translate to SQL

When a user drags fields onto a card, Semaphor translates the domain template into SQL. Understanding this pipeline helps you write better templates.

The Translation Pipeline

Template Definition (YAML/JSON)
        |
        v
  Domain Resolver
  - Resolves calculated metrics/dimensions
  - Expands nested references
  - Converts named tokens to indexed tokens
        |
        v
  Card Configuration
  - Complete field definitions with expressions
  - All metadata needed for query generation
        |
        v
  Query Builder
  - Auto-join resolution (BFS path finding)
  - Table aliasing (t0, t1, t2...)
  - Token qualification (tokens → SQL)
  - Security policy injection (RCLS/CLS/TLS)
        |
        v
  Executable SQL

A key architectural decision: resolution happens at design time, not query time. When a user selects a domain metric for a card, the full expression and all input fields are resolved and saved into the card configuration. At query time, there's no domain lookup -- the card already has everything needed.

Single-Dataset Query

The simplest case: one dataset, one metric, one dimension.

User selects: orders.amount (SUM) grouped by orders.order_date

Generated SQL:

SELECT
  t0.order_date AS order_date,
  SUM(t0.amount) AS amount
FROM orders AS t0
GROUP BY t0.order_date
ORDER BY order_date

The orders table gets alias t0. The dimension goes into SELECT and GROUP BY. The metric gets wrapped in SUM().

Auto-Join Query

When fields come from multiple datasets, Semaphor uses BFS to find the shortest join path through your relationships.

User selects: orders.amount (SUM) grouped by customers.segment

Generated SQL:

SELECT
  t1.segment AS segment,
  SUM(t0.amount) AS amount
FROM orders AS t0
LEFT JOIN customers AS t1 ON t0.customer_id = t1.id
GROUP BY t1.segment
ORDER BY segment

The query builder detects fields from two datasets, finds the relationship via BFS, and applies the LEFT JOIN from defaultJoinType. Multi-hop joins chain automatically (e.g., orders -> stores -> countries gives t0, t1, t2).

The maximum join depth is 2 hops. Fields that require 3+ hops will appear grayed out in the field explorer.

Nested Metric Expansion

Using the profit_margin metric from Step 4 above, here's how the resolution chain works:

  1. Start with profit_margin: {profit} / NULLIF({revenue}, 0) * 100
  2. {profit} expands to {revenue} - {cost}
  3. {revenue} expands to SUM(t0.amount)
  4. {cost} expands to SUM(t0.cost)

Generated SQL:

SELECT
  ((SUM(t0.amount) - SUM(t0.cost)) / NULLIF(SUM(t0.amount), 0) * 100) AS profit_margin
FROM orders t0

Four metrics resolved into a single SQL expression.

Calculated Dimension in GROUP BY

Calculated dimensions get their tokens replaced without aggregation:

User selects: order_month dimension, orders.amount (SUM) metric

Generated SQL:

SELECT
  DATE_TRUNC('month', t0.order_date) AS order_month,
  SUM(t0.amount) AS revenue
FROM orders AS t0
GROUP BY DATE_TRUNC('month', t0.order_date)
ORDER BY order_month

The expression appears in both SELECT and GROUP BY.

Cross-Dataset Metrics: Aggregate-Then-Join

When a calculated metric references fields from different datasets, a naive JOIN would cause metric inflation (fan-out). Semaphor solves this with separate CTEs.

User selects: net_revenue grouped by order_month (from the complete example above)

Generated SQL:

WITH returns_agg AS (
  SELECT
    DATE_TRUNC('month', t0.return_date) AS order_month,
    SUM(t0.amount) AS net_revenue__field_0
  FROM returns AS t0
  GROUP BY DATE_TRUNC('month', t0.return_date)
),
orders_agg AS (
  SELECT
    DATE_TRUNC('month', t0.order_date) AS order_month,
    SUM(t0.amount) AS net_revenue__field_1
  FROM orders AS t0
  GROUP BY DATE_TRUNC('month', t0.order_date)
)
SELECT
  COALESCE(returns_agg.order_month, orders_agg.order_month) AS order_month,
  orders_agg.net_revenue__field_1 - COALESCE(returns_agg.net_revenue__field_0, 0) AS net_revenue
FROM returns_agg
FULL OUTER JOIN orders_agg ON returns_agg.order_month = orders_agg.order_month
ORDER BY order_month

Each dataset gets its own CTE that aggregates independently. The grain mapping tells Semaphor to use return_date for the returns CTE. A FULL OUTER JOIN ensures months that exist in only one dataset still appear.

Fan-Out Prevention

Even within a single metric, fan-out occurs when aggregating from one dataset but grouping by a joined one-to-many dataset. Semaphor pre-aggregates in a CTE:

Example: SUM(orders.amount) grouped by line_items.category

WITH base_agg AS (
  SELECT
    t0.order_id AS order_id,
    SUM(t0.amount) AS order_total
  FROM orders AS t0
  GROUP BY t0.order_id
)
SELECT
  t1.category AS category,
  SUM(t0.order_total) AS order_total
FROM base_agg AS t0
LEFT JOIN line_items AS t1 ON t0.order_id = t1.order_id
GROUP BY t1.category
ORDER BY category

For AVG metrics, Semaphor decomposes into SUM/COUNT in the base CTE and recomputes the average in the outer query to ensure mathematical correctness.

SQL Dialect Differences

Semaphor generates dialect-specific SQL based on your connectionType:

OperationPostgreSQLBigQuerySnowflake
Date truncationDATE_TRUNC('month', col)DATE_TRUNC(col, MONTH)DATE_TRUNC('month', col)
String concatCONCAT(a, b) or a || bCONCAT(a, b)CONCAT(a, b)
Column quoting"column"`column`"COLUMN"

Use standard SQL functions in expressions. Semaphor handles dialect-specific translation automatically where possible. For database-specific functions, ensure your expression matches the dialect of your connection.


Validation Errors

When you save a domain template, Semaphor validates it and reports errors. Here are the validation codes you may encounter:

Errors

CodeMeaningFix
E001Circular dependency detected (e.g., metric A references B which references A)Break the cycle by removing one of the references
E002Input references an unknown metric nameCheck that the referenced metric exists in calculatedMetrics
E003Input references an unknown field in a datasetVerify the dataset has that field in its field lists
E004Input references an unknown dataset nameEnsure the dataset is defined in the datasets array
E005Token {name} in expression not found in inputsAdd the missing token name to the inputs map
E007Calculated dimension references a metricDimensions can only reference columns or other dimensions
E008Invalid filter on a metricCheck filter syntax (field + operator + value)
E009Invalid expression syntaxRemove braces inside string literals
E010Input requires an aggregate but none specifiedAdd aggregate to the input or wrap token in aggregate function
E011Input references an unknown calculated dimensionCheck that the dimension exists in calculatedDimensions

Warnings

CodeMeaningAction
E006Input is defined but not used in expressionRemove the unused input or add it to the expression
W001Dataset has no primary key signalAdd primaryKey or mark identifier as type: primary
W002Relationship may cause metric inflationConsider using aggregate_then_join strategy or review cardinality

Next Steps

  • Agent Guide -- Build templates programmatically with validation rules
  • User Guide -- How end users interact with your domain in dashboards