Template Guide
Step-by-step tutorial for writing semantic domain templates by hand or programmatically
Semantic domains can be built through the UI or by writing YAML/JSON templates directly. This guide walks you through building a complete domain template from scratch, one concept at a time.
When to use template editing:
- Bulk configuration of many fields or datasets
- Version-controlled domains stored in Git
- CI/CD pipelines that deploy domain changes automatically
- Copying configurations between environments
- AI agents or scripts generating domains programmatically
How to access the Code Editor:
- Open your domain in the Domain Editor
- Click the Code Editor tab
- Toggle between YAML and JSON format
- Edit the template and click Save
Minimal Template
The smallest valid domain template:
All other top-level fields are optional and default to empty. The full skeleton:
Step 1: Writing Datasets
Datasets represent your data sources. There are two types:
Physical Datasets
A physical dataset maps directly to a database table or view:
Virtual Datasets
A virtual dataset uses a custom SQL query instead of a table:
Use virtual datasets for pre-aggregated data, complex transformations, or combining tables with custom logic.
Dataset Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Unique identifier (snake_case) |
label | string | Yes | User-friendly display name |
description | string | No | Help text for users |
type | physical | virtual | Yes | Table-backed or SQL-backed |
connectionId | string | Yes | ID of the data connection |
connectionType | enum | Yes | Database type |
catalog | string | No | Database catalog (BigQuery, Snowflake) |
schema | string | No | Database schema name |
table | string | Conditional | Table name (required for physical) |
sql | string | Conditional | SQL query (required for virtual) |
primaryKey | string[] | No | Primary key columns (important for fan-out detection) |
fields | object | Yes | Contains identifiers, dimensions, metrics arrays |
Supported connectionType values: PostgreSQL, MySQL, MSSQL, BigQuery, Redshift, Snowflake, clickhouse, S3, S3Tables, GoogleSheets, FileUpload, API
Step 2: Defining Fields
Every dataset has three field buckets: identifiers, dimensions, and metrics. Each serves a distinct role in query generation.
Identifiers
Identifiers are primary and foreign key fields used for joins:
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Column name in the database |
label | string | No | Display name |
description | string | No | Help text |
dataType | enum | Yes | string, number, boolean, date, datetime, json, geo |
type | primary | foreign | Yes | Primary keys enable fan-out detection |
hidden | boolean | No | Hide from end users |
tags | string[] | No | Categorization tags |
Always mark primary keys with type: primary. This enables Semaphor's fan-out prevention, which avoids metric inflation when joining one-to-many relationships.
Dimensions
Dimensions are groupable fields that appear in GROUP BY clauses:
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Column name in the database |
label | string | No | Display name |
description | string | No | Help text |
dataType | enum | Yes | string, number, boolean, date, datetime, json, geo |
granularity | enum | No | Time granularity: day, week, month, quarter, year, hour, minute, second |
isDisplayField | boolean | No | Mark as the default display label for this dataset |
dateFormat | string | No | Custom date display format |
customFormat | string | No | Custom format string |
hidden | boolean | No | Hide from end users |
tags | string[] | No | Categorization tags |
Metrics
Metrics are aggregatable fields that produce numeric results:
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Unique metric name |
label | string | No | Display name |
description | string | No | Help text |
dataType | enum | Yes | Usually number |
sourceField | string | Conditional | Physical column name. Required if no expression |
expression | string | Conditional | SQL expression. Required if no sourceField |
format | string | No | Display format hint |
unit | string | No | Unit label (e.g., "USD", "kg") |
isDefault | boolean | No | Mark as the default metric for this dataset |
hidden | boolean | No | Hide from end users |
filters | array | No | Default filters applied to this metric |
tags | string[] | No | Categorization tags |
Every metric must have either sourceField or expression. If neither is provided, validation will fail.
Complete Fields Example
Putting it all together for an orders dataset:
Step 3: Defining Relationships
Relationships define how datasets join together. They enable automatic join resolution -- when a user selects fields from multiple datasets, Semaphor finds the shortest join path automatically.
Relationship Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
id | string | No | Unique ID (auto-generated if omitted) |
name | string | Yes | Descriptive name (e.g., orders_to_customers) |
sourceDataset | string | Yes | Dataset name where the join starts |
sourceFields | string[] | Yes | Column(s) in the source dataset |
targetDataset | string | Yes | Dataset name being joined to |
targetFields | string[] | Yes | Column(s) in the target dataset |
cardinality | enum | No | one_to_one, one_to_many, many_to_one, many_to_many. Defaults to one_to_one |
defaultJoinType | enum | Yes | INNER, LEFT, RIGHT, FULL |
isAutoJoin | boolean | Yes | Must be true for automatic join resolution |
joinPriority | number | No | Higher priority used first when multiple paths exist |
discoveredBy | enum | Yes | user for hand-authored, ai for AI-discovered, fk_constraint for database FK |
confidence | enum | No | high, medium, low |
isActive | boolean | Yes | Set to false to disable without deleting |
description | string | No | Explanation of the relationship |
Cardinality Guide
| Cardinality | Meaning | Example |
|---|---|---|
one_to_one | Each source row matches exactly one target row | users to user_profiles |
one_to_many | Each source row can match multiple target rows | orders to line_items |
many_to_one | Multiple source rows match one target row | orders to customers |
many_to_many | Both sides can have multiple matches | students to courses (via bridge table) |
Setting cardinality correctly is important. Semaphor uses it for fan-out detection -- preventing metric inflation that happens when aggregating across one-to-many joins.
Star Schema Example
A classic fact-and-dimension star schema with 3 relationships:
Composite Key Joins
For joins on multiple columns, list all fields:
Multi-Hop Joins
Semaphor supports up to 2-hop joins. For example, if you define:
orders->stores(1 hop)stores->countries(1 hop)
Then users can query orders.amount grouped by countries.name (2 hops). Semaphor's BFS resolver will find the path: orders -> stores -> countries.
Step 4: Creating Calculated Metrics
Calculated metrics define reusable business KPIs using token-based expressions. They are stored in calculatedMetrics as a record where the key is the metric name.
Expression Syntax
Expressions use {input_name} tokens as placeholders for field references. The tokens are replaced with fully-qualified SQL at query time.
Two Input Types
Column reference -- points to a physical column in a dataset:
Metric reference -- points to another calculated metric (enables nesting):
Aggregation Rules
There are two places aggregation can be specified, and the rules are:
1. Expression wraps the token in an aggregate function:
2. Expression uses bare tokens:
If a token has no aggregate in either place, validation fails with error E010.
Supported aggregates: SUM, COUNT, AVG, MIN, MAX, MEDIAN, DISTINCT
Nesting Calculated Metrics
Calculated metrics can reference other calculated metrics. This is how you build layered KPIs:
At query time, Semaphor recursively expands nested references. The profit_margin metric resolves to:
Maximum nesting depth is 10 levels. Circular references (A references B which references A) are detected and rejected with error E001.
Calculated Metric Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
label | string | Yes | Display name |
description | string | No | Help text |
expression | string | Yes | SQL expression with {input} tokens |
inputs | object | Yes | Map of input name to column or metric reference |
metricType | enum | No | base, derived, calculated. Defaults to derived |
aggregationStrategy | enum | No | default, symmetric_aggregate, aggregate_then_join, weighted |
format | object | No | Display formatting (see Format Spec below) |
tags | string[] | No | Categorization tags |
filters | array | No | Default filters applied to this metric |
Format Spec:
Step 5: Creating Calculated Dimensions
Calculated dimensions are row-level computed fields that appear in GROUP BY clauses. Unlike calculated metrics, they produce non-aggregated values -- they transform each row before any grouping happens.
Input Types
Column reference -- points to a physical column:
Dimension reference -- points to another calculated dimension:
Calculated dimensions cannot reference metrics. If you try, validation fails with error E007.
Common Patterns
Date truncation:
CASE WHEN bucketing:
Combining text fields:
Handling nulls:
Calculated Dimension Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
label | string | Yes | Display name |
description | string | No | Help text |
expression | string | Yes | SQL expression with {input} tokens |
inputs | object | Yes | Map of input name to column or dimension reference |
dataType | enum | No | string, number, boolean, date, datetime |
grainMapping | object | No | Maps dimension to columns in other datasets (see Grain Mappings) |
Step 6: Grain Mappings
Grain mappings tell Semaphor how a calculated dimension maps to columns in other datasets. They are needed when a calculated metric references fields from multiple datasets and you group by a calculated dimension.
The problem: If you define order_month as DATE_TRUNC('month', orders.order_date), Semaphor needs to know what column in the returns dataset represents the same time grain.
When are grain mappings needed?
Only when all three conditions are true:
- You have a calculated metric that spans multiple datasets (cross-dataset metric)
- You're grouping by a calculated dimension
- The dimension's time grain needs to be applied to columns in other datasets
Example: Net Revenue (orders.amount - returns.amount) grouped by Order Month requires a grain mapping so Semaphor knows to use DATE_TRUNC('month', returns.return_date) for the returns dataset CTE.
Without the grain mapping, Semaphor wouldn't know how to align the returns data to the same monthly grain as orders.
Complete Example: E-Commerce Domain
Here's a complete domain template combining all concepts -- 3 datasets, 3 relationships, nested calculated metrics, calculated dimensions, and a grain mapping:
How Templates Translate to SQL
When a user drags fields onto a card, Semaphor translates the domain template into SQL. Understanding this pipeline helps you write better templates.
The Translation Pipeline
A key architectural decision: resolution happens at design time, not query time. When a user selects a domain metric for a card, the full expression and all input fields are resolved and saved into the card configuration. At query time, there's no domain lookup -- the card already has everything needed.
Single-Dataset Query
The simplest case: one dataset, one metric, one dimension.
User selects: orders.amount (SUM) grouped by orders.order_date
Generated SQL:
The orders table gets alias t0. The dimension goes into SELECT and GROUP BY. The metric gets wrapped in SUM().
Auto-Join Query
When fields come from multiple datasets, Semaphor uses BFS to find the shortest join path through your relationships.
User selects: orders.amount (SUM) grouped by customers.segment
Generated SQL:
The query builder detects fields from two datasets, finds the relationship via BFS, and applies the LEFT JOIN from defaultJoinType. Multi-hop joins chain automatically (e.g., orders -> stores -> countries gives t0, t1, t2).
The maximum join depth is 2 hops. Fields that require 3+ hops will appear grayed out in the field explorer.
Nested Metric Expansion
Using the profit_margin metric from Step 4 above, here's how the resolution chain works:
- Start with
profit_margin:{profit} / NULLIF({revenue}, 0) * 100 {profit}expands to{revenue} - {cost}{revenue}expands toSUM(t0.amount){cost}expands toSUM(t0.cost)
Generated SQL:
Four metrics resolved into a single SQL expression.
Calculated Dimension in GROUP BY
Calculated dimensions get their tokens replaced without aggregation:
User selects: order_month dimension, orders.amount (SUM) metric
Generated SQL:
The expression appears in both SELECT and GROUP BY.
Cross-Dataset Metrics: Aggregate-Then-Join
When a calculated metric references fields from different datasets, a naive JOIN would cause metric inflation (fan-out). Semaphor solves this with separate CTEs.
User selects: net_revenue grouped by order_month (from the complete example above)
Generated SQL:
Each dataset gets its own CTE that aggregates independently. The grain mapping tells Semaphor to use return_date for the returns CTE. A FULL OUTER JOIN ensures months that exist in only one dataset still appear.
Fan-Out Prevention
Even within a single metric, fan-out occurs when aggregating from one dataset but grouping by a joined one-to-many dataset. Semaphor pre-aggregates in a CTE:
Example: SUM(orders.amount) grouped by line_items.category
For AVG metrics, Semaphor decomposes into SUM/COUNT in the base CTE and recomputes the average in the outer query to ensure mathematical correctness.
SQL Dialect Differences
Semaphor generates dialect-specific SQL based on your connectionType:
| Operation | PostgreSQL | BigQuery | Snowflake |
|---|---|---|---|
| Date truncation | DATE_TRUNC('month', col) | DATE_TRUNC(col, MONTH) | DATE_TRUNC('month', col) |
| String concat | CONCAT(a, b) or a || b | CONCAT(a, b) | CONCAT(a, b) |
| Column quoting | "column" | `column` | "COLUMN" |
Use standard SQL functions in expressions. Semaphor handles dialect-specific translation automatically where possible. For database-specific functions, ensure your expression matches the dialect of your connection.
Validation Errors
When you save a domain template, Semaphor validates it and reports errors. Here are the validation codes you may encounter:
Errors
| Code | Meaning | Fix |
|---|---|---|
| E001 | Circular dependency detected (e.g., metric A references B which references A) | Break the cycle by removing one of the references |
| E002 | Input references an unknown metric name | Check that the referenced metric exists in calculatedMetrics |
| E003 | Input references an unknown field in a dataset | Verify the dataset has that field in its field lists |
| E004 | Input references an unknown dataset name | Ensure the dataset is defined in the datasets array |
| E005 | Token {name} in expression not found in inputs | Add the missing token name to the inputs map |
| E007 | Calculated dimension references a metric | Dimensions can only reference columns or other dimensions |
| E008 | Invalid filter on a metric | Check filter syntax (field + operator + value) |
| E009 | Invalid expression syntax | Remove braces inside string literals |
| E010 | Input requires an aggregate but none specified | Add aggregate to the input or wrap token in aggregate function |
| E011 | Input references an unknown calculated dimension | Check that the dimension exists in calculatedDimensions |
Warnings
| Code | Meaning | Action |
|---|---|---|
| E006 | Input is defined but not used in expression | Remove the unused input or add it to the expression |
| W001 | Dataset has no primary key signal | Add primaryKey or mark identifier as type: primary |
| W002 | Relationship may cause metric inflation | Consider using aggregate_then_join strategy or review cardinality |
Next Steps
- Agent Guide -- Build templates programmatically with validation rules
- User Guide -- How end users interact with your domain in dashboards