Datasets & Fields
Define data sources and their field structure in semantic domain templates
Datasets
Datasets represent your data sources. There are two types:
Physical Datasets
A physical dataset maps directly to a database table or view:
datasets:
- name: orders
label: "Orders"
description: "Customer order transactions"
type: physical
connectionId: "conn_abc123"
connectionType: PostgreSQL
schema: public
table: orders
primaryKey:
- order_id
fields:
identifiers: []
dimensions: []
metrics: []Virtual Datasets
A virtual dataset uses a custom SQL query instead of a table:
You can author SQL-backed virtual datasets entirely from the UI via Add Dataset → SQL Dataset -- no hand-editing YAML required. See Virtual Datasets for the walkthrough.
datasets:
- name: monthly_revenue
label: "Monthly Revenue"
type: virtual
connectionId: "conn_abc123"
connectionType: PostgreSQL
sql: |
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(amount) AS revenue,
COUNT(*) AS order_count
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
fields:
dimensions:
- name: month
label: "Month"
dataType: date
metrics:
- name: revenue
label: "Revenue"
dataType: number
sourceField: revenue
- name: order_count
label: "Order Count"
dataType: number
sourceField: order_countUse virtual datasets for pre-aggregated data, complex transformations, or combining tables with custom logic.
Dataset Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Unique identifier (snake_case) |
label | string | Yes | User-friendly display name |
description | string | No | Help text for users |
type | physical | virtual | Yes | Table-backed or SQL-backed |
connectionId | string | Yes | ID of the data connection |
connectionType | enum | Yes | Database type |
catalog | string | No | Database catalog (BigQuery, Snowflake) |
schema | string | No | Database schema name |
table | string | Conditional | Table name (required for physical) |
sql | string | Conditional | SQL query (required for virtual) |
primaryKey | string[] | No | Primary key columns (important for fan-out detection) |
fields | object | Yes | Contains identifiers, dimensions, metrics arrays |
Supported connectionType values: PostgreSQL, MySQL, MSSQL, BigQuery, Redshift, Snowflake, clickhouse, S3, S3Tables, GoogleSheets, FileUpload, API
Primary Date Field
Each dataset can declare one date dimension as its primary date field. This is the dataset's canonical date — the one dashboard-level features use when they need a "default" date to filter on.
Today it powers the Global Date Filter: a single date range set once on a dashboard that applies to every eligible card, without per-card setup.
Declare it with primaryDateField on the dataset, referencing a dimension by name:
datasets:
- name: orders
label: "Orders"
type: physical
primaryDateField: order_date
fields:
dimensions:
- name: order_date
label: "Order Date"
dataType: date
- name: ship_date
label: "Ship Date"
dataType: datetimeIn the semantic domain editor UI, open a dataset's dimensions and toggle Set as primary date field on the column you want. The toggle only appears on date or datetime dimensions.
Rules
A few things to keep in mind when setting a primary date field:
- The field you reference must exist on the dataset.
- It must be a
dateordatetimetype — other data types aren't eligible, and the toggle won't appear on them. - Renames are safe. If you rename the field later, the
primaryDateFieldreference updates automatically — no dashboard re-wiring needed.
Each dataset has one primary date field at a time. To switch to a different field, toggle Set as primary date field on the new date dimension (the previous one clears automatically), or update primaryDateField directly in YAML.
See Global Date Filter for the dashboard-side feature this unlocks.
Fields
Every dataset has three field buckets: identifiers, dimensions, and metrics. Each serves a distinct role in query generation.
Identifiers
Identifiers are primary and foreign key fields used for joins:
identifiers:
- name: order_id
label: "Order ID"
dataType: number
type: primary
- name: customer_id
label: "Customer ID"
dataType: number
type: foreign| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Column name in the database |
label | string | No | Display name |
description | string | No | Help text |
dataType | enum | Yes | string, number, boolean, date, datetime, json, geo |
type | primary | foreign | Yes | Primary keys enable fan-out detection |
hidden | boolean | No | Hide from end users |
tags | string[] | No | Categorization tags |
Always mark primary keys with type: primary. This enables Semaphor's fan-out prevention, which avoids metric inflation when joining one-to-many relationships.
Dimensions
Dimensions are groupable fields that appear in GROUP BY clauses:
dimensions:
- name: order_date
label: "Order Date"
dataType: date
granularity: day
- name: status
label: "Status"
dataType: string
- name: region
label: "Region"
dataType: string
isDisplayField: true
- name: order_month
label: "Order Month"
dataType: date
expression: "DATE_TRUNC('month', order_date)"| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Column name in the database |
label | string | No | Display name |
description | string | No | Help text |
dataType | enum | Yes | string, number, boolean, date, datetime, json, geo |
granularity | enum | No | Time granularity: day, week, month, quarter, year, hour, minute, second |
isDisplayField | boolean | No | Mark as the default display label for this dataset |
dateFormat | string | No | Custom date display format |
customFormat | string | No | Custom format string |
hidden | boolean | No | Hide from end users |
tags | string[] | No | Categorization tags |
sourceField | string | No | Physical column name. Required if no expression and name doesn't match a column |
expression | string | No | SQL expression for an inline dimension transform. Required if no sourceField. See Inline Expressions |
When using expression, the dimension is evaluated as a row-level SQL transform. Aggregate functions (SUM, COUNT, etc.) are not allowed in dimension expressions.
The expression field supports inline SQL transforms for dimensions. See Inline Expressions for syntax details and examples.
Metrics
Metrics are aggregatable fields that produce numeric results:
metrics:
- name: amount
label: "Revenue"
dataType: number
sourceField: amount
- name: quantity
label: "Quantity"
dataType: number
sourceField: quantity
- name: discounted_total
label: "Discounted Total"
dataType: number
expression: "amount * (1 - discount_rate)"| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Unique metric name |
label | string | No | Display name |
description | string | No | Help text |
dataType | enum | Yes | Usually number |
sourceField | string | Conditional | Physical column name. Required if no expression |
expression | string | Conditional | SQL expression. Required if no sourceField |
format | string | object | No | Display format specification |
unit | string | No | Unit label (e.g., "USD", "kg") |
isDefault | boolean | No | Mark as the default metric for this dataset |
hidden | boolean | No | Hide from end users |
filters | array | No | Default filters applied to this metric |
tags | string[] | No | Categorization tags |
Every metric must have either sourceField or expression. If neither is provided, validation will fail.
The expression and format fields support inline SQL transforms and display formatting. See Inline Expressions for syntax details, aggregation behavior, and examples.
Complete Fields Example
Putting it all together for an orders dataset:
datasets:
- name: orders
label: "Orders"
type: physical
connectionId: "conn_abc123"
connectionType: PostgreSQL
schema: public
table: orders
primaryKey: [order_id]
fields:
identifiers:
- name: order_id
label: "Order ID"
dataType: number
type: primary
- name: customer_id
label: "Customer ID"
dataType: number
type: foreign
dimensions:
- name: order_date
label: "Order Date"
dataType: date
granularity: day
- name: status
label: "Status"
dataType: string
- name: channel
label: "Sales Channel"
dataType: string
- name: order_quarter
label: "Order Quarter"
dataType: date
expression: "DATE_TRUNC('quarter', order_date)"
metrics:
- name: amount
label: "Revenue"
dataType: number
sourceField: amount
- name: quantity
label: "Quantity"
dataType: number
sourceField: quantity