Datasets & Fields
Define data sources and their field structure in semantic domain templates
Datasets
Datasets represent your data sources. There are two types:
Physical Datasets
A physical dataset maps directly to a database table or view:
Virtual Datasets
A virtual dataset uses a custom SQL query instead of a table:
Use virtual datasets for pre-aggregated data, complex transformations, or combining tables with custom logic.
Dataset Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Unique identifier (snake_case) |
label | string | Yes | User-friendly display name |
description | string | No | Help text for users |
type | physical | virtual | Yes | Table-backed or SQL-backed |
connectionId | string | Yes | ID of the data connection |
connectionType | enum | Yes | Database type |
catalog | string | No | Database catalog (BigQuery, Snowflake) |
schema | string | No | Database schema name |
table | string | Conditional | Table name (required for physical) |
sql | string | Conditional | SQL query (required for virtual) |
primaryKey | string[] | No | Primary key columns (important for fan-out detection) |
fields | object | Yes | Contains identifiers, dimensions, metrics arrays |
Supported connectionType values: PostgreSQL, MySQL, MSSQL, BigQuery, Redshift, Snowflake, clickhouse, S3, S3Tables, GoogleSheets, FileUpload, API
Fields
Every dataset has three field buckets: identifiers, dimensions, and metrics. Each serves a distinct role in query generation.
Identifiers
Identifiers are primary and foreign key fields used for joins:
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Column name in the database |
label | string | No | Display name |
description | string | No | Help text |
dataType | enum | Yes | string, number, boolean, date, datetime, json, geo |
type | primary | foreign | Yes | Primary keys enable fan-out detection |
hidden | boolean | No | Hide from end users |
tags | string[] | No | Categorization tags |
Always mark primary keys with type: primary. This enables Semaphor's fan-out prevention, which avoids metric inflation when joining one-to-many relationships.
Dimensions
Dimensions are groupable fields that appear in GROUP BY clauses:
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Column name in the database |
label | string | No | Display name |
description | string | No | Help text |
dataType | enum | Yes | string, number, boolean, date, datetime, json, geo |
granularity | enum | No | Time granularity: day, week, month, quarter, year, hour, minute, second |
isDisplayField | boolean | No | Mark as the default display label for this dataset |
dateFormat | string | No | Custom date display format |
customFormat | string | No | Custom format string |
hidden | boolean | No | Hide from end users |
tags | string[] | No | Categorization tags |
Metrics
Metrics are aggregatable fields that produce numeric results:
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Unique metric name |
label | string | No | Display name |
description | string | No | Help text |
dataType | enum | Yes | Usually number |
sourceField | string | Conditional | Physical column name. Required if no expression |
expression | string | Conditional | SQL expression. Required if no sourceField |
format | string | object | No | Display format specification |
unit | string | No | Unit label (e.g., "USD", "kg") |
isDefault | boolean | No | Mark as the default metric for this dataset |
hidden | boolean | No | Hide from end users |
filters | array | No | Default filters applied to this metric |
tags | string[] | No | Categorization tags |
Every metric must have either sourceField or expression. If neither is provided, validation will fail.
The expression and format fields support inline SQL transforms and display formatting. See Metric Expressions & Formatting for syntax details, aggregation behavior, and examples.
Complete Fields Example
Putting it all together for an orders dataset: