Semaphor
Semantic DomainsTemplate Authoring

Datasets & Fields

Define data sources and their field structure in semantic domain templates

Datasets

Datasets represent your data sources. There are two types:

Physical Datasets

A physical dataset maps directly to a database table or view:

datasets:
  - name: orders
    label: "Orders"
    description: "Customer order transactions"
    type: physical
    connectionId: "conn_abc123"
    connectionType: PostgreSQL
    schema: public
    table: orders
    primaryKey:
      - order_id
    fields:
      identifiers: []
      dimensions: []
      metrics: []

Virtual Datasets

A virtual dataset uses a custom SQL query instead of a table:

You can author SQL-backed virtual datasets entirely from the UI via Add Dataset → SQL Dataset -- no hand-editing YAML required. See Virtual Datasets for the walkthrough.

datasets:
  - name: monthly_revenue
    label: "Monthly Revenue"
    type: virtual
    connectionId: "conn_abc123"
    connectionType: PostgreSQL
    sql: |
      SELECT
        DATE_TRUNC('month', order_date) AS month,
        SUM(amount) AS revenue,
        COUNT(*) AS order_count
      FROM orders
      GROUP BY DATE_TRUNC('month', order_date)
    fields:
      dimensions:
        - name: month
          label: "Month"
          dataType: date
      metrics:
        - name: revenue
          label: "Revenue"
          dataType: number
          sourceField: revenue
        - name: order_count
          label: "Order Count"
          dataType: number
          sourceField: order_count

Use virtual datasets for pre-aggregated data, complex transformations, or combining tables with custom logic.

Dataset Field Reference

FieldTypeRequiredDescription
namestringYesUnique identifier (snake_case)
labelstringYesUser-friendly display name
descriptionstringNoHelp text for users
typephysical | virtualYesTable-backed or SQL-backed
connectionIdstringYesID of the data connection
connectionTypeenumYesDatabase type
catalogstringNoDatabase catalog (BigQuery, Snowflake)
schemastringNoDatabase schema name
tablestringConditionalTable name (required for physical)
sqlstringConditionalSQL query (required for virtual)
primaryKeystring[]NoPrimary key columns (important for fan-out detection)
fieldsobjectYesContains identifiers, dimensions, metrics arrays

Supported connectionType values: PostgreSQL, MySQL, MSSQL, BigQuery, Redshift, Snowflake, clickhouse, S3, S3Tables, GoogleSheets, FileUpload, API


Primary Date Field

Each dataset can declare one date dimension as its primary date field. This is the dataset's canonical date — the one dashboard-level features use when they need a "default" date to filter on.

Today it powers the Global Date Filter: a single date range set once on a dashboard that applies to every eligible card, without per-card setup.

Declare it with primaryDateField on the dataset, referencing a dimension by name:

datasets:
  - name: orders
    label: "Orders"
    type: physical
    primaryDateField: order_date
    fields:
      dimensions:
        - name: order_date
          label: "Order Date"
          dataType: date
        - name: ship_date
          label: "Ship Date"
          dataType: datetime

In the semantic domain editor UI, open a dataset's dimensions and toggle Set as primary date field on the column you want. The toggle only appears on date or datetime dimensions.

Rules

A few things to keep in mind when setting a primary date field:

  • The field you reference must exist on the dataset.
  • It must be a date or datetime type — other data types aren't eligible, and the toggle won't appear on them.
  • Renames are safe. If you rename the field later, the primaryDateField reference updates automatically — no dashboard re-wiring needed.

Each dataset has one primary date field at a time. To switch to a different field, toggle Set as primary date field on the new date dimension (the previous one clears automatically), or update primaryDateField directly in YAML.

See Global Date Filter for the dashboard-side feature this unlocks.


Fields

Every dataset has three field buckets: identifiers, dimensions, and metrics. Each serves a distinct role in query generation.

Identifiers

Identifiers are primary and foreign key fields used for joins:

identifiers:
  - name: order_id
    label: "Order ID"
    dataType: number
    type: primary

  - name: customer_id
    label: "Customer ID"
    dataType: number
    type: foreign
FieldTypeRequiredDescription
namestringYesColumn name in the database
labelstringNoDisplay name
descriptionstringNoHelp text
dataTypeenumYesstring, number, boolean, date, datetime, json, geo
typeprimary | foreignYesPrimary keys enable fan-out detection
hiddenbooleanNoHide from end users
tagsstring[]NoCategorization tags

Always mark primary keys with type: primary. This enables Semaphor's fan-out prevention, which avoids metric inflation when joining one-to-many relationships.

Dimensions

Dimensions are groupable fields that appear in GROUP BY clauses:

dimensions:
  - name: order_date
    label: "Order Date"
    dataType: date
    granularity: day

  - name: status
    label: "Status"
    dataType: string

  - name: region
    label: "Region"
    dataType: string
    isDisplayField: true

  - name: order_month
    label: "Order Month"
    dataType: date
    expression: "DATE_TRUNC('month', order_date)"
FieldTypeRequiredDescription
namestringYesColumn name in the database
labelstringNoDisplay name
descriptionstringNoHelp text
dataTypeenumYesstring, number, boolean, date, datetime, json, geo
granularityenumNoTime granularity: day, week, month, quarter, year, hour, minute, second
isDisplayFieldbooleanNoMark as the default display label for this dataset
dateFormatstringNoCustom date display format
customFormatstringNoCustom format string
hiddenbooleanNoHide from end users
tagsstring[]NoCategorization tags
sourceFieldstringNoPhysical column name. Required if no expression and name doesn't match a column
expressionstringNoSQL expression for an inline dimension transform. Required if no sourceField. See Inline Expressions

When using expression, the dimension is evaluated as a row-level SQL transform. Aggregate functions (SUM, COUNT, etc.) are not allowed in dimension expressions.

The expression field supports inline SQL transforms for dimensions. See Inline Expressions for syntax details and examples.

Metrics

Metrics are aggregatable fields that produce numeric results:

metrics:
  - name: amount
    label: "Revenue"
    dataType: number
    sourceField: amount

  - name: quantity
    label: "Quantity"
    dataType: number
    sourceField: quantity

  - name: discounted_total
    label: "Discounted Total"
    dataType: number
    expression: "amount * (1 - discount_rate)"
FieldTypeRequiredDescription
namestringYesUnique metric name
labelstringNoDisplay name
descriptionstringNoHelp text
dataTypeenumYesUsually number
sourceFieldstringConditionalPhysical column name. Required if no expression
expressionstringConditionalSQL expression. Required if no sourceField
formatstring | objectNoDisplay format specification
unitstringNoUnit label (e.g., "USD", "kg")
isDefaultbooleanNoMark as the default metric for this dataset
hiddenbooleanNoHide from end users
filtersarrayNoDefault filters applied to this metric
tagsstring[]NoCategorization tags

Every metric must have either sourceField or expression. If neither is provided, validation will fail.

The expression and format fields support inline SQL transforms and display formatting. See Inline Expressions for syntax details, aggregation behavior, and examples.

Complete Fields Example

Putting it all together for an orders dataset:

datasets:
  - name: orders
    label: "Orders"
    type: physical
    connectionId: "conn_abc123"
    connectionType: PostgreSQL
    schema: public
    table: orders
    primaryKey: [order_id]
    fields:
      identifiers:
        - name: order_id
          label: "Order ID"
          dataType: number
          type: primary
        - name: customer_id
          label: "Customer ID"
          dataType: number
          type: foreign
      dimensions:
        - name: order_date
          label: "Order Date"
          dataType: date
          granularity: day
        - name: status
          label: "Status"
          dataType: string
        - name: channel
          label: "Sales Channel"
          dataType: string
        - name: order_quarter
          label: "Order Quarter"
          dataType: date
          expression: "DATE_TRUNC('quarter', order_date)"
      metrics:
        - name: amount
          label: "Revenue"
          dataType: number
          sourceField: amount
        - name: quantity
          label: "Quantity"
          dataType: number
          sourceField: quantity

On this page