Datasets & Fields

Datasets

Datasets represent your data sources. There are two types:

Physical Datasets

A physical dataset maps directly to a database table or view:

datasets:
  - name: orders
    label: "Orders"
    description: "Customer order transactions"
    type: physical
    connectionId: "conn_abc123"
    connectionType: PostgreSQL
    schema: public
    table: orders
    primaryKey:
      - order_id
    fields:
      identifiers: []
      dimensions: []
      metrics: []

Virtual Datasets

A virtual dataset uses a custom SQL query instead of a table:

You can author SQL-backed virtual datasets entirely from the UI via Add Dataset → SQL Dataset -- no hand-editing YAML required. See Virtual Datasets for the walkthrough.

datasets:
  - name: monthly_revenue
    label: "Monthly Revenue"
    type: virtual
    connectionId: "conn_abc123"
    connectionType: PostgreSQL
    sql: |
      SELECT
        DATE_TRUNC('month', order_date) AS month,
        SUM(amount) AS revenue,
        COUNT(*) AS order_count
      FROM orders
      GROUP BY DATE_TRUNC('month', order_date)
    fields:
      dimensions:
        - name: month
          label: "Month"
          dataType: date
      metrics:
        - name: revenue
          label: "Revenue"
          dataType: number
          sourceField: revenue
        - name: order_count
          label: "Order Count"
          dataType: number
          sourceField: order_count

Use virtual datasets for pre-aggregated data, complex transformations, or combining tables with custom logic.

Dataset Field Reference

Field	Type	Required	Description
`name`	string	Yes	Unique identifier (snake_case)
`label`	string	Yes	User-friendly display name
`description`	string	No	Help text for users
`type`	`physical` \| `virtual`	Yes	Table-backed or SQL-backed
`connectionId`	string	Yes	ID of the data connection
`connectionType`	enum	Yes	Database type
`catalog`	string	No	Database catalog (BigQuery, Snowflake)
`schema`	string	No	Database schema name
`table`	string	Conditional	Table name (required for physical)
`sql`	string	Conditional	SQL query (required for virtual)
`primaryKey`	string[]	No	Primary key columns (important for fan-out detection)
`fields`	object	Yes	Contains `identifiers`, `dimensions`, `metrics` arrays

Supported connectionType values: PostgreSQL, MySQL, MSSQL, BigQuery, Redshift, Snowflake, clickhouse, S3, S3Tables, GoogleSheets, FileUpload, API

Primary Date Field

Each dataset can declare one date dimension as its primary date field. This is the dataset's canonical date — the one dashboard-level features use when they need a "default" date to filter on.

Today it powers the Global Date Filter: a single date range set once on a dashboard that applies to every eligible card, without per-card setup.

Declare it with primaryDateField on the dataset, referencing a dimension by name:

datasets:
  - name: orders
    label: "Orders"
    type: physical
    primaryDateField: order_date
    fields:
      dimensions:
        - name: order_date
          label: "Order Date"
          dataType: date
        - name: ship_date
          label: "Ship Date"
          dataType: datetime

In the semantic domain editor UI, open a dataset's dimensions and toggle Set as primary date field on the column you want. The toggle only appears on date or datetime dimensions.

Rules

A few things to keep in mind when setting a primary date field:

The field you reference must exist on the dataset.
It must be a date or datetime type — other data types aren't eligible, and the toggle won't appear on them.
Renames are safe. If you rename the field later, the primaryDateField reference updates automatically — no dashboard re-wiring needed.

Each dataset has one primary date field at a time. To switch to a different field, toggle Set as primary date field on the new date dimension (the previous one clears automatically), or update primaryDateField directly in YAML.

See Global Date Filter for the dashboard-side feature this unlocks.

Fields

Every dataset has three field buckets: identifiers, dimensions, and metrics. Each serves a distinct role in query generation.

Identifiers

Identifiers are primary and foreign key fields used for joins:

identifiers:
  - name: order_id
    label: "Order ID"
    dataType: number
    type: primary

  - name: customer_id
    label: "Customer ID"
    dataType: number
    type: foreign

Field	Type	Required	Description
`name`	string	Yes	Column name in the database
`label`	string	No	Display name
`description`	string	No	Help text
`dataType`	enum	Yes	`string`, `number`, `boolean`, `date`, `datetime`, `json`, `geo`
`type`	`primary` \| `foreign`	Yes	Primary keys enable fan-out detection
`hidden`	boolean	No	Hide from end users
`tags`	string[]	No	Categorization tags

Always mark primary keys with type: primary. This enables Semaphor's fan-out prevention, which avoids metric inflation when joining one-to-many relationships.

Dimensions

Dimensions are groupable fields that appear in GROUP BY clauses:

dimensions:
  - name: order_date
    label: "Order Date"
    dataType: date
    granularity: day

  - name: status
    label: "Status"
    dataType: string

  - name: region
    label: "Region"
    dataType: string
    isDisplayField: true

  - name: order_month
    label: "Order Month"
    dataType: date
    expression: "DATE_TRUNC('month', order_date)"

Field	Type	Required	Description
`name`	string	Yes	Column name in the database
`label`	string	No	Display name
`description`	string	No	Help text
`dataType`	enum	Yes	`string`, `number`, `boolean`, `date`, `datetime`, `json`, `geo`
`granularity`	enum	No	Time granularity: `day`, `week`, `month`, `quarter`, `year`, `hour`, `minute`, `second`
`isDisplayField`	boolean	No	Mark as the default display label for this dataset
`dateFormat`	string	No	Custom date display format
`customFormat`	string	No	Custom format string
`hidden`	boolean	No	Hide from end users
`tags`	string[]	No	Categorization tags
`sourceField`	string	No	Physical column name. Required if no `expression` and name doesn't match a column
`expression`	string	No	SQL expression for an inline dimension transform. Required if no `sourceField`. See Inline Expressions

When using expression, the dimension is evaluated as a row-level SQL transform. Aggregate functions (SUM, COUNT, etc.) are not allowed in dimension expressions.

The expression field supports inline SQL transforms for dimensions. See Inline Expressions for syntax details and examples.

Metrics

Metrics are aggregatable fields that produce numeric results:

metrics:
  - name: amount
    label: "Revenue"
    dataType: number
    sourceField: amount

  - name: quantity
    label: "Quantity"
    dataType: number
    sourceField: quantity

  - name: discounted_total
    label: "Discounted Total"
    dataType: number
    expression: "amount * (1 - discount_rate)"

Field	Type	Required	Description
`name`	string	Yes	Unique metric name
`label`	string	No	Display name
`description`	string	No	Help text
`dataType`	enum	Yes	Usually `number`
`sourceField`	string	Conditional	Physical column name. Required if no `expression`
`expression`	string	Conditional	SQL expression. Required if no `sourceField`
`format`	string \| object	No	Display format specification
`unit`	string	No	Unit label (e.g., "USD", "kg")
`isDefault`	boolean	No	Mark as the default metric for this dataset
`hidden`	boolean	No	Hide from end users
`filters`	array	No	Default filters applied to this metric
`tags`	string[]	No	Categorization tags

Every metric must have either sourceField or expression. If neither is provided, validation will fail.

The expression and format fields support inline SQL transforms and display formatting. See Inline Expressions for syntax details, aggregation behavior, and examples.

Complete Fields Example

Putting it all together for an orders dataset:

datasets:
  - name: orders
    label: "Orders"
    type: physical
    connectionId: "conn_abc123"
    connectionType: PostgreSQL
    schema: public
    table: orders
    primaryKey: [order_id]
    fields:
      identifiers:
        - name: order_id
          label: "Order ID"
          dataType: number
          type: primary
        - name: customer_id
          label: "Customer ID"
          dataType: number
          type: foreign
      dimensions:
        - name: order_date
          label: "Order Date"
          dataType: date
          granularity: day
        - name: status
          label: "Status"
          dataType: string
        - name: channel
          label: "Sales Channel"
          dataType: string
        - name: order_quarter
          label: "Order Quarter"
          dataType: date
          expression: "DATE_TRUNC('quarter', order_date)"
      metrics:
        - name: amount
          label: "Revenue"
          dataType: number
          sourceField: amount
        - name: quantity
          label: "Quantity"
          dataType: number
          sourceField: quantity

Datasets & Fields

On this page