Semaphor
Semantic DomainsTemplate Authoring

Datasets & Fields

Define data sources and their field structure in semantic domain templates

Datasets

Datasets represent your data sources. There are two types:

Physical Datasets

A physical dataset maps directly to a database table or view:

datasets:
  - name: orders
    label: "Orders"
    description: "Customer order transactions"
    type: physical
    connectionId: "conn_abc123"
    connectionType: PostgreSQL
    schema: public
    table: orders
    primaryKey:
      - order_id
    fields:
      identifiers: []
      dimensions: []
      metrics: []

Virtual Datasets

A virtual dataset uses a custom SQL query instead of a table:

datasets:
  - name: monthly_revenue
    label: "Monthly Revenue"
    type: virtual
    connectionId: "conn_abc123"
    connectionType: PostgreSQL
    sql: |
      SELECT
        DATE_TRUNC('month', order_date) AS month,
        SUM(amount) AS revenue,
        COUNT(*) AS order_count
      FROM orders
      GROUP BY DATE_TRUNC('month', order_date)
    fields:
      dimensions:
        - name: month
          label: "Month"
          dataType: date
      metrics:
        - name: revenue
          label: "Revenue"
          dataType: number
          sourceField: revenue
        - name: order_count
          label: "Order Count"
          dataType: number
          sourceField: order_count

Use virtual datasets for pre-aggregated data, complex transformations, or combining tables with custom logic.

Dataset Field Reference

FieldTypeRequiredDescription
namestringYesUnique identifier (snake_case)
labelstringYesUser-friendly display name
descriptionstringNoHelp text for users
typephysical | virtualYesTable-backed or SQL-backed
connectionIdstringYesID of the data connection
connectionTypeenumYesDatabase type
catalogstringNoDatabase catalog (BigQuery, Snowflake)
schemastringNoDatabase schema name
tablestringConditionalTable name (required for physical)
sqlstringConditionalSQL query (required for virtual)
primaryKeystring[]NoPrimary key columns (important for fan-out detection)
fieldsobjectYesContains identifiers, dimensions, metrics arrays

Supported connectionType values: PostgreSQL, MySQL, MSSQL, BigQuery, Redshift, Snowflake, clickhouse, S3, S3Tables, GoogleSheets, FileUpload, API


Fields

Every dataset has three field buckets: identifiers, dimensions, and metrics. Each serves a distinct role in query generation.

Identifiers

Identifiers are primary and foreign key fields used for joins:

identifiers:
  - name: order_id
    label: "Order ID"
    dataType: number
    type: primary
 
  - name: customer_id
    label: "Customer ID"
    dataType: number
    type: foreign
FieldTypeRequiredDescription
namestringYesColumn name in the database
labelstringNoDisplay name
descriptionstringNoHelp text
dataTypeenumYesstring, number, boolean, date, datetime, json, geo
typeprimary | foreignYesPrimary keys enable fan-out detection
hiddenbooleanNoHide from end users
tagsstring[]NoCategorization tags

Always mark primary keys with type: primary. This enables Semaphor's fan-out prevention, which avoids metric inflation when joining one-to-many relationships.

Dimensions

Dimensions are groupable fields that appear in GROUP BY clauses:

dimensions:
  - name: order_date
    label: "Order Date"
    dataType: date
    granularity: day
 
  - name: status
    label: "Status"
    dataType: string
 
  - name: region
    label: "Region"
    dataType: string
    isDisplayField: true
FieldTypeRequiredDescription
namestringYesColumn name in the database
labelstringNoDisplay name
descriptionstringNoHelp text
dataTypeenumYesstring, number, boolean, date, datetime, json, geo
granularityenumNoTime granularity: day, week, month, quarter, year, hour, minute, second
isDisplayFieldbooleanNoMark as the default display label for this dataset
dateFormatstringNoCustom date display format
customFormatstringNoCustom format string
hiddenbooleanNoHide from end users
tagsstring[]NoCategorization tags

Metrics

Metrics are aggregatable fields that produce numeric results:

metrics:
  - name: amount
    label: "Revenue"
    dataType: number
    sourceField: amount
 
  - name: quantity
    label: "Quantity"
    dataType: number
    sourceField: quantity
 
  - name: discounted_total
    label: "Discounted Total"
    dataType: number
    expression: "amount * (1 - discount_rate)"
FieldTypeRequiredDescription
namestringYesUnique metric name
labelstringNoDisplay name
descriptionstringNoHelp text
dataTypeenumYesUsually number
sourceFieldstringConditionalPhysical column name. Required if no expression
expressionstringConditionalSQL expression. Required if no sourceField
formatstring | objectNoDisplay format specification
unitstringNoUnit label (e.g., "USD", "kg")
isDefaultbooleanNoMark as the default metric for this dataset
hiddenbooleanNoHide from end users
filtersarrayNoDefault filters applied to this metric
tagsstring[]NoCategorization tags

Every metric must have either sourceField or expression. If neither is provided, validation will fail.

The expression and format fields support inline SQL transforms and display formatting. See Metric Expressions & Formatting for syntax details, aggregation behavior, and examples.

Complete Fields Example

Putting it all together for an orders dataset:

datasets:
  - name: orders
    label: "Orders"
    type: physical
    connectionId: "conn_abc123"
    connectionType: PostgreSQL
    schema: public
    table: orders
    primaryKey: [order_id]
    fields:
      identifiers:
        - name: order_id
          label: "Order ID"
          dataType: number
          type: primary
        - name: customer_id
          label: "Customer ID"
          dataType: number
          type: foreign
      dimensions:
        - name: order_date
          label: "Order Date"
          dataType: date
          granularity: day
        - name: status
          label: "Status"
          dataType: string
        - name: channel
          label: "Sales Channel"
          dataType: string
      metrics:
        - name: amount
          label: "Revenue"
          dataType: number
          sourceField: amount
        - name: quantity
          label: "Quantity"
          dataType: number
          sourceField: quantity

On this page