Semantic Domains

Admin Setup Guide

Complete guide for administrators to create and configure semantic domains

To get started, open the Domains page for your project:

  1. Log in to Semaphor
  2. Select your project from the project list
  3. Click the "Domains" tab in the top navigation

Creating a Domain

Step 1: Create a New Domain

  1. On the Domains page, click the "Create Domain" button
  2. Fill in the domain details:
    • Name: A unique identifier (e.g., sales_analytics)
    • Label: User-friendly display name (e.g., Sales Analytics)
    • Description: Brief explanation of what this domain covers
    • Connection: Select the data connection this domain uses
  3. Click "Create" to save your domain

Quick Setup Summary

Before diving deeper, here's the complete workflow at a glance:

  1. Create Domain → Name it and select a connection
  2. Add Datasets → Select tables, let AI generate labels (30-60s per dataset)
  3. Discover Relationships → Run AI discovery, accept suggestions (1-2 minutes)
  4. Save & Test → Save domain, test in a dashboard card

Total time: 5-10 minutes for a typical domain with 4-5 datasets

The AI does most of the work! Semaphor automatically generates field labels, descriptions, and discovers relationships between your datasets. Your main job is reviewing and accepting suggestions.


Now let's walk through each step in detail.

Step 2: Open the Domain Editor

After creating your domain, you'll be taken to the Domain Editor, which has four tabs:

  • Datasets: Add and configure your data sources
  • Relationships: Define how datasets connect
  • Code Editor: View and edit the domain as YAML/JSON
  • Preview: Test your domain configuration

Adding Datasets

After creating your domain, you'll add the datasets (tables) you want users to explore.

Step 1: Select Your Datasets

  1. In the Domain Editor, ensure you're on the Datasets tab
  2. Click the "Add Dataset" button
  3. Select the tables or views you want to include from your database
  4. Click "Add" to add each dataset

AI-Powered Labels: After adding datasets, Semaphor automatically generates user-friendly labels, field names, and descriptions using AI. This process takes a few moments but saves you from manually configuring each field.

Step 2: Wait for AI Processing

After you add datasets and click Save, Semaphor's AI automatically:

  • Generates customer-facing labels for table names (e.g., orders_tbl → "Orders")
  • Creates readable field names (e.g., cust_id → "Customer ID")
  • Adds helpful descriptions for fields
  • Classifies fields as dimensions (groupable) or measures (aggregatable)
  • Suggests appropriate aggregation functions (SUM, COUNT, AVG)

Note: This AI operation typically takes 30-60 seconds per dataset. You'll see a loading indicator while processing.
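
To see what this produces, here's a rough sketch of the kind of field entries the AI writes into the domain template (the cust_id and total_amount fields below are illustrative, not actual AI output; the key names follow the field schema covered later in this guide):

fields:
  identifiers:
    - name: cust_id
      label: "Customer ID"      # readable label generated from the column name
      dataType: number
      type: foreign
  metrics:
    - name: total_amount
      label: "Total Amount"     # classified as a measure (aggregatable)
      dataType: number
      sourceField: total_amount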


Step 3: Review AI-Generated Fields (Optional)

After AI processing completes, you can review and adjust the generated labels:

  1. Click the "Edit" button (pencil icon) next to a dataset
  2. Review the AI-generated field labels and descriptions
  3. Make adjustments if needed (most users find the AI labels work well as-is)
  4. Hide any fields that shouldn't be visible to end users

Advanced: You can also create virtual datasets using custom SQL queries. See the Advanced Configuration section below.


That's all you need to get your datasets ready! The AI handles the labeling and classification automatically. Next, you'll define how these datasets relate to each other.

Creating Relationships

Relationships define how datasets connect to each other, enabling automatic joins when users explore data across multiple datasets.

Recommended Workflow: Use AI Discovery to automatically identify relationships. Most users find they don't need to manually define any relationships after using AI discovery.

Using AI-Powered Relationship Discovery

Semaphor can automatically discover potential relationships between your datasets using AI analysis of field names, data types, and semantic patterns. This is the fastest way to set up your domain.

The entire process takes just a few steps:

  1. Click "Discover Relationships" in the Relationships tab
  2. Click "Run Discovery" and wait 1-2 minutes while AI analyzes your datasets
  3. Review the suggestions - each shows confidence (High/Medium/Low) and rationale
  4. Select relationships to add - use "Select High Confidence Only" or review each one
  5. Click "Accept Selected" - done!


What the AI analyzes:

  • Field names (e.g., customer_id likely joins to customers.id)
  • Data types (matching types for join keys)
  • Semantic patterns (fact tables to dimension tables)
  • Table structures (primary/foreign key patterns)

What you'll see in suggestions:

  • Source → Target: Which datasets connect (e.g., orders → customers)
  • Join Fields: Which fields match (e.g., orders.customer_id = customers.id)
  • Confidence: High, Medium, or Low
  • Rationale: Why the AI thinks this relationship exists
  • Cardinality: One-to-one, one-to-many, etc.
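
When you accept a suggestion, it is saved into the domain template as a relationship entry; the sketch below mirrors the orders-to-customers example from the complete template later in this guide, with discoveredBy and confidence recording where it came from:

relationships:
  - name: orders_to_customers
    sourceDataset: orders
    sourceFields:
      - customer_id
    targetDataset: customers
    targetFields:
      - id
    cardinality: many_to_one
    defaultJoinType: LEFT
    isAutoJoin: true
    discoveredBy: ai       # recorded because this relationship came from AI discovery
    confidence: high       # the confidence level shown in the suggestion
    isActive: true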

Tip: Start by accepting only High Confidence relationships. You can always run discovery again or manually add relationships later.


That's it! After accepting relationships, your domain is ready for users to query. The relationships enable automatic joins when users select fields from multiple datasets.

Advanced: Manual Relationship Management

Note: Most users don't need to manually create relationships after using AI Discovery. This section is for advanced use cases.

If you need to manually add or edit a relationship:

  1. In the Relationships tab, click "Add Relationship"
  2. Fill in the relationship details:
    • Name: Identifier (e.g., orders_to_customers)
    • Source/Target Datasets: Which datasets to connect
    • Join Fields: Which fields to match
    • Cardinality: One-to-one, one-to-many, etc.
    • Join Type: INNER, LEFT, RIGHT, or FULL
  3. Enable Auto-Join to allow automatic join resolution
  4. Click "Save"

When to manually add relationships:

  • AI didn't detect a relationship you know exists
  • You need to enforce a particular join type (e.g., INNER vs LEFT)
  • You're working with non-standard naming conventions
  • You need composite key relationships (multiple join fields), as in the sketch below
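
For example, assuming manually created relationships are recorded with discoveredBy: user, a composite-key relationship might look like this in the template (the order_items and inventory datasets and their fields are hypothetical):

relationships:
  - name: order_items_to_inventory
    sourceDataset: order_items      # hypothetical dataset
    sourceFields:                   # composite key: two join columns
      - product_id
      - warehouse_id
    targetDataset: inventory        # hypothetical dataset
    targetFields:
      - product_id
      - warehouse_id
    cardinality: many_to_one
    defaultJoinType: INNER          # explicit join type chosen in step 2
    isAutoJoin: true
    discoveredBy: user              # assumption: manual relationships use the 'user' value
    isActive: true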

Testing Your Domain

Step 1: Save Your Changes

After configuring datasets and relationships:

  1. Click the "Save" button in the top-right corner
  2. Wait for the confirmation message
  3. Your changes are now persisted

Step 2: Preview the Domain

  1. Click the "Preview" tab in the Domain Editor
  2. You'll see a summary of your domain:
    • All datasets with their field counts
    • All relationships with source/target info
    • Validation warnings (if any)

Step 3: Test in a Dashboard

  1. Create a new dashboard or open an existing one
  2. Add a new card
  3. Select "Semantic Domain" as the data source
  4. Choose your domain from the dropdown
  5. Try selecting fields from different datasets
    • If relationships are configured correctly, joins happen automatically
    • If fields from unrelated datasets are selected, you'll see an error

Advanced Configuration

Advanced Dataset Configuration

Creating Virtual Datasets

Virtual datasets allow you to define custom SQL queries as reusable datasets. This is useful for:

  • Pre-aggregated data (e.g., monthly revenue summaries)
  • Complex transformations
  • Combining multiple tables with custom logic

To create a virtual dataset:

  1. Click "Add Dataset" and select "Virtual Dataset"
  2. Enter a dataset name (e.g., monthly_revenue)
  3. Write your SQL query:
SELECT
  DATE_TRUNC('month', order_date) as month,
  SUM(total_amount) as revenue,
  COUNT(*) as order_count
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
  4. AI will generate field labels for your query results
  5. Click "Add Dataset"

Virtual datasets are materialized as subqueries when used in joins or filters.
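
In the domain template, a virtual dataset like the one above is stored with a sql property instead of schema and table. A minimal sketch (the connection ID and generated labels are illustrative):

datasets:
  - name: monthly_revenue
    label: "Monthly Revenue"        # illustrative AI-generated label
    type: virtual                   # virtual datasets define 'sql' instead of 'schema'/'table'
    connectionId: "conn_abc123"     # illustrative connection ID
    connectionType: PostgreSQL
    sql: |
      SELECT
        DATE_TRUNC('month', order_date) AS month,
        SUM(total_amount) AS revenue,
        COUNT(*) AS order_count
      FROM orders
      GROUP BY DATE_TRUNC('month', order_date)
    fields:
      dimensions:
        - name: month
          label: "Month"
          dataType: date
      metrics:
        - name: revenue
          label: "Revenue"
          dataType: number
          sourceField: revenue
        - name: order_count
          label: "Order Count"
          dataType: number
          sourceField: order_count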

Manual Field Configuration

If you need to manually adjust field properties beyond what AI generates:

  1. Click "Edit" on a dataset
  2. For each field, you can modify:
    • Label: User-friendly display name
    • Description: Help text explaining the field
    • Data Type: String, Number, Date, DateTime, Boolean
    • Field Type: Dimension (groupable) or Measure (aggregatable)
    • Aggregate Function: SUM, AVG, COUNT, MIN, MAX for measures
    • Visibility: Show/hide from users (eye icon)

Best practices:

  • Use clear labels: "Order Date" instead of "ord_dt"
  • Add descriptions to help users understand fields
  • Mark dimensions correctly for proper grouping
  • Set default aggregations for measures
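
Put together, a field configured along these lines might look like the sketch below (the description entries are an assumption about how the help text is stored; the other keys follow the field schema later in this guide):

fields:
  dimensions:
    - name: ord_dt
      label: "Order Date"           # clear label instead of "ord_dt"
      description: "Date the order was placed"    # illustrative help text
      dataType: date
      granularity: day
  metrics:
    - name: total_amount
      label: "Total Amount"
      description: "Order total before tax"       # illustrative help text
      dataType: number
      sourceField: total_amount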

Domain Template Schema

Every semantic domain is stored as a template that defines datasets, relationships, and calculated fields. Understanding the schema helps with advanced configuration, version control, and CI/CD pipelines.

Template Structure

schemaVersion: "2.0"
datasets: [...]              # Array of dataset definitions
relationships: [...]         # Array of relationship definitions
calculatedMetrics: {...}     # Record<name, definition>
calculatedDimensions: {...}  # Record<name, definition>

Dataset Schema

Each dataset in the datasets array has this structure:

  • name (string, required): Unique identifier for the dataset
  • label (string, required): User-friendly display name
  • description (string, optional): Help text for users
  • type (physical | virtual, required): Physical table/view or custom SQL
  • connectionId (string, required): ID of the data connection
  • connectionType (string, required): Database type (PostgreSQL, MySQL, etc.)
  • schema (string, optional): Database schema name
  • table (string, optional): Table name (for physical datasets)
  • sql (string, optional): Custom SQL query (for virtual datasets)
  • fields (object, required): Field definitions (see below)
  • primaryKey (string[], optional): Primary key columns

Fields Object:

fields:
  identifiers:    # Primary/foreign key fields
    - name: id
      label: "Order ID"
      dataType: number
      type: primary
  dimensions:     # Groupable fields
    - name: order_date
      label: "Order Date"
      dataType: date
      granularity: day
  metrics:        # Aggregatable fields
    - name: amount
      label: "Amount"
      dataType: number
      sourceField: amount

Relationship Schema

Each relationship in the relationships array:

  • name (string, required): Unique identifier
  • sourceDataset (string, required): Name of source dataset
  • sourceFields (string[], required): Join columns in source
  • targetDataset (string, required): Name of target dataset
  • targetFields (string[], required): Join columns in target
  • cardinality (enum, optional): one_to_one, one_to_many, many_to_one, many_to_many
  • defaultJoinType (enum, required): INNER, LEFT, RIGHT, FULL
  • isAutoJoin (boolean, required): Enable automatic join resolution
  • discoveredBy (enum, required): user, auto, fk_constraint, ai
  • isActive (boolean, required): Whether the relationship is active

Calculated Metric Schema

Calculated metrics are stored in calculatedMetrics as a record (name → definition):

calculatedMetrics:
  profit_margin:
    label: "Profit Margin"
    description: "Profit as percentage of revenue"
    expression: "({revenue} - {cost}) / NULLIF({revenue}, 0)"
    inputs:
      revenue:
        dataset: orders
        field: amount
        aggregate: SUM
      cost:
        dataset: orders
        field: cost
        aggregate: SUM
    metricType: derived
    format:
      type: percentage
      decimals: 2
    tags:
      - finance
      - kpi

  • label (string, required): Display name
  • description (string, optional): Help text
  • expression (string, required): Calculation formula with {input} tokens
  • inputs (object, required): Input field references
  • metricType (enum, optional): base, derived, calculated
  • format (object, optional): Display formatting options
  • tags (string[], optional): Categorization tags

Calculated Dimension Schema

Calculated dimensions are stored in calculatedDimensions as a record:

calculatedDimensions:
  order_size_bucket:
    label: "Order Size"
    description: "Categorizes orders by amount"
    expression: |
      CASE
        WHEN {amount} > 1000 THEN 'Large'
        WHEN {amount} > 100 THEN 'Medium'
        ELSE 'Small'
      END
    inputs:
      amount:
        dataset: orders
        field: amount
    dataType: string

  • label (string, required): Display name
  • description (string, optional): Help text
  • expression (string, required): SQL expression with {input} tokens
  • inputs (object, required): Input field references
  • dataType (enum, optional): string, number, boolean, date, datetime

Direct Template Editing

Power users can edit the domain template directly as YAML or JSON. This is useful for:

  • Bulk edits: Rename multiple fields at once
  • Version control: Store templates in Git for change tracking
  • CI/CD pipelines: Automate domain updates as part of deployments
  • Migration: Copy configurations between environments

Accessing the Code Editor

  1. Open your domain in the Domain Editor
  2. Click the Code Editor tab
  3. View and edit the full template as YAML or JSON
  4. Click Save to apply changes

Warning: Direct template edits bypass validation. Test thoroughly after making changes, especially to relationships and calculated fields.

Complete Template Example

Here's a full domain template for a sales analytics domain:

schemaVersion: "2.0"
 
datasets:
  - name: orders
    label: "Orders"
    description: "Customer order transactions"
    type: physical
    connectionId: "conn_abc123"
    connectionType: PostgreSQL
    schema: public
    table: orders
    primaryKey:
      - id
    fields:
      identifiers:
        - name: id
          label: "Order ID"
          dataType: number
          type: primary
        - name: customer_id
          label: "Customer ID"
          dataType: number
          type: foreign
      dimensions:
        - name: order_date
          label: "Order Date"
          dataType: date
          granularity: day
        - name: status
          label: "Status"
          dataType: string
      metrics:
        - name: amount
          label: "Amount"
          dataType: number
          sourceField: amount
        - name: quantity
          label: "Quantity"
          dataType: number
          sourceField: quantity
 
  - name: customers
    label: "Customers"
    type: physical
    connectionId: "conn_abc123"
    connectionType: PostgreSQL
    schema: public
    table: customers
    primaryKey:
      - id
    fields:
      identifiers:
        - name: id
          label: "Customer ID"
          dataType: number
          type: primary
      dimensions:
        - name: segment
          label: "Customer Segment"
          dataType: string
        - name: region
          label: "Region"
          dataType: string
 
relationships:
  - name: orders_to_customers
    sourceDataset: orders
    sourceFields:
      - customer_id
    targetDataset: customers
    targetFields:
      - id
    cardinality: many_to_one
    defaultJoinType: LEFT
    isAutoJoin: true
    discoveredBy: ai
    confidence: high
    isActive: true
 
calculatedMetrics:
  average_order_value:
    label: "Average Order Value"
    description: "Average revenue per order"
    expression: "{total_revenue} / NULLIF({order_count}, 0)"
    inputs:
      total_revenue:
        dataset: orders
        field: amount
        aggregate: SUM
      order_count:
        dataset: orders
        field: id
        aggregate: COUNT
    metricType: derived
    format:
      type: currency
      decimals: 2
      currency: USD
    tags:
      - kpi
      - sales
 
  profit_margin:
    label: "Profit Margin %"
    description: "Profit as percentage of revenue"
    expression: "({revenue} - {cost}) / NULLIF({revenue}, 0) * 100"
    inputs:
      revenue:
        dataset: orders
        field: amount
        aggregate: SUM
      cost:
        dataset: orders
        field: cost
        aggregate: SUM
    metricType: derived
    format:
      type: percentage
      decimals: 1
    tags:
      - kpi
      - finance
 
calculatedDimensions:
  order_size_bucket:
    label: "Order Size"
    description: "Categorizes orders by amount"
    expression: |
      CASE
        WHEN {amount} > 1000 THEN 'Large'
        WHEN {amount} > 100 THEN 'Medium'
        ELSE 'Small'
      END
    inputs:
      amount:
        dataset: orders
        field: amount
    dataType: string
 
  fiscal_quarter:
    label: "Fiscal Quarter"
    description: "Fiscal quarter based on order date"
    expression: |
      CONCAT('Q', EXTRACT(QUARTER FROM {order_date}), ' ', EXTRACT(YEAR FROM {order_date}))
    inputs:
      order_date:
        dataset: orders
        field: order_date
    dataType: string

Domain-Level Calculated Fields

Add reusable calculated metrics and dimensions directly to your domain template. These fields become available to all users across all dashboards that use the domain.

Adding Calculated Metrics

Define metrics in the calculatedMetrics section of your template:

calculatedMetrics:
  conversion_rate:
    label: "Conversion Rate"
    description: "Percentage of visitors who made a purchase"
    expression: "{purchases} / NULLIF({visitors}, 0) * 100"
    inputs:
      purchases:
        dataset: orders
        field: id
        aggregate: COUNT
      visitors:
        dataset: sessions
        field: visitor_id
        aggregate: COUNT_DISTINCT
    format:
      type: percentage
      decimals: 2
    tags:
      - marketing
      - kpi

Adding Calculated Dimensions

Define dimensions in the calculatedDimensions section:

calculatedDimensions:
  customer_lifetime_tier:
    label: "Customer Lifetime Tier"
    description: "Customer value tier based on total spend"
    expression: |
      CASE
        WHEN {lifetime_value} > 10000 THEN 'Platinum'
        WHEN {lifetime_value} > 5000 THEN 'Gold'
        WHEN {lifetime_value} > 1000 THEN 'Silver'
        ELSE 'Bronze'
      END
    inputs:
      lifetime_value:
        dataset: customers
        field: total_spend
    dataType: string

Best Practices for Domain Fields

  1. Document thoroughly: Include descriptions for every field. Users rely on this context.
  2. Use consistent naming: Follow a pattern like metric_name or dimension_category.
  3. Add tags: Group related fields (e.g., finance, marketing, operations).
  4. Handle edge cases: Use NULLIF for division, COALESCE for null values.
  5. Test before publishing: Validate expressions with sample data.
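
As a sketch that combines these practices, the hypothetical metric below documents itself, uses a tag, and guards against nulls with COALESCE (the net_revenue name and discount field are illustrative, not part of the example template above):

calculatedMetrics:
  net_revenue:
    label: "Net Revenue"
    description: "Revenue after discounts; missing discounts are treated as zero"
    expression: "{revenue} - COALESCE({discount}, 0)"   # COALESCE handles null aggregates
    inputs:
      revenue:
        dataset: orders
        field: amount
        aggregate: SUM
      discount:
        dataset: orders
        field: discount          # hypothetical column
        aggregate: SUM
    metricType: derived
    format:
      type: currency
      decimals: 2
      currency: USD
    tags:
      - finance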

Domain-level calculated fields appear in the Explorer with a Domain badge, distinguishing them from card-scoped or dashboard-scoped fields.


Next Steps

Once your semantic domain is configured:

  1. Train your users: Share the User Guide with end users
  2. Monitor usage: Track which relationships are used most frequently
  3. Gather feedback: Ask users which fields or relationships they need
  4. Optimize performance: Review query logs for slow multi-hop joins
  5. Expand gradually: Add new datasets and relationships based on user needs