Querying Data

Semaphor MCP provides two query tools:

semaphor_analyze for governed semantic analytics.
semaphor_query_sql_advanced as the raw-SQL escape hatch when the semantic tool cannot express the analysis.

Start every data question with semaphor_get_analysis_context. If the response recommends the semantic path, discover domains and datasets first, then use semaphor_analyze.

Governed Analytics

semaphor_analyze is the default tool for ordinary BI questions: metric by dimension, top-N, time series, filters, latest-available windows, period comparisons, period-change driver analysis, and relationship-aware joins.

Best for: Business analytics that should use curated semantic domains, governed joins, modeled metric aggregates, date-window resolution, and typed execution diagnostics.

Key characteristics:

Uses semantic domain datasets and source-bearing field references.
Respects security policies.
Auto-resolves safe joins through semantic relationships.
Returns generated SQL, records, column metadata, relationship diagnostics, and a typed execution result.
Uses each semantic metric's authored aggregate by default.
Allows a query-specific metric aggregate override when the user asks for a different rollup.

Top suppliers by modeled purchase value

{
  "domainId": "dom_abc",
  "datasetName": "fact_purchase_line",
  "metrics": [
    { "datasetName": "fact_purchase_line", "name": "net_purchase_value" }
  ],
  "primaryMetric": {
    "datasetName": "fact_purchase_line",
    "name": "net_purchase_value"
  },
  "dimensions": [
    { "datasetName": "dim_supplier", "name": "supplier_name" }
  ],
  "limit": 10,
  "response_format": "json"
}

If net_purchase_value is modeled with aggregate: "SUM", Semaphor generates a summed top-N query. The semantic model owns that default.

Aggregate Overrides

Sometimes the same metric can be analyzed with a different aggregate for one question. For example, total margin and average margin are both valid questions.

Average margin by material

{
  "domainId": "dom_abc",
  "datasetName": "fact_sales_line",
  "metrics": [
    {
      "datasetName": "fact_sales_line",
      "name": "gross_margin_value",
      "aggregate": "AVG"
    }
  ],
  "primaryMetric": {
    "datasetName": "fact_sales_line",
    "name": "gross_margin_value",
    "aggregate": "AVG"
  },
  "dimensions": [
    { "datasetName": "dim_material", "name": "material_family" }
  ],
  "limit": 10,
  "response_format": "json"
}

The override applies only to this MCP call. The response marks the override in execution metadata so clients can audit that the query used AVG instead of the semantic default.

Omit aggregate to use the semantic model. Add aggregate only when the user explicitly asks for a different rollup such as average, maximum, or distinct count.

Advanced SQL

semaphor_query_sql_advanced executes raw SQL. Use it only when the request is SQL-first or cannot be represented with semaphor_analyze.

Best for: Custom CTEs, window functions, unsupported semantic relationships, fields not exposed in the semantic domain, or exploratory SQL debugging.

Key characteristics:

Pass a SQL string directly.
Respects security policies.
Optional Python post-processing is available when supported by the client/tool configuration.
Requires explicit connection context.
Does not provide semantic relationship diagnostics or typed analytic intent coverage.

Advanced SQL example

{
  "connectionId": "conn_abc123",
  "sql": "WITH monthly AS (SELECT date_trunc('month', order_date) AS month, SUM(total_amount) AS revenue FROM orders GROUP BY 1) SELECT month, revenue, revenue - LAG(revenue) OVER (ORDER BY month) AS delta FROM monthly ORDER BY month"
}

Choosing Between Tools

	`semaphor_analyze`	`semaphor_query_sql_advanced`
Default choice	Yes	No
Query format	Semantic fields and analytics intent	Raw SQL
Semantic joins	Automatic and governed	Manual
Metric aggregates	From semantic model, with explicit override support	Whatever SQL says
Diagnostics	Relationship, fan-out, validation, date-window, and execution metadata	SQL execution result
Best for	Governed BI questions	Advanced SQL-first analysis

Response Format

Both query tools accept response_format:

"json" -- machine-readable payload with records, metadata, diagnostics, and generated SQL where available.
"markdown" -- human-readable summary for conversational answers.

Use "json" when building programmatic integrations or evals. Use "markdown" when the AI is presenting results directly to a user.

Tips

Prefer semantic discovery first -- Use semaphor_get_analysis_context, then semaphor_list_semantic_domains, semaphor_list_datasets, and semaphor_get_dataset_schema.
Use source-bearing refs -- When selecting fields from joined datasets, include datasetName on each field ref.
Let the semantic model choose aggregates -- Omit aggregate unless the question asks for a different rollup.
Use SQL as an escape hatch -- Raw SQL is valid, but it should not be the first choice for normal governed analytics.