Connections

Connecting to your data

Semaphor allows you to connect to a wide array of data sources, from databases and files to live APIs. If there's a custom data source you would like to connect to, feel free to send a request to support@semaphor.cloud.

Amazon S3

Setting Up a Cross-Account IAM Role for Semaphor Access

1) Create a Cross-Account IAM Role

2) Assign Permissions to the IAM Role

  • In the AWS Console, navigate to the IAM role's Permissions tab.
  • Assign a policy to the role that grants access to your S3 bucket. The resource in the policy should specify your S3 bucket.
Permission Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::my-s3-bucket", "arn:aws:s3:::my-s3-bucket/*"]
    }
  ]
}

3) Define Trust Relationships

  • Go to the Trust relationships tab of the IAM role.
  • Define a trust relationship that allows Semaphor to access your S3 bucket. You can copy and paste the trust relationship stub from the Semaphor console.
Trust Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::756808986636:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "my_project_id"
        }
      }
    }
  ]
}

4) Use the Fully Qualified Role ARN

  • After creating the role, use the fully qualified role ARN when setting up the connection in the Semaphor console. For example: arn:aws:iam::756808986636:role/semaphor-access-role.
  • Provide the region of your S3 bucket.

5) Verify the Connection

  • Check the connection by clicking the ⚡️ button in the Semaphor console. A green ✓ mark indicates that the connection is successful.

Currently, we support .parquet, and .csv formats. The example below shows connecting to an S3 bucket semaphor-dev.

The wildcard notation **/*.parquet instructs Semaphor to recursively retrieve all files in the root / prefix that end with .parquet.

Connection

Once the connection is successfully established, you can start analyzing your files like tables using SQL.

Card SQL
SELECT * from s3

PostgreSQL

You can connect to PostgreSQL using the connection string format shown below:

Connection String
postgresql+psycopg2://username:password@server:port/db
  • username: Your database user
  • password: Password of the user
  • server: Database server
  • port: Database port. Default is 5432 for PostgreSQL
  • db: Name of the database

MySQL

You can connect MySQL using a simple connection string as below:

Connection String
mysql+mysqlconnector://username:password@server:port/db
  • username: Your database user
  • password: Password of the user
  • server: Database server
  • port: Database port. Default is 3306 for MySQL
  • db: Name of the database

Microsoft SQL Server

You can connect MS-SQL using a simple connection string as below:

Connection String
mssql+pyodbc://username:password@server:port/db?ApplicationIntent=ReadOnly&driver=ODBC+Driver+17+for+SQL+Server
  • username: Your database user
  • password: Password of the user
  • server: Database server
  • port: Database port. Default is 1433 for MySQL
  • db: Name of the database

Google BigQuery

You can connect to BigQuery using the following connection string:

Connection String
bigquery://project/dataset
  • project: Your Google project
  • dataset: Dataset name

In addition, you will also need to provide a service account key (key.json). You can follow the below tutorial to generate a service account key.

Make sure the service account has the following permissions:

  • bigquery.datasets.get – Allows listing datasets in the project.
  • bigquery.datasets.list – Allows listing datasets the user has access to.
  • bigquery.projects.get – Allows getting project metadata.

Global filters with BigQuery

Please be aware that BigQuery has a quirk in that it doesn't accept the fully-qualified-column-name in the WHERE clause. So if you provide a query like below, and filter by country = 'US'.

Card SQL
SELECT * FROM my_dataset.my_table {{ filters | where }}

Semaphor generates the where clause as below:

⚠️

SELECT * FROM my_dataset.my_table WHERE my_dataset.my_table.country IN ( 'US' )

The above query throws an error in BigQuery. For all other databases this is a valid syntax.

As a workaround, you need to alias the table when you access it the first time

Card SQL
SELECT * FROM my_dataset.my_table as t {{ filters | where }}

For the above query, Semaphor will resolve the filters to the alias t and generate the following query:

Generated Query
SELECT * FROM my_dataset.my_table as t WHERE t.country IN ('US')

This is a valid syntax in BigQuery and will run without errors.



Snowflake

You can connect Snowflake using a simple connection string as below:

Connection String
snowflake://username:password@organization-account/db
  • username: Your database user
  • password: Password of the user
  • organization-account: Your Snowflake organization and account separated by a dash
  • db: Name of the database

You can get the your organization and account from your Snowflake url: https://app.snowflake.com/organization/account

You can also get your current account using the following SQL command:

  SELECT CURRENT_ACCOUNT();