Connecting to your data
Semaphor allows you to connect to a wide array of data sources, from databases and files to live APIs. If there's a custom data source you would like to connect to, feel free to send a request to support@semaphor.cloud.
Amazon S3
Setting Up a Cross-Account IAM Role for Semaphor Access
1) Create a Cross-Account IAM Role
- Create a
Cross-account IAM role
. Please see the AWS docs (opens in a new tab) for instructions.
2) Assign Permissions to the IAM Role
- In the AWS Console, navigate to the IAM role's
Permissions
tab. - Assign a policy to the role that grants access to your S3 bucket. The resource in the policy should specify your S3 bucket.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:ListBucket"],
"Resource": ["arn:aws:s3:::my-s3-bucket", "arn:aws:s3:::my-s3-bucket/*"]
}
]
}
3) Define Trust Relationships
- Go to the
Trust relationships
tab of the IAM role. - Define a trust relationship that allows Semaphor to access your S3 bucket. You can copy and paste the trust relationship stub from the Semaphor console.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::756808986636:root"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "my_project_id"
}
}
}
]
}
4) Use the Fully Qualified Role ARN
- After creating the role, use the fully qualified role ARN when setting up the connection in the Semaphor console. For example:
arn:aws:iam::756808986636:role/semaphor-access-role
. - Provide the
region
of your S3 bucket.
5) Verify the Connection
- Check the connection by clicking the ⚡️ button in the Semaphor console. A green ✓ mark indicates that the connection is successful.
Currently, we support .parquet
, and .csv
formats. The example below shows connecting to an S3 bucket semaphor-dev
.
The wildcard notation **/*.parquet
instructs Semaphor to recursively retrieve all files in the root /
prefix that end with .parquet
.
Once the connection is successfully established, you can start analyzing your files like tables using SQL.
SELECT * from s3
PostgreSQL
You can connect to PostgreSQL using the connection string format shown below:
postgresql+psycopg2://username:password@server:port/db
username
: Your database userpassword
: Password of the userserver
: Database serverport
: Database port. Default is5432
for PostgreSQLdb
: Name of the database
MySQL
You can connect MySQL using a simple connection string as below:
mysql+mysqlconnector://username:password@server:port/db
username
: Your database userpassword
: Password of the userserver
: Database serverport
: Database port. Default is3306
for MySQLdb
: Name of the database
Microsoft SQL Server
You can connect MS-SQL using a simple connection string as below:
mssql+pyodbc://username:password@server:port/db?ApplicationIntent=ReadOnly&driver=ODBC+Driver+17+for+SQL+Server
username
: Your database userpassword
: Password of the userserver
: Database serverport
: Database port. Default is1433
for MySQLdb
: Name of the database
Google BigQuery
You can connect to BigQuery using the following connection string:
bigquery://project/dataset
project
: Your Google projectdataset
: Dataset name
In addition, you will also need to provide a service account key (key.json). You can follow the below tutorial to generate a service account key.
Snowflake
You can connect Snowflake using a simple connection string as below:
snowflake://username:password@organization-account/db
username
: Your database userpassword
: Password of the userorganization-account
: Your Snowflake organization and account separated by a dashdb
: Name of the database
You can get the your organization and account from your Snowflake url:
https://app.snowflake.com/organization/account
You can also get your current account using the following SQL command:
SELECT CURRENT_ACCOUNT();