Amazon S3
Connect Semaphor to your Amazon S3 bucket
Prerequisites
Before proceeding, ensure you have the following:
- An AWS account
- A Semaphor account
- An S3 bucket with required data
Create a Cross-Account IAM Role
To allow Semaphor to access your S3 bucket securely, you need to create a cross-account IAM role.
- Follow the official AWS guide on creating a cross-account IAM role.
- Ensure that the role trusts Semaphor’s AWS account (configured in the Define Trust Relationships section below).
Assign Permissions to the IAM Role
Once the IAM role is created, you must grant it permissions to access your S3 bucket.
- Navigate to IAM in the AWS Console.
- Open the IAM role and go to the Permissions tab.
- Attach the following policy to grant Semaphor access to your bucket:
Replace my-s3-bucket with the actual name of your S3 bucket.
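As an illustration only, a read-only bucket policy can be generated with a short Python script. This is a minimal sketch, not Semaphor's published policy: the helper name build_read_policy is hypothetical, and the choice of actions (s3:ListBucket and s3:GetObject) is an assumption about the minimum needed to list and read files.

```python
import json


def build_read_policy(bucket: str) -> dict:
    """Return a read-only IAM policy document for a single bucket.

    Assumption: listing and reading objects is sufficient. Check the
    policy shown in the Semaphor documentation before using this.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # GetObject applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = build_read_policy("my-s3-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the policy editor on the Permissions tab. Note that the bucket-level and object-level statements use different resource ARNs.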
Define Trust Relationships
Semaphor needs permission to assume the IAM role. To set this up:
- Go to the Trust relationships tab of the IAM role.
- Add the following Trust Policy to allow Semaphor’s AWS account to assume the role:
Replace my_project_id with the actual project ID.
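For orientation, a cross-account trust policy typically has the shape produced by the sketch below. Several parts are assumptions: the account ID 111122223333 is a placeholder and not Semaphor's real AWS account, the helper name build_trust_policy is hypothetical, and passing the project ID as an sts:ExternalId condition is a guess at how my_project_id is used. Use the values from the Semaphor documentation in practice.

```python
import json


def build_trust_policy(trusted_account_id: str, project_id: str) -> dict:
    """Return a trust policy letting another AWS account assume the role.

    Assumption: the project ID is checked as an external ID, a common
    pattern for third-party cross-account access.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": project_id}
                },
            }
        ],
    }


# Placeholder account ID, not Semaphor's actual account.
trust = build_trust_policy("111122223333", "my_project_id")
print(json.dumps(trust, indent=2))
```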
Configure the Connection in Semaphor
After setting up the IAM role:
- Copy the fully qualified role ARN from the AWS Console. Example:
In the Semaphor console, enter:
- The Role ARN
- The AWS region where your S3 bucket is located
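A fully qualified IAM role ARN follows the pattern arn:aws:iam::<account-id>:role/<role-name>. The sketch below checks a value against that pattern before you paste it into the console. The role name semaphor-s3-access and the account ID are placeholders, and parse_role_arn is a hypothetical helper rather than part of any Semaphor or AWS tooling.

```python
import re

# Partition, 12-digit account ID, then the role name (optionally with a path).
ROLE_ARN_RE = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@/-]+)$"
)


def parse_role_arn(arn: str) -> dict:
    """Split a role ARN into its parts, or raise ValueError if malformed."""
    match = ROLE_ARN_RE.match(arn)
    if match is None:
        raise ValueError(f"not a valid IAM role ARN: {arn!r}")
    return {
        "partition": match.group(1),
        "account_id": match.group(2),
        "role": match.group(3),
    }


# Placeholder values for illustration only.
parsed = parse_role_arn("arn:aws:iam::111122223333:role/semaphor-s3-access")
print(parsed)
```

A common mistake this catches is copying the role name alone instead of the full ARN.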
Verify the Connection
To confirm that Semaphor can access your S3 bucket:
- Click the ⚡️ Test Connection button in the Semaphor console.
- If successful, you will see a green check mark (✓) indicating a valid connection.
Supported File Formats
Semaphor currently supports .parquet and .csv file formats.
You can configure Semaphor to scan your bucket using wildcard * notation. The wildcard notation **/*.parquet instructs Semaphor to recursively retrieve all files in the root / prefix that end with .parquet.
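To see which object keys a pattern such as **/*.parquet would select, the following sketch emulates common glob semantics in Python. It is an approximation, not Semaphor's actual matcher: it assumes **/ matches zero or more prefix levels and * matches within a single level, and the sample keys are invented for illustration.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob using `*` and `**/` into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")  # zero or more prefix levels
            i += 3
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # anything except a path separator
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def match_keys(pattern: str, keys: list[str]) -> list[str]:
    """Return the keys that match the glob pattern, in input order."""
    regex = glob_to_regex(pattern)
    return [key for key in keys if regex.match(key)]


# Invented sample keys.
keys = [
    "sales.parquet",
    "2024/q1/sales.parquet",
    "2024/notes.csv",
    "logs/app.parquet.bak",
]
print(match_keys("**/*.parquet", keys))
# → ['sales.parquet', '2024/q1/sales.parquet']
```

Under these assumptions the pattern picks up .parquet files at any depth, including the bucket root, and skips files that merely contain .parquet in the middle of their name.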
Analyze your data
Once the connection is established, you can start analyzing your S3 files as if they were database tables.