Ingest Athena data into Port via Meltano, S3 and webhook
This guide will demonstrate how to ingest Athena data into Port using Meltano, S3 and a webhook integration.
S3 integrations lack some of the features (such as reconciliation) found in Ocean or other Port integration solutions.
As a result, if a record ingested during the initial sync is later deleted in the data source, there’s no automatic mechanism to remove it from Port. The record simply won’t appear in future syncs, but it will remain in Port indefinitely.
If the data includes a flag for deleted records (e.g., is_deleted: "true"), you can configure a webhook delete operation in your webhook’s mapping configuration to remove these records from Port automatically.
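For example, a webhook mapping entry along the following lines could delete the matching entity whenever a record arrives flagged as deleted. This is only a sketch: the blueprint identifier `athenaTable`, the `is_deleted` field, and the `.body.id` path are illustrative and must be adapted to your own schema and payload structure:

```json
{
  "blueprint": "athenaTable",
  "operation": "delete",
  "filter": ".body.is_deleted == \"true\"",
  "entity": {
    "identifier": ".body.id"
  }
}
```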
Prerequisites
- Ensure you have a Port account and have completed the onboarding process.
- This feature is part of Port's limited-access offering. To obtain the required S3 bucket, contact our team directly via chat, Slack, or e-mail, and we will create and manage the bucket on your behalf.
- Access to an available Meltano app - for reference, follow the quick start guide, or use the following steps:

  ```shell
  # Install python3
  brew install python3

  # Create a python virtual env
  python -m venv .venv
  source .venv/bin/activate

  # Install meltano & follow the installation instructions
  pip install meltano

  # Change to the meltano project directory
  cd <name_of_project>
  ```

- Access to AWS credentials with query access to your account's Athena - follow the AWS guide for security management in Athena.
Data model setup
Add Blueprints
Since Athena is a data source with a dynamic schema, this guide cannot include the target blueprints for your use-case in advance. You will need to create target blueprints that either replicate the data schema as-is or apply some transformations to the target schema in Port.
Once you have decided on the desired blueprints you wish to set up, you can refer to the blueprint creation docs to set them up in your account.
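As an illustration only, a minimal blueprint for records extracted from an Athena table might look like the following. The identifier, title, and properties here are hypothetical - they should mirror the actual columns of your Athena data:

```json
{
  "identifier": "athenaTableRow",
  "title": "Athena Table Row",
  "icon": "AWS",
  "schema": {
    "properties": {
      "table_name": {
        "type": "string",
        "title": "Table Name"
      },
      "updated_at": {
        "type": "string",
        "format": "date-time",
        "title": "Updated At"
      }
    },
    "required": []
  },
  "calculationProperties": {},
  "relations": {}
}
```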
Create webhook integration
Since Athena is a data source with dynamic schema, this guide cannot include the mapping configuration for your use-case in advance. Once you have decided on the mappings you wish to set up, you can refer to the webhook creation docs to set them up in your portal.
It is important to use the generated webhook URL when setting up the connection; otherwise, the data will not be automatically ingested into Port from S3.
Meltano Setup
Refer to this GitHub repository to view examples and prepared code samples for this integration.
Set up S3 Destination
If you haven't already set up an S3 destination for Port's S3 bucket, follow these steps:
Meltano provides detailed documentation on how to generate/receive the appropriate credentials for the target-s3 loader. Once the appropriate credentials are prepared, you can set up the loader:
- Navigate to your meltano project:

  ```shell
  cd path/to/your/meltano/project/
  ```

- Install the target-s3 loader plugin:

  ```shell
  meltano add loader target-s3
  ```

- Configure the plugin using the interactive CLI prompt:

  ```shell
  meltano config target-s3 set --interactive
  ```

  Or set the configuration parameters individually using the CLI:

  ```shell
  # required
  meltano config target-s3 set cloud_provider.aws.aws_access_key_id $AWS_ACCESS_KEY_ID
  meltano config target-s3 set cloud_provider.aws.aws_secret_access_key $AWS_SECRET_ACCESS_KEY
  meltano config target-s3 set cloud_provider.aws.aws_bucket $AWS_BUCKET
  meltano config target-s3 set cloud_provider.aws.aws_region $AWS_REGION

  # recommended
  meltano config target-s3 set append_date_to_filename_grain microsecond
  meltano config target-s3 set partition_name_enabled true
  meltano config target-s3 set prefix 'data/'
  ```
Set up Athena Connection
- Install and configure an Athena extractor - for more information, see the Athena extractor documentation.

  Add the tap-athena extractor to your project using `meltano add`:

  ```shell
  meltano add extractor tap-athena
  ```

  Configure the tap-athena settings using `meltano config`:

  ```shell
  meltano config tap-athena set --interactive
  ```

  Test that the extractor settings are valid using `meltano config`:

  ```shell
  meltano config tap-athena test
  ```
Optional:
Since Athena is a data source with a dynamic catalog, you can use the built-in `--discover` capability, which lets you extract the stream catalog:

```shell
meltano invoke tap-athena --discover > extract/athena-catalog.json
```

This will enable you to manually alter the catalog file to manage stream selection.
A common use-case, for example, is to limit the catalog to a specific schema using `jq`:

```shell
jq '{streams: [.streams[] | select(.tap_stream_id | startswith("<SCHEMA_NAME>-"))]}' extract/athena-catalog.json > extract/athena-filtered-catalog.json
```

Then set the extractor to use this catalog in the configuration file using the `catalog` extra field, for example:

```yaml
- name: tap-athena
  variant: meltanolabs
  pip_url: git+https://github.com/MeltanoLabs/tap-athena.git
  catalog: extract/athena-filtered-catalog.json
```
- Create a specific target-s3 loader for the webhook you created, and enter the webhook URL you copied when setting up the webhook as part of the `prefix` configuration field (for example: `data/wSLvwtI1LFwQzXXX`):

  ```shell
  meltano add loader target-s3--athenaintegration --inherit-from target-s3
  meltano config target-s3--athenaintegration set prefix data/<WEBHOOK_URL>
  meltano config target-s3--athenaintegration set format format_type jsonl
  ```

- Run the connection:

  ```shell
  meltano el tap-athena target-s3--athenaintegration
  ```
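To keep the data in Port fresh, you can optionally schedule the run with Meltano's scheduler. A minimal sketch of a `schedules` entry in `meltano.yml` (the schedule name and interval here are illustrative) might be:

```yaml
schedules:
- name: athena-to-port
  extractor: tap-athena
  loader: target-s3--athenaintegration
  transform: skip
  interval: '@daily'
```

Note that schedules are executed by an orchestrator; alternatively, a cron job invoking the `meltano el` command above achieves the same effect.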