Ingest Athena data into Port via Meltano, S3 and webhook
This guide will demonstrate how to ingest Athena data into Port using Meltano, S3 and a webhook integration.
S3 integrations lack some of the features (such as reconciliation) found in Ocean or other Port integration solutions.
As a result, if a record ingested during the initial sync is later deleted in the data source, there’s no automatic mechanism to remove it from Port. The record simply won’t appear in future syncs, but it will remain in Port indefinitely.
If the data includes a flag for deleted records (e.g., is_deleted: "true"), you can configure a webhook delete operation in your webhook’s mapping configuration to remove these records from Port automatically.
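For example, a webhook mapping entry along the following lines could delete the matching entity whenever a record arrives flagged as deleted. This is only a sketch: the blueprint identifier `athenaTable`, the `is_deleted` field, and the `.body.id` path are illustrative and must be adapted to your own schema and payload structure:

```json
{
  "blueprint": "athenaTable",
  "operation": "delete",
  "filter": ".body.is_deleted == \"true\"",
  "entity": {
    "identifier": ".body.id"
  }
}
```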
Prerequisites
- Ensure you have a Port account and have completed the onboarding process.
- This feature is part of Port's limited-access offering. To obtain the required S3 bucket, contact our team directly via chat, Slack, or e-mail, and we will create and manage the bucket on your behalf.
- Access to an available Meltano app - for reference, follow the quick start guide, or use the following steps:

  ```shell
  # Install python3
  brew install python3

  # Create a python virtual env
  python -m venv .venv
  source .venv/bin/activate

  # Install meltano & follow the installation instructions
  pip install meltano

  # Change to the meltano project directory
  cd <name_of_project>
  ```

- Access to AWS credentials with query access to your account's Athena - follow the AWS guide for security management in Athena.
Data model setup
Add Blueprints
Since Athena is a data source with a dynamic schema, this guide cannot include the target blueprints for your use-case in advance. You will need to create target blueprints that either replicate the data schema as-is or apply some transformations to the target schema in Port.
Once you have decided on the desired blueprints you wish to set up, you can refer to the blueprint creation docs to set them up in your account.
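As an illustration only, a minimal blueprint for records extracted from an Athena table might look like the following. The identifier, title, and properties here are hypothetical - they should mirror the actual columns of your Athena data:

```json
{
  "identifier": "athenaTableRow",
  "title": "Athena Table Row",
  "icon": "AWS",
  "schema": {
    "properties": {
      "table_name": {
        "type": "string",
        "title": "Table Name"
      },
      "updated_at": {
        "type": "string",
        "format": "date-time",
        "title": "Updated At"
      }
    },
    "required": []
  },
  "calculationProperties": {},
  "relations": {}
}
```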
Create webhook integration
Since Athena is a data source with dynamic schema, this guide cannot include the mapping configuration for your use-case in advance. Once you have decided on the mappings you wish to set up, you can refer to the webhook creation docs to set them up in your portal.
It is important to use the generated webhook URL when setting up the connection; otherwise, the data will not be automatically ingested into Port from S3.
Meltano Setup
Refer to this GitHub repository to view examples and prepared code samples for this integration.
Set up S3 Destination
If you haven't already set up an S3 destination for Port's S3 bucket, follow these steps:
Meltano provides detailed documentation on how to generate/receive the appropriate credentials for the target-s3 loader. Once the appropriate credentials are prepared, you can set up the loader:
- Navigate to your meltano project:

  ```shell
  cd path/to/your/meltano/project/
  ```

- Install the target-s3 loader plugin:

  ```shell
  meltano add loader target-s3
  ```

- Configure the plugin using the interactive CLI prompt:

  ```shell
  meltano config target-s3 set --interactive
  ```

  Or set the configuration parameters individually using the CLI:

  ```shell
  # required
  meltano config target-s3 set cloud_provider.aws.aws_access_key_id $AWS_ACCESS_KEY_ID
  meltano config target-s3 set cloud_provider.aws.aws_secret_access_key $AWS_SECRET_ACCESS_KEY
  meltano config target-s3 set cloud_provider.aws.aws_bucket $AWS_BUCKET
  meltano config target-s3 set cloud_provider.aws.aws_region $AWS_REGION

  # recommended
  meltano config target-s3 set append_date_to_filename_grain microsecond
  meltano config target-s3 set partition_name_enabled true
  meltano config target-s3 set prefix 'data/'
  ```
Set up Athena Connection
- Install and configure an Athena extractor - for more information, see the Athena extractor documentation.

  Add the tap-athena extractor to your project using `meltano add`:

  ```shell
  meltano add extractor tap-athena
  ```

  Configure the tap-athena settings using `meltano config`:

  ```shell
  meltano config tap-athena set --interactive
  ```

  Test that the extractor settings are valid using `meltano config`:

  ```shell
  meltano config tap-athena test
  ```
Optional:
Since Athena is a data source with a dynamic catalog, you can use the built-in `--discover` capability, which lets you extract the stream catalog:

```shell
meltano invoke tap-athena --discover > extract/athena-catalog.json
```

This will enable you to manually alter the catalog file to manage stream selection.
A common use-case, for example, is to limit the catalog to a specific schema using `jq`:

```shell
jq '{streams: [.streams[] | select(.tap_stream_id | startswith("<SCHEMA_NAME>-"))]}' extract/athena-catalog.json > extract/athena-filtered-catalog.json
```

Then set the extractor to use this catalog in the configuration file using the `catalog` extra field, for example:

```yaml
- name: tap-athena
  variant: meltanolabs
  pip_url: git+https://github.com/MeltanoLabs/tap-athena.git
  catalog: extract/athena-filtered-catalog.json
```
- Create a specific target-s3 loader for the webhook you created, and enter the webhook URL you copied when setting up the webhook as part of the `prefix` configuration field (for example: `data/wSLvwtI1LFwQzXXX`):

  ```shell
  meltano add loader target-s3--athenaintegration --inherit-from target-s3
  meltano config target-s3--athenaintegration set prefix data/<WEBHOOK_URL>
  meltano config target-s3--athenaintegration set format format_type jsonl
  ```

- Run the connection:

  ```shell
  meltano el tap-athena target-s3--athenaintegration
  ```
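To keep the data in Port fresh, you can optionally schedule the run with Meltano's scheduler. A minimal sketch of a `schedules` entry in `meltano.yml` (the schedule name and interval here are illustrative) might be:

```yaml
schedules:
- name: athena-to-port
  extractor: tap-athena
  loader: target-s3--athenaintegration
  transform: skip
  interval: '@daily'
```

Note that schedules are executed by an orchestrator; alternatively, a cron job invoking the `meltano el` command above achieves the same effect.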