Ingest Athena data into Port via Meltano, S3 and webhook

This guide will demonstrate how to ingest Athena data into Port using Meltano, S3 and a webhook integration.

Disclaimer

S3 integrations lack some of the features (such as reconciliation) found in Ocean or other Port integration solutions.

As a result, if a record ingested during the initial sync is later deleted in the data source, there’s no automatic mechanism to remove it from Port. The record simply won’t appear in future syncs, but it will remain in Port indefinitely.

If the data includes a flag for deleted records (e.g., is_deleted: "true"), you can configure a webhook delete operation in your webhook’s mapping configuration to remove these records from Port automatically.
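
For illustration, a minimal sketch of such a delete mapping is shown below. The blueprint identifier (athenaTable), the record identifier field (id), and the is_deleted flag are assumptions for this example, not values prescribed by this guide; adjust them to your actual schema and refer to the webhook docs for the exact mapping format.

    {
      "blueprint": "athenaTable",
      "operation": "delete",
      "filter": ".body.is_deleted == \"true\"",
      "entity": {
        "identifier": ".body.id"
      }
    }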

Prerequisites

  • Ensure you have a Port account and have completed the onboarding process.

  • This feature is part of Port's limited-access offering. To obtain the required S3 bucket, please contact our team directly via chat, Slack, or e-mail, and we will create and manage the bucket on your behalf.

  • Access to an available Meltano app. For reference, follow the quick start guide, or use the following steps:

  1. Install python3

    brew install python3
  2. Create a Python virtual environment:

    python -m venv .venv
    source .venv/bin/activate
  3. Install Meltano and follow the installation instructions:

    pip install meltano
  4. Change into your Meltano project directory (if you have not created a project yet, see the sketch after these steps):

    cd <name_of_project>
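
If you do not have an existing Meltano project to change into, you can initialize one first. A minimal sketch, where my_meltano_project is just a placeholder name:

    # create a new Meltano project and enter its directory
    meltano init my_meltano_project
    cd my_meltano_project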

Data model setup

Add Blueprints

Since Athena is a data source with a dynamic schema, this guide cannot include the target blueprints for your use case in advance. You will need to create target blueprints that either replicate the data schema as is or apply some transformations to the schema in Port.

Once you have decided on the blueprints you wish to set up, refer to the blueprint creation docs to create them in your account.
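
For illustration only, a blueprint replicating a simple Athena table could look like the following sketch. The identifier, property names, and types are placeholders rather than part of this guide's prescribed setup; model them after the columns you actually extract:

    {
      "identifier": "athenaTable",
      "title": "Athena Table",
      "schema": {
        "properties": {
          "database": {
            "type": "string",
            "title": "Database"
          },
          "created_at": {
            "type": "string",
            "format": "date-time",
            "title": "Created At"
          }
        },
        "required": []
      },
      "relations": {}
    }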

Create webhook integration

Since Athena is a data source with a dynamic schema, this guide cannot include the mapping configuration for your use case in advance. Once you have decided on the mappings you wish to set up, refer to the webhook creation docs to set them up in your portal.
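
As a rough sketch only, a mapping for the hypothetical athenaTable blueprint above might look like the following. The JQ paths (.body.id, .body.name, and so on) are assumptions about the structure of the records your pipeline writes to S3; adjust them to your actual payload and consult the webhook docs for the full mapping schema:

    {
      "blueprint": "athenaTable",
      "entity": {
        "identifier": ".body.id",
        "title": ".body.name",
        "properties": {
          "database": ".body.database",
          "created_at": ".body.created_at"
        }
      }
    }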

Important

It is important that you use the generated webhook URL when setting up the connection below; otherwise, the data will not be automatically ingested into Port from S3.

Meltano Setup

Recommended

Refer to this GitHub repository to view examples and prepared code samples for this integration.

Set up S3 Destination

If you haven't already set up an S3 destination for the Port-provided S3 bucket, follow these steps:

Meltano provides detailed documentation on how to generate/receive the appropriate credentials for the target-s3 loader. Once the appropriate credentials are prepared, you can set up the loader:

  1. Navigate to your meltano environment:

    cd path/to/your/meltano/project/
  2. Install the target-s3 loader plugin:

    meltano add loader target-s3
  3. Configure the plugin using the interactive CLI prompt:

    meltano config target-s3 set --interactive

    Or set the configuration parameters individually using the CLI:

    # required
    meltano config target-s3 set cloud_provider.aws.aws_access_key_id $AWS_ACCESS_KEY_ID
    meltano config target-s3 set cloud_provider.aws.aws_secret_access_key $AWS_SECRET_ACCESS_KEY
    meltano config target-s3 set cloud_provider.aws.aws_bucket $AWS_BUCKET
    meltano config target-s3 set cloud_provider.aws.aws_region $AWS_REGION
    # recommended
    meltano config target-s3 set append_date_to_filename_grain microsecond
    meltano config target-s3 set partition_name_enabled true
    meltano config target-s3 set prefix 'data/'

Set up Athena Connection

  1. Install and configure an Athena extractor. For more information, see: Athena extractor.

Add the tap-athena extractor to your project using meltano add:

meltano add extractor tap-athena

Configure the tap-athena settings using meltano config:

meltano config tap-athena set --interactive

Test that the extractor settings are valid using meltano config:

meltano config tap-athena test

Optional: Since Athena is a data source with a dynamic catalog, you can use the built-in discover capability, which lets you extract the stream catalog:

meltano invoke tap-athena --discover > extract/athena-catalog.json

This enables you to manually alter the catalog file to manage stream selection. A common use case, for example, is to limit the catalog to a specific schema using jq:

jq '{streams: [.streams[] | select(.tap_stream_id | startswith("<SCHEMA_NAME>-"))]}' extract/athena-catalog.json > extract/athena-filtered-catalog.json

Then set the extractor to use this catalog in the configuration file using the catalog extra field, for example:

  - name: tap-athena
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-athena.git
    catalog: extract/athena-filtered-catalog.json
  2. Create a specific target-s3 loader for the webhook you created, and enter the webhook URL you copied when setting up the webhook as part of the prefix configuration field, for example: "data/wSLvwtI1LFwQzXXX".

    meltano add loader target-s3--athenaintegration --inherit-from target-s3
    meltano config target-s3--athenaintegration set prefix data/<WEBHOOK_URL>
    meltano config target-s3--athenaintegration set format format_type jsonl
  3. Run the connection:

    meltano el tap-athena target-s3--athenaintegration

Additional relevant guides