Athena Query Alerter Open Sourced
Written by Alex Denham   
Tuesday, 12 February 2019

A tool that alerts you when users run expensive queries on the Amazon Athena query engine has been open sourced by its developers.

The Athena Alerter was developed by a team at Fandom Engineering, and is designed to overcome the problem that the AWS Athena big data query engine is easy to use, but can also easily run up large bills as a consequence of its simplicity.

It is possible to write less expensive queries by using partitions to limit the amount of data being accessed, but that requires the query writer to remember to use a partition and to understand how expensive a query could be. This is particularly difficult when working in an external tool.

Because of the potential to spend money on Athena so quickly, the development team at Fandom Engineering developed Athena Alerter. This is an open source set of Lambda functions designed to work together to track which queries are run and how much data they scan (which maps directly to cost), and to notify users when they run costly queries.
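To illustrate why the amount of data scanned maps directly to cost, here is a minimal sketch of a cost estimate. It is not part of Athena Alerter: the function name is hypothetical, the price of USD 5 per terabyte scanned is an assumed list price that should be checked against current AWS pricing, and the sketch treats a terabyte as 1024^4 bytes and ignores any per-query minimum charge.

```python
# Assumption: Athena bills per terabyte scanned; USD 5/TB is an assumed
# list price, not a value taken from the Athena Alerter project.
PRICE_PER_TB_USD = 5.0
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabyte


def estimate_query_cost(bytes_scanned: int,
                        price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the approximate cost in USD for a query that scanned
    `bytes_scanned` bytes. Hypothetical helper for illustration only."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # A query that scans half a terabyte of unpartitioned data.
    print(f"${estimate_query_cost(512 * 1024 ** 3):.2f}")
```

Under these assumptions, a query that skips the partition filter and scans the whole table costs in proportion to the table size, which is why a per-query alert is useful.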

Because Fandom Engineering uses Slack for internal communication, Athena Alerter notifies users by sending Slack messages. However, the developers point out that, given the modular nature of the tool, it’s easy to adjust the notification function to use a different mechanism.

Internally, the alerting tool uses CloudTrail, Lambda, DynamoDB, SQS, and S3, and is set up using a CloudFormation script that creates all the AWS components needed. The user needs to provide their specific configuration and then use the provided makefile.

To gather information about Athena queries, the tool first processes CloudTrail logs to learn who started which query, then uses the Athena API to track the query and get information about the amount of data scanned. This information is then pushed to DynamoDB and SQS, and users are notified.
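The first step, learning from CloudTrail who started which query, can be sketched roughly as follows. This is not the project's actual code: the function name and the output keys are hypothetical, and the sketch assumes that a CloudTrail record for an Athena `StartQueryExecution` call carries the query text in `requestParameters.queryString`, the execution id in `responseElements.queryExecutionId`, and the caller in `userIdentity.arn`.

```python
from typing import Optional


def extract_query_start(record: dict) -> Optional[dict]:
    """Return the fields of interest from one CloudTrail record, or None
    if the record is not a successful Athena StartQueryExecution call.

    Hypothetical helper; the record layout is an assumption.
    """
    if record.get("eventSource") != "athena.amazonaws.com":
        return None
    if record.get("eventName") != "StartQueryExecution":
        return None

    # These sections can be null in CloudTrail (for example on failed
    # calls), so fall back to empty dicts.
    request = record.get("requestParameters") or {}
    response = record.get("responseElements") or {}
    identity = record.get("userIdentity") or {}

    execution_id = response.get("queryExecutionId")
    if not execution_id:
        return None  # no execution id means there is nothing to track

    return {
        "execution_id": execution_id,
        "query": request.get("queryString"),
        "user": identity.get("arn"),
        "start_time": record.get("eventTime"),
    }
```

The returned dictionary holds the same four pieces of information the article says are stored at this stage: query, executing user, start time and execution id.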

At its heart the tool consists of three Lambda functions:

  • cloudtrail_handler: this function processes CloudTrail logs and adds entries to the DynamoDB table. At this stage the function records the query, the executing user, the start time and the execution id.
  • usage_update: this function runs every minute, takes queries that are in the “Running” state and updates the information about the amount of data scanned. Note that the Athena API does not provide information about who is executing the query, hence the tool relies on CloudTrail for that. When a query execution finishes, an SQS event is generated.
  • notification: this function runs for each SQS event, checks whether the amount of data scanned exceeded the notification threshold and, if so, generates a Slack message. If you want to process the data-scanned information differently, this function can easily be replaced with your own implementation.
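The threshold check in the notification step might look something like the sketch below. It is an illustration rather than the project's implementation: the function name, the input keys and the 10 GiB threshold are all hypothetical, and the output is simply a JSON body with a `text` field of the kind a Slack incoming webhook accepts. Sending the message is left out so the sketch stays self-contained.

```python
import json
from typing import Optional

# Hypothetical threshold; the real tool reads its own configuration.
THRESHOLD_BYTES = 10 * 1024 ** 3  # 10 GiB


def build_notification(query_info: dict,
                       threshold_bytes: int = THRESHOLD_BYTES) -> Optional[str]:
    """Return a JSON message body if the query exceeded the threshold,
    otherwise None. The keys of `query_info` are assumptions."""
    scanned = query_info["data_scanned_bytes"]
    if scanned <= threshold_bytes:
        return None  # cheap enough, nobody needs to be notified

    scanned_gib = scanned / 1024 ** 3
    text = (
        f"Query {query_info['execution_id']} run by {query_info['user']} "
        f"scanned {scanned_gib:.1f} GiB"
    )
    return json.dumps({"text": text})
```

Keeping the decision and the message formatting in one pure function is what makes the step easy to swap out: a replacement only has to accept the same query information and deliver the result through another channel.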

Athena Alerter is available on GitHub.

More Information

Athena Alerter on GitHub

Related Articles

 AWS Lambda For The Impatient

Amazon Glacier Select Analyzes Archived Data

New AWS Services

AWS Improvements For Developers

Last Updated ( Tuesday, 12 February 2019 )