Athena Query Alerter Open Sourced
Written by Alex Denham   
Tuesday, 12 February 2019

A tool that alerts you if users are running expensive queries on the Amazon Athena query engine has been open sourced by its developers.

Athena Alerter was developed by a team at Fandom Engineering to tackle a drawback of the AWS Athena big data query engine: it is easy to use, but that same simplicity makes it easy to run up large bills.


Queries can be made cheaper by using partitions to limit the amount of data scanned, but that relies on the person writing the query remembering to use a partition and understanding how expensive a query could be. This is particularly a problem when working in an external tool.

Because money can be spent on Athena so quickly, the development team at Fandom Engineering created Athena Alerter. This is an open source set of Lambda functions that work together to track which queries are run and how much data they scan (which maps directly to cost), and to notify users when they run costly queries.
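Since the amount of data scanned maps directly to cost, a rough per-query estimate is easy to compute. The sketch below assumes Athena's published pricing model (USD 5 per TB scanned, each query rounded up to the nearest megabyte, with a 10 MB minimum) and binary units; the constants and the `estimated_query_cost` name are illustrative and not part of Athena Alerter, and current prices for your region should be checked before relying on them.

```python
# Assumed pricing (check the current AWS price list for your region):
# USD 5 per TB scanned, billed per query, rounded up to the nearest MB,
# with a 10 MB minimum. Binary units (1 MB = 2**20 bytes) are assumed here.
PRICE_PER_TB_USD = 5.0
BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4
MIN_BILLED_MB = 10


def estimated_query_cost(bytes_scanned: int) -> float:
    """Return a rough USD cost for one Athena query, given the bytes it scanned."""
    # Integer ceiling division: round the scan up to a whole number of MB.
    billed_mb = -(-bytes_scanned // BYTES_PER_MB)
    # Every query is billed for at least the minimum amount.
    billed_mb = max(billed_mb, MIN_BILLED_MB)
    return billed_mb * BYTES_PER_MB / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    full_scan = estimated_query_cost(2 * BYTES_PER_TB)      # whole 2 TB table
    one_partition = estimated_query_cost(50 * 1024 ** 3)    # one 50 GB partition
    print(f"full scan: ${full_scan:.2f}, one partition: ${one_partition:.2f}")
```

Under these assumptions a full scan of a 2 TB table costs about USD 10, while a query restricted to a single 50 GB partition costs roughly USD 0.24, which is why forgetting a partition filter is so expensive.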

Because Fandom Engineering uses Slack for internal communication, Athena Alerter notifies users by sending Slack messages. However, the developers point out that because the tool is highly modular, it is easy to adapt the notification function to use a different mechanism.

Internally, the alerting tool uses CloudTrail, Lambda, DynamoDB, SQS and S3, and is set up using a CloudFormation script that creates all the AWS components it needs. The user provides their own configuration and then runs the supplied Makefile.

To gather information about Athena queries, the tool first processes CloudTrail logs to learn who started which query, then uses the Athena API to track each query and find out how much data it scanned. This information is pushed to DynamoDB and SQS, and users are then notified.
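As a rough illustration of the first step, the sketch below pulls Athena query starts out of a CloudTrail log file. It assumes the standard CloudTrail JSON layout (a top-level `Records` list whose entries carry `eventSource`, `eventName`, `eventTime`, `userIdentity`, `requestParameters` and `responseElements`). The `QueryEntry` type and `parse_cloudtrail_log` function are hypothetical and not taken from the Athena Alerter code.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class QueryEntry:
    """Hypothetical per-query record, roughly what would be stored in DynamoDB."""
    execution_id: str
    user: str
    query: str
    start_time: str
    # Tracked as running until a later step sees Athena report it finished.
    state: str = "RUNNING"
    bytes_scanned: int = 0


def parse_cloudtrail_log(log_text: str) -> List[QueryEntry]:
    """Extract Athena StartQueryExecution calls from one CloudTrail log file."""
    entries = []
    for record in json.loads(log_text).get("Records", []):
        # Only Athena query starts are relevant; skip every other API call.
        if record.get("eventSource") != "athena.amazonaws.com":
            continue
        if record.get("eventName") != "StartQueryExecution":
            continue
        response = record.get("responseElements") or {}
        # A rejected call has no execution id, so there is nothing to track.
        if "queryExecutionId" not in response:
            continue
        request = record.get("requestParameters") or {}
        identity = record.get("userIdentity") or {}
        entries.append(QueryEntry(
            execution_id=response["queryExecutionId"],
            user=identity.get("arn", "unknown"),
            query=request.get("queryString", ""),
            start_time=record.get("eventTime", ""),
        ))
    return entries
```

Keeping the user from CloudTrail alongside the execution ID is what lets later steps attribute a query's cost to a person, since the Athena API itself does not report who ran the query.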


At its heart, the tool consists of three Lambda functions:

  • cloudtrail_handler: processes CloudTrail logs and adds entries to the DynamoDB table. At this stage each entry records the query, the executing user, the start time and the execution ID.
  • usage_update: runs every minute, takes the queries that are in the “Running” state and updates how much data they have scanned. Because the Athena API does not report who is executing a query, the tool relies on CloudTrail for that information. When a query execution finishes, an SQS event is generated.
  • notification: runs for each SQS event, checks whether the amount of data scanned exceeded the notification threshold and, if so, generates a Slack message. If you want to handle the data-scanned information differently, this function can easily be replaced with your own implementation.
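The last two functions can be sketched in the same spirit. In this hypothetical version a plain dict stands in for the DynamoDB table and a list for the SQS queue, and `get_query_execution` is passed in as a function returning a response shaped like Athena's GetQueryExecution output (`QueryExecution.Status.State` and `QueryExecution.Statistics.DataScannedInBytes`). Passing `notify` in as a callable mirrors the modular design described above, so the Slack message can be swapped for another mechanism. All names here are illustrative, not Athena Alerter's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Terminal states reported by Athena for a query execution.
FINISHED_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


@dataclass
class TrackedQuery:
    """Hypothetical per-query record; a stand-in for the DynamoDB item."""
    execution_id: str
    user: str
    state: str = "RUNNING"
    bytes_scanned: int = 0


def usage_update(table: Dict[str, TrackedQuery],
                 get_query_execution: Callable[[str], dict],
                 queue: List[str]) -> None:
    """Refresh running queries and enqueue the id of each one that has finished."""
    for entry in table.values():
        if entry.state != "RUNNING":
            continue
        execution = get_query_execution(entry.execution_id)["QueryExecution"]
        stats = execution.get("Statistics", {})
        entry.bytes_scanned = stats.get("DataScannedInBytes", 0)
        state = execution["Status"]["State"]
        # QUEUED and RUNNING queries stay tracked until a terminal state appears.
        if state in FINISHED_STATES:
            entry.state = state
            queue.append(entry.execution_id)  # stands in for an SQS message


def notification(entry: TrackedQuery, threshold_bytes: int,
                 notify: Callable[[str, str], None]) -> bool:
    """Call notify(user, message) if the query scanned more than the threshold."""
    if entry.bytes_scanned <= threshold_bytes:
        return False
    gb = entry.bytes_scanned / 1024 ** 3
    notify(entry.user, f"Your Athena query {entry.execution_id} scanned {gb:.1f} GB.")
    return True
```

In a deployed version, `get_query_execution` could wrap a boto3 Athena client, for example `lambda qid: athena.get_query_execution(QueryExecutionId=qid)`, and `notify` could post the message to a Slack incoming webhook or any other channel.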

Athena Alerter is available on GitHub.


More Information

Athena Alerter on GitHub



Last Updated ( Tuesday, 12 February 2019 )