Amazon Open Sources Python Library for AWS Glue
Written by Kay Ewbank   
Tuesday, 04 June 2019

Amazon has open-sourced a Python library known as Athena Glue Service Logs (AGSlogger) that makes it easier to parse common log formats and load them into AWS Glue for analysis. It is intended for use with AWS service logs.

Organizations that use Amazon Simple Storage Service (S3) for storing logs often want to query the logs using Amazon Athena, a serverless query engine for data on S3. Amazon says that many customers use Athena to query logs for service and application troubleshooting, performance analysis, and security audits.

The newly open-sourced Python library, Athena Glue Service Logs (AGSlogger), has predefined templates for parsing and optimizing a variety of popular log formats. AGSlogger lets you define schemas, manage partitions, and transform data as part of an extract, transform, load (ETL) job in AWS Glue. The idea is that developers will be able to use the library with AWS Glue ETL jobs as a common framework for processing log data.

The library is designed to do an initial conversion of AWS service logs, then keep converting new logs as they are delivered to S3. While it is possible to query the logs in place using Athena, for cost and performance reasons it can be better to convert the logs into partitioned Parquet files. The library has Glue jobs for a number of types of service log. Each job creates the source and destination tables, converts the source data to partitioned Parquet files, and maintains new partitions for the source and destination tables.
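To give a feel for what "partitioned" means here, the following is a minimal sketch in plain Python and not code from AGSlogger itself. It assumes log records carry an ISO 8601 UTC timestamp in a field called "time" and that partitions use the Hive-style key=value layout by year, month and day, which Athena recognises. The function names partition_key and group_by_partition are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(timestamp: str) -> str:
    """Map a UTC timestamp such as 2019-06-04T12:34:56.123456Z
    to a Hive-style partition path (assumed year/month/day layout)."""
    ts = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return f"year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}"


def group_by_partition(records):
    """Group log records (dicts with a 'time' field, an assumption of
    this sketch) by the partition they would be written to."""
    groups = defaultdict(list)
    for record in records:
        groups[partition_key(record["time"])].append(record)
    return dict(groups)


if __name__ == "__main__":
    logs = [
        {"time": "2019-06-04T12:34:56.123456Z", "status": 200},
        {"time": "2019-06-04T23:59:59.000000Z", "status": 404},
        {"time": "2019-06-05T00:00:01.500000Z", "status": 200},
    ]
    for key, items in group_by_partition(logs).items():
        print(key, len(items))
```

A query that filters on a single day then only needs to read the files under that day's prefix, which is where the cost and performance benefit comes from.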

The library supports a number of log types:

  • Application Load Balancer
  • Classic Load Balancer
  • AWS CloudTrail
  • Amazon CloudFront
  • S3 Access
  • Amazon VPC Flow

Once converted from row-based log files to columnar Parquet, the data can be queried using Athena. Apache Parquet is an open-source column-oriented storage format originally developed for Apache Hadoop, but now more widely used.
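The difference between row-based and column-oriented layouts can be illustrated with a toy sketch in plain Python. This is not how Parquet is implemented (a real file adds encoding, compression and metadata), and the function name rows_to_columns is hypothetical. The sketch assumes every row has the same fields.

```python
def rows_to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists.
    Assumes every row has the same keys (an assumption of this sketch)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


if __name__ == "__main__":
    rows = [
        {"status": 200, "bytes": 512},
        {"status": 404, "bytes": 128},
    ]
    columns = rows_to_columns(rows)
    # A query that only needs the status field touches one list
    # instead of scanning every complete row.
    print(columns["status"])
```

Because each column is stored together, a query engine can skip the columns a query does not mention, which is one reason the converted logs are cheaper to scan.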

More Information

Athena Glue Service Logs On GitHub

Related Articles

Athena Query Alterer Open Sourced

Databricks Delta Adds Faster Parquet Import

New AWS Services

AWS Improvements For Developers

Last Updated ( Tuesday, 04 June 2019 )