AWS Glue 4 Adds Pandas Support
Written by Kay Ewbank   
Thursday, 01 December 2022

AWS Glue has been updated with newer engines and support for Pandas. AWS Glue is a serverless data integration service that Amazon says makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning and application development.

Glue includes a collection of libraries, engines, and tools developed by the open source community. AWS Glue consists of a Data Catalog, which is a central metadata repository; an ETL engine that can automatically generate Scala or Python code; a flexible scheduler that handles dependency resolution, job monitoring, and retries; and AWS Glue DataBrew, which provides a visual interface for cleaning and normalizing data.

Glue 4 includes AWS Glue Studio, a graphical interface that makes it easy to create, run, and monitor extract, transform, and load (ETL) jobs. The Studio can be used to visually compose data transformation workflows that run on AWS Glue’s Apache Spark-based serverless ETL engine.

The Pandas support means Python developers can use the Pandas data analysis and manipulation library in their Glue jobs. The new version of Glue also has updated versions of the Python and Spark engines: Python 3.10 and Apache Spark 3.3.0. Both engines include bug fixes and performance enhancements; Spark also adds new features such as row-level runtime filtering and additional built-in functions. Glue and Amazon EMR use the same Spark runtime, which the Glue team says has been optimized to run in the AWS cloud and can be two to three times faster than the basic open source version.
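As a rough illustration of the kind of Pandas code a Python developer might now run inside a Glue job, the sketch below aggregates a small table by region. The sample data, the column names and the summarize_by_region helper are all hypothetical and are not taken from the Glue announcement; it uses only standard Pandas calls and makes no Glue-specific API calls.

```python
import pandas as pd

# Hypothetical sample records standing in for data a Glue job might load.
orders = pd.DataFrame(
    {
        "region": ["eu", "us", "eu", "us", "apac"],
        "amount": [120.0, 80.0, 30.0, 20.0, 50.0],
    }
)


def summarize_by_region(df):
    """Return the total and mean order amount per region, sorted by region."""
    return (
        df.groupby("region", as_index=False)
        .agg(total=("amount", "sum"), mean=("amount", "mean"))
        .sort_values("region")
        .reset_index(drop=True)
    )


summary = summarize_by_region(orders)
print(summary)
```

In a real job the DataFrame would more likely come from a source such as a file in S3 than from an inline dictionary; the aggregation step itself would look the same.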

Glue 4.0 also adds native support for the Cloud Shuffle Storage Plugin for Apache Spark to help scale disk usage, and for Adaptive Query Execution to dynamically optimize queries as they run.

Another improvement in the new release is support for more data formats. Glue now has native support for the Apache Hudi, Apache Iceberg, and Delta Lake data lake frameworks. It also now includes the Parquet vectorized reader, with support for additional data types and encodings. Logging has been upgraded to Log4j 2, and Glue is no longer dependent on Log4j 1.
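To make this concrete, here is a minimal sketch of how a job definition targeting Glue 4.0 with one of these frameworks enabled might be assembled in Python. It assumes the job is created through the AWS API with the GlueVersion field set to "4.0" and the --datalake-formats job parameter; the build_glue4_job_args helper, the job name, the role ARN and the script path are hypothetical placeholders, and the single-format check is a simplification made for the example.

```python
# Framework names accepted by this sketch for the --datalake-formats parameter.
SUPPORTED_FORMATS = {"hudi", "iceberg", "delta"}


def build_glue4_job_args(name, role_arn, script_s3_path, datalake_format):
    """Build keyword arguments describing a Glue 4.0 Spark ETL job.

    Only builds a dictionary; it does not call AWS.
    """
    if datalake_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported data lake format: {datalake_format!r}")
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "4.0",
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "DefaultArguments": {"--datalake-formats": datalake_format},
    }


# Hypothetical placeholder values, for illustration only.
job_args = build_glue4_job_args(
    name="example-iceberg-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_s3_path="s3://example-bucket/scripts/job.py",
    datalake_format="iceberg",
)
print(job_args["GlueVersion"], job_args["DefaultArguments"])
```

With boto3 installed and credentials configured, a dictionary like this could be passed to the Glue client's create_job call as keyword arguments; that step is left out here so the sketch runs without an AWS account.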

More Information

Amazon Glue Webpage



Last Updated ( Thursday, 01 December 2022 )