AWS Glue 4 Adds Pandas Support
Written by Kay Ewbank   
Thursday, 01 December 2022

AWS Glue has been updated with new engine versions and support for Pandas. AWS Glue is a serverless data integration service that Amazon says makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning and application development.

Glue includes a collection of libraries, engines, and tools developed by the open source community. AWS Glue consists of a Data Catalog which is a central metadata repository; an ETL engine that can automatically generate Scala or Python code; a flexible scheduler that handles dependency resolution, job monitoring, and retries; and AWS Glue DataBrew for cleaning and normalizing data with a visual interface.

Glue 4 includes AWS Glue Studio, a new graphical interface that makes it easy to create, run, and monitor extract, transform, and load jobs. The Studio can be used to visually compose data transformation workflows for running on AWS Glue’s Apache Spark-based serverless ETL engine.

The Pandas support means Python developers can use the Pandas data analysis and manipulation facilities. The new version of Glue also has updated versions of the Python and Spark engines, Python 3.10 and Apache Spark 3.3.0. Both engines include bug fixes and performance enhancements; Spark also gains new features such as row-level runtime filtering and additional built-in functions. Glue and Amazon EMR use the same Spark runtime, which the Glue team says has been optimized to run in the AWS cloud and can be two to three times faster than the basic open source version.
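To give a flavour of the kind of Pandas code this opens up, here is a minimal sketch of a clean-and-aggregate step. It uses only standard Pandas calls and no Glue-specific API; the function name, column names and sample data are invented for illustration and are not taken from the Glue announcement.

```python
import pandas as pd


def summarize_orders(orders: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with a missing amount, then total the amount per customer.

    The 'customer' and 'amount' column names are hypothetical.
    """
    cleaned = orders.dropna(subset=["amount"])
    totals = (
        cleaned.groupby("customer", as_index=False)["amount"]
        .sum()
        .sort_values("amount", ascending=False)
        .reset_index(drop=True)
    )
    return totals


# Hypothetical sample data standing in for rows loaded by a job.
orders = pd.DataFrame(
    {
        "customer": ["ann", "bob", "ann", "cy"],
        "amount": [10.0, 5.0, None, 7.5],
    }
)

summary = summarize_orders(orders)
print(summary)
```

How the input rows reach the DataFrame inside a Glue job is left out here, since the article does not describe that part.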

Glue 4.0 also adds native support for the Cloud Shuffle Service Plugin for Spark to help scale disk usage, and Adaptive Query Execution to dynamically optimize queries as they run.
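Adaptive Query Execution is controlled in open source Spark through a handful of configuration properties. The sketch below collects three of the standard Spark keys and renders them as `--conf` arguments. The helper names are hypothetical, and the article does not say whether Glue needs any of these set explicitly (in open source Spark 3.2 and later the main switch is already on by default), so treat this as an illustration of the knobs rather than required setup.

```python
def aqe_settings(enabled: bool = True) -> dict:
    """Return standard Spark properties that toggle Adaptive Query Execution.

    The property names are Spark's own; this helper function is hypothetical.
    """
    flag = "true" if enabled else "false"
    return {
        # Main switch: re-plan the query using statistics gathered at runtime.
        "spark.sql.adaptive.enabled": flag,
        # Merge small shuffle partitions after a stage completes.
        "spark.sql.adaptive.coalescePartitions.enabled": flag,
        # Split heavily skewed partitions in sort-merge joins.
        "spark.sql.adaptive.skewJoin.enabled": flag,
    }


def as_conf_args(settings: dict) -> list:
    """Render settings as a flat list of '--conf', 'key=value' arguments."""
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(as_conf_args(aqe_settings()))
```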

Another improvement in the new release is support for more data formats: Glue now supports Apache Hudi, Apache Iceberg, and Delta Lake. It also includes the Parquet vectorized reader, with support for additional data types and encodings. Logging has been upgraded to log4j 2, and Glue no longer depends on log4j 1.

More Information

Amazon Glue Webpage

Related Articles

Amazon Announces AWS Visual Embedding

Amazon Launches AWS Workflow Studio

Amazon Releases Data IDE, Meet EMR Studio

Amazon AWS Invests In Rust

Last Updated ( Thursday, 01 December 2022 )