Amazon Releases AWS Glue 5
Written by Kay Ewbank
Monday, 10 February 2025
Amazon has announced the general availability of AWS Glue 5.0, with improved performance, enhanced security, and support for Amazon SageMaker Unified Studio and SageMaker Lakehouse.

AWS Glue is a serverless data integration service that Amazon says makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning, and application development. Glue includes a collection of libraries, engines, and tools developed by the open source community. It consists of a Data Catalog, which is a central metadata repository; an ETL engine that can automatically generate Scala or Python code; a flexible scheduler that handles dependency resolution, job monitoring, and retries; and AWS Glue DataBrew for cleaning and normalizing data with a visual interface.

The performance and security improvements in AWS Glue 5.0 come largely from upgrading the engine to Apache Spark 3.5.2, Python 3.11, and Java 17. Amazon says that Glue 5.0 uses the AWS performance-optimized Spark runtime, which it claims is 3.9 times faster than open source Spark. This and other changes mean that Glue 5.0 is 32% faster than AWS Glue 4.0 and reduces costs by 22%.

Glue 5.0 updates its open table format support to Apache Hudi 0.15.0, Apache Iceberg 1.6.1, and Delta Lake 3.2.0, giving users stronger tools for improving performance, cost, governance, and privacy in their data lakes.

In addition, AWS Glue 5.0 adds Spark-native fine-grained access control with AWS Lake Formation, meaning users can apply table-, column-, row-, and cell-level permissions on Amazon S3 data lakes.

Finally, Glue 5.0 adds support for SageMaker Lakehouse, which lets customers unify all their data across Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Redshift data warehouses.
Its aim is to let organizations build analytics and AI/ML applications on a single copy of data. SageMaker Lakehouse can also be used to access and query data in place with all Apache Iceberg–compatible tools and engines. AWS Glue 5.0 is available now.
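For readers who want to try the new version, here is a minimal sketch of defining a Glue 5.0 Spark job with the boto3 Glue client's `create_job` call, opting into the Iceberg libraries through the `--datalake-formats` job parameter. The job name, IAM role ARN, S3 script path, worker type, and worker count are hypothetical placeholders, and the helper function names are our own; only the boto3 call and its parameter names come from the AWS SDK. Actually creating the job requires boto3 and AWS credentials, so that step is kept in a separate function.

```python
"""Sketch: define an AWS Glue 5.0 Spark job with an open table format enabled.

All concrete values (names, ARNs, S3 paths) are hypothetical placeholders.
"""

SUPPORTED_FORMATS = ("iceberg", "hudi", "delta")


def glue5_job_definition(name, role_arn, script_s3_path, workers=2, table_format="iceberg"):
    """Return keyword arguments for the boto3 Glue `create_job` call."""
    if table_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported table format: {table_format}")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "GlueVersion": "5.0",  # selects the Glue 5.0 runtime (Spark 3.5, Python 3.11, Java 17)
        "WorkerType": "G.1X",  # assumed worker size for illustration
        "NumberOfWorkers": workers,
        # Asks Glue to load the libraries for the chosen open table format.
        "DefaultArguments": {"--datalake-formats": table_format},
    }


def create_glue5_job(**kwargs):
    """Create the job in AWS; needs boto3 installed and valid credentials."""
    import boto3  # imported lazily so the definition helper works without boto3

    glue = boto3.client("glue")
    return glue.create_job(**glue5_job_definition(**kwargs))


if __name__ == "__main__":
    # Hypothetical values; this only builds and prints the request, it does not call AWS.
    params = glue5_job_definition(
        name="example-glue5-job",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        script_s3_path="s3://example-bucket/scripts/job.py",
    )
    print(params["GlueVersion"], params["DefaultArguments"])
```

Separating the request-building step from the API call makes the job definition easy to inspect or test before anything is created in an AWS account.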
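The fine-grained permissions that Glue 5.0 Spark jobs can now honor natively are defined in AWS Lake Formation rather than in the job itself. As a hedged sketch, a column-level SELECT grant can be expressed with the boto3 Lake Formation client's `grant_permissions` call. The principal ARN, database, table, and column names are hypothetical, and the helper names are illustrative rather than part of any AWS library. Row- and cell-level restrictions are configured separately as Lake Formation data filters and are not shown here.

```python
"""Sketch: a column-level SELECT grant in AWS Lake Formation (hypothetical names)."""


def column_select_grant(principal_arn, database, table, columns):
    """Build keyword arguments for Lake Formation `grant_permissions`."""
    if not columns:
        raise ValueError("at least one column is required for a column-level grant")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            # TableWithColumns limits the grant to the listed columns only.
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def apply_column_grant(**kwargs):
    """Send the grant to AWS; needs boto3 installed and Lake Formation admin rights."""
    import boto3  # imported lazily so the request builder works without boto3

    lakeformation = boto3.client("lakeformation")
    return lakeformation.grant_permissions(**column_select_grant(**kwargs))


if __name__ == "__main__":
    # Hypothetical example: an analyst role may read only two columns of a sales table.
    request = column_select_grant(
        principal_arn="arn:aws:iam::123456789012:role/ExampleAnalyst",
        database="sales",
        table="orders",
        columns=["order_id", "amount"],
    )
    print(request["Resource"]["TableWithColumns"]["ColumnNames"])
```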