Amazon Announces AWS Glue Data Quality
Written by Kay Ewbank
Tuesday, 20 June 2023
Amazon has announced AWS Glue Data Quality, a new feature of AWS Glue that measures and monitors the data quality of Amazon Simple Storage Service (S3) based data lakes, data warehouses, and other data repositories.

AWS Glue is a serverless data integration service that Amazon says makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning, and application development. Glue includes a collection of libraries, engines, and tools developed by the open source community. It consists of a Data Catalog, a central metadata repository; an ETL engine that can automatically generate Scala or Python code; a flexible scheduler that handles dependency resolution, job monitoring, and retries; and AWS Glue DataBrew, a visual interface for cleaning and normalizing data.

The new feature is intended to make it easier to build data lakes in which the data is of high quality, rather than risking the data lake becoming a data swamp of untrustworthy information. Without tooling, setting up data quality checks is time-consuming and difficult, relying on manual analysis of the data and hand-written rules and code to alert administrators when quality deteriorates. Amazon says AWS Glue Data Quality reduces this manual effort from days to hours.

AWS Glue Data Quality automatically computes statistics for your datasets, then uses those statistics to recommend a set of quality rules that check for freshness, accuracy, and integrity. You can adjust the recommended rules, discard them, or add new rules as needed. The service then monitors your data against these rules and alerts you when it detects that quality has deteriorated, so that you can act.
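Rules in AWS Glue Data Quality are expressed in the Data Quality Definition Language (DQDL). A small ruleset might look like the following sketch, where the column names are illustrative and the rule types shown (IsComplete, IsUnique, ColumnValues, RowCount) are a handful of the checks the language supports:

```
Rules = [
    IsComplete "order_id",
    IsUnique "order_id",
    ColumnValues "price" > 0,
    RowCount > 100
]
```

A dataset passes only if every rule in the set is satisfied, which is what makes a recommended ruleset a usable baseline that you can then tighten or relax per column.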
AWS Glue Data Quality rules can be applied to data at rest in your datasets and data lakes, and to entire data pipelines where data is in motion. You can apply rules across multiple datasets, and for data pipelines built in AWS Glue Studio you can apply a transform to evaluate the quality of the entire pipeline. AWS Glue Data Quality can be accessed in the AWS Glue Data Catalog and in AWS Glue ETL jobs.
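To make the idea concrete, here is a minimal, self-contained sketch in plain Python (no AWS dependencies) of the kind of rule-based evaluation the service automates: each rule is a named predicate with a threshold, and a report flags which rules pass. The dataset, rule names, and thresholds are all invented for illustration and are not the Glue API.

```python
# Illustrative only: a toy stand-in for rule-based data quality checks,
# not the AWS Glue Data Quality API.

def completeness(column):
    """Return a check giving the fraction of rows where `column` is not None."""
    def check(rows):
        return sum(r.get(column) is not None for r in rows) / len(rows)
    return check

def evaluate(rows, rules):
    """Map each (name, check, threshold) rule to True/False for the dataset."""
    return {name: check(rows) >= threshold for name, check, threshold in rules}

orders = [
    {"order_id": 1,    "price": 9.99},
    {"order_id": 2,    "price": None},
    {"order_id": None, "price": 4.50},
]

rules = [
    ("order_id is fully complete",     completeness("order_id"), 1.0),
    ("price is at least 50% complete", completeness("price"),    0.5),
]

report = evaluate(orders, rules)
# order_id is present in 2 of 3 rows, so it fails the 1.0 threshold;
# price is present in 2 of 3 rows, so it passes the 0.5 threshold.
```

In the real service the equivalent rules would be DQDL expressions evaluated by Glue itself, either against data at rest via the Data Catalog or inline in an ETL job as a pipeline transform.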