Apache Gravitino 0.9 Released
Written by Alex Denham   
Thursday, 29 May 2025

Apache Gravitino v0.9.0-incubating has been released, with optimizations to the fileset catalogs and model catalogs, making it easier for users to manage their unstructured AI data and model data.

Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. By using a technical data catalog and metadata lake, users can manage access and perform data governance for multiple data sources including filestores, relational databases, and event streams, while safely using multiple engines like Spark, Trino, or Flink on multiple formats on different cloud providers.

gravitino

It provides users with unified metadata access for data and AI assets, and the developers describe it as the world's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake. 

Gravitino provides several key features, starting with unified metadata management through a unified model and API to manage different types of metadata, including relational data such as Hive and MySQL; and and file-based metadata sources such as HDFS and S3.

gravitino meta

It provides a  governance layer for managing metadata with features like access control, auditing, and discovery; and connects directly to metadata sources via connectors, ensuring changes are instantly reflected between Gravitino and the underlying systems.

Gravitino can be deployed across multiple regions or clouds, allowing instances to share metadata for a global cross-region view. It supports query engines enabling metadata access without modifying SQL dialects, and is expanding to manage both data and AI assets, with support for AI models and features currently in development.

The improvements to the new release start with the model and fileset catalogs. Until now, the model catalog was immutable, so inflexible. Users can now alter models and model versions and add tags. Meanwhile, in the fileset catalog, Gravitino now supports multiple named storage locations within a single fileset and placeholder-based path generation. There's also new multiple location support, meaning users can reference data across different file systems such as HDFS, S3, and GCS through a unified fileset interface, each with a unique location name. The developers say the enhancements significantly improve the flexibility for multi-cloud environments and complex data organization patterns while maintaining a clean abstraction layer for data assets management. 

The Gravitino Virtual File System (GVFS) now supports accessing multiple locations within filesets, and has been refactored with a pluggable architecture allowing custom operations and hooks. 

Gravitino 0.9 is available now on GitHub.  

 

gravitino

More Information

Gravitino Website

Gravitino On GitHub

Related Articles

Apache Spark Now Understands English

Spark 3 Improves Python and SQL Support

DataBricks Open Sources All Of Delta Lake

Databricks Delta Lake Now Open Source

Apache Hive Adds Support For Set Operations

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Banner


Why Drone Shows Are Booming
04/07/2025

What do you need to make a celebration noteworthy? You may automatically think fireworks, especially for Independence Day, but an increasing number of celebrations are turning to drone shows instead.& [ ... ]



SQL and Assembly Language In Decline In TIOBE Index
09/06/2025

This month's TIOBE Index is out and the headline asks the question "Where is SQL Going?" Its chart provides the answer of "down", a fate that also applied to Assembly Language. The language that is mo [ ... ]


More News

pico book

 

Comments




or email your comment to: comments@i-programmer.info