Snowflake Support For Apache Iceberg Goes GA
Written by Nikos Vaggalis   
Thursday, 29 August 2024

Snowflake has added support for the Iceberg table format, so it can now work with data commonly found in data lakes and warehouses.

Lately there's a lot of talk around Apache Iceberg. What makes it so special?

Enterprises frequently go beyond relational data stores, also hosting data on object stores suitable for their data lakes. If you use Amazon S3 as the underlying object store, you can store virtually any amount of data on it, all the way to exabytes. Iceberg is an open table format specification that enables data held in S3 to be queried like SQL tables.

It's important to note that Iceberg is not a query or storage engine; it's a specification. In place of the query engine, put Snowflake. Iceberg allows Snowflake to work on those files with:

  • ACID (atomicity, consistency, isolation, durability) transactions
  • Schema evolution
  • Hidden partitioning
  • Table snapshots

Another example comes from Postgres, which natively cannot work with such formats. The pg_lakehouse extension enables Postgres to work with Iceberg by delegating the work to DuckDB. DuckDB is, of course, the alternative to SQLite for analytical workloads: local-first, embeddable and suitable for data science work.

With pg_lakehouse, PostgreSQL gains those high-performance analytical query engine capabilities too.
Queries over data such as events, metrics, historical snapshots and vendor data are pushed down to DuckDB for processing, but the query engine is only one part of the equation. The other is being able to access foreign object stores like S3 and table formats like Iceberg or Delta Lake.

We first met Snowflake in 2022, see Snowflake Improves Developer Support. Now it's Snowflake's turn to embrace Iceberg. It does so by treating Iceberg-compatible files as Snowflake tables and providing capabilities to interact directly with the underlying data. Iceberg tables combine the performance and query semantics of regular Snowflake tables with external cloud storage that customers manage. As such, they're deemed ideal for existing data lakes that customers cannot, or choose not to, store in Snowflake. Snowflake connects to your storage location using an external volume, and Iceberg tables incur no Snowflake storage costs.

To create an Iceberg table, you first create an external volume, which you then reference in the table's CREATE statement:

CREATE OR REPLACE ICEBERG TABLE customer_iceberg (
    c_custkey INTEGER,
    c_name STRING,
    c_address STRING,
    c_nationkey INTEGER,
    c_phone STRING,
    c_acctbal INTEGER,
    c_mktsegment STRING,
    c_comment STRING
)
    -- Snowflake itself acts as the Iceberg catalog for this table
    CATALOG='SNOWFLAKE'
    -- the external volume pointing at customer-managed cloud storage
    EXTERNAL_VOLUME='iceberg_lab_vol'
    BASE_LOCATION='';
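
The external volume named in the statement above has to exist before the table is created. As a minimal sketch, assuming an S3 bucket and an IAM role that Snowflake is allowed to assume, it might be defined as follows. The location name, bucket URL and role ARN are placeholders for illustration, not values taken from Snowflake's announcement.

```sql
-- Hypothetical external volume: replace the placeholder bucket and role ARN
-- with values from your own AWS account.
CREATE OR REPLACE EXTERNAL VOLUME iceberg_lab_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'my-s3-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://<your-bucket>/'
      STORAGE_AWS_ROLE_ARN = '<your-iam-role-arn>'
    )
  );

-- Inspect the volume's properties, for example to set up the role's trust policy.
DESC EXTERNAL VOLUME iceberg_lab_vol;
```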

After that, you can perform SQL DML operations on it.
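
As a rough illustration of such DML, the statements below insert, update, query and delete a row in the customer_iceberg table defined earlier. The row values are made up for the example.

```sql
-- Insert a single, made-up customer row.
INSERT INTO customer_iceberg
  (c_custkey, c_name, c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment, c_comment)
VALUES
  (1, 'Customer#000000001', '1 Example Street', 15, '25-989-741-2988', 712, 'BUILDING', 'sample row');

-- Update that row in place.
UPDATE customer_iceberg
SET c_acctbal = c_acctbal + 100
WHERE c_custkey = 1;

-- Query the table like any other Snowflake table.
SELECT c_mktsegment, COUNT(*) AS customers
FROM customer_iceberg
GROUP BY c_mktsegment;

-- Remove the sample row again.
DELETE FROM customer_iceberg
WHERE c_custkey = 1;
```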

While the addition of Iceberg is new, there's a lot of work going on as laid out by the roadmap ahead:

  • Polaris Catalog integration: An open source Apache Iceberg catalog, based on an open REST API implementation, will continue to dissolve data silos.
  • Deeper OneLake integration: Fabric OneLake will use Iceberg to provide bidirectional access.
  • Easier batch and streaming pipelines for Iceberg: Dynamic Tables are a hugely popular capability in the Snowflake platform. Supporting Iceberg as a storage format for Dynamic Tables will simplify data processing for data lakes.
  • Streamlined catalog integration: Automatically refreshed Iceberg tables simplify and streamline Snowflake’s integration with externally managed Iceberg tables.
  • Flexible sources: Friction-free solutions are key for you to get started with your Snowflake Iceberg table experience. The “direct” offerings for both Parquet and Delta Lake (currently in private preview) offer the ability to access data in place, without having to load the data into Snowflake.

To conclude, Iceberg use is increasing, with vendors integrating it, or planning to integrate it, into their products. For Snowflake in particular, Iceberg support is handy since some organizations with regulatory or other constraints either are not able to store all of their data in Snowflake or prefer to store data externally in open formats.

Snowflake may also be a good choice for those who already use the platform or those looking for a fully managed query engine.

More Information

Open, Interoperable Storage with Iceberg Tables, Now Generally Available

Related Articles

Snowflake Improves Developer Support
Pg_lakehouse Makes PostgreSQL Quack

 


Last Updated ( Wednesday, 04 September 2024 )