Databricks Open Sources All Of Delta Lake
Written by Kay Ewbank
Thursday, 07 July 2022
Databricks has now made all of Delta Lake open source, including all the APIs. The storage layer of the product was made open source in 2019.

Delta Lake can be used to build data lakehouses, which enable data warehousing and machine learning directly on the data lake. Delta Lake handles the stage where data is brought into an organization's data lake. It stores data in Apache Parquet format, and is designed for use in data lakes that are built on HDFS and cloud storage.

Databricks was created as a company by the original developers of Apache Spark and specializes in commercial technologies that make use of Spark. Delta Lake is a storage layer and associated table format built on top of Apache Spark, and until it was made open source it was only available as part of Databricks Delta, the company's proprietary stack.

Since the storage layer was made open source, the project has attracted over 190 contributors across more than 70 organizations. Nearly two-thirds of those contributors are from outside Databricks, from companies such as Apple, IBM, Microsoft, Disney, Amazon, and eBay.

Delta Lake comes with standalone readers/writers that let any Python, Ruby, or Rust client write data directly to Delta Lake without requiring a big data engine such as Apache Spark, along with open-source connectors for engines including Apache Flink, Presto, and Trino.

The open source announcement opens up capabilities that until now were only available in Databricks. Delta Lake 2.0, the latest release, has improvements including support for Z-Ordering, Change Data Feed, Dynamic Partition Overwrites, and Dropped Columns. Z-Ordering is a technique to colocate related information in the same set of files. Delta Lake uses this co-locality in its data-skipping algorithms, and the developers say it dramatically reduces the amount of data that Delta Lake on Apache Spark needs to read.

Delta Lake 2.0 is available now.
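
To make the Z-Ordering idea more concrete, here is a minimal, self-contained Python sketch of why colocating related values helps data skipping. It is not Delta Lake's implementation and does not use any Delta Lake API: the helper names (`interleave_bits`, `write_files`, `files_to_read`), the 16x16 grid of sample rows, and the per-file min/max statistics are all assumptions made purely for illustration. Rows are packed into small "files" in two different orders, and a query then counts how many files it cannot rule out from the statistics alone.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # a hypothetical two-column row: (x, y)


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) code formed by interleaving the bits of x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x supplies the even bit positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y supplies the odd bit positions
    return z


def write_files(rows: List[Row], rows_per_file: int) -> List[Dict[str, int]]:
    """Split rows into fixed-size 'files' and keep only per-file min/max statistics."""
    files = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        files.append({
            "x_min": min(r[0] for r in chunk), "x_max": max(r[0] for r in chunk),
            "y_min": min(r[1] for r in chunk), "y_max": max(r[1] for r in chunk),
        })
    return files


def files_to_read(files: List[Dict[str, int]],
                  x_range: Tuple[int, int],
                  y_range: Tuple[int, int]) -> int:
    """Count files whose min/max statistics overlap the (inclusive) query ranges."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return sum(
        1 for f in files
        if f["x_min"] <= x_hi and f["x_max"] >= x_lo
        and f["y_min"] <= y_hi and f["y_max"] >= y_lo
    )


# Sample data (an assumption for the demo): every (x, y) pair on a 16x16 grid.
rows = [(x, y) for x in range(16) for y in range(16)]

# Layout 1: rows sorted by x, then y. Layout 2: rows sorted by Z-order code.
linear_files = write_files(sorted(rows), rows_per_file=16)
zorder_files = write_files(sorted(rows, key=lambda r: interleave_bits(*r)),
                           rows_per_file=16)

# A query that filters only on the second column: 0 <= y <= 3, any x.
query_x, query_y = (0, 15), (0, 3)
print("linear layout, files to read: ", files_to_read(linear_files, query_x, query_y))
print("Z-order layout, files to read:", files_to_read(zorder_files, query_x, query_y))
```

In this toy setup, the layout sorted by `x` has to read all 16 files for a filter on `y`, because every file spans the full range of `y`, while the Z-ordered layout needs only 4. A filter on `x` alone (0 to 3) touches 4 files in either layout, which is the point of the technique: the files stay selective on both columns rather than just the leading sort column.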