Dremio 3.0 Adds Data Catalog

Written by Kay Ewbank

Tuesday, 13 November 2018

There's a new version of Dremio, an open-source project designed to give business analysts and data scientists a way to explore and analyze data no matter what its structure or size. New in this release are a data catalog, prioritized workload management, and Kubernetes support.

The developers of Dremio describe it as a data virtualization platform. The software is based on Apache Arrow, Apache Parquet, and Apache Calcite, and the company behind Dremio is a major contributor to Arrow. Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data. Apache Parquet offers similar features for file-based storage. Apache Calcite is used for SQL parsing and query optimization.
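To give a feel for what "columnar" means here, the following is a minimal sketch in plain Python. It is not Arrow itself: ordinary lists stand in for Arrow's typed, contiguous buffers, and the sample records and the `to_columnar` helper are invented for illustration.

```python
# Illustrative only: plain Python lists stand in for Arrow's typed buffers.
# The records and the to_columnar helper are hypothetical.
rows = [
    {"id": 1, "city": "Oslo", "sales": 120.0},
    {"id": 2, "city": "Lima", "sales": 80.5},
    {"id": 3, "city": "Oslo", "sales": 42.0},
]


def to_columnar(records):
    """Pivot a list of row dicts into a dict mapping column name to value list."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columnar(rows)

# An aggregate over one field now reads a single list
# instead of visiting every row object.
total_sales = sum(columns["sales"])
print(columns["city"])
print(total_sales)
```

The point of the layout is that analytical queries, which typically scan a few fields across many records, read only the columns they need.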

Dremio builds Arrow-based structures called Reflections. These are optimized copies of data based on queries against data sources. Dremio also has a query optimizer that uses Apache Arrow to work out the best representation of data to make the query faster. This might mean that a query against an Elasticsearch cluster (for example) would be answered from the Arrow representation of the data rather than from the cluster itself.
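The substitution idea can be sketched as follows. This is not Dremio's planner: the `Reflection` class, the `choose_source` function, and the rule "pick the narrowest copy that covers the columns the query needs" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reflection:
    """Hypothetical stand-in for an optimized copy of a dataset."""
    name: str
    dataset: str
    columns: frozenset


def choose_source(dataset, needed_columns, reflections):
    """Return the narrowest reflection covering the query, else the raw dataset."""
    needed = frozenset(needed_columns)
    candidates = [
        r for r in reflections
        if r.dataset == dataset and needed <= r.columns
    ]
    if not candidates:
        # No copy can answer the query, so fall back to the original source.
        return dataset
    return min(candidates, key=lambda r: len(r.columns)).name


reflections = [
    Reflection("orders_all", "es.orders", frozenset({"id", "city", "sales", "ts"})),
    Reflection("orders_by_city", "es.orders", frozenset({"city", "sales"})),
]

print(choose_source("es.orders", ["city", "sales"], reflections))
print(choose_source("es.orders", ["customer"], reflections))
```

A real optimizer would also weigh freshness, aggregation level, and cost; the sketch only shows the basic decision between an optimized copy and the source.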

Dremio also has a built-in SQL-based query language that provides features similar to those of cost-based optimizers such as SparkSQL, with Reflections taking the idea further by supplying the optimized copy of the data.

The new version of Dremio adds a data catalog with the idea that users will be able to carry out a simple Google-like search to find datasets. Under the covers, Dremio administrators tag datasets to organize them so they can be discovered by data consumers. The catalog includes built-in wiki pages where information can be stored, such as whom to contact with questions, how often the data is updated, what sources of data make up the dataset, and screenshots of reports and visualizations that use the dataset.
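As a rough illustration of tag-based discovery, here is a minimal sketch of a keyword search over tagged datasets with wiki text. The catalog contents, field names, and `search_catalog` function are hypothetical and say nothing about how Dremio stores or indexes its catalog.

```python
# Hypothetical catalog: dataset name -> tags plus free-text wiki notes.
catalog = {
    "sales.orders_2018": {
        "tags": ["sales", "finance"],
        "wiki": "Refreshed nightly. Ask the revenue team about gaps.",
    },
    "web.clickstream": {
        "tags": ["marketing", "raw"],
        "wiki": "Streamed from the site; feeds the sales funnel report.",
    },
    "hr.headcount": {
        "tags": ["people"],
        "wiki": "Updated monthly.",
    },
}


def search_catalog(entries, term):
    """Return sorted dataset names whose name, tags, or wiki mention the term."""
    needle = term.lower()
    hits = []
    for name, entry in entries.items():
        haystack = [name, entry["wiki"], *entry["tags"]]
        if any(needle in text.lower() for text in haystack):
            hits.append(name)
    return sorted(hits)


print(search_catalog(catalog, "sales"))
```

The search matches on wiki text as well as tags, which is why a dataset tagged only for marketing can still surface for a "sales" query.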

This release also includes support for Gandiva, a new execution kernel for Arrow that is based on LLVM. Gandiva provides performance improvements for low-level operations on Arrow buffers. The developers say that, in the right circumstances, using Gandiva can improve query performance dramatically; some early testers have reported improvements of over 70x.

Security has been improved with native integration with Apache Ranger for centralized access control. In addition, Dremio 3.0 now supports end-to-end TLS encryption.

New multi-tenant workload controls have been added so that administrators can control resource allocation based on user, group membership, time of day, data source, and query type using standard SQL.
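The article does not show the SQL syntax for these rules, so the following sketch uses Python instead to illustrate the general shape of rule-based routing. The `QueryContext` fields mirror the criteria listed above, but the queue names, the rules themselves, and the first-match-wins policy are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QueryContext:
    """Hypothetical attributes of an incoming query."""
    user: str
    groups: frozenset
    hour: int          # hour of day, 0-23
    source: str
    query_type: str


@dataclass(frozen=True)
class Rule:
    queue: str
    matches: Callable[[QueryContext], bool]


# Example rules, evaluated in order; the first match decides the queue.
RULES = [
    Rule("high", lambda q: "executives" in q.groups),
    Rule("low", lambda q: q.query_type == "refresh" and 8 <= q.hour < 18),
    Rule("medium", lambda q: q.source == "elasticsearch"),
]


def assign_queue(query, rules=RULES, default="default"):
    """Return the queue of the first matching rule, or the default queue."""
    for rule in rules:
        if rule.matches(query):
            return rule.queue
    return default


adhoc = QueryContext("kay", frozenset({"analysts"}), 10, "elasticsearch", "adhoc")
print(assign_queue(adhoc))
```

Ordering matters in a scheme like this: an executive running a daytime refresh lands in the "high" queue because that rule is checked first.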

The Kubernetes support comes via an official Docker image and templates for elastic, highly available deployments using the Kubernetes orchestration framework.

Elsewhere, there's a new declarative engine for relational database sources that is designed to provide more efficient processing on systems such as Postgres, SQL Server, Oracle, and Teradata; and support for new data sources including Azure Data Lake Store, Elasticsearch 6, AWS S3 GovCloud, and Teradata.

More Information

Dremio Website

Related Articles

Apache Arrow Adds Streaming Binary Format

Apache Kylin 2.5 Adds All-in-Spark Cubing Engine

Kylin 2.3.0 Adds SQL Server Support

Apache Kylin Gets Table Level ACL Management

Apache Kylin Adds RDBMS Support

Spark BI Gets Fine Grain Security


Last Updated ( Tuesday, 13 November 2018 )
