Apache Impala 4 Supports Operator Multi-Threading
Written by Kay Ewbank   
Thursday, 29 July 2021

Apache Impala 4 has been released with many improvements, including multi-threading across all operators and support for all 99 TPC-DS queries without manual rewrites. The new version also improves authentication and authorization.

Impala is an open source, native analytic database for Apache Hadoop that provides a high-performance distributed SQL engine. It was originally developed by Cloudera, and donated to the Apache Software Foundation along with Apache Kudu.

Impala can be used to run SQL queries on data stored in HDFS, HBase, Apache Kudu, Amazon S3, and Microsoft ADLS without requiring data movement or transformation.

Operator multi-threading in the new release overcomes an earlier limitation: a single query fragment ran in a quasi-single-threaded manner on each node. The scanners did run in multiple threads, but all other operators, such as joins and aggregations, ran in the main thread. The new release adds multi-threaded execution on a single node by running multiple fragment instances, each of which runs in a single thread. This gives significant performance improvements for some queries, in some cases making them up to seven times faster, by taking better advantage of all the CPU cores.
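The idea of running several single-threaded instances of the same fragment and merging their results can be sketched in a few lines of Python. This is only a conceptual model, not Impala's implementation: the function names `aggregate_instance` and `run_fragment` and the sample rows are invented for illustration, and a thread pool stands in for Impala's fragment instances.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def aggregate_instance(rows):
    """One hypothetical 'fragment instance': a single-threaded partial sum per key."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial


def run_fragment(rows, dop):
    """Split the input into `dop` partitions, aggregate each partition in its
    own worker, then merge the partial results into the final answer."""
    partitions = [rows[i::dop] for i in range(dop)]
    with ThreadPoolExecutor(max_workers=dop) as pool:
        partials = list(pool.map(aggregate_instance, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds the per-key sums together
    return dict(total)


# Illustrative data only: (group key, value) pairs.
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
result = run_fragment(rows, dop=4)
```

The result is the same whatever degree of parallelism is chosen; only the amount of work done per worker changes.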

The degree of parallelism used for operations that can benefit from multi-threaded execution is set by a query option called MT_DOP (Multi-Threading Degree Of Parallelism). Until now, Impala only supported setting MT_DOP for queries consisting solely of scans and aggregates. This limitation has now been removed.
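As a rough illustration, MT_DOP can be set for a session with a `SET` statement before the query is submitted. The Python helper below only builds the statement text; the name `with_mt_dop` and the `sales` table are assumptions made for this sketch, and how the text is sent to Impala (impala-shell or a client library) is left out.

```python
def with_mt_dop(query, dop):
    """Prefix a query with a SET MT_DOP statement (hypothetical helper)."""
    if isinstance(dop, bool) or not isinstance(dop, int) or dop < 0:
        raise ValueError("MT_DOP must be a non-negative integer")
    # Normalise the trailing semicolon so exactly one is emitted.
    body = query.rstrip().rstrip(";")
    return f"SET MT_DOP={dop};\n{body};"


# `sales` is an illustrative table name, not one taken from the article.
script = with_mt_dop("SELECT region, SUM(amount) FROM sales GROUP BY region", 4)
```

Since the restriction to scans and aggregates has been lifted, the same prefix could be used for queries that contain joins.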

Another improvement in the new release is support for all 99 TPC-DS queries without manual rewrites, including ROLLUP, CUBE and GROUPING SETS, and uncorrelated subqueries in the select list. Support has also been added for the INTERSECT and EXCEPT set operations.
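For readers unfamiliar with these grouping constructs, the sketch below shows which grouping sets ROLLUP and CUBE stand for in standard SQL. It is a plain Python illustration of the semantics, not Impala code; the function names and the column names `year` and `month` are chosen for the example.

```python
from itertools import combinations


def rollup(cols):
    """ROLLUP(a, b, ...) groups by each prefix of the column list,
    from the full list down to the empty grand-total set."""
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]


def cube(cols):
    """CUBE(a, b, ...) groups by every subset of the column list."""
    return [
        combo
        for n in range(len(cols), -1, -1)
        for combo in combinations(cols, n)
    ]


rollup_sets = rollup(["year", "month"])
cube_sets = cube(["year", "month"])
```

Here `rollup_sets` holds three grouping sets and `cube_sets` holds four; the extra one in the cube is the grouping by `month` alone.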

Authentication and authorization have been strengthened in the new release, with the ability to integrate with Apache Knox and support for SAML (Security Assertion Markup Language) authentication. Impala is also now FIPS (Federal Information Processing Standards) compliant. A number of LDAP (Lightweight Directory Access Protocol) features have been added, including support for LDAP search bind operations and user LDAP search bind.

Other authentication and authorization improvements include support for Ranger row-filtering policies, and support for basic role-related statements with Ranger. Kudu table ownership is also supported.

The full list of improvements can be seen in the Impala release notes, and Impala is available for download now.

More Information

Impala Website

Impala 4 Release Notes

Related Articles

Apache Kudu Improves Web Interface

Hadoop SQL Query Engine Launched

Cloudera Impala Real Time Query On Hadoop 

Apache Arrow Adds Streaming Binary Format 

HBase Adds MultiWAL Support

Apache Kafka Adds New Streams API

Apache Beam Moves To Top Level

Spark BI Gets Fine Grain Security

 
