Apache Impala 4 Supports Operator Multi-Threading
Written by Kay Ewbank
Thursday, 29 July 2021
Apache Impala 4 has been released with many improvements, including support for multi-threading across all operators and support for all 99 TPC-DS queries without manual rewrites. The new version also improves authentication and authorization.

Impala is an open source, native analytic database for Apache Hadoop that provides a high-performance distributed SQL engine. It was originally developed by Cloudera and donated to the Apache Software Foundation along with Apache Kudu. Impala can run SQL queries on data stored in HDFS, HBase, Apache Kudu, Amazon S3, and Microsoft ADLS without requiring data movement or transformation.

Multi-threaded operators overcome a limitation of earlier releases, in which a single query fragment ran in an essentially single-threaded manner on each node. The scanners did run in multiple threads, but all other operators, such as joins and aggregations, ran in the main thread. The new release adds multi-threaded execution on a single node by running multiple fragment instances, each of which runs in a single thread. This gives significant performance improvements for some queries, up to seven times faster in some cases, by making better use of all the CPU cores.

The degree of parallelism for operations that can benefit from multi-threaded execution is set by a query option called MT_DOP (MultiThreading Degree Of Parallelism). Until now, Impala only supported setting MT_DOP for queries containing nothing but scans and aggregations; that limitation has now been removed.

Another improvement is that the new release runs all 99 TPC-DS queries without manual rewrites, with support for ROLLUP, CUBE and GROUPING SETS, and for uncorrelated subqueries in the SELECT list. Support has also been added for the INTERSECT and EXCEPT set operations.
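The execution model behind Impala's multi-threading, several single-threaded fragment instances on one node whose partial results are then combined, can be illustrated with a toy sketch. The Python below is a rough analogy, not Impala code: `fragment_instance` and `run_aggregation` are hypothetical names, the input is split round-robin into `mt_dop` slices, each slice is summed by one worker thread, and the partial sums are merged at the end. Because of CPython's global interpreter lock, it shows the shape of the plan rather than any real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fragment_instance(rows):
    """Hypothetical stand-in for one fragment instance: a single thread
    that computes a partial SUM per key over its share of the input."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial


def run_aggregation(rows, mt_dop):
    """Run mt_dop single-threaded 'instances' and merge their partial sums.

    mt_dop mirrors the MT_DOP query option only by analogy; this is not
    how Impala schedules fragments or exchanges data between them.
    """
    if mt_dop < 1:
        raise ValueError("mt_dop must be at least 1")
    # Round-robin split: instance i gets rows i, i + mt_dop, i + 2 * mt_dop, ...
    slices = [rows[i::mt_dop] for i in range(mt_dop)]
    with ThreadPoolExecutor(max_workers=mt_dop) as pool:
        partials = list(pool.map(fragment_instance, slices))
    # Merge phase: combine the partial aggregates into the final result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


if __name__ == "__main__":
    sales = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    print(run_aggregation(sales, mt_dop=2))
```

Whatever value `mt_dop` takes, the merged result is the same; only the way the work is divided changes, which is why the degree of parallelism can be raised without altering query results.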
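ROLLUP and CUBE, two of the TPC-DS features mentioned above, are shorthand for GROUPING SETS: in standard SQL, `GROUP BY ROLLUP(a, b)` is equivalent to `GROUP BY GROUPING SETS ((a, b), (a), ())`, while `CUBE(a, b)` covers every subset of the columns. The Python sketch below spells out those semantics. It illustrates what the SQL means, not how Impala plans such queries, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def rollup_sets(columns):
    """ROLLUP(c1, ..., cn) is GROUPING SETS over the n + 1 prefixes:
    (c1, ..., cn), (c1, ..., cn-1), ..., (c1), ()."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


def cube_sets(columns):
    """CUBE(c1, ..., cn) is GROUPING SETS over all 2**n column subsets."""
    return [
        combo
        for size in range(len(columns), -1, -1)
        for combo in combinations(columns, size)
    ]


def aggregate_grouping_sets(rows, group_columns, grouping_sets, measure):
    """Compute SUM(measure) once per grouping set. Columns outside a set
    are reported as None, playing the role of SQL NULL in the output."""
    results = []
    for gset in grouping_sets:
        sums = defaultdict(int)
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in group_columns)
            sums[key] += row[measure]
        results.extend(sums.items())
    return results


if __name__ == "__main__":
    cols = ["region", "product"]
    data = [
        {"region": "EU", "product": "x", "sales": 10},
        {"region": "EU", "product": "y", "sales": 5},
        {"region": "US", "product": "x", "sales": 7},
    ]
    for key, total in aggregate_grouping_sets(data, cols, rollup_sets(cols), "sales"):
        print(key, total)
```

With ROLLUP, the output contains per-(region, product) totals, per-region subtotals and a grand total, which is the kind of report several TPC-DS queries previously had to emulate with hand-written UNION ALL rewrites.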
Authentication and authorization have been strengthened in the new release, with the ability to integrate with Apache Knox and support for SAML (Security Assertion Markup Language) authentication. Impala is also now FIPS (Federal Information Processing Standards) compliant. A number of LDAP (Lightweight Directory Access Protocol) features have been added, including support for LDAP search bind operations and user LDAP search bind. Other authentication and authorization improvements include support for Ranger row-filtering policies and for basic role-related statements with Ranger. Kudu table ownership is also supported.

The full list of improvements can be seen in the Impala release notes, and Impala is available for download now.