Apache Arrow 6 Improves Support For R and Rust
Monday, 22 November 2021

Apache Arrow 6 has been released with improvements to support for R and Rust as well as Arrow Flight. There's also new support for DataFusion.

Apache Arrow is a development platform for in-memory analytics. It has technologies that enable big data systems to process and move data fast. It is language independent, can be used for flat and hierarchical data, and its data store is organized for efficient analytic operations. It also provides computational libraries. Languages currently supported are C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.


The improvements in the new release start with the addition of bindings for Flight in GLib and Ruby. The team says that while SQL support for Flight hasn't made it into this release, work is ongoing. Arrow Flight SQL defines a protocol for clients to communicate with SQL databases using Arrow Flight.

In Arrow's compute layer, a basic in-memory query engine has been implemented and is accessible from the R bindings. The query engine supports operations including filter, project, sort, equality joins, and various aggregations. A wide range of functions have also been added in this version, and type support has been improved for most of the compute functions.
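To make the listed operations concrete, here is a minimal sketch in plain Python of filter, project, sort and a grouped aggregation over column-oriented data. It is an illustration of the operations only, not Arrow's API or memory layout; the table contents and the helper names (filter_rows, project, sort_by, group_sum) are invented for the example.

```python
from collections import defaultdict

# A tiny columnar "table": one Python list per column.
# Illustrative only; this is not how Arrow stores data.
table = {
    "city": ["Oslo", "Rome", "Oslo", "Rome", "Lima"],
    "year": [2020, 2020, 2021, 2021, 2021],
    "sales": [10, 7, 12, 9, 4],
}


def filter_rows(tbl, column, predicate):
    """Keep only rows whose value in `column` satisfies `predicate`."""
    keep = [i for i, v in enumerate(tbl[column]) if predicate(v)]
    return {name: [values[i] for i in keep] for name, values in tbl.items()}


def project(tbl, columns):
    """Keep only the named columns."""
    return {name: tbl[name] for name in columns}


def sort_by(tbl, column):
    """Reorder every column by ascending values of `column`."""
    order = sorted(range(len(tbl[column])), key=lambda i: tbl[column][i])
    return {name: [values[i] for i in order] for name, values in tbl.items()}


def group_sum(tbl, key, value):
    """Sum `value` within each distinct value of `key`."""
    totals = defaultdict(int)
    for k, v in zip(tbl[key], tbl[value]):
        totals[k] += v
    return dict(totals)


# Chain the operations: filter, then project, then sort.
recent = filter_rows(table, "year", lambda y: y == 2021)
slim = project(recent, ["city", "sales"])
ordered = sort_by(slim, "sales")

# Aggregate over the full table.
totals = group_sum(table, "city", "sales")
```

A real engine works on typed, contiguous buffers rather than Python lists, but the shape of each operation is the same: every step takes a table and returns a table, which is what lets them be chained into a query plan.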

The support for R has been enhanced with a number of major new features in this version, some of which the team has been building up to for several years. In practical terms, there's more dplyr support, including the ability to carry out grouped aggregation. You can now summarise() on Arrow data, with or without group_by(). This is supported both on in-memory Arrow tables and across partitioned datasets. Most common aggregation functions are supported. In addition to aggregation, Arrow now also supports all of dplyr’s mutating joins (inner, left, right, and full) and filtering joins (semi and anti).
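The difference between mutating and filtering joins is easy to lose in a list of names, so here is a small sketch of the semantics in plain Python. It is not the arrow or dplyr implementation; the function names and the orders/customers data are made up for illustration, and only inner, left, semi and anti are shown (right and full follow the same pattern).

```python
def _index_by(rows, key):
    """Group rows by their value of `key` for fast lookup."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


def inner_join(left, right, key):
    """Mutating join: matched rows only, with columns from both sides."""
    index = _index_by(right, key)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def left_join(left, right, key):
    """Mutating join: every left row; unmatched rows get None for right columns."""
    index = _index_by(right, key)
    right_cols = sorted({c for r in right for c in r if c != key})
    out = []
    for l in left:
        matches = index.get(l[key])
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            out.append({**l, **{c: None for c in right_cols}})
    return out


def semi_join(left, right, key):
    """Filtering join: left rows that have a match; no columns are added."""
    keys = {r[key] for r in right}
    return [l for l in left if l[key] in keys]


def anti_join(left, right, key):
    """Filtering join: left rows that have no match; no columns are added."""
    keys = {r[key] for r in right}
    return [l for l in left if l[key] not in keys]


# Hypothetical example data.
orders = [
    {"id": 1, "cust": "a"},
    {"id": 2, "cust": "b"},
    {"id": 3, "cust": "c"},
]
customers = [
    {"cust": "a", "name": "Ann"},
    {"cust": "b", "name": "Bob"},
]

joined = inner_join(orders, customers, "cust")
all_orders = left_join(orders, customers, "cust")
known = semi_join(orders, customers, "cust")
orphans = anti_join(orders, customers, "cust")
```

Mutating joins widen the result with the other table's columns, while filtering joins only decide which rows of the left table survive.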

The R team has also added support for DuckDB as a way to query Arrow Datasets. This means you can use duckdb’s dbplyr methods, as well as its SQL interface, to aggregate data.

Alongside the R improvements, there's new support for DataFusion. This is an embedded query engine that uses Rust and Apache Arrow to provide a system that the developers say is high performance, easy to connect, easy to embed, and high quality. This release includes a runtime operator metrics collection framework and an object store abstraction for unified access to local or remote storage. It also includes Hive-style table partitioning support for Parquet, CSV, Avro and JSON files, and DataFrame API support for except, intersect, show, limit and window functions. It also has extensive SQL support, and now passes TPC-H queries 8, 13 and 21.
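Hive-style partitioning encodes column values in directory names of the form key=value, so a reader can recover those columns from the path and skip whole directories that cannot match a filter. The sketch below shows the idea in plain Python; it is not DataFusion code, and the function name and file paths are hypothetical.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path):
    """Return the key=value pairs encoded in the directory part of `path`."""
    partitions = {}
    # Skip the last component, which is the file name itself.
    for segment in PurePosixPath(path).parts[:-1]:
        if "=" in segment:
            key, _, value = segment.partition("=")
            partitions[key] = value
    return partitions


# Hypothetical file listing for a table partitioned by year and month.
files = [
    "sales/year=2020/month=12/part-0.parquet",
    "sales/year=2021/month=10/part-0.parquet",
    "sales/year=2021/month=11/part-0.parquet",
]

# Partition pruning: decide which files to read from the paths alone.
to_scan = [f for f in files if parse_hive_partitions(f).get("year") == "2021"]
```

Values recovered this way are strings; a real engine would also infer or be told the partition column types.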

Apache Arrow 6 is available for download.


More Information

Apache Arrow Website

Arrow On GitHub

Related Articles

Apache Arrow 5 Improves Asynchronous Scanner

Apache Arrow 4 Adds New C++ Compute Functions

Apache Arrow Improves C++ Support

Apache Arrow 2 Improves C++ and Rust Support

Apache Arrow Reaches 1.0

Apache Arrow Flight Released

Apache Arrow Adds DataFusion Rust-Native Engine

Apache Arrow Adds Streaming Binary Format



Last Updated ( Monday, 22 November 2021 )