Apache Arrow Adds New View Data Types
Written by Kay Ewbank   
Monday, 27 November 2023

Apache Arrow 14 has been released with new view data types for columnar formats, and a wide range of other improvements.

Apache Arrow is a development platform for in-memory analytics. It provides technologies that let big data systems process and move data quickly. Arrow is language-independent, handles both flat and hierarchical data, and organizes its data for efficient analytic operations. It also provides computational libraries. Languages currently supported are C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.


The improvements in the new release start with new data types for viewing data in Arrow's columnar format. The developers say the additions were motivated by recent innovations in DuckDB and Meta's Velox engine. The new view data types start with the 16-byte StringView and BinaryView, which provide better buffer reuse and faster "false" string comparisons, meaning that two unequal strings can often be told apart without reading their full contents. There are also new ListView and LargeListView types, which allow more performant "out-of-order" building and processing of lists, again with better buffer reuse.
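To make the idea of a 16-byte view more concrete, here is a minimal pure-Python sketch of how such a view could be packed and compared. It is an illustration under stated assumptions, not Arrow's implementation: the helper names `make_view` and `views_equal` are hypothetical, a single shared data buffer is assumed, and the layout (a 4-byte length, then either up to 12 inline bytes or a 4-byte prefix plus buffer index and offset) follows the general description of the view format.

```python
import struct

INLINE_LIMIT = 12  # values of up to 12 bytes fit inside the 16-byte view itself


def make_view(value: bytes, buffer: bytearray, buffer_index: int = 0) -> bytes:
    """Pack one 16-byte view (hypothetical helper, not an Arrow API).

    Long values are appended to `buffer` and referenced by offset.
    """
    if len(value) <= INLINE_LIMIT:
        # int32 length followed by the data itself, zero-padded to 12 bytes
        return struct.pack("<i12s", len(value), value)
    offset = len(buffer)
    buffer.extend(value)
    # int32 length, 4-byte prefix, int32 buffer index, int32 offset
    return struct.pack("<i4sii", len(value), value[:4], buffer_index, offset)


def views_equal(a: bytes, b: bytes, buffer: bytearray) -> bool:
    """Compare two views, assuming both refer to the same single buffer."""
    # In both layouts the first 8 bytes hold the length and the first
    # 4 data bytes, so most unequal pairs are rejected right here.
    if a[:8] != b[:8]:
        return False
    length = struct.unpack_from("<i", a)[0]
    if length <= INLINE_LIMIT:
        return a[8:] == b[8:]  # the rest of the inline data
    # Only now is the full data fetched from the buffer.
    off_a = struct.unpack("<i4sii", a)[3]
    off_b = struct.unpack("<i4sii", b)[3]
    return buffer[off_a:off_a + length] == buffer[off_b:off_b + length]


data = bytearray()
first = make_view(b"apache arrow 14 release", data)
second = make_view(b"velox engine string data", data)
print(len(first), views_equal(first, second, data))
```

The early return in `views_equal` is the fast "false" path: a mismatch in length or prefix settles the comparison without touching the data buffer.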

A new compute function calculates a cumulative mean over numeric data. Several existing functions have also been improved, including rounding, divide, take and filter, and there is new support for duration inputs.
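For readers unfamiliar with the term, a cumulative mean replaces each element with the mean of all elements up to and including it. The following pure-Python sketch shows the calculation only; the function here is a hypothetical stand-in, not Arrow's implementation, and it ignores details such as null handling.

```python
from itertools import accumulate


def cumulative_mean(values):
    """Return the running mean: element i is the mean of values[0..i]."""
    running_sums = accumulate(values)
    return [total / count for count, total in enumerate(running_sums, start=1)]


print(cumulative_mean([2, 4, 6, 8]))  # [2.0, 3.0, 4.0, 5.0]
```

In PyArrow the equivalent operation appears to be exposed through the `pyarrow.compute` module; check the compute documentation for your installed version before relying on a specific function name or options.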

In Arrow Flight RPC, a new RPC method was added to allow polling for completion in long-running queries as an alternative to the existing blocking GetFlightInfo call, and an experimental asynchronous GetFlightInfo call was added to the client-side API in C++ and Python. The CMake configuration was also fixed to correctly require linking to Arrow Flight RPC when using Arrow Flight SQL.

Elsewhere, in Go, the underlying generated Protobuf code is now exposed for easier low-level integrations with Flight, and in Java, utilities were added to help implement basic Flight SQL services for unit testing.

Apache Arrow 14 is available now.  


More Information

Apache Arrow Website

Related Articles

Apache Arrow 5 Improves Asynchronous Scanner

Apache Arrow 4 Adds New C++ Compute Functions

Apache Arrow Improves C++ Support

Apache Arrow 2 Improves C++ and Rust Support

Apache Arrow Reaches 1.0


Last Updated ( Monday, 27 November 2023 )