Rapids Data Analysis Libraries Updated

Written by Kay Ewbank

Tuesday, 17 March 2020

The RAPIDS software libraries collection for machine learning and data analysis has been updated with improvements to performance and core libraries.

RAPIDS is a GPU-based collection of software libraries for machine learning and data analysis. It can be used to create a data science pipeline covering data loading, ETL, model training, and inference, and is said to run typical end-to-end data science workflows up to 50 times faster. RAPIDS has been co-developed by NVIDIA and developers from some popular open-source projects, specifically Apache Arrow, pandas and scikit-learn. The software is also integrated into the Apache Spark open-source framework for data analytics.


The updated version benefits from major refactoring of everything from the RAPIDS core to individual libraries. As a result, the RAPIDS libraries are more interoperable and more performant. The developers say Python users should just see everything go faster with no changes to their code. The main lower-level C++ library has also received major work, and the refactoring has improved interoperability with BlazingSQL and Java.

The core cuML suite of libraries, which implements machine learning algorithms and mathematical primitive functions, has been improved with multi-node, multi-GPU (MNMG) algorithms to scale to larger datasets, along with pickling and model object cloning functionality for improved usability. The cuGraph library of graph algorithms has also been refactored to improve interoperability with cuDF and cuML. cuDF provides a pandas-like API, and it too has undergone refactoring that is described as 'almost complete', with key APIs such as join, sort, sort-based groupby, and the majority of string functions having been ported to the new libcudf++ APIs and data structures.
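For readers unfamiliar with the term, a sort-based groupby first sorts rows by the grouping key so that equal keys sit next to each other, then aggregates each run in a single pass. The following is a minimal plain-Python sketch of that idea only; it does not use cuDF or libcudf++, and the function name `sort_based_groupby_sum` and the sample rows are hypothetical, chosen purely for illustration.

```python
from itertools import groupby
from operator import itemgetter


def sort_based_groupby_sum(rows, key_index, value_index):
    """Group rows by one column and sum another using sort-then-scan.

    Illustrative only: this is not how cuDF implements its groupby.
    """
    # Step 1: sort so that rows sharing a key become adjacent.
    ordered = sorted(rows, key=itemgetter(key_index))
    # Step 2: a single linear pass; each run of equal keys forms one group.
    return [
        (key, sum(row[value_index] for row in group))
        for key, group in groupby(ordered, key=itemgetter(key_index))
    ]


# Hypothetical sample data: (device, count) pairs.
rows = [("gpu", 3), ("cpu", 1), ("gpu", 4), ("cpu", 1), ("tpu", 5)]
totals = sort_based_groupby_sum(rows, key_index=0, value_index=1)
print(totals)  # [('cpu', 2), ('gpu', 7), ('tpu', 5)]
```

Because the result falls out of the sort, the groups come back ordered by key, which is one reason this approach is often contrasted with hash-based grouping.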

The BlazingSQL GPU-accelerated SQL engine has been updated to support data skipping. This means that WHERE clauses in SQL statements are used during query execution to decide, based on Parquet's metadata, which Apache Parquet row groups need to be loaded at all. The developers say this substantially reduces the amount of data loaded into memory and enables users to work on even bigger workloads.
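The idea behind data skipping can be sketched in a few lines. Parquet files keep per-row-group statistics such as a column's minimum and maximum, so a reader can rule out a whole row group without touching its data. The sketch below is a plain-Python illustration under that assumption, not BlazingSQL's implementation; the `RowGroup` class, `scan_greater_than` and `make_group` are hypothetical names, and the predicate is fixed to `WHERE col > threshold` for brevity.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group plus its column statistics."""
    min_value: int
    max_value: int
    values: list


def make_group(values):
    # Compute the min/max "metadata" once, as a file writer would.
    return RowGroup(min(values), max(values), values)


def scan_greater_than(row_groups, threshold):
    """Evaluate `WHERE col > threshold`, skipping row groups via min/max metadata."""
    matched, groups_read = [], 0
    for group in row_groups:
        # Metadata check: if even the largest value fails the predicate,
        # no row in this group can match, so its data is never loaded.
        if group.max_value <= threshold:
            continue
        groups_read += 1
        matched.extend(v for v in group.values if v > threshold)
    return matched, groups_read


row_groups = [
    make_group([1, 5, 9]),
    make_group([12, 18, 20]),
    make_group([25, 31, 40]),
]
matched, groups_read = scan_greater_than(row_groups, threshold=19)
print(matched, groups_read)  # [20, 25, 31, 40] 2
```

Here the first row group is skipped on metadata alone, so only two of the three groups are read. The saving grows with how well the data is clustered on the filtered column.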

The developers say the next version of RAPIDS will have many more features and will be released to coincide with NVIDIA's GTC conference later in March.


More Information

RAPIDS Libraries

Databricks Adds ML Model Export

Machine Learning Added To Azure HDInsight

Apache Kylin 2.5 Adds All-in-Spark Cubing Engine

Spark Gets NLP Library


Last Updated ( Tuesday, 17 March 2020 )
