SparklyR - An R Interface For Spark
Written by Kay Ewbank   
Friday, 21 October 2016

The team behind RStudio has announced sparklyr, a new package that provides an interface between R and Apache Spark.

The new package aims to fulfil the need for a native dplyr interface to Spark, and to provide interfaces to Spark’s distributed machine learning algorithms. The dplyr package provides a set of tools for manipulating datasets in R. It is a successor to plyr that focuses solely on data frames.

 

The new package lets you interactively manipulate Spark data using both dplyr and SQL (via DBI). You can filter and aggregate Spark datasets, then bring them into R for analysis and visualization.
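As a rough illustration of that workflow, here is a minimal sketch in R. It assumes a local Spark installation is available (for example one set up with sparklyr's `spark_install()`), and it uses R's built-in `mtcars` data set purely as stand-in data; the table name and the filter thresholds are arbitrary choices for the example, not anything taken from the announcement.

```r
library(sparklyr)
library(dplyr)
library(DBI)

# Assumption: Spark is installed locally; "local" runs it on this machine.
sc <- spark_connect(master = "local")

# Copy a built-in R data frame into Spark as a table named "mtcars".
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# dplyr verbs are translated to Spark SQL and executed in Spark;
# collect() brings the (small) aggregated result back into R.
mpg_by_cyl <- mtcars_tbl %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()

# The same connection can be queried with SQL through DBI.
heavy_by_cyl <- dbGetQuery(
  sc,
  "SELECT cyl, COUNT(*) AS n FROM mtcars WHERE wt > 3 GROUP BY cyl"
)

spark_disconnect(sc)
```

Only the aggregated rows are transferred to R, which is the point of filtering and summarising in Spark before calling `collect()`.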

The package can also be used to orchestrate distributed machine learning from R using either Spark MLlib or H2O Sparkling Water. Both provide a set of high-level APIs built on top of DataFrames that help you create and tune machine learning workflows.
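A minimal sketch of the MLlib route might look like the following. It again assumes a local Spark installation and uses the built-in `mtcars` data as placeholder input; the choice of a linear regression and of the `wt` and `cyl` predictors is illustrative only.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally.
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# Fit a linear regression with Spark MLlib; training happens in Spark,
# and R receives a model object describing the fit.
fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)
summary(fit)

spark_disconnect(sc)
```

The H2O Sparkling Water route goes through a separate extension package and is not shown here.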

Developers can also extend the package by writing extensions that call the full Spark API and provide interfaces to Spark packages. This is possible because the facilities that sparklyr uses internally for its dplyr and machine learning interfaces are also available to extension packages.

Using sparklyr, you can set up Spark connections and browse Spark data frames within the RStudio IDE, and connect to Spark from R via the integrated dplyr backend.

 

The latest Preview Release of the RStudio IDE includes integrated support for Spark and the sparklyr package, including tools for:

  • Creating and managing Spark connections
  • Browsing the tables and columns of Spark DataFrames
  • Previewing the first 1,000 rows of Spark DataFrames

The final version of RStudio IDE that includes integrated support for sparklyr will ship within the next few weeks. 

 

More Information

SparklyR Page

RStudio IDE Preview

Related Articles

Apache Spark 2.0 Released

Apache Spark Technical Preview

A Programmer's Guide to R - Data and Objects

Spark Announcements

Apache Releases Spark 1.6

Spark 1.4 Released

MOOC On Apache Spark 

 


Last Updated ( Sunday, 06 November 2016 )