Apache Spark MapR Connector Provides JSON Support
Written by Kay Ewbank   
Monday, 05 June 2017

There's a new Native Spark Connector for MapR-DB JSON that gives developers APIs to access MapR-DB JSON documents from Apache Spark, using the Open JSON Application Interface (OJAI) API.

Apache Spark is an open source big data processing framework, which is used for analytics on streaming and batch workloads. MapR-DB is a high performance NoSQL database, which supports two primary data models: JSON documents and wide column tables. A Spark connector is available for each data model. With the Spark/MapR-DB connectors, you can use MapR-DB as a data source and as a data destination for Spark jobs.

The Native Spark Connector for MapR-DB JSON supports loading data from a MapR-DB table as a Spark Resilient Distributed Dataset (RDD) of OJAI documents and saving a Spark RDD into a MapR-DB JSON table. (An RDD is the base format for storing data for use by Spark.)
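To make the load and save round trip concrete, here is a minimal conceptual sketch in plain Python. It does not use Spark or the connector: the "tables" are ordinary dictionaries, and `load_documents` and `save_documents` are hypothetical helper names invented for illustration. The one assumption carried over from MapR-DB is that each JSON document is identified by its `_id` field.

```python
import json

# Hypothetical in-memory stand-in for a JSON table:
# one JSON string per row, keyed by the document's _id.
source_table = {
    "u1": '{"_id": "u1", "name": "Ada", "visits": 3}',
    "u2": '{"_id": "u2", "name": "Lin", "visits": 7}',
}


def load_documents(table):
    """Parse every stored JSON string into a dict (the 'load' direction)."""
    return [json.loads(raw) for raw in table.values()]


def save_documents(table, documents):
    """Serialize each document back into the table under its _id (the 'save' direction)."""
    for doc in documents:
        table[doc["_id"]] = json.dumps(doc, sort_keys=True)


# Load, transform, save. In a Spark job the transformation step
# would be an operation over the distributed collection instead of a list.
docs = load_documents(source_table)
flagged = [dict(doc, active=doc["visits"] > 5) for doc in docs]

dest_table = {}
save_documents(dest_table, flagged)
```

The shape is the same as a Spark job that uses a table as both source and destination: read documents, apply a transformation, write the results to another table.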

The connector includes a set of APIs that enable MapR users to write applications that consume MapR-DB JSON tables and use them in Spark. It is a companion to the MapR-DB Binary Connector for Apache Spark, which can be used to write applications that consume HBase binary tables and use them in Spark.

The connector has two APIs that let you load data from a MapR-DB JSON table to a Spark RDD or save a Spark RDD to a MapR-DB JSON table. It also provides support for Scala bean classes, has a custom partitioner that allows you to partition data for better performance, and supports data locality. When the connector reads data from MapR-DB, it uses the data locality feature of MapR-DB to spawn the Spark executors.
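The article does not describe how the custom partitioner works internally, so the following is only a sketch of the general idea in plain Python: range partitioning by document key. It assumes, as an illustration, that a table is divided into contiguous key ranges and that grouping documents by those ranges keeps related work together. The names `make_range_partitioner` and `partition_documents` and the split keys are hypothetical and are not part of the connector.

```python
from bisect import bisect_right


def make_range_partitioner(split_keys):
    """Return a function mapping a key to a partition index.

    split_keys are the boundaries between key ranges; n boundaries
    give n + 1 partitions. A key equal to a boundary falls into the
    partition that starts at that boundary.
    """
    boundaries = sorted(split_keys)

    def partition_for(key):
        return bisect_right(boundaries, key)

    return partition_for


def partition_documents(documents, partition_for, num_partitions):
    """Group documents into buckets according to their _id."""
    partitions = [[] for _ in range(num_partitions)]
    for doc in documents:
        partitions[partition_for(doc["_id"])].append(doc)
    return partitions


# Hypothetical split points: keys below "g", from "g" up to "p", and "p" onward.
split_keys = ["g", "p"]
partition_for = make_range_partitioner(split_keys)

documents = [{"_id": k} for k in ["apple", "grape", "kiwi", "plum", "zebra"]]
partitions = partition_documents(documents, partition_for, len(split_keys) + 1)
```

Here "apple" lands in the first bucket, "grape" and "kiwi" in the second, and "plum" and "zebra" in the third. When buckets line up with the way the table itself is divided, each task can read or write one contiguous key range, which is the kind of benefit the partitioner and the data locality support are aiming at.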

The Native Spark Connector includes support for the DataFrame and Dataset APIs, so HBase and MapR-DB binary tables can be queried directly with Spark. The advantage is that this removes any intermediary layers, making it easier to construct faster data pipelines and reducing the latency associated with data movement.


More Information

MapR-DB OJAI Documentation

Related Articles

Apache Spark 2.0 Released

Apache Spark Technical Preview

Spark Announcements

Apache Releases Spark 1.6

Spark 1.4 Released

MOOC On Apache Spark 

Learning Spark (book review) 

