Spark BI Gets Fine Grain Security
Spark BI Gets Fine Grain Security
Written by Kay Ewbank   
Friday, 06 January 2017

There's a new technique for adding fine grain security when using Apache Hive and Spark to work with large data sets.

 Spark lets you use SQL expressions on data in Hive, but authorization has until now required you to use HDFS ACLs. This lacks the granularity needed with columnar data. While the ideal solution would be if Spark could recognize and respond to fine grain security settings, one alternative is to use an external daemon that can interact with schema level security settings.

LLAP (Live Long and Process, groan) is a collection of long lived daemons that works in tandem with the HDFS Data Node service, and provides the ability to interact with schema based security. LLAP was introduced in Hive 2. It is a hybrid execution model that offers benefits such as caching of columnar data, JIT-friendly operator pipelines, and reduced overhead for multiple queries (including concurrent queries). 

Explaining the technique on the HortonWorks blog, Vadim Vaks said that with LLAP enabled, Spark reads from HDFS directly through LLAP, meaning the only other element needed is a centralized authorization system, which can be provided by Apache Ranger.  This provides centralized authorization and audit services for many components that run on Yarn or rely on data from HDFS, including HDFS, Yarn, Hive (Spark with LLAP), HBase, Kafka, Storm, Solr,  Atlas and Knox. Vaks says:

"Each of the above services integrate with Ranger via a plugin that pulls the latest security policies, caches them, and then applies them at run time."

 So Spark receives the query statement and communicates with Hive to obtain the relevant schemas and query plan. The Ranger Hive plugin is then used to check the cached security policy and inform Spark which columns it can access. 

Apache Ranger provides a centralized security framework to manage fine grained access control over Hadoop and related components (Apache Hive, HBase etc.). The Ranger plugin sits in the path of the user request and is able to make a decision on whether the user request shoud be authorized. The plugin also collects access request details required for auditing.

Once Spark has been told which columns it can access, it then uses LLAP to read from the filesystem. LLAP deals with any filtering or masking, and if the query contains requests for columns that aren't authorized, LLAP stops processing the request and throws an Authorization exception to Spark. If masking is used, the restricted columns are returned but containing only asterisks or a hash of the original value.

Ranger can also be used to offer row level security, so a query will return only the rows that a user has permission to see. As Vaks explains, a row level policy from Ranger would instruct Hive to return a query plan that includes a predicate that filters unauthorised rows. Spark receives the modified query plan and initiates processing, reading data through LLAP. LLAP ensures that the predicate is applied and that the restricted rows are not returned.


sparklogo

 

More Information

HortonWorks Blog Post On LLAP

LLAP On Hive Wiki

Spark LLAP Tutorial

Apache Ranger FAQ

Related Articles 

Apache Spark 2.0 Released

Apache Spark Technical Preview

Spark Announcements

Apache Releases Spark 1.6

Spark 1.4 Released

SPARQL Moves Closer

 

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on, Twitter, FacebookGoogle+ or Linkedin.

 

Banner


Bjarne Stroustrup Awarded IET Faraday Medal
26/09/2017

Bjarne Stroustrup, the inventor of C++, is the 2017 winner of the Faraday Medal is the most prestigious award of the UK-based Institution of Engineering and Technology.



Kotlin Begins Its Takeover Of Android
11/10/2017

Google made Kotlin its lead language for Android development back in May of this year (2017) and it has been interesting to watch for signs of its take up or outright rejection. Now we have the first  [ ... ]


More News

 

 
 

 

blog comments powered by Disqus

 
 

   
Banner
RSS feed of news items only
I Programmer News
Copyright © 2017 i-programmer.info. All Rights Reserved.
Joomla! is Free Software released under the GNU/GPL License.