Hadoop Adds In-Memory Caching
Written by Kay Ewbank   
Friday, 07 March 2014

Apache Hadoop 2.3.0 has been released with support for in-memory caching and a heterogeneous storage hierarchy for HDFS.

The addition of in-memory caching for HDFS means you can choose to cache specific files or directories in HDFS that apps such as MapReduce, Hive and Pig can then read without the overhead of normal disk-based reads. 

According to Justin Kestelyn on the Cloudera blog, preliminary benchmarks show that optimized applications can achieve read throughput on the order of gigabytes per second.

Kestelyn reports that when you cache a file or directory:

“DataNodes will then cache the corresponding blocks in off-heap memory through the use of mmap and mlock. Once cached, Hadoop applications can query the locations of cached blocks and place their tasks for memory-locality. Finally, when memory-local, applications can use the new zero-copy read API to read cached data with no additional overhead.”
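The mmap idea in that description can be tried out with nothing but the JDK. The following is a minimal sketch, assuming a temporary local file stands in for a cached block; the class and method names are made up for illustration, and this is ordinary Java memory mapping rather than the HDFS zero-copy read API. Standard Java has no mlock call, so only the mapping half is shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: plain JDK memory mapping, not the HDFS zero-copy read API.
final class MappedReadSketch {

    // Maps a whole file read-only. The returned buffer is backed by the
    // operating system's mapping of the file rather than by a Java byte array.
    static MappedByteBuffer mapReadOnly(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // A mapping remains valid after its channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sums the unsigned byte values by reading straight from the mapped region.
    static long checksum(MappedByteBuffer buffer) {
        long sum = 0;
        while (buffer.hasRemaining()) {
            sum += buffer.get() & 0xFF;
        }
        return sum;
    }

    // Demo helper (hypothetical): writes data to a temporary file standing in
    // for a cached block, maps it, and returns the checksum of the mapped bytes.
    static long checksumViaMapping(byte[] data) {
        try {
            Path file = Files.createTempFile("block", ".dat");
            file.toFile().deleteOnExit();
            Files.write(file, data);
            return checksum(mapReadOnly(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, (byte) 250};
        System.out.println("checksum = " + checksumViaMapping(data));
    }
}
```

The reader works on the mapped buffer directly and never copies the file into an intermediate array, which is the general property the quoted zero-copy read path relies on.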

Writing on the Hortonworks blog, Arun Murthy says:

“As an example, Hive is taking advantage of this feature by implementing an extremely efficient zero-copy read path for ORC files.”

Other highlights include some support for heterogeneous storage in HDFS, through the addition of storage classes, and simplified distribution of MapReduce binaries via the YARN Distributed Cache.

The heterogeneous storage classes mean that Hadoop can now use different storage types within the same cluster. It will be possible to put together a mix of SSDs, memory, and different types of disks, and let each application choose the storage type that best fits its performance or cost requirements.
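One way to picture an application choosing a storage type is as a preference list matched against what the cluster offers. The sketch below is a toy model under that assumption; the Medium enum and the choose method are hypothetical names invented for this example and are not part of the HDFS API.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Illustrative only: a toy model of storage selection, not the HDFS API.
final class StorageChoiceSketch {

    // Hypothetical storage media, named after the examples in the article.
    enum Medium { MEMORY, SSD, DISK }

    // Returns the first medium in the application's preference order that
    // the cluster actually offers, or an empty Optional if none matches.
    static Optional<Medium> choose(List<Medium> preferred, Set<Medium> available) {
        return preferred.stream().filter(available::contains).findFirst();
    }

    public static void main(String[] args) {
        Set<Medium> cluster = EnumSet.of(Medium.SSD, Medium.DISK);
        // A latency-sensitive job asks for memory first, then SSD.
        System.out.println(choose(Arrays.asList(Medium.MEMORY, Medium.SSD), cluster));
        // An archival job only wants inexpensive disk.
        System.out.println(choose(Arrays.asList(Medium.DISK), cluster));
    }
}
```

Keeping the preference with the application and the inventory with the cluster mirrors the split the article describes: the cluster mixes media, and each workload decides what it needs.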

Hortonworks has an interesting article describing Heterogeneous Storages in HDFS in more detail. In his blog post, Arun Murthy says:

“we can now make better cost/benefit tradeoffs with different storage media such as commodity disks, enterprise-grade disks, SSDs, Memory etc.”

The Hadoop 2.3.0 Release Notes give details of other changes and features.


 

More Information

Apache Hadoop

Apache Hadoop 2.3.0 is Released (Cloudera)

Apache Hadoop 2.3.0 Released! (Hortonworks)

Hadoop 2.3.0 Release Notes

Heterogeneous Storages in HDFS

Related Articles

Hadoop 2 Introduces YARN

Hadoop gets to 1.0

Hadoop for Windows

New Hadoop connectors

 

Last Updated ( Friday, 07 March 2014 )
 
 

   
Copyright © 2014 i-programmer.info. All Rights Reserved.