Hadoop in 24 Hours

Author: Jeffrey Aven
Publisher: Sams
Date: April 2017
Pages: 500
ISBN: 978-0672338526
Print: 0672338521
Kindle: B06XYM3XH4
Audience: Big data developers
Rating: 4.5
Reviewer: Kay Ewbank

Hadoop is a complex ecosystem, but this book does a good job of teaching you your way around it.

The book opens with chapters introducing Hadoop, the Hadoop Cluster Architecture, and deploying Hadoop. The Hadoop Distributed File System (HDFS) is introduced next, followed by techniques for getting data into Hadoop using Flume, Sqoop, and the HDFS RESTful interface. A chapter on data processing in Hadoop introduces MapReduce very nicely.

Part Two of the book assumes you know enough to actually use Hadoop, and opens with a chapter on programming MapReduce applications using the Java MapReduce API and the MapReduce Streaming API.

Next, the author introduces data analysis in HDFS using Apache Pig, from Pig Latin basics through to Pig's built-in functions. A second chapter on Pig looks at more advanced topics such as grouping data, multiple dataset programming, user-defined functions, and the use of macros and variables to automate Pig.

Two chapters on Hive give a good grounding in analyzing data using Apache Hive, going as far as complex datatypes and optimizing and managing queries in Hive. A chapter on SQL on Hadoop introduces Impala, Tez, HAWQ and Drill, but it is only an introduction.

The final chapters in this part of the book look at Spark, the Hadoop User Environment (HUE), and NoSQL in the form of HBase and Cassandra.

Hadoop management occupies the rest of the book, starting with YARN, and in particular administering it and scheduling applications using it. The more general Hadoop ecosystem gets a chapter next, with introductions to Oozie and to machine learning and visualization in Hadoop.

Cluster management can be complex in Hadoop, and there's a good chapter on the various cluster management utilities, followed by a further one on cluster configuration. A chapter on advanced HDFS covers topics such as rack awareness, federation and HDFS caching.

The final chapters cover securing Hadoop, monitoring and troubleshooting, and a set of case studies on integrating Hadoop.

Overall, this is a very good book. There's enough to introduce all the elements of Hadoop and its ecosystem, and while you'd still need to read books specific to some of the subtopics, you get a good grounding in which tools to use and how to use them.

 

Related Reviews

Data Analytics With Hadoop

Field Guide to Hadoop

Hadoop Application Architectures

Hadoop Essentials

Hadoop for Finance Essentials

Hadoop: The Definitive Guide (4th ed)

Hadoop Interview Guide

Professional Hadoop

 




 

Last Updated ( Saturday, 27 May 2017 )
 
 

   
Copyright © 2017 i-programmer.info. All Rights Reserved.