Big Data Analytics

Author: David Loshin
Publisher: Morgan Kaufmann
Pages: 120
ISBN: 978-0124173194
Aimed at: Managers who need to understand big data technology
Rating: 4
Reviewed by: Kay Ewbank


This is a short book that aims to describe what big data is, what the problems are, how to work out which big data technologies a business actually needs, and how to then develop a strategic plan. Does it deliver?

As you can probably tell from that description, this isn’t a book aimed at developers. It doesn’t cover big data application development, MapReduce, Hadoop – nothing specific. Despite that, it’s a useful book if you have to talk to managers or business users and need to speak their language. It’s also a useful read if you haven’t been following the big data story and want to get a grip on the business drivers behind it. Each chapter ends with a set of ‘thought exercises’ that would make good questions to ask (or to try to answer) when you’re looking at a specific big data project.

 

David Loshin starts with a brief look at the things that have led to the rise of big data. Next, he looks at which business problems are suited to big data analytics. This would be a useful chapter for Dilbert to show to his pointy-headed boss, especially a table on ‘quantifying organizational readiness’ that can be used to work out a score for how ready your company is in the areas of feasibility, reasonability, value, integrability, and sustainability. Each area is scored from 0 to 4. On the feasibility row, for example, the company scores 0 if evaluation of new technology is not officially sanctioned, and 4 if evaluation and testing are encouraged, there’s a clear decision process for adoption or rejection, and time is allocated to innovation.
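As a rough illustration of how such a scorecard might be tallied, here is a minimal sketch. The five area names and the 0 to 4 range come from the description above. The example scores, the function name and the idea of simply summing the rows are assumptions made for illustration, not something taken from the book.

```python
# Hypothetical readiness scorecard. The five areas and the 0-4 range follow
# the review's description; the scores and the plain sum are assumptions.
DIMENSIONS = ("feasibility", "reasonability", "value",
              "integrability", "sustainability")
MAX_PER_DIMENSION = 4


def readiness_total(scores):
    """Check there is one 0-4 score per area and return the total."""
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError("missing scores for: " + ", ".join(missing))
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= MAX_PER_DIMENSION:
            raise ValueError(f"{name} must be between 0 and {MAX_PER_DIMENSION}")
    return sum(scores[name] for name in DIMENSIONS)


# Made-up scores for an imaginary company.
example_scores = {
    "feasibility": 4,
    "reasonability": 2,
    "value": 3,
    "integrability": 1,
    "sustainability": 2,
}

total = readiness_total(example_scores)
print(f"Readiness: {total} out of {len(DIMENSIONS) * MAX_PER_DIMENSION}")
```

A low score in a single area may matter more than the total, so a real assessment would probably look at each row as well as the sum.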

The next chapter looks at what it means to adopt big data technology, and who needs to be involved. Loshin then moves on to developing a strategy for integrating big data analytics. Much of what he advises is obvious – clarify the criteria for adopting big data technology, prepare the environment for the sheer volume of data, put proper levels of oversight and governance in place. However, the fact that it’s obvious doesn’t mean it’s not correct, and plenty of big data projects have failed because something obvious has been missed.

Having introduced the topic of governance, Loshin devotes the whole of the next chapter to a more detailed look at the problems of working with data that has been created outside your control. There are some interesting insights into what constitutes good data, how you can measure data quality, and ways to make data reusable.

 

 

The rest of the book has more technical detail than the earlier chapters. A chapter introducing high-performance appliances for big data sets out typical ways big data analytics might be used, and in each case looks at the storage, appliance and data management considerations. Loshin then goes on to look at the merits of hardware and software appliances, and gives a short analysis of row- versus column-oriented data layouts. A chapter on big data tools and techniques introduces Hadoop, MapReduce, YARN, HBase, Hive, Pig and Mahout. The topics are covered fairly briefly and at a high level, but it’s a useful intro. MapReduce then gets a chapter to itself, with a simple example to demonstrate how it works. A chapter on NoSQL discusses key-value, document, tabular and object data stores, and where they fit into the bigger picture.

There’s a nice chapter on using graph analytics for big data that gives a clear introduction to what graph analytics is. It then goes on to discuss when you might use graph analytics and what the different algorithms are (community analysis, path analysis, clustering and so on). There’s a good description of the technical complexity of analyzing graphs, and the chapter closes with a look at what features you should look for in a graph analytics platform. The book closes with a short chapter on best practices.
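For readers who haven’t met the MapReduce model mentioned above, the classic word count gives a feel for it. This is a minimal single-process sketch in plain Python. It is not the example from the book and it doesn’t use Hadoop. The function names and sample documents are made up for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Made-up input documents.
documents = ["big data is big", "data about data"]
counts = word_count(documents)
print(counts)
```

In a real cluster the map and reduce steps run in parallel on many machines and the framework handles the shuffle. Here everything runs in one process so the three stages are easy to see.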

I found this book an interesting mix. There is a certain amount of management speak along the lines of use cases, organizational alignment, and instituting governance. However, each time I’d begun muttering, Loshin got back to a practical and worthwhile point. The business advice made sense, and if you do have a pointy-headed boss, this book would be a good one to try to keep him pointing in the right direction. The four-star rating I’ve given the book is for someone who needs to understand the technology and how it might be useful. For a programmer, the book is much less directly useful. It would make a good intro, and it would be good for giving you the right phrases to use when talking to your business users, but otherwise it’s too high level.

 




Last Updated ( Friday, 19 September 2014 )