Big Data Fundamentals

Author: Thomas Erl et al.
Publisher: Prentice Hall
Pages: 240
ISBN: 978-0134291079
Print: 0134291077
Kindle: B019YLYLVY
Audience: Decision-makers new to Big Data
Rating: 3.0
Reviewer: Ian Stirk

Big Data is an increasingly hot topic, so an entry-level book should prove useful to newcomers.

Although not explicitly stated, the book seems to be aimed at decision-makers/managers. Its content is descriptive rather than code-based. Some general IT awareness, experience of traditional IT systems, and of the system life-cycle will help in understanding the book.

The book is relatively small, containing around 200 working pages spread over eight chapters. It is split into two parts: the first looks at the background to Big Data, planning considerations, and some of the technologies involved; the second looks at the concepts involved in storing and analysing Big Data.

Below is a chapter-by-chapter exploration of the topics covered.

Chapter 1 Understanding Big Data

The book opens with a look at some concepts and terminology, specifically: datasets, analysis, analytics, business intelligence (BI), and key performance indicators (KPI). All are described with reference to business usage. Next, the standard Big Data Vs are briefly discussed, namely: volume, velocity, variety, veracity, and value. The chapter continues with a look at the different types of data, namely: structured, unstructured, semi-structured, and metadata. The chapter ends by introducing a case study (Ensure to Insure), which is continued in discrete subsections in subsequent chapters.

This chapter provides a useful background to Big Data, defining salient concepts, Big Data characteristics, and types of data.

The chapter is well written, easy to read, with lots of helpful diagrams. Discussions are explained in a business context. These traits apply to the whole book.

Chapter 2 Business Motivations and Drivers for Big Data Adoption

This chapter opens with a look at business cycles, with cost-cutting occurring during recessions and new products/services and innovations during times of growth. The importance of using Big Data to extract more useful information for competitive advantage is discussed. Next, the importance of business architecture and its alignment to IT is briefly discussed.

The chapter continues with a look at the factors that have increased the uptake of Big Data by business. These include:

Affordable Technology and Commodity Hardware
Social Media
Hyper-Connected Communities and Devices
Cloud Computing
Internet of Everything (IoE)

This chapter provides a helpful overview of why Big Data processing is needed together with some of its driving forces.

Chapter 3 Big Data Adoption and Planning Considerations

This chapter outlines some concerns to consider when introducing Big Data. These include: privacy, security, auditing, batch and streaming support, performance, and use of the cloud. Each concern is briefly described and put into context.

The chapter continues with a look at the Big Data analytics lifecycle, which differs from traditional analysis due to the volume, velocity and variety of data. The lifecycle stages briefly discussed are: data identification, gathering and filtering, extraction, validation/cleansing, aggregation/representation, analysis, visualization, and use of results.

The chapter provides a helpful list of factors to consider when planning a Big Data system.

Chapter 4 Enterprise Technologies and Big Data Business Intelligence

This chapter provides an overview of the salient distinguishing features of various types of enterprise system, namely:

Online Transaction Processing (OLTP) – mainly many small quick queries
Online Analytical Processing (OLAP) – mainly a few long running queries
Extract Transform Load (ETL) – process of moving and transforming data
Data Warehouses/Data Marts – data storage

The chapter continues with a brief overview of traditional BI (ad-hoc reports, dashboards), before looking at Big Data BI which can analyse multiple business processes at the same time.

Chapter 5 Big Data Storage Concepts

This chapter discusses various aspects of Big Data storage, including:

Clusters – grouping of servers, co-ordinated processing
File Systems and Distributed File Systems – provides parallel processing and scalability
NoSQL – non-relational databases, many niche types
Sharding – horizontal partitioning of datasets
Replication – provides scalability and fault tolerance
Sharding and Replication – provides high availability
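To make sharding and replication a little more concrete, here is a minimal Python sketch. It is not taken from the book, which is descriptive rather than code-based. The function names (`shard_for`, `replicas_for`, `place`) are hypothetical, and the scheme (hash the key, take the remainder, copy to the next nodes in sequence) is only one simple assumption among many possible placement strategies.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard for a record key using a stable hash (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replicas_for(key: str, num_nodes: int, copies: int) -> list[int]:
    """Return the primary node plus the next (copies - 1) nodes, wrapping around."""
    primary = shard_for(key, num_nodes)
    return [(primary + i) % num_nodes for i in range(copies)]


def place(keys, num_nodes: int, copies: int) -> dict[int, list[str]]:
    """Distribute keys over nodes: sharding splits the data, replication copies it."""
    nodes = {n: [] for n in range(num_nodes)}
    for key in keys:
        for node in replicas_for(key, num_nodes, copies):
            nodes[node].append(key)
    return nodes
```

Calling `place(["alice", "bob", "carol"], 3, 2)` would spread the keys over three nodes with each key held on two of them, so losing any single node still leaves one copy of every record available.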

The chapter continues with a look at the CAP theorem, which basically states that when a network partition occurs, a distributed system can guarantee either availability or consistency, but not both. Next, the ACID principles of transaction management are defined (i.e. Atomic, Consistent, Isolated, Durable), before looking at the BASE database principles (Basically Available, Soft State, Eventually consistent), which favour availability over consistency.

This chapter provides a useful introduction to some of the technology factors involved in Big Data systems.

Chapter 6 Big Data Processing Concepts

This chapter looks at how large volumes of data are processed, using parallel processing on commodity servers in a cluster. Hadoop, the most popular Big Data platform, is introduced VERY briefly.

The chapter continues with a look at MapReduce batch processing. In essence, the data is broken down and processed on numerous servers (the Map phase), then the results are combined and aggregated where necessary (the Reduce phase). The chapter next considers realtime in-memory stream processing, where realtime can mean anything from sub-second to under a minute.
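The Map and Reduce phases described above can be sketched in a few lines of Python. This is an illustrative single-process word count, not code from the book and not the Hadoop API. The function names are hypothetical, and the intermediate grouping step (often called the shuffle) is included as an assumption about how mapped results reach the reducer.

```python
from collections import defaultdict


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Group the mapped pairs by key so each word's counts end up together."""
    groups = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: aggregate the grouped counts into one total per word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(chunks: list[str]) -> dict[str, int]:
    # In a real cluster each map_phase call could run on a different server;
    # here they simply run one after another in the same process.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))
```

For example, `word_count(["big data big", "data cluster"])` treats the two strings as chunks held on two servers and returns a total of 2 for "big", 2 for "data", and 1 for "cluster".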

This chapter provides a useful overview of Big Data processing. The section on Hadoop is much too brief to be of use.

Chapter 7 Big Data Storage Technology

This chapter opens with a look at disk storage, which is relatively cheap and used for long-term storage. This continues with a look at distributed file systems, which provide data redundancy and high availability. Next, traditional relational database management systems (RDBMSs) are discussed. These have costly vertical scaling, and are generally unable to cater for the timely processing of large data volumes. NoSQL databases are then examined. These are generally highly scalable. The main types of NoSQL databases are outlined (i.e. key-value, document, column-family, and graph). NewSQL is briefly mentioned. This attempts to marry some NoSQL features with RDBMSs.

The chapter next looks at in-memory storage. While this is more expensive, it can offer significantly improved performance. Useful lists of when in-memory storage is appropriate and inappropriate are given. The section ends with a look at the usage of in-memory data grids and in-memory databases.

Chapter 8 Big Data Analysis Techniques

This chapter provides an introduction to the various common analysis techniques. The core of the chapter looks at:

Statistical Analysis (e.g. A/B testing, correlation, regression)
Machine Learning – systems that learn from experience (e.g. classification, clustering)
Semantic Analysis – extracting meaning from text/speech (e.g. sentiment analysis)
Visual Analysis – graphical data representation (e.g. heat maps)

In each case, the technique is explained with adequate detail and useful diagrams. Some very useful questions that the techniques can answer are also suggested.

Appendix A. Case Study Conclusion

The case study is introduced in Chapter 1, and extended with additional detail at the end of each subsequent chapter. This approach is useful since the case study can be examined stand-alone, without interfering with the main body of the book.

Conclusion

This book aims to introduce the basics of Big Data and provides a suitable introduction to Big Data for managers. It is generally easy to read, with a good flow, and has plenty of helpful diagrams. Many sections are brief, which helps maintain focus and interest. Explanations are continuously put into a business context. The book describes Big Data generically, with little reference to specific tools.

As a developer/technologist, I found some sections too wordy with too much business emphasis, although this approach might be suitable for managers. Some sentences in the book felt like consultant-speak, i.e. long words used to say the obvious or little. For a developer-focused introduction to Big Data, I still recommend Big Data Made Easy, see my review here.


Last Updated ( Saturday, 07 May 2016 )
