Spark in Action, 2nd Ed (Manning)

Author: Jean-Georges Perrin
Publisher: Manning
Pages: 576
ISBN: 978-1617295522
Print: 1617295523
Audience: Java developers interested in Spark
Rating: 4.5
Reviewer: Kay Ewbank

This is a rewritten version of an earlier title from a different author, so whether it counts as a second edition is open to question. What actually matters, though, is how well it fulfills the task of showing how to use Spark for big data analytics.

The book starts with a description of what Spark is, how you can use it, and what you can do with it. Spark's architecture and flow are covered next, along with the dataframe and its importance. There's an interesting chapter on laziness, which in Spark terms means deferring work until a result is actually needed, giving Spark the chance to optimize its workload. This chapter looks at Catalyst, Spark's built-in optimizer, and introduces directed acyclic graphs. Part 1 of the book ends with chapters on building a simple app and deploying it.

Part 2 is about ingestion, with chapters on ingestion from files and from databases. There's a chapter on advanced ingestion that covers finding data sources and building your own, and the section finishes with a look at ingestion through structured streaming.

Part 3 covers data transformation, with chapters on working with SQL, transforming your data, and transforming entire documents. There's a reasonable chapter on extending transformations with user-defined functions (UDFs). It has some good information and shows the key elements of putting together a UDF, and you should be fine so long as you're a competent programmer. The section ends with a chapter on aggregating your data.

Part 4, the final section of the book, goes further, with chapters on enhancing Spark's performance with caching and checkpointing; exporting data and building full data pipelines; and deployment constraints, where YARN, Mesos and Kubernetes are considered.

Throughout the book, author Jean-Georges Perrin uses Java-based examples to illustrate the ideas, including a complete data pipeline for processing NASA satellite data.

Verdict: The book is a thorough look at Spark, and should take you from the first stages through to being a competent Spark user.

Related Reviews

Mastering Apache Spark

Learning Spark

Spark is one of the topics covered in Reading Your Way Into Big Data, an article on Programmer's Bookshelf in which Ian Stirk provides a roadmap of the reading required to take you from novice to competent in areas relating to data science.

Last Updated (Wednesday, 20 January 2021)