Kafka: The Definitive Guide

Authors: Neha Narkhede, Gwen Shapira and Todd Palino
Publisher: O'Reilly
Pages: 322
ISBN: 978-1491936160
Print: 1491936169
Kindle: B0758ZYVVN
Audience: Developers using Kafka
Rating: 4.5
Reviewer: Kay Ewbank


Kafka is increasingly popular for moving large amounts of streaming data. This guide, subtitled Real-time data and stream processing at scale, has been written to show how the people who built Kafka control and use it. 

The authors are from Confluent and LinkedIn, and were among the team responsible for developing Kafka. They say that they wrote the book from the perspective of asking 'what are the most useful things we can share with new users to take them from beginner to expert'.

 


The book has some parts that are aimed at developers, others that are more useful for administrators of Kafka. It opens with a general introduction to Kafka and what it does, followed by a chapter on installing Kafka.

Having got those openers out of the way, the authors get into the heart of the book, beginning with a chapter on Kafka Producers and how to write messages to Kafka. Next comes a chapter on Kafka Consumers, and how to read data from Kafka. Both chapters have plenty of code snippets that illustrate the concepts being discussed. The samples are there to show the concepts rather than to be complete programs that you could copy, paste and run.

A chapter on Kafka Internals is next, looking at how Kafka replication works, how it handles requests from producers and consumers, and how it deals with message storage and partitions. All these topics are explained with the idea of giving a better understanding of why Kafka behaves in certain ways in certain situations.
 

 

The next chapter is titled Reliable Data Delivery, and looks at reliability guarantees and how to configure brokers.

A chapter on building data pipelines comes next, starting with what to think about when building a pipeline, then going on to an introduction to Kafka Connect, with examples of connectors between a file source and a file sink, and between MySQL and Elasticsearch. There's also a discussion of alternatives to Connect.

Cross-cluster data mirroring is the next topic to be considered. The rest of the book concentrates on single Kafka cluster use, but this chapter shows how to handle the situation where you need to copy data between clusters using Kafka's MirrorMaker cross-cluster data replicator, including configuring and tuning it.
 

A chapter on administering Kafka is next, mainly looking at Kafka's command line utilities that you can use for basic cluster administration. However, as the authors point out, there are better third-party tools available on the Kafka website. This chapter is followed by a look at how to monitor a Kafka cluster using the Java Management Extensions (JMX) interface. The authors discuss the different metrics, which are the critical ones to monitor all the time, and what you should do in response to different results. They also look at which metrics are useful when debugging problems.

The final chapter looks at stream processing and how Kafka Streams, Kafka's stream-processing library, works. The authors show how to use it to build and use a topology. The chapter ends with some stream processing use cases.

Overall, I found this book to be clearly written and it gave me a good explanation of what Kafka is capable of. The code samples illustrated the points well, and the authors obviously have a detailed knowledge of everything about Kafka. The one drawback of this expertise is that it sometimes led them to give a much shorter explanation of a point or concept where I'd have preferred a slower, more detailed description. That's still a minor point, and if you need to learn about Kafka, this is a very good book.

 




 

Last Updated ( Saturday, 28 November 2020 )