Author: Jason Bell
Audience: Developers and technical professionals
Reviewer: Mike James
Hands-On for Developers and Technical Professionals - sounds good.
Machine learning is becoming ever more important. Currently it is more or less taking over many of the tasks that would have been done by classical statistics and it is important to know something about it.
The problem is that it is an area where research is ongoing - it is the closest thing to rocket science we have in computing. So there is a need for a book that sets out the ideas in a practical rather than theoretical way. This is what this book does and, as long as you don't expect in-depth coverage of the theory or suggestions for further research, you probably won't be disappointed.
The first two chapters are a bit of a waste of space. In Chapter 1, Jason Bell goes over the history of machine learning and then spends far too long telling you about what you might use it for. The only useful part is about the software used in the rest of the book - yes it is hands-on and there are examples that you can try. If you were hoping for TensorFlow or one of the modern neural network kits then you have the wrong book. The software used is mainly the Weka toolkit and big data software - Mahout, SpringXD, Hadoop and so on. Chapter 2 then moves on to consider what you might call "management" issues - building a data team, looking after data security and so on.
The book really gets going in Chapter 3 with a simple example - decision trees. You don't get much in the way of theory here, but you do get a reasonable example of using a decision tree approach to data via Weka. The explanations are reasonable and you even get to calculate the entropy associated with a variable. As the conclusion says, decision trees are simple but they are often effective.
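To give a flavor of the entropy calculation mentioned above, here is a minimal sketch in plain Python. It is not taken from the book, which works in Weka; the function name and the 9 "yes" / 5 "no" class split are illustrative assumptions only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum -p * log2(p) over the observed classes.
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Hypothetical data: 9 positive and 5 negative outcomes.
labels = ["yes"] * 9 + ["no"] * 5
print(round(entropy(labels), 3))  # prints 0.94
```

A decision tree learner typically picks the attribute whose split reduces this value the most, which is why the calculation is worth doing by hand at least once.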
After this we move on to a more complex decision making process, the Bayesian network. Again it is light on theory. You get just enough to understand the reasoning behind Bayes' theorem and a little about graphs; then you implement a Bayesian network example.
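For readers who want the reasoning behind Bayes' theorem in concrete form, the following is a small sketch of a single update, not a Bayesian network and not code from the book. The function name and all the probabilities are made-up assumptions chosen for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) from Bayes' theorem for a binary hypothesis H.

    prior               -- P(H)
    likelihood          -- P(E | H)
    false_positive_rate -- P(E | not H)
    """
    # Total probability of seeing the evidence at all.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: a rare condition and an imperfect test.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p, 3))  # prints 0.154
```

A Bayesian network chains many such conditional probabilities together along the edges of a graph, which is where the "little about graphs" comes in.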
Chapter 5: Artificial Neural Networks is the one that might fall short of your expectations. This is not about neural networks as implemented in the vision systems that have made so many headlines recently. This isn't at all unreasonable because this isn't what the book is about. You don't apply convolutional neural networks to this sort of data. The chapter starts off with a brief explanation of single layer networks. The artificial neurons in use here are referred to as perceptrons and we don't get much discussion of the large number of different types of artificial neurons that you can use to build a neural network. There is coverage of multilayer perceptrons and back propagation, but only at the level of "this is how you train a network". Finally there's an example using Weka. What is missing from this chapter is huge - nothing in detail about cross-validation, batch training, dropout and so forth. This is a big subject and this is a very short chapter.
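To show what a single layer network of the perceptron kind amounts to, here is a minimal sketch in plain Python. It is an illustration under stated assumptions rather than the book's Weka example: the function names, the learning rate, the epoch count and the choice of the logical AND function as training data are all hypothetical.

```python
def predict(weights, bias, inputs):
    """Step-activation perceptron: output 1 if the weighted sum is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Classic perceptron learning rule on (inputs, target) pairs."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # error is 0 when the prediction is right, +1 or -1 otherwise.
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so a single perceptron can learn it.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print([predict(weights, bias, x) for x, _ in and_gate])  # prints [0, 0, 0, 1]
```

A single perceptron of this kind cannot learn a function such as XOR, which is one reason multilayer perceptrons and back propagation are needed.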
The next three chapters introduce other standard AI techniques. Chapter 6 explains association rule learning; Chapter 7 is on support vector machines (SVM) and Chapter 8 is about clustering. These are all in the same style as the earlier chapters and feature a short explanation followed by a practical example. The description of the SVM doesn't have any math, but it does manage to convey the basic idea.
The final section of the book is more about big data than it is about AI. Chapter 9 explains how to use SpringXD to process data streams. Chapter 10 is about Hadoop and the batch processing approach to big data. Chapter 11 explains Spark and its relationship to Hadoop. The final chapter is a crash course in R and explains the basics of the language and then presents examples of regression, sentiment analysis and association rules.
This is not a book that will make you a machine learning expert, but it is a reasonable overview of the most commonly used methods and the examples show you how they are actually used. There are some errors in the examples, but if you aren't up to getting over them you probably aren't going to be up to doing real world analysis.
If you are looking for a book that explains the math and the deep theories then this is not the book for you. At best this will get you started and then it will take a long while before you feel comfortable with the ideas. What the book is good at is giving you an overview of the data processing approaches to machine learning - Hadoop and Spark, for example. In this sense it goes well beyond most machine learning books.
A First Course in Machine Learning
Machine Learning for Hackers
Machine Learning in Action
Mahout in Action
Machine Learning in Python