Author: Michael Bowles
Date: May 15, 2015
Audience: Python programmers with data to analyze
Reviewer: Mike James
Python is a good language for implementing machine learning, but what exactly is machine learning?
I was quite surprised by the selection of topics covered by this book. Not so long ago, techniques such as regression, stepwise regression and even ridge regression would have been the subject of a book on statistics. A book with a title like "machine learning" would have started off with the perceptron algorithm and Bayes' theorem and worked its way through a collection of methods that mostly didn't have much statistical basis. Machine learning used to be dynamic and mostly heuristic-based. Now, with the advent of "big data" and "data science", you can write a book on machine learning without much in the way of statistics or AI, and the subtitle of Michael Bowles' book reveals that his focus is "Essential Techniques for Predictive Analysis".
This is a book that takes a static data set and proceeds to find ways to either predict or classify some dependent variable. It isn't quite AI and it isn't quite statistics.
Of course this isn't a problem if you are working in some sort of analytical capacity and want a new way of dealing with data. Readers who want a mainstream machine learning book will be disappointed - there are no perceptrons, neural networks or Support Vector Machines. If you are looking for a book on statistical data analysis then again you need to look elsewhere. There are no discussions of significance, confidence intervals, principal components or anything similar.
In fact there isn't a lot of theory in this book at all. The few equations are there simply to make the model or method clear. A lot of readers will welcome this, but machine learning is a mathematical pursuit and so you probably need to master some of the deeper ideas to do a good job.
OK, what is the book about then?
Chapter 1 introduces the two main ideas in the book: penalized or constrained regression and ensemble methods. This is a little strange because ensemble methods aren't really prediction/classification methods but ways to improve other prediction/classification methods. This doesn't really matter too much because both ideas are worth knowing about. Overall, however, this choice makes the subject matter a little narrow.
Chapter 2 looks at basic data exploration and we encounter the usual ideas of graphical displays to help you get a feeling for your data. Here things are explained quite well but there is still a lot of "magic" going on. Why take a log transformation of the data? What exactly are we trying to do?
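One common answer to the log question, offered here as a general observation rather than as the book's own explanation, is that a log transformation pulls in a long right tail so that a few huge values no longer dominate plots and fits. A minimal sketch using only the standard library, with a made-up lognormal attribute standing in for real data:

```python
import math
import random


def skewness(values):
    """Sample skewness: the mean cubed z-score (near 0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / sd) ** 3 for v in values) / n


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical right-skewed attribute (an assumption, not data from the book).
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
logged = [math.log(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)
print(f"skewness before log: {raw_skew:.2f}")
print(f"skewness after log:  {log_skew:.2f}")
```

The raw values are strongly skewed, while the logged values are close to symmetric, which is usually what the exploratory plots are trying to achieve.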
Chapter 3 moves on to predictive model building. This introduces some of the problems of modeling, mostly overfitting, which is a theme throughout the book. The main topic is regression modeling, and forward stepwise regression is introduced as a way of finding a parsimonious model. There is no mention of backward or full stepwise regression.
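To make the idea concrete, here is a rough sketch of forward stepwise regression. It is not the book's code: the helper names, the synthetic data and the choice to select attributes by training error while reporting held-out error are all assumptions made for illustration. It sticks to the standard library to stay self-contained.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y, cols):
    """Least squares on the chosen columns plus an intercept (normal equations)."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def mse(rows, y, cols, beta):
    """Mean squared error of the fitted model on the given rows."""
    preds = [beta[0] + sum(b * row[c] for b, c in zip(beta[1:], cols)) for row in rows]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def forward_stepwise(train_x, train_y, hold_x, hold_y):
    """Greedily add the column that most reduces training error.

    Returns a list of (selected columns, held-out MSE), one entry per step.
    """
    selected, remaining, path = [], list(range(len(train_x[0]))), []
    while remaining:
        best_col, best_err, best_beta = None, float("inf"), None
        for c in remaining:
            beta = fit_ols(train_x, train_y, selected + [c])
            err = mse(train_x, train_y, selected + [c], beta)
            if err < best_err:
                best_col, best_err, best_beta = c, err, beta
        selected.append(best_col)
        remaining.remove(best_col)
        path.append((list(selected), mse(hold_x, hold_y, selected, best_beta)))
    return path


rng = random.Random(1)  # fixed seed so the run is repeatable


def make_rows(n):
    """Synthetic data (an assumption): only columns 0 and 2 affect the target."""
    rows, targets = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(4)]
        rows.append(row)
        targets.append(3.0 * row[0] - 2.0 * row[2] + rng.gauss(0, 0.1))
    return rows, targets


train_x, train_y = make_rows(100)
hold_x, hold_y = make_rows(100)
path = forward_stepwise(train_x, train_y, hold_x, hold_y)
for cols, err in path:
    print(cols, round(err, 4))
```

The held-out error drops sharply while the two informative columns are being added and then stops improving, which is the signal for choosing the parsimonious model.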
Chapter 4 introduces penalized or constrained regression as a way of avoiding overfitting and as an alternative to stepwise regression. If you read the explanations carefully you might discover why adding a constraint to the usual least squares fit makes the algorithm find solutions with sparse parameter vectors but it isn't as clear as it could be. We also learn about some of the special algorithms invented to speed up constrained regression - LARS and Glmnet. Chapter 5 applies some of the ideas to sample data sets.
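The sparsity effect is easier to see in code than in prose. The sketch below is not taken from the book and is not LARS or Glmnet; it is a plain coordinate descent for the lasso objective (1/2n) * sum of squared residuals + lam * sum of |beta_j|, written with the standard library only. The function names, the synthetic data and the value of lam are assumptions. The key step is the soft threshold, which sets a coefficient to exactly zero whenever its attribute's correlation with the remaining residual is smaller than the penalty.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; anything inside [-lam, lam] becomes exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(x, y, lam, sweeps=100):
    """Minimise (1/2n)*sum((y - x.beta)^2) + lam*sum(|beta_j|), no intercept."""
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in x) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the residual that ignores column j.
            rho = 0.0
            for row, target in zip(x, y):
                fitted_without_j = sum(beta[k] * row[k] for k in range(p) if k != j)
                rho += row[j] * (target - fitted_without_j)
            beta[j] = soft_threshold(rho / n, lam) / col_sq[j]
    return beta


rng = random.Random(2)  # fixed seed so the run is repeatable

# Synthetic data (an assumption): only columns 0 and 2 affect the target.
x = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3.0 * row[0] - 2.0 * row[2] + rng.gauss(0, 0.1) for row in x]

ols = lasso_coordinate_descent(x, y, lam=0.0)    # no penalty: ordinary least squares
lasso = lasso_coordinate_descent(x, y, lam=0.5)  # penalty large enough to prune

print("unpenalized:", [round(b, 3) for b in ols])
print("lasso:      ", [round(b, 3) for b in lasso])
```

With the penalty switched on, the coefficients of the two irrelevant columns come out as exact zeros while the informative ones are shrunk but survive, which is the sparse parameter vector the chapter is describing.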
Chapter 6 moves on to the second major topic of the book - ensemble methods. The idea that using more than one imperfect rule gives improved performance is introduced, but it isn't really explained. In particular, the important fact that the models in the ensemble need to be independent is mentioned, but it isn't discussed or emphasised enough. What is more, exactly how the models that are described are constructed to be independent is underplayed. As part of the ensemble idea the binary decision tree is introduced as an easy way to get a set of complex independent models. The ensemble methods described include bagging, gradient boosting and random forests. Chapter 7 applies the ideas to some data sets.
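Why independence matters can be shown with a toy simulation. This is purely illustrative and not from the book: each "model" is reduced to a prediction error of unit variance, and the `correlation` parameter (a made-up name) controls how much of that error is shared by every model in the ensemble.

```python
import random


def ensemble_mse(n_models, correlation, trials=2000, seed=0):
    """Mean squared error of the average of n_models unit-variance errors.

    Each error mixes one shared component with one private component so that
    any two models' errors have the requested correlation.
    """
    rng = random.Random(seed)
    shared_weight = correlation ** 0.5
    own_weight = (1.0 - correlation) ** 0.5
    total = 0.0
    for _ in range(trials):
        shared = rng.gauss(0, 1)
        errors = [
            shared_weight * shared + own_weight * rng.gauss(0, 1)
            for _ in range(n_models)
        ]
        average_error = sum(errors) / n_models
        total += average_error ** 2
    return total / trials


single = ensemble_mse(1, 0.0)
independent = ensemble_mse(25, 0.0)
correlated = ensemble_mse(25, 0.9)

print(f"one model:              {single:.3f}")
print(f"25 independent models:  {independent:.3f}")
print(f"25 correlated models:   {correlated:.3f}")
```

Averaging 25 independent models cuts the error variance to roughly 1/25 of a single model's, whereas 25 highly correlated models are barely better than one. This is why bagging and random forests go to some trouble, through resampling and random attribute selection, to make their trees differ.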
One feature of the book is that it is full of Python code that implements the different methods. You might regard this as good, but it doesn't make use of libraries such as numpy and there is a lot of repeated code. Arguably this is not the way to implement real data analysis routines and the value of the code is restricted to making sure that you understand how things work. On the other hand, relating the code to the theory isn't that easy.
If you want a practical introduction to penalized regression, binary decision trees, random forests and so on, then this might give you a starting point. However, you will need to read something with a more theoretical approach if you are to make any progress in the field.
For an alternative title, see Mike James's review of Machine Learning in Action, which also uses Python.
For more books on Python, see Books for Pythonistas in Programmers Bookshelf.