Machine Learning in Python

Author:  Michael Bowles 
Publisher: Wiley
Date: May 15, 2015
Pages: 360
ISBN: 978-1118961742
Print: 1118961749
Kindle: B00VOY1I98
Audience: Python programmers with data to analyze
Rating: 3.5
Reviewer:  Mike James 

Python is a good language to use to implement machine learning, but what exactly is machine learning? 

I was quite surprised by the selection of topics covered by this book. Not so long ago techniques such as regression, stepwise regression and even ridge regression would have been the subject of a book on statistics. A book with a title like "machine learning" would have started off with the perceptron algorithm and Bayes theorem and worked its way through a collection of methods that mostly didn't have much statistical basis. Machine learning used to be dynamic and mostly heuristic-based. Now, with the advent of "big data" and "data science", you can write a book on machine learning without much in the way of statistics or AI, and the subtitle of Michael Bowles' book reveals that his focus is "Essential Techniques for Predictive Analysis".

This is a book that takes a static data set and proceeds to find ways to either predict or classify some dependent variable. It isn't quite AI and it isn't quite statistics. 

Of course this isn't a problem if you are working in some sort of analytical capacity and want a new way of dealing with data. Readers who want a mainstream machine learning book will be disappointed - there are no perceptrons, neural networks or Support Vector Machines. If you are looking for a book on statistical data analysis then again you need to look elsewhere. There are no discussions of significance, confidence intervals, principal components or anything similar.

In fact there isn't a lot of theory in this book at all. The few equations are there simply to make the model or method clear. A lot of readers will welcome this, but machine learning is a mathematical pursuit and so you probably need to master some of the deeper ideas to do a good job. 




OK, what is the book about then? 

Chapter 1 introduces the two main ideas in the book: penalized or constrained regression and ensemble methods. This is a little strange because ensemble methods aren't really prediction/classification methods but ways to improve other prediction/classification methods. This doesn't really matter too much because both ideas are worth knowing about. Overall, however, this choice makes the subject matter a little narrow. 

Chapter 2 looks at basic data exploration and we encounter the usual ideas of graphical displays to help you get a feeling for your data. Here things are explained quite well but there is still a lot of "magic" going on. Why take a log transformation of the data? What exactly are we trying to do? 
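To make the log transformation question concrete, here is a minimal sketch (mine, not the book's, using made-up data) of what the transform actually buys you during exploration: it compresses a long right tail so a plot or summary isn't dominated by a few large values.

```python
# Hypothetical right-skewed data - one huge value swamps the rest.
import math

skewed = [2, 3, 5, 8, 20, 150, 900]
logged = [math.log10(x) for x in skewed]

def spread(values):
    """Ratio of largest to smallest value - a crude measure of spread."""
    return max(values) / min(values)

print(spread(skewed))   # 450.0 - any plot is dominated by the tail
print(spread(logged))   # about 9.8 - all points now fit on one readable axis
```

That is the sort of "what are we trying to do" explanation the chapter could have spelled out.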

Chapter 3 moves on to predictive model building. This introduces some of the problems of model building, mainly overfitting, which is a theme throughout the book. The main topic is regression modeling, and forward stepwise regression is introduced as a way of finding a parsimonious model. There is no mention of backward or full stepwise regression.
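For readers unfamiliar with the idea, forward stepwise regression can be sketched in a few lines. This is my own simplified version, not the book's code: it fits one predictor at a time against the current residuals rather than refitting the full model at each step, and the data is invented so that y depends strongly on x1, weakly on x2, and not at all on x3.

```python
def fit_line(x, y):
    """Slope and intercept of a one-variable least squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

def rss(x, y, intercept, slope):
    """Residual sum of squares for the fitted line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

def forward_stepwise(X, y, n_steps):
    """Greedily add the predictor that most reduces the RSS."""
    resid, chosen = list(y), []
    for _ in range(n_steps):
        scores = {}
        for name, col in X.items():
            if name not in chosen:
                a, b = fit_line(col, resid)
                scores[name] = (rss(col, resid, a, b), a, b)
        name = min(scores, key=lambda k: scores[k][0])
        _, a, b = scores[name]
        chosen.append(name)
        # work with what the chosen predictor leaves unexplained
        resid = [r - (a + b * xi) for r, xi in zip(resid, X[name])]
    return chosen

X = {"x1": [0, 1, 2, 3, 4, 5],
     "x2": [1, 0, 1, 0, 1, 0],
     "x3": [5, 3, 4, 2, 1, 0]}
y = [3 * a + b for a, b in zip(X["x1"], X["x2"])]   # y = 3*x1 + x2

print(forward_stepwise(X, y, 2))   # ['x1', 'x2'] - the noise column x3 is never picked
```

The greedy "add the best remaining variable" loop is the whole trick; a parsimonious model is just one that stops early.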

Chapter 4 introduces penalized or constrained regression as a way of avoiding overfitting and as an alternative to stepwise regression. If you read the explanations carefully you might discover why adding a constraint to the usual least squares fit makes the algorithm find solutions with sparse parameter vectors but it isn't as clear as it could be. We also learn about some of the special algorithms invented to speed up constrained regression - LARS and Glmnet. Chapter 5 applies some of the ideas to sample data sets. 




Chapter 6 moves on to the second major topic of the book - ensemble methods. The idea that using more than one imperfect rule gives improved performance is introduced, but it isn't really explained. In particular, the important fact that the models in the ensemble need to be independent is mentioned, but this isn't discussed or emphasized enough. What is more, exactly how the models that are described are constructed to be independent is underplayed. As part of the ensemble idea the binary decision tree is introduced as an easy way to get a set of complex independent models. The ensemble methods described include bagging, gradient boosting and random forests. Chapter 7 applies the ideas to some data sets.
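The independence point deserves the emphasis the book doesn't give it, and a quick simulation (mine, not from the book) makes it vivid: a majority vote over many weak but independent classifiers is far more accurate than any single one of them.

```python
# Simulate majority voting over independent classifiers that are each
# correct with probability p_correct on every example.
import random

def vote_accuracy(n_models, p_correct, trials=10_000, seed=42):
    """Estimate the accuracy of a majority vote of independent models."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        correct = sum(rng.random() < p_correct for _ in range(n_models))
        wins += correct > n_models // 2
    return wins / trials

print(vote_accuracy(1, 0.7))    # a single 70%-accurate model: roughly 0.7
print(vote_accuracy(25, 0.7))   # 25 independent models voting: well above 0.9
```

If the 25 models were perfectly correlated copies of one another, the vote would gain nothing - which is exactly why bagging's bootstrap samples and the random forest's random feature subsets matter.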

One feature of the book is that it is full of Python code implementing the different methods. You might regard this as good, but it doesn't make use of libraries such as NumPy and there is a lot of repeated code. Arguably this is not the way to implement real data analysis routines, and the value of the code is restricted to making sure that you understand how things work. On the other hand, relating the code to the theory isn't that easy.

If you want a practical introduction to penalized regression, binary decision trees, random forests and so on then this might give you a starting point. However, you will need to read something with a more theoretical approach if you are to make any progress in the field.


For an alternative title see Mike James' review of Machine Learning in Action, which also uses Python.


For more books on Python see Books for Pythonistas in Programmers Bookshelf 


Last Updated ( Monday, 02 November 2015 )
