Data Mining: Practical Machine Learning Tools and Techniques

Author: Ian H Witten, Eibe Frank & Mark A Hall
Publisher: Morgan Kaufmann
Pages: 607
ISBN: 978-0123748560
Aimed at: Those wanting an in-depth introduction
Rating: 4.5
Pros: Very readable and understandable
Cons: Stops short of hard-core statistics
Reviewed by: Kay Ewbank

If you are looking for information on data mining that covers the concepts of machine learning, is this the book for you?

This is a very readable book that covers an important topic: how to find patterns in your data. Most companies store quantities of data that would in the past have seemed unbelievable, but very few companies make good use of that data. If it’s customer data, you’ll get the catalog through the post (or probably two catalogs, even if you asked not to receive any), but that’ll be more or less it as far as making use of the data goes.

Most data has underlying patterns that can be useful and enlightening. If you’re working with customer data, why have some previous customers stopped buying? If you’re storing data on diseases and patient recovery, why do some patients survive while others don’t?


The book is divided into three parts. The first is an introduction to data mining: what it means, what machine learning is and how it is used. Knowledge representation and how to evaluate the results produced by machine learning round off this introduction. Even if you’re not planning on doing real-world data mining, this first part of the book is worth reading so you know what the terms mean and what sort of things are possible.

Part Two of the book looks at more advanced data mining techniques. The chapters take you through real machine learning schemes, data transformations and ensemble learning, ending with an interesting chapter on the future of data mining. For me, this part was the heart of the book. If you’re interested in, for example, how to choose a test for classifying data, what an exemplar is and how to reduce the number you get, different types of clustering, or Bayesian networks, they’re all covered. The chapter on ensemble learning was particularly interesting, showing how you can combine the outputs of different models in a variety of ways, use one model’s output to boost the strength of another, and generally improve your results by using more than one machine learning technique.

Part Three of the book will be either incredibly useful or a complete waste of space, depending on how you’re planning to do data mining. It is devoted entirely to the Weka data mining workbench, an open source data mining tool developed at the University of Waikato, New Zealand. All three of the authors have been involved in the development of the workbench, so this is an excellent introduction to it, but only if you plan on using it. However, even if you decide to use a different machine learning tool, the chapters on Weka do show how you can put together a data mining model and interpret the results; you’d just have to apply the techniques in whichever application you plan to use.

The examples in the book use several data sets, some of which are well known: Fisher’s Iris data will make anyone who’s ever studied statistics feel immediately at home. Other examples cover weather data, types of contact lenses prescribed to patients, classification of soybean diseases, and Canadian labor negotiations. While other books on data mining go into great depth on the statistical techniques, this book stops short of that. It does explain the concepts, and makes good use of diagrams and written explanations of how techniques work, but doesn’t really get into the actual equations and statistics. Whether this is an advantage or a drawback depends on your point of view, but even if you’re going to go on to the heavy statistics, it’s a good starting point for understanding why you would use one technique rather than another.

In summary, if you want a good introduction to data mining, get this book.






Last Updated (Saturday, 28 November 2020)