Practical Machine Learning
Author: Sunila Gollapudi

This book aims to introduce you to both the basic and more advanced features of Machine Learning. How does it fare? Machine Learning involves algorithms that learn from experience. It is used increasingly for tasks such as product recommendation and fraud detection, so it's a skill worth investigating further. To get the most out of this book, you should already have some knowledge of math and general programming. The book is: "...aimed at being a guidebook for both established and aspiring data scientists/analysts". Below is a chapter-by-chapter exploration of the topics covered.
Chapter 1 Introduction to Machine Learning

The book opens by outlining the landscape and basic concepts of Machine Learning, which are built upon in subsequent chapters. An emphasis on practical real-world examples is noted. The chapter looks at what Machine Learning is - basically an algorithm that learns from experience. The concepts of training, validation, and testing datasets are discussed, and various basic terms are defined (e.g. unlabeled data, task, algorithm, model, and supervised learning). The section ends with a brief description of some real-world Machine Learning examples, including: detecting spam, credit card fraud, face recognition, and product recommendations.

The chapter next looks at the different types of learning problems, including: classification, clustering, and regression. In each case, the learning problem is outlined - remember, more detail is provided in subsequent chapters. Similarly, the different types of learning are briefly discussed, namely: supervised, unsupervised, semi-supervised, reinforcement, and deep learning. Next, performance measures, which are used to evaluate learning algorithms, are discussed. This is followed by an examination of various error measures, namely mean squared error and mean absolute error, together with possible remedies. Some related IT fields are briefly compared with Machine Learning, including: data mining, AI, statistical learning, and data science. The Machine Learning process lifecycle is briefly explained, followed by a succinct look at some popular algorithms.

This chapter provides a useful overview of what's in the rest of the book. Machine Learning is defined, together with related terminology, and the different types of learning problem are outlined. The chapter is well written, and relatively easy to read. There are helpful diagrams and inter-chapter links. The chapter ends with a helpful summary (and links to example code in later chapters). While the chapter is wide-ranging, the detail is relatively brief. An understanding of math and general programming is required, since little explanation is given. These traits apply to the whole book.
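To make the two error measures mentioned above concrete, here is a minimal sketch of mean squared error and mean absolute error in Java. This is my own illustration, not code from the book, and the class and method names are placeholders.

    // Illustrative sketch (not from the book): MSE and MAE over paired actual/predicted values.
    public class ErrorMeasures {

        // MSE = (1/n) * sum((actual - predicted)^2)
        static double meanSquaredError(double[] actual, double[] predicted) {
            double sum = 0.0;
            for (int i = 0; i < actual.length; i++) {
                double diff = actual[i] - predicted[i];
                sum += diff * diff;
            }
            return sum / actual.length;
        }

        // MAE = (1/n) * sum(|actual - predicted|)
        static double meanAbsoluteError(double[] actual, double[] predicted) {
            double sum = 0.0;
            for (int i = 0; i < actual.length; i++) {
                sum += Math.abs(actual[i] - predicted[i]);
            }
            return sum / actual.length;
        }

        public static void main(String[] args) {
            double[] actual    = {3.0, 5.0, 2.5, 7.0};
            double[] predicted = {2.5, 5.0, 4.0, 8.0};
            System.out.println("MSE: " + meanSquaredError(actual, predicted)); // 0.875
            System.out.println("MAE: " + meanAbsoluteError(actual, predicted)); // 0.75
        }
    }

Because MSE squares the differences, it penalizes large errors more heavily than MAE, which is why the two measures can rank the same models differently.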
Chapter 2 Machine Learning and Large-scale Datasets

This chapter opens with a look at the recent rise of Big Data, and how it relates to Machine Learning. Cheaper hardware, distributed architectures, and support for parallel processing facilitate the use of scalable Machine Learning algorithms. The chapter continues with a brief look at core aspects of Big Data systems (e.g. volume, velocity, unstructured data, and the lambda architecture). The impact of Big Data on the rise of non-relational databases, scale-out architecture, and distributed and parallel computing is briefly discussed. Next, the chapter looks briefly at applying measures to algorithms relating to concurrency, performance, and data volumes. The chapter ends with a look at technologies for scaling up, including High Performance Computing (HPC) and Graphics Processing Units (GPUs).

This chapter provides an overview of how Big Data has produced changes (e.g. scalable architectures) that Machine Learning systems can take advantage of. Most of this chapter can be read without any reference to Machine Learning (i.e. it's a basic chapter about Big Data). In the context of Machine Learning, is a whole chapter necessary? I suggest it distracts from what should be the book's core focus; additionally, other books provide a more detailed and considered approach to Big Data. I note the superscripts are wrong, giving misleading data sizes (e.g. 103 instead of 10³). In the section "Algorithms and Concurrency", program code has been omitted for the fourth instruction. Also, various terms are introduced before being defined (e.g. LINQ, directed graph).

Chapter 3 An Introduction to Hadoop's Architecture and Ecosystem

This chapter is concerned with the use of Hadoop as the preferred environment for scalable Big Data needs, including Machine Learning. The chapter opens with the briefest look at the evolution of Hadoop and its major components. This is followed by a look at Hadoop's various architectural layers, namely: data source (e.g. RDBMS), ingestion (e.g. Sqoop), storage (e.g. HDFS), management (e.g. Pig), analytics (e.g. Spark), and consumption (e.g. D3). For each layer, its usage is described together with example components. The section ends with a look at MapReduce, which allows the processing of massive amounts of data in parallel; its architecture and components are described. The chapter continues with a look at some useful features in Hadoop 2.x, which addresses many of the limitations of Hadoop 1.x. Next, various Hadoop ecosystem components are briefly listed, followed by Hadoop installation and setup instructions. Lastly, the components of various Hadoop distributions and vendors are given.

This chapter provides an overview of Hadoop and its major components. Again, most of this chapter can be read without any reference to Machine Learning. The section on Hadoop is too brief to be of any real value. I'm not sure why there is an emphasis on MapReduce, since many shops are moving towards much faster in-memory processing (e.g. Spark) – perhaps it's because some later libraries use MapReduce since they don't have their own distributed and parallel processing engines. The code for the VowelMapper class should be formatted to improve its readability (a readable sketch of that kind of mapper is shown below). I'm beginning to wonder when I might read something about Machine Learning...
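For readers unfamiliar with the VowelMapper example referenced above, the following is a hypothetical sketch of what a cleanly formatted Hadoop mapper that counts vowels might look like, written against the org.apache.hadoop.mapreduce API. It is not the book's actual code; the implementation details and field names are my own assumptions.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical, cleanly formatted vowel-counting mapper (illustration only).
    public class VowelMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text vowel = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit a (vowel, 1) pair for every vowel found in the input line.
            for (char c : value.toString().toLowerCase().toCharArray()) {
                if ("aeiou".indexOf(c) >= 0) {
                    vowel.set(String.valueOf(c));
                    context.write(vowel, ONE);
                }
            }
        }
    }

Each call to map() handles one line of input; a matching reducer would then sum the emitted counts per vowel, which is the standard MapReduce word-count pattern applied to characters.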