Python for Data Science

Author: Yuli Vasiliev
Publisher: No Starch
Date: August 2022
Pages: 240
ISBN: 978-1718502208
Print: 1718502206
Kindle: B09BKLV68X
Audience: Python developers interested in data analysis
Rating: 3
Reviewer: Mike James
Python seems to be the go-to language for data science, so why not read about it?

Data science is a difficult concept to pin down. If you interpret it as classical statistics, then this book will disappoint you. It is more about Python and using the standard Python data modules than it is about data.

It starts off with a chapter on the basics of data - unstructured vs structured, time series and so on. It also discusses sources of data, including web page scraping and databases. Chapter 2 introduces basic Python, focussing mainly on lists but following up with tuples, dictionaries and sets.

 

Chapter 3 is an introduction to NumPy and pandas via a simple example. The rest of the chapter is on scikit-learn. Chapter 4 is about acquiring data - basic file handling and the use of the requests library to access web pages. Chapter 5 covers similar ground but for databases - mainly MySQL with a bit of NoSQL. Chapter 6 continues the data manipulation theme with aggregating data and Chapter 7 goes on to combining datasets.

Chapter 8 breaks into a different topic with a look at visualization - basically how to use Matplotlib. This is such a big subject that the chapter does no more than provide an introduction. Chapter 9 is loosely related to Chapter 8 in that it is about location data and how to visualize it. Chapter 10 introduces very basic time series analysis - nowhere near enough to do anything but the simplest analysis.

Chapters 11 and 12 are where you might expect to find traditional stats, but Chapter 11 is on gaining insights from data and is more about detecting associations. Chapter 12 is about machine learning, and I don't think I've ever managed to cope with running a logistic regression being described as training.
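To make the terminology point concrete, here is a minimal sketch, not taken from the book, of a one-predictor logistic regression fitted by maximum likelihood using Newton's method in plain Python. The data values and the names sigmoid and fit_logistic are invented for illustration. For this model, what a machine learning library calls training is the same calculation a statistician would call fitting.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by maximum likelihood (Newton's method)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the negated Hessian (the observed information matrix).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: a predictor and a 0/1 outcome whose classes overlap,
# so a finite maximum-likelihood estimate exists.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, y)
fitted = [sigmoid(b0 + b1 * xi) for xi in x]
print("intercept:", b0, "slope:", b1)
```

With an intercept in the model, the fitted probabilities at the maximum-likelihood solution sum to the number of observed successes, which gives a quick check that the iteration has converged.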

Verdict:

This is not a good book for a variety of reasons. It mostly fails to explain the ideas involved in what it is making use of. The explanations are nearly all in terms of the program that does the task under discussion. As a result this is more a book about Python and its libraries than it is about data science. At best it gives you some idea of what libraries there are to get a particular job done. It certainly doesn't give you enough of an idea of what these libraries can do and what can go wrong. Many of the examples involve special knowledge and are likely to leave the reader wondering if they lack that knowledge. All of the programs are available on GitHub. From my point of view the biggest problem is the way standard statistical techniques are simply ignored - often in favour of simply looking at the data.

There isn't enough in this book for many Python programmers to understand data science and there isn't enough for most data scientists to appreciate Python.




Embracing Modern C++ Safely

Author: Dr. John Lakos, Vittorio Romeo, Dr. Rostislav Khlebnikov and Alisdair Meredith
Publisher: Addison-Wesley
Date: December 2021
Pages: 1376
ISBN: 978-0137380350
Print: 0137380356
Kindle: B09HTFQB92
Audience: C++ developers
Rating: 4
Reviewer: Harry Fairhead
Writing safe C++ - sounds essential

 [ ... ]



Code: The Hidden Language of Computer Hardware and Software 2nd Ed

Top Book 2023
Author: Charles Petzold
Publisher: Microsoft Press
Date: August 2022
Pages: 480
ISBN: 978-0137909100
Print: 0137909101
Kindle: B0B123P5GV
Audience: General
Rating: 5
Reviewer: Mike James
Code! We all need to know about it.



Last Updated ( Wednesday, 13 March 2024 )