Programming Skills for Data Science
Written by Mike James   

Authors: Michael Freeman and Joel Ross
Publisher: Addison-Wesley
Date: December 2018
Pages: 384
ISBN: 978-0135133101
Print: 0135133106
Kindle: B07KMDCHT2
Audience: Would-be data scientists
Rating: 5
Reviewer: Mike James
If you are looking for a programmer's guide to R, this might be it.

As I've said before, the big problem with writing a book on R is deciding whether it should concentrate on the programming language or on the statistical procedures carried out using the language. This particular book leans toward the programming language, with some simple statistical procedures, mostly graphics, acting as examples.

It starts off, in Part I, Chapter 1, with setting up your computer and very sensibly covers IDEs, including RStudio, which is the obvious one to use. Don't try using just a text editor to program in R: it will cost you a lot of time unless you are already an expert, and even then a good IDE will save you from mistakes. It also covers setting up GitHub, which plays a moderately central role in the rest of the book. Collaboration is common in statistical work, but it still isn't clear to me that Git or GitHub is a key component, and it certainly makes things more complicated at first.

Chapter 2 shows you how to use R from the command line, including navigating the file system. Useful stuff, but I think it should be in an appendix.

Part II is about using Git, mainly via GitHub, to manage your projects. If you don't plan on using GitHub, skip ahead to Part III. Part II also includes Chapter 4, on Markdown as a way of creating simple documentation, another useful skill.

Part III gets to grips with R. It goes through the basics of variables, functions, conditionals and lists. Personally, I think it should cover Data Frames, the ultimate R data structure, here, but they are postponed until Part IV. All of the descriptions are good and easy to read. There is a lot of intelligent writing in this part of the book; in fact, there is a lot of intelligent writing in most of the book. This isn't a dummies book and you need to read it carefully.

Part IV moves towards statistics, but it is still mostly about using the R language to manipulate data. After a brief look at the generalities of data, the book moves on to Data Frames and then to manipulating data, mostly using functions from the dplyr and tidyr packages. Chapter 13 is a short introduction to accessing a SQL database, and Chapter 14 covers REST and accessing web data, including JSON.

Part V is much more about stats, but it only goes as far as simple graphs and charts. Here you learn to plot with ggplot2, plotly, rbokeh and leaflet. Part VI returns to the programming aspects of using R: Chapter 18 deals with dynamic reports using Markdown, Chapter 19 is about websites using Shiny, and Chapter 20 returns to the idea of using GitHub for collaboration. The final chapter provides some guidance on learning statistics, other languages and so on.

This book will not teach you much about statistics apart from some very basic ideas about data. It will teach you quite a lot about R; for my taste, not quite enough, but it does a better job than other books I have reviewed. The writing style is, as I said earlier, "intelligent". There are plenty of comments and asides to set the scene and it is all easy to read.

Highly recommended as an introduction to R and the programming practices that surround it. You will still need to teach yourself statistics, but that is another, and much bigger, problem.

Last Updated ( Tuesday, 30 July 2019 )