Telling Stories With Data

Author: Dr. Rohan Alexander
Publisher: Chapman & Hall/CRC
Date: July 2023
Pages: 598
ISBN: 978-1032134772
Print: 1032134771
Kindle: B0C97MMKPX
Audience: Data scientists
Level: Intermediate
Category: Data Science
Rating: 4.5
Reviewer: Kay Ewbank

The aim of this book is to show how you can build and share knowledge based on data and how to use R to build applications based on data. 

The book is organized into six parts: Foundations, Communication, Acquisition, Preparation, Modeling and Applications.

The Foundations part starts with an overview of the book's intent, before the author moves on to a set of worked examples that illustrate the principles covered in the rest of the book and follow the recommended workflow of plan, simulate, acquire, model and communicate.

Chapter 3 then introduces tools that can be used in the workflow to ensure your results can be reproduced: Quarto for documents integrating text and R code, R Projects to make the project independent of a specific directory structure, and Git and GitHub for sharing code and data. The chapter also looks at using R.

Part Two of the book considers communication, with chapters on how to write an effective report, and how to make good use of graphs, tables and maps. 

Part Three is concerned with how you acquire useful data. There's a chapter on measurement and sampling that also looks at publicly available data such as census data and other government statistics. This is followed by a chapter on the tools you might use for getting data, such as data scraping, OCR if the data isn't available digitally, and extraction from PDFs. This part of the book ends with techniques that you can use to acquire your own data, including conducting an experiment, running an A/B test, and running surveys.

Having acquired your data, the next part of the book considers how to prepare the data and turn it from raw into something that can be shared and explored. There's a good chapter on cleaning and preparing the data, and another useful one on storing and retrieving it, including how to use R data packages and Parquet. 

Part Five gets on to data modeling, from exploratory data analysis to help you understand the data, through the use of linear models, to generalized linear models including logistic, Poisson, and negative binomial regression.

The final main part of the book considers applications of modeling. There's a chapter on making causal claims from observational data that looks at how you might make use of difference-in-differences, regression discontinuity, and instrumental variables. A chapter on multilevel regression with post-stratification shows how to use a statistical model to adjust for known biases. This part of the book ends with a chapter on the analysis of text-based data.

The final chapter is made up of advice on how to go further and what to read to support that.

Overall, this is a useful book if you want to do data analysis with some use of R. You do need to be reasonably confident with statistics, or willing to read around the material, but each chapter does come with a list of things you can read ahead of working through the chapter, and there are frequent suggestions for more material throughout the text. There are also lots of examples in R, and plenty of exercises to follow. If you're willing to put the work in, this is a book that will teach you a lot.  
