Telling Stories With Data
Author: Dr. Rohan Alexander
The aim of this book is to show how you can build and share knowledge based on data and how to use R to build applications based on data. The book is organized into six parts: Foundations, Communications, Acquisition, Preparation, Modeling and Applications.
The Foundations part of the book starts with an overview of the intent of the book, before the author moves on to a set of worked examples that show the principles from the rest of the book and follow the recommended workflow of plan, simulate, acquire, model and communicate. Chapter 3 then introduces tools that can be used in the workflow to ensure your results can be reproduced: specifically, Quarto for documents integrating text and R code, R Projects to make the project independent of a specific directory structure, and Git and GitHub for sharing code and data. The chapter also looks at using R.
Part Two of the book considers communication, with chapters on how to write an effective report, and how to make good use of graphs, tables and maps.
Part Three is concerned with how you acquire useful data. There's a chapter on measurement and sampling that also looks at publicly available data such as census data and other government statistics. This is followed by a chapter that looks more at tools you might use for getting data, such as data scraping, OCR if the data isn't available digitally, and extraction from PDFs. This part of the book ends with techniques that you can use to acquire your own data, including conducting an experiment, running an A/B test, and running surveys.
Having acquired your data, the next part of the book considers how to prepare the data and turn it from raw into something that can be shared and explored. There's a good chapter on cleaning and preparing the data, and another useful one on storing and retrieving it, including how to use R data packages and Parquet.
Part Five moves on to data modeling, from exploratory data analysis so you understand the data, through the use of linear models, to generalised linear models including logistic, Poisson, and negative binomial regression.
The final main part of the book considers applications of modeling. There's a chapter on making causal claims from observational data that looks at how you might make use of difference-in-differences, regression discontinuity, and instrumental variables. A chapter on multilevel regression with post-stratification shows how to use a statistical model to adjust for known biases. This part of the book ends with a chapter on the analysis of text-based data. The final chapter is made up of advice on how to go further and what to read to support this.
Overall, this is a useful book if you want to do data analysis with some use of R. You do need to be reasonably confident with statistics, or willing to read around the material, but each chapter comes with a list of things you can read before working through it, and there are frequent suggestions for more material throughout the text. There are also lots of examples in R, and plenty of exercises to follow. If you're willing to put the work in, this is a book that will teach you a lot.