Data Smart (Wiley)
Author: Jordan Goldmeier

This is an updated edition of a well-regarded title that looks at accessible ways to combine statistics and machine learning with Excel to discover insights in your data. It has been revised by Jordan Goldmeier, who wasn't the original author and is a self-confessed Excel lover who's also a Microsoft MVP.

The book kicks off with a chapter titled 'Everything you ever needed to know about spreadsheets but were too afraid to ask', in which Goldmeier introduces Excel tables, lookup formulas, pivot tables and array formulas. He then goes on to look at Power Query, Microsoft's data transformation and data preparation engine. The chapter considers how to use Power Query's graphical interface to retrieve data and its editor to apply transformations, carrying out the extract, transform, and load (ETL) processing of data.

Chapter three has the light-hearted title 'Naive Bayes and the Incredible Lightness of Being an Idiot'. Goldmeier starts with what he says is the world's fastest intro to probability theory before going on to consider the chain rule, Bayes' rule, and how to use Bayes to create an AI model.

Two chapters on cluster analysis come next, starting with a look at using k-means to segment your customer base, then moving on to network graphs and community detection. Goldmeier then looks at regression, which he describes as the granddaddy of supervised artificial intelligence. The concepts are explained well, and the examples are carefully chosen to make the ideas clear.

Next comes a chapter on ensemble models, which Goldmeier describes as 'a whole lot of bad pizza'. By this he's referring to an episode of the US version of the sitcom The Office in which the boss asks whether it's better to have a small amount of really good pizza or a lot of really bad pizza. He then extrapolates, saying that many AI implementations are closer to the 'lots of bad pizza' model.
A chapter on forecasting starts from the premise that there's no point worrying because you can't win, and Goldmeier backs this up by stating that the only guarantee in forecasting is that your forecast is wrong. He then goes on to say that this doesn't mean you shouldn't try forecasting, since you'll still end up knowing more than nothing. Chapters on optimization modeling and outlier detection consider whether these techniques could be described as data science. Goldmeier then looks at how to go beyond spreadsheets with a chapter on R.

Overall, this is a good introduction to data analysis using straightforward tools and mainstream techniques. I suspect most developers would get more out of using R and going further, but the book could help you get started with data analysis. Worth reading.