Azure Storage, Streaming and Batch Analytics

Author: Richard Nuckolls
Publisher: Manning
Pages: 448
ISBN: 978-1617296307
Print: 1617296309
Kindle: B09781TWFJ
Audience: Data engineers
Rating: 4.5
Reviewer: Kay Ewbank

This book is aimed at developers and system engineers who need to successfully collect and process data in Azure, specifically using the Lambda architecture on Azure.

The aim of the book is to show how to put together the Azure services to create a working system. The author opens with an overview of what he means by data engineering, then moves on to discussing the main concepts of Azure, what services are available and how they can be put together to build a data processing system based on Lambda. This chapter provides a good overview of what services you're going to need, and how they interact, as well as what Lambda provides.
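For readers new to the term, the Lambda pattern can be sketched in a few lines. This is a toy illustration in plain Python, not code from the book: the function names (`batch_view`, `speed_view`, `serve`) and the sample events are hypothetical, and real systems would replace each function with an Azure service.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute counts per key over all stored events."""
    return Counter(event["key"] for event in master_dataset)


def speed_view(recent_events):
    """Speed layer: count events that the last batch run has not seen yet."""
    return Counter(event["key"] for event in recent_events)


def serve(batch, speed):
    """Serving layer: merge both views to answer a query."""
    return batch + speed


# Hypothetical sample data: events already in storage, plus newer arrivals.
stored = [{"key": "sensor-a"}, {"key": "sensor-a"}, {"key": "sensor-b"}]
recent = [{"key": "sensor-b"}, {"key": "sensor-c"}]

merged = serve(batch_view(stored), speed_view(recent))
print(dict(merged))
```

The point of the split is that the batch view can be slow but complete, while the speed view is fast but covers only recent data; a query reads both.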

The next few chapters look at each of the services in order, starting with Azure storage accounts. This begins with how to create a storage account, then discusses the storage account services: blob storage, queues, and so on. Azure Data Lake Storage is the next service to be examined in detail and, as in the other chapters, this starts with creation, in this case how to create an Azure Data Lake store. Nuckolls then looks at Data Lake store access, folder structure and data drift, finishing with a look at the copy tools for Data Lake stores.

Chapter 5 is a meaty chapter on message handling with Event Hubs. As the name suggests, these are used to ingest and serve event messages, time-based series of event data from apps. Nuckolls looks in detail at how event hubs work, how to create a namespace and an event hub, partitioning, configuring capture, and securing access to event hubs.
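The idea behind partitioning can be shown with a short sketch. This is an assumption-laden illustration, not the hashing scheme Event Hubs itself uses: `assign_partition` is a hypothetical helper, and SHA-256 is chosen only because it gives a stable hash in plain Python. What it demonstrates is the general principle that events sharing a partition key land in the same partition, so their order is preserved.

```python
import hashlib


def assign_partition(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index with a stable hash.

    Illustrative only: the real service uses its own hashing scheme.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Hypothetical device readings: (partition key, value).
events = [("device-1", 20.5), ("device-2", 19.0), ("device-1", 21.0)]

partitions = {}
for key, reading in events:
    index = assign_partition(key, 4)
    partitions.setdefault(index, []).append((key, reading))

for index, batch in sorted(partitions.items()):
    print(index, batch)
```

Because the mapping depends only on the key, a consumer reading one partition sees every event for a given device in arrival order.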

Real-time queries with Azure Stream Analytics is the next topic. Stream Analytics is used to read data sources, execute operations on the data, and output the results to data sinks. The chapter shows how to create a service, then create and run jobs on the service using the Azure portal and with PowerShell. The queries are based on SQL, and Nuckolls looks at creating a job query, writing job queries, and managing their performance.
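To give a flavor of the read, operate, output cycle, here is a rough sketch of a windowed count, a typical streaming operation. It is plain Python rather than Stream Analytics query syntax, and the review does not say which window types the book covers; `tumbling_window_counts` and the sample events are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, device) in fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, device in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, device)] += 1
    return dict(counts)


# Hypothetical input: (timestamp in seconds, device id).
events = [(0, "a"), (4, "a"), (9, "b"), (10, "a"), (15, "b"), (19, "b")]

result = tumbling_window_counts(events, 10)
for (window_start, device), count in sorted(result.items()):
    print(window_start, device, count)
```

In a SQL-based streaming query the same grouping would be expressed declaratively, with the engine handling event arrival and window boundaries.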

Batch queries with Azure Data Lake Analytics are covered in the next few chapters. This starts with a look at U-SQL and how it is a blend of SQL and C#. The roles of extractors, which read files, and outputters, which write rowsets to Data Lake Storage, are well explained, as are expressions for transforming rowsets. The chapter also covers schema extraction and aggregation before moving back up a level to describe how to create a Data Lake Analytics service. I think you'd need to read more detailed material if you're not already familiar with SQL and C# - the descriptions are fine as an overview and introduction, but essentially you've got two languages and how they're used in combination, and that's a big topic.

Chapter 8 goes into U-SQL in more detail with regard to how to use it for complex analytics, with good sections on window functions and local C# functions. This is followed by a look at how to integrate with Data Lake Analytics, specifically processing unstructured data, connecting to remote sources, and working with different file types.

Azure Data Factory, which manages task execution, is the topic for the next chapter, including how to create the service, authenticate securely, and copy files with ADF.

The final two chapters go back to SQL, starting with managed SQL with Azure SQL Database. This chapter covers creating a database, securing it, and ensuring availability and recovery, along with cost optimization. The following chapter looks at integrating Data Factory with Azure SQL Database, mainly how to import data into it.

The book ends with a chapter on where to go next, looking at Data Catalog, version control and backups.

This is a useful book, with clear descriptions of how to set up and use the many services that Azure provides. It gives enough information to get you started, so your services work and talk to each other. You won't be an expert in U-SQL or Azure Analytics once you've read the book, but you will have a working system that you can then fine-tune.

For recommendations of other books on Azure see Cloud Computing Books - Pick Of The Bunch in our Programmer's Bookshelf section.

Last Updated ( Thursday, 03 February 2022 )