Azure Storage, Streaming and Batch Analytics
Author: Richard Nuckolls
This book is aimed at developers and system engineers who need to collect and process data in Azure, specifically by implementing the Lambda architecture with Azure services.
The aim of the book is to show how to combine Azure services into a working system. The author opens with an overview of what he means by data engineering, then moves on to the main concepts of Azure: what services are available and how they can be put together to build a data processing system based on the Lambda architecture. This opening chapter provides a good overview of which services you're going to need and how they interact, as well as what the Lambda architecture provides.
The next few chapters look at each of the services in turn, starting with Azure Storage accounts. This begins with how to create a storage account, then discusses the storage account services - blob storage, queues, and so on. Azure Data Lake Storage is the next service to be examined in detail, and like the other service chapters this starts with creation, in this case how to create an Azure Data Lake store. Nuckolls then looks at Data Lake store access, folder structure and data drift, finishing with a look at the copy tools for Data Lake stores.
Chapter 5 is a meaty chapter on message handling with Event Hubs. As the name suggests, these are used to ingest and serve event messages - time-ordered series of event data from apps. Nuckolls looks in detail at how Event Hubs work, how to create a namespace and an event hub, partitioning, configuring capture, and securing access to Event Hubs.
Real-time queries with Azure Stream Analytics is the next topic. Stream Analytics is used to read data sources, execute operations on the data, and output the results to data sinks. The chapter shows how to create the service, then create and run jobs on it using both the Azure portal and PowerShell. The queries are based on SQL, and Nuckolls looks at creating a job query, writing job queries, and managing their performance.
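To give a flavor of the SQL-based query language the chapter covers, here is a minimal sketch of a Stream Analytics job query that computes a windowed average. The input and output aliases and the field names here are hypothetical stand-ins; in a real job they come from the inputs and outputs configured on the job itself.

```sql
-- Hypothetical aliases: [eventhub-input] and [temperature-output] would be
-- configured as the job's source and sink; DeviceId, Temperature and
-- EventTime are assumed fields in the incoming events.
SELECT
    DeviceId,
    AVG(Temperature) AS AvgTemperature,
    System.Timestamp() AS WindowEnd
INTO
    [temperature-output]
FROM
    [eventhub-input] TIMESTAMP BY EventTime
GROUP BY
    DeviceId,
    TumblingWindow(second, 30)
```

The `TIMESTAMP BY` clause tells the job to window on the event's own timestamp rather than its arrival time, and `TumblingWindow(second, 30)` emits one aggregate per device every 30 seconds.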
Batch queries with Azure Data Lake Analytics are covered in the next few chapters. This starts with a look at U-SQL and how it is a blend of SQL and C#. The role of extractors to read files and outputters to write rowsets in Data Lake Storage is well explained, as are expressions for transforming rowsets. The chapter also covers schema extraction and aggregation before moving back up a level to describe how to create a Data Lake Analytics service. I think you'd need to read more detailed material if you're not already familiar with SQL and C#; the descriptions are fine as an overview and introduction, but covering two languages and how they're used in combination is a big topic.
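As an illustration of the extractor/outputter pattern described above, here is a minimal sketch of a U-SQL script; the file paths and the column schema are assumptions for illustration, not taken from the book.

```sql
// Hypothetical paths and schema, for illustration only.
// An extractor reads a file into a rowset...
@readings =
    EXTRACT DeviceId string,
            Temperature double,
            EventTime DateTime
    FROM "/input/readings.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

// ...a SQL-style expression transforms the rowset...
@averages =
    SELECT DeviceId,
           AVG(Temperature) AS AvgTemperature
    FROM @readings
    GROUP BY DeviceId;

// ...and an outputter writes the result back to Data Lake Storage.
OUTPUT @averages
TO "/output/averages.csv"
USING Outputters.Csv(outputHeader: true);
```

The SQL/C# blend shows up even in this small script: the EXTRACT schema uses C# types (`string`, `double`, `DateTime`) while the transformation reads as declarative SQL.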
Chapter 8 goes into U-SQL in more detail with regard to how to use it for complex analytics, with good sections on window functions and local C# functions. This is followed by a look at how to integrate with Data Lake Analytics, specifically processing unstructured data, connecting to remote sources, and working with different file types.
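To illustrate the two features singled out above, here is a hedged sketch combining a U-SQL window function with an inline C# method call; again the path and schema are hypothetical, not from the book.

```sql
// Hypothetical path and schema, for illustration only.
@readings =
    EXTRACT DeviceId string,
            Temperature double,
            EventTime DateTime
    FROM "/input/readings.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

// RANK() OVER (...) is a window function ranking readings per device;
// EventTime.ToString(...) is an ordinary C# method call on a DateTime column.
@ranked =
    SELECT DeviceId,
           Temperature,
           EventTime.ToString("yyyy-MM-dd") AS EventDay,
           RANK() OVER (PARTITION BY DeviceId ORDER BY Temperature DESC) AS TempRank
    FROM @readings;

OUTPUT @ranked
TO "/output/ranked.csv"
USING Outputters.Csv(outputHeader: true);
```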
Azure Data Factory, which manages task execution, is the topic for the next chapter, including how to create the service, authenticate securely, and copy files with ADF.
The final two chapters go back to SQL, starting with managed SQL with Azure SQL Database. This chapter covers creating a database, securing it, and ensuring availability and recovery, along with cost optimization. The following chapter looks at integrating Data Factory with Azure SQL Database, mainly how to import data into it.
The book ends with a chapter on where to go next, looking at Data Catalog, version control and backups.
This is a useful book, with clear descriptions of how to set up and use the many services that Azure provides. It gives enough information to get you started, so your services work and talk to each other. You won't be an expert in U-SQL or Azure Analytics once you've read the book, but you will have a working system that you can then fine-tune.
Last Updated: Thursday, 03 February 2022