Introducing SQL Server 2019 (Packt)

Chapter 4. Hybrid Features – SQL Server and Microsoft Azure

This chapter discusses how Azure features can be incorporated into on-premises SQL Server instances, which can be viewed as an easy and effective way to start using Azure. Features examined include:

  • Backup databases to a URL – improving HA/DR

  • Azure storage – provides scalable and flexible storage

  • Hosting SQL Server data files in Azure

  • Hosting DR replica in Azure (i.e. secondary data center)

  • Transactional replication between on-premises SQL Server and Azure – useful for migration to Azure

In each case, prerequisites and setup instructions are discussed. Examples are provided and the benefits described. 
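
To make the first feature in the list concrete, here is a minimal sketch (mine, not taken from the book) of how a backup-to-URL statement might be assembled. It is written in Python purely for illustration. The function name and the storage account, container and file names are hypothetical, and it assumes a credential for the container already exists on the instance.

```python
def backup_to_url_statement(database: str, account: str, container: str, blob: str) -> str:
    """Build a T-SQL BACKUP ... TO URL statement for an Azure Blob Storage target."""
    # Hypothetical names: the URL follows the usual blob endpoint pattern.
    url = f"https://{account}.blob.core.windows.net/{container}/{blob}"
    # Escape the characters that would end the bracketed name or the string literal.
    safe_db = database.replace("]", "]]")
    safe_url = url.replace("'", "''")
    return (
        f"BACKUP DATABASE [{safe_db}] "
        f"TO URL = N'{safe_url}' "
        "WITH COMPRESSION, STATS = 5;"
    )


if __name__ == "__main__":
    # Prints the statement only; running it against a server is left to the reader.
    print(backup_to_url_statement("SalesDb", "examplestore", "backups", "SalesDb.bak"))
```

The STATS = 5 option asks for a progress message at every 5 percent, which is the kind of output that fills the listings in this chapter.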

I’m not sure the complete output from the restore needs to be shown (e.g. 5 percent processed, etc.). There is a small error in figure 4.13, where MDF is given twice (it should be MDF and NDF).

Overall, an interesting and useful chapter for those wanting to take advantage of Azure’s functionality, as well as those taking their first steps into the world of Azure.

Chapter 5. SQL Server 2019 on Linux

Linux is the most widely used OS on enterprise servers, and now there are more virtual machines (VMs) running Linux than Windows on the Azure cloud. Recognizing this shift early, Microsoft introduced SQL Server for Linux in 2017, and this has been enhanced in 2019 (e.g. Machine Learning Services added).

The chapter opens with a look at why you might want to move to SQL Server on Linux (e.g. most popular OS on servers). Then, there’s a step-by-step walkthrough of the installation and configuration of the Red Hat version of Linux.

Next, there’s a brief look at various Linux features that have been added or improved in SQL Server 2019, including:

  • an overview of Kubernetes (an orchestrator) and Docker (a container platform) – with the aim of high availability or scalability, and ease of distribution/maintenance

  • Change Data Capture – details what data has changed

  • Distributed Transaction Coordinator on Linux

  • Replication 

In each case, only a brief overview of the feature is given, and in some cases some prior knowledge is assumed (e.g. containers). 

The chapter next moves on to some development tools, including Azure Data Studio, SQLCMD, and MSSQL-CLI. There’s an interesting discussion on the use of PowerShell or Bash for scripting.

The chapter ends with a brief look at users, groups, root, and super user, from a Linux perspective. 

Both containers and Kubernetes are outlined here but discussed in detail in chapter 6. No cross reference to that chapter is made, and perhaps chapter 6 should have come before this one. Better editing would have helped here.

This feels like a chapter that falls between two stools. If you know Linux already, the chapter feels weak/basic; if you don’t know Linux, it doesn’t feel like a good introduction because it assumes you know some things already.

Chapter 6. SQL Server 2019 in Containers and Kubernetes

The chapter opens with a look at the advantages of VMs, which allow a large physical server to be split into smaller virtual machines. While VMs abstract the hardware, containers provide an additional layer of abstraction, abstracting the OS, allowing you to concentrate on your application – and providing much smaller deployments. Containers allow changes to be implemented (and rolled back!) quickly – by swapping the container. All this fits in very nicely with DevOps and continuous integration and deployment (CI/CD).

The chapter next digs deeper into deploying a SQL Server container using Docker, providing a step-by-step example for you to follow along with. Limited customization of SQL Server containers is possible (e.g. changing the collation or language).
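
As a rough illustration of that kind of deployment (my own sketch, not the book's example), the Python below assembles a docker run command for the SQL Server 2019 Linux image, with collation and language passed as environment variables. The function name, the password and the default port mapping are assumptions made for the example.

```python
import shlex
from typing import List, Optional


def sql_container_command(
    sa_password: str,
    collation: Optional[str] = None,
    lcid: Optional[int] = None,
    host_port: int = 1433,
    image: str = "mcr.microsoft.com/mssql/server:2019-latest",
) -> List[str]:
    """Return the argument list for starting a SQL Server container with Docker."""
    args = ["docker", "run", "-e", "ACCEPT_EULA=Y", "-e", f"MSSQL_SA_PASSWORD={sa_password}"]
    if collation is not None:
        # Optional customization: server collation.
        args += ["-e", f"MSSQL_COLLATION={collation}"]
    if lcid is not None:
        # Optional customization: language ID (for example 1036 for French).
        args += ["-e", f"MSSQL_LCID={lcid}"]
    # Map the chosen host port to the container's default SQL Server port, run detached.
    args += ["-p", f"{host_port}:1433", "-d", image]
    return args


if __name__ == "__main__":
    # Hypothetical password; the command is printed, not executed.
    command = sql_container_command("Example!Passw0rd", collation="Latin1_General_CI_AS")
    print(" ".join(shlex.quote(part) for part in command))
```

Swapping the image tag and restarting is what makes the quick change and rollback described above possible.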

This chapter provides a quick look at containers and why they are useful. The example deployment of a SQL Server container using Docker should prove helpful.

Chapter 7. Data Virtualization

Data Virtualization, big words for a little concept. It just means accessing remote data (similar to a linked server or ODBC). Instead of collecting distributed data into a centralized system, data virtualization provides a different method of ‘unifying’ data.

Some problems of the centralized system approach are outlined (e.g. data duplication), before moving on to how data virtualization can address these concerns. The chapter briefly describes 4 use cases relating to modern systems (e.g. data lakes, federated systems) – all very useful, but so very brief. 

Having laid the background, the core of this chapter looks at what SQL Server offers for data virtualization, namely PolyBase. Its external data sources are used to link to remote data, and an example of this is provided. A wide variety of data sources are supported, aided by the ubiquitous ODBC external data source. External file formats apply a schema to these external data sources. In essence, you register the external data sources and run your queries against them, which is logically quite similar to using linked servers.
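
The register-then-query pattern can be sketched as follows. This is my own illustration in Python, not the book's example: it only builds the two T-SQL statements for a remote SQL Server source, and every object name, the host and the column list are hypothetical. It also assumes a database scoped credential has already been created.

```python
from typing import Dict, List


def polybase_statements(
    source_name: str,
    host: str,
    credential: str,
    local_table: str,
    remote_table: str,
    columns: Dict[str, str],
) -> List[str]:
    """Build the T-SQL that registers a remote SQL Server and exposes one of its tables."""
    # Column definitions must be supplied by hand; they describe the remote table locally.
    column_list = ", ".join(f"[{name}] {sql_type}" for name, sql_type in columns.items())
    create_source = (
        f"CREATE EXTERNAL DATA SOURCE [{source_name}] "
        f"WITH (LOCATION = 'sqlserver://{host}', CREDENTIAL = [{credential}]);"
    )
    create_table = (
        f"CREATE EXTERNAL TABLE {local_table} ({column_list}) "
        f"WITH (LOCATION = '{remote_table}', DATA_SOURCE = [{source_name}]);"
    )
    return [create_source, create_table]


if __name__ == "__main__":
    for statement in polybase_statements(
        "SalesSource",
        "remotehost:1433",
        "SalesCredential",
        "dbo.RemoteOrders",
        "Sales.dbo.Orders",
        {"OrderId": "INT", "Amount": "DECIMAL(10,2)"},
    ):
        print(statement)
```

Once both statements have run, the external table can be queried with an ordinary SELECT, which is where the resemblance to linked servers comes from.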

There’s an interesting section that compares linked servers with PolyBase, with the recommendation to use PolyBase for any new development.

There’s a helpful section on installing the optional SQL Server PolyBase component on Windows, Linux, and Docker. Helpful pre- and post-installation guidance is provided. The chapter ends with some useful PolyBase tips.

This chapter provides a useful look at how PolyBase can be used to run queries over external data sources. Like many chapters, this one is largely self-contained. There’s a useful, if rare, cross reference to another chapter (Big Data Clusters), so I would expect that related chapter to appear next, but it doesn’t. Some knowledge of Big Data is assumed (e.g. the Parquet file format).

There are some useful PolyBase tips, but I fear you need an understanding of networks to get the most from them. That raises one of the concerns about this chapter and the book as a whole: who is it aimed at?

Chapter 8. Machine Learning Services Extensibility Framework

Machine Learning is a topic much in vogue. I wonder if its old name of “applied statistics” would garner as much interest. It essentially involves looking for patterns and determining the probability of some result.

The chapter opens with a quick overview of what Machine Learning (ML) is and how it works (i.e. using statistical techniques to find patterns in data). Both supervised (where you train with known results) and unsupervised ML algorithms are discussed.

Various ML languages and tools are briefly examined, before looking at how to configure and use R, Python, and Java for ML in SQL Server 2019. Example usage is provided for each language. Various methods for running the code are examined (e.g. ML Services, the T-SQL PREDICT function). There’s a useful section on performance monitoring together with links for useful downloads (e.g. reports for ML Services).

The chapter ends with a brief look at the 5 phases of the Team Data Science Process (an open source framework for data science projects), a useful structure for implementing your projects.

The chapter provides a useful overview of ML and how it can be used with SQL Server 2019. There’s a helpful overview of the languages and tools, together with some example code. There’s also a link to the different questions that the different ML algorithms can answer (a good starting point!).

While the chapter makes a valiant effort to explain the topic, I suspect the reader needs a wider background in ML and Big Data to appreciate it fully. I wonder how many readers, without a suitable background, would understand this statement:

“…in data science, you can't use a k-means algorithm on a prediction that requires linear regression…”
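
For readers without that background, the distinction can be shown with a small sketch of my own (plain Python, no ML libraries, invented data). Linear regression is supervised: it learns from known inputs and outputs and then predicts a number. K-means is unsupervised: it only groups values, so it cannot answer a "what will y be?" question. The function names and the simplified one-dimensional, two-cluster k-means are assumptions made for the example.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (supervised: needs known ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_means(values: List[float], iterations: int = 10) -> List[float]:
    """Simplified k-means with k = 2 on one dimension (unsupervised: no labels at all)."""
    # Deterministic start: use the smallest and largest values as initial centroids.
    centroids = [min(values), max(values)]
    for _ in range(iterations):
        groups: List[List[float]] = [[], []]
        for v in values:
            nearest = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            groups[nearest].append(v)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


if __name__ == "__main__":
    slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print("prediction for x = 5:", slope * 5 + intercept)
    print("cluster centres:", two_means([1, 2, 3, 10, 11, 12]))
```

The first function returns a formula you can use to predict; the second returns only the centres of the groups it found, which is why one cannot stand in for the other.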

Chapter 9. SQL Server 2019 Big Data Clusters

This chapter opens with a few words about Big Data: growing volumes of data that require a non-relational database approach to processing. Various scale-out architectures are briefly discussed (e.g. PolyBase) before looking at how Big Data Clusters (BDC) brings together the topics discussed. BDC provides the co-ordination and control of processes (e.g. Spark and HDFS).

The chapter continues with an outline walkthrough of BDC installation and configuration. The detail associated with the steps is given as a series of web links. Next, there’s a look at some tools for programming BDC, including Azure Data Studio and IntelliJ, which are used to create various example components (e.g. submitting a Spark job from IntelliJ). Again, the detail is typically provided with website links.

The chapter moves on to look at management (via Kubernetes and azdata commands) and monitoring (via Grafana and Kibana). The chapter ends with a look at security, a special concern since we’re going across a wide range of components (e.g. Spark, HDFS) – only a general view of security is provided here. 

This chapter provides a look at what BDC is, its components, architecture and tools.  

Again, readers of this chapter will need an adequate background in Big Data (e.g. Spark, HDFS, Parquet) to understand it fully. More detailed information is provided via links rather than in the text, which may not be optimal.




Last Updated ( Wednesday, 14 July 2021 )