Azure Big Data Announcements
Written by Kay Ewbank   
Tuesday, 05 May 2015

At Build 2015, developers learned about new options for big data: a data warehouse service; a way to run elastic databases; and a data lake where customers can store large amounts of data.

Scott Guthrie made the announcements in his keynote. All are designed to make it easier to work with data, no matter how big or complex.

Azure Data Lake is a hyper-scale data store for big data analytic workloads. According to a post on the SQL Server Blog by T.K. Ranga Rengarajan, corporate vice president for Microsoft’s data platform, it is:

a single place to store every type of data in its native format with no fixed limits on account size or file size, high throughput to increase analytic performance and native integration with the Hadoop ecosystem.

Data Lake is compatible with HDFS (Hadoop Distributed File System) and is integrated with Azure HDInsight. It will also be integrated with Microsoft offerings such as Revolution R Enterprise and with industry-standard distributions such as Hortonworks and Cloudera.

Data Lake will be geo-distributed, will be aware of the location of your data, and can handle individual files at petabyte scale. The preview for Azure Data Lake will be available later this year. More information is provided in this video:

 

The Azure SQL Database elastic databases service is already available in preview. It lets you build SaaS applications that manage large numbers of databases with ‘unpredictable resource demands’. You can pool resources across databases rather than overprovisioning each one to accommodate peak demand. Resources can be shared across hundreds of databases, and Microsoft is releasing tools to help query and aggregate results across these databases, as well as to implement policies and perform transactions across the database pool. This video provides an introductory demo:

 

 

Other announcements for Azure SQL Database include a new security option in the form of transparent data encryption. This adds to the existing previews of row-level security and dynamic data masking. Full-text search is another new preview.
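To make the pooling idea behind elastic databases more concrete, here is a minimal sketch of why a shared pool can need less capacity than provisioning every database for its own peak. The tenant names, the demand figures and the generic "units" are all hypothetical, chosen only for illustration, and this is not how Microsoft sizes or bills a pool.

```python
# Hypothetical hourly resource demand for three tenant databases.
# The tenant names, the "units" and every number are made up for illustration.
demand = {
    "tenant_a": [10, 80, 10, 10],
    "tenant_b": [10, 10, 90, 10],
    "tenant_c": [70, 10, 10, 10],
}


def per_database_capacity(demand):
    """Capacity needed if each database is provisioned for its own peak."""
    return sum(max(series) for series in demand.values())


def pooled_capacity(demand):
    """Capacity needed if all databases share one pool sized for the combined peak."""
    combined = [sum(hour) for hour in zip(*demand.values())]
    return max(combined)


separate = per_database_capacity(demand)
pooled = pooled_capacity(demand)
print(f"provisioned separately: {separate} units")
print(f"provisioned as a pool:  {pooled} units")
```

The saving in this sketch comes entirely from the peaks falling in different hours. If every database peaked at the same time, the pooled figure would equal the sum of the individual peaks.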

The third major announcement is Azure SQL Data Warehouse. Microsoft describes this as an elastic data warehouse in the cloud that can dynamically grow, shrink and pause compute independently of storage, so you can increase compute capacity to improve query performance when you need it.

 

The interesting thing about Azure SQL Data Warehouse is the way Microsoft plans to charge for it. Customers using rival products such as Amazon Redshift have to increase both the amount of computing power and the amount of storage at the same time. Azure SQL Data Warehouse will let customers increase one without the other. You’ll also be able to pause the compute you’re using when you don’t need it, then resume it on demand when you do, paying only for what you use rather than paying all the time for all the virtual machines that make up the nodes in your cluster.

This decoupling is achieved by storing data in Azure Storage Blobs rather than on local drives attached to the virtual machines. While this might be less expensive, as you’re charged on a usage basis, it could be much slower than using local drives.
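The billing difference described above can be sketched with some simple arithmetic. Every price, node count and usage figure below is a hypothetical placeholder rather than a real Microsoft or Amazon rate, and the function names are invented for this example. The sketch also ignores the possible performance cost of blob storage.

```python
HOURS_PER_MONTH = 730  # approximate number of hours in a month


def coupled_cost(nodes, node_price_per_hour):
    """Always-on cluster in which every node bundles compute and storage."""
    return nodes * node_price_per_hour * HOURS_PER_MONTH


def decoupled_cost(compute_units, compute_price_per_hour, active_hours,
                   storage_tb, storage_price_per_tb_month):
    """Compute is billed only while it runs; storage is billed separately."""
    compute = compute_units * compute_price_per_hour * active_hours
    storage = storage_tb * storage_price_per_tb_month
    return compute + storage


# Hypothetical workload: 4 units of compute, needed for only 200 hours
# a month, over 10 TB of data. All prices are placeholders.
always_on = coupled_cost(nodes=4, node_price_per_hour=1.0)
paused = decoupled_cost(compute_units=4, compute_price_per_hour=1.0,
                        active_hours=200, storage_tb=10,
                        storage_price_per_tb_month=50.0)
print(f"coupled, always on:        {always_on:.2f}")
print(f"decoupled, paused if idle: {paused:.2f}")
```

Under these assumed numbers the paused, decoupled setup costs less, but the comparison depends on how many hours the compute actually runs. A warehouse that is busy around the clock gains little from the ability to pause.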

Azure SQL Data Warehouse is based on the massively parallel processing architecture currently available in both SQL Server and the Analytics Platform System appliance. This means it includes Microsoft’s PolyBase technology, a multi-data source query engine that lets you query your big data regardless of whether it is stored in an on-premises Hadoop/HDFS cluster, Azure storage, Parallel Data Warehouse, or another relational DBMS. Microsoft says SQL Data Warehouse will work with existing data tools including Power BI for data visualization, Azure Machine Learning, Azure Data Factory, and Azure HDInsight, Microsoft’s Apache Hadoop managed big data service. The preview for Azure SQL Data Warehouse will be available later this year.



Last Updated ( Tuesday, 05 May 2015 )