Amazon’s cloud-based, petabyte-scale data warehouse service is now available.
When Amazon Redshift was announced in November, it was restricted to a limited preview. The full version, still in beta, is now generally available at a cost of under $1,000 per terabyte per year, which Amazon says is a tenth of the cost of most traditional data warehousing solutions.
Redshift data can be analyzed using ‘normal’ SQL-based tools and business intelligence applications, and the clusters can be set up using a few clicks in the AWS Management Console. Queries can be distributed and parallelized across multiple nodes, and Amazon has automated most of the common administrative tasks associated with provisioning, configuring, monitoring, backing up, and securing a data warehouse to make Redshift easier to administer.
Redshift supports Amazon VPC out of the box, and encrypting and backing up data are simple operations. An automated snapshot feature continuously and incrementally backs up new data on the cluster to Amazon S3, and you can also take your own snapshots at any time.
The lower costs will be a major reason for considering Redshift. It uses a pay-as-you-go model: you’re charged an hourly rate based on the node type and the number of nodes in your cluster. Amazon Redshift supports two types of data warehouse node, a High Storage Extra Large (XL) with 2TB of storage and a High Storage Eight Extra Large (8XL) with 16TB of storage. You can start with a single-XL-node, 2TB data warehouse for $0.85 per hour, and prices scale linearly as you add nodes. If you use reserved instance pricing, the cost drops to under $1,000 per TB per year.
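As a rough sanity check on those numbers, the effective on-demand cost per terabyte per year for a single XL node can be computed directly from the figures above. This is an illustrative sketch (the function name is ours); the under-$1,000 reserved figure is Amazon's quoted price, not derived here.

```python
HOURS_PER_YEAR = 24 * 365   # 8,760 hours, ignoring leap years
XL_HOURLY_RATE = 0.85       # USD per hour for one High Storage XL node
XL_STORAGE_TB = 2           # storage per XL node, in terabytes

def on_demand_cost_per_tb_year(hourly_rate=XL_HOURLY_RATE,
                               storage_tb=XL_STORAGE_TB):
    """Effective on-demand cost per terabyte per year for one node."""
    return hourly_rate * HOURS_PER_YEAR / storage_tb

print(on_demand_cost_per_tb_year())  # roughly $3,723 per TB per year
```

So on-demand usage works out to about $3,723 per TB per year; the sub-$1,000 figure Amazon quotes assumes reserved instance pricing.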
This promotional video is both watchable and informative; the presenter talks very quickly, so it’s short!