Apache Drill Reaches 0.6
Written by Kay Ewbank   
Wednesday, 19 November 2014

The developers of Apache Drill, the open source software that you can use to write SQL queries on data stored in Hadoop, have released version 0.6.

 

 

The update follows the September release of version 0.5, Drill’s first beta. The Drill community is making rapid progress with monthly releases, and this latest one adds the ability to run SQL queries directly on MongoDB. Until now, the supported data sources were the file system, HBase and Hive.

There is, of course, a growing number of SQL-on-Hadoop products being developed by companies such as Cloudera and Hortonworks, but Drill lets you analyze Hadoop data without any ETL and without creating schema definitions before you begin. Instead, Drill generates the schemas for you on the fly and keeps files in their original formats, so you don't have to convert them to meet analysis requirements.
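To give a feel for what generating a schema on the fly involves, here is a minimal Python sketch that discovers the fields and types of nested JSON records while reading them. It is an illustration of the general idea only, not Drill's implementation; the function names (`describe`, `merge`, `infer_schema`) and the sample records are hypothetical.

```python
import json


def describe(value):
    """Describe the shape of one parsed JSON value (hypothetical helper)."""
    if isinstance(value, dict):
        return {key: describe(child) for key, child in value.items()}
    if isinstance(value, list):
        # Simplification: describe an array by its first element only.
        return [describe(value[0])] if value else []
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if value is None:
        return "null"
    return "string"


def merge(seen, new):
    """Combine two descriptions, keeping every field found in either one."""
    if isinstance(seen, dict) and isinstance(new, dict):
        merged = dict(seen)
        for key, child in new.items():
            merged[key] = merge(seen[key], child) if key in seen else child
        return merged
    # Simplification: on a type conflict, keep the first type seen.
    return seen


def infer_schema(lines):
    """Build a schema from newline-delimited JSON with no upfront definition."""
    schema = {}
    for line in lines:
        schema = merge(schema, describe(json.loads(line)))
    return schema


# Made-up sample records with different fields and a nested map and array.
RECORDS = [
    '{"name": "Cafe A", "stars": 4.5, "hours": {"open": "08:00"}}',
    '{"name": "Bar B", "stars": 3.0, "categories": ["bar", "music"]}',
]

schema = infer_schema(RECORDS)
print(schema)
```

The point of the sketch is that no table definition is written in advance: fields that appear in only some records, such as `hours` and `categories`, simply turn up in the schema as the data is read.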

When the idea for Drill was first announced in 2012, one of its advantages was that it was designed from the start to support a nested data model, with data encoded in formats such as JSON, Avro or Protocol Buffers. The types of data range from simple types such as strings, integers and dates to more complex multi-structured data such as nested maps and arrays.

This all means you can start working with queries of the data very rapidly. MapR, the Hadoop vendor that is closely involved with the Apache Drill project, has integrated the beta version of Drill into its big data platform, choosing Drill because it offers more features than the rival products. There’s an interesting writeup of using Drill on the MapR blog, in which Neeraja Rentachintala discusses How to Turn Raw Data from Yelp into Insights in Minutes with Apache Drill.

The new release uses the Hadoop 2.4.1 APIs, upgrades Parquet to use direct memory, and adds the ability to write larger Parquet files when using CREATE TABLE AS. Another improvement is better JOIN planning for HBase tables, based on row count approximations derived from region-level statistics. JSON handling has also been improved, with support for JSON projection pushdown, an all-text JSON mode and boolean short circuit. The handling of SELECT * when interacting with schema-less data sources is another area that has been improved.
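Two of the JSON features are easier to picture with a small example. The Python sketch below is a conceptual illustration, not Drill code: `read_records` and `to_text` are hypothetical names, and the sample data is made up. It shows a projection applied inside the reader, so that only the requested columns are returned, and an all-text mode in which every scalar is read as a string, so a field that is a number in one record and a string in another no longer causes a type conflict.

```python
import json


def to_text(value):
    """Convert scalars to their JSON text form; recurse into maps and arrays."""
    if value is None or isinstance(value, str):
        return value
    if isinstance(value, dict):
        return {key: to_text(child) for key, child in value.items()}
    if isinstance(value, list):
        return [to_text(child) for child in value]
    return json.dumps(value)  # 94105 -> "94105", True -> "true"


def read_records(lines, columns=None, all_text=False):
    """Yield records from newline-delimited JSON.

    columns:  if given, only these fields are returned (the projection is
              handled by the reader itself rather than by a later stage).
              A real reader could also skip parsing the unwanted fields;
              this sketch parses the whole line for simplicity.
    all_text: if True, every scalar value is returned as a string.
    """
    for line in lines:
        record = json.loads(line)
        if columns is not None:
            record = {name: record.get(name) for name in columns}
        if all_text:
            record = to_text(record)
        yield record


# "zip" is a number in the first record and a string in the second.
LINES = [
    '{"id": 1, "zip": 94105, "open": true}',
    '{"id": 2, "zip": "94105-1234", "open": false}',
]

rows = list(read_records(LINES, columns=["id", "zip"], all_text=True))
print(rows)
```

With `all_text=True` both `zip` values come back as strings, and the `open` field is never returned because it was not in the projected column list.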

The latest beta is available for download on the Apache incubator webpage. 


More Information

Drill Overview

Download Apache Drill

How to Turn Raw Data from Yelp into Insights in Minutes with Apache Drill

Drill on GitHub

Related Articles

Perform Data Queries Faster With Drill

 

 


 

Last Updated ( Wednesday, 19 November 2014 )