The Apache Software Foundation has officially announced the release of Hadoop 1.0. The release adds some features in the areas of security and support for the Hadoop HBase database, but the most important aspect of the release is that Hadoop is now mature enough to warrant the 1.0 marker.
Hadoop may have been making the headlines for months, if not years, in the field of big data, but it has only now reached the status of a 1.0 release.
The Apache Software Foundation has now officially announced the release of Hadoop 1.0. The release adds some features in the areas of security and support for the Hadoop HBase database, but the most important aspect of the release is that the Apache Software Foundation considers Hadoop mature enough to warrant the 1.0 marker.
Despite its earlier 0.x status, Hadoop, which started life at Yahoo! and has been under development for the last six years, is already in use at high-profile sites including Yahoo!, Facebook and LinkedIn.
The security improvement comes in the form of support for Kerberos strong authentication, so that users and services have to prove their identity before they can get at your data. Another significant feature in the 1.0 release is WebHDFS, an HTTP REST interface to the Hadoop Distributed File System (HDFS). This lets you use plain HTTP rather than having to go via a Java or C client when you want to interact with HDFS.
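To give a feel for what talking to HDFS over HTTP looks like, here is a minimal Python sketch. It assumes the WebHDFS URL layout `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>` and the `LISTSTATUS` operation. The host name, the port (50070, assumed here as the NameNode web port) and the path are placeholders for illustration, and the helper names `webhdfs_url` and `list_directory` are invented for this example. It also assumes a cluster with security switched off; a Kerberos-secured cluster would need authenticated requests.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS URL: http://host:port/webhdfs/v1/<path>?op=<OP>."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def list_directory(host, port, path):
    """Return the entry names in an HDFS directory (needs a reachable cluster)."""
    with urlopen(webhdfs_url(host, port, path, "LISTSTATUS")) as resp:
        listing = json.load(resp)
    # The JSON reply nests the entries under FileStatuses -> FileStatus.
    return [entry["pathSuffix"] for entry in listing["FileStatuses"]["FileStatus"]]


if __name__ == "__main__":
    # Placeholder host and path; only the URL is built, no request is sent.
    print(webhdfs_url("namenode.example.com", 50070, "/user/alice/logs", "LISTSTATUS"))
```

Because the interface is just HTTP, the same URL could equally be fetched with a browser or a command line tool, which is the point of the feature.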
The inclusion of support for HBase is interesting because it shows how Hadoop is moving towards the kind of real-time processing that web apps require. Hadoop was designed to replicate Google MapReduce, the software that was used to build Google’s web index, and because of this it is great for data analysis where the end results of the analysis are used by other apps - to create a web index that can then be used by a search engine, say. It wasn’t designed to provide instant responses to queries. HBase, by contrast, is a distributed database that works with HDFS and is more suited to real-time applications.
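The batch style of MapReduce is easier to see in a small example. The following is a toy, single-process Python sketch of the map, shuffle and reduce steps used to build an inverted index of the kind a search engine might consult. It does not use any Hadoop API; the function names and the sample documents are made up for illustration.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit a (word, doc_id) pair for every word in a document."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    """Reduce: collapse the grouped values into one sorted posting list."""
    return word, sorted(set(doc_ids))


def build_index(documents):
    """Run map, shuffle and reduce over a dict of {doc_id: text}."""
    pairs = (
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


if __name__ == "__main__":
    docs = {"a": "hadoop stores data", "b": "hbase stores tables"}
    print(build_index(docs)["stores"])
```

The whole data set has to pass through all three steps before the index exists, which is why this approach suits offline analysis rather than answering individual queries on demand.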
The Apache Software Foundation suggests you use HBase when you need random, realtime read/write access to your Big Data, saying that the goal of HBase is to host very large tables -- billions of rows by millions of columns -- atop clusters of commodity hardware. The column-oriented store is modeled after Google's Bigtable.
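As a rough illustration of the Bigtable-style data model, the sketch below keeps a sparse, in-memory table in Python in which each cell is addressed by a row key and a column name written as `family:qualifier`, and holds timestamped versions of its value. It is a toy under those assumptions, not the HBase client API; the class name `ToyTable` and the sample keys are invented for this example.

```python
from collections import defaultdict


class ToyTable:
    """Sparse table: row key -> "family:qualifier" column -> {timestamp: value}."""

    def __init__(self):
        # Only cells that are actually written take up space, so a table can
        # have a huge number of possible columns without storing empty ones.
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp):
        """Write one version of a cell."""
        self._rows[row][column][timestamp] = value

    def get(self, row, column):
        """Return the most recent value of a cell, or None if never written."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]


if __name__ == "__main__":
    table = ToyTable()
    table.put("com.example/index", "page:title", "Old title", timestamp=1)
    table.put("com.example/index", "page:title", "New title", timestamp=2)
    print(table.get("com.example/index", "page:title"))
```

Reads and writes here touch a single cell directly by key, which is the random-access pattern that distinguishes this kind of store from a batch job over the whole data set.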
Despite the increased support for HBase in Hadoop 1.0, the companies behind Hadoop are hedging their bets as to whether it will be a winner or not. For example, while Yahoo! uses HBase for some of its services, it is also working on alternatives including MapReduce Online and S4, both of which provide an online window onto data sets that have first been mapped and then reduced so that the data set is small enough to be workable.