Cloudera Extends Apache HBase To Use Amazon S3
Written by Kay Ewbank   
Friday, 04 October 2019

Cloudera has updated Cloudera Data Platform so that Apache HBase deployments can use Amazon Simple Storage Service (S3) as their main persistence layer for table data.

The advantage is that Amazon S3 is billed on a pay-per-use basis and has no server-side component for you to run or manage. Cloudera Data Platform (CDP) is described as combining the best of Hortonworks' and Cloudera's technologies to create an enterprise data cloud that includes cloud-native services for data warehousing, machine learning, streaming ingest, and operational data stores.

Apache HBase is Hadoop's open-source, distributed, versioned, non-relational database, modeled after Google's BigTable, which offers random, realtime read/write access to big data. Apache's goal for this project is for it to host very large tables -- billions of rows X millions of columns -- on top of clusters of commodity hardware.

Amazon Simple Storage Service (S3) is designed to offer secure, durable, highly scalable object storage at a low cost.

Until now, it hasn't been possible to use S3 directly from HBase because HBase requires a consistent and atomic file system, whereas S3 provides an eventually consistent object store. This means that HBase has been limited to using HDFS rather than being able to use S3 natively. Cloudera has now created a solution that is being offered via CDP. When you launch an Operational Database (HBase) cluster on CDP, HBase StoreFiles (the backing files for HBase tables) are stored in S3, while HBase write-ahead logs (WALs) are stored, as usual, in an HDFS instance run alongside HBase.

Under the covers, this relies on the Hadoop S3A filesystem adapter, which accesses data in S3 via the standard FileSystem APIs. Hadoop's S3Guard is also used to supply directory listings and object status to the S3A adapter, so that HBase sees new StoreFiles as soon as they are added to an HBase table.
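
To make the listing problem concrete, here is a toy Python model of the idea, not S3Guard's actual implementation. The class names (LaggingStore, GuardedStore) and the settle() method are invented for illustration. It assumes a store where a newly written object can be read at once but does not show up in listings until later, and puts a consistent metadata record in front of it.

```python
class LaggingStore:
    """Toy object store whose listings lag behind writes (hypothetical)."""

    def __init__(self):
        self._objects = {}    # key -> bytes, readable immediately
        self._listed = set()  # keys that listings currently reveal

    def put(self, key, data):
        # The object exists right away, but listings are not updated yet.
        self._objects[key] = data

    def settle(self):
        # Simulates the listing eventually catching up with the writes.
        self._listed = set(self._objects)

    def list(self, prefix):
        return sorted(k for k in self._listed if k.startswith(prefix))


class GuardedStore:
    """Wraps the store and keeps a consistent record of every key written."""

    def __init__(self, store):
        self._store = store
        self._metadata = set()

    def put(self, key, data):
        self._store.put(key, data)
        self._metadata.add(key)  # recorded at write time

    def list(self, prefix):
        # Merge what the metadata knows with what the store admits to.
        known = {k for k in self._metadata if k.startswith(prefix)}
        return sorted(known | set(self._store.list(prefix)))


raw = LaggingStore()
guarded = GuardedStore(raw)
guarded.put("table/region/cf/storefile-1", b"cells")

stale_listing = raw.list("table/")        # the raw listing misses the new file
guarded_listing = guarded.list("table/")  # the guarded listing includes it
```

The point of the sketch is only that listings are answered from a record updated at write time, which is why a new StoreFile becomes visible without waiting for the object store to catch up.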

The new element is HBase Object Store Semantics (HBOSS), a new software project that has been added to the Apache HBase project to bridge the gap between S3Guard and HBase. HBOSS is a facade on top of the S3A adapter and S3Guard that uses a distributed lock to ensure that HBase can manipulate its files on S3 atomically.
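
The following Python sketch illustrates why a lock helps; it is a simplified model under stated assumptions, not HBOSS code. An object store has no native rename, so moving a directory means copying and deleting objects one by one, and a reader can catch the move half done. The ObjectStore and LockingFacade classes and the on_step hook are hypothetical, and threading.Lock stands in for a distributed lock, which in a real cluster would have to work across processes and machines.

```python
import threading


class ObjectStore:
    """Toy store in which a 'rename' is a per-object copy and delete."""

    def __init__(self, objects):
        self.objects = dict(objects)

    def rename_prefix(self, src, dst, on_step=None):
        for key in sorted(k for k in self.objects if k.startswith(src)):
            self.objects[dst + key[len(src):]] = self.objects.pop(key)
            if on_step:
                on_step()  # hook so the demo can observe intermediate state

    def list(self, prefix):
        return sorted(k for k in self.objects if k.startswith(prefix))


class LockingFacade:
    """Serializes renames and listings behind a single lock."""

    def __init__(self, store):
        self._store = store
        self._lock = threading.Lock()  # stand-in for a distributed lock

    def rename_prefix(self, src, dst, on_step=None):
        with self._lock:
            self._store.rename_prefix(src, dst, on_step)

    def list(self, prefix):
        with self._lock:
            return self._store.list(prefix)


FILES = {"t/.tmp/f1": b"a", "t/.tmp/f2": b"b"}

# Without a lock: a listing taken mid-rename sees a half-moved directory.
raw = ObjectStore(FILES)
raw_views = []
raw.rename_prefix("t/.tmp/", "t/cf/",
                  on_step=lambda: raw_views.append(raw.list("t/")))

# With the facade: readers started mid-rename block until it has finished.
facade = LockingFacade(ObjectStore(FILES))
locked_views = []
readers = []


def spawn_reader():
    t = threading.Thread(target=lambda: locked_views.append(facade.list("t/")))
    t.start()
    readers.append(t)


facade.rename_prefix("t/.tmp/", "t/cf/", on_step=spawn_reader)
for t in readers:
    t.join()
```

In the unlocked run, the first entry of raw_views contains one file under each directory, which is the partial state HBase cannot tolerate. In the locked run, every entry of locked_views shows both files already in their final location.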

More Information

Trial Installation Of HBase Running On S3 In CDP

Cloudera Data Platform

Related Articles

HBase 1.4 With New Shaded Client

Exploring Storage Options on AWS

AWS Storage Gateway 

Amazon Glacier For Cold Storage

Amazon Updates Data Offerings

HBase Adds MultiWAL Support 

Apache Spark 2.0 Released

First Hybrid Open-Source RDBMS Powered By Hadoop and Spark

HBase 1.0 Released   


Last Updated ( Friday, 04 October 2019 )