What is HBase? – Hadoop HBase Introduction

What is HBase used for?
HBase is Hadoop's distributed, scalable NoSQL database for big data. It runs on HDFS, the distributed file system and primary storage layer for Hadoop.

The HBase physical architecture consists of servers in a master-slave relationship as shown below. Typically, an HBase cluster has one Master node, called HMaster, and multiple Region Servers, called HRegionServers. Each Region Server hosts multiple Regions (HRegions). As in a relational database, data in HBase is stored in Tables, and these Tables are in turn stored in Regions.

HBase Architecture Explanation
When a Table becomes too big, the Table is partitioned into multiple Regions. These Regions are assigned to Region Servers across the cluster. Each Region Server hosts roughly the same number of Regions.
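
The balanced assignment described above can be illustrated with a small simulation. This is only a sketch: the round-robin rule, the function name assign_regions, and the region and server names are assumptions made for illustration, and HBase's own balancer takes more factors into account.

```python
def assign_regions(regions, servers):
    """Spread Regions over Region Servers round-robin.

    Hypothetical helper for illustration only; it is not HBase's balancer.
    """
    assignment = {server: [] for server in servers}
    for i, region in enumerate(regions):
        # Region i goes to server i modulo the number of servers.
        assignment[servers[i % len(servers)]].append(region)
    return assignment


# Seven Regions of one Table, three Region Servers (made-up names).
regions = [f"region-{i}" for i in range(7)]
servers = ["rs1", "rs2", "rs3"]
assignment = assign_regions(regions, servers)

for server, hosted in assignment.items():
    print(server, hosted)
```

With seven Regions and three servers, no server ends up hosting more than one Region above any other, which is the "roughly the same number" property.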

HBase depends on HDFS for data storage. Region Servers are co-located with the HDFS Data Nodes, which enables data locality for the data served by the Region Servers, at least in the common case.

Region assignment, DDL operations, and other bookkeeping tasks are handled by the HBase Master process. The Master uses ZooKeeper to maintain live cluster state. When accessing data, clients communicate with HBase Region Servers directly. That way, ZooKeeper and the Master process don't bottleneck data throughput. No persistent state lives in ZooKeeper or the Master. HBase is designed to recover from complete failure entirely from data persisted durably to HDFS.
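
To make the direct client-to-Region-Server path concrete, here is a minimal sketch of how a client could pick the Region Server for a row key once it knows the Region boundaries. The start keys, server names, and the locate function are all hypothetical, and the sketch simply assumes the client has already obtained the boundaries; how it obtains them is not modeled.

```python
import bisect

# Hypothetical Region map. Each Region covers the row keys from its start
# key up to (but not including) the next Region's start key. The empty
# string marks the first Region, which starts at the lowest possible key.
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]  # server hosting each Region


def locate(row_key):
    """Return the Region Server whose Region contains row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the Region just before that position owns the key.
    idx = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_SERVERS[idx]


print(locate("apple"))
print(locate("monkey"))
print(locate("zebra"))
```

In this sketch the lookup happens entirely on the client side, so reads and writes go straight to the chosen Region Server without passing through the Master.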

How Rows Scale in HBase: Region Splitting
Contiguous ranges of rows are grouped into “Regions”. These Regions are then assigned to the worker machines in the cluster, conveniently called “Region Servers”. Assignment and distribution of Regions to Region Servers is automatic and largely hands-off for the operator. When data is inserted into a Region and the Region’s size reaches a threshold, the Region is split into two child Regions. The split happens along a row key boundary: rows are never divided, and a single Region always hosts an entire row. This is another important guarantee HBase provides for its users.
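
The split behavior can be sketched with a toy model. Everything here is an assumption for illustration: the Table class, the SPLIT_THRESHOLD value, the size units, and the choice of the middle row key as the split point. It is not how HBase picks split points internally; it only demonstrates that a split falls on a row key boundary and that every row stays whole inside one Region.

```python
import bisect

SPLIT_THRESHOLD = 100  # hypothetical Region size limit, in arbitrary units


class Table:
    """Toy table: a sorted list of Regions, each holding whole rows."""

    def __init__(self):
        self.start_keys = [""]  # sorted start key of each Region
        self.regions = [{}]     # parallel list: row key -> row size

    def put(self, row_key, size):
        # Find the Region whose key range contains row_key.
        idx = bisect.bisect_right(self.start_keys, row_key) - 1
        region = self.regions[idx]
        region[row_key] = size
        # Split once the Region outgrows the threshold.
        if sum(region.values()) > SPLIT_THRESHOLD and len(region) > 1:
            self._split(idx)

    def _split(self, idx):
        region = self.regions[idx]
        keys = sorted(region)
        split_key = keys[len(keys) // 2]  # boundary is an existing row key
        lower = {k: region[k] for k in keys if k < split_key}
        upper = {k: region[k] for k in keys if k >= split_key}
        # Replace the parent Region with its two children.
        self.regions[idx:idx + 1] = [lower, upper]
        self.start_keys.insert(idx + 1, split_key)


table = Table()
for i in range(30):
    table.put(f"row-{i:03d}", 10)

for start, region in zip(table.start_keys, table.regions):
    print(repr(start), len(region), "rows")
```

Because the split point is always a row key, each row lands entirely in one child Region, and the sorted start keys keep the Regions covering contiguous, non-overlapping key ranges.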

Here is the HBase introduction video, which covers NoSQL database concepts and takes a deep dive into HBase, its architecture, and how HBase scales to store data.


