Overview of HBase Architecture and its Components

Understand working of Apache HBase Architecture and different components involved in the high level functioning of the column oriented NoSQL database.

Get access to all Big Data Projects View all Big Data Projects

Overview of HBase Architecture and its Components

Last Updated: 11 Apr 2024 | BY ProjectPro

Pinterest runs 38 different HBase clusters with some of them doing up to 5 million operations every second. Goibibo uses HBase for customer profiling. Facebook Messenger uses HBase architecture and many other companies like Flurry, Adobe Explorys use HBase in production. In spite of a few rough edges, HBase has become a shining sensation within the white hot Hadoop market. The NOSQL column oriented database has experienced incredible popularity in the last few years. You might have come across several resources that explain HBase architecture and guide you through HBase installation process. However, this blog post focuses on the need for HBase, which data structure is used in HBase, data model and the high level functioning of the components in the apache HBase architecture.

Streaming Data Pipeline using Spark, HBase and Phoenix

Downloadable solution code | Explanatory videos | Tech Support

Start Project

Table of Contents

Need for HBase
HBase –Understanding the Basics
HBase Architecture Explained
Components of Apache HBase Architecture

HMaster
Region Server
Zookeeper

Need for HBase

Apache Hadoop has gained popularity in the big data space for storing, managing and processing big data as it can handle high volume of multi-structured data. However, Hadoop cannot handle high velocity of random writes and reads and also cannot change a file without completely rewriting it. HBase is a NoSQL, column oriented database built on top of hadoop to overcome the drawbacks of HDFS as it allows fast random writes and reads in an optimized way. Also, with exponentially growing data, relational databases cannot handle the variety of data to render better performance. HBase provides scalability and partitioning for efficient storage and retrieval.

“Anybody who wants to keep data within an HDFS environment and wants to do anything other than brute-force reading of the entire file system [with MapReduce] needs to look at HBase. If you need random access, you have to have HBase."- said Gartner analyst Merv Adrian.

HBase –Understanding the Basics

HBase is a data model similar to Google’s big table that is designed to provide random access to high volume of structured or unstructured data. HBase is an important component of the Hadoop ecosystem that leverages the fault tolerance feature of HDFS. HBase provides real-time read or write access to data in HDFS. HBase can be referred to as a data store instead of a database as it misses out on some important features of traditional RDBMs like typed columns, triggers, advanced query languages and secondary indexes.

Get FREE Access to Data Analytics Example Codes for Data Cleaning, Data Munging, and Data Visualization

HBase Data Model

HBase data model stores semi-structured data having different data types, varying column size and field size. The layout of HBase data model eases data partitioning and distribution across the cluster. HBase data model consists of several logical components- row key, column family, table name, timestamp, etc. Row Key is used to uniquely identify the rows in HBase tables. Column families in HBase are static whereas the columns, by themselves, are dynamic.

HBase Tables – Logical collection of rows stored in individual partitions known as Regions.
HBase Row – Instance of data in a table.
RowKey -Every entry in an HBase table is identified and indexed by a RowKey.
Columns - For every RowKey an unlimited number of attributes can be stored.
Column Family – Data in rows is grouped together as column families and all columns are stored together in a low level storage file known as HFile.

Here's what valued users are saying about ProjectPro

ProjectPro is a unique platform and helps many people in the industry to solve real-life problems with a step-by-step walkthrough of projects. A platform with some fantastic resources to gain hands-on experience and prepare for job interviews. I would highly recommend this platform to anyone...

Anand Kumpatla

Sr Data Scientist @ Doubleslash Software Solutions Pvt Ltd

As a student looking to break into the field of data engineering and data science, one can get really confused as to which path to take. Very few ways to do it are Google, YouTube, etc. I was one of them too, and that's when I came across ProjectPro while watching one of the SQL videos on the...

Savvy Sahai

Data Science Intern, Capgemini

Not sure what you are looking for?

View All Projects

HBase Use Cases- When to use HBase

HBase is an ideal platform with ACID compliance properties making it a perfect choice for high-scale, real-time applications. It does not require a fixed schema, so developers have the provision to add new data as and when required without having to conform to a predefined model.

It provides users with database like access to Hadoop-scale storage, so developers can perform read or write on subset of data efficiently, without having to scan through the complete dataset. HBase is the best choice as a NoSQL database, when your application already has a hadoop cluster running with huge amount of data. HBase helps perform fast read/writes.

New Projects

HBase Architecture Explained

HBase provides low-latency random reads and writes on top of HDFS. In HBase, tables are dynamically distributed by the system whenever they become too large to handle (Auto Sharding). The simplest and foundational unit of horizontal scalability in HBase is a Region. A continuous, sorted set of rows that are stored together is referred to as a region (subset of table data). HBase architecture has a single HBase master node (HMaster) and several slaves i.e. region servers. Each region server (slave) serves a set of regions, and a region can be served only by a single region server. Whenever a client sends a write request, HMaster receives the request and forwards it to the corresponding region server.

HBase Architecture, Data Flow, and Use cases

Image Credit : Cloudera

HBase can be run in a multiple master setup, wherein there is only single active master at a time. HBase tables are partitioned into multiple regions with every region storing multiple table’s rows.

Get More Practice, More Big Data and Analytics Projects, and More guidance.Fast-Track Your Career Transition with ProjectPro

Components of Apache HBase Architecture

HBase architecture has 3 important components- HMaster, Region Server and ZooKeeper.

HMaster

HBase HMaster is a lightweight process that assigns regions to region servers in the Hadoop cluster for load balancing. Responsibilities of HMaster –

Manages and Monitors the Hadoop Cluster
Performs Administration (Interface for creating, updating and deleting tables.)
Controlling the failover
DDL operations are handled by the HMaster
Whenever a client wants to change the schema and change any of the metadata operations, HMaster is responsible for all these operations.

Get confident to build end-to-end projects

Access to a curated library of 250+ end-to-end industry projects with solution code, videos and tech support.

Request a demo

Region Server

These are the worker nodes which handle read, write, update, and delete requests from clients. Region Server process, runs on every node in the hadoop cluster. Region Server runs on HDFS DataNode and consists of the following components –

Block Cache – This is the read cache. Most frequently read data is stored in the read cache and whenever the block cache is full, recently used data is evicted.
MemStore- This is the write cache and stores new data that is not yet written to the disk. Every column family in a region has a MemStore.
Write Ahead Log (WAL) is a file that stores new data that is not persisted to permanent storage.
HFile is the actual storage file that stores the rows as sorted key values on a disk.

Zookeeper

HBase uses ZooKeeper as a distributed coordination service for region assignments and to recover any region server crashes by loading them onto other region servers that are functioning. ZooKeeper is a centralized monitoring server that maintains configuration information and provides distributed synchronization. Whenever a client wants to communicate with regions, they have to approach Zookeeper first. HMaster and Region servers are registered with ZooKeeper service, client needs to access ZooKeeper quorum in order to connect with region servers and HMaster. In case of node failure within an HBase cluster, ZKquoram will trigger error messages and start repairing failed nodes.

Build an Awesome Job Winning Project Portfolio with Solved End-to-End Big Data Projects

ZooKeeper service keeps track of all the region servers that are there in an HBase cluster- tracking information about how many region servers are there and which region servers are holding which DataNode. HMaster contacts ZooKeeper to get the details of region servers. Various services that Zookeeper provides include –

Establishing client communication with region servers.
Tracking server failure and network partitions.
Maintain Configuration Information
Provides ephemeral nodes, which represent different region servers.

Understanding the fundamental of HBase architecture is easy but running HBase on top of HDFS in production is challenging when it comes to monitoring compactions, row key designs manual splitting, etc. If you would like to learn how to design a proper schema, derive query patterns and achieve high throughput with low latency then enrol now for comprehensive hands-on Hadoop Training.

ProjectPro

ProjectPro is the only online platform designed to help professionals gain practical, hands-on experience in big data, data engineering, data science, and machine learning related technologies. Having over 270+ reusable project templates in data science and big data with step-by-step walkthroughs,

Meet The Author

Overview of HBase Architecture and its Components

Need for HBase

HBase –Understanding the Basics

HBase Data Model

Here's what valued users are saying about ProjectPro

HBase Use Cases- When to use HBase

HBase Architecture Explained

Components of Apache HBase Architecture

HMaster

Region Server

Zookeeper

About the Author