How to balance leadership in Kafka

This recipe helps you balance leadership in Kafka

Recipe Objective: How to balance leadership in Kafka?

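In Apache Kafka, every topic partition has one replica that acts as its leader, and by default clients produce to and consume from that leader while the other replicas follow it. When brokers restart or fail, leadership moves to other replicas, so over time a few brokers can end up leading a disproportionate share of the partitions. This is different from consumer group rebalancing, in which partition ownership is reassigned among consumers when a consumer joins a group, shuts down, is considered dead by the group coordinator, or when new partitions are added. In this recipe, we see how to balance partition leadership across the brokers in Kafka.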

Prerequisites:

Before proceeding with the recipe, make sure a Kafka cluster and Zookeeper are set up on your EC2 instance. If they are not, complete the installation before continuing.

Steps to verify the installation:

To verify the Zookeeper installation, follow the step listed below.

  • Go to the Kafka directory using the cd kafka_2.12-2.3.0/ command, and then start the Zookeeper server using the bin/zookeeper-server-start.sh config/zookeeper.properties command. The server should start and keep running in this terminal.

Verifying Kafka installation:

Before going through this step, please ensure that the Zookeeper server is running. To verify the Kafka installation, follow the steps listed below:

  • Leave the previous terminal window as it is and log in to your EC2 instance using another terminal.
  • Go to the Kafka directory using the cd downloads/kafka_2.12-2.3.0 command.
  • Start the Kafka server using the bin/kafka-server-start.sh config/server.properties command.
  • The output should include a message similar to “INFO [KafkaServer id=0] started (kafka.server.KafkaServer).”
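
If the broker output is captured in a file, the check for the startup message can be scripted. The following is a minimal sketch, not part of the original recipe: the function name broker_started is hypothetical, and the demo uses a scratch file holding only the message quoted above. On a real installation the broker log is typically logs/server.log under the Kafka directory, which is an assumption about the default logging setup.

```shell
#!/bin/sh
# broker_started FILE: succeed if FILE contains the KafkaServer startup message.
# (Hypothetical helper name, used here for illustration only.)
broker_started() {
    # -F treats the pattern as a fixed string, so "(" and "." are literal.
    grep -F -q 'started (kafka.server.KafkaServer)' "$1"
}

# Demo on a scratch file that contains only the message quoted in the steps above.
log=$(mktemp)
printf '%s\n' 'INFO [KafkaServer id=0] started (kafka.server.KafkaServer)' > "$log"

if broker_started "$log"; then
    echo "broker startup message found"
else
    echo "broker startup message not found"
fi
rm -f "$log"
```

To use it against a running broker, pass the path of the actual log file instead of the scratch file.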

Balancing leadership in Kafka:

Kafka consumers can subscribe to multiple topics and start receiving messages, and a consumer group rebalance is a short window during which the entire group cannot consume messages. Leadership balancing is a separate, broker-side concern. A broker can take over as the preferred leader of a partition only if it meets two criteria. One: it is an in-sync replica, and two: it is the first element in the partition's replica list. To see the current leaders, run the following command, where rhost:2181 is the host and port of the Zookeeper server:

bin/kafka-topics.sh --describe --zookeeper rhost:2181
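
To make the two criteria concrete, the sketch below reads describe-style output and lists each partition whose current leader is not the first broker in its Replicas list, along with whether that preferred broker is currently in the Isr list. This is an illustration under assumptions: the function name non_preferred_leaders is hypothetical, the sample input is made up, and the field layout (Topic:, Partition:, Leader:, Replicas:, Isr:) is assumed to match what your Kafka version prints.

```shell
#!/bin/sh
# non_preferred_leaders: read describe-style text on stdin and print every
# partition whose leader differs from the first broker id in Replicas.
# (Hypothetical helper name; the input layout is an assumption.)
non_preferred_leaders() {
    awk '
        /Partition:/ && /Leader:/ && /Replicas:/ {
            topic = part = leader = replicas = isr = ""
            for (i = 1; i < NF; i++) {
                if ($i == "Topic:")     topic    = $(i + 1)
                if ($i == "Partition:") part     = $(i + 1)
                if ($i == "Leader:")    leader   = $(i + 1)
                if ($i == "Replicas:")  replicas = $(i + 1)
                if ($i == "Isr:")       isr      = $(i + 1)
            }
            split(replicas, r, ",")        # r[1] is the preferred leader
            n = split(isr, s, ",")
            insync = "no"
            for (j = 1; j <= n; j++)
                if (s[j] == r[1]) insync = "yes"
            if (leader != r[1])
                printf "%s-%s leader=%s preferred=%s preferred_in_isr=%s\n",
                       topic, part, leader, r[1], insync
        }
    '
}

# Made-up sample: partition 1 is led by broker 1, but broker 2 is preferred.
printf 'Topic:demo\tPartitionCount:2\tReplicationFactor:2\tConfigs:\n\tTopic: demo\tPartition: 0\tLeader: 1\tReplicas: 1,2\tIsr: 1,2\n\tTopic: demo\tPartition: 1\tLeader: 1\tReplicas: 2,1\tIsr: 1,2\n' |
    non_preferred_leaders
# prints: demo-1 leader=1 preferred=2 preferred_in_isr=yes
```

On a live cluster, the output of the describe command can be piped into the function in place of the sample text.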

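The describe command lists, for every partition, the current leader, the replica list, and the in-sync replicas (Isr). Kafka brokers have a property, set in the server.properties file, that enables automatic leader rebalancing. Set auto.leader.rebalance.enable=true on every broker and restart Kafka. With this setting, a background thread periodically checks the distribution of partition leaders and moves leadership back to the preferred replicas when the imbalance grows too large. This is how we balance leadership in Kafka.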
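
Editing the property by hand works fine. For repeatable setups, the change can also be scripted. The sketch below is one possible approach, not the recipe's own method: set_property is a hypothetical helper that replaces any existing (or commented-out) definition of a key in a properties file, and the demo runs on a scratch file rather than on the real config/server.properties.

```shell
#!/bin/sh
# set_property FILE KEY VALUE: ensure FILE contains exactly one KEY=VALUE line.
# (Hypothetical helper name, used here for illustration only.)
set_property() {
    sp_file=$1
    sp_key=$2
    sp_value=$3
    # Escape dots so the key is matched literally by grep -E.
    sp_key_re=$(printf '%s' "$sp_key" | sed 's/\./\\./g')
    sp_tmp=$(mktemp) || return 1
    # Keep every line except existing or commented-out definitions of the key.
    grep -v -E "^[[:space:]]*#?[[:space:]]*${sp_key_re}[[:space:]]*=" "$sp_file" > "$sp_tmp"
    # Append the desired setting.
    printf '%s=%s\n' "$sp_key" "$sp_value" >> "$sp_tmp"
    # Write back in place so the file keeps its original permissions.
    cat "$sp_tmp" > "$sp_file"
    rm -f "$sp_tmp"
}

# Demo on a scratch copy with made-up contents.
demo=$(mktemp)
printf 'broker.id=0\nauto.leader.rebalance.enable=false\n' > "$demo"
set_property "$demo" auto.leader.rebalance.enable true
cat "$demo"
rm -f "$demo"
```

After applying the change to the real server.properties on each broker, the brokers still need to be restarted for the setting to take effect.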
