How to implement GROUPING SETS clause in Hive

This recipe helps you implement the GROUPING SETS clause in Hive.
Last Updated: 19 Aug 2022


Recipe Objective: How to implement the GROUPING SETS clause in Hive?

Hive supports the GROUPING SETS clause in GROUP BY, which lets a single query aggregate over more than one grouping and return all the results in the same record set. In this recipe, we look at how to use the GROUPING SETS clause in Hive.

Table of Contents
- Prerequisites
- Steps to set up an environment
- Implementing GROUPING SETS clause in Hive

Prerequisites:

Before proceeding with the recipe, make sure single-node Hadoop and Hive are installed on your EC2 instance. If they are not already installed, install them before continuing.

Steps to set up an environment:

In AWS, create an EC2 instance and log in to Cloudera Manager using the public IP of the EC2 instance. Log in through PuTTY or a terminal and check whether HDFS and Hive are installed. If they are not, install them as noted in the prerequisites.
Type "<your public IP>:7180" in the web browser and log in to Cloudera Manager, where you can check whether Hadoop is installed.
If the services are not visible in the Cloudera cluster, add them by clicking "Add Services" in the cluster.

Implementing GROUPING SETS clause in Hive:

Throughout the recipe, we use the "user_info" table in the "demo" database. First, switch to the database with the use demo; command and list its tables with the show tables; command. Then inspect the schema of the user_info table with the describe user_info; command.
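The three commands named above can be run in order from the Hive shell. This is only a sketch of that sequence; it assumes the "demo" database and the "user_info" table already exist in your environment, as they do in the recipe.

```sql
-- Switch to the database used in this recipe.
USE demo;

-- List the tables in the current database.
SHOW TABLES;

-- Show the column names and types of the user_info table.
DESCRIBE user_info;
```

The later queries rely only on the id, profession, and reviews columns, so it is worth confirming that they appear in the DESCRIBE output.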


The following query applies the GROUPING SETS clause to the id, profession, and reviews attributes. It totals the reviews for two grouping sets: the (id, profession) pair and id alone. The query is given below:

SELECT id, profession, sum(reviews) AS total_reviews
FROM user_info
GROUP BY id, profession GROUPING SETS ((id, profession), id);
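One way to read the GROUPING SETS query is as several ordinary GROUP BY queries combined into one result. The sketch below writes out that logical equivalent for the two grouping sets used here. It is meant as an illustration of the semantics, not as the way Hive executes the query, and it assumes a Hive version that accepts a top-level UNION ALL.

```sql
-- Logical equivalent of GROUPING SETS ((id, profession), id).
-- First branch: grouping set (id, profession), one row per id and profession.
-- Second branch: grouping set (id), one row per id with profession set to NULL.
SELECT id, profession, sum(reviews) AS total_reviews
FROM user_info
GROUP BY id, profession
UNION ALL
SELECT id, NULL AS profession, sum(reviews) AS total_reviews
FROM user_info
GROUP BY id;
```

The GROUPING SETS form states the same request in a single GROUP BY instead of one query per grouping, and it is easier to extend when more grouping sets are added.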

The above query returns the total reviews for each (id, profession) combination, along with the total reviews for each id across all of its professions. In the rows where the profession column is NULL, total_reviews is the sum of reviews for that id over all professions. The sample output is given below.

[Image: sample output of the query]
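Because the per-id total rows are marked only by a NULL in the profession column, they would look the same as rows whose profession is genuinely NULL in the data. Hive exposes a GROUPING__ID virtual column (note the two underscores) that identifies which grouping set produced each row. The sketch below adds it to the recipe's query; the alias grouping_set_id is an illustrative name, and the numeric values returned depend on the Hive version (their encoding changed in Hive 2.3.0), so check them on your own installation before filtering on them.

```sql
-- Same query as in the recipe, with GROUPING__ID added so that
-- rows from the (id, profession) set and rows from the (id) set
-- can be told apart even when profession is NULL in the data.
SELECT id, profession, sum(reviews) AS total_reviews, GROUPING__ID AS grouping_set_id
FROM user_info
GROUP BY id, profession GROUPING SETS ((id, profession), id);
```

All rows that come from the same grouping set share the same grouping_set_id value, so that column can be used to separate the detail rows from the per-id totals.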
