How to work with analytic functions in Hive

This recipe helps you work with analytic functions in Hive
Last Updated: 25 Aug 2022

Get access to Big Data projects View all Big Data projects

APACHE HADOOP PROJECTS DATA CLEANING PYTHON DATA MUNGING MACHINE LEARNING RECIPES PANDAS CHEATSHEET ALL TAGS

Recipe Objective: How to work with analytic functions in Hive?

In this recipe, we will work with analytic functions in Hive - CUME_DIST, DENSE_RANK, NTILE, PERCENT_RANK, RANK, ROW_NUMBER.

Access Source Code for Airline Dataset Analysis using Hadoop

Recipe Objective: How to work with analytic functions in Hive?

Prerequisites:

Before proceeding with the recipe, make sure Single node Hadoop and Hive are installed on your local EC2 instance. If not already installed, follow the below link to do the same.

Steps to set up an environment:

In the AWS, create an EC2 instance and log in to Cloudera Manager with your public IP mentioned in the EC2 instance. Login to putty/terminal and check if HDFS and Hive are installed. If not installed, please find the links provided above for installations.
Type "&ltyour public IP&gt:7180" in the web browser and log in to Cloudera Manager, where you can check if Hadoop is installed.
If they are not visible in the Cloudera cluster, you may add them by clicking on the "Add Services" in the cluster to add the required services in your local instance.

Working with analytics functions in Hive:

Throughout this recipe, we used the "employee" table present in our database. Following is the schema of the table.

bigdata_1

CUME_DIST: Below is the sample output for the query that uses the CUME_DIST() window function to calculate each employee's cumulative distribution of salary.

bigdata_2

DENSE_RANK: Below is the sample output for the query that uses the DENSE_RANK() window function to calculate each employee's cumulative distribution of salary.

bigdata_3

NTILE: Below is the sample output that uses the NTILE window function to divide the employees' salaries into three groups and list the salary in ascending order.

bigdata_4

PERCENT_RANK: Below is the sample output that uses the PERCENT_RANK window function to calculate the percentile rank for employees' salaries.

bigdata_5

RANK: The following query uses the RANK() window function to rank the salary of all the employees.

bigdata_6

ROW_NUMBER: The following query uses the ROW_NUMBER() window function to number the salaries for all the employees.

bigdata_7

Download Materials

bigdata_1

bigdata_2

bigdata_3

bigdata_4

bigdata_5

bigdata_6

bigdata_7

What Users are saying..

Anand Kumpatla

Sr Data Scientist @ Doubleslash Software Solutions Pvt Ltd

ProjectPro is a unique platform and helps many people in the industry to solve real-life problems with a step-by-step walkthrough of projects. A platform with some fantastic resources to gain... Read More

Relevant Projects

Machine Learning Projects

Data Science Projects

Python Projects for Data Science

Data Science Projects in R

Machine Learning Projects for Beginners

Deep Learning Projects

Neural Network Projects

Tensorflow Projects

NLP Projects

Kaggle Projects

IoT Projects

Big Data Projects

Hadoop Real-Time Projects Examples

Spark Projects

Data Analytics Projects for Students

Relevant Projects

Retail Analytics Project Example using Sqoop, HDFS, and Hive

This Project gives a detailed explanation of How Data Analytics can be used in the Retail Industry, using technologies like Sqoop, HDFS, and Hive.

View Project Details

Build a Real-Time Spark Streaming Pipeline on AWS using Scala

In this Spark Streaming project, you will build a real-time spark streaming pipeline on AWS using Scala and Python.

View Project Details

AWS Project - Build an ETL Data Pipeline on AWS EMR Cluster

Build a fully working scalable, reliable and secure AWS EMR complex data pipeline from scratch that provides support for all data stages from data collection to data analysis and visualization.

View Project Details

GCP Project to Explore Cloud Functions using Python Part 1

In this project we will explore the Cloud Services of GCP such as Cloud Storage, Cloud Engine and PubSub

View Project Details

Learn Real-Time Data Ingestion with Azure Purview

In this Microsoft Azure project, you will learn data ingestion and preparation for Azure Purview.

View Project Details

COVID-19 Data Analysis Project using Python and AWS Stack

COVID-19 Data Analysis Project using Python and AWS to build an automated data pipeline that processes COVID-19 data from Johns Hopkins University and generates interactive dashboards to provide insights into the pandemic for public health officials, researchers, and the general public.

View Project Details

How to work with analytic functions in Hive

Recipe Objective: How to work with analytic functions in Hive?

Table of Contents

Prerequisites:

Steps to set up an environment:

Working with analytics functions in Hive:

Anand Kumpatla

Relevant Projects

You might also like

Relevant Projects