How to create static and dynamic partitions in hive?

This recipe helps you create static and dynamic partitions in hive
Last Updated: 16 Dec 2022

Get access to Data Science projects View all Data Science projects

DATA SCIENCE PROJECTS IN PYTHON DATA CLEANING PYTHON DATA MUNGING MACHINE LEARNING RECIPES PANDAS CHEATSHEET ALL TAGS

Recipe Objective

In most of the big data scenarios , Hive is an ETL and data warehouse tool on top of the hadoop ecosystem, it is used for the processing of the different types structured and semi-structured data, it is a database. Most of the times, we need to store the data based on conditions as partitions. Partitions are used to arrange table data into partitions by splitting tables into different parts based on the values to create partitions. Partitions are mainly useful for hive query optimisation to reduce the latency in the data. In hive we have two different partitions that are static and dynamic

Build a Chatbot in Python from Scratch!

System requirements :

Install ubuntu in the virtual machine click here
Install single node hadoop machine click here
Install apache hive click here

Recipe Objective

Step1 : Prepare the dataset

Here we are using the food related comma separated values dataset to perform the csv file

Use linux command to check data as follows:

head -10 food_prices.csv

Data of Output looks as follows:

Before create a table open the hive shell , we need to create a database as follows Open the hive shell

To create database :

Create database dezyre_db; use dezyre_db;

Step 2 : Create a Hive Table and Load the data

Here we are going create a hive table as follows below hive script to create a table as follows

CREATE TABLE food_prices ( series_reference string, Period string, data_value int, status string, units string, subject string, product string, series_title string, year string )ROW FORMAT DELIMITED fields terminated by ',' TBLPROPERTIES ("skip.header.line.count"="1");

In the above hql script we are creating a hive temporary table also skipping the headers which we are going to read from the csv file .

Step 3 : Load data into hive table

load data local inpath '/home/bigdata/Downloads/food_prices.csv' into table food_prices;

Step 4 : Query and verify the data

To query the data using below query in hive shell

Select * from food_prices limit 10;

Output of the above query :

Step 5 : Create a Partition table with Partition key

Here we are going to create a partition table by specifying the "partition by" while creating the table.

Static Partition : In static partitioning we need to pass the values of the partitioned column manually when we load the data into the table. So, We need to manually create each partition before inserting data into a partition

Syntax to create a partition table :

create table partioned_food_prices ( series_reference string, data_value int, status string, units string, subject string, product string ) partitioned by (year string) ;

Loading the data by specific the partition key as follows :

hive> insert overwrite table partioned_food_prices partition(year = '2019') select series_reference, data_value, status, units, subject, product from food_prices where year='2019';

Here in the above we are loading the data specifying year as partition key and value, that particular data will load into the table.

Output of the query: it will run the mapreduce to insert the data into the partitioned table from the temporary table.

We can verify the data by run the hdfs command as below:

hdfs dfs -cat /user/hive/warehouse/partioned_food_prices/year=2019/000000_0 head -10

Output of the above command:

We can also verify the data by running select query as below:

Dynamic partition : In Dynamic partitioning, Partitions will be created dynamically based on input data to the table.

Syntax to create partition table:

create table partitioned_food_prices ( series_reference string, Period string, data_value int, status string, units string, subject string, product string, series_title string )partitioned by (year string);

Enable properties for the partition set hive.exec.dynamic.partition.mode=nonstrict;

Loading the data into partitioned table from the temp table :

hive> insert overwrite table partitioned_food_prices partition(year) select * from default.food_prices;

in the Static partition while creating the partition we were giving some static value for the partition column. In a dynamic partition, the partition will happen dynamically i.e. it will create a partition based on the value of the partition column.

Query run as below: -

Output of the above query : it will create partitioned directories based on the year.

We can also verify the data by running the below query

Step 6 : To drop or delete the static/dynamic partition column

To drop or delete the partition column by using the alter table with delete partition command

What Users are saying..

Gautam Vermani

Data Consultant at Confidential

Having worked in the field of Data Science, I wanted to explore how I can implement projects in other domains, So I thought of connecting with ProjectPro. A project that helped me absorb this topic... Read More

Relevant Projects

Machine Learning Projects

Data Science Projects

Python Projects for Data Science

Data Science Projects in R

Machine Learning Projects for Beginners

Deep Learning Projects

Neural Network Projects

Tensorflow Projects

NLP Projects

Kaggle Projects

IoT Projects

Big Data Projects

Hadoop Real-Time Projects Examples

Spark Projects

Data Analytics Projects for Students

Relevant Projects

Demand prediction of driver availability using multistep time series analysis

In this supervised learning machine learning project, you will predict the availability of a driver in a specific area by using multi step time series analysis.

View Project Details

Insurance Pricing Forecast Using XGBoost Regressor

In this project, we are going to talk about insurance forecast by using linear and xgboost regression techniques.

View Project Details

Langchain Project for Customer Support App in Python

In this LLM Project, you will learn how to enhance customer support interactions through Large Language Models (LLMs), enabling intelligent, context-aware responses. This Langchain project aims to seamlessly integrate LLM technology with databases, PDF knowledge bases, and audio processing agents to create a comprehensive customer support application.

View Project Details

How to create static and dynamic partitions in hive?

Recipe Objective

System requirements :

Table of Contents

Step1 : Prepare the dataset

Step 2 : Create a Hive Table and Load the data

Step 3 : Load data into hive table

Step 4 : Query and verify the data

Step 5 : Create a Partition table with Partition key

Step 6 : To drop or delete the static/dynamic partition column

Gautam Vermani

Relevant Projects

You might also like

Relevant Projects