How to create static and dynamic partitions in Hive?

This recipe helps you create static and dynamic partitions in Hive.

Recipe Objective

In most big data scenarios, Hive serves as an ETL and data-warehouse tool on top of the Hadoop ecosystem, used to process structured and semi-structured data. We often need to store data split by some condition, and partitions serve that purpose: a partitioned table is divided into parts based on the values of one or more partition columns. Partitions are mainly useful for query optimisation, since Hive can scan only the relevant partitions instead of the whole table, which reduces query latency. Hive supports two kinds of partitioning: static and dynamic.

Step 1 : Prepare the dataset

Here we are using a food-prices dataset in comma-separated values (CSV) format.

Use the following Linux command to preview the data:

head -10 food_prices.csv

The output of the command looks as follows:

Before creating a table, open the Hive shell and create a database as follows.

To create a database :

CREATE DATABASE dezyre_db;
USE dezyre_db;

Step 2 : Create a Hive Table and Load the data

Here we are going to create a Hive table using the below script:

CREATE TABLE food_prices (
  series_reference string,
  Period string,
  data_value int,
  status string,
  units string,
  subject string,
  product string,
  series_title string,
  year string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
TBLPROPERTIES ("skip.header.line.count"="1");

In the above HQL script we create a staging table, skipping the header row of the CSV file we are going to load.

Step 3 : Load data into hive table

LOAD DATA LOCAL INPATH '/home/bigdata/Downloads/food_prices.csv' INTO TABLE food_prices;

Step 4 : Query and verify the data

Query the data in the Hive shell using the below query:

SELECT * FROM food_prices LIMIT 10;

Output of the above query :

Step 5 : Create a partitioned table with a partition key

Here we are going to create a partitioned table by specifying the PARTITIONED BY clause while creating the table.

Static partition : In static partitioning we pass the value of the partition column manually when we load data into the table, so each partition must be named explicitly (and can be created in advance) before data is inserted into it.
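The idea above can be sketched in isolation — a minimal static-partition example; the `sales` table, its columns, and the inserted row are illustrative assumptions, not part of the dataset used in this recipe:

```sql
-- Create a table with a partition column. The partition column is
-- declared in PARTITIONED BY, not in the main column list.
CREATE TABLE sales (item string, amount int)
PARTITIONED BY (year string);

-- Optionally pre-create a partition explicitly.
ALTER TABLE sales ADD PARTITION (year = '2019');

-- Static load: the partition value is spelled out in the statement.
INSERT INTO TABLE sales PARTITION (year = '2019')
VALUES ('bread', 3);
```

Because the partition value is fixed in the statement, Hive never has to inspect the data to decide where a row goes.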

Syntax to create a partitioned table :

CREATE TABLE partioned_food_prices (
  series_reference string,
  data_value int,
  status string,
  units string,
  subject string,
  product string
)
PARTITIONED BY (year string);

Loading the data by specifying the partition key and value as follows :

hive> insert overwrite table partioned_food_prices partition(year = '2019') select series_reference, data_value, status, units, subject, product from food_prices where year='2019';

In the statement above we load the data while specifying the partition key (year) and its value ('2019'); only the rows matching that value are written into that partition.
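As an alternative to INSERT ... SELECT, a file that already contains only one year's rows can be loaded straight into a static partition. A sketch — the file path here is a hypothetical example, not a file produced earlier in this recipe:

```sql
-- Load a pre-filtered file directly into the year=2019 partition.
-- Note: the file must not contain the partition column itself,
-- and every row in it must belong to that partition.
LOAD DATA LOCAL INPATH '/tmp/food_prices_2019.csv'
INTO TABLE partioned_food_prices PARTITION (year = '2019');
```

This skips the MapReduce job entirely, since LOAD DATA only moves files into the partition directory.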

Output of the query : Hive runs a MapReduce job to insert the data into the partitioned table from the staging table.

We can verify the data by running the hdfs command below:

hdfs dfs -cat /user/hive/warehouse/partioned_food_prices/year=2019/000000_0 | head -10

Output of the above command:

We can also verify the data by running select query as below:

Dynamic partition : In dynamic partitioning, partitions are created automatically based on the values of the partition column in the input data.

Syntax to create partition table:

CREATE TABLE partitioned_food_prices (
  series_reference string,
  Period string,
  data_value int,
  status string,
  units string,
  subject string,
  product string,
  series_title string
)
PARTITIONED BY (year string);

Enable the properties required for dynamic partitioning :

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

Loading the data into partitioned table from the temp table :

hive> insert overwrite table partitioned_food_prices partition(year) select * from default.food_prices;

In a static partition we supply a fixed value for the partition column while loading. In a dynamic partition, Hive determines the partition for each row from the value of its partition column, creating partitions as needed.
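Static and dynamic values can also be mixed in one statement. A sketch, assuming a table partitioned by two columns — `sales_by_region`, `country`, and `staging_sales` are illustrative names, not part of this recipe's dataset:

```sql
-- Hypothetical two-level partitioned table.
CREATE TABLE sales_by_region (item string, amount int)
PARTITIONED BY (country string, year string);

-- 'country' is given statically; 'year' is resolved dynamically
-- from the last column of the SELECT list. Because at least one
-- leading partition column is static, this works even in the
-- default strict mode, without setting nonstrict.
INSERT OVERWRITE TABLE sales_by_region
PARTITION (country = 'NZ', year)
SELECT item, amount, year FROM staging_sales;
```

Dynamic partition columns must come last in the SELECT list, in the same order as they are declared in PARTITIONED BY.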

Output of the above query : it will create partitioned directories based on the year.

We can also verify the data by running the below query
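One standard way to confirm that the partitions were created is to list them, or to query a single partition directly:

```sql
-- List all partitions of the dynamically partitioned table.
SHOW PARTITIONS partitioned_food_prices;

-- Inspect the rows of one partition; the WHERE clause on the
-- partition column prunes the scan to that directory only.
SELECT * FROM partitioned_food_prices WHERE year = '2019' LIMIT 10;
```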

Step 6 : Drop or delete a static/dynamic partition

To drop or delete a partition, use the ALTER TABLE ... DROP PARTITION command, for example :

ALTER TABLE partioned_food_prices DROP IF EXISTS PARTITION (year = '2019');
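Worth knowing when dropping partitions: for a managed (internal) table like the ones in this recipe, dropping a partition also deletes its data directory in HDFS, while for an external table only the metadata is removed. A sketch — the external table name and LOCATION below are illustrative assumptions:

```sql
-- Managed table: removes both the partition metadata and the
-- year=2019 directory under the warehouse path.
ALTER TABLE partitioned_food_prices DROP IF EXISTS PARTITION (year = '2019');

-- External table: the same command removes only the partition
-- metadata; the files under LOCATION remain on HDFS.
CREATE EXTERNAL TABLE ext_food_prices (series_reference string, data_value int)
PARTITIONED BY (year string)
LOCATION '/data/food_prices_ext';

ALTER TABLE ext_food_prices DROP IF EXISTS PARTITION (year = '2019');
```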

