Explain how to write SQL code to create a Hive table to query the data?

This recipe explains how to write SQL code to create a Hive table and query the data.


Recipe Objective

In most big data scenarios, Hive serves as an ETL and data warehouse tool built on top of the Hadoop ecosystem. It is used for processing different types of structured and semi-structured data, it acts as a database in which we can perform DDL and DML operations, and it provides a flexible query language called HQL (Hive Query Language) for querying and processing data.
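For example, a DDL statement and a DML statement in HQL look much like plain SQL. The sketch below uses a hypothetical demo_tbl table purely for illustration and is not part of this recipe's dataset:

create table if not exists demo_tbl (id int, name string);   -- DDL: define a table
insert into demo_tbl values (1, 'sample');                    -- DML: add a row
select * from demo_tbl;                                       -- HQL query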

System requirements:

Step 1: Prepare a Dataset

Here we are using a bank-related comma-separated values (CSV) dataset to create the Hive table locally.

Use the following Linux command to check the data:

head -10 listbankdataset.csv

The data looks as follows:
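The first line of the file is a header row. Assuming the header uses the same names as the Hive columns defined in Step 3 (the exact names in your file may differ), it would look roughly like this:

client_num,attrition_flag,customer_age,gender,dependent_count,education_level,marital_status,income_category,card_category,months_on_book,total_relationship_count,months_inactive_12_mon,contacts_count_12_mon,credit_limit,total_revolving_bal,avg_open_to_buy,total_amt_chng_q4_q1,total_trans_amt,total_trans_ct,total_ct_chng_q4_q1,avg_utilization_ratio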

Step 2: Copy the CSV Data to HDFS (Hadoop Distributed File System)

Here we are going to copy the CSV file to HDFS. Before that, we need to create a directory to store the file. To create a directory in HDFS:

hduser@bigdata-VirtualBox:~$ hdfs dfs -mkdir /data
hduser@bigdata-VirtualBox:~$ hdfs dfs -mkdir /data/client_bank_details/


In the above commands, we first create the /data folder and then create the client_bank_details folder inside it.

Next, check whether the directories were created.
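For example, list the parent directory and confirm that client_bank_details appears in it:

hduser@bigdata-VirtualBox:~$ hdfs dfs -ls /data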

To copy the CSV file, use the below command:

hduser@bigdata-VirtualBox:~$ hdfs dfs -put /home/hduser/listbankdataset.csv /data/client_bank_details/

To verify whether the file exists in the directory path, run the below command:

hduser@bigdata-VirtualBox:~$ hdfs dfs -ls /data/client_bank_details/

Note: In our scenario, the directory path in HDFS is /data/client_bank_details/

Before creating an external table, we need to open the Hive shell and create a database as follows.
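Assuming the hive binary is on your PATH, the Hive shell can be opened directly from the terminal:

hduser@bigdata-VirtualBox:~$ hive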

To create a database:

create database dezyre_db;
use dezyre_db;
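To confirm the database was created, you can optionally list the databases from the Hive shell:

show databases;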

Step 3: Create an External Hive Table and Load the Data

Here we are going to create an external Hive table on top of the CSV file. Use the below Hive script to create the table:

CREATE EXTERNAL TABLE IF NOT EXISTS `client_bank_details` (
  `client_num` string,
  `attrition_flag` string,
  `customer_age` string,
  `gender` string,
  `dependent_count` string,
  `education_level` string,
  `marital_status` string,
  `income_category` string,
  `card_category` string,
  `months_on_book` string,
  `total_relationship_count` string,
  `months_inactive_12_mon` string,
  `contacts_count_12_mon` string,
  `credit_limit` string,
  `total_revolving_bal` string,
  `avg_open_to_buy` string,
  `total_amt_chng_q4_q1` string,
  `total_trans_amt` string,
  `total_trans_ct` string,
  `total_ct_chng_q4_q1` string,
  `avg_utilization_ratio` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar" = "\""
)
STORED AS TEXTFILE
LOCATION '/data/client_bank_details'
TBLPROPERTIES ("skip.header.line.count"="1");

In the above HQL script, we create an external table by specifying the data format (via the OpenCSVSerde) and the HDFS location, and we also skip the header line of the CSV file.
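Once the table is created, an optional way to inspect its columns, SerDe, and HDFS location is Hive's describe command:

describe formatted client_bank_details;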


Step 4: Query and Verify the Data

To query the data, run the below query in the Hive shell:

Select * from client_bank_details limit 10;

The output of the above query shows the first 10 rows of the client_bank_details table.
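As an extra sanity check, you can count the rows Hive reads from the file; the count should equal the number of data lines in the CSV, excluding the skipped header:

select count(*) from client_bank_details;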
