How to work with Import and Export commands in Hive


Recipe Objective: How to work with Import and Export commands in Hive?

Hive supports exporting and importing table data between Hive warehouses, typically across clusters, via HDFS. In this recipe, we work with the Import and Export commands in Hive. The EXPORT command writes the data of a table or partition, along with its metadata, to a specified HDFS output location. The IMPORT command reads such an exported dataset back into Hive.
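As a quick sketch of the round trip, the typical pattern is export on the source cluster, copy the exported directory (e.g., with distcp), then import on the target cluster. The table name and paths below are illustrative, not from the recipe's environment:

```sql
-- On the source cluster: write table data + metadata to an HDFS directory
EXPORT TABLE user_info TO '/user/hive/exports/user_info';

-- (Optionally copy '/user/hive/exports/user_info' to another cluster, e.g. with distcp.)

-- On the target cluster: recreate the table from the exported files
IMPORT TABLE user_info FROM '/user/hive/exports/user_info';
```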


Prerequisites:

Before proceeding with the recipe, make sure single-node Hadoop and Hive are installed on your local EC2 instance. If they are not already installed, follow the linked installation guides to do the same.

Steps to set up an environment:

  • In AWS, create an EC2 instance and log in to Cloudera Manager using the public IP of the EC2 instance. Log in to PuTTY/terminal and check whether HDFS and Hive are installed. If not, please use the installation links provided above.
  • Type “<your public IP>:7180” in the web browser and log in to Cloudera Manager, where you can check whether Hadoop is installed.
  • If the services are not visible in the Cloudera cluster, you can add them by clicking “Add Services” in the cluster.

Working with Export/Import commands in Hive:

Throughout this recipe, we use the “user_info” table present in our database. Its schema is shown below.

[Screenshot: schema of the “user_info” table]

Simple Export command:

It exports the specified table to the location given as the “hdfs_exports_location.” For example, to export the “user_info” table to a directory named “user_info” in HDFS, the query would be:

export table user_info to 'hdfs_exports_location/user_info';

Sample output:

[Screenshot: output of the export command]

Simple Import command:

It imports the specified table from the location given as the “hdfs_exports_location.” Please note: if a table with the same name as the one being imported already exists in the database, the Hive IMPORT command throws an error. For example, importing the “user_info” table from hdfs_exports_location fails because we already have a table named “user_info.” The sample output is given below.

[Screenshot: error thrown when importing over an existing table]

In such scenarios, we rename the table during import.

Rename table on import:

The query to rename a table on import is:

import table <new table name> from 'hdfs_exports_location/<loc>';

Sample output where the “user_info” table is imported and renamed to “user_info_2”:
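Assuming the earlier export was written to 'hdfs_exports_location/user_info', the concrete rename-on-import query would look like this (the new table name “user_info_2” is the one used in the sample output):

```sql
-- Import the exported "user_info" data under a new table name
IMPORT TABLE user_info_2 FROM 'hdfs_exports_location/user_info';
```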

[Screenshot: output of importing “user_info” as “user_info_2”]

Check if the table is successfully imported by listing the tables present in that database.
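A quick way to verify the import, for example:

```sql
SHOW TABLES;            -- "user_info_2" should appear in the list
DESCRIBE user_info_2;   -- schema should match the exported "user_info" table
```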

[Screenshot: table listing showing “user_info_2”]

Export table partition:

Let us now see how a table partition can be exported. We have a partitioned table, “user_info_part,” created from the “user_info” table and holding the partition where the user’s profession is “educator.” The query to export a table partition is:

export table <table name> partition(<partition spec>) to 'hdfs_exports_location/<loc>';
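Assuming the table is partitioned on a column named “profession” (the actual column name depends on how “user_info_part” was created), the concrete export query would look like:

```sql
-- Export only the 'educator' partition, with its metadata, to HDFS
EXPORT TABLE user_info_part PARTITION (profession='educator')
TO 'hdfs_exports_location/user_info_part';
```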

Sample output:

[Screenshot: output of exporting the table partition]

Import table partition:

The query to import table partition would be:

import table <partition table name> partition(<partition spec>) from 'hdfs_exports_location/<loc>';

Make sure the table partition being imported does not already exist in the database; otherwise, the query throws an error. Sample output for importing a partition is given below.
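Continuing with the same assumed partition column, importing that exported partition under a new table name (to avoid the name clash described above) would look like:

```sql
-- Import the 'educator' partition into a new table to avoid a name conflict
IMPORT TABLE user_info_part_2 PARTITION (profession='educator')
FROM 'hdfs_exports_location/user_info_part';
```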

[Screenshot: output of importing the table partition]
