Explain the use of explode and lateral view in hive?

Explain the use of explode and lateral view in hive?

Explain the use of explode and lateral view in hive?

This recipe explains what the use of explode and lateral view in hive

Recipe Objective

In big data scenarios , when data volume is huge, we may need to find a subset of data to speed up data analysis. Here comes a technique to select and analyze a subset of data in order to identify patterns and trends in the data known as sampling.

System requirements :

Step 1 : Prepare the dataset

Here we are using the employee related comma separated values (csv) dataset for the create hive table in local.

Data of Output looks as follows:

Before create a table open the hive shell and we need to create a database as follows :Open the hive shell as below

To create database using below queries :

Create database dezyre_db; use dezyre_db;

As follows below:

Step 2 : Create a Hive Table and Load the Data into the Table and verify the Data

Here we are going create a hive table for loading the data from this table to created bucketed tables, Use below to create a hive table:

CREATE TABLE employee ( employee_id int, company_id int, seniority int, salary int, join_date string, quit_date string, dept string ) ROW FORMAT DELIMITED fields terminated by ',' TBLPROPERTIES ("skip.header.line.count"="1");

the above query runs as follows :

Loading the data into hive table and verifying the data

load data local inpath '/home/bigdata/Downloads/empdata.csv' into table employee;

Verifying the data by running the select query as follows

Explanation of views in hive

Views are logical data structures that can be used to simplify queries by either hiding the complexities such as joins, subqueries, and filters or by flattening the data. Hive views do not store data. Once the Hive view is created, its schema is frozen immediately. Subsequent changes to the underlying tables will not be reflected in the view’s schema.

Step 3 : Creating a view

Here we are creating a view using select statement from table employee which employees has greater than 1 lakh salary

hive> CREATE VIEW emp_100000 AS SELECT * FROM employee WHERE salary>100000;

Output of the above query :

We can verify using below queries whether the view is created or not and format of the view as below:

hive> show create table emp_100000;

Output of the above query as follows :

hive> desc formatted emp_100000;

Output of the above query as follows :

Step 4 : Alter a view properties using alter statement

hive> ALTER VIEW emp_100000 SET TBLPROPERTIES ('comment' = 'This is my view table');

Output of the above query as follows :

Step 5 : Drop a view drop statement

Here we are going to drop the view to drop a view run the below query is as follows:

hive> DROP VIEW emp_100000;

Output of the above query as follows :

Step 6 : Usage of Explode and Lateral View

The LATERAL VIEW statement is used with user-defined table generating functions such as EXPLODE() to flatten the map or array type of a column.The explode function can be used on both ARRAY and MAP with LATERAL VIEW.

Before performing exploring and lateral view functions we will create table and insert data into it To create a table in hive with array data type column is as follows

hive> create table std_course_details( std_id int, stud_name string, location string, course array);

Query runs as follows below:

To insert data into above table using insert statement

hive> INSERT INTO TABLE std_course_details VALUES (1,'vamshi','hyd',array('hive','hadoop','spark')),(2,'chandana','bangalore',array('reactjs','javascript')),(3,'Divya','pune',array('python','pyspark','airflow','spark')),(4,'srikanth','pune',array('java','spring boot')),(5,'sreethan','pune',array('c','c++'));

To verify the insert data into table

  • Explode function:
  • The explode function explodes an array to multiple rows. Returns a row-set with a single column (col), one row for each element from the array.

    Here we are going to split array column values into rows by running the below query :

    hive > select explode(course) from std_course_details;

    the above query runs as follows

  • Lateral View :
  • Lateral view explodes the array data into multiple rows. In other words, lateral view expands the array into rows.

    When you use a lateral view along with the explode function, you will get the result something like below.

    hive> select std_id,stud_name,location,courses from std_course_details LATERAL VIEW explode(course) courses_list as courses;

    Output of the above query :

    Relevant Projects

    Yelp Data Processing Using Spark And Hive Part 1
    In this big data project, we will continue from a previous hive project "Data engineering on Yelp Datasets using Hadoop tools" and do the entire data processing using spark.

    Event Data Analysis using AWS ELK Stack
    This Elasticsearch example deploys the AWS ELK stack to analyse streaming event data. Tools used include Nifi, PySpark, Elasticsearch, Logstash and Kibana for visualisation.

    Movielens dataset analysis for movie recommendations using Spark in Azure
    In this Databricks Azure tutorial project, you will use Spark Sql to analyse the movielens dataset to provide movie recommendations. As part of this you will deploy Azure data factory, data pipelines and visualise the analysis.

    Web Server Log Processing using Hadoop
    In this hadoop project, you will be using a sample application log file from an application server to a demonstrated scaled-down server log processing pipeline.

    NLP Project on LDA Topic Modelling Python using RACE Dataset
    Use the RACE dataset to extract a dominant topic from each document and perform LDA topic modeling in python.

    Real-time Auto Tracking with Spark-Redis
    Spark Project - Discuss real-time monitoring of taxis in a city. The real-time data streaming will be simulated using Flume. The ingestion will be done using Spark Streaming.

    Explore features of Spark SQL in practice on Spark 2.0
    The goal of this spark project for students is to explore the features of Spark SQL in practice on the latest version of Spark i.e. Spark 2.0.

    Tough engineering choices with large datasets in Hive Part - 1
    Explore hive usage efficiently in this hadoop hive project using various file formats such as JSON, CSV, ORC, AVRO and compare their relative performances

    PySpark Tutorial - Learn to use Apache Spark with Python
    PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

    Implementing Slow Changing Dimensions in a Data Warehouse using Hive and Spark
    Hive Project- Understand the various types of SCDs and implement these slowly changing dimesnsion in Hadoop Hive and Spark.