How to select the Files to Import using Sqoop

Recipe Objective: How to select the Files to Import using Sqoop?

This recipe shows how to select a file (table) present in an RDBMS and insert its data into HDFS using the Sqoop tool. Sqoop is designed to transfer data between HDFS and relational databases, connecting through a JDBC driver.

Prerequisites:

Before proceeding with the recipe, make sure Hadoop and Sqoop are installed on your local EC2 instance.

Steps to set up the environment:

  • In AWS, create an EC2 instance and log in to Cloudera Manager using the public IP of that instance.
  • To do this, type “&lt;your public IP&gt;:7180” in the web browser and log in to Cloudera Manager, where you can check whether Hadoop and Sqoop are installed.
  • If they are not visible in the Cloudera cluster, you may add them by clicking “Add Services” in the cluster. You can also verify the installations from the shell, as shown below.
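As a quick sanity check (assuming Hadoop and Sqoop are on the instance’s PATH), you can confirm both installations from the shell:

hadoop version

sqoop version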

Transferring files from MySQL to HDFS:

The following steps are involved in pulling the data present in a MySQL table and inserting it into HDFS using Sqoop. In this example, we use the “retailinfo” table, created beforehand in our “test” database.

Step 1: Log in to MySQL using

mysql -u root -p

Enter the required credentials.
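Once logged in, you can list the available databases to confirm the connection (the databases shown will depend on your setup):

show databases;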


Step 2: Before transferring the selected file, check how many records are present in it. To do this, switch to the required database:

use &lt;database name&gt;;

Now, check the number of records present in the table, i.e., the file whose data we plan to move into HDFS, using count(*):

select count(*) from &lt;table name&gt;;
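For instance, with the “test” database and “retailinfo” table used in this recipe, the commands would be (the actual count depends on your data):

use test;

select count(*) from retailinfo;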


Step 3: Before running the “sqoop import” command, ensure that the target directory does not already exist in HDFS; otherwise, the import command throws an error. To make sure, delete the directory that we wish to use as our target directory:

hadoop fs -rm -r &lt;target directory&gt;
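For the target directory used later in this recipe, the command would be (if the directory does not exist, the command simply reports that and nothing is deleted):

hadoop fs -rm -r /user/root/online_basic_command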


Step 4: Let us now run the “sqoop import” job to pull the data of the selected file from MySQL and insert it into HDFS using the command:

sqoop import \
--connect jdbc:mysql://localhost/&lt;database name&gt; \
--table &lt;table name&gt; \
--username &lt;username&gt; --password &lt;password&gt; \
--target-dir &lt;target location in HDFS&gt; \
-m &lt;no. of mapper jobs you wish to create&gt;

In our example, the database is “test,” the selected file is the table “retailinfo,” and our target location in HDFS is in the directory “/user/root/online_basic_command.” We proceeded with only one mapper job.
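Filled in with the values from this example (assuming the same root credentials used to log in to MySQL; supply your own password in place of the placeholder), the command would be:

sqoop import \
--connect jdbc:mysql://localhost/test \
--table retailinfo \
--username root --password &lt;password&gt; \
--target-dir /user/root/online_basic_command \
-m 1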


Upon successful data transfer, Sqoop prints the MapReduce job summary, ending with a line reporting the number of records retrieved.

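To verify the import, list the target directory in HDFS and inspect its contents; with a single mapper, Sqoop writes the data to one part file, part-m-00000:

hadoop fs -ls /user/root/online_basic_command

hadoop fs -cat /user/root/online_basic_command/part-m-00000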
