Create a directory and list the contents of a directory in HDFS

This recipe explains how to create a directory and how to list the contents of a directory in HDFS

Recipe Objective: How to create a directory and list the contents of a directory in HDFS?

This recipe demonstrates creating a directory in the HDFS and listing out the contents of the directory.

Deploy an Auto Twitter Handle with Spark and Kafka

Prerequisites:

Before proceeding with the recipe, make sure Single node Hadoop is installed on your local EC2 instance. If not already installed, follow this link (click here ) to do the same.

Steps to set up an environment:

  • In the AWS, create an EC2 instance and log in to Cloudera Manager with your public IP mentioned in the EC2 instance. Login to putty/terminal and check if HDFS is installed. If not installed, please find the links provided above for installations.
  • Type “&ltyour public IP&gt:7180” in the web browser and log in to Cloudera Manager, where you can check if Hadoop is installed.
  • If they are not visible in the Cloudera cluster, you may add them by clicking on the “Add Services” in the cluster to add the required services in your local instance.

Following are the steps to create a directory and list its content:

Step 1: Switch to root user from ec2-user using the “sudo -i” command.

 bigdata_1

Step 2: Create the directory using the command:

hadoop fs -mkdir &ltdirectory name with full its full path&gt

Let us create a directory named “new_directory” in the “user.” So the command is “hdfs fs -mkdir /user/new_directory/”. And check if it is created successfully using the “hdfs fs -ls /user” command.

 bigdata_2

 

 bigdata_3

We see that the directory is successfully created.

Step 3: Let us now list the contents of this newly-created directory.

As we have not loaded in this directory, the command is successfully executed but with no output as the directory is empty.

For example, I have a directory “Customer,” and I wish to display the contents of this directory. Hence passed the full path to this directory in the Hadoop fs -ls command. The output of the same is given below.

 bigdata_4

 

What Users are saying..

profile image

Anand Kumpatla

Sr Data Scientist @ Doubleslash Software Solutions Pvt Ltd
linkedin profile url

ProjectPro is a unique platform and helps many people in the industry to solve real-life problems with a step-by-step walkthrough of projects. A platform with some fantastic resources to gain... Read More

Relevant Projects

Learn How to Implement SCD in Talend to Capture Data Changes
In this Talend Project, you will build an ETL pipeline in Talend to capture data changes using SCD techniques.

AWS Project-Website Monitoring using AWS Lambda and Aurora
In this AWS Project, you will learn the best practices for website monitoring using AWS services like Lambda, Aurora MySQL, Amazon Dynamo DB and Kinesis.

GCP Project to Learn using BigQuery for Exploring Data
Learn using GCP BigQuery for exploring and preparing data for analysis and transformation of your datasets.

Orchestrate Redshift ETL using AWS Glue and Step Functions
ETL Orchestration on AWS - Use AWS Glue and Step Functions to fetch source data and glean faster analytical insights on Amazon Redshift Cluster

Data Processing and Transformation in Hive using Azure VM
Hive Practice Example - Explore hive usage efficiently for data transformation and processing in this big data project using Azure VM.

Learn Real-Time Data Ingestion with Azure Purview
In this Microsoft Azure project, you will learn data ingestion and preparation for Azure Purview.

SQL Project for Data Analysis using Oracle Database-Part 7
In this SQL project, you will learn to perform various data wrangling activities on an ecommerce database.

Databricks Real-Time Streaming with Event Hubs and Snowflake
In this Azure Databricks Project, you will learn to use Azure Databricks, Event Hubs, and Snowflake to process and analyze real-time data, specifically in monitoring IoT devices.

Deploying auto-reply Twitter handle with Kafka, Spark and LSTM
Deploy an Auto-Reply Twitter Handle that replies to query-related tweets with a trackable ticket ID generated based on the query category predicted using LSTM deep learning model.

Snowflake Real Time Data Warehouse Project for Beginners-1
In this Snowflake Data Warehousing Project, you will learn to implement the Snowflake architecture and build a data warehouse in the cloud to deliver business value.