How to change the owner of files in HDFS


Recipe Objective: How to change the owner of files in HDFS?

This recipe demonstrates how to change the owner of files and directories in HDFS.

Prerequisites:

Before proceeding with the recipe, make sure a single-node Hadoop installation is set up on your local EC2 instance. If it is not already installed, follow the link below to do so.

Steps to set up an environment:

  • In AWS, create an EC2 instance and log in to Cloudera Manager using the public IP of the EC2 instance. Log in through PuTTY or a terminal and check whether HDFS is installed; if it is not, use the installation links provided above. A quick check from the terminal is shown after this list.
  • Type “<your public IP>:7180” in the web browser and log in to Cloudera Manager, where you can check whether Hadoop is installed.
  • If the required services are not visible in the Cloudera cluster, add them by clicking “Add Services” in the cluster.
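Once the services are up, a quick sanity check from the terminal confirms that the Hadoop client is available and that HDFS responds (the exact version string and listing will vary with your setup):

hadoop version
hadoop fs -ls /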

Changing the owner of files in HDFS:

First, switch to the root user from ec2-user using the “sudo -i” command, and then switch to the “hdfs” user to create a directory in HDFS. The commands for the same are listed below.

sudo -i
su - hdfs
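Before touching HDFS, you can confirm which user the shell is now running as; after the “su - hdfs” above, this should print hdfs:

whoami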

Let us create a directory “test-dir” in HDFS using the mkdir command and check that the owner of the directory is “hdfs,” as expected. Run the following commands to create the directory and list the directories under /user to check the owner of the newly created directory.

hadoop fs -mkdir /user/test-dir/
hadoop fs -ls /user

To send a file from a local directory into an HDFS directory, the owner of the HDFS directory should be changed to the user sending the file. For example, if you have to send a file from the root user to the “test-dir” directory in HDFS, the owner of that directory should be changed to root. Run the following command to change the owner of the directory.

hadoop fs -chown root:supergroup /user/test-dir
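Note that the chown subcommand takes the usual OWNER:GROUP form, and either part may be given alone; the -R flag applies the change recursively to everything under the directory. For example, to change the owner of the directory and all of its contents in one step:

hadoop fs -chown -R root:supergroup /user/test-dir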

Upon changing it, list the contents of “/user” to check whether the owner of test-dir has changed from hdfs to root.
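Run the same listing command used earlier:

hadoop fs -ls /user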

Sample output:

(Screenshot: output of “hadoop fs -ls /user” showing /user/test-dir now owned by root.)
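With the ownership changed, the root user can now write into the directory. A minimal sketch, assuming a local file named test.txt exists in root's home directory (the file name here is hypothetical):

exit                                      # return from the hdfs user to the root shell
hadoop fs -put test.txt /user/test-dir/   # copy the local file into HDFS as root
hadoop fs -ls /user/test-dir              # confirm the file landed in the directory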
