Cracking a Hadoop Admin interview becomes a tedious job if you do not spend enough time preparing for it. This article lists the top Hadoop Admin interview questions and answers that are likely to come up when you interview for Hadoop Administration jobs.
In 2010, nobody knew what Hadoop was, and today the elephant in the big data room has become the big data darling. According to Wikibon, the Hadoop market crossed $256 million in vendor revenue in 2012 and is anticipated to grow to $1.7 billion by the end of 2017. Programmers, architects, system administrators and data warehousing professionals are leaving no stone unturned in learning Hadoop for storing and processing large data sets.
Professionals who are trying for a Hadoop Developer or Hadoop Admin job do not necessarily put much effort into preparing specifically for Hadoop Admin interview questions. People going for Hadoop Developer positions can afford to treat administration questions as just one part of their overall Hadoop interview preparation, but for those preparing solely for the role of Hadoop Admin it is essential to get into the details of Hadoop admin interview questions. In our previous posts, Top 100 Hadoop Interview Questions and Answers and Top 50 Hadoop Interview Questions, we listed the Hadoop interview questions that can be asked in a Hadoop Developer job interview.
Computing research found that the skills gap for Hadoop is one of the biggest in the entire big data spectrum. In the big data space, where Hadoop is used across various industries, the importance of Hadoop Administration cannot be overlooked. From finance to government, industries are hiring Hadoop Administrators to ensure that their big data platforms keep ticking in the most complex and dynamic situations. The demand for Hadoop Admin professionals is rising as companies try to fill the shortage of expert talent.
Without much ado, let’s help you bridge the talent gap by nailing your next Hadoop Administration job interview.
Hadoop Admin interviews test a candidate’s knowledge of the installation, configuration and maintenance of Hadoop software. A Hadoop Administrator is required to research and implement platform-specific big data solutions based on the requirements of the stakeholders, so a candidate appearing for a Hadoop Admin interview needs to be well-versed in the concepts of large-scale data management. To present yourself as a quality candidate for the Hadoop Admin job profile, make sure that you discuss your knowledge and ability to manage Hadoop projects, and exhibit multitasking and leadership skills in your specific areas of interest and expertise.
If you have applied for a Hadoop Admin job, it is worth your time to review the Hadoop Admin interview questions listed below while you prepare for your interview:
1) How will you decide whether you need to use the Capacity Scheduler or the Fair Scheduler?
Fair scheduling is the process in which resources are assigned to jobs such that all jobs get, on average, an equal share of resources over time. The Fair Scheduler can be used under the following circumstances -
i) If you want the jobs to make equal progress instead of following FIFO order, then you must use Fair Scheduling.
ii) If you have slow connectivity and data locality plays a vital role and makes a significant difference to the job runtime, then you must use Fair Scheduling.
iii) Use Fair Scheduling if there is a lot of variability in utilization between pools.
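The idea of an equal share can be illustrated with a small max-min fair allocation. This is only a sketch of the general principle, not the Fair Scheduler's actual implementation; the function name fair_shares, the job names and the slot counts are made up for illustration.

```python
def fair_shares(total_slots, demands):
    """Max-min fair split of total_slots among jobs.

    demands maps a (hypothetical) job name to the number of slots it could use.
    Jobs that need less than an equal share get what they ask for, and the
    leftover is divided equally among the remaining jobs.
    """
    shares = {job: 0 for job in demands}
    active = {job for job, demand in demands.items() if demand > 0}
    remaining = total_slots

    while remaining > 0 and active:
        equal_share = remaining / len(active)
        # Jobs whose whole demand fits inside an equal share are fully served.
        satisfied = {job for job in active if demands[job] <= equal_share}
        if not satisfied:
            # Every remaining job wants more than an equal share: split evenly.
            for job in active:
                shares[job] = equal_share
            break
        for job in satisfied:
            shares[job] = demands[job]
            remaining -= demands[job]
        active -= satisfied
    return shares


if __name__ == "__main__":
    print(fair_shares(12, {"small_job": 2, "etl_job": 10, "report_job": 10}))
```

With 12 slots, the small job receives the 2 slots it asks for and the two large jobs split the remaining 10 evenly, instead of the first large job in the queue taking everything it wants.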
The Capacity Scheduler lets you run the Hadoop MapReduce cluster as a shared, multi-tenant cluster to maximize the utilization and throughput of the cluster. The Capacity Scheduler can be used under the following circumstances -
i) If the jobs require scheduler determinism, then the Capacity Scheduler can be useful.
ii) The Capacity Scheduler's memory-based scheduling method is useful if the jobs have varying memory requirements.
iii) If you want to enforce resource allocation because you know the cluster utilization and workload very well, then use the Capacity Scheduler.
2) What are the daemons required to run a Hadoop cluster?
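In Hadoop 1.x: NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker. (In Hadoop 2.x with YARN, the ResourceManager and NodeManager take over the roles of the JobTracker and TaskTracker.)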
3) How will you restart a NameNode?
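The easiest way of doing this is to stop all the daemons by running the stop-all.sh shell script and then restart them, including the NameNode, by running start-all.sh. To restart only the NameNode, the hadoop-daemon.sh script can be used instead (hadoop-daemon.sh stop namenode, followed by hadoop-daemon.sh start namenode).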
4) Explain about the different schedulers available in Hadoop.
5) List a few Hadoop shell commands that are used to perform a copy operation.
6) What is the jps command used for?
The jps command lists the Java processes running on a machine, so it is used to verify whether the daemons that run the Hadoop cluster are up. On a node where all of them run, the output of jps shows entries for the NameNode, Secondary NameNode, DataNode, TaskTracker and JobTracker.
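A check like this is easy to script. The sketch below parses jps-style output (one "pid ClassName" pair per line) and reports which expected daemons are missing. The sample output, its process IDs and the helper names are illustrative assumptions; on a real node the text would come from running jps, and on a multi-node cluster each machine runs only some of these daemons.

```python
# Daemon class names as they appear in jps output on a Hadoop 1.x node.
EXPECTED_DAEMONS = {
    "NameNode", "SecondaryNameNode", "DataNode", "JobTracker", "TaskTracker",
}


def running_processes(jps_output):
    """Return the set of process names found in jps-style output."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)  # "<pid> <name>"
        if len(parts) == 2:
            names.add(parts[1].strip())
    return names


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemons that do not appear in the output, sorted."""
    return sorted(set(expected) - running_processes(jps_output))


# Hypothetical sample; the PIDs are invented.
SAMPLE = """\
4825 NameNode
5012 DataNode
5391 JobTracker
5604 TaskTracker
6120 Jps
"""

if __name__ == "__main__":
    print(missing_daemons(SAMPLE))
```

For the sample above, the only daemon reported as missing is SecondaryNameNode.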
7) What are the important hardware considerations when deploying Hadoop in a production environment?
8) How many NameNodes can you run on a single Hadoop cluster?
9) What happens when the NameNode on the Hadoop cluster goes down?
The file system goes offline whenever the NameNode is down.
10) What is the conf/hadoop-env.sh file and which variable in the file should be set for Hadoop to work?
This file sets up the environment in which Hadoop runs and contains variables such as HADOOP_CLASSPATH, JAVA_HOME and HADOOP_LOG_DIR. The JAVA_HOME variable must be set for Hadoop to run.
11) Apart from using the jps command, is there any other way to check whether the NameNode is working or not?
Use the command: /etc/init.d/hadoop-0.20-namenode status
12) In a MapReduce system, the HDFS block size is 64 MB and there are 3 files of size 127 MB, 64 KB and 65 MB, read with FileInputFormat. Under this scenario, how many input splits are likely to be made by the Hadoop framework?
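By block arithmetic, the 127 MB and 65 MB files each span 2 blocks and the 64 KB file fits in 1, which gives 5 splits (2 + 2 + 1). Note that FileInputFormat lets the last split run up to 10% over the split size, so in practice the 65 MB file is usually read as a single split, for 4 splits in total.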
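The arithmetic can be checked with a short sketch. The first function counts one split per started block, which is the simple reasoning behind the answer above. The second models the 10% slack (a SPLIT_SLOP factor of 1.1) that FileInputFormat uses when carving up a file; treat that constant as an assumption to verify against your Hadoop version. Both assume the split size equals the block size, and the function names are illustrative.

```python
import math

KB = 1024
MB = 1024 * KB


def splits_by_block_count(file_size, block_size):
    """One split per started block (plain ceiling division)."""
    return math.ceil(file_size / block_size)


def splits_with_slop(file_size, split_size, slop=1.1):
    """Carve off full splits while more than slop * split_size remains;
    whatever is left over becomes the final split."""
    count = 0
    remaining = file_size
    while remaining / split_size > slop:
        count += 1
        remaining -= split_size
    if remaining > 0:
        count += 1
    return count


FILES = {"127 MB": 127 * MB, "64 KB": 64 * KB, "65 MB": 65 * MB}

if __name__ == "__main__":
    for label, size in FILES.items():
        print(label,
              splits_by_block_count(size, 64 * MB),
              splits_with_slop(size, 64 * MB))
```

The two counts agree for the 127 MB and 64 KB files. They differ only for the 65 MB file, whose 1 MB overhang is within the 10% slack.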
13) Which command is used to verify if the HDFS is corrupt or not?
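The hadoop fsck (file system check) command is used to check the health of HDFS, including missing or corrupt blocks. For example, hadoop fsck / checks the entire file system.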
14) List some use cases of the Hadoop Ecosystem
Text Mining, Graph Analysis, Semantic Analysis, Sentiment Analysis, Recommendation Systems.
15) How can you kill a Hadoop job?
hadoop job -kill <jobID>
16) I want to see all the jobs running in a Hadoop cluster. How can I do this?
The command hadoop job -list gives the list of jobs running in a Hadoop cluster.
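For scripting, the two job commands above can be wrapped in a small helper. This sketch builds the argument lists and, optionally, runs them with subprocess. It assumes the hadoop launcher is on the PATH and that job IDs look like job_<timestamp>_<sequence>; the helper names and the example job ID are made up.

```python
import re
import shutil
import subprocess

# Assumed job ID shape, e.g. job_201301011234_0007.
JOB_ID_PATTERN = re.compile(r"job_\d+_\d+")


def list_jobs_command():
    """Argument list for listing the running jobs."""
    return ["hadoop", "job", "-list"]


def kill_job_command(job_id):
    """Argument list for killing one job; rejects malformed IDs."""
    if not JOB_ID_PATTERN.fullmatch(job_id):
        raise ValueError(f"not a valid job ID: {job_id!r}")
    return ["hadoop", "job", "-kill", job_id]


def run(command):
    """Run the command if the launcher is installed; return stdout, else None."""
    if shutil.which(command[0]) is None:
        return None
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return completed.stdout


if __name__ == "__main__":
    print(" ".join(list_jobs_command()))
    print(" ".join(kill_job_command("job_201301011234_0007")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems, and validating the job ID first keeps a typo from reaching the cluster.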
17) Is it possible to copy files across multiple clusters? If yes, how can you accomplish this?
Yes, it is possible to copy files across multiple Hadoop clusters, and this can be achieved using distributed copy. The DistCp command (hadoop distcp <source> <destination>) is used for intra-cluster or inter-cluster copying.
18) Which is the best operating system to run Hadoop?
Linux (for example, Ubuntu) is the preferred operating system for running Hadoop. Windows can also be used to run Hadoop, but it leads to several problems and is not recommended.
19) What are the network requirements to run Hadoop?
20) The mapred.output.compress property is set to true to make sure that all output files are compressed for efficient space usage on the Hadoop cluster. If, under a particular condition, a cluster user does not require compressed data for a job, what would you suggest that he do?
If the user does not want to compress the data for a particular job, then he should create his own configuration file and set the mapred.output.compress property to false. This configuration file should then be loaded as a resource into the job.
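Such a per-job override file uses the same XML layout as Hadoop's other configuration files: a configuration element containing property entries, each with name and value children. Below is a minimal sketch that generates one. The file name job-overrides.xml and the helper name are hypothetical, and how the file is then loaded (for example, through Configuration.addResource in the job driver) depends on how the job is written.

```python
import xml.etree.ElementTree as ET


def build_override_xml(properties):
    """Serialize a dict of property names and values as Hadoop-style XML."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    xml_text = build_override_xml({"mapred.output.compress": "false"})
    # Hypothetical file name; pick whatever your job expects to load.
    with open("job-overrides.xml", "w", encoding="utf-8") as handle:
        handle.write('<?xml version="1.0"?>\n' + xml_text + "\n")
    print(xml_text)
```

Keeping the override in a separate resource leaves the cluster-wide default untouched, so other jobs continue to write compressed output.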
21) What is the best practice to deploy a secondary NameNode?
It is always better to deploy a secondary NameNode on a separate standalone machine. When the secondary NameNode is deployed on a separate machine it does not interfere with the operations of the primary node.
22) How often should the NameNode be reformatted?
The NameNode should never be reformatted. Doing so will result in complete data loss. NameNode is formatted only once at the beginning after which it creates the directory structure for file system metadata and namespace ID for the entire file system.
23) If Hadoop spawns 100 tasks for a job and one of the tasks fails, what does Hadoop do?
The task is started again, preferably on a different TaskTracker. If the same task fails 4 times, which is the default limit (it can be changed through mapred.map.max.attempts and mapred.reduce.max.attempts), the job is killed.
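The retry behaviour can be pictured as a loop over attempts. This is a toy model, not Hadoop code: run_task_with_retries and the callables passed to it are invented for illustration, and the limit of 4 mirrors the default described above.

```python
def run_task_with_retries(attempt_fn, max_attempts=4):
    """Call attempt_fn(attempt_number) until it succeeds or attempts run out.

    Returns (succeeded, attempts_used). The (False, max_attempts) case is the
    one in which Hadoop would give up on the task and kill the job.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            attempt_fn(attempt)
            return True, attempt
        except Exception:
            # A real cluster would reschedule the attempt on another node.
            continue
    return False, max_attempts


def flaky(attempt):
    """Simulated task that fails on its first two attempts."""
    if attempt < 3:
        raise RuntimeError("simulated task failure")


def always_fails(attempt):
    """Simulated task that never succeeds."""
    raise RuntimeError("simulated task failure")


if __name__ == "__main__":
    print(run_task_with_retries(flaky))
    print(run_task_with_retries(always_fails))
```

The flaky task succeeds on its third attempt, so the job carries on. The task that always fails uses up all 4 attempts, which is the point at which the job would be killed.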
24) How can you add and remove nodes from the Hadoop cluster?
25) You increase the replication level but notice that the data is under replicated. What could have gone wrong?
Nothing may actually have gone wrong. If there is a huge volume of data, replication takes time in proportion to the data size, because the cluster has to copy the data, and it might take a few hours.
26) Explain the different configuration files and where they are located.
The configuration files are located in the “conf” subdirectory. Hadoop has 3 different configuration files - hdfs-site.xml, core-site.xml and mapred-site.xml.
These interview questions are asked on a case-by-case basis, depending on where you are applying for the role of a Hadoop admin, whether you have prior experience in this role, and so on. Please do share your Hadoop Admin interview experience in the comments below.
The above list just gives an overview of the different types of Hadoop Admin interview questions that can be asked. The actual questions can vary based on your work experience and the business domain you come from. Do not worry if you are inexperienced, as companies would love to hire you if you are clear on the basics and have hands-on experience working on Hadoop projects. The foremost thing is to prepare well for a career in Hadoop Administration; with that, one can definitely succeed in nailing a Hadoop Admin interview. Strive for excellence and success will follow.
We would love to answer any questions you have about honing your Hadoop skills for a lucrative career. Please leave a comment below.