1. In a real project, the following sets of machines can be part of a Hadoop setup:
May 07 2014 03:47 PM
a. Cloudera cluster: comprises several machines acting as DataNodes/TaskTrackers and one or more machines acting as the NameNode/JobTracker.
b. Client machines: machines from which the cluster is accessed to run Hadoop commands [a client machine can itself be part of the Hadoop cluster].
c. User machine: the machine from which a user connects remotely to the client machines to perform Hadoop-specific operations.
2. To deploy a MapReduce program on a Cloudera environment:
a. Prepare a jar containing the MapReduce code and the related dependency jar files.
b. Add the jar to HADOOP_CLASSPATH, e.g. export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:
c. Run the command: hadoop jar
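The three steps above can be sketched as a small shell script. This is only an illustration under stated assumptions: the jar names, the driver class com.example.WordCount, and the HDFS input/output paths are hypothetical placeholders, not values from the original post. The script runs in dry-run mode by default, so it only prints the command it would submit.

```shell
#!/bin/sh
# All names below are hypothetical placeholders; replace them with your own.
JOB_JAR="wordcount.jar"                  # jar containing the MapReduce code (assumed name)
DEP_JARS="lib/dep-a.jar:lib/dep-b.jar"   # dependency jars, colon-separated (assumed names)
MAIN_CLASS="com.example.WordCount"       # driver class inside the jar (assumed name)
INPUT_DIR="/user/demo/input"             # HDFS input path (assumed)
OUTPUT_DIR="/user/demo/output"           # HDFS output path (assumed)

# Step b: append the jars to HADOOP_CLASSPATH, keeping any value already set.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$JOB_JAR:$DEP_JARS"

# Step c: build the submit command (hadoop jar <jar> [mainClass] args...).
JOB_CMD="hadoop jar $JOB_JAR $MAIN_CLASS $INPUT_DIR $OUTPUT_DIR"

# Dry run by default: set DRY_RUN=0 on a machine with Hadoop installed to submit the job.
DRY_RUN="${DRY_RUN:-1}"
if [ "$DRY_RUN" = "0" ]; then
    $JOB_CMD
else
    echo "Dry run, would execute: $JOB_CMD"
fi
```

Note that HADOOP_CLASSPATH affects the classpath of the client JVM that launches the job; depending on how the job is written, the dependency jars may also need to be bundled into the job jar so that the tasks on the cluster can load them.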
3. To start on a POC, a single [Windows/Linux] machine with the Cloudera VM installed on it would suffice.
Link : http://www.cloudera.com/content/support/en/downloads/download-components/download-products.html?productID=F6mO278Rvo
This VM comes with pseudo-distributed Hadoop components and Eclipse pre-installed.
Eclipse can be used to write the Java-based MapReduce code, and the pre-installed Cloudera Hadoop can be used to run the program.
Once the complete functionality is done, deploy the program on a 5-node Hadoop cluster with a sizable data set to showcase the difference in run times.
To deploy a 5-node cluster, you can refer to the dezyre administrator course material.