Namenode federation


4 Answer(s)


Hi Krishna,
I would say we can do it, but there is no real benefit. If you are asking about availability, it would not be affected.
Let's assume a client has written data into HDFS and its metadata is stored in NN1. If a request for that data goes to NN1, NN1 tells the JobTracker the location and the job proceeds.
If the request goes to NN2 or NN3 instead, they won't be able to return the location and the job will fail.
Conclusion: increasing the number of namenodes only increases the partitioning of the cluster; it has no effect on availability.
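
The point above can be sketched in a few lines of Python. This is only a toy model, not Hadoop code: the `NameNode` class, its `locate` method, and the sample path and datanode names are all made up for illustration. It shows that each namenode holds metadata only for its own part of the namespace, so a lookup sent to the wrong namenode fails rather than being answered by another one.

```python
class NameNode:
    """Toy stand-in for a namenode: maps file paths to block locations."""

    def __init__(self, name):
        self.name = name
        self.metadata = {}  # path -> list of datanode names (hypothetical)

    def add_file(self, path, datanodes):
        self.metadata[path] = list(datanodes)

    def locate(self, path):
        # A namenode can only answer for files in its own namespace.
        if path not in self.metadata:
            raise FileNotFoundError(f"{self.name} has no metadata for {path}")
        return self.metadata[path]


nn1, nn2, nn3 = NameNode("NN1"), NameNode("NN2"), NameNode("NN3")

# The client wrote the file through NN1, so only NN1 knows about it.
nn1.add_file("/user1/data.txt", ["dn1", "dn2", "dn3"])

print(nn1.locate("/user1/data.txt"))  # NN1 can return the locations

for nn in (nn2, nn3):
    try:
        nn.locate("/user1/data.txt")
    except FileNotFoundError as err:
        print("lookup failed:", err)
```

Adding NN2 and NN3 here spreads the namespace over more servers, but none of them can stand in for NN1 if it goes down.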

Hope this helps.
Thanks.

Hi Krishna,

The federation concept in Hadoop is all about namespace scalability and isolation on top of the same underlying storage. In larger deployments, an organization may want to isolate parts of its cluster to run segregated applications. The namenodes do not require any coordination with each other, which is why they are called federated.
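
To make "same underlying storage" a bit more concrete: in a federated cluster the datanodes store blocks for every namenode, and each namespace's blocks are kept in its own block pool. The Python below is a toy model of that idea only; the `DataNode` class, its method names and the block pool IDs are hypothetical and are not Hadoop's API.

```python
from collections import defaultdict


class DataNode:
    """Toy datanode shared by all namenodes in a federated cluster."""

    def __init__(self, name):
        self.name = name
        # block pool id -> set of block ids; one pool per namespace
        self.block_pools = defaultdict(set)

    def store_block(self, block_pool_id, block_id):
        self.block_pools[block_pool_id].add(block_id)

    def blocks_for(self, block_pool_id):
        # Each namenode only ever asks about the blocks of its own pool.
        return sorted(self.block_pools[block_pool_id])


dn = DataNode("dn1")
dn.store_block("BP-nn1", "blk_1")  # written via NN1
dn.store_block("BP-nn1", "blk_2")
dn.store_block("BP-nn2", "blk_1")  # same id in another pool, no clash

print(dn.blocks_for("BP-nn1"))  # ['blk_1', 'blk_2']
print(dn.blocks_for("BP-nn2"))  # ['blk_1']
```

The storage is shared, but the namespaces stay isolated because every lookup is scoped to one block pool.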

Yes, you can also use it to scale beyond the limits of a single namenode, but when you set up a namenode in a larger organization you will usually prefer a high-end, well-provisioned machine anyway. With federation you get the benefits of both namespace scalability and isolation.

Yes, you need to add new servers for the SNN/backup node of each namenode if you want isolation among the namespaces.

Thanks Abhishek. Let me put it in my own example so that I can understand this better :)
Example:
/user1, /user2 and /user3 are three namespaces
NN1 - /user1, NN2 - /user2, NN3 - /user3
SNN1 - /user1, SNN2 - /user2, SNN3 - /user3
NN1 and SNN1 are under replication
NN2 and SNN2 are under replication
NN3 and SNN3 are under replication
Because of this, my namespaces are federated, isolated and do provide high availability

Did I understand this correctly?

Thanks,
Krishna

Yes, you have understood it correctly, but do remember that SNN means secondary namenode, which only provides checkpointing, whereas a standby namenode is what provides high availability. In simple terms, a federated cluster is a centralized unit within which each namespace keeps some autonomy so that it stays isolated.
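
As a rough illustration, the three-namenode example above could be declared in `hdfs-site.xml` along the following lines. The Python below only builds the property names and values. The nameservice IDs (`ns1` to `ns3`), hostnames and ports are placeholders made up for this sketch, and a real cluster needs more settings than shown. The secondary namenode entries are there for checkpointing only; they do not make a namenode highly available.

```python
def federation_properties(nameservices):
    """Build hdfs-site.xml properties for a federated cluster.

    `nameservices` maps a nameservice ID to a pair of
    (namenode host, secondary namenode host).
    IDs, hosts and ports are placeholders, not recommendations.
    """
    props = {"dfs.nameservices": ",".join(nameservices)}
    for ns, (nn_host, snn_host) in nameservices.items():
        props[f"dfs.namenode.rpc-address.{ns}"] = f"{nn_host}:8020"
        props[f"dfs.namenode.http-address.{ns}"] = f"{nn_host}:50070"
        # Secondary namenode: checkpoints only, not a failover target.
        props[f"dfs.namenode.secondary.http-address.{ns}"] = f"{snn_host}:50090"
    return props


# Hypothetical layout matching the NN1/SNN1 ... NN3/SNN3 example.
layout = {
    "ns1": ("nn1.example.com", "snn1.example.com"),  # intended for /user1
    "ns2": ("nn2.example.com", "snn2.example.com"),  # intended for /user2
    "ns3": ("nn3.example.com", "snn3.example.com"),  # intended for /user3
}

for key, value in federation_properties(layout).items():
    print(f"<property><name>{key}</name><value>{value}</value></property>")
```

Mapping `/user1`, `/user2` and `/user3` to the individual namenodes happens on the client side, for example through a ViewFs mount table, and is not shown here.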