
HDFS Federation

Explorer

In Hadoop 1, horizontal scaling of the NameNode was not possible. Because of this, there was a limit to how far one could horizontally scale the DataNodes in a cluster.

Hadoop 2 allows horizontal scaling of NameNodes using a technique called HDFS federation. Without HDFS federation, one cannot add DataNodes to a Hadoop cluster beyond a certain limit: a single NameNode's RAM eventually becomes fully occupied with the metadata/namespace information needed to manage all the DataNodes in the cluster. At that point, adding new DataNodes, i.e. horizontal scaling of DataNodes, becomes impossible, because the NameNode's RAM is already full and cannot hold the additional metadata/namespace information generated by managing more DataNodes. So the limiting factor is the memory capacity of the NameNode's RAM. HDFS federation comes to the rescue by dividing the metadata/namespace across multiple NameNodes, thereby offloading the part of an application's metadata/namespace information that has grown beyond the capacity of one NameNode's RAM onto a second NameNode's RAM.

Is my understanding correct?

If YES: the documentation says that NameNodes are independent and do not communicate with each other. In that case, who manages an application's metadata/namespace information that is spread across two NameNodes? There should be some sort of daemon that communicates with all the NameNodes and finds the portions of metadata split between them.

If NO: an application's metadata/namespace information does not get split; rather, it stays in a single NameNode's RAM. The next NameNode would be used only when the system encounters a situation where an application's metadata/namespace information cannot be fully accommodated in the current NameNode's RAM for all of its MapReduce jobs to execute successfully. If it does not fit, the system would have to try a NameNode whose RAM capacity is large enough to hold the job. So my take here is that HDFS federation is just a workaround and has not actually solved the issue.

Please confirm.
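
For reference, the namespace split described above shows up directly in a federated cluster's configuration: each NameNode gets its own nameservice ID, and every DataNode registers with all of them. A minimal sketch follows; the nameservice IDs ns1/ns2 and the hostnames are hypothetical, and these properties normally live in hdfs-site.xml rather than being set in code:

import org.apache.hadoop.conf.Configuration;

public class FederationConfSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("dfs.nameservices", "ns1,ns2"); // two independent namespaces
        conf.set("dfs.namenode.rpc-address.ns1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.ns2", "namenode2.example.com:8020");
        // Every DataNode registers with both NameNodes and keeps one block pool
        // per namespace; the NameNodes themselves never talk to each other.
        System.out.println(conf.get("dfs.nameservices"));
    }
}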

8 REPLIES

Super Guru

@Fasil Ahamed

Please see my replies below:

1. So the limiting factor is the memory capacity of the NameNode's RAM. HDFS federation comes to the rescue by dividing the metadata/namespace across multiple NameNodes, thereby offloading the part of an application's metadata/namespace information that has grown beyond the capacity of one NameNode's RAM onto a second NameNode's RAM.

Is my understanding correct?

Answer: Yes

2. If YES: the documentation says that NameNodes are independent and do not communicate with each other. In that case, who manages an application's metadata/namespace information that is spread across two NameNodes?

Answer: No one. The client needs to know the namespace it is connecting to, and hence the NameNode. There are multiple nameservices in HDFS federation. Even without HDFS federation, your client applications today connect to the NameNode using a nameservice. They do the same thing when HDFS federation is enabled; they just need to know which nameservice they are connecting to.
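
To make that concrete, here is a minimal sketch of a client explicitly choosing which namespace, and therefore which NameNode, it talks to; the nameservice IDs ns1/ns2 and the paths are hypothetical:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FederatedClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // loads core-site.xml/hdfs-site.xml
        // Each nameservice fronts an independent NameNode, i.e. an independent namespace.
        FileSystem userFs = FileSystem.get(new URI("hdfs://ns1/"), conf);
        FileSystem dataFs = FileSystem.get(new URI("hdfs://ns2/"), conf);
        // The client picks the namespace; no daemon coordinates metadata
        // between the two NameNodes.
        System.out.println(userFs.exists(new Path("/user/alice")));
        System.out.println(dataFs.exists(new Path("/projects/logs")));
    }
}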

Now, the question is how Hive or other client tools would know where the Hive metastore is, and what happens if there are external tables. When you enable HDFS federation, you use what's called ViewFS, which enables you to manage multiple namespaces by mounting file system locations from different NameNodes at logical mount points, similar to a Linux mount table. I would highly recommend reading the following two links (I like the first link better).

http://thriveschool.blogspot.com/2014/07/hdfs-federation.html

https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/ViewFs.html
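
As a concrete illustration of the mount-table idea, here is a minimal sketch; the cluster name clusterX, the nameservice IDs ns1/ns2, and the mounted paths are all hypothetical, and in practice these entries would live in core-site.xml rather than being set in code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ViewFsMountSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "viewfs://clusterX");
        // Mount /user on namespace ns1 and /data on namespace ns2.
        conf.set("fs.viewfs.mounttable.clusterX.link./user", "hdfs://ns1/user");
        conf.set("fs.viewfs.mounttable.clusterX.link./data", "hdfs://ns2/data");
        FileSystem fs = FileSystem.get(conf);
        // Paths now resolve through the mount table, much like a Linux mount
        // table: /user lands on ns1 while /data lands on ns2.
        System.out.println(fs.resolvePath(new Path("/data/sales")));
    }
}

With such a mount table in place, every client that uses the viewfs:// default file system sees one logical namespace, regardless of which NameNode actually owns a given path.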

Rising Star

Hi @mqureshi. How are the clients divided up between the NameNodes? Can the whole cluster still interact fully? E.g., if Pig and Hive connect to different nameservices/NameNodes, can they still operate on the same data in HDFS?

Super Guru
@Emily Sharpe

That's a great question. I have updated the answer. You would basically use ViewFS, which uses mount tables, similar to Linux, to solve the problem of using relative paths across different namespaces without needing to specify the NameNode URI.

I must admit that it gets ugly. That's why, unless you have thousands of nodes, this should ideally be avoided. Please share your motivation for considering federation; maybe there is a better and cleaner solution.

Rising Star

Hi @mqureshi. Thanks for your response. Personally, I have no motivation to use federation; I am just curious about it, as I see it mentioned occasionally and hadn't really come across a concrete example of its practical application and how it would work.

Explorer

Hi @mqureshi

Understood that the client needs to know upfront the name of the namespace it wants to work with; in other words, the namespace containing the data blocks needed to distribute the map tasks in parallel for a particular use case. So the naming of the namespace enters the Hadoop picture at the time of storing the data into the common HDFS storage cluster. The input data is split into input splits and stored as blocks on the DataNodes, and all the metadata generated by storing those splits across the DataNode cluster is added to the namespace on the corresponding NameNode.

Now suppose the input file we want to process is so immensely huge that the number of input splits is correspondingly large, and the resulting metadata for the storage references of all those splits grows beyond the NameNode's RAM capacity. In such a scenario, do we have any option other than the following: divide the input file so that the metadata size on each NameNode stays within its RAM capacity?

For example, say we have two federated NameNodes in our cluster: NN1 with 8 GB of RAM and NN2 with 6 GB of RAM. Our input file is so huge that its metadata would occupy 10 GB of RAM during HDFS storage. Since we cannot store this file in HDFS under a single namespace without partitioning it manually, we divide the input file into two equal proportions, such that the first part's metadata references (created during HDFS storage on NN1) occupy less than 8 GB of RAM, say 5 GB, and the second part's metadata references (created during HDFS storage on NN2) occupy less than 6 GB of RAM, say 5 GB.

Do we have auto-detection of such cases, to avoid manually partitioning the input data?
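
For scale, here is a back-of-the-envelope sketch of the arithmetic behind such a scenario, using the commonly quoted heuristic of roughly 150 bytes of NameNode heap per file/directory/block object; the 10 TB input size and all figures are purely illustrative:

public class NameNodeHeapEstimate {
    public static void main(String[] args) {
        final long BYTES_PER_OBJECT = 150L;                 // heuristic heap cost per namespace object
        final long BLOCK_SIZE = 128L * 1024 * 1024;         // default HDFS block size
        long inputBytes = 10L * 1024 * 1024 * 1024 * 1024;  // hypothetical 10 TB input file
        long blocks = inputBytes / BLOCK_SIZE;
        double heapGb = blocks * BYTES_PER_OBJECT / (1024.0 * 1024 * 1024);
        // Splitting the namespace across two federated NameNodes roughly halves
        // the heap each one must carry.
        System.out.printf("blocks=%,d  total heap ~%.4f GB  per-NN if split ~%.4f GB%n",
                blocks, heapGb, heapGb / 2);
    }
}

The point of the heuristic is that namespace metadata grows with the number of files and blocks rather than with raw data volume, so exhausting a NameNode's RAM typically takes billions of objects, not one large file.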


Explorer

Yes, it's my imagination, the result of looking at the bigger picture theoretically rather than from anything work-related. I was also trying to foresee a situation where it becomes necessary for NameNodes to talk to each other, namely when an application's huge namespace data gets partitioned (due to RAM constraints) and resides on other NameNodes. From your explanation and the links provided, it's clear that such a situation doesn't exist and is most likely not going to occur in real-world scenarios. Thus there is no need at all for NameNodes to talk to each other.