Can I configure HDFS Federation using Ambari? If not, how can I configure it in an existing cluster that was created with Ambari? I mean, if it's possible through some properties or from the command line.
As far as I know, it's not a supported feature in HDP.
Here's the best resource for Federation: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/Federation.html
Similar questions have been asked before: https://community.hortonworks.com/questions/11010/hdfs-federation-and-viewfs-support-for-multiple-cl...
It is not supported. Here's the latest documentation, http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_HDP_RelNotes/content/community_features.h... , which clearly states that community-driven features like Federation are not supported.
Short answer: no, as of now.
Your best shot is the Apache guide. There's also an official blog post: http://hortonworks.com/blog/an-introduction-to-hdfs-federation/
My head hurts just thinking of running at least two NNs, 3+ JNs, and 2 ZKFCs (assuming you could probably reuse the same ZK instances for each federated NN) for N federated members. Better buy a few more master nodes!
On the flip side, I'd always ask the "what are you trying to solve?" question. Many times people are imagining security/visibility benefits, but we can already achieve those with permissions (POSIX + ACLs, all ideally administered with Ranger). And space issues won't be solved by Federation either; we can still leverage quotas for that.
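To make that concrete, here is the sort of control I mean, straight from the command line (the path `/data/projectA` and the user `alice` are just placeholder examples; these commands need a running cluster):

```shell
# Cap the raw space /data/projectA may consume at 10 GB.
# Note: space quotas count replicated bytes, so at replication
# factor 3 this allows roughly 3.3 GB of actual file data.
hdfs dfsadmin -setSpaceQuota 10g /data/projectA

# Grant read/execute access to one specific user via an HDFS ACL
hdfs dfs -setfacl -m user:alice:r-x /data/projectA

# Show the quota, remaining quota, and current usage for the directory
hdfs dfs -count -q /data/projectA
```

Between ACLs for visibility and quotas for space, most of the common "we need Federation" asks are already covered.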
Obviously I don't know the use case or requirement that might be driving a real need for Federation, but the cost of ever pulling support for it into HDP and administering it from Ambari would be high, and I personally don't think it is worth it.
Can you please tell me: if I am using a Java application to request file upload/download operations against HDFS, where does the request go first? Is it ZooKeeper or the NameNode? Because if it's the NameNode, then I'd need to change the address (the respective NameNode's address) in my request URL every time the active/standby status changes. So I'm just wondering how I can use this effectively and reliably.
I just want to know for the sake of my own knowledge, as I am confused about the whole HA architecture.
It would be great if you could provide a workflow diagram or something similar.
Yes, the "hdfs dfs -put" (or "hadoop fs -put") command runs a Java application that itself uses the Hadoop client libraries. Under the covers, this app communicates with the NN for each block it needs to write to HDFS, and the NN hands back the specific DN names where it wants the replica copies stored. Then (again, for each block) the client writes to those DN processes (in a pipelined fashion) to get the actual block data persisted to disk.
The diagram at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#NameNode_and_D... shows some of this interaction, and companies like Hortonworks offer solid training to help with concepts like this. Additionally, I'm betting that http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1491901632/ has a detailed walkthrough as well.
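On the earlier failover question: with NameNode HA the client never talks to ZooKeeper (ZK is only used by the ZKFCs to elect the active NN). Instead, the client addresses a logical nameservice, and the client library transparently fails over between the NNs. A minimal hdfs-site.xml sketch of that client-side setup (the nameservice name "mycluster" and the hostnames are placeholders, not from your cluster):

```xml
<!-- Logical nameservice the client addresses, e.g. hdfs://mycluster/path -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<!-- The two NameNodes behind that nameservice -->
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>nn-host1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>nn-host2.example.com:8020</value>
</property>
<!-- Client-side class that retries against the other NN on failover -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With fs.defaultFS set to hdfs://mycluster in core-site.xml, your Java application just uses that URI and never hard-codes an individual NameNode address, so nothing in your code changes when the active/standby roles flip.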