Support Questions

Find answers, ask questions, and share your expertise

How do you specify a highly-available HDFS namespace in an Apache Falcon cluster definition

Contributor

For the following interfaces:

<interface type="readonly" endpoint="hftp://<host>:50070"/>

<interface type="write" endpoint="hdfs://<host>:8020" />

how should the endpoints be specified if we are pointing to a cluster with HDFS HA enabled?

1 ACCEPTED SOLUTION

Contributor

There are a couple of considerations to take into account when using NameNode HA with Falcon and Oozie. In all cases, use the HA nameservice ID when referring to the NameNode in the cluster XML; the NameNode IDs that make up each nameservice are listed in hdfs-site.xml under the property dfs.ha.namenodes.[nameservice ID]. For multi-cluster installs, you need to set up the NameNode HA nameservice details of every cluster in every cluster's configuration. For example, with two clusters, hdfs-site.xml on both cluster one and cluster two will contain both nameservice IDs; likewise, with three clusters, all three would carry three nameservice IDs. A two-cluster implementation would look similar to the following:

<property>
  <name>dfs.ha.namenodes.hacluster1</name>
  <value>c1nn1,c1nn2</value>
</property>
<property>
  <name>dfs.ha.namenodes.hacluster2</name>
  <value>c2nn1,c2nn2</value>
</property>
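
For the remote nameservice to be resolvable, the client configuration on each cluster also needs the NameNode addresses and failover settings for the other cluster's nameservice. A minimal sketch for hacluster2 as seen from cluster one (the hostnames below are placeholders, not values from this thread):

<property>
  <name>dfs.nameservices</name>
  <value>hacluster1,hacluster2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hacluster2.c2nn1</name>
  <value>c2nn1-host.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hacluster2.c2nn2</name>
  <value>c2nn2-host.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.hacluster2</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>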

Now, when you set up Falcon, provide both cluster definitions on both clusters.
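
As a rough sketch, a Falcon cluster entity for the remote cluster could then reference the nameservice ID in its HDFS endpoints, similar to the following (the colo, versions, hostnames, and paths are assumed placeholders, not values from this thread):

<cluster colo="drColo" description="DR cluster" name="hacluster2" xmlns="uri:falcon:cluster:0.1">
  <interfaces>
    <interface type="readonly" endpoint="hftp://hacluster2:50070" version="2.2.0"/>
    <interface type="write" endpoint="hdfs://hacluster2" version="2.2.0"/>
    <interface type="execute" endpoint="resourcemanager.example.com:8050" version="2.2.0"/>
    <interface type="workflow" endpoint="http://oozie.example.com:11000/oozie/" version="4.0.0"/>
    <interface type="messaging" endpoint="tcp://falcon.example.com:61616?daemon=true" version="5.1.6"/>
  </interfaces>
  <locations>
    <location name="staging" path="/apps/falcon/hacluster2/staging"/>
    <location name="temp" path="/tmp"/>
    <location name="working" path="/apps/falcon/hacluster2/working"/>
  </locations>
</cluster>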

5 REPLIES

Expert Contributor

If the nameservice is "myHA", the interfaces should be "hdfs://myHA".
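
For example (a sketch; the version value here is assumed):

<interface type="write" endpoint="hdfs://myHA" version="2.2.0"/>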

Contributor

Just to clarify, the cluster in question is different from the one where Falcon is running, i.e. it is a DR cluster we want to copy data to.

Rising Star

You can point to it directly via its address, or you can do as @bvellanki (balu) mentioned and use its HA nameservice. For example, if the HA nameservice for your backup cluster is called DRHA, your address would be hdfs://DRHA:8020. See below:

<interface type="readonly" endpoint="hftp://DRHA.company.com:50070" version="2.2.0"/>         
<interface type="write" endpoint="hdfs://DRHA.company.com:8020" version="2.2.0"/> 

You can also do this, depending on preference:

<interface type="readonly" endpoint="hftp://DRHA:50070" version="2.2.0"/>         
<interface type="write" endpoint="hdfs://DRHA:8020" version="2.2.0"/> 

Master Mentor

@dkjerrumgaard has this been resolved? Can you post your solution or accept the best answer?