How do you specify a highly-available HDFS namespace in an Apache Falcon cluster definition


How do you specify a highly available HDFS namespace in an Apache Falcon cluster definition for the following interfaces, when pointing to a cluster with HDFS HA enabled?

<interface type="readonly" endpoint="hftp://<host>:50070"/>

<interface type="write" endpoint="hdfs://<host>:8020" />

Accepted Solutions

Re: How do you specify a highly-available HDFS namespace in an Apache Falcon cluster definition

Explorer

There are a couple of considerations to take into account when using NameNode HA with Falcon and Oozie. In all cases, use the HA nameservice ID, not an individual NameNode host, when referring to the NameNode in the cluster XML. The nameservice ID is the value of dfs.nameservices in hdfs-site.xml; the NameNodes behind each nameservice are listed in the property dfs.ha.namenodes.[nameservice ID]. For multi-cluster installs, you need to set up the NameNode HA nameservice details of every cluster in all clusters. For example, with two clusters, hdfs-site.xml on both cluster one and cluster two will have two nameservice IDs. Likewise, with three clusters, all three would have three nameservice IDs. A two-cluster implementation would look similar to the following:

<property>
  <name>dfs.ha.namenodes.hacluster1</name>
  <value>c1nn1,c1nn2</value>
</property>
<property>
  <name>dfs.ha.namenodes.hacluster2</name>
  <value>c2nn1,c2nn2</value>
</property>

Now, when you set up Falcon, provide both cluster definitions on both clusters.
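For a client on one cluster to resolve the other cluster's nameservice, the dfs.ha.namenodes entries above are usually accompanied by a few more properties. The following is a minimal sketch for hacluster2 only, not a complete hdfs-site.xml: the host names (c2nn1.example.com, c2nn2.example.com) and the port 8020 are assumptions for illustration, and the same set of properties would be repeated for hacluster1.

```xml
<!-- hdfs-site.xml sketch; host names and ports are hypothetical placeholders -->
<property>
  <!-- every nameservice ID this client should know about -->
  <name>dfs.nameservices</name>
  <value>hacluster1,hacluster2</value>
</property>
<property>
  <!-- RPC address of the first NameNode in hacluster2 -->
  <name>dfs.namenode.rpc-address.hacluster2.c2nn1</name>
  <value>c2nn1.example.com:8020</value>
</property>
<property>
  <!-- RPC address of the second NameNode in hacluster2 -->
  <name>dfs.namenode.rpc-address.hacluster2.c2nn2</name>
  <value>c2nn2.example.com:8020</value>
</property>
<property>
  <!-- lets the client find whichever NameNode is currently active -->
  <name>dfs.client.failover.proxy.provider.hacluster2</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With entries like these in place, a path such as hdfs://hacluster2/some/dir should resolve from either cluster without naming a specific NameNode host.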



Re: How do you specify a highly-available HDFS namespace in an Apache Falcon cluster definition

Rising Star

If the nameservice is "myHA", the interfaces should be "hdfs://myHA".
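As a rough illustration of that suggestion, the interfaces from the question could be rewritten against the nameservice. This is only a fragment of a cluster entity, the version value is an assumption, and whether the readonly interface accepts the logical name may depend on your Hadoop and Falcon versions.

```xml
<!-- Fragment of a Falcon cluster entity; "myHA" is the nameservice ID from hdfs-site.xml -->
<interfaces>
  <!-- no host and no port: the HDFS client resolves the active NameNode from the nameservice -->
  <interface type="readonly" endpoint="hdfs://myHA" version="2.2.0"/>
  <interface type="write" endpoint="hdfs://myHA" version="2.2.0"/>
  <!-- other interfaces (execute, workflow, messaging) omitted -->
</interfaces>
```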

Re: How do you specify a highly-available HDFS namespace in an Apache Falcon cluster definition

Just to clarify, the cluster in question is different from the one where Falcon is running, i.e. it is a D/R cluster we want to copy data to.


Re: How do you specify a highly-available HDFS namespace in an Apache Falcon cluster definition

Contributor

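You can point to it directly via its address, or do as @bvellanki (balu) mentioned and refer to it by its HA nameservice. For example, if the nameservice for your backup cluster is called DRHA, your address would be hdfs://DRHA. Note that a port such as 8020 applies when the endpoint names a NameNode host; if DRHA is a logical HA nameservice rather than a resolvable host name, the HDFS client expects the URI without a port. See below: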

<interface type="readonly" endpoint="hftp://DRHA.company.com:50070" version="2.2.0"/>
<interface type="write" endpoint="hdfs://DRHA.company.com:8020" version="2.2.0"/>

You can also do this, depending on preference:

<interface type="readonly" endpoint="hftp://DRHA:50070" version="2.2.0"/>
<interface type="write" endpoint="hdfs://DRHA:8020" version="2.2.0"/>

Re: How do you specify a highly-available HDFS namespace in an Apache Falcon cluster definition

Mentor

@dkjerrumgaard has this been resolved? Can you post your solution or accept best answer?
