How to activate Knox HA for HDFS and WebHDFS?

1 ACCEPTED SOLUTION

Do you want to enable Knox High Availability (having multiple Knox instances) or do you want to enable HDFS HA within your Knox instance?

If the latter, check out the WebHDFS HA section of the Knox user guide: https://knox.apache.org/books/knox-0-6-0/user-guide.html#WebHDFS

To enable HA functionality for WebHDFS in Knox, add the following configuration to the topology file:

<provider>
   <role>ha</role>
   <name>HaProvider</name>
   <enabled>true</enabled>
   <param>
       <name>WEBHDFS</name>
       <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
   </param>
</provider>
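
To make the semicolon-separated value a bit more concrete, here is a small Python sketch that splits it into settings and estimates how long Knox could spend sleeping between attempts. It assumes, as I read the Knox user guide, that failoverSleep and retrySleep are given in milliseconds. The helper name parse_ha_params is made up for this illustration and is not part of Knox.

```python
def parse_ha_params(value: str) -> dict:
    """Split a HaProvider param value such as 'a=1;b=true' into a dict.

    Hypothetical helper for illustration only; Knox parses this itself.
    """
    settings = {}
    for pair in value.split(";"):
        if not pair.strip():
            continue  # tolerate empty segments such as a trailing ';'
        key, _, raw = pair.partition("=")
        raw = raw.strip()
        if raw.lower() in ("true", "false"):
            settings[key.strip()] = raw.lower() == "true"
        else:
            settings[key.strip()] = int(raw)
    return settings


value = ("maxFailoverAttempts=3;failoverSleep=1000;"
         "maxRetryAttempts=300;retrySleep=1000;enabled=true")
params = parse_ha_params(value)

# Assumption: both sleep settings are in milliseconds.
failover_wait_s = params["maxFailoverAttempts"] * params["failoverSleep"] / 1000
retry_wait_s = params["maxRetryAttempts"] * params["retrySleep"] / 1000

print(f"sleep across failover attempts: up to {failover_wait_s} s")
print(f"sleep across retry attempts: up to {retry_wait_s} s")
```

With the values above, that is roughly 3 seconds of sleeping across failover attempts and 300 seconds across retries, not counting the time the requests themselves take.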

In the service configuration itself, add the URLs of the standby nodes to the list. The URL that is active at the time of configuration should ideally be at the top of the list.

<service>
    <role>WEBHDFS</role>
    <url>http://{host1}:50070/webhdfs</url>
    <url>http://{host2}:50070/webhdfs</url>
</service>

Let me know if that helps

Jonas 😃

3 REPLIES


Dear Jonas,

Thanks for your reply. Yes, we wanted to enable HDFS HA and WebHDFS HA within our Knox instance.

We did follow those steps and it works like a charm for WebHDFS.

I was wondering whether there is anything else to do, given this documentation: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Knox_Gateway_Admin_Guide/content/service_... and this comment in it: "Both WEBHDFS and NAMENODE require a tag (ha-alias) in order to work in High Availability mode."

Maxime


@mlanciaux@hortonworks.com, that part of the documentation needs to be corrected. No such tag <ha_alias> exists in the topology.

Instead, the NAMENODE service should use the logical name of the HA nameservice, as defined by dfs.nameservices in hdfs-site.xml, as the value of its <url> tag.

So the topology file will have something like this:

<service>
  <role>NAMENODE</role>
  <url>my-ha-service</url>
</service>

where the hdfs-site.xml property and value look like this, for example:

<property>
    <name>dfs.nameservices</name>
    <value>my-ha-service</value>
</property>
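
If you want to sanity-check that the two files agree, a rough Python sketch along these lines could help. It assumes the snippets above sit inside the usual <topology> and <configuration> root elements and that the NAMENODE <url> holds the bare nameservice name as described here. The function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Inline stand-ins for the real files; the wrapping root elements are assumed.
TOPOLOGY = """<topology>
  <service>
    <role>NAMENODE</role>
    <url>my-ha-service</url>
  </service>
</topology>"""

HDFS_SITE = """<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>my-ha-service</value>
  </property>
</configuration>"""


def nameservices(hdfs_site_xml: str) -> list:
    """Return the names listed in dfs.nameservices (comma-separated)."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "dfs.nameservices":
            value = prop.findtext("value") or ""
            return [v.strip() for v in value.split(",") if v.strip()]
    return []


def namenode_urls(topology_xml: str) -> list:
    """Return the <url> values of every NAMENODE service in the topology."""
    root = ET.fromstring(topology_xml)
    return [
        (svc.findtext("url") or "").strip()
        for svc in root.iter("service")
        if (svc.findtext("role") or "").strip() == "NAMENODE"
    ]


def namenode_matches(topology_xml: str, hdfs_site_xml: str) -> bool:
    """True if every NAMENODE url is one of the configured nameservices."""
    names = nameservices(hdfs_site_xml)
    urls = namenode_urls(topology_xml)
    return bool(urls) and all(url in names for url in urls)


print(namenode_matches(TOPOLOGY, HDFS_SITE))
```

For real use you would read the two files from disk instead of the inline strings. The check is only a string comparison, so it says nothing about whether the NameNodes are actually reachable.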