
How to activate Knox HA for HDFS and WebHDFS ?


Contributor
 
1 ACCEPTED SOLUTION


Re: How to activate Knox HA for HDFS and WebHDFS ?

Do you want to enable Knox High Availability (having multiple Knox instances) or do you want to enable HDFS HA within your Knox instance?

If the latter, check out the WebHdfs HA section on https://knox.apache.org/books/knox-0-6-0/user-guide.html#WebHDFS

To enable HA functionality for WebHDFS in Knox the following configuration has to be added to the topology file:

<provider>
   <role>ha</role>
   <name>HaProvider</name>
   <enabled>true</enabled>
   <param>
       <name>WEBHDFS</name>
       <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
   </param>
</provider>

In the service configuration itself, the URLs of the standby nodes should be added to the list of URLs. The URL of the node that is active at the time of configuration should ideally be placed at the top of the list.

<service>
    <role>WEBHDFS</role>
    <url>http://{host1}:50070/webhdfs</url>
    <url>http://{host2}:50070/webhdfs</url>
</service>
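
As a rough sketch of how the two fragments above fit together in a single topology file: providers normally live inside the <gateway> element, and services sit next to it under <topology>. The comments reflect my reading of the Knox user guide, and the other providers a real topology needs (authentication, identity assertion and so on) are left out, so treat this as an illustration rather than a complete file.

```xml
<topology>
    <gateway>
        <!-- Other providers (authentication, identity-assertion, ...) omitted in this sketch -->
        <provider>
            <role>ha</role>
            <name>HaProvider</name>
            <enabled>true</enabled>
            <param>
                <!-- The param name matches the role of the service it applies to -->
                <name>WEBHDFS</name>
                <!-- Failover: switch to the next URL in the list.
                     Retry: re-send to the same URL (for example while the NameNode is in safe mode).
                     Sleep values are in milliseconds. -->
                <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
            </param>
        </provider>
    </gateway>

    <service>
        <role>WEBHDFS</role>
        <!-- Currently active NameNode first, standby second -->
        <url>http://{host1}:50070/webhdfs</url>
        <url>http://{host2}:50070/webhdfs</url>
    </service>
</topology>
```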

Let me know if that helps

Jonas =)


3 REPLIES

Re: How to activate Knox HA for HDFS and WebHDFS ?

Contributor

Dear Jonas,

Thanks for your reply. Yes, we wanted to enable HDFS HA and WebHDFS HA within our Knox instance.

We did follow those steps and it works like a charm for WebHDFS.

I was wondering whether anything else needs to be done, given this documentation: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Knox_Gateway_Admin_Guide/content/service_... and this comment: "Both WEBHDFS and NAMENODE require a tag (ha-alias) in order to work in High Availability mode."

Maxime


Re: How to activate Knox HA for HDFS and WebHDFS ?

Explorer

@mlanciaux@hortonworks.com, that part of the documentation needs to be corrected. No such tag <ha_alias> exists in the topology.

Instead the NAMENODE service should have the logical name of the HA service found via dfs.nameservices in hdfs-site.xml as the value of the <url> tag.

So the topology file will have something like this:

<service>
  <role>NAMENODE</role>
  <url>my-ha-service</url>
</service>

where the corresponding hdfs-site.xml property and value look like this, for example:

<property>
    <name>dfs.nameservices</name>
    <value>my-ha-service</value>
</property>
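
For context, here is a sketch of how that logical name usually ties back to the individual NameNodes in hdfs-site.xml. The NameNode IDs (nn1, nn2) and the host names are made up for illustration; only the property name patterns are standard HDFS HA settings, and your cluster will have its own values.

```xml
<configuration>
    <!-- Logical name of the HA nameservice; this is the value used in the NAMENODE <url> -->
    <property>
        <name>dfs.nameservices</name>
        <value>my-ha-service</value>
    </property>
    <!-- NameNode IDs belonging to that nameservice (nn1 and nn2 are hypothetical) -->
    <property>
        <name>dfs.ha.namenodes.my-ha-service</name>
        <value>nn1,nn2</value>
    </property>
    <!-- HTTP address of each NameNode (hypothetical hosts); these would correspond
         to the host:port pairs listed in the WEBHDFS service URLs -->
    <property>
        <name>dfs.namenode.http-address.my-ha-service.nn1</name>
        <value>host1.example.com:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.my-ha-service.nn2</name>
        <value>host2.example.com:50070</value>
    </property>
</configuration>
```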