Support Questions

Test HA on ResourceManager

New Contributor

Hi guys,

I'm trying to test HA on the ResourceManager service.

I have two instances of the ResourceManager.
All jobs work fine, but as soon as I shut down the first node (the one running the active ResourceManager), the cluster fails over to the second (standby) ResourceManager.

 

When the second ResourceManager is activated, I cannot start new jobs.


I am forced to restart the ResourceManager on the first node.

For information, I have a CDH cluster with OpenLDAP and a Kerberos server.
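For completeness, which ResourceManager is active can be verified from the command line; the HA IDs rm1/rm2 below are assumptions and come from yarn.resourcemanager.ha.rm-ids in yarn-site.xml:

```shell
# Check the HA state of each ResourceManager.
# rm1/rm2 are illustrative IDs; use the ones defined in
# yarn.resourcemanager.ha.rm-ids for this cluster.
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```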

Regards,

1 ACCEPTED SOLUTION

Accepted Solutions

New Contributor

Hi guys,

 

I found the solution: HA is not operational with the combination CDH5 / Kerberos / Isilon.

EMC confirmed the bug but has not found a solution.

 

 


6 REPLIES

Super Collaborator

What exception or error message is shown in the logs when you try to start a new job?

 

New Contributor

Hello Mathieu,

Here is the log output:

 

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/jars/hive-common-1.1.0-cdh5.8.0.jar!/hive-log4j.properties
OK
Time taken: 0.51 seconds
Query ID = richard_20170228204141_1d3b3cdd-b064-4b00-80c3-2c42d7bf1a16
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks not specified. Estimated from input data size: 17
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1488310804413_0001 to YARN : Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: zone.isilon.datalan.lan:8020, Ident: (HDFS_DELEGATION_TOKEN token 0 for richard)
        at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:244)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:578)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:573)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:564)
        at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:430)
        at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1782)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1539)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1318)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1127)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1115)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:220)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:318)
        at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:416)
        at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:432)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:726)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1488310804413_0001 to YARN : Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: zone.isilon.datalan.lan:8020, Ident: (HDFS_DELEGATION_TOKEN token 0 for richard)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:257)
        at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:290)
        at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
        ... 38 more
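One way to isolate the failure (a sketch, assuming a kinit'ed shell on an edge node; the output path and renewer name are illustrative) is to request the delegation token by hand, which is the same operation YARN attempts at job submission:

```shell
# Confirm the Kerberos ticket is valid before anything else
klist

# Fetch an HDFS delegation token with yarn as the renewer -- the step that
# fails in the stack trace above. If this fails against the Isilon endpoint,
# the problem is in token handling, not in YARN itself.
hdfs fetchdt --renewer yarn /tmp/test.token

# Print the token that was fetched
hdfs fetchdt --print /tmp/test.token
```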

Champion
The YARN service is unable to get the HDFS Delegation token on behalf of the users.

What values do you have for the following settings?

hadoop.proxyuser.yarn.hosts
hadoop.proxyuser.yarn.groups
hadoop.proxyuser.mapred.hosts
hadoop.proxyuser.mapred.groups
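If they are absent, they can be set through the HDFS core-site.xml safety valve in Cloudera Manager. A minimal sketch, with placeholder values (not values taken from this cluster):

```xml
<!-- Allow the yarn and mapred service users to act on behalf of end users.
     "*" is permissive; in production, restrict hosts to the ResourceManager
     and JobHistory nodes and groups to the actual user groups. -->
<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.mapred.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.mapred.groups</name>
  <value>*</value>
</property>
```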

New Contributor

I can't find these settings in Cloudera Manager, and they are not defined in core-site.xml.

New Contributor


This issue was resolved in OneFS 8.0.0.4.

 

See the release notes under Resolved Issues:

 

During failover to a secondary ResourceManager, HDFS MapReduce jobs might have been disrupted. This could have occurred because, during failover, OneFS renegotiated the connection to the ResourceManager using the same Kerberos ticket but with a different name. As a result, the request to connect to the secondary ResourceManager could not be authenticated and access was denied. (Bug ID: 181448)