Test HA on ResourceManager

Hi guys,

I'm trying to test the HA on the ResourceManager service.

I have two ResourceManager instances.
All jobs work fine, but as soon as I shut down the first node, which hosts the active ResourceManager, the cluster activates the second (standby) ResourceManager.

Once the second ResourceManager is active, I cannot start new jobs.

I am forced to restart the ResourceManager on the first node.

For information, I have a CDH cluster with OpenLDAP and a Kerberos server.

Regards,


Re: Test HA on ResourceManager

What exception or error message appears in the logs when you try to start a new job?

 


Re: Test HA on ResourceManager

Hello Mathieu,

Here is the log output:

 

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/jars/hive-common-1.1.0-cdh5.8.0.jar!/hive-log4j.properties
OK
Time taken: 0.51 seconds
Query ID = richard_20170228204141_1d3b3cdd-b064-4b00-80c3-2c42d7bf1a16
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks not specified. Estimated from input data size: 17
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1488310804413_0001 to YARN : Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: zone.isilon.datalan.lan:8020, Ident: (HDFS_DELEGATION_TOKEN token 0 for richard)
        at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:244)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:578)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:573)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:564)
        at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:430)
        at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1782)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1539)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1318)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1127)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1115)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:220)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:318)
        at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:416)
        at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:432)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:726)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1488310804413_0001 to YARN : Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: zone.isilon.datalan.lan:8020, Ident: (HDFS_DELEGATION_TOKEN token 0 for richard)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:257)
        at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:290)
        at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
        ... 38 more
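
The key line in the trace is the "Failed to renew token" message, which names the token kind and the service the token belongs to. As a rough sketch (the helper `parse_renew_failure` is hypothetical and not part of Hadoop), the fields can be pulled out of that message like this, which makes it easy to see that the token belongs to the HDFS endpoint on the Isilon zone rather than to YARN itself:

```python
import re

# Message copied from the stack trace above.
MESSAGE = (
    "Failed to submit application_1488310804413_0001 to YARN : "
    "Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, "
    "Service: zone.isilon.datalan.lan:8020, "
    "Ident: (HDFS_DELEGATION_TOKEN token 0 for richard)"
)

# Pattern written against the message format seen in this thread only;
# other Hadoop versions may word the message differently.
TOKEN_RE = re.compile(
    r"Failed to renew token: Kind: (?P<kind>\w+), "
    r"Service: (?P<service>[^,]+), "
    r"Ident: \((?P<ident>[^)]*)\)"
)


def parse_renew_failure(message):
    """Return the token kind, service and ident, or None if no match."""
    match = TOKEN_RE.search(message)
    return match.groupdict() if match else None


info = parse_renew_failure(MESSAGE)
print(info)
```

Run against a saved log, the same helper could be applied line by line to list every service whose token renewal failed.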

Re: Test HA on ResourceManager

The YARN service is unable to renew the HDFS delegation token on behalf of the user.

What values do you have for the settings below?

hadoop.proxyuser.yarn.hosts
hadoop.proxyuser.yarn.groups
hadoop.proxyuser.mapred.hosts
hadoop.proxyuser.mapred.groups
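
One way to check these four settings is to read them straight out of a core-site.xml file. The sketch below assumes the standard Hadoop `<configuration>/<property>/<name>/<value>` layout; the helper names and the sample values are illustrative only and do not come from this cluster. On a cluster managed by Cloudera Manager, the effective values may also be set elsewhere, so a missing key here is not conclusive.

```python
import xml.etree.ElementTree as ET

PROXYUSER_KEYS = [
    "hadoop.proxyuser.yarn.hosts",
    "hadoop.proxyuser.yarn.groups",
    "hadoop.proxyuser.mapred.hosts",
    "hadoop.proxyuser.mapred.groups",
]


def read_properties(xml_text):
    """Parse Hadoop-style configuration XML into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def proxyuser_report(xml_text):
    """Return each proxyuser key with its value, or None if undefined."""
    props = read_properties(xml_text)
    return {key: props.get(key) for key in PROXYUSER_KEYS}


# Illustrative sample, not the configuration of the cluster in this thread.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.proxyuser.yarn.hosts</name>
    <value>*</value>
  </property>
</configuration>"""

for key, value in proxyuser_report(SAMPLE).items():
    print(key, "=", value if value is not None else "(not defined)")
```

To inspect a real file, pass its contents instead of `SAMPLE`, for example `proxyuser_report(open(path).read())`, where `path` points at the core-site.xml in your client configuration directory.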

Re: Test HA on ResourceManager

I can't find these settings in Cloudera Manager, and they are not defined in core-site.xml.


Re: Test HA on ResourceManager

Hi guys,

 

I found the answer. ResourceManager HA does not work with the combination CDH5 / Kerberos / Isilon.

EMC has confirmed the bug but has not found a solution yet.

 

 


Re: Test HA on ResourceManager

This issue was resolved in OneFS 8.0.0.4.

 

See the resolved issue in the release notes:

 

During failover to a secondary ResourceManager, HDFS MapReduce jobs might have been disrupted. This could
have occurred because, during failover, OneFS renegotiated the connection to the ResourceManager using the
same Kerberos ticket but with a different name. As a result, the request to connect to the secondary
ResourceManager could not be authenticated and access was denied. (181448)
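
To tell whether a given OneFS release is at or past 8.0.0.4, the version strings can be compared numerically rather than as text. This is a minimal sketch with hypothetical helper names; it assumes purely numeric dotted version strings and assumes that releases later than 8.0.0.4 also carry the fix, which should be confirmed against the release notes for the release in question.

```python
def parse_version(text):
    """Turn a dotted version string such as '8.0.0.4' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Release named in this thread as containing the fix.
FIXED_IN = parse_version("8.0.0.4")


def has_failover_fix(onefs_version):
    """True if the version is numerically at or above 8.0.0.4."""
    return parse_version(onefs_version) >= FIXED_IN


for candidate in ["8.0.0.3", "8.0.0.4", "8.0.1.0"]:
    print(candidate, has_failover_fix(candidate))
```

Comparing tuples avoids the pitfall of string comparison, where "8.0.0.10" would sort before "8.0.0.4".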
