Resource Manager not allocating resources even though resources are available in the cluster

After restarting the YARN services, the jobs are in the running state.

The NodeManager logs show a connection-refused error from ats-hbase (the Timeline Server) when it tries to reach the RegionServer. I have checked, and the RegionServer port is 16020, but this service is still trying to connect to port 17020.
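To double-check which of the two ports actually accepts connections, a quick TCP probe from the NodeManager host can help. This is only a minimal sketch: the helper names `probe_port` and `probe_ports` are made up for illustration, and the host and ports in the usage comment are the ones mentioned in this post.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


def probe_ports(host, ports, timeout=3.0):
    """Probe several ports on one host and return a {port: reachable} mapping."""
    return {port: probe_port(host, port, timeout) for port in ports}


# Example usage (host and ports taken from the post; adjust for your cluster):
# print(probe_ports("ip-10-0-10-76.amer.o9solutions.local", [16020, 17020]))
```

A `False` result only means that nothing accepted a TCP connection on that port from the machine running the probe. It does not say which service is expected to listen there.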

Logs:

2018-10-26 09:36:58,460 INFO launcher.ContainerRelaunch (ContainerRelaunch.java:call(87)) - Relaunch container with workDir = /hadoop/hadoop/yarn/local/usercache/yarn-ats/appcache/application_1538447063871_0001/container_e38_1538447063871_0001_02_000002, logDir = /hadoop/hadoop/yarn/log/application_1538447063871_0001/container_e38_1538447063871_0001_02_000002, nmPrivateContainerScriptPath = /hadoop/yarn/local/nmPrivate/application_1538447063871_0001/container_e38_1538447063871_0001_02_000002/launch_container.sh, nmPrivateTokensPath = /hadoop/hadoop/yarn/local/nmPrivate/application_1538447063871_0001/container_e38_1538447063871_0001_02_000002/container_e38_1538447063871_0001_02_000002.tokens, pidFilePath = /hadoop/hadoop/yarn/local/nmPrivate/application_1538447063871_0001/container_e38_1538447063871_0001_02_000002/container_e38_1538447063871_0001_02_000002.pid
2018-10-26 09:36:58,463 INFO container.ContainerImpl (ContainerImpl.java:handle(2093)) - Container container_e38_1538447063871_0001_02_000002 transitioned from RELAUNCHING to RUNNING
2018-10-26 09:36:58,465 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:onStartMonitoringContainer(941)) - Starting resource-monitoring for container_e38_1538447063871_0001_02_000002
2018-10-26 09:36:58,465 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:buildCommandExecutor(370)) - launchContainer: [bash, /hadoop/hadoop/yarn/local/usercache/yarn-ats/appcache/application_1538447063871_0001/container_e38_1538447063871_0001_02_000002/default_container_executor.sh]
2018-10-26 09:36:58,518 WARN nodemanager.DefaultContainerExecutor utor.java:logOutput(541)) - Container id: container_e38_1538447063871_0001_02_000002
2018-10-26 09:36:58,518 INFO

2018-10-26 09:36:59,657 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(311)) - Exception from container-launch with container ID: container_e38_1538447063871_0001_02_000001 and exit code: 1
ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
    at org.apache.hadoop.util.Shell.run(Shell.java:902)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:294)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.relaunchContainer(DefaultContainerExecutor.java:345)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2018-10-26 09:36:59,657 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
2018-10-26 09:36:59,657 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Container id: container_e38_1538447063871_0001_02_000001
2018-10-26 09:36:59,657 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exit code: 1
2018-10-26 09:36:59,657 WARN launcher.ContainerLaunch (ContainerLaunch.java:handleContainerExitWithFailure(598)) - Container launch failed : Container exited with a non-zero exit code 1.
2018-10-26 09:36:59,660 INFO container.ContainerImpl (ContainerImpl.java:doRelaunch(1638)) - Relaunching Container container_e38_1538447063871_0001_02_000001. Remaining retry attempts(after relaunch) : -457. Interval between retries is 30000ms
2018-10-26 09:36:59,660 INFO container.ContainerImpl (ContainerImpl.java:handle(2093)) - Container container_e38_1538447063871_0001_02_000001 transitioned from RUNNING to RELAUNCHING
2018-10-26 09:37:00,389 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=6, retries=6, started=4192 ms ago, cancelled=false, msg=Call to ip-10-0-10-76.amer.o9solutions.local/10.0.10.76:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: ip-10-0-10-76.amer.o9solutions.local/10.0.10.76:17020, details=row 'prod.timelineservice.entity,yarn-ats!yarn_cluster!ats-hbase����h�<����h����!COMPONENT!!regionserver,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-10-0-10-76.amer.o9solutions.local,17020,1538447132436, seqNum=-1
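
For anyone digging through a longer NodeManager log, the refused endpoints can be pulled out and counted with a small script. This is a rough sketch that assumes the "Connection refused: host/ip:port" wording seen in the log above. The function name `refused_endpoints` and the log path in the usage comment are hypothetical.

```python
import re
from collections import Counter

# Matches the "Connection refused: <host>/<ip>:<port>" fragment seen in the log above.
REFUSED_RE = re.compile(
    r"Connection refused: (?P<host>[^/\s]+)/(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+)"
)


def refused_endpoints(lines):
    """Count (host, ip, port) tuples that appear in connection-refused messages."""
    counts = Counter()
    for line in lines:
        for match in REFUSED_RE.finditer(line):
            key = (match.group("host"), match.group("ip"), int(match.group("port")))
            counts[key] += 1
    return counts


# Example usage (the path is a placeholder, not taken from the post):
# with open("/path/to/nodemanager.log", errors="replace") as fh:
#     for (host, ip, port), n in refused_endpoints(fh).most_common():
#         print(f"{host} ({ip}) port {port}: refused {n} time(s)")
```

Opening the file with `errors="replace"` keeps the script from failing on the binary row-key bytes that show up in the `hbase:meta` lookup message.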