Support Questions

rsurti · ‎11-06-2024

Hi

I have a Hadoop cluster with the NameNode and ResourceManager on one server, a DataNode on another server, and Hive and Tez on a third server.

I am getting an error when running a query in Beeline. Below are the YARN logs; it keeps retrying the connection:

2024-10-31 15:57:49,806 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |rm.TaskSchedulerManager|: Creating TaskScheduler: Local TaskScheduler with clusterIdentifier=111101111
2024-10-31 15:57:49,813 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |rm.YarnTaskSchedulerService|: YarnTaskScheduler initialized with configuration: maxRMHeartbeatInterval: 1000, containerReuseEnabled: true, reuseRackLocal: true, reuseNonLocal: false, localitySchedulingDelay: 250, preemptionPercentage: 10, preemptionMaxWaitTime: 60000, numHeartbeatsBetweenPreemptions: 3, idleContainerMinTimeout: 5000, idleContainerMaxTimeout: 10000, sessionMinHeldContainers: 0
2024-10-31 15:57:49,817 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |client.RMProxy|: Connecting to ResourceManager at /0.0.0.0:8030
2024-10-31 15:57:50,834 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |ipc.Client|: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2024-10-31 15:57:51,836 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |ipc.Client|: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2024-10-31 15:57:52,837 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |ipc.Client|: Retrying connect to server: 0.0.0.0/0.0.0.0:8030.

Troubleshooting I have done so far:

Checked the yarn-site.xml file on all instances.

The hostname and all three ResourceManager addresses are set:

<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node1</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>node1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>node1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>node1:8031</value>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>node1:59392</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>124491</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>125</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>50115</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>54</value>
</property>

</configuration>
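Every host that submits or runs YARN work needs the same ResourceManager settings, so one way to double-check the copies is to parse them and diff their properties. The following is a minimal Python sketch. It assumes each host's yarn-site.xml has been copied locally, and the function names are hypothetical. It reads only the <name>/<value> pairs and ignores <final> and XInclude.

```python
import xml.etree.ElementTree as ET


def load_properties(path):
    """Return {name: value} for every <property> in a Hadoop *-site.xml file."""
    root = ET.parse(path).getroot()
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Strip whitespace so " node1 " and "node1" compare equal.
            props[name.strip()] = (value or "").strip()
    return props


def diff_properties(reference, other):
    """Return {name: (reference_value, other_value)} for properties that differ.

    A property missing from one side is reported as None on that side.
    """
    names = set(reference) | set(other)
    return {
        name: (reference.get(name), other.get(name))
        for name in sorted(names)
        if reference.get(name) != other.get(name)
    }
```

For example, diff_properties(load_properties("node1-yarn-site.xml"), load_properties("hive-yarn-site.xml")) lists every property that is missing or different in the Hive host's copy (both file names are hypothetical). Host-specific entries such as yarn.nodemanager.address may legitimately differ between hosts, but the ResourceManager addresses should match.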

Also checked:

telnet node1 8030

This works.

ping node1

This also works.

Checked /etc/hosts.

This also seems to be fine.
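The retry lines in the log show the client dialing 0.0.0.0:8030, not node1:8030. A telnet to node1 therefore proves that the ResourceManager is reachable, but it does not test the address that was actually used. The following is a minimal Python sketch of a telnet-style TCP probe that uses only the standard socket module. The function name is hypothetical. Run it on the host where the failing process runs and pass both targets.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Like `telnet host port`, this only opens and closes a TCP connection;
    it does not speak the YARN RPC protocol. DNS failures, refusals and
    timeouts all raise OSError subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, you can compare can_connect("node1", 8030) with can_connect("0.0.0.0", 8030). If the first succeeds and the second fails, that matches the log: the network is fine, but the process is using the wrong address.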

VidyaSargur · ‎11-07-2024

@rsurti, Welcome to our community! To help you get the best possible answer, I have tagged our experts @asish @udeshmukh who may be able to assist you further.

Please feel free to provide any additional information or details about your query. We hope that you will find a satisfactory solution to your question.

Regards,

Vidya Sargur,
Community Manager

udeshmukh · ‎11-07-2024

@rsurti

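The ApplicationMaster is trying to connect to the ResourceManager scheduler port on its own host (0.0.0.0 means any/local interface), and because it cannot reach the RM there, it fails. 0.0.0.0:8030 is the built-in default for yarn.resourcemanager.scheduler.address. That default is what a process falls back to when its loaded configuration contains neither yarn.resourcemanager.hostname nor the scheduler address.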

2024-10-31 15:57:52,837 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |ipc.Client|: Retrying connect to server: 0.0.0.0/0.0.0.0:8030.

This suggests a misconfiguration: the YARN configuration files on those hosts are either missing or do not have the proper contents.
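The following explains why a missing or incomplete yarn-site.xml produces exactly 0.0.0.0:8030. In Hadoop's yarn-default.xml, yarn.resourcemanager.hostname defaults to 0.0.0.0, and yarn.resourcemanager.scheduler.address defaults to ${yarn.resourcemanager.hostname}:8030. The Python sketch below mimics that lookup for the three RM addresses. It is a simplified model with hypothetical names, not Hadoop's Configuration class. It expands only ${name} references to other properties, not system properties.

```python
import re

# Defaults from Hadoop's yarn-default.xml for the properties discussed in this thread.
YARN_DEFAULTS = {
    "yarn.resourcemanager.hostname": "0.0.0.0",
    "yarn.resourcemanager.address": "${yarn.resourcemanager.hostname}:8032",
    "yarn.resourcemanager.scheduler.address": "${yarn.resourcemanager.hostname}:8030",
    "yarn.resourcemanager.resource-tracker.address": "${yarn.resourcemanager.hostname}:8031",
}

_VAR = re.compile(r"\$\{([^}]+)\}")


def effective_value(name, site_props, defaults=YARN_DEFAULTS, depth=0):
    """Resolve `name` as a site-over-default overlay.

    The value from site_props wins; otherwise the default is used. Any
    ${other.property} references are expanded recursively, and references
    that cannot be resolved are left as literal text.
    """
    if depth > 20:
        raise ValueError(f"variable expansion too deep for {name}")
    raw = site_props.get(name, defaults.get(name))
    if raw is None:
        return None
    return _VAR.sub(
        lambda m: effective_value(m.group(1), site_props, defaults, depth + 1) or m.group(0),
        raw,
    )
```

With an empty or missing yarn-site.xml, effective_value("yarn.resourcemanager.scheduler.address", {}) yields 0.0.0.0:8030, the address in the log. Setting only {"yarn.resourcemanager.hostname": "node1"} is enough to turn it into node1:8030. So any process that loaded its configuration without these entries will dial 0.0.0.0, even if the file on disk is now correct.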

Have you run CM > YARN > Actions > Deploy Client Configuration? If not, could you try it?

@VidyaSargur We might need YARN experts on this.

rsurti · ‎11-07-2024

Thanks for the suggestion; the issue has been resolved. We had added a new DataNode and then restarted the NameNode, ResourceManager, DataNode, and NodeManager, but not HiveServer. Because of this, Hive had not loaded the configuration properly. After restarting HiveServer, the query started working.

Unable to connect to resourcemanager at 0.0.0.0:8030 when running query on beeline with tez