Created 11-06-2024 11:21 PM
Hi
I have a Hadoop cluster with the NameNode and ResourceManager on one server, a DataNode on another server, and Hive and Tez on a different server.
I am getting an error when running a query in Beeline. Below are the YARN logs; it keeps retrying the connection:
2024-10-31 15:57:49,806 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |rm.TaskSchedulerManager|: Creating TaskScheduler: Local TaskScheduler with clusterIdentifier=111101111
2024-10-31 15:57:49,813 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |rm.YarnTaskSchedulerService|: YarnTaskScheduler initialized with configuration: maxRMHeartbeatInterval: 1000, containerReuseEnabled: true, reuseRackLocal: true, reuseNonLocal: false, localitySchedulingDelay: 250, preemptionPercentage: 10, preemptionMaxWaitTime: 60000, numHeartbeatsBetweenPreemptions: 3, idleContainerMinTimeout: 5000, idleContainerMaxTimeout: 10000, sessionMinHeldContainers: 0
2024-10-31 15:57:49,817 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |client.RMProxy|: Connecting to ResourceManager at /0.0.0.0:8030
2024-10-31 15:57:50,834 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |ipc.Client|: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2024-10-31 15:57:51,836 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |ipc.Client|: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2024-10-31 15:57:52,837 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |ipc.Client|: Retrying connect to server: 0.0.0.0/0.0.0.0:8030.
Here are the troubleshooting steps I have done so far.
I checked the yarn-site.xml file on all instances.
The hostname and all three ResourceManager addresses are set:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node1</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>node1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>node1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>node1:8031</value>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>node1:59392</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>124491</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>125</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>50115</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>54</value>
</property>
</configuration>
I also checked:
telnet node1 8030
This works.
ping node1
This also works.
I checked /etc/hosts.
This also seems to be fine.
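Reading yarn-site.xml by eye does not show which scheduler address a client would actually derive from it. A minimal Python sketch of that check follows. It assumes the defaults documented in yarn-default.xml (`yarn.resourcemanager.hostname` defaults to `0.0.0.0` and the scheduler port to 8030). The helper names are hypothetical, and the sketch ignores Hadoop's `${...}` variable substitution and ResourceManager HA settings.

```python
import xml.etree.ElementTree as ET

# Assumed defaults, as documented in yarn-default.xml.
DEFAULT_RM_HOSTNAME = "0.0.0.0"
DEFAULT_SCHEDULER_PORT = 8030


def load_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style <configuration> XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def effective_scheduler_address(props):
    """Scheduler address a client would derive from these properties.

    Simplified: an explicit scheduler address wins, otherwise the RM
    hostname (or its default) is combined with the default port.
    """
    explicit = props.get("yarn.resourcemanager.scheduler.address")
    if explicit:
        return explicit
    host = props.get("yarn.resourcemanager.hostname", DEFAULT_RM_HOSTNAME)
    return f"{host}:{DEFAULT_SCHEDULER_PORT}"


def is_wildcard(address):
    """True if the host part is the 0.0.0.0 wildcard address."""
    return address.split(":")[0] == "0.0.0.0"
```

Running `effective_scheduler_address(load_properties(text))` on the file contents from each host should give `node1:8030` for the configuration above. A wildcard result would mean the file on that host lacks the ResourceManager settings. Note that this only inspects the file on disk, not what an already running process has loaded.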
Created 11-07-2024 09:41 PM
@rsurti, Welcome to our community! To help you get the best possible answer, I have tagged our experts @asish @udeshmukh who may be able to assist you further.
Please feel free to provide any additional information or details about your query. We hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur,
Created 11-07-2024 09:56 PM
The ApplicationMaster is trying to connect to the ResourceManager at 0.0.0.0 (the wildcard address, which in practice means the local host). No RM is reachable there, so the connection fails.
2024-10-31 15:57:52,837 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |ipc.Client|: Retrying connect to server: 0.0.0.0/0.0.0.0:8030.
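The above suggests a misconfiguration: the YARN config files are missing or incomplete on those hosts. 0.0.0.0:8030 is the default scheduler address, which a client falls back to when it has not loaded any ResourceManager hostname or scheduler address.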
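To spot this symptom quickly in a longer application log, the RMProxy line can be extracted with a regular expression. This is only a sketch: the pattern is written against the `Connecting to ResourceManager at /0.0.0.0:8030` line quoted in this thread, and the function names are hypothetical.

```python
import re

# Written against the RMProxy line quoted in this thread:
#   "Connecting to ResourceManager at /0.0.0.0:8030"
# The part before the slash (a hostname, empty in that line) is optional.
RM_CONNECT = re.compile(
    r"Connecting to ResourceManager at "
    r"(?P<host>[^/\s]*)/(?P<ip>[^:\s]+):(?P<port>\d+)"
)


def rm_targets(log_text):
    """Return the (address, port) pairs the client tried to reach."""
    return [
        (m.group("ip"), int(m.group("port")))
        for m in RM_CONNECT.finditer(log_text)
    ]


def wildcard_targets(log_text):
    """Return only the targets that use the 0.0.0.0 wildcard address."""
    return [target for target in rm_targets(log_text) if target[0] == "0.0.0.0"]
```

A non-empty result from `wildcard_targets` on a container log would point to the same cause: the process that launched the application did not have the ResourceManager address in the configuration it loaded.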
Have you run CM > YARN > Actions > Deploy Client Configuration? If not, could you try it?
@VidyaSargur We might need YARN experts on this.
Created 11-07-2024 10:29 PM
Thanks for the suggestion, the issue has been resolved. We had added a new DataNode and then restarted the NameNode, ResourceManager, DataNode, and NodeManager, but not HiveServer. Because of that, the configuration was not loaded properly on Hive. After restarting HiveServer, it started working.