Created 07-06-2018 01:56 PM
I realized from the following logs that Zeppelin cannot connect to the YARN ResourceManager. I checked the YARN configs in Ambari:
yarn.resourcemanager.address=hadooptest.datalonga.com:8050
I also tested with telnet and could not connect to hadooptest.datalonga.com:8050 either.
Zeppelin seems to be running, but I don't understand why the Zeppelin log directory is full of INFO logs like these.
The logs are taken from /var/log/zeppelin/zeppelin-interpreter-spark2-sirin_erkan-spark-zeppelin-hadooptest01:
INFO [2018-07-06 14:45:58,298] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:45:59,299] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:00,300] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:01,302] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:02,303] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:03,304] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:04,305] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:05,306] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:06,308] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:07,309] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:08,311] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 10 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:09,312] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 11 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:10,313] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 12 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:11,315] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 13 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:12,316] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 14 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:13,317] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 15 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:14,318] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 16 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:15,319] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 17 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:16,321] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 18 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:17,322] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 19 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:18,323] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 20 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:19,324] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 21 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:20,325] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 22 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:21,326] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 23 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:22,328] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 24 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:23,329] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 25 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:24,331] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 26 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:25,332] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 27 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
INFO [2018-07-06 14:46:26,333] ({pool-2-thread-2} Client.java[handleConnectionFailure]:906) - Retrying connect to server: hadooptest.datalonga.com/10.07.07.3:8050. Already tried 28 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
Created 07-06-2018 01:59 PM
The Zeppelin UI does not depend on YARN, so it is expected to start and work fine even if YARN is down.
However, some interpreters, such as spark/spark2, launch applications on YARN. The log above is from the spark2 interpreter:
/var/log/zeppelin/zeppelin-interpreter-spark2-sirin_erkan-spark-zeppelin-hadooptest01
While the spark2 application is being submitted, it is normal to see these messages if there is a connectivity issue with YARN.
HTH
Created 07-07-2018 04:12 AM
Thank you for your reply @Felix Albani. But these are INFO logs, not ERROR. The problem is the flood of unnecessary INFO logs caused by failing to connect to the ResourceManager. On my Sandbox the ResourceManager address is sandbox-hdp.hortonworks.com:8032, and when I test it with telnet it works fine, which is not the case in my prod env.
[root@sandbox-hdp starter]# telnet sandbox-hdp.hortonworks.com 8032
Trying 172.17.0.2...
Connected to sandbox-hdp.hortonworks.com.
Escape character is '^]'.
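If telnet is not installed on a node, roughly the same reachability test can be sketched in Python. This is only an illustration: the function name can_connect and the 3 second timeout are my own choices, not anything from Zeppelin or Hadoop.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    can_connect is a hypothetical helper name; it only checks that something
    accepts TCP connections on the port, not that it is a ResourceManager.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


# Example usage with the two addresses mentioned in this thread:
#   can_connect("sandbox-hdp.hortonworks.com", 8032)
#   can_connect("hadooptest.datalonga.com", 8050)
```

Running it from the Zeppelin host against the address in yarn.resourcemanager.address should show the same result as the telnet test.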
Created 07-09-2018 12:50 PM
@Erkan ŞİRİN yes, it is logged at INFO while retrying, and then it will error once the number of retries is exhausted. Check the value of yarn.resourcemanager.address in yarn-site.xml. Make sure there is no firewall/network issue between the Zeppelin node and YARN.
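To illustrate the "INFO while retrying, error at the end" behaviour, here is a minimal Python sketch of a retry loop with a maximum count and a fixed sleep. It is not Hadoop's actual client code; retry_fixed_sleep and its parameters are hypothetical names, with defaults taken from the maxRetries=50 and sleepTime=1000 MILLISECONDS values shown in the log.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_fixed_sleep(
    action: Callable[[], T],
    max_retries: int = 50,
    sleep_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action until it succeeds or max_retries retries have been used.

    Each failed attempt is reported as an INFO-style message; only when the
    retries are exhausted is the ConnectionError raised to the caller.
    """
    attempt = 0
    while True:
        try:
            return action()
        except ConnectionError:
            if attempt >= max_retries:
                # Retries exhausted: this is where the real error surfaces.
                raise
            print(f"INFO Retrying connect. Already tried {attempt} time(s)")
            attempt += 1
            sleep(sleep_seconds)
```

With the defaults above, a server that never answers would produce about 50 INFO lines, roughly one per second, before the error is raised.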
Created 07-12-2018 07:49 AM
Hi @Felix Albani. There are no network or firewall issues. How did I find out? I ran the following command on the ResourceManager host:
netstat -an | grep 8050
It returned nothing.
So no service is running on port 8050, and nothing is listening there.
Created 07-12-2018 12:25 PM
@Erkan ŞİRİN in the description above you mentioned that yarn.resourcemanager.address=hadooptest.datalonga.com:8050, so if nothing is listening on that port, perhaps the RM is not starting correctly? Also, do you have RM HA?
HTH
Created on 07-16-2018 11:08 AM - edited 08-18-2019 02:17 AM
Hi @Felix Albani, I use YARN HA. All YARN services, including ResourceManager, are up and running.
Created 07-16-2018 12:26 PM
@Erkan ŞİRİN The property yarn.resourcemanager.address is redundant when using YARN RM HA.
Please review this link https://community.hortonworks.com/questions/39671/what-is-the-default-yarn-resource-manager-port-is....
Based on this, you may want to check whether yarn-site.xml on the Zeppelin host contains valid settings. Also check the /etc/spark/conf directory for a copy of core-site.xml/yarn-site.xml that might be outdated.
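As a rough aid for that check, the sketch below reads a Hadoop-style yarn-site.xml and lists the ResourceManager addresses a client would be expected to use. The helper names load_hadoop_conf and resource_manager_addresses are hypothetical. It assumes the standard HA properties yarn.resourcemanager.ha.enabled, yarn.resourcemanager.ha.rm-ids, yarn.resourcemanager.address.<rm-id> and yarn.resourcemanager.hostname.<rm-id>, and falls back to the stock Hadoop default port 8032 when only a hostname is set (your distribution may use a different default, such as 8050).

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def load_hadoop_conf(xml_text: str) -> Dict[str, str]:
    """Parse <configuration><property><name>/<value> pairs into a dict."""
    root = ET.fromstring(xml_text)
    conf: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def resource_manager_addresses(conf: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return the RM client addresses, keyed by rm-id (or 'default' without HA)."""
    ha_enabled = conf.get("yarn.resourcemanager.ha.enabled", "false").lower() == "true"
    if not ha_enabled:
        return {"default": conf.get("yarn.resourcemanager.address")}

    rm_ids = [i.strip() for i in conf.get("yarn.resourcemanager.ha.rm-ids", "").split(",") if i.strip()]
    result: Dict[str, Optional[str]] = {}
    for rm_id in rm_ids:
        address = conf.get(f"yarn.resourcemanager.address.{rm_id}")
        if not address:
            # Assumption: fall back to hostname plus the stock default port 8032.
            hostname = conf.get(f"yarn.resourcemanager.hostname.{rm_id}")
            address = f"{hostname}:8032" if hostname else None
        result[rm_id] = address
    return result
```

For example, you could pass it the contents of the yarn-site.xml under /etc/spark/conf on the Zeppelin host and compare the result with what Ambari shows for the cluster.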
HTH