Created on 02-27-2018 12:41 PM - edited 08-17-2019 05:30 PM
We are currently performing a rolling upgrade to HDP 2.6.4.
During the upgrade, the process stopped because of a service check issue.
We see the same issue on all services:
Python script has been killed due to timeout after waiting 300 secs
How can we extend the timeout, or otherwise avoid this problem?
Created 02-27-2018 12:58 PM
The default timeout for Python-based service checks is defined as 300 seconds, so one option is to increase it to a higher value such as 450 or 600 and see if that works. The value is set in the service's metainfo.xml, for example for YARN:
# grep -A3 -B2 'service_check.py' /var/lib/ambari-server/resources/common-services/YARN/2.1.0.2.0/metainfo.xml
<commandScript>
  <script>scripts/service_check.py</script>
  <scriptType>PYTHON</scriptType>
  <timeout>300</timeout>
</commandScript>
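As a rough sketch of how the timeout could be raised, the helper below backs up a metainfo.xml file and rewrites the `<timeout>` value only inside the block that references service_check.py. The function name `set_service_check_timeout` is hypothetical (it is not an Ambari tool), and the sketch assumes the XML is laid out one element per line, as in the grep output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ambari): raise the service-check timeout
# in a metainfo.xml file. Assumes one XML element per line.
set_service_check_timeout() {
  local file="$1" new_timeout="$2"

  # Refuse anything that is not a plain positive integer.
  case "$new_timeout" in
    ''|*[!0-9]*) echo "timeout must be a number of seconds" >&2; return 1 ;;
  esac

  # Keep a backup so the original value can be restored.
  cp -p "$file" "$file.bak" || return 1

  # Only touch <timeout> between the service_check.py line and the
  # closing </commandScript>; other command scripts keep their timeouts.
  sed "/service_check\.py/,/<\/commandScript>/ s|<timeout>[0-9]*</timeout>|<timeout>${new_timeout}</timeout>|" \
    "$file.bak" > "$file"
}

# Example usage (path taken from the grep command above):
# set_service_check_timeout /var/lib/ambari-server/resources/common-services/YARN/2.1.0.2.0/metainfo.xml 600
# ambari-server restart
```

The restart is there on the assumption that Ambari Server only re-reads stack definitions at startup; during a rolling upgrade it is worth confirming that a restart is safe at that point. Other services have their own metainfo.xml files, so the same change would need to be repeated for each one that times out.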
However, the problem here looks like it may be on the YARN side, because a service check normally does not take the full 300 seconds. It would be better to check the health of HDFS and YARN by looking at the logs for any reported errors.
Created 02-27-2018 01:04 PM
The YARN service check failed, but the HDFS one did not. Also, when we restart YARN it restarts successfully, so I guess a service check can fail even though the service restart completed?
Created 02-27-2018 01:07 PM
Yes, service checks can still fail even after a successful restart of the YARN service. This is because the YARN service check runs jobs like the following, which might fail due to memory issues (even though the RM and NM may be running fine).
Example:
# yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls -num_containers 1 -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -timeout 300000 --queue default
So if the service check is failing, we should check the logs to find out why it failed. We might see errors in the YARN logs indicating memory issues, container creation issues, or something else.
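One possible way to start that log review is sketched below. The helper name `recent_errors` is hypothetical, the ERROR/FATAL pattern assumes the usual log4j layout with the level surrounded by spaces, and the log path in the usage comment is an assumption based on the common HDP default location, so adjust both to your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recent ERROR/FATAL lines from a log file.
# Assumes log4j-style lines such as "2018-02-27 13:00:00,123 ERROR ...".
recent_errors() {
  local log_file="$1" count="${2:-20}"
  grep -E ' (ERROR|FATAL) ' "$log_file" | tail -n "$count"
}

# Example usage (log location is an assumption; check your own hosts):
# recent_errors /var/log/hadoop-yarn/yarn/yarn-yarn-resourcemanager-*.log
#
# The failed service-check application itself can be inspected with the
# standard YARN CLI; replace <application_id> with the ID from the list:
# yarn application -list -appStates FAILED,KILLED
# yarn logs -applicationId <application_id>
```

If the ResourceManager log shows nothing useful, the NodeManager logs on the host that ran the container are the next place to look.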