
Rolling upgrade process stops on service check


We are currently performing a rolling upgrade to HDP-2.6.4.

During the upgrade, the process stopped because of a service check issue.

We see the same issue on all services:

Python script has been killed due to timeout after waiting 300 secs

How can we extend the timeout, or what else can we do to avoid this problem?


Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

The default timeout for Python-based service checks is 300 seconds, so one option is to raise it to a higher value such as 450 or 600 and see whether that works:

# grep -A3 -B2 'service_check.py' /var/lib/ambari-server/resources/common-services/YARN/2.1.0.2.0/metainfo.xml 
      <commandScript>
        <script>scripts/service_check.py</script>
        <scriptType>PYTHON</scriptType>
        <timeout>300</timeout>
      </commandScript>
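
As a minimal sketch of how that value could be raised: the helper below (the name `bump_service_check_timeout` is hypothetical) rewrites only the `<timeout>` that belongs to the `service_check.py` command script and keeps a `.bak` copy of the file. It assumes the `<timeout>` element follows the `<script>` element inside the same `<commandScript>` block, as in the output above, and that the path matches your stack version. The Ambari server typically has to be restarted before it picks up changes to `metainfo.xml`.

```shell
# Hypothetical helper: raise the service check timeout in a metainfo.xml file.
# Usage: bump_service_check_timeout <path-to-metainfo.xml> <new-timeout-seconds>
bump_service_check_timeout() {
    file="$1"
    new_timeout="$2"
    # Restrict the substitution to the range that starts at the service_check.py
    # line and ends at the closing </commandScript>, so other timeouts in the
    # file are left alone. The original file is saved as <file>.bak.
    sed -i.bak "/service_check\.py/,/<\/commandScript>/ s#<timeout>[0-9]*</timeout>#<timeout>${new_timeout}</timeout>#" "$file"
}

# Example (run on the Ambari server host; the path is taken from the grep above):
#   bump_service_check_timeout /var/lib/ambari-server/resources/common-services/YARN/2.1.0.2.0/metainfo.xml 600
#   ambari-server restart
```

Raising the timeout only hides the symptom if the check is slow because of an underlying cluster problem, so it is worth restoring the `.bak` file once the root cause is found.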


However, the problem here looks like it may be on the YARN side, because a service check normally does not take the full 300 seconds. It would be better to check the health of HDFS and YARN by looking at the logs for any reported errors.


3 REPLIES


The service check failed for YARN but not for HDFS. Also, when we restarted YARN, it restarted successfully, so I guess a service check can fail even though the service restart completed.

Michael-Bronson

Master Mentor

@Michael Bronson

Yes, service checks can still fail even after a successful restart of the YARN service. This is because the YARN service check runs jobs like the following, which might fail due to memory issues (even though the RM and NM might be running fine).

Example:

# yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls -num_containers 1 -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -timeout 300000 --queue default


So if the service check is failing, we should check the logs to find out why it failed. The YARN logs might show errors indicating memory issues, container creation issues, or something else.
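
A rough sketch of such a log scan follows. The helper name `scan_yarn_logs` is hypothetical, and the search patterns are only a starting point. The log directory in the usage comment is an assumption (a common default on HDP hosts), so adjust it to wherever your ResourceManager and NodeManager logs actually live.

```shell
# Hypothetical helper: print the most recent error-like lines from the
# *.log files in a given directory.
# Usage: scan_yarn_logs <log-directory>
scan_yarn_logs() {
    log_dir="$1"
    # -E enables alternation, -n prefixes each hit with its line number.
    # tail keeps the output short by showing only the last 50 matches.
    grep -E -n 'ERROR|FATAL|OutOfMemoryError' "$log_dir"/*.log | tail -n 50
}

# Example (the directory is an assumption, not taken from this thread):
#   scan_yarn_logs /var/log/hadoop-yarn/yarn
```

If the scan turns up memory or container allocation errors around the time of the failed check, those are more likely the real cause than the 300 second limit itself.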