HDFS and other service checks fail with timeout error

Rising Star

I am in the process of installing an 8-node HDP 2.4 cluster, administered with Ambari 2.2. When I run service checks for the following services:

HDFS

MapReduce2

Hive

Oozie

I get the following error message:

Python script has been killed due to timeout after waiting 300 secs

That is the only error message shown on stderr or stdout. I have checked here and here and followed the advice given there to increase the agent task timeout. However, this has not improved things at all. Does anybody have any advice on how I can resolve this?

2 REPLIES

Re: HDFS and other service checks fail with timeout error

Guru

Usually, when you get a timeout, the solution is not to increase the timeout but to find the underlying problem. Check your logs for the command that the script runs and try running it yourself. A timeout can have many causes (DNS, iptables, etc.), and they may be unrelated to one another.
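To rule out the DNS and firewall causes mentioned above, something like the following could be run from the node where the service check executes. It is only a minimal sketch: `check_host` is a hypothetical helper name, and it tests nothing more than whether a hostname resolves and whether a TCP connection to a given port succeeds.

```python
import socket


def check_host(host, port, timeout=5.0):
    """Return (ip, reachable) for host:port.

    ip is None when the hostname does not resolve (a DNS problem).
    reachable is False when the TCP connection fails or times out,
    which can indicate a firewall rule (e.g. iptables) or a service
    that is not listening.
    """
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        return None, False
    try:
        # create_connection raises OSError on refusal or timeout
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False
```

For example, `check_host("namenode.example.com", 8020)` would test a NameNode RPC endpoint; both the hostname and the port here are assumptions and should be replaced with the values from your own cluster configuration. A result of `(None, False)` points to name resolution, while an IP with `False` points to the network path or the service itself.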

Re: HDFS and other service checks fail with timeout error

Explorer

It is usually best to check the logs on the slave nodes to see whether the slave services are still running. If all the NodeManagers are down, for example, the YARN-dependent service checks will time out, because a job can be submitted to the ResourceManager but is never actually run.
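As a quick way to see whether any NodeManagers are up, the ResourceManager REST API can be queried for node states. The sketch below assumes the ResourceManager web address is reachable from where you run it (the URL in the usage note is a placeholder, with 8088 as the commonly used default port); `fetch_nodes` and `summarize_node_states` are hypothetical helper names.

```python
import json
from collections import Counter
from urllib.request import urlopen


def summarize_node_states(payload):
    """Count NodeManagers by state from a /ws/v1/cluster/nodes response."""
    # The "nodes" value can be null when no NodeManager has registered.
    nodes = (payload.get("nodes") or {}).get("node") or []
    return Counter(node.get("state", "UNKNOWN") for node in nodes)


def fetch_nodes(rm_url, timeout=10):
    """Fetch the node list from the ResourceManager at rm_url."""
    url = rm_url.rstrip("/") + "/ws/v1/cluster/nodes"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `summarize_node_states(fetch_nodes("http://resourcemanager.example.com:8088"))` returns a count per state. If no node is reported as `RUNNING`, submitted jobs have nowhere to execute, which would match the timeout behaviour described above.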