Created on 04-03-2017 03:57 PM
ISSUE: All of the datanodes in the cluster are shown as down in Ambari dashboard but hdfs is up, available and healthy.
But the output of ps command shows that the service is actually running on the datanodes
[root@node2 ~]# ps -ef|grep datanode hdfs 11150 1 0 Aug09 ? 00:03:01 /usr/jdk64/jdk1.8.0_60/bin/java -Dproc_datanode -Xmx1024m -Dhdp.version=2.3.6.0-3796 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.6.0-3796/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.6.0-3796/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.6.0-3796/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.6.0-3796 -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop-hdfs-datanode-node2.openstacklocal.log -Dhadoop.home.dir=/usr/hdp/2.3.6.0-3796/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.3.6.0-3796/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.6.0-3796/hadoop/lib/native:/usr/hdp/2.3.6.0-3796/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.6.0-3796/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608091928 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608091928 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608091928 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
SOLUTION: This issue is resolved in Ambari 2.2.2.0.
WORKAROUND: This issue has been resolved in Ambari 2.2.2.0. For previous versions you can resolve this issue by making the following changes
1. Make a backup of the /usr/lib/python2.6/site-packages/ambari_agent/PythonReflectiveExecutor.py script
cp /usr/lib/python2.6/site-packages/ambari_agent/PythonReflectiveExecutor.py /usr/lib/python2.6/site-packages/ambari_agent/PythonReflectiveExecutor.py_ORIG
2. Remove the file /usr/lib/python2.6/site-packages/ambari_agent/PythonReflectiveExecutor.pyc
rm /usr/lib/python2.6/site-packages/ambari_agent/PythonReflectiveExecutor.pyc
3. Edit the /usr/lib/python2.6/site-packages/ambari_agent/PythonReflectiveExecutor.py script and make the following changes.
Change this:
sys.path.append(self.script_dir)to this
sys.path.insert(0, self.script_dir)
4. Restart the ambari-agent on the host
ROOT CAUSE: In this particular case the installation of additional python modules via 'pip install unixODBC-devel' resulted in ambari cached hdfs.py conflicts with python hdfs libraries. This issue is outlined in the following bugs reported by Hortonworks and Apache.
Hortonworks: BUG-54974 - ambari cached hdfs.py conflicts with python hdfs lib resulting into monitoring errors (Resolved in Ambari 2.2.2.0)
Apache Ambari: AMBARI-14926 - ambari cached hdfs.py conflicts with python hdfs lib resulting into monitoring errors(Resolved in Apache Ambari 2.4)