Created 03-08-2016 07:26 PM
A single datanode was working fine with Ambari 2.2.0.0 and HDP 2.3 on top of CentOS 7.
I added a 2nd worker node. NodeManager & Metrics Monitor start fine, but the DataNode/HDFS stays stopped. I restart it and it starts without any error, but quickly stops again. The stderr has no entries.
I have checked all of the obvious culprits, including ntp and iptables, and I have stopped & restarted the cluster.
I'd appreciate any help debugging this issue. Thanks in advance, Vince
Created 03-09-2016 01:23 AM
Hi Geoffrey - thanks for your suggestions, but they weren't the cause of my issue.
I fixed my issue by doing the following:
for i in {1..12} ; do
  echo $i
  # Only remove "current" if the data directory actually exists
  cd /grid/$i/hadoop/hdfs/data && rm -rf current
done
My problem was that the 12 drives in the DataNode must have had old HDFS data, so I deleted it with the above script. After that everything worked OK! Lesson learned: I will make sure the /grid structure is clean before adding any nodes! Thanks!
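For anyone else hitting this, a quick pre-check along these lines (the same /grid/1 .. /grid/12 layout is assumed; adjust to your dfs.datanode.data.dir) would flag leftover data before a node is added:
for i in {1..12} ; do
  # A leftover "current" directory means the mount still holds old HDFS data
  if [ -d /grid/$i/hadoop/hdfs/data/current ] ; then
    echo "WARNING: /grid/$i/hadoop/hdfs/data is not clean"
  fi
done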
Created 03-08-2016 07:51 PM
Solution 1
1. Please check that the daemons have write privileges to the log directory.
2. Stop and start the NameNode and DataNode daemons in debug mode and watch the output; the commands in Solution 3 below can be used.
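For step 1, a quick check along these lines verifies write access (the log directory /var/log/hadoop/hdfs and the hdfs service user are assumptions based on HDP defaults; adjust to your setup):
# Inspect ownership and permissions of the HDFS log directory
ls -ld /var/log/hadoop/hdfs
# Confirm the hdfs user can actually write there, then clean up the test file
sudo -u hdfs touch /var/log/hadoop/hdfs/.write_test && echo writable && sudo -u hdfs rm /var/log/hadoop/hdfs/.write_test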
Solution 2
You need to do something like this:
bin/stop-all.sh (or stop-dfs.sh and stop-yarn.sh in the 2.x series)
rm -Rf /app/tmp/hadoop-your-username/*
bin/hadoop namenode -format (or bin/hdfs namenode -format in the 2.x series)
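Note that formatting the NameNode wipes the filesystem metadata, so before going that far it's worth checking whether this is just a clusterID mismatch between the NameNode and the new DataNode, a common cause of a datanode that starts and immediately stops. A sketch, with paths assumed (adjust to your dfs.namenode.name.dir and dfs.datanode.data.dir):
# Differing clusterIDs here produce the classic "Incompatible clusterIDs" failure
grep clusterID /hadoop/hdfs/namenode/current/VERSION
grep clusterID /grid/1/hadoop/hdfs/data/current/VERSION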
Solution 3
On the namenode host, execute the following command:
/usr/local/hadoop/sbin/hadoop-daemon.sh stop namenode ; hadoop namenode
On the datanode host, execute the following command:
/usr/local/hadoop/sbin/hadoop-daemon.sh stop datanode ; hadoop datanode
Check the log messages from both daemons.
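Something like this usually surfaces the reason for the exit (the log location is an assumption based on HDP defaults):
# Show the most recent errors/exceptions from the datanode log
grep -iE 'ERROR|FATAL|Exception' /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log | tail -n 20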
Created 03-08-2016 10:38 PM
Please check your datanode logs in the log directory you specified in the Ambari UI, and post the logs here.
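If you're not sure which directory Ambari configured, it can usually be read from hadoop-env.sh (the config path is an assumption based on a standard HDP install):
# The configured log directory is exported as HADOOP_LOG_DIR
grep HADOOP_LOG_DIR /etc/hadoop/conf/hadoop-env.sh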
Created 03-09-2016 01:48 AM
Hi Artem... thanks for your help, but I was able to fix it & posted the solution above (I had to clean up the HDFS directories).
Created 03-09-2016 01:51 AM
np, glad you got it resolved!