Datanode keeps stopping after I restart with Ambari

A single DataNode was working fine with Ambari 2.2.0.0 and HDP 2.3 on CentOS 7.

I added a second worker node. Its NodeManager and Metrics Monitor are started, but the DataNode (HDFS) is stopped. When I restart it, it starts without any error but quickly stops again. The stderr has no entries.

I have checked all of the obvious things, including ntp and iptables, and I have stopped and restarted the cluster.

I'd appreciate any help debugging this issue. Thanks in advance, Vince

1 ACCEPTED SOLUTION

Hi Geoffrey, thanks for your suggestions, but they weren't the cause of my issue.

I fixed my issue by doing the following:

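    for i in {1..12}; do
        echo "$i"
        # Remove old block data left on this drive; the full path avoids deleting from the wrong directory if a cd fails
        rm -rf "/grid/$i/hadoop/hdfs/data/current"
    done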

My problem was that the 12 drives in the new DataNode must have held old HDFS data, so I deleted it with the script above. After that everything worked OK! Lesson learned: I will make sure the /grid structure is clean before adding any nodes. Thanks!
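A dry-run check along these lines could be run on a new worker before it joins the cluster, so leftovers are spotted before anything is deleted. It is only a sketch: the function name `find_stale_datanode_dirs` is hypothetical, and the `/grid/<n>/hadoop/hdfs/data` layout and drive count are taken from the script above, so adjust them to whatever `dfs.datanode.data.dir` is set to on your cluster.

```shell
#!/usr/bin/env bash
# List DataNode data directories that still contain a "current" subdirectory,
# i.e. block data left behind by an earlier HDFS installation.
# Usage: find_stale_datanode_dirs ROOT [COUNT]
find_stale_datanode_dirs() {
    local root="${1:-/grid}"   # assumed mount root, as in the post above
    local count="${2:-12}"     # assumed number of drives
    local i dir
    for ((i = 1; i <= count; i++)); do
        dir="$root/$i/hadoop/hdfs/data/current"
        if [ -d "$dir" ]; then
            echo "$dir"
            # The VERSION file, if present, records the clusterID the data belongs to.
            if [ -f "$dir/VERSION" ]; then
                grep '^clusterID=' "$dir/VERSION" || true
            fi
        fi
    done
    return 0
}

# Example: report leftovers only; nothing is deleted.
# find_stale_datanode_dirs /grid 12
```

If the function prints nothing, the drives are clean. Any directory it does print is a candidate for the cleanup shown above.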

REPLIES

Master Mentor

@Vincent McGarry

Solution 1

1. Please check that the daemons have write privileges to the log directory.

Stop and start the namenode and datanode daemons in debug mode; the following command can be used.

Solution 2

You need to do something like this:

bin/stop-all.sh    # or stop-dfs.sh and stop-yarn.sh in the 2.x series
rm -Rf /app/tmp/hadoop-your-username/*
bin/hadoop namenode -format    # use hdfs instead of hadoop in the 2.x series

Solution 3

/usr/local/hadoop/sbin/hadoop-daemon.sh stop namenode ; hadoop namenode

On the datanode host, execute the following command:

/usr/local/hadoop/sbin/hadoop-daemon.sh stop datanode ; hadoop datanode

Check the log messages from both daemons.

Master Mentor

Please check your DataNode logs in the log directory you specified in the Ambari UI, and post the logs here.
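Something like the following would pull the most recent error lines out of a DataNode log so they can be pasted into a reply. It's a sketch, not an official tool: `recent_log_errors` is a made-up helper name, and the example path is an assumption about a typical HDP layout, so use the log directory shown in Ambari.

```shell
#!/usr/bin/env bash
# Print the last N lines (default 20) of a log file that contain ERROR or FATAL.
# Usage: recent_log_errors LOGFILE [N]
recent_log_errors() {
    local logfile="$1"
    local max="${2:-20}"
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi
    # awk filters the matching lines; tail keeps only the most recent ones.
    awk '/ERROR|FATAL/' "$logfile" | tail -n "$max"
}

# Example (path is an assumption; check the HDFS log directory configured in Ambari):
# recent_log_errors "/var/log/hadoop/hdfs/hadoop-hdfs-datanode-$(hostname).log"
```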

Hi Artem, thanks for your help, but I have been able to fix it and have posted the solution (I had to clean up the HDFS directories).

Master Mentor

np, glad you got it resolved!
