
DataNode goes down a few seconds after starting

Solved

Re: DataNode goes down a few seconds after starting

top

top - 05:03:36 up  4:41,  1 user,  load average: 0.00, 0.00, 0.00
Tasks: 186 total,   1 running, 185 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  0.4%sy,  0.0%ni, 99.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32877652k total,  1678960k used, 31198692k free,   335884k buffers
Swap:        0k total,        0k used,        0k free,   517928k cached




Re: DataNode goes down a few seconds after starting

Super Mentor

@Punit kumar

The problem seems to be related to directory permissions:

java.io.IOException: the path component: '/var/lib/hadoop-hdfs' is owned by a user who is not root and not you.  Your effective user id is 0; the path is owned by user id 508, and its permissions are 0751.  Please fix this or select a different socket path.


- As the DN log is complaining about the permissions on "/var/lib/hadoop-hdfs", please check what permissions you have there. By default it should be owned by "hdfs:hadoop", as follows:

# ls -lart /var/lib/hadoop-hdfs
drwxrwxrwt.  2 hdfs hadoop 4096 Aug 10 11:23 cache
srw-rw-rw-.  1 hdfs hadoop    0 Jan 24 09:09 dn_socket

- It would be best to compare the permissions on this directory "/var/lib/hadoop-hdfs" with those on your working DataNode hosts.

- For more information about this exception, see the use of the "validateSocketPathSecurity0" method:

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/ap...
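The check and fix above can be sketched as follows. This is a minimal sketch, not the exact commands from the thread: a scratch directory stands in for /var/lib/hadoop-hdfs (the real chown to "hdfs:hadoop" needs root and an actual hdfs user on the host), and the 0751 mode is taken from the error message.

```shell
# Stand-in for /var/lib/hadoop-hdfs; on a real host use the actual path.
DN_DIR=$(mktemp -d)
# Real fix (needs root): the directory owner must be the DataNode user.
# sudo chown hdfs:hadoop "$DN_DIR"
chmod 751 "$DN_DIR"        # the mode reported in the error message
stat -c '%a' "$DN_DIR"     # verify the mode; prints 751
```

On the real directory, compare the owner (`stat -c '%U:%G' /var/lib/hadoop-hdfs`) against a working DataNode host before changing anything.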



Re: DataNode goes down a few seconds after starting

Yeah, that was right:

total 4
drwxrwxrwt. 2 hdfs hadoop 4096 Nov 19  2014 cache
srw-rw-rw-. 1 hdfs hadoop    0 Jan 24 03:39 dn_socket


Actually, none of my DataNode hosts are working.
Is that a memory issue?

Re: DataNode goes down a few seconds after starting

@Jay SenSharma

Now I am getting this error in the DataNode log:

2017-01-24 03:39:19,891 INFO  common.Storage (Storage.java:tryLock(715)) - Lock on /mnt/disk1/hadoop/hdfs/data/in_use.lock acquired by nodename 1491@datanode.ec2.internal
2017-01-24 03:39:19,902 INFO  common.Storage (Storage.java:tryLock(715)) - Lock on /mnt/disk2/hadoop/hdfs/data/in_use.lock acquired by nodename 1491@datanode.ec2.internal
2017-01-24 03:39:19,903 FATAL datanode.DataNode (BPServiceActor.java:run(840)) - Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to namenode.ec2.internal/namenode:8020. Exiting.
java.io.IOException: Incompatible clusterIDs in /mnt/disk1/hadoop/hdfs/data: namenode clusterID = CID-297a140f-7cd6-4c73-afc8-bd0a7d01c0ee; datanode clusterID = CID-7591e6bd-ce9b-4b14-910c-c9603892a0f1
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:646)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:320)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:403)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:422)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1311)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1276)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:828)
	at java.lang.Thread.run(Thread.java:745)
2017-01-24 03:39:19,904 WARN  datanode.DataNode (BPServiceActor.java:run(861)) - Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to ip-172-31-17-251.ec2.internal/172.31.17.251:8020
2017-01-24 03:39:20,005 INFO  datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool <registering> (Datanode Uuid unassigned)
2017-01-24 03:39:22,005 WARN  datanode.DataNode (DataNode.java:secureMain(2392)) - Exiting Datanode
2017-01-24 03:39:22,007 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
2017-01-24 03:39:22,008 INFO  datanode.DataNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at datanode.ec2.internal/datanode
************************************************************/

Re: DataNode goes down a few seconds after starting

Super Mentor

@Punit kumar

I suggest you kill these DataNodes (if there are any DN daemon processes still running) and then try starting them manually as the "hdfs" user, to see whether they come up fine. In parallel, put the DataNode log in "tail" so we can see whether it shows the same error.

Once they come up successfully, then next time try starting them from Ambari.
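The procedure above can be sketched as below. The daemon-script path and the log location are assumptions for an HDP 2.x-style layout (adjust `$HADOOP_HOME` and the log directory to your cluster); only the process check is run here, the privileged steps are shown as comments.

```shell
# Look for any leftover DataNode daemon processes (pgrep excludes itself).
pgrep -fl datanode || echo "no DataNode process running"

# If any are found, stop them first (as root or the hdfs user):
# kill <pid>
# Then start the DataNode manually as the "hdfs" user:
# su - hdfs -c "$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode"
# And watch the log in parallel (assumed HDP-style log path):
# tail -f /var/log/hadoop/hdfs/hadoop-hdfs-datanode-$(hostname).log
```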


Re: DataNode goes down a few seconds after starting

Super Mentor

@Punit kumar

Regarding your latest error:

java.io.IOException: Incompatible clusterIDs in 
/mnt/disk1/hadoop/hdfs/data: namenode clusterID = 
CID-297a140f-7cd6-4c73-afc8-bd0a7d01c0ee; datanode clusterID = 
CID-7591e6bd-ce9b-4b14-910c-c9603892a0f1 at 

It looks like your VERSION files have different clusterIDs on the NameNode and the DataNode; they need to match. So please check:

cat <dfs.namenode.name.dir>/current/VERSION
cat <dfs.datanode.data.dir>/current/VERSION 

Hence, copy the clusterID from the NameNode and put it in the VERSION file of the DataNode, then try again.

Please refer to: http://www.dedunu.info/2015/05/how-to-fix-incompatible-clusterids-in.html
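The fix can be sketched as follows. The real files live at `<dfs.namenode.name.dir>/current/VERSION` and `<dfs.datanode.data.dir>/current/VERSION` (repeat for each data dir, e.g. /mnt/disk1 and /mnt/disk2); throwaway copies with the clusterIDs from the error above are used here so the commands are self-contained.

```shell
# Stand-ins for the real VERSION files; substitute your actual paths.
NN_VERSION=$(mktemp)
DN_VERSION=$(mktemp)
echo 'clusterID=CID-297a140f-7cd6-4c73-afc8-bd0a7d01c0ee' > "$NN_VERSION"
echo 'clusterID=CID-7591e6bd-ce9b-4b14-910c-c9603892a0f1' > "$DN_VERSION"

# Copy the NameNode's clusterID line over the DataNode's.
CID=$(grep '^clusterID=' "$NN_VERSION")
sed -i "s/^clusterID=.*/$CID/" "$DN_VERSION"
grep '^clusterID=' "$DN_VERSION"   # should now match the NameNode's
```

After editing the VERSION file(s), restart the DataNode; it should register against the NameNode without the Incompatible clusterIDs error.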



Re: DataNode goes down a few seconds after starting

@Jay SenSharma

Thanks, now it's working.
