Created 01-24-2017 08:59 AM
The DataNode automatically goes down a few seconds after starting it from Ambari. I checked that the Ambari agent is working.
The DataNode receives the heartbeat but no commands from the NameNode.
Ambari agent log file:
INFO 2017-01-24 03:44:59,747 PythonExecutor.py:118 - Result: {'structuredOut': {}, 'stdout': '', 'stderr': '', 'exitcode': 1}
INFO 2017-01-24 03:45:07,970 Heartbeat.py:78 - Building Heartbeat: {responseId = 210, timestamp = 1485247507970, commandsInProgress = False, componentsMapped = True}
INFO 2017-01-24 03:45:08,129 Controller.py:214 - Heartbeat response received (id = 211)
INFO 2017-01-24 03:45:08,129 Controller.py:249 - No commands sent from ip-172-31-17-251.ec2.internal
INFO 2017-01-24 03:45:18,130 Heartbeat.py:78 - Building Heartbeat: {responseId = 211, timestamp = 1485247518130, commandsInProgress = False, componentsMapped = True}
INFO 2017-01-24 03:45:18,274 Controller.py:214 - Heartbeat response received (id = 212)
INFO 2017-01-24 03:45:18,274 Controller.py:249 - No commands sent from NAMENODE.ec2.internal
Created 01-24-2017 10:09 AM
top
top - 05:03:36 up 4:41, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 186 total, 1 running, 185 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.4%sy, 0.0%ni, 99.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32877652k total, 1678960k used, 31198692k free, 335884k buffers
Swap: 0k total, 0k used, 0k free, 517928k cached
Created 01-24-2017 10:27 AM
The problem seems to be related to directory permissions:
java.io.IOException: the path component: '/var/lib/hadoop-hdfs' is owned by a user who is not root and not you. Your effective user id is 0; the path is owned by user id 508, and its permissions are 0751. Please fix this or select a different socket path.
- As the DN log is complaining about the permissions on "/var/lib/hadoop-hdfs", please check what permissions you have there. By default it should be owned by "hdfs:hadoop", as follows (an example of resetting the ownership is shown at the end of this reply):
# ls -lart /var/lib/hadoop-hdfs
drwxrwxrwt. 2 hdfs hadoop 4096 Aug 10 11:23 cache
srw-rw-rw-. 1 hdfs hadoop 0 Jan 24 09:09 dn_socket
- It would be best if you compare the permissions on this directory "/var/lib/hadoop-hdfs" with those on your working DataNode hosts.
- To get more information about this exception, see the use of the "validateSocketPathSecurity0" method in the Hadoop source.
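If the ownership on a broken host turns out not to match the default above, a minimal fix would look like the following (run as root; "hdfs:hadoop" is the stock HDP user/group and may differ on your cluster):
# Check the current ownership; on a stock HDP install it should be hdfs:hadoop
ls -ld /var/lib/hadoop-hdfs
# Reset it only if it is wrong (adjust the user/group names to your cluster)
chown hdfs:hadoop /var/lib/hadoop-hdfs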
Created 01-24-2017 10:40 AM
Yeah, that was right:
total 4
drwxrwxrwt. 2 hdfs hadoop 4096 Nov 19 2014 cache
srw-rw-rw-. 1 hdfs hadoop 0 Jan 24 03:39 dn_socket
Actually, none of my DataNode hosts is working. Is that a memory issue?
Created 01-24-2017 10:52 AM
Now I am getting this error in datanode.log:
2017-01-24 03:39:19,891 INFO common.Storage (Storage.java:tryLock(715)) - Lock on /mnt/disk1/hadoop/hdfs/data/in_use.lock acquired by nodename 1491@datanode.ec2.internal
2017-01-24 03:39:19,902 INFO common.Storage (Storage.java:tryLock(715)) - Lock on /mnt/disk2/hadoop/hdfs/data/in_use.lock acquired by nodename 1491@datanode.ec2.internal
2017-01-24 03:39:19,903 FATAL datanode.DataNode (BPServiceActor.java:run(840)) - Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to namenode.ec2.internal/namenode:8020. Exiting.
java.io.IOException: Incompatible clusterIDs in /mnt/disk1/hadoop/hdfs/data: namenode clusterID = CID-297a140f-7cd6-4c73-afc8-bd0a7d01c0ee; datanode clusterID = CID-7591e6bd-ce9b-4b14-910c-c9603892a0f1
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:646)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:320)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:403)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:422)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1311)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1276)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:828)
    at java.lang.Thread.run(Thread.java:745)
2017-01-24 03:39:19,904 WARN datanode.DataNode (BPServiceActor.java:run(861)) - Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to ip-172-31-17-251.ec2.internal/172.31.17.251:8020
2017-01-24 03:39:20,005 INFO datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool <registering> (Datanode Uuid unassigned)
2017-01-24 03:39:22,005 WARN datanode.DataNode (DataNode.java:secureMain(2392)) - Exiting Datanode
2017-01-24 03:39:22,007 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
2017-01-24 03:39:22,008 INFO datanode.DataNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at datanode.ec2.internal/datanode
************************************************************/
Created 01-24-2017 10:48 AM
I suggest you kill these DataNodes (if there are any DN daemon processes still running) and then try manually starting them as the "hdfs" user, to see whether they start fine. In parallel, keep the DataNode log in "tail" so that we can see whether it shows the same error.
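For example, a rough manual start on one DataNode host could look like this (the daemon script and log paths below assume the usual HDP layout and are not taken from your messages, so adjust them to your cluster):
# Make sure no stale DataNode process is left running
ps -ef | grep -i [d]atanode
# Start the DataNode manually as the hdfs user
su - hdfs -c "/usr/hdp/current/hadoop-hdfs-datanode/../hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode"
# In another terminal, watch the DataNode log while it starts
tail -f /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log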
Once they come up successfully, try starting them from Ambari next time.
Created 01-24-2017 12:26 PM
Regarding your latest error:
java.io.IOException: Incompatible clusterIDs in /mnt/disk1/hadoop/hdfs/data: namenode clusterID = CID-297a140f-7cd6-4c73-afc8-bd0a7d01c0ee; datanode clusterID = CID-7591e6bd-ce9b-4b14-910c-c9603892a0f1 at
It looks like your VERSION files have different cluster IDs on the NameNode and the DataNode, which need to be corrected. Please check:
cat <dfs.namenode.name.dir>/current/VERSION
cat <dfs.datanode.data.dir>/current/VERSION
Hence, copy the clusterID from the NameNode and put it in the VERSION file of the DataNode, then try again.
Please refer to: http://www.dedunu.info/2015/05/how-to-fix-incompatible-clusterids-in.html
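As an illustration only, with the DataNode stopped, the edit could be done like this (the data directories and clusterID below are the ones that appear in your error; repeat for every directory listed in dfs.datanode.data.dir on each affected DataNode host):
# Replace the DataNode's clusterID with the NameNode's clusterID in each data dir's VERSION file
sed -i 's/^clusterID=.*/clusterID=CID-297a140f-7cd6-4c73-afc8-bd0a7d01c0ee/' /mnt/disk1/hadoop/hdfs/data/current/VERSION
sed -i 's/^clusterID=.*/clusterID=CID-297a140f-7cd6-4c73-afc8-bd0a7d01c0ee/' /mnt/disk2/hadoop/hdfs/data/current/VERSION
# Confirm both data dirs now show the NameNode's clusterID
grep clusterID /mnt/disk*/hadoop/hdfs/data/current/VERSION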
Created 01-24-2017 04:12 PM
Thanks, now it's working.