Created 12-20-2016 02:52 PM
The DataNode is not staying up on any node of the cluster. I have a seven-node cluster with 4 datanodes. What's going on? Below is what I see when I perform an HDFS service check:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/service_check.py", line 146, in <module>
    HdfsServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/service_check.py", line 67, in service_check
    action="create_on_execute"
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 158, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 121, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 402, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 399, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 255, in action_delayed
    self._create_resource()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 269, in _create_resource
    self._create_file(self.main_resource.resource.target, source=self.main_resource.resource.source, mode=self.mode)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 322, in _create_file
    self.util.run_command(target, 'CREATE', method='PUT', overwrite=True, assertable_result=False, file_to_put=source, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 210, in run_command
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT -T /etc/passwd 'http://hdp-m.asotc:50070/webhdfs/v1/tmp/id000a1902_date422016?op=CREATE&user.name=hdfs&overwrite=True'' returned status_code=403. 
{
  "RemoteException": {
    "exception": "IOException", 
    "javaClassName": "java.io.IOException", 
    "message": "Failed to find datanode, suggest to check cluster health."
  }
}
		Created 12-20-2016 03:27 PM
Below is what the log says
ng DataNode with maxLockedMemory = 0
2016-12-20 09:41:03,533 INFO  datanode.DataNode (DataNode.java:initDataXceiver(921)) - Opened streaming server at /0.0.0.0:50010
2016-12-20 09:41:03,537 INFO  datanode.DataNode (DataXceiverServer.java:<init>(76)) - Balancing bandwith is 6250000 bytes/s
2016-12-20 09:41:03,537 INFO  datanode.DataNode (DataXceiverServer.java:<init>(77)) - Number threads for balancing is 5
2016-12-20 09:41:03,542 INFO  datanode.DataNode (DataXceiverServer.java:<init>(76)) - Balancing bandwith is 6250000 bytes/s
2016-12-20 09:41:03,542 INFO  datanode.DataNode (DataXceiverServer.java:<init>(77)) - Number threads for balancing is 5
2016-12-20 09:41:03,542 INFO  datanode.DataNode (DataNode.java:initDataXceiver(936)) - Listening on UNIX domain socket: /var/lib/hadoop-hdfs/dn_socket
2016-12-20 09:41:03,740 INFO  mortbay.log (Slf4jLog.java:info(67)) - Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2016-12-20 09:41:03,780 INFO  server.AuthenticationFilter (AuthenticationFilter.java:constructSecretProvider(294)) - Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2016-12-20 09:41:03,791 INFO  http.HttpRequestLog (HttpRequestLog.java:getRequestLog(80)) - Http request log for http.requests.datanode is not defined
2016-12-20 09:41:03,799 INFO  http.HttpServer2 (HttpServer2.java:addGlobalFilter(710)) - Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2016-12-20 09:41:03,801 INFO  http.HttpServer2 (HttpServer2.java:addFilter(685)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2016-12-20 09:41:03,802 INFO  http.HttpServer2 (HttpServer2.java:addFilter(693)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2016-12-20 09:41:03,802 INFO  http.HttpServer2 (HttpServer2.java:addFilter(693)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2016-12-20 09:41:03,821 INFO  http.HttpServer2 (HttpServer2.java:openListeners(915)) - Jetty bound to port 42822
2016-12-20 09:41:03,822 INFO  mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26.hwx
2016-12-20 09:41:04,146 INFO  mortbay.log (Slf4jLog.java:info(67)) - Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:42822
2016-12-20 09:41:04,425 INFO  web.DatanodeHttpServer (DatanodeHttpServer.java:start(201)) - Listening HTTP traffic on /0.0.0.0:50075
2016-12-20 09:41:04,685 INFO  datanode.DataNode (DataNode.java:startDataNode(1144)) - dnUserName = hdfs
2016-12-20 09:41:04,685 INFO  datanode.DataNode (DataNode.java:startDataNode(1145)) - supergroup = hdfs
2016-12-20 09:41:04,770 INFO  ipc.CallQueueManager (CallQueueManager.java:<init>(56)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-12-20 09:41:04,804 INFO  ipc.Server (Server.java:run(676)) - Starting Socket Reader #1 for port 8010
2016-12-20 09:41:04,887 INFO  datanode.DataNode (DataNode.java:initIpcServer(837)) - Opened IPC server at /0.0.0.0:8010
2016-12-20 09:41:04,903 INFO  datanode.DataNode (BlockPoolManager.java:refreshNamenodes(152)) - Refresh request received for nameservices: null
2016-12-20 09:41:04,940 INFO  datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(197)) - Starting BPOfferServices for nameservices: <default>
2016-12-20 09:41:04,964 INFO  datanode.DataNode (BPServiceActor.java:run(814)) - Block pool <registering> (Datanode Uuid unassigned) service to hdp-m.asotc/10.0.2.23:8020 starting to offer service
2016-12-20 09:41:04,989 INFO  ipc.Server (Server.java:run(906)) - IPC Server Responder: starting
2016-12-20 09:41:04,989 INFO  ipc.Server (Server.java:run(746)) - IPC Server listener on 8010: starting
2016-12-20 09:41:05,309 INFO  common.Storage (Storage.java:tryLock(715)) - Lock on /hadoop/hdfs/data/in_use.lock acquired by nodename 20341@hdp-m.asotc
2016-12-20 09:41:05,312 WARN  common.Storage (DataStorage.java:addStorageLocations(375)) - java.io.IOException: Incompatible clusterIDs in /hadoop/hdfs/data: namenode clusterID = CID-35394708-aa35-4f25-b43b-0072da288d03; datanode clusterID = CID-d723cf5b-ba4a-43d3-afe1-781149930f3e
2016-12-20 09:41:05,313 FATAL datanode.DataNode (BPServiceActor.java:run(833)) - Initialization failed for Block pool <registering> (Datanode Uuid d0d90f34-c2a9-4e0e-ba5e-237b5820f879) service to hdp-m.asotc/10.0.2.23:8020. Exiting.
java.io.IOException: All specified directories are failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1399)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1364)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:821)
        at java.lang.Thread.run(Thread.java:745)
2016-12-20 09:41:05,315 WARN  datanode.DataNode (BPServiceActor.java:run(854)) - Ending block pool service for: Block pool <registering> (Datanode Uuid d0d90f34-c2a9-4e0e-ba5e-237b5820f879) service to hdp-m.asotc/10.0.2.23:8020
2016-12-20 09:41:05,420 INFO  datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool <registering> (Datanode Uuid d0d90f34-c2a9-4e0e-ba5e-237b5820f879)
2016-12-20 09:41:07,421 WARN  datanode.DataNode (DataNode.java:secureMain(2540)) - Exiting Datanode
2016-12-20 09:41:07,427 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
2016-12-20 09:41:07,430 INFO  datanode.DataNode (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hdp-m.asotc/10.0.2.23
************************************************************/
		Created 12-20-2016 02:58 PM
Hi Prakash,
Error code 403 is normally a "Forbidden" status. Is there any specific error message in the datanode log?
It could be a DNS issue.
Created 12-20-2016 03:06 PM
What does `hadoop dfsadmin -report` show?
Created 12-20-2016 03:17 PM
Hi Prakash, any issue in the datanode logs?
		Created 12-20-2016 03:31 PM
Incompatible clusterIDs in /hadoop/hdfs/data: namenode clusterID = CID-35394708-aa35-4f25-b43b-0072da288d03; datanode clusterID = CID-d723cf5b-ba4a-43d3-afe1-781149930f3e
The clusterIDs need to be the same. This can happen after a reformat of your namenode. You can fix it with another format, `hdfs namenode -format`, but you will lose all data. Alternatively, copy the namenode's clusterID into each datanode's VERSION file, replacing the old one, as explained here: http://www.dedunu.info/2015/05/how-to-fix-incompatible-clusterids-in.html
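If you want to keep the data, the VERSION-file fix can be sketched as a small shell helper. The paths below are assumptions based on the log above (`/hadoop/hdfs/data` for the datanode); verify `dfs.namenode.name.dir` and `dfs.datanode.data.dir` in your `hdfs-site.xml` first, and stop the DataNode before editing the file.

```shell
# Hedged sketch: sync a datanode's clusterID to the namenode's.
# Both VERSION paths are assumptions -- confirm them in hdfs-site.xml.
fix_cluster_id() {
  nn_version="$1"   # namenode VERSION, e.g. <dfs.namenode.name.dir>/current/VERSION
  dn_version="$2"   # datanode VERSION, e.g. /hadoop/hdfs/data/current/VERSION
  # Read the authoritative clusterID from the namenode's VERSION file
  nn_cid=$(grep '^clusterID=' "$nn_version" | cut -d= -f2)
  # Rewrite the datanode's clusterID line in place
  sed -i "s/^clusterID=.*/clusterID=${nn_cid}/" "$dn_version"
}

# Usage (with the DataNode stopped, on each datanode host):
#   fix_cluster_id /path/to/nn/current/VERSION /hadoop/hdfs/data/current/VERSION
# then restart the DataNode from Ambari.
```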
Created 12-20-2016 03:34 PM
Is this a new install? Did you format it?
see:
http://stackoverflow.com/questions/22316187/datanode-not-starts-correctly
http://www.cs.brandeis.edu//~cs147a/lab/hadoop-troubleshooting/
Created 12-20-2016 06:22 PM
Yes, I formatted the namenode because it was having issues starting at the beginning.
Created 12-20-2016 11:44 PM
Thank you guys, especially @Ward Bekker. After I formatted the namenode, its clusterID no longer matched the datanodes' clusterID, and that was also preventing other services from starting.