
Datanode goes down after a few seconds of starting

Expert Contributor

The Datanode is not staying up on any node of the cluster. I have a seven-node cluster with 4 datanodes. What's going on? Below is what I see when I run the HDFS service check:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/service_check.py", line 146, in <module>
    HdfsServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/service_check.py", line 67, in service_check
    action="create_on_execute"
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 158, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 121, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 402, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 399, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 255, in action_delayed
    self._create_resource()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 269, in _create_resource
    self._create_file(self.main_resource.resource.target, source=self.main_resource.resource.source, mode=self.mode)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 322, in _create_file
    self.util.run_command(target, 'CREATE', method='PUT', overwrite=True, assertable_result=False, file_to_put=source, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 210, in run_command
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT -T /etc/passwd 'http://hdp-m.asotc:50070/webhdfs/v1/tmp/id000a1902_date422016?op=CREATE&user.name=hdfs&overwrite=True'' returned status_code=403. 
{
  "RemoteException": {
    "exception": "IOException", 
    "javaClassName": "java.io.IOException", 
    "message": "Failed to find datanode, suggest to check cluster health."
  }
}
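
The 403 here is the NameNode saying it has no live datanode to place the file on, not a permissions problem. One quick sanity check, assuming the NameNode web UI at hdp-m.asotc:50070 as in the failing curl above, is to ask its JMX endpoint how many datanodes it currently sees:

# Count the datanodes the NameNode considers live vs. dead
curl -sS 'http://hdp-m.asotc:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState' \
    | grep -E 'NumLiveDataNodes|NumDeadDataNodes'
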
1 ACCEPTED SOLUTION

Expert Contributor

Below is what the log says

ng DataNode with maxLockedMemory = 0
2016-12-20 09:41:03,533 INFO  datanode.DataNode (DataNode.java:initDataXceiver(921)) - Opened streaming server at /0.0.0.0:50010
2016-12-20 09:41:03,537 INFO  datanode.DataNode (DataXceiverServer.java:<init>(76)) - Balancing bandwith is 6250000 bytes/s
2016-12-20 09:41:03,537 INFO  datanode.DataNode (DataXceiverServer.java:<init>(77)) - Number threads for balancing is 5
2016-12-20 09:41:03,542 INFO  datanode.DataNode (DataXceiverServer.java:<init>(76)) - Balancing bandwith is 6250000 bytes/s
2016-12-20 09:41:03,542 INFO  datanode.DataNode (DataXceiverServer.java:<init>(77)) - Number threads for balancing is 5
2016-12-20 09:41:03,542 INFO  datanode.DataNode (DataNode.java:initDataXceiver(936)) - Listening on UNIX domain socket: /var/lib/hadoop-hdfs/dn_socket
2016-12-20 09:41:03,740 INFO  mortbay.log (Slf4jLog.java:info(67)) - Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2016-12-20 09:41:03,780 INFO  server.AuthenticationFilter (AuthenticationFilter.java:constructSecretProvider(294)) - Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2016-12-20 09:41:03,791 INFO  http.HttpRequestLog (HttpRequestLog.java:getRequestLog(80)) - Http request log for http.requests.datanode is not defined
2016-12-20 09:41:03,799 INFO  http.HttpServer2 (HttpServer2.java:addGlobalFilter(710)) - Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2016-12-20 09:41:03,801 INFO  http.HttpServer2 (HttpServer2.java:addFilter(685)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2016-12-20 09:41:03,802 INFO  http.HttpServer2 (HttpServer2.java:addFilter(693)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2016-12-20 09:41:03,802 INFO  http.HttpServer2 (HttpServer2.java:addFilter(693)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2016-12-20 09:41:03,821 INFO  http.HttpServer2 (HttpServer2.java:openListeners(915)) - Jetty bound to port 42822
2016-12-20 09:41:03,822 INFO  mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26.hwx
2016-12-20 09:41:04,146 INFO  mortbay.log (Slf4jLog.java:info(67)) - Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:42822
2016-12-20 09:41:04,425 INFO  web.DatanodeHttpServer (DatanodeHttpServer.java:start(201)) - Listening HTTP traffic on /0.0.0.0:50075
2016-12-20 09:41:04,685 INFO  datanode.DataNode (DataNode.java:startDataNode(1144)) - dnUserName = hdfs
2016-12-20 09:41:04,685 INFO  datanode.DataNode (DataNode.java:startDataNode(1145)) - supergroup = hdfs
2016-12-20 09:41:04,770 INFO  ipc.CallQueueManager (CallQueueManager.java:<init>(56)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-12-20 09:41:04,804 INFO  ipc.Server (Server.java:run(676)) - Starting Socket Reader #1 for port 8010
2016-12-20 09:41:04,887 INFO  datanode.DataNode (DataNode.java:initIpcServer(837)) - Opened IPC server at /0.0.0.0:8010
2016-12-20 09:41:04,903 INFO  datanode.DataNode (BlockPoolManager.java:refreshNamenodes(152)) - Refresh request received for nameservices: null
2016-12-20 09:41:04,940 INFO  datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(197)) - Starting BPOfferServices for nameservices: <default>
2016-12-20 09:41:04,964 INFO  datanode.DataNode (BPServiceActor.java:run(814)) - Block pool <registering> (Datanode Uuid unassigned) service to hdp-m.asotc/10.0.2.23:8020 starting to offer service
2016-12-20 09:41:04,989 INFO  ipc.Server (Server.java:run(906)) - IPC Server Responder: starting
2016-12-20 09:41:04,989 INFO  ipc.Server (Server.java:run(746)) - IPC Server listener on 8010: starting
2016-12-20 09:41:05,309 INFO  common.Storage (Storage.java:tryLock(715)) - Lock on /hadoop/hdfs/data/in_use.lock acquired by nodename 20341@hdp-m.asotc
2016-12-20 09:41:05,312 WARN  common.Storage (DataStorage.java:addStorageLocations(375)) - java.io.IOException: Incompatible clusterIDs in /hadoop/hdfs/data: namenode clusterID = CID-35394708-aa35-4f25-b43b-0072da288d03; datanode clusterID = CID-d723cf5b-ba4a-43d3-afe1-781149930f3e
2016-12-20 09:41:05,313 FATAL datanode.DataNode (BPServiceActor.java:run(833)) - Initialization failed for Block pool <registering> (Datanode Uuid d0d90f34-c2a9-4e0e-ba5e-237b5820f879) service to hdp-m.asotc/10.0.2.23:8020. Exiting.
java.io.IOException: All specified directories are failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1399)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1364)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:821)
        at java.lang.Thread.run(Thread.java:745)
2016-12-20 09:41:05,315 WARN  datanode.DataNode (BPServiceActor.java:run(854)) - Ending block pool service for: Block pool <registering> (Datanode Uuid d0d90f34-c2a9-4e0e-ba5e-237b5820f879) service to hdp-m.asotc/10.0.2.23:8020
2016-12-20 09:41:05,420 INFO  datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool <registering> (Datanode Uuid d0d90f34-c2a9-4e0e-ba5e-237b5820f879)
2016-12-20 09:41:07,421 WARN  datanode.DataNode (DataNode.java:secureMain(2540)) - Exiting Datanode
2016-12-20 09:41:07,427 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
2016-12-20 09:41:07,430 INFO  datanode.DataNode (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hdp-m.asotc/10.0.2.23
************************************************************/
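
The WARN line above is the root cause: the clusterID stored on the datanode's disk no longer matches the namenode's. You can see the mismatch directly by comparing the VERSION files on both sides; the datanode path /hadoop/hdfs/data is taken from the log, while /hadoop/hdfs/namenode is only the usual Ambari default (check dfs.namenode.name.dir if yours differs):

# On each datanode: the clusterID the data directory was formatted with
grep clusterID /hadoop/hdfs/data/current/VERSION

# On the namenode host (path is an assumption -- verify dfs.namenode.name.dir)
grep clusterID /hadoop/hdfs/namenode/current/VERSION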





Hi Prakash,

Error code 403 is normally a "Forbidden" status. Is there any specific error message in the datanode's log?

It could also be a DNS issue.
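
To rule DNS out, a couple of quick checks on each node (hostnames and the namenode IP are the ones visible in this thread):

# Forward and reverse lookups should agree on every host in the cluster
hostname -f
getent hosts hdp-m.asotc     # forward lookup of the namenode host
getent hosts 10.0.2.23       # reverse lookup of its IP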


What does `hadoop dfsadmin -report` show?
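
For reference, a minimal way to read that report (on newer stacks the `hdfs` form is preferred; `hadoop dfsadmin` still works but is deprecated):

# Run as the hdfs superuser; a healthy cluster should show all 4 datanodes live
sudo -u hdfs hdfs dfsadmin -report | grep -E 'Live datanodes|Dead datanodes'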


Hi Prakash, any issue in the datanode logs?

Incompatible clusterIDs in /hadoop/hdfs/data: namenode clusterID = CID-35394708-aa35-4f25-b43b-0072da288d03; datanode clusterID = CID-d723cf5b-ba4a-43d3-afe1-781149930f3e

The clusterIDs need to match. This can happen after a reformat of your namenode. You can fix it by reformatting again (`hdfs namenode -format`), but you will lose all data. Alternatively, copy and replace the clusterID in the datanode's VERSION file, as explained here: http://www.dedunu.info/2015/05/how-to-fix-incompatible-clusterids-in.html
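
For the VERSION-file route, a minimal sketch using the paths and IDs from this thread (the data directory /hadoop/hdfs/data comes from the log; adjust for your dfs.datanode.data.dir), run on each affected datanode:

# 1. Stop the DataNode first (easiest from the Ambari UI).

# 2. Overwrite the stale datanode clusterID with the namenode's current one
#    (both CIDs are copied from the 'Incompatible clusterIDs' log line above).
sed -i 's/clusterID=CID-d723cf5b-ba4a-43d3-afe1-781149930f3e/clusterID=CID-35394708-aa35-4f25-b43b-0072da288d03/' \
    /hadoop/hdfs/data/current/VERSION

# 3. Start the DataNode again and confirm it registered with the namenode:
sudo -u hdfs hdfs dfsadmin -report | grep -i 'live datanodes'

Note that after a namenode reformat the blocks already on the datanodes belong to the old filesystem anyway, so if that data is disposable, simply wiping /hadoop/hdfs/data on each datanode and restarting it is the cleaner route.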


Expert Contributor

@Timothy Spann

Yes, I formatted the namenode because it was having issues starting at the beginning.

Expert Contributor

Thank you guys, especially @Ward Bekker. After I formatted the namenode, its clusterID no longer matched the datanodes', and that was also preventing other services from starting.