Created 12-20-2016 02:52 PM
The DataNode is not staying up on any node of the cluster. I have a seven-node cluster with 4 DataNodes. What's going on? Below is what I see when I perform an HDFS service check:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/service_check.py", line 146, in <module>
    HdfsServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/service_check.py", line 67, in service_check
    action="create_on_execute"
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 158, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 121, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 402, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 399, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 255, in action_delayed
    self._create_resource()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 269, in _create_resource
    self._create_file(self.main_resource.resource.target, source=self.main_resource.resource.source, mode=self.mode)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 322, in _create_file
    self.util.run_command(target, 'CREATE', method='PUT', overwrite=True, assertable_result=False, file_to_put=source, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 210, in run_command
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT -T /etc/passwd 'http://hdp-m.asotc:50070/webhdfs/v1/tmp/id000a1902_date422016?op=CREATE&user.name=hdfs&overwrite=True'' returned status_code=403.
{
  "RemoteException": {
    "exception": "IOException",
    "javaClassName": "java.io.IOException",
    "message": "Failed to find datanode, suggest to check cluster health."
  }
}
Created 12-20-2016 02:58 PM
Hi Prakash,
Error code 403 is an HTTP "Forbidden" status. Is there any specific error message in the DataNode log?
It could be a DNS issue; a quick check is sketched below.
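To rule DNS out, a minimal sanity check to run on each host, using the `hdp-m.asotc` hostname from your stack trace (substitute each node's FQDN):

```bash
# The FQDN this host's daemons will register with:
hostname -f

# Forward lookup; every node should resolve the NameNode host
# to the same IP:
getent hosts hdp-m.asotc
```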
Created 12-20-2016 03:06 PM
What does `hadoop dfsadmin -report` show?
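For example, run as the hdfs service user (`hadoop dfsadmin` is the older alias for `hdfs dfsadmin`); if no DataNodes have registered with the NameNode, the live count will be 0:

```bash
# Summarize DataNode registration from the NameNode's point of view.
sudo -u hdfs hdfs dfsadmin -report | grep -i datanodes
```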
Created 12-20-2016 03:17 PM
Hi Prakash, any issues in the DataNode logs?
Created 12-20-2016 03:27 PM
Below is what the log says
ng DataNode with maxLockedMemory = 0
2016-12-20 09:41:03,533 INFO datanode.DataNode (DataNode.java:initDataXceiver(921)) - Opened streaming server at /0.0.0.0:50010
2016-12-20 09:41:03,537 INFO datanode.DataNode (DataXceiverServer.java:<init>(76)) - Balancing bandwith is 6250000 bytes/s
2016-12-20 09:41:03,537 INFO datanode.DataNode (DataXceiverServer.java:<init>(77)) - Number threads for balancing is 5
2016-12-20 09:41:03,542 INFO datanode.DataNode (DataXceiverServer.java:<init>(76)) - Balancing bandwith is 6250000 bytes/s
2016-12-20 09:41:03,542 INFO datanode.DataNode (DataXceiverServer.java:<init>(77)) - Number threads for balancing is 5
2016-12-20 09:41:03,542 INFO datanode.DataNode (DataNode.java:initDataXceiver(936)) - Listening on UNIX domain socket: /var/lib/hadoop-hdfs/dn_socket
2016-12-20 09:41:03,740 INFO mortbay.log (Slf4jLog.java:info(67)) - Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2016-12-20 09:41:03,780 INFO server.AuthenticationFilter (AuthenticationFilter.java:constructSecretProvider(294)) - Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2016-12-20 09:41:03,791 INFO http.HttpRequestLog (HttpRequestLog.java:getRequestLog(80)) - Http request log for http.requests.datanode is not defined
2016-12-20 09:41:03,799 INFO http.HttpServer2 (HttpServer2.java:addGlobalFilter(710)) - Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2016-12-20 09:41:03,801 INFO http.HttpServer2 (HttpServer2.java:addFilter(685)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2016-12-20 09:41:03,802 INFO http.HttpServer2 (HttpServer2.java:addFilter(693)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2016-12-20 09:41:03,802 INFO http.HttpServer2 (HttpServer2.java:addFilter(693)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2016-12-20 09:41:03,821 INFO http.HttpServer2 (HttpServer2.java:openListeners(915)) - Jetty bound to port 42822
2016-12-20 09:41:03,822 INFO mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26.hwx
2016-12-20 09:41:04,146 INFO mortbay.log (Slf4jLog.java:info(67)) - Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:42822
2016-12-20 09:41:04,425 INFO web.DatanodeHttpServer (DatanodeHttpServer.java:start(201)) - Listening HTTP traffic on /0.0.0.0:50075
2016-12-20 09:41:04,685 INFO datanode.DataNode (DataNode.java:startDataNode(1144)) - dnUserName = hdfs
2016-12-20 09:41:04,685 INFO datanode.DataNode (DataNode.java:startDataNode(1145)) - supergroup = hdfs
2016-12-20 09:41:04,770 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(56)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-12-20 09:41:04,804 INFO ipc.Server (Server.java:run(676)) - Starting Socket Reader #1 for port 8010
2016-12-20 09:41:04,887 INFO datanode.DataNode (DataNode.java:initIpcServer(837)) - Opened IPC server at /0.0.0.0:8010
2016-12-20 09:41:04,903 INFO datanode.DataNode (BlockPoolManager.java:refreshNamenodes(152)) - Refresh request received for nameservices: null
2016-12-20 09:41:04,940 INFO datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(197)) - Starting BPOfferServices for nameservices: <default>
2016-12-20 09:41:04,964 INFO datanode.DataNode (BPServiceActor.java:run(814)) - Block pool <registering> (Datanode Uuid unassigned) service to hdp-m.asotc/10.0.2.23:8020 starting to offer service
2016-12-20 09:41:04,989 INFO ipc.Server (Server.java:run(906)) - IPC Server Responder: starting
2016-12-20 09:41:04,989 INFO ipc.Server (Server.java:run(746)) - IPC Server listener on 8010: starting
2016-12-20 09:41:05,309 INFO common.Storage (Storage.java:tryLock(715)) - Lock on /hadoop/hdfs/data/in_use.lock acquired by nodename 20341@hdp-m.asotc
2016-12-20 09:41:05,312 WARN common.Storage (DataStorage.java:addStorageLocations(375)) - java.io.IOException: Incompatible clusterIDs in /hadoop/hdfs/data: namenode clusterID = CID-35394708-aa35-4f25-b43b-0072da288d03; datanode clusterID = CID-d723cf5b-ba4a-43d3-afe1-781149930f3e
2016-12-20 09:41:05,313 FATAL datanode.DataNode (BPServiceActor.java:run(833)) - Initialization failed for Block pool <registering> (Datanode Uuid d0d90f34-c2a9-4e0e-ba5e-237b5820f879) service to hdp-m.asotc/10.0.2.23:8020. Exiting.
java.io.IOException: All specified directories are failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1399)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1364)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:821)
        at java.lang.Thread.run(Thread.java:745)
2016-12-20 09:41:05,315 WARN datanode.DataNode (BPServiceActor.java:run(854)) - Ending block pool service for: Block pool <registering> (Datanode Uuid d0d90f34-c2a9-4e0e-ba5e-237b5820f879) service to hdp-m.asotc/10.0.2.23:8020
2016-12-20 09:41:05,420 INFO datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool <registering> (Datanode Uuid d0d90f34-c2a9-4e0e-ba5e-237b5820f879)
2016-12-20 09:41:07,421 WARN datanode.DataNode (DataNode.java:secureMain(2540)) - Exiting Datanode
2016-12-20 09:41:07,427 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
2016-12-20 09:41:07,430 INFO datanode.DataNode (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hdp-m.asotc/10.0.2.23
************************************************************/
Created 12-20-2016 03:31 PM
Incompatible clusterIDs in /hadoop/hdfs/data: namenode clusterID = CID-35394708-aa35-4f25-b43b-0072da288d03; datanode clusterID = CID-d723cf5b-ba4a-43d3-afe1-781149930f3e
The clusterIDs need to match; a mismatch typically appears after a reformat of your NameNode. You can fix it with another format (`hdfs namenode -format`), but you will lose all data. Alternatively, copy the NameNode's clusterID into the VERSION file on each DataNode, as explained here: http://www.dedunu.info/2015/05/how-to-fix-incompatible-clusterids-in.html
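A minimal sketch of that VERSION-file fix, using the DataNode data dir from your log (`/hadoop/hdfs/data`) and assuming the HDP-default NameNode dir `/hadoop/hdfs/namenode` (adjust to your `dfs.namenode.name.dir`):

```bash
# On the NameNode host: read the authoritative clusterID
# (CID-35394708-... in your log).
grep clusterID /hadoop/hdfs/namenode/current/VERSION

# On each DataNode host: stop the DataNode, then rewrite its
# clusterID in place to match the NameNode's value.
sed -i 's/^clusterID=.*/clusterID=CID-35394708-aa35-4f25-b43b-0072da288d03/' \
    /hadoop/hdfs/data/current/VERSION

# Restart the DataNode (e.g., via Ambari) and confirm it stays up.
```

This keeps the existing block data, whereas reformatting wipes it; the DataNode only refuses to start because its stored clusterID no longer matches the reformatted NameNode's.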
Created 12-20-2016 03:34 PM
Is this a new install? Did you format it?
see:
http://stackoverflow.com/questions/22316187/datanode-not-starts-correctly
http://www.cs.brandeis.edu//~cs147a/lab/hadoop-troubleshooting/
Created 12-20-2016 06:22 PM
Yes, I formatted the NameNode because it was having issues starting at the beginning.
Created 12-20-2016 11:44 PM
Thank you guys, specifically @Ward Bekker. After I formatted the NameNode, the clusterID got mismatched with the DataNodes, and that was also preventing other services from starting.