
DataNode Not Starting and Not Showing Any Error Logs

Expert Contributor

The DataNode is not starting, and it is not writing any error logs to its log file.

Error from Ambari:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 167, in <module>
    DataNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 62, in start
    datanode(action="start")
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_datanode.py", line 72, in datanode
    create_log_dir=True
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 267, in service
    Execute(daemon_cmd, not_if=process_id_exists_command, environment=hadoop_env_exports)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 158, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 121, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg) 


resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode'' returned 1. starting datanode, logging to /data/log/hadoop/hdfs/hadoop-hdfs-datanode-hostname-out

In /var/log/hadoop/hdfs/hadoop-hdfs-datanode.log:

at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2411)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2298)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2345)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2526)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2550)
2016-05-04 17:42:04,139 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2016-05-04 17:42:04,140 INFO  datanode.DataNode (LogAdapter.java:info(45)) - SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at FQDN/IP



When I start the DataNode through Ambari, I don't see any new logs in the DataNode log file.

In /data/log/hadoop/hdfs/hadoop-hdfs-datanode-hostname-out:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63785
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63785
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
/data/log/hadoop/hdfs/hadoop-hdfs-datanode-D-9539.out: line 2: syntax error near unexpected token `('
/data/log/hadoop/hdfs/hadoop-hdfs-datanode-D-9539.out: line 2: `core file size          (blocks, -c) unlimited'


Please advise.

Mohan.V

5 Replies

Master Guru

@Mohan V

Can you please try to start the DataNode manually (without Ambari) with DEBUG logging?

Here are the steps:

1. Log in to the problematic DataNode as the 'hdfs' user.

2. Run the commands below:

# Command 1: send DEBUG-level logs to the console
export HADOOP_ROOT_LOGGER=DEBUG,console

# Command 2: start the DataNode in the foreground
hdfs datanode

Note: this will print output to the screen while it tries to start your DataNode; please do not press Ctrl+C until you see the ERROR/Exception 🙂
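If you want to keep a copy of the console output for later comparison, a small variation (just a sketch; the log path is arbitrary) is to pipe it through tee:

# run in the foreground while also saving the output to a file
hdfs datanode 2>&1 | tee /tmp/datanode-debug.log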

Hope this information helps you to troubleshoot your issue! Happy Hadooping 🙂

Expert Contributor

Thanks for the reply, Kuldeep.

I tried what you suggested and got the following output:

16/12/01 11:27:49 DEBUG sasl.DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
16/12/01 11:27:49 INFO datanode.DataNode: Starting DataNode with maxLockedMemory = 0
16/12/01 11:27:49 INFO datanode.DataNode: Opened streaming server at /0.0.0.0:50010
16/12/01 11:27:49 INFO datanode.DataNode: Balancing bandwith is 6250000 bytes/s
16/12/01 11:27:49 INFO datanode.DataNode: Number threads for balancing is 5
16/12/01 11:27:49 INFO datanode.DataNode: Shutdown complete.
16/12/01 11:27:49 FATAL datanode.DataNode: Exception in secureMain
java.io.IOException: the path component: '/' is world-writable. Its permissions are 0777. Please fix this or select a different socket path.
at org.apache.hadoop.net.unix.DomainSocket.validateSocketPathSecurity0(Native Method)
at org.apache.hadoop.net.unix.DomainSocket.bindAndListen(DomainSocket.java:189)
at org.apache.hadoop.hdfs.net.DomainPeerServer.<init>(DomainPeerServer.java:40)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:965)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:931)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:430)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2411)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2298)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2345)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2526)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2550)
16/12/01 11:27:49 INFO util.ExitUtil: Exiting with status 1
16/12/01 11:27:49 INFO datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at d-9539.kpit.com/10.10.167.160

When I googled that error, this thread:

http://grokbase.com/t/cloudera/scm-users/143a6q05g6/data-node-failed-to-start

suggested changing the permissions of / (root).

I did that, but the DataNode still did not start; in fact, it now gives the error below.

 File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode'' returned 1. /etc/profile: line 45: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
-bash: /dev/null: Permission denied
/usr/hdp/current/hadoop-client/conf/hadoop-env.sh: line 100: /dev/null: Permission denied
ls: write error: Broken pipe
/usr/hdp/2.3.4.7-4/hadoop/libexec/hadoop-config.sh: line 155: /dev/null: Permission denied
/usr/hdp/current/hadoop-client/conf/hadoop-env.sh: line 100: /dev/null: Permission denied
ls: write error: Broken pipe
starting datanode, logging to /data/log/hadoop/hdfs/hadoop-hdfs-datanode-.out
/usr/hdp/2.3.4.7-4//hadoop-hdfs/bin/hdfs.distro: line 30: /dev/null: Permission denied
/usr/hdp/current/hadoop-client/conf/hadoop-env.sh: line 100: /dev/null: Permission denied
ls: write error: Broken pipe
/usr/hdp/2.3.4.7-4/hadoop/libexec/hadoop-config.sh: line 155: /dev/null: Permission denied
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh: line 187: /dev/null: Permission denied

Expert Contributor

I fixed the permissions of the files above by comparing against another cluster; on a healthy node, /dev/null is a world-writable character device, so the fix looks something like the sketch below.
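For reference, a minimal sketch of that kind of fix (assuming /dev/null had been clobbered, e.g. by a recursive chmod; run as root):

rm -f /dev/null          # remove the broken /dev/null
mknod /dev/null c 1 3    # recreate it as a character device (major 1, minor 3)
chmod 666 /dev/null      # world-writable, as on a healthy system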

Then I tried the hdfs datanode command again and got the following error in the logs:

16/12/01 13:13:22 INFO datanode.DataNode: Shutdown complete.
16/12/01 13:13:22 FATAL datanode.DataNode: Exception in secureMain
java.io.IOException: the path component: '/var/lib/hadoop-hdfs' is owned by a user who is not root and not you. Your effective user id is 0; the path is owned by user id 1005, and its permissions are 0751. Please fix this or select a different socket path.
at org.apache.hadoop.net.unix.DomainSocket.validateSocketPathSecurity0(Native Method)
at org.apache.hadoop.net.unix.DomainSocket.bindAndListen(DomainSocket.java:189)
at org.apache.hadoop.hdfs.net.DomainPeerServer.<init>(DomainPeerServer.java:40)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:965)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:931)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:430)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2411)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2298)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2345)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2526)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2550)
16/12/01 13:13:22 INFO util.ExitUtil: Exiting with status 1
16/12/01 13:13:22 INFO datanode.DataNode: SHUTDOWN_MSG:

I changed the owner of /var/lib/hadoop-hdfs to root, but I am still getting the same issue.

Any suggestions?


@Mohan V

A few strange things in the .out file:

1). Your "/data/log/hadoop/hdfs/hadoop-hdfs-datanode-hostname-out" output shows the following:

/data/log/hadoop/hdfs/hadoop-hdfs-datanode-D-9539.out: line 2: syntax error near unexpected token `('
/data/log/hadoop/hdfs/hadoop-hdfs-datanode-D-9539.out: line 2: `core file size          (blocks, -c) unlimited'

This indicates that some bad syntax/characters are present in the Hadoop scripts, especially "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh". Can you please check that script on the host where you were trying to start the DataNode?

2). The permissions for "/var/lib/hadoop-hdfs" should ideally be as follows, with ownership "hdfs:hadoop":

drwxr-x--x.  3 hdfs     hadoop   4.0K Dec  1 07:44 hadoop-hdfs
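If it looks different, something like the following should bring it in line (a sketch; run as root, and adjust if your cluster uses different conventions):

chown hdfs:hadoop /var/lib/hadoop-hdfs   # owned by the hdfs user, hadoop group
chmod 751 /var/lib/hadoop-hdfs           # drwxr-x--x, matching the listing above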

3). Regarding the error you mentioned in your recent comment, "java.io.IOException: the path component: '/' is world-writable. Its permissions are 0777. Please fix this or select a different socket path.", its description is available at:

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/ap...

It validates that the path chosen for a UNIX domain socket is secure. A socket path is secure if it doesn't allow unprivileged users to perform a man-in-the-middle attack against it. For example, a malicious user could move the server socket out of the way and create his own socket in the same place.

More info: https://wiki.apache.org/hadoop/SocketPathSecurity
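The check walks every component of the socket path, which is why the error names '/' itself. A quick way to inspect all components at once (a sketch, assuming the common HDP default dfs.domain.socket.path of /var/lib/hadoop-hdfs/dn_socket; adjust to the value in your hdfs-site.xml):

# print owner and permissions for every component of the socket path
namei -l /var/lib/hadoop-hdfs/dn_socket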

What is needed here is to set the correct permissions on /.

Please see: http://stackoverflow.com/questions/22300487/filed-to-start-data-node-in-hadoop-cluster
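A minimal sketch of that fix (run as root; note the deliberate absence of -R, since a recursive chmod on / would break things like /dev/null, which may be what happened earlier in this thread):

stat -c '%a %U:%G' /   # should report 755 root:root
chmod 755 /            # non-recursive: fix only the root directory itself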


Expert Contributor

Thanks for the reply, jss.

I have already tried everything you suggested, but I am still getting the same issue.

When I start the DataNode through the Ambari UI, the following error occurs:

File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode'' returned 1. /etc/profile: line 45: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
/etc/profile: line 70: /dev/null: Permission denied
-bash: /dev/null: Permission denied
/usr/hdp/current/hadoop-client/conf/hadoop-env.sh: line 100: /dev/null: Permission denied
ls: write error: Broken pipe
/usr/hdp/2.3.4.7-4/hadoop/libexec/hadoop-config.sh: line 155: /dev/null: Permission denied
/usr/hdp/current/hadoop-client/conf/hadoop-env.sh: line 100: /dev/null: Permission denied
ls: write error: Broken pipe
starting datanode, logging to /data/log/hadoop/hdfs/hadoop-hdfs-datanode-.out
/usr/hdp/2.3.4.7-4//hadoop-hdfs/bin/hdfs.distro: line 30: /dev/null: Permission denied
/usr/hdp/current/hadoop-client/conf/hadoop-env.sh: line 100: /dev/null: Permission denied
ls: write error: Broken pipe
/usr/hdp/2.3.4.7-4/hadoop/libexec/hadoop-config.sh: line 155: /dev/null: Permission denied
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh: line 187: /dev/null: Permission denied