CDH 5.3 HDFS DataNode cannot start after the cluster and ZooKeeper are ready

New Contributor

Dear CDH users,

 
I am setting up a CDH 5.3 cluster through Cloudera Manager 5.3 on CentOS 6.6, with one NameNode and two DataNodes.
I read the installation guide carefully and set up the user privileges of cloudera-scm as follows:
  visudo
  %cloudera-scm ALL=(ALL) NOPASSWD: ALL
 
And I created the cloudera-scm user and group as follows:
useradd -r -d /var/lib/cloudera-scm-server -g cloudera-scm -s /bin/bash -c "Cloudera Manager" cloudera-scm
groupadd -r supergroup
usermod -aG supergroup root
usermod -aG supergroup cloudera-scm
usermod -aG cloudera-scm root
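
To double-check that the user, the supplementary groups, and the sudo rule all took effect, something like the following can be run on each node (a quick verification sketch using the names created above):

id cloudera-scm
# expected: the cloudera-scm primary group plus the supergroup membership
sudo -l -U cloudera-scm
# should list the (ALL) NOPASSWD: ALL rule from the sudoers entry above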
 
But I got the following error message in the log when I started to deploy ZooKeeper, which is the first service to deploy and start:
OSError: [Errno 13] Permission denied: '/var/log/zookeeper'
 
So I created the following directories and changed their ownership to cloudera-scm:
mkdir -vp /var/lib/zookeeper
chown cloudera-scm:cloudera-scm /var/lib/zookeeper
chmod 775 /var/lib/zookeeper

mkdir -vp /var/log/zookeeper
chown cloudera-scm:cloudera-scm /var/log/zookeeper
chmod 775 /var/log/zookeeper

mkdir -vp /var/lib/zookeeper/version-2
chown cloudera-scm:cloudera-scm /var/lib/zookeeper/version-2
chmod 775 /var/lib/zookeeper/version-2

mkdir /cloudera_manager_zookeeper_canary
chown cloudera-scm:cloudera-scm /cloudera_manager_zookeeper_canary
chmod 775 /cloudera_manager_zookeeper_canary
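
To confirm that the ownership and mode ended up as intended on every node, a quick check such as this can be used (a sketch; the paths are the ones created above):

ls -ld /var/lib/zookeeper /var/lib/zookeeper/version-2 /var/log/zookeeper /cloudera_manager_zookeeper_canary
# each entry should show drwxrwxr-x with owner and group cloudera-scm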
 
After that, I could start ZooKeeper normally.
 
Then I added the HDFS service and also got permission-denied errors, so I created the required directories as well.
 
The following error occurs:
10:53:56.694 PM FATAL org.apache.hadoop.hdfs.server.datanode.DataNode
Exception in secureMain
java.net.BindException: bind(2) error: Address already in use when trying to bind to '/var/run/hdfs-sockets/dn'
at org.apache.hadoop.net.unix.DomainSocket.bind0(Native Method)
at org.apache.hadoop.net.unix.DomainSocket.bindAndListen(DomainSocket.java:191)
at org.apache.hadoop.hdfs.net.DomainPeerServer.<init>(DomainPeerServer.java:40)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:907)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:873)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1066)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:411)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2297)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2184)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2231)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2407)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2431)
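
For a Unix domain socket, a bind(2) "Address already in use" failure usually means either another DataNode process is already listening on that path, or a stale socket file was left behind by a previous run (the bind fails if the file already exists). A minimal diagnostic sketch, using the socket path /var/run/hdfs-sockets/dn from the log above:

ps aux | grep '[d]atanode'      # is a DataNode already running on this host?
ls -l /var/run/hdfs-sockets/    # is there a leftover socket file named dn?
lsof -U | grep hdfs-sockets     # which process, if any, still holds the socket open
# If no process holds it, removing the stale socket file lets the next start bind again:
rm /var/run/hdfs-sockets/dn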
 
I have Googled many times but could not find a way to resolve it.
 
The node health and the ZooKeeper instances are all green.
 
[root@node2 hadoop-conf]# pwd
/var/run/cloudera-scm-agent/process/ccdeploy_hadoop-conf_etchadoopconf.cloudera.hdfs_3180973461668933781/hadoop-conf
[root@node2 hadoop-conf]# cat hdfs-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
 
<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///dfs/nn</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address</name>
    <value>node1.mycloudera.com:8022</value>
  </property>
  <property>
    <name>dfs.https.address</name>
    <value>node1.mycloudera.com:50470</value>
  </property>
  <property>
    <name>dfs.https.port</name>
    <value>50470</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>node1.mycloudera.com:50070</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.permissions.umask-mode</name>
    <value>022</value>
  </property>
  <property>
    <name>dfs.namenode.acls.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.block.local-path-access.user</name>
    <value>cloudera-scm</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hdfs-sockets/dn</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit.skip.checksum</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.client.domain.socket.data.traffic</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    <value>true</value>
  </property>
</configuration>
[root@node2 hadoop-conf]# 
 
[root@node2 ~]# cat /etc/passwd | grep cloudera-scm
cloudera-scm:x:496:480:Cloudera Manager:/var/lib/cloudera-scm-server:/sbin/nologin
[root@node2 ~]# cat /etc/group | grep cloudera
root:x:0:root,cloudera-scm
supergroup:x:493:root,cloudera-scm
cloudera-scm:x:480:root,cloudera-scm
 
Experts, have you ever encountered this problem? Please share your experience. Thank you very much.
1 ACCEPTED SOLUTION

New Contributor

Dear CDH Users,

 

I tried again, keeping all nodes connected to the Internet.

This time everything went normally; not a single error occurred.

 

Thank you.


2 REPLIES

New Contributor

Dear CDH Users,

 

I tried again, keeping all nodes connected to the Internet.

This time everything went normally; not a single error occurred.

 

Thank you.

New Contributor

I have all the permissions set correctly for /var/run/hdfs-sockets/ under root:root, but it still shows the error and HDFS does not start.
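
If the error is the same bind(2) failure as in the original post, directory ownership is usually not the cause; it is more often a leftover socket file or a second DataNode instance. A quick check along the same lines (a sketch reusing the socket path from the original post):

ls -l /var/run/hdfs-sockets/dn   # stale socket file from a previous DataNode run?
ps aux | grep '[d]atanode'       # another DataNode process already bound to it?
# If it is a permission failure instead, the exact exception in the DataNode log
# (typically under /var/log/hadoop-hdfs/ in CDH) will name the path being rejected.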