CDH 5.3 HDFS datanode can not start after the cluster and zookeeper are ready

New Contributor

Dear CDH users,

 
I am setting up a CDH 5.3 cluster through Cloudera Manager 5.3 on CentOS 6.6, with one namenode and two datanodes.
I read the installation guide carefully and set up the sudo privileges of cloudera-scm as follows:
  visudo
  %cloudera-scm ALL=(ALL) NOPASSWD: ALL
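 
To double-check that the rule is in effect, the sudo privileges can be listed (just a quick sanity check, not a step from the installation guide):
  # should report "(ALL) NOPASSWD: ALL" for members of %cloudera-scm
  sudo -l -U cloudera-scm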
 
And created the cloudera-scm user and related groups as follows:
useradd -r -d /var/lib/cloudera-scm-server -g cloudera-scm -s /bin/bash -c "Cloudera Manager" cloudera-scm
groupadd -r supergroup
usermod -aG supergroup root
usermod -aG supergroup cloudera-scm
usermod -aG cloudera-scm root
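 
To verify the account and the group memberships created above, for example:
id cloudera-scm          # should show groups cloudera-scm and supergroup
getent group supergroup  # should list root and cloudera-scm as members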
 
But I got the following error message in the log when I started to deploy ZooKeeper, which is the first service to deploy and start:
OSError: [Errno 13] Permission denied: '/var/log/zookeeper'
 
So I created the following directories and changed their owner to cloudera-scm:
mkdir -vp /var/lib/zookeeper
chown cloudera-scm:cloudera-scm /var/lib/zookeeper
chmod 775 /var/lib/zookeeper
 
mkdir -vp /var/log/zookeeper
chown cloudera-scm:cloudera-scm /var/log/zookeeper
chmod 775 /var/log/zookeeper
 
mkdir -vp /var/lib/zookeeper/version-2
chown cloudera-scm:cloudera-scm /var/lib/zookeeper/version-2
chmod 775 /var/lib/zookeeper/version-2
 
mkdir /cloudera_manager_zookeeper_canary
chown cloudera-scm:cloudera-scm /cloudera_manager_zookeeper_canary
chmod 775 /cloudera_manager_zookeeper_canary
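 
The resulting ownership and mode can be confirmed with, for example:
ls -ld /var/lib/zookeeper /var/lib/zookeeper/version-2 /var/log/zookeeper /cloudera_manager_zookeeper_canary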
 
And then I could start ZooKeeper normally.
 
Then I added the HDFS service, got similar permission-denied errors, and created the required directories as well.
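 
For illustration, the commands would look roughly like this (only /dfs/nn is taken from dfs.namenode.name.dir in the hdfs-site.xml shown below; the other paths are hypothetical typical defaults, and the actual list depends on the role configuration):
# hypothetical example: only /dfs/nn comes from the config below, the other
# paths are typical defaults and may differ per cluster
mkdir -vp /dfs/nn /dfs/dn /var/log/hadoop-hdfs
chown -R cloudera-scm:cloudera-scm /dfs/nn /dfs/dn /var/log/hadoop-hdfs
chmod 775 /dfs/nn /dfs/dn /var/log/hadoop-hdfs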
 
The following error occurs:
10:53:56.694 PM FATAL org.apache.hadoop.hdfs.server.datanode.DataNode
Exception in secureMain
java.net.BindException: bind(2) error: Address already in use when trying to bind to '/var/run/hdfs-sockets/dn'
at org.apache.hadoop.net.unix.DomainSocket.bind0(Native Method)
at org.apache.hadoop.net.unix.DomainSocket.bindAndListen(DomainSocket.java:191)
at org.apache.hadoop.hdfs.net.DomainPeerServer.<init>(DomainPeerServer.java:40)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:907)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:873)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1066)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:411)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2297)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2184)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2231)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2407)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2431)
 
I have Googled many times but could not find a way to resolve it.
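 
Since this is a bind error on a unix domain socket rather than a permission error, one thing that might be worth checking (just a guess on my part, not something I found documented) is whether /var/run/hdfs-sockets/dn is held by a running process or is only a stale socket file left over from an earlier start attempt:
ls -l /var/run/hdfs-sockets/
ss -xlp | grep hdfs-sockets   # lists bound unix sockets with the owning process
# if nothing is listening there, the old socket file can be removed so the
# DataNode can try to bind again
rm -f /var/run/hdfs-sockets/dn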
 
The node health and the ZooKeeper services are all green.
 
[root@node2 hadoop-conf]# pwd
/var/run/cloudera-scm-agent/process/ccdeploy_hadoop-conf_etchadoopconf.cloudera.hdfs_3180973461668933781/hadoop-conf
[root@node2 hadoop-conf]# cat hdfs-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
 
<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///dfs/nn</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address</name>
    <value>node1.mycloudera.com:8022</value>
  </property>
  <property>
    <name>dfs.https.address</name>
    <value>node1.mycloudera.com:50470</value>
  </property>
  <property>
    <name>dfs.https.port</name>
    <value>50470</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>node1.mycloudera.com:50070</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.permissions.umask-mode</name>
    <value>022</value>
  </property>
  <property>
    <name>dfs.namenode.acls.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.block.local-path-access.user</name>
    <value>cloudera-scm</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hdfs-sockets/dn</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit.skip.checksum</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.client.domain.socket.data.traffic</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    <value>true</value>
  </property>
</configuration>
[root@node2 hadoop-conf]# 
 
[root@node2 ~]# cat /etc/passwd | grep cloudera-scm
cloudera-scm:x:496:480:Cloudera Manager:/var/lib/cloudera-scm-server:/sbin/nologin
[root@node2 ~]# cat /etc/group | grep cloudera
root:x:0:root,cloudera-scm
supergroup:x:493:root,cloudera-scm
cloudera-scm:x:480:root,cloudera-scm
 
Experts, have you ever encountered this problem? Please share your experience. Thank you very much.
1 ACCEPTED SOLUTION

New Contributor

Dear CDH Users,

 

I tried again, keeping all nodes connected to the Internet.

This time everything went normally; not a single error occurred.

 

Thank you.


2 REPLIES


New Contributor

I have all the permissions correct for /var/run/hdfs-sockets/ under root:root, but it still shows the error and HDFS is not starting.