All datanodes failed to connect to namenode after enabling Kerberos | Failed to create jsvc process


Hello,

 

After enabling Kerberos on my CDH 5.8.2 cluster (parcel installation), none of the DataNodes can connect to the NameNode, and Cloudera Manager shows the error below:

NameNode Connectivity Suppress...
This DataNode is not connected to one or more of its NameNode(s).

The DataNode logs have also stopped being generated since I enabled Kerberos; before that, everything was working fine.
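In case it helps anyone reproduce the check: since the secure DataNode is launched by jsvc as root but is supposed to drop to the hdfs user, I also verified that the log directory is still writable by hdfs. A minimal sketch of that check (the path is the standard CDH one from my setup; adjust for your host):

```shell
# Hedged check: with a secure DataNode, jsvc's parent runs as root but the
# worker drops to the hdfs user, so the log directory must remain writable
# by hdfs or the daemon can die before it ever opens a log file.
LOGDIR=/var/log/hadoop-hdfs
if [ -d "$LOGDIR" ]; then
  ls -ld "$LOGDIR"                      # expect owner/group that lets hdfs write
  tail -n 5 "$LOGDIR"/jsvc.err 2>/dev/null
else
  echo "log dir $LOGDIR not found on this host"
fi
```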

 

Namenode logs:

2017-08-02 05:24:11,784 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15: SIGTERM
2017-08-03 06:01:26,492 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:PROXY) via hue/ngb-dev-master.dev.aeris.net@DEV.AERIS.NET (auth:KERBEROS) cause:java.io.IOException: File /tmp/.cloudera_health_monitoring_canary_files/.canary_file_2017_08_03-06_01_26 could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
2017-08-03 06:01:26,492 INFO org.apache.hadoop.ipc.Server: IPC Server handler 19 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.21.34.130:53747 Call#213 Retry#0
java.io.IOException: File /tmp/.cloudera_health_monitoring_canary_files/.canary_file_2017_08_03-06_01_26 could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

The DataNode has stopped writing its own logs since Kerberos was enabled; only the 'jsvc.err' file is being written, and it repeats:

Service killed by signal 11
Service killed by signal 11
Service killed by signal 11
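Since signal 11 is SIGSEGV, the secure DataNode process (or jsvc itself) appears to be crashing before it can open its log files. These are the sanity checks I am running on the slave node; the jsvc path is an assumption based on the CDH 5.8.2 parcel layout from the ps output and may differ on your host:

```shell
# Hedged sketch: signal 11 = SIGSEGV, so first confirm the jsvc binary itself
# is present and sane. Path assumes the CDH 5.8.2 parcel layout (bigtop-utils).
JSVC=/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/bigtop-utils/jsvc

# 1. Confirm the binary exists and matches the host architecture; a 32/64-bit
#    or libc mismatch in jsvc is a known cause of immediate segfaults.
if [ -x "$JSVC" ]; then
  file "$JSVC"
else
  echo "jsvc not found at $JSVC - locate it with: find /opt/cloudera -name jsvc"
fi

# 2. Look for a JVM crash dump; HotSpot writes hs_err_pid*.log on SIGSEGV,
#    which would show whether the crash is in the JVM or in native code.
ls /var/log/hadoop-hdfs/hs_err_pid*.log /tmp/hs_err_pid*.log 2>/dev/null \
  || echo "no hs_err file found"
```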

On further investigation, I found that on the DataNode host only root-owned processes are started; no process ever switches to the hdfs user, as seen below:

root@ngb-dev-slave1:/var/log/hadoop-hdfs# ps -ef | grep datanode
root     12057  1588  0 Aug02 ?        00:00:00 jsvc.exec -Dproc_datanode -outfile /var/log/hadoop-hdfs/jsvc.out -errfile /var/log/hadoop-hdfs/jsvc.err -pidfile /tmp/hadoop_secure_dn.pid -nodetach -user hdfs -cp /run/cloudera-scm-agent/process/384-hdfs-DATANODE:/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop/.//*:/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop-hdfs/./:/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop-yarn/.//*:/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop-mapreduce/lib/*:/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop-mapreduce/.//*:/usr/share/cmf/lib/plugins/tt-instrumentation-5.12.0.jar:/usr/share/cmf/lib/plugins/event-publish-5.12.0-shaded.jar:/usr/share/cmf/lib/plugins/navigator/cdh57/audit-plugin-cdh57-2.11.0-shaded.jar -Xmx1000m -Dhdfs.audit.logger=INFO,RFAAUDIT -Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-cmf-hdfs-DATANODE-ngb-dev-slave1.dev.aeris.net.log.out -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.id.str=hdfs -jvm server -Xms1073741824 -Xmx1073741824 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hdfs_hdfs-DATANODE-6bb147f909751418954df20e7596eaf2_pid12057.hprof -XX:OnOutOfMemoryError=/usr/lib/cmf/service/common/killparent.sh -Dhadoop.security.logger=INFO,RFAS 
org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter
root     12064 12057  0 Aug02 ?        00:00:00 python2.7 /usr/lib/cmf/agent/build/env/bin/cmf-redactor /usr/lib/cmf/service/hdfs/hdfs.sh datanode
root     20040  9494  0 06:08 pts/0    00:00:00 grep --color=auto datanode

There is no process running as the hdfs user, and nothing is listening on ports 1004 and 1006 for jsvc.exec. On one of our healthy clusters, both ports are listening:

Healthy cluster - output of netstat -tulnp:

tcp        0      0 10.3.12.83:1004         0.0.0.0:*               LISTEN      10917/jsvc.exec
tcp        0      0 10.3.12.83:1006         0.0.0.0:*               LISTEN      10917/jsvc.exec
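For reference, these are the secure-DataNode port settings I believe Cloudera Manager generates after enabling Kerberos (1004/1006 are privileged ports, which is why the DataNode must be started by jsvc as root and then drop to hdfs). This is a hedged example; please verify against the generated hdfs-site.xml under /run/cloudera-scm-agent/process/ rather than copying it:

```xml
<!-- Typical secure-DataNode settings on a Kerberized CDH cluster
     (example values; confirm against your generated configuration). -->
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:1004</value>   <!-- privileged data-transfer port -->
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:1006</value>   <!-- privileged HTTP port -->
</property>
```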

Please let me know in case any other information is required.

 

Best Regards,

TM
