Created on 10-27-2017 04:36 PM - edited 09-16-2022 05:27 AM
Hi guys,
I'm unable to start the DataNode after enabling Kerberos in my cluster. I have tried all the solutions suggested in the community and on the Internet, without any success.
All other services started, and my cluster nodes are able to authenticate against Active Directory.
Here are the important HDFS configuration settings:
dfs.datanode.http.address 1006
dfs.datanode.address 1004
hadoop.security.authentication kerberos
hadoop.security.authorization true
hadoop.rpc.protection authentication
Enable Kerberos Authentication for HTTP Web-Consoles true
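For reference, the effective values can be double-checked from the shell with hdfs getconf (a quick sketch; it assumes the hdfs client reads the same /etc/hadoop/conf the DataNode uses):

# Print the secure-DataNode settings as the client configuration resolves them
hdfs getconf -confKey dfs.datanode.address
hdfs getconf -confKey dfs.datanode.http.address
hdfs getconf -confKey dfs.data.transfer.protection
hdfs getconf -confKey hadoop.security.authentication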
and here is the log:

STARTUP_MSG:   java = 1.8.0_101
************************************************************/
2017-10-23 06:56:02,698 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2017-10-23 06:56:03,449 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user hdfs/aopr-dhc001.lpdomain.com@LPDOMAIN.COM using keytab file hdfs.keytab
2017-10-23 06:56:03,812 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-10-23 06:56:03,891 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-10-23 06:56:03,891 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2017-10-23 06:56:03,899 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576
2017-10-23 06:56:03,900 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: File descriptor passing is enabled.
2017-10-23 06:56:03,903 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is aopr-dhc001.lpdomain.com
2017-10-23 06:56:03,908 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.lang.RuntimeException: Cannot start secure DataNode without configuring either privileged resources or SASL RPC data transfer protection and SSL for HTTP. Using privileged resources in combination with SASL RPC data transfer protection is not supported.
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkSecureConfig(DataNode.java:1371)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1271)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:464)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2583)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2470)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2517)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2699)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2723)
2017-10-23 06:56:03,919 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-10-23 06:56:03,921 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at aopr-dhc001.lpdomain.com/10.16.144.131
************************************************************/
2017-10-23 06:56:08,422 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = aopr-dhc001.lpdomain.com/10.16.144.131
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.6.0-cdh5.13.0
Created 10-27-2017 04:58 PM
A few things to check for:
1. Are you starting the DataNode as root?
2. Are HADOOP_SECURE_DN_USER and JSVC_HOME set in the DataNode's environment?
3. What is dfs.data.transfer.protection set to?
The Apache Hadoop documentation for secure DataNode setup is good.
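For background, the exception in your log comes from DataNode.checkSecureConfig(), which accepts exactly two setups: (a) privileged ports (below 1024) for dfs.datanode.address and dfs.datanode.http.address, with the process started as root via jsvc, or (b) SASL data transfer protection plus HTTPS on non-privileged ports. Combining the two is not supported. A sketch of option (b), with the port numbers assumed:

dfs.data.transfer.protection authentication
dfs.http.policy HTTPS_ONLY
dfs.datanode.address 0.0.0.0:50010
dfs.datanode.https.address 0.0.0.0:50475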
Created 10-27-2017 05:02 PM
Hi Arpit,
I'm using Hadoop 2.6.
1. Yes, I'm starting the DataNode as the superuser.
2. No, HADOOP_SECURE_DN_USER is commented out in /etc/default/hadoop-hdfs-datanode, and there is no config for JSVC_HOME.
3. dfs.data.transfer.protection is none.
Do I need to add these two parameters to my hadoop-env.sh under /etc/hadoop/conf?
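(For context, on a packaged install these normally go in /etc/default/hadoop-hdfs-datanode rather than hadoop-env.sh. A minimal sketch, with the JSVC_HOME path assumed:)

# /etc/default/hadoop-hdfs-datanode
export HADOOP_SECURE_DN_USER=hdfs        # user the DataNode drops to after binding the privileged ports
export JSVC_HOME=/usr/lib/bigtop-utils   # directory containing the jsvc binary (path assumed)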
Created 10-27-2017 06:07 PM
Did you enable security using the Ambari Kerberos wizard? That usually takes care of these settings for you.
Created 10-27-2017 06:08 PM
Have you recently upgraded your operating system kernel? Is your kernel version something like "kernel-3.10.0-514.21.2.el7.x86_64"?
Can you please try adding "-Xss2m" as follows inside "/usr/hdp/$VERSION/hadoop-hdfs/bin/hdfs.distro" on all the DataNodes:
exec "$JSVC" \ -Xss2m \ org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@“
Then try starting DN again.
NOTE: Please also check whether a JVM crash file was created at /var/log/hadoop/hs_err_pid#.log. If that file exists, the problem might be related to https://community.hortonworks.com/questions/109594/datanode-failing-to-start-jre-sigbus-error.html and the -Xss2m solution should work.
Created 10-27-2017 06:16 PM
You might also want to update the HADOOP_DATANODE_OPTS environment variable in hadoop-env to include "-Xss2m".
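For example, a minimal sketch in hadoop-env.sh (preserving whatever is already set):

# Prepend -Xss2m (a 2 MB thread stack) to the DataNode JVM options
export HADOOP_DATANODE_OPTS="-Xss2m ${HADOOP_DATANODE_OPTS}"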
Created 10-27-2017 06:30 PM
Tried this, but with no success.
Created 10-27-2017 06:30 PM
My kernel is: 2.6.32-573.26.1.el6.x86_64
Created 10-27-2017 06:10 PM
Yes, I did.
Created 10-28-2017 08:42 AM
I see "STARTUP_MSG: version = 2.6.0-cdh5.13.0 " is this a cloudera cluster ?
Curiously I contribute in cloudera community and I see you opened also a thread in http://community.cloudera.com/t5/Storage-Random-Access-HDFS/Unable-to-Start-DataNode-in-kerberos-cl...
Could you be precise on the distribution so you can get better help?