
Unable to Start DataNode in Kerberos cluster

Master Collaborator

Hi Guys,

I'm unable to start the DataNode after enabling Kerberos in my cluster. I tried all the solutions suggested in the community and on the Internet, without any success.

All other services started, and my cluster nodes are able to authenticate against Active Directory.

Here are the important HDFS configs:

dfs.datanode.http.address 1006

dfs.datanode.address 1004

hadoop.security.authentication kerberos

hadoop.security.authorization true

hadoop.rpc.protection authentication

Enable Kerberos Authentication for HTTP Web-Consoles true

and here is the log:

STARTUP_MSG: java = 1.8.0_101
************************************************************/
2017-10-23 06:56:02,698 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2017-10-23 06:56:03,449 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user hdfs/aopr-dhc001.lpdomain.com@LPDOMAIN.COM using keytab file hdfs.keytab
2017-10-23 06:56:03,812 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-10-23 06:56:03,891 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-10-23 06:56:03,891 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2017-10-23 06:56:03,899 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576
2017-10-23 06:56:03,900 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: File descriptor passing is enabled.
2017-10-23 06:56:03,903 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is aopr-dhc001.lpdomain.com
2017-10-23 06:56:03,908 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.lang.RuntimeException: Cannot start secure DataNode without configuring either privileged resources or SASL RPC data transfer protection and SSL for HTTP. Using privileged resources in combination with SASL RPC data transfer protection is not supported.
    at org.apache.hadoop.hdfs.server.datanode.DataNode.checkSecureConfig(DataNode.java:1371)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1271)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:464)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2583)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2470)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2517)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2699)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2723)
2017-10-23 06:56:03,919 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-10-23 06:56:03,921 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at aopr-dhc001.lpdomain.com/10.16.144.131
************************************************************/
2017-10-23 06:56:08,422 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = aopr-dhc001.lpdomain.com/10.16.144.131
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.6.0-cdh5.13.0
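For reference, the check that throws this exception allows exactly two setups: privileged ports (below 1024, bound as root via jsvc) with dfs.data.transfer.protection unset, or SASL data transfer protection with HTTPS and non-privileged ports. A sketch of the SASL-based alternative, with property names from the Apache SecureMode documentation and illustrative values:

dfs.data.transfer.protection authentication
dfs.http.policy HTTPS_ONLY
dfs.datanode.address 0.0.0.0:50010
dfs.datanode.http.address 0.0.0.0:50075

Mixing the two, privileged ports together with dfs.data.transfer.protection, is exactly what the exception message rules out.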

34 Replies


A few things to check for:

  1. Are you starting the DataNode process as root?
  2. Have you set HADOOP_SECURE_DN_USER and JSVC_HOME?
  3. Since you are using a privileged port number (<1024), ensure you have not set dfs.data.transfer.protection.

The Apache Hadoop documentation on secure DataNode setup is a good reference:

https://hadoop.apache.org/docs/r2.7.4/hadoop-project-dist/hadoop-common/SecureMode.html#Secure_DataN...
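For the privileged-resources route from the checklist above, a minimal sketch of the moving parts; the file paths, jsvc location, and service name here are illustrative and distribution-dependent:

# /etc/default/hadoop-hdfs-datanode (or hadoop-env.sh):
export HADOOP_SECURE_DN_USER=hdfs        # unprivileged user the DataNode drops to
export JSVC_HOME=/usr/lib/bigtop-utils   # directory containing the jsvc binary

# Start as root; jsvc binds the privileged ports (1004/1006) before dropping privileges:
sudo service hadoop-hdfs-datanode start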

Master Collaborator

Hi Arpit,

I'm using Hadoop 2.6.

1. I'm starting the DN as the superuser.

2. No. HADOOP_SECURE_DN_USER is commented out in /etc/default/hadoop-hdfs-datanode, and there is no config for JSVC_HOME.

3. dfs.data.transfer.protection is none.


Do I need to add these two parameters to hadoop-env.sh under /etc/hadoop/conf?
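One way to confirm the values the DataNode actually resolves, assuming the hdfs CLI is on the PATH:

hdfs getconf -confKey dfs.data.transfer.protection
hdfs getconf -confKey dfs.datanode.address
hdfs getconf -confKey hadoop.security.authentication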


Did you enable security using the Ambari Kerberos wizard? That usually takes care of these settings for you.

Master Mentor

@Fawze AbuJaber

Have you recently upgraded your operating system kernel? Is your kernel version something like "kernel-3.10.0-514.21.2.el7.x86_64"?


Can you please try adding "-Xss2m" as follows inside "/usr/hdp/$VERSION/hadoop-hdfs/bin/hdfs.distro" on all the DataNodes:

exec "$JSVC" \
    -Xss2m \
    org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"

Then try starting the DN again.

NOTE: Also, please check whether a JVM crash file was created at /var/log/hadoop/hs_err_pid#.log. If such a file exists, the problem might be related to https://community.hortonworks.com/questions/109594/datanode-failing-to-start-jre-sigbus-error.html, and the -Xss2m solution should work.
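A quick check for such a crash file, using the path from the note above:

ls -l /var/log/hadoop/hs_err_pid*.log
# If one exists, its header names the failing frame; a SIGBUS there is the
# case the -Xss2m workaround is meant to address.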

Master Mentor

@Fawze AbuJaber

You might also want to update the HADOOP_DATANODE_OPTS environment variable in hadoop-env to include "-Xss2m".
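A minimal sketch of that change in hadoop-env.sh, assuming the variable is exported there:

# Raise the DataNode thread stack size to 2 MB, keeping any existing options:
export HADOOP_DATANODE_OPTS="-Xss2m ${HADOOP_DATANODE_OPTS}"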

Master Collaborator

Tried this, but with no success.

Master Collaborator

My kernel is: 2.6.32-573.26.1.el6.x86_64

Master Collaborator

Yes, I did.

Master Mentor

@Fawze AbuJaber

I see "STARTUP_MSG: version = 2.6.0-cdh5.13.0 " is this a cloudera cluster ?

Incidentally, I also contribute to the Cloudera community, and I see you opened a thread there as well: http://community.cloudera.com/t5/Storage-Random-Access-HDFS/Unable-to-Start-DataNode-in-kerberos-cl...

Could you be precise about the distribution so you can get better help?