
Unable to Start DataNode in Kerberos cluster

Super Collaborator

Hi Guys,

I'm unable to start the DataNode after enabling Kerberos in my cluster. I have tried all the solutions suggested in the community and on the Internet, without any success.

All the other servers started, and my cluster nodes are able to authenticate against Active Directory.

Here are the important HDFS configs:

dfs.datanode.http.address 1006

dfs.datanode.address 1004

hadoop.security.authentication kerberos

hadoop.security.authorization true

hadoop.rpc.protection authentication

Enable Kerberos Authentication for HTTP Web-Consoles true

and here is the log:

STARTUP_MSG: java = 1.8.0_101
************************************************************/
2017-10-23 06:56:02,698 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2017-10-23 06:56:03,449 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user hdfs/aopr-dhc001.lpdomain.com@LPDOMAIN.COM using keytab file hdfs.keytab
2017-10-23 06:56:03,812 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-10-23 06:56:03,891 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-10-23 06:56:03,891 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2017-10-23 06:56:03,899 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576
2017-10-23 06:56:03,900 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: File descriptor passing is enabled.
2017-10-23 06:56:03,903 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is aopr-dhc001.lpdomain.com
2017-10-23 06:56:03,908 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.lang.RuntimeException: Cannot start secure DataNode without configuring either privileged resources or SASL RPC data transfer protection and SSL for HTTP. Using privileged resources in combination with SASL RPC data transfer protection is not supported.
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkSecureConfig(DataNode.java:1371)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1271)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:464)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2583)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2470)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2517)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2699)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2723)
2017-10-23 06:56:03,919 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-10-23 06:56:03,921 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at aopr-dhc001.lpdomain.com/10.16.144.131
************************************************************/
2017-10-23 06:56:08,422 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = aopr-dhc001.lpdomain.com/10.16.144.131
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.6.0-cdh5.13.0


A few things to check for:

  1. Are you starting the DataNode process as root?
  2. Have you set HADOOP_SECURE_DN_USER and JSVC_HOME?
  3. Since you are using a privileged port number (<1024), ensure you have not set dfs.data.transfer.protection.

The Apache Hadoop documentation for secure DataNode setup is good; a minimal sketch of the privileged-port settings follows the link below.

https://hadoop.apache.org/docs/r2.7.4/hadoop-project-dist/hadoop-common/SecureMode.html#Secure_DataN...
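For reference, a minimal sketch of the privileged-port (jsvc) route from that doc; the values are examples, and the JSVC_HOME path is an assumption that must point at wherever jsvc is actually installed on your nodes:

# hadoop-env.sh (or /etc/default/hadoop-hdfs-datanode on CDH packages) -- example values
export HADOOP_SECURE_DN_USER=hdfs
export JSVC_HOME=/usr/lib/bigtop-utils   # example path; adjust to your jsvc install

With dfs.datanode.address and dfs.datanode.http.address kept on ports below 1024 (1004/1006 as in your config), the DataNode is then started as root so jsvc can bind the privileged ports before dropping to HADOOP_SECURE_DN_USER.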

Super Collaborator

Hi Arpit

I'm using Hadoop 2.6.

1- I'm starting the DN using the superuser.

2- No, HADOOP_SECURE_DN_USER is commented out in /etc/default/hadoop-hdfs-datanode, and there is no config for JSVC_HOME.

3- dfs.data.transfer.protection is none


Do I need to add these 2 parameters to my hadoop-env.sh under /etc/hadoop/conf?

Did you enable security using the Ambari Kerberos wizard? That usually takes care of these settings for you.

Super Mentor

@Fawze AbuJaber

Have you recently upgraded your Operating System kernel? Is your kernel version something like "kernel-3.10.0-514.21.2.el7.x86_64"?


Can you please try adding "-Xss2m" as follows inside "/usr/hdp/$VERSION/hadoop-hdfs/bin/hdfs.distro" on all the DataNodes:

exec "$JSVC" \
-Xss2m \
org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"

Then try starting the DN again.

NOTE: Also please check whether a JVM crash file has been created at /var/log/hadoop/hs_err_pid#.log. If this file exists, then the problem might be related to:

https://community.hortonworks.com/questions/109594/datanode-failing-to-start-jre-sigbus-error.html and the -Xss2m solution should work.

Super Mentor

@Fawze AbuJaber

You might also want to update the HADOOP_DATANODE_OPTS environment variable in hadoop-env.sh to include "-Xss2m".
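For example (just a sketch; keep whatever options you already have in that variable):

# hadoop-env.sh
export HADOOP_DATANODE_OPTS="-Xss2m ${HADOOP_DATANODE_OPTS}"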

Super Collaborator

Tried this, but with no success.

Super Collaborator

My kernel is: 2.6.32-573.26.1.el6.x86_64

Super Collaborator

Yes, I did.

Mentor

@Fawze AbuJaber

I see "STARTUP_MSG: version = 2.6.0-cdh5.13.0 " is this a cloudera cluster ?

Curiously, I also contribute in the Cloudera community, and I see you opened a thread there as well: http://community.cloudera.com/t5/Storage-Random-Access-HDFS/Unable-to-Start-DataNode-in-kerberos-cl...

Could you be precise about the distribution so you can get better help?

Super Collaborator

Hi Geoffrey,

Yes, I'm using CDH, but the error I'm getting is not related to CDH.

Mentor

@Fawze AbuJaber

Can you change the below from the current "authentication" to "privacy"?

core-site.xml

hadoop.rpc.protection = privacy

hdfs-site.xml

dfs.encrypt.data.transfer=true 

Does the cluster have custom Java classes and dependencies? If so, include them. Have a look at this JIRA: https://issues.apache.org/jira/browse/AMBARI-8174

You may need to configure both dfs.data.transfer.protection and hadoop.rpc.protection to specify the QOP for the RPC and data transfer protocols. In some cases, the values of these two properties will be the same; in those cases, it may be easier to let dfs.data.transfer.protection default to hadoop.rpc.protection. This also ensures that an admin gets a QOP of "authentication" if neither value is specified.

Then restart the DataNode after making the two changes in core-site.xml / hdfs-site.xml.
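After the restart, a quick sanity check of the effective values (just a sketch; run on a host with the client configuration deployed):

hdfs getconf -confKey hadoop.rpc.protection
hdfs getconf -confKey dfs.data.transfer.protection
hdfs getconf -confKey dfs.encrypt.data.transfer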

Super Collaborator

Tried, but with no success. Indeed, I noticed the error below before this one, and I don't know how it might be related:

KdcAccessibility: remove ropr-mng01.lpdomain.com
>>> KDCRep: init() encoding tag is 126 req type is 11
>>>KRBError:
	 sTime is Sat Oct 28 06:26:45 EDT 2017 1509186405000
	 suSec is 487082
	 error code is 25
	 error Message is Additional pre-authentication required
	 sname is krbtgt/LPDOMAIN.COM@LPDOMAIN.COM
	 eData provided.

Super Collaborator

When I disable Kerberos, everything works fine.

Mentor

@Fawze AbuJaber

There could be a couple of issues with your Kerberos setup.

I am not familiar with the Cloudera Manager Kerberos wizard, but I have some pointers. Can you share your krb5.ini or krb5.conf?

It seems your KDC does not support the requested encryption type. The desired encryption types are specified under the following section of the Kerberos configuration file (krb5.ini or krb5.conf):

 [libdefaults]

Enable debugging by running the kinit command below, where xxx.ktab is the keytab and xxx.ktab_Principal is the principal; you can get these values using klist (see the example after the command).

kinit -J-Dsun.security.krb5.debug=true -J-Djava.security.debug=true -k -t xxx.ktab {xxx.ktab_Principal}
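For example, to list the principals (and encryption types) stored in a keytab, something like this should work; the keytab path is only a placeholder:

klist -kte /path/to/hdfs.keytab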

Please let me know

Super Collaborator

@Geoffrey Shelton Okot

supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal

I have the following config as well:

dfs.encrypt.data.transfer.algorithm=AES/CTR/NoPadding

dfs.encrypt.data.transfer.cipher.key.bitlength=256

Kerberos Encryption Types=rc4-hmac

It seems that kinit doesn't work here the same way it does in HDP:

[root@aopr-dhc001 ~]# kinit -V -J-Dsun.security.krb5.debug=true -J-Djava.security.debug=true -k -t cloudera-scm@LPDOMAIN.COM.ktab {cloudera-scm@LPDOMAIN.COM.ktab_Principal}

kinit: invalid option -- 'J'
kinit: invalid option -- '-'
kinit: invalid option -- 'D'
Bad start time value un.security.krb5.debug=true
kinit: invalid option -- 'J'
kinit: invalid option -- '-'
kinit: invalid option -- 'D'
kinit: invalid option -- 'j'
kinit: invalid option -- '.'
Bad start time value ecurity.debug=true
Usage: kinit [-V] [-l lifetime] [-s start_time] [-r renewable_life] [-f | -F] [-p | -P] -n [-a | -A] [-C] [-E] [-v] [-R] [-k [-t keytab_file]] [-c cachename] [-S service_name] [-T ticket_armor_cache] [-X <attribute>[=<value>]] [principal]
options: -V verbose -l lifetime -s start time -r renewable lifetime -f forwardable -F not forwardable -p proxiable -P not proxiable -n anonymous -a include addresses -A do not include addresses -v validate -R renew -C canonicalize -E client is enterprise principal name -k use keytab -t filename of keytab to use -c Kerberos 5 cache name -S service -T armor credential cache -X <attribute>[=<value>]

Mentor

@Fawze AbuJaber

Please do this instead; the previous {.......} was just an example, sorry I didn't elaborate!

kinit -V -J-Dsun.security.krb5.debug=true -J-Djava.security.debug=true -k -t cloudera-scm@LPDOMAIN.COM.ktab cloudera-scm@LPDOMAIN.COM.ktab_Principal

And can you attach the krb5.conf (Linux) or krb5.ini (Windows)? I need to see what values you have in there.
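If your Linux (MIT) kinit keeps rejecting the -J flags (those belong to the JDK's kinit tool rather than the MIT one), a possible alternative for debug output, assuming MIT Kerberos 1.9+ and a placeholder keytab path, is:

KRB5_TRACE=/dev/stdout kinit -V -kt /path/to/cloudera-scm.keytab cloudera-scm@LPDOMAIN.COM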

Super Collaborator

@Geoffrey Shelton Okot

[root@aopr-dhc001 ~]# cat /etc/krb5.conf

[libdefaults]
  default_realm = LPDOMAIN.COM
  dns_lookup_kdc = true
  dns_lookup_realm = false
  ticket_lifetime = 86400
  renew_lifetime = 604800
  forwardable = true
  default_tgs_enctypes = rc4-hmac
  default_tkt_enctypes = rc4-hmac
  permitted_enctypes = rc4-hmac
  udp_preference_limit = 1
  kdc_timeout = 5000
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal

[realms]
  LPDOMAIN.COM = {
    kdc = ropr-mng01.lpdomain.com
    admin_server = ropr-mng01.lpdomain.com
  }

[domain_realm]

Super Collaborator

@Geoffrey Shelton Okot

[root@aopr-dhc001 ~]# kinit -V -J-Dsun.security.krb5.debug=true -J-Djava.security.debug=true -k -t cloudera-scm@LPDOMAIN.COM.ktab cloudera-scm@LPDOMAIN.COM.ktab_Principal

kinit: invalid option -- 'J'
kinit: invalid option -- '-'
kinit: invalid option -- 'D'
Bad start time value un.security.krb5.debug=true
kinit: invalid option -- 'J'
kinit: invalid option -- '-'
kinit: invalid option -- 'D'
kinit: invalid option -- 'j'
kinit: invalid option -- '.'
Bad start time value ecurity.debug=true
Usage: kinit [-V] [-l lifetime] [-s start_time] [-r renewable_life] [-f | -F] [-p | -P] -n [-a | -A] [-C] [-E] [-v] [-R] [-k [-t keytab_file]] [-c cachename] [-S service_name] [-T ticket_armor_cache] [-X <attribute>[=<value>]] [principal]
options: -V verbose -l lifetime -s start time -r renewable lifetime -f forwardable -F not forwardable -p proxiable -P not proxiable -n anonymous -a include addresses -A do not include addresses -v validate -R renew -C canonicalize -E client is enterprise principal name -k use keytab -t filename of keytab to use -c Kerberos 5 cache name -S service -T armor credential cache

Mentor

@Fawze AbuJaber

Can you make a backup and replace your krb5.conf with the file below? Please notice the difference! Also, can you make sure the supported_enctypes match your AD encryption?

[libdefaults]
  default_realm = LPDOMAIN.COM
  dns_lookup_kdc = true
  dns_lookup_realm = false
  ticket_lifetime = 86400
  renew_lifetime = 604800
  forwardable = true
  default_tgs_enctypes = rc4-hmac
  default_tkt_enctypes = rc4-hmac
  permitted_enctypes = rc4-hmac
  udp_preference_limit = 1
  kdc_timeout = 5000
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
[realms]
  LPDOMAIN.COM = {
    kdc = ropr-mng01.lpdomain.com
    admin_server = ropr-mng01.lpdomain.com
  }
[domain_realm]
  lpdomain.com = LPDOMAIN.COM
  .lpdomain.com = LPDOMAIN.COM

BRB

Super Collaborator

Tried, but I'm still getting the same error.

Below I've attached my AD supported encryption types:

ad-conf-in-ad.png, ad-part-2.png