Member since: 10-02-2021
Posts: 4, Kudos Received: 0, Solutions: 0
10-25-2021 03:51 PM
I investigated why my KDC server keeps going down while a Hadoop job is running. I checked the KDC logs and found a very large number of TGS requests in them.

My standard HDFS test sequence:

kdestroy
kinit cdh_test
# many similar hdfs operations, for example:
-sh-4.2$ hdfs dfs -ls /tmp
-sh-4.2$ hdfs dfs -ls /tmp
....

Then I run klist:

Ticket cache: FILE:/tmp/krb5cc_1796600024
Default principal: cdh_test@DEV.WINDOWS.LOCAL

Valid starting       Expires              Service principal
10/26/2021 01:09:39  10/27/2021 01:09:39  krbtgt/DEV.WINDOWS.LOCAL@DEV.WINDOWS.LOCAL
        renew until 11/02/2021 01:09:39

I see only the TGT and no service ticket (TGS). Why? When I run psql or curl, I always see the service tickets for those services in the cache, but the ticket for the hdfs service never appears there.

The KDC log contains many TGS requests:

Oct 26 01:11:07 ldap.dev.windows.local krb5kdc[27198](info): TGS_REQ (6 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), UNSUPPORTED:des3-hmac-sha1(16), DEPRECATED:arcfour-hmac(23), UNSUPPORTED:des-cbc-crc(1), UNSUPPORTED:des-cbc-md5(3)}) 192.168.1.122: ISSUE: authtime 1635199779, etypes {rep=aes256-cts-hmac-sha1-96(18), tkt=aes256-cts-hmac-sha1-96(18), ses=aes256-cts-hmac-sha1-96(18)}, cdh_test@DEV.WINDOWS.LOCAL for hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL
Oct 26 01:11:11 ldap.dev.windows.local krb5kdc[27199](info): TGS_REQ (6 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), UNSUPPORTED:des3-hmac-sha1(16), DEPRECATED:arcfour-hmac(23), UNSUPPORTED:des-cbc-crc(1), UNSUPPORTED:des-cbc-md5(3)}) 192.168.1.122: ISSUE: authtime 1635199779, etypes {rep=aes256-cts-hmac-sha1-96(18), tkt=aes256-cts-hmac-sha1-96(18), ses=aes256-cts-hmac-sha1-96(18)}, cdh_test@DEV.WINDOWS.LOCAL for hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL
.....

-----------------------------------------------

I assumed this was normal, because I could not find any description of the correct way to use Kerberos in Hadoop applications.
However, I found a few articles whose authors said that Java has never supported the Kerberos protocols apart from its built-in authentication, so it is impossible to use the built-in Java module in a way that is optimized for Windows and Unix servers. According to them, all existing Kerberos implementations work through their own GSS libraries, such as SSPI in a Windows environment and MIT Kerberos in a Unix environment. The unsupported built-in Java module always gets stuck on caching because it cannot write tickets to the cache: it tries to authenticate every new thread and ignores the fact that the TGT and the service ticket are still valid. In effect this is a DDoS, because the KDC cannot serve so many hdfs connections and is at risk of going down. Hadoop threads running as a single user generate many re-logins. This looks silly, because when we log in to Windows we do not enter a password for every operation or service: the SSO standard is to log in once and then work with each service using the service ticket in the LSA cache.

I also found a workaround. The only thing needed is:

export HADOOP_OPTS="$HADOOP_OPTS -Dsun.security.jgss.native=true -Djavax.security.auth.useSubjectCredsOnly=false"

So the sequence becomes:

export HADOOP_OPTS="$HADOOP_OPTS -Dsun.security.jgss.native=true -Djavax.security.auth.useSubjectCredsOnly=false"
kdestroy
kinit cdh_test
-sh-4.2$ hdfs dfs -ls /tmp
-sh-4.2$ hdfs dfs -ls /tmp
.....
-sh-4.2$ klist
Ticket cache: FILE:/tmp/krb5cc_1796600024
Default principal: cdh_test@DEV.WINDOWS.LOCAL

Valid starting       Expires              Service principal
10/26/2021 01:14:02  10/27/2021 01:14:02  krbtgt/DEV.WINDOWS.LOCAL@DEV.WINDOWS.LOCAL
        renew until 11/02/2021 01:14:02
10/26/2021 01:14:10  10/27/2021 01:14:02  hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL
        renew until 11/02/2021 01:14:02

Now I see the service ticket (TGS), and the behavior matches psql or curl.

The KDC log contains only one TGS entry:

Oct 26 01:14:10 ldap.dev.windows.local krb5kdc[27198](info): TGS_REQ (8 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19), UNSUPPORTED:des3-hmac-sha1(16), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26)}) 192.168.1.122: ISSUE: authtime 1635200042, etypes {rep=aes256-cts-hmac-sha1-96(18), tkt=aes256-cts-hmac-sha1-96(18), ses=aes256-cts-hmac-sha1-96(18)}, cdh_test@DEV.WINDOWS.LOCAL for hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL

It seems strange that Cloudera does not support RHEL and Windows in this respect. Or maybe there is some specific configuration that switches on full support for those operating systems and makes the DDoS disappear. Could you comment?
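To compare the two runs with numbers rather than by eye, the TGS entries in the KDC log can be counted per client and service. The following is only a minimal sketch in Python: the regular expression assumes the krb5kdc line shape shown in this post, the function name `count_tgs_issues` is made up for illustration, and the sample lines are shortened copies of the entries above.

```python
import re
from collections import Counter

# Matches the tail of a krb5kdc "TGS_REQ ... ISSUE" line as shown in the post:
#   "... ISSUE: authtime N, etypes {...}, <client> for <service>"
# Assumption: other KDC versions may format this line differently.
TGS_ISSUE = re.compile(
    r"TGS_REQ .*? ISSUE: authtime \d+, etypes \{[^}]*\}, "
    r"(?P<client>\S+) for (?P<service>\S+)"
)


def count_tgs_issues(lines):
    """Count issued service tickets per (client, service) pair."""
    counts = Counter()
    for line in lines:
        match = TGS_ISSUE.search(line)
        if match:
            counts[(match.group("client"), match.group("service"))] += 1
    return counts


# Two shortened entries in the same shape as the log lines in the post.
SAMPLE = [
    "Oct 26 01:11:07 ldap.dev.windows.local krb5kdc[27198](info): TGS_REQ "
    "(6 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17)}) "
    "192.168.1.122: ISSUE: authtime 1635199779, etypes "
    "{rep=aes256-cts-hmac-sha1-96(18), tkt=aes256-cts-hmac-sha1-96(18), "
    "ses=aes256-cts-hmac-sha1-96(18)}, cdh_test@DEV.WINDOWS.LOCAL for "
    "hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL",
    "Oct 26 01:11:11 ldap.dev.windows.local krb5kdc[27199](info): TGS_REQ "
    "(6 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17)}) "
    "192.168.1.122: ISSUE: authtime 1635199779, etypes "
    "{rep=aes256-cts-hmac-sha1-96(18), tkt=aes256-cts-hmac-sha1-96(18), "
    "ses=aes256-cts-hmac-sha1-96(18)}, cdh_test@DEV.WINDOWS.LOCAL for "
    "hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL",
]

if __name__ == "__main__":
    for (client, service), n in count_tgs_issues(SAMPLE).most_common():
        print(n, client, "->", service)
```

Running the same count over the log from a run with and without the HADOOP_OPTS workaround would show whether the number of TGS requests for the hdfs principal really drops to one per ticket lifetime.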
Labels:
- Apache Hadoop
- HDFS
- Kerberos
- Security
10-13-2021 02:36 AM
I tried to find out why my Hadoop cluster makes so many, and such frequent, requests to the KDC after my KDC went down. I use FreeIPA as the KDC. I previously used Active Directory, but it was also overloaded whenever Hadoop was working.

From what I found, Cloudera services use the built-in Java Kerberos authentication module, but the IBM and Red Hat websites say nothing about supporting the built-in Java classes for Kerberos.

I compared Hadoop services with external ETL services and found something very strange. External services request only one TGT and only one service ticket (TGS) per acceptor service for the whole ticket lifetime (default value = 24h). Hadoop services, however, contact the KDC for every operation, including internal ones: the hdfs service requests one TGT every hour (less than 24h) but many service tickets when I run a multi-threaded Hadoop job. The hbase service, for example, requests a TGT and a TGS every time it does something.

I also found that the namenode service uses the hdfs/_HOST@REALM and HTTP/_HOST@REALM principals. I checked the keytab with klist and found both of these keys inside:

Keytab name: FILE:/var/run/cloudera-scm-agent/process/1740-hdfs-DATANODE/hdfs.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   6 05/25/2021 22:02:44 HTTP/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes256-cts-hmac-sha1-96)
   6 05/25/2021 22:02:44 HTTP/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes128-cts-hmac-sha1-96)
   6 05/25/2021 22:02:44 HTTP/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes256-cts-hmac-sha384-192)
   6 05/25/2021 22:02:44 HTTP/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes128-cts-hmac-sha256-128)
   6 05/25/2021 22:02:44 HTTP/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (des3-cbc-sha1)
   6 05/25/2021 22:02:44 HTTP/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (arcfour-hmac)
   1 05/25/2021 22:02:44 hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes256-cts-hmac-sha1-96)
   1 05/25/2021 22:02:44 hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes128-cts-hmac-sha1-96)
   1 05/25/2021 22:02:44 hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes256-cts-hmac-sha384-192)
   1 05/25/2021 22:02:44 hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes128-cts-hmac-sha256-128)

As I understand it, two or more principals in one keytab are not acceptable in a production environment, and there should be no encryption types other than AES256 if that is the only one set up in the KDC config.

GSSAPI is supported by many products, including FreeIPA and AD, and it uses the credential cache for both the TGT and the service tickets. FreeIPA does not support the built-in Java module, because that module relates to Solaris only, so I must use Java's native GSSAPI mode through the MIT C library for both client and service applications. This means my Hadoop must be switched to GSSAPI library mode. The flood of TGT/TGS requests from Hadoop is caused by the unsupported Java method: tickets are cached inside the JVM, so no other application or thread can reuse them.

When two or more principals are needed in keytabs, they must be split into independent services, because the standard describes 1 SPN == 1 service process. GSSAPI does not support one process with 2+ principals, because it is a cacheable model with a strictly defined keytab.

So my questions are:

1. How do I switch Hadoop to GSSAPI in a way that is compatible with FreeIPA, so that any service inside Hadoop gets only one TGT per 24h and only one TGS per 24h for each other service?
2. How do I split HDFS into 2 independent services, one with the hdfs principal and one with the HTTP principal, to complete the GSS switch?
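Checking which principals and encryption types a keytab holds can be scripted. The sketch below, in Python, assumes a keytab listing in the shape shown above (KVNO, timestamp, principal, encryption type in parentheses); the function name `enctypes_by_principal` is invented for illustration, and the sample text is a shortened copy of the listing in this post.

```python
import re
from collections import defaultdict

# One keytab row: "<kvno> <MM/DD/YYYY HH:MM:SS> <principal> (<enctype>)".
# Assumption: the listing includes timestamps and encryption types,
# as in the output quoted in the post.
KEYTAB_ROW = re.compile(
    r"^\s*(?P<kvno>\d+)\s+\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\s+"
    r"(?P<principal>\S+)\s+\((?P<etype>[^)]+)\)\s*$"
)


def enctypes_by_principal(klist_output):
    """Map each principal in a keytab listing to its encryption types."""
    result = defaultdict(list)
    for line in klist_output.splitlines():
        match = KEYTAB_ROW.match(line)
        if match:
            result[match.group("principal")].append(match.group("etype"))
    return dict(result)


# Shortened copy of the listing from the post.
SAMPLE = """\
Keytab name: FILE:/var/run/cloudera-scm-agent/process/1740-hdfs-DATANODE/hdfs.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   6 05/25/2021 22:02:44 HTTP/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes256-cts-hmac-sha1-96)
   6 05/25/2021 22:02:44 HTTP/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (arcfour-hmac)
   1 05/25/2021 22:02:44 hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes256-cts-hmac-sha1-96)
"""

if __name__ == "__main__":
    for principal, etypes in enctypes_by_principal(SAMPLE).items():
        print(principal, "->", ", ".join(etypes))
```

This only reports what is in the keytab; whether several principals or several encryption types in one keytab are actually a problem is a separate question.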
10-10-2021 01:43 PM
If you use an MIT Kerberos server or FreeIPA, hard-coding 'kdc' entries is a bad workaround, because you should make Kerberos highly available by using DNS to balance across the KDC servers. You need to set dns_lookup_kdc = true, and the client will then also discover any external realms. If such a realm has a trust (for example, two-way), you can either connect directly to an external KDC to get a TGT and then request a service ticket for a service in your realm, or get a TGT in your realm and connect to an external server with a service ticket for that service.

Java (not Hadoop) does not support included configs, but when you run the authentication class, the processing goes through sssd, which uses the config to get the KDC info.

However, if your Active Directory domain is a second-level domain and your MIT realm is a third-level domain, you will see a routing conflict, because all of your internal realm requests will go to AD. This can be solved by adding routing to the [domain_realm] section of krb5.conf, like:

[domain_realm]
mit.domain.local = MIT.DOMAIN.LOCAL
.mit.domain.local = MIT.DOMAIN.LOCAL
host.mit.domain.local = MIT.DOMAIN.LOCAL
domain.local = AD.DOMAIN.LOCAL
.domain.local = AD.DOMAIN.LOCAL
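To see why the more specific entries keep MIT hosts from being routed to AD, here is a simplified model of a [domain_realm] lookup in Python. It is a sketch under assumptions, not the MIT library's actual code: it tries an exact host entry first and then walks up the parent domains written with a leading dot, and the real lookup order may differ in details. The function name `realm_for_host` and the example host names other than those in the mapping are made up.

```python
# The mapping from the post, as a plain dictionary.
DOMAIN_REALM = {
    "mit.domain.local": "MIT.DOMAIN.LOCAL",
    ".mit.domain.local": "MIT.DOMAIN.LOCAL",
    "host.mit.domain.local": "MIT.DOMAIN.LOCAL",
    "domain.local": "AD.DOMAIN.LOCAL",
    ".domain.local": "AD.DOMAIN.LOCAL",
}


def realm_for_host(hostname, domain_realm=DOMAIN_REALM):
    """Return the realm for a host, or None if no entry applies.

    Simplified model: exact host match first, then the closest
    enclosing domain entry (the ones starting with a dot).
    """
    host = hostname.lower().rstrip(".")
    if host in domain_realm:
        return domain_realm[host]
    parts = host.split(".")
    # Try ".mit.domain.local" before ".domain.local", and so on.
    for i in range(1, len(parts)):
        suffix = "." + ".".join(parts[i:])
        if suffix in domain_realm:
            return domain_realm[suffix]
    return None


if __name__ == "__main__":
    for name in ("node1.mit.domain.local", "dc1.domain.local", "mit.domain.local"):
        print(name, "->", realm_for_host(name))
```

In this model a host under mit.domain.local matches the ".mit.domain.local" entry before the lookup ever reaches ".domain.local", which is the effect the extra entries are meant to achieve.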
10-02-2021 02:47 AM
When a client application connects to a Hadoop service, the client contacts the KDC each time it makes a request to the acceptor service, e.g. hdfs dfs -ls /tmp
A Hadoop service also requests tickets for another Hadoop service (usually an HTTP SPN). I don't know why, but I guess it is some kind of status request/response, e.g. hdfs/host1@REALM for HTTP/host2@REALM
If I make thousands of requests, my KDC server is overwhelmed by the number of requests.
I wrote my own simple Java client and server application, and during stress testing I got the same KDC DDoS.
I checked how the Kerberos (GSS) mechanism behaves with other servers and applications, for example a PostgreSQL server with the psql client app, and an Apache web server with curl. Both of them are written in C, so there is no JAAS config for me to configure. I also have to run kinit every 24 hours for psql, when I get a "Ticket expired" error. Those applications create a ticket cache (visible with klist) and reuse it for each request.
I thought Java did not support Kerberos properly, and I was stuck until I found the official Oracle article https://docs.oracle.com/en/java/javase/11/security/accessing-native-gss-api.html
I ran kinit in both application sessions, set KRB5_KTNAME to point at the keytab, and then started the applications with the jgss.native argument. To my surprise, I saw only one TGT and one TGS for my Java server in the KDC logs.
Thousands of additional requests then produced no KDC activity.
I tried this with many different Java applications, and it is the only solution I found that works for both initiator and acceptor mode.
How can I use JGSS in the Cloudera core? And why does it contact the KDC so often, when one TGT and one TGS should be enough for 24 hours?
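For repeating the stress test, the launch setup described above can be captured in a small helper. This Python sketch only builds the command line and environment for a JVM started in native GSS mode. The two system properties are the ones from the Oracle article and from the HADOOP_OPTS workaround in my other post; the keytab variable is spelled KRB5_KTNAME here, which is the name MIT Kerberos reads. The helper name `native_gss_launch` and the jar and keytab paths are hypothetical.

```python
import os

# JVM system properties that enable the native GSS-API bridge.
NATIVE_GSS_FLAGS = [
    "-Dsun.security.jgss.native=true",
    "-Djavax.security.auth.useSubjectCredsOnly=false",
]


def native_gss_launch(jar_path, keytab_path=None, base_env=None):
    """Build (argv, env) for starting a Java app in native GSS mode.

    keytab_path is only needed on the acceptor (server) side; the
    initiator relies on the ticket cache created by kinit.
    """
    env = dict(os.environ if base_env is None else base_env)
    if keytab_path is not None:
        env["KRB5_KTNAME"] = keytab_path
    argv = ["java", *NATIVE_GSS_FLAGS, "-jar", jar_path]
    return argv, env


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    argv, env = native_gss_launch("server.jar", "/etc/myapp/server.keytab")
    print(" ".join(argv))
    print("KRB5_KTNAME =", env["KRB5_KTNAME"])
```

The returned pair can be passed to `subprocess.run(argv, env=env)` after kinit has been run in the same session.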
Labels:
- Apache Hadoop
- HDFS
- Kerberos