Hadoop Kerberos optimization help

I am trying to find out why my Hadoop cluster sends too many requests to the KDC, too often, after which my KDC goes down.

I use FreeIPA as the KDC. I previously used Active Directory, but it was also overloaded while Hadoop was running.

 

From my research, Cloudera services use the native Java Kerberos authentication module, but I could not find any statement of support for the native Java Kerberos classes on the IBM and Red Hat websites.

 

I compared the Hadoop services with external ETL services and found something very strange:

external services request only one TGT, and only one service ticket (TGS) for each acceptor service they connect to, within the ticket lifetime (default value = 24h),

but Hadoop services query the KDC for every operation, including internal operations.

The hdfs service requests one TGT every hour (far more often than the 24h lifetime requires) but many TGS tickets when I run a multi-threaded Hadoop job.

The hbase service, for example, requests a TGT and a TGS every time it does anything.
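
To put numbers on this comparison, the requests can be counted per client and target service from the KDC log. The following is only a minimal Python sketch. It assumes the KDC writes MIT-style krb5kdc log lines of the form "AS_REQ (...) <ip>: ISSUE: authtime ..., etypes {...}, <client> for <service>" (FreeIPA's KDC is MIT krb5, but the exact format on a given install may differ). The function name count_ticket_requests is hypothetical.

```python
import re
from collections import Counter

# Assumed MIT krb5kdc log format (verify against your own log):
#   ... AS_REQ (...) 10.0.0.5: ISSUE: authtime ..., etypes {...}, CLIENT for SERVICE
LINE_RE = re.compile(
    r"\b(AS_REQ|TGS_REQ)\b.*?\bISSUE:.*?,\s(\S+) for (\S+)\s*$"
)


def count_ticket_requests(lines):
    """Count issued tickets per (request type, client, service).

    AS_REQ entries correspond to TGT requests, TGS_REQ entries to
    service-ticket requests. Lines that do not match are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts
```

Feeding it the lines of the KDC log (for example the file opened from /var/log/krb5kdc.log, if that is where your KDC logs) and printing counts.most_common(20) should show which client principals generate the most AS_REQ and TGS_REQ traffic.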

 

I also found that the namenode service uses the hdfs/_HOST@REALM and HTTP/_HOST@REALM principals. I then checked the keytab with klist and found keys for both principals in it:

 

Keytab name: FILE:/var/run/cloudera-scm-agent/process/1740-hdfs-DATANODE/hdfs.keytab
KVNO Timestamp Principal
---- ------------------- ------------------------------------------------------
6 05/25/2021 22:02:44 HTTP/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes256-cts-hmac-sha1-96)
6 05/25/2021 22:02:44 HTTP/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes128-cts-hmac-sha1-96)
6 05/25/2021 22:02:44 HTTP/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes256-cts-hmac-sha384-192)
6 05/25/2021 22:02:44 HTTP/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes128-cts-hmac-sha256-128)
6 05/25/2021 22:02:44 HTTP/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (des3-cbc-sha1)
6 05/25/2021 22:02:44 HTTP/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (arcfour-hmac)
1 05/25/2021 22:02:44 hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes256-cts-hmac-sha1-96)
1 05/25/2021 22:02:44 hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes128-cts-hmac-sha1-96)
1 05/25/2021 22:02:44 hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes256-cts-hmac-sha384-192)
1 05/25/2021 22:02:44 hdfs/cloudera.ipa.dev.windows.local@DEV.WINDOWS.LOCAL (aes128-cts-hmac-sha256-128)

 

Two or more principals in one keytab are not acceptable in a production environment, and the keytab should not contain any encryption types other than AES256 if that is the only one set up in the KDC config.
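
The same check can be scripted so it can be repeated for every keytab on the cluster. This is a rough Python sketch that parses klist output shaped like the listing above (KVNO, date, time, principal, encryption type in parentheses). The function names and the default allowed set (only aes256-cts-hmac-sha1-96) are my assumptions for illustration, not anything defined by Cloudera or FreeIPA.

```python
import re
from collections import defaultdict

# Matches entry lines such as:
#   6 05/25/2021 22:02:44 HTTP/host@REALM (aes256-cts-hmac-sha1-96)
ENTRY_RE = re.compile(r"^\s*(\d+)\s+\S+\s+\S+\s+(\S+)\s+\(([^)]+)\)\s*$")


def enctypes_by_principal(klist_output):
    """Map each principal in the klist listing to its set of enctypes."""
    result = defaultdict(set)
    for line in klist_output.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            _kvno, principal, enctype = match.groups()
            result[principal].add(enctype)
    return dict(result)


def unexpected_enctypes(by_principal,
                        allowed=frozenset({"aes256-cts-hmac-sha1-96"})):
    """Return, per principal, the enctypes outside the allowed set.

    The default allowed set is an assumption; adjust it to match
    what the KDC is actually configured to issue.
    """
    return {
        principal: sorted(enctypes - allowed)
        for principal, enctypes in by_principal.items()
        if enctypes - allowed
    }
```

Calling enctypes_by_principal on the text of the listing gives one entry per principal, so a keytab with more than one key in the result holds more than one principal, and unexpected_enctypes lists the key types that would need to be removed.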

GSSAPI is supported by many products, including FreeIPA and AD, and uses a credential cache for both the TGT and the TGS.

FreeIPA doesn't support the native Java module, because the native Java module relates to Solaris only, so I must use Java's native GSSAPI mode through the MIT Kerberos library for both client and service applications.

This means my Hadoop cluster must be switched to GSSAPI library mode.

The Hadoop TGT/TGS request flood is related to the unsupported Hadoop Java method, because the JVM caches tickets internally and no other application or thread can reuse them.

When two or more principals are needed in keytabs, they must be split across independent services, because the standard prescribes 1 SPN == 1 service process.

GSSAPI doesn't support one process with 2+ principals, because it is a cacheable model with a strictly defined keytab.

 

 

So my questions are:

1. How do I switch Hadoop to a GSSAPI mode that is compatible with FreeIPA, so that each service inside Hadoop gets only one TGT per 24h and only one TGS per 24h for each other service?

2. How do I split HDFS into two independent services, one with the hdfs principal and one with the HTTP principal, to complete the GSSAPI switch?
