I have set up our Cloudera cluster with Kerberos + AD.
Users authenticate against AD through MIT Kerberos, and user/group info is read from AD through ShellBasedUnixGroupsMapping.
If we run an MR job with 300 simultaneous mappers (7000 mappers in total), the CPU load on our AD domain controller stays at 100% for the entire job execution.
I ran the MR job through Hive after authenticating with the Kerberos server (kinit).
Why would an MR job cause such CPU load?
My initial thought was that MR authenticates and gets user/group info once, when the job starts or when the Hive shell starts. Apparently my job is constantly overloading AD resources.
Does the job cause a CPU spike each time a mapper is created? Or is it each time a mapper accesses HDFS?
And what would be the best way to resolve this issue?
I was thinking of configuring a separate NIS server for user/group mapping, however you came in at the right time to save me :D
NSCD did in fact resolve the issue. NSCD was turned off in chkconfig by default; we switched it on, and now there is no more heavy CPU load on the DC servers.
Now I wonder why hadoop.security.groups.cache.secs didn't have any effect from the beginning?
We haven't specifically configured it on our system, but isn't it set to 300 sec by default?
Harsh, I know you have helped me a couple of times in the past. Thank you very much for your support!
The security group cache only works within a single JVM. If you have lots of JVMs or short-lived JVMs for your jobs, then caching inside the JVM will give only limited relief.
NSCD prevents the OS from going out to AD for every call that is made, and it works across different JVMs. So instead of one call per JVM, you now have one call for a lot of JVMs.
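To make the difference concrete, here is a rough toy simulation in Python. It is not Hadoop or NSCD code, just a sketch of the caching layers described above. All names (DirectoryServer, TtlCache, run_job), the user "alice", the group name, and the 600-second shared TTL are made up for illustration, and it assumes every task JVM does its lookups within one cache window.

```python
class DirectoryServer:
    """Stand-in for the AD domain controller; counts the lookups it serves."""

    def __init__(self):
        self.lookups = 0

    def groups_for(self, user, now):
        self.lookups += 1
        return ["hadoop-users"]  # hypothetical group name


class TtlCache:
    """Simple time-to-live cache in front of a backend with the same interface."""

    def __init__(self, backend, ttl_secs):
        self.backend = backend
        self.ttl_secs = ttl_secs
        self.entries = {}  # user -> (groups, expiry time)

    def groups_for(self, user, now):
        hit = self.entries.get(user)
        if hit is not None and now < hit[1]:
            return hit[0]  # still fresh, no call to the backend
        groups = self.backend.groups_for(user, now)
        self.entries[user] = (groups, now + self.ttl_secs)
        return groups


def run_job(num_task_jvms, lookups_per_jvm, shared_cache):
    """Return how many lookups reach the directory server for one job."""
    ad = DirectoryServer()
    # The shared cache plays the role of NSCD: one cache per host, used by all JVMs.
    # The 600 s TTL is an arbitrary value chosen for this sketch.
    backend = TtlCache(ad, ttl_secs=600) if shared_cache else ad
    for _ in range(num_task_jvms):
        # Every new JVM starts with an empty in-process cache (300 s TTL,
        # mirroring the default mentioned in the thread).
        jvm_cache = TtlCache(backend, ttl_secs=300)
        for _ in range(lookups_per_jvm):
            jvm_cache.groups_for("alice", now=0)  # simplification: all lookups at t=0
    return ad.lookups


if __name__ == "__main__":
    print("per-JVM cache only:", run_job(7000, 5, shared_cache=False))
    print("with shared cache: ", run_job(7000, 5, shared_cache=True))
```

In this toy model the per-JVM cache absorbs repeated lookups inside one JVM, but every new JVM still costs one trip to the directory server, so 7000 task JVMs mean 7000 lookups. With a shared cache in front, only the first lookup goes out. A real cluster has one NSCD per host rather than one global cache, so the real number would be closer to one lookup per host per cache window.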