
Authorization questions (LDAP)

Rising Star

Hi, I'm trying to enable an authorization system in Cloudera.

I'm reading this link https://www.cloudera.com/documentation/enterprise/5-3-x/topics/cm_sg_ldap_grp_mappings.html .

Q0: Why can't we use LdapGroupsMapping in a production environment? I would like to use Apache Zeppelin to integrate with Apache Spark, and I would like to use LDAP as a unified account system.

Q1: If I use org.apache.hadoop.security.ShellBasedUnixGroupsMapping, should I create users and groups on EVERY host in my cluster?

Q2: If I use org.apache.hadoop.security.LdapGroupsMapping, will new users and groups be synced to EVERY host in my cluster when they are created?

Q3: When adding a new service in Cloudera Manager, for example Kafka, will a `kafka` user be created both in the LDAP database and on EVERY host in my cluster?

Q4: I've enabled MIT Kerberos in my cluster. Can I submit tasks from a Windows IDE with the proper Kerberos keytab files? For example, using impyla in Python on a Windows machine.

Accepted solution

Master Guru

A0: Even if you use LDAP as a "unified account system", Cloudera recommends against using LdapGroupsMapping. I'll repost the note from the page you mentioned:

Important: Cloudera strongly recommends against using Hadoop's LdapGroupsMapping provider. LdapGroupsMapping should only be used in cases where OS-level integration is not possible. Production clusters require an identity provider that works well with all applications, not just Hadoop. Hence, often the preferred mechanism is to use tools such as SSSD, VAS or Centrify to replicate LDAP groups.

The idea is to use tools that were designed to integrate Unix accounts with LDAP, Active Directory, etc.

You could enable LDAP Groups Mapping for HDFS, but then only HDFS would know about the users and groups; the OS would not.
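By contrast, ShellBasedUnixGroupsMapping asks the operating system for a user's groups (it shells out to a command such as `id`), so whatever SSSD, VAS, or Centrify expose through the OS is also what Hadoop sees. The Python sketch below performs a comparable lookup through the OS name service (NSS). It is an illustration under that assumption, not Hadoop's own code, and the function name is hypothetical.

```python
import grp
import os
import pwd


def os_groups_for_user(username: str) -> list:
    """Resolve a user's group names through the OS name service (NSS).

    With SSSD, VAS, or Centrify configured, NSS can answer from LDAP/AD,
    so every tool on the host (not only Hadoop) sees the same groups.
    """
    entry = pwd.getpwnam(username)  # raises KeyError if NSS doesn't know the user
    # getgrouplist returns the user's group IDs, including the primary GID passed in.
    gids = os.getgrouplist(username, entry.pw_gid)
    names = []
    for gid in gids:
        try:
            name = grp.getgrgid(gid).gr_name
        except KeyError:
            name = str(gid)  # a GID with no group entry: fall back to the number
        if name not in names:  # keep first-seen order, drop duplicates
            names.append(name)
    return names
```

If the OS integration is consistent, calling `os_groups_for_user("hdfs")` on any node should return the same groups on every host.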

A1: Yes, each host should have the same set of users. There are two common ways to manage this without manually updating every host's passwd and group files:

- Tools such as SSSD, VAS, and Centrify let hosts retrieve user information from one location. As long as each host in the cluster is configured to use the tool, each host can find the single entry in LDAP (the hdfs user, for instance).

- Puppet, Chef, or other automation tools can be used to push out passwd/group changes to all hosts.
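With either approach it is worth checking the result. A minimal sketch, assuming you have already collected `getent passwd` output from each host (for example with Ansible or over SSH; that collection step, the host names, and the function names are hypothetical). It reports required users that are missing on some hosts and users whose UID differs between hosts, since mismatched UIDs are a common source of file-ownership surprises.

```python
from collections import defaultdict


def parse_passwd(text: str) -> dict:
    """Map user name -> UID from passwd-format text (e.g. `getent passwd` output)."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit():
            users[fields[0]] = int(fields[2])
    return users


def find_inconsistencies(passwd_by_host: dict, required: set):
    """Compare required users across hosts.

    Returns (missing, uid_mismatch):
      missing       user -> list of hosts where the user is absent
      uid_mismatch  user -> {host: uid} for users whose UID differs between hosts
    """
    missing = defaultdict(list)
    uids = defaultdict(dict)
    for host, text in passwd_by_host.items():
        users = parse_passwd(text)
        for user in sorted(required):
            if user in users:
                uids[user][host] = users[user]
            else:
                missing[user].append(host)
    uid_mismatch = {u: by_host for u, by_host in uids.items()
                    if len(set(by_host.values())) > 1}
    return dict(missing), uid_mismatch
```

For example, if `kafka` has one UID on one host and a different UID on another, it shows up in `uid_mismatch` with the UID seen on each host.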

A2: No. There is no "syncing" with LDAP Groups Mapping; instead, there is a single LDAP entry that services reference.
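To make "a single LDAP entry that services reference" concrete: with LdapGroupsMapping, Hadoop itself queries the directory whenever it needs a user's groups (results are cached for a while), typically by finding the user's entry and then searching for groups whose member attribute contains that user's DN. The sketch below only builds those two search filters and never contacts a server. The default values reflect Hadoop's Active Directory oriented defaults for `hadoop.security.group.mapping.ldap.search.filter.user`, `hadoop.security.group.mapping.ldap.search.filter.group`, and `hadoop.security.group.mapping.ldap.search.attr.member` as I understand them, so check core-default.xml for your version; the function names are hypothetical.

```python
# Assumed Hadoop defaults (Active Directory oriented); verify against
# core-default.xml for your Hadoop version before relying on them.
DEFAULT_USER_FILTER = "(&(objectClass=user)(sAMAccountName={0}))"
DEFAULT_GROUP_FILTER = "(objectClass=group)"
DEFAULT_MEMBER_ATTR = "member"

# RFC 4515 requires these characters to be escaped inside filter values.
_FILTER_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_filter_value(value: str) -> str:
    """Escape a value so it cannot change the structure of an LDAP filter."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def user_search_filter(username: str, template: str = DEFAULT_USER_FILTER) -> str:
    """Filter used to find the user's entry; {0} is replaced by the user name."""
    return template.replace("{0}", escape_filter_value(username))


def group_search_filter(user_dn: str,
                        group_filter: str = DEFAULT_GROUP_FILTER,
                        member_attr: str = DEFAULT_MEMBER_ATTR) -> str:
    """Filter used to find every group whose member attribute lists user_dn."""
    return f"(&{group_filter}({member_attr}={escape_filter_value(user_dn)}))"
```

Because every lookup goes back to the same directory entries, nothing needs to be copied to the hosts; the flip side is that only Hadoop, not the OS, resolves groups this way.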

A3: By default, Cloudera Manager has "Create Users and Groups, and Apply File Permissions for Parcels" enabled. When a parcel is activated, the agents on each host managed by that Cloudera Manager create local users and groups (such as `kafka`) if that setting is enabled. They won't create them in LDAP, though.
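To see the difference on an actual host, a sketch like the following (function names hypothetical) tells apart accounts defined in the local /etc/passwd file, where local users such as the ones the Cloudera Manager agent creates typically live, from accounts that only resolve through NSS, for example LDAP via SSSD.

```python
import pwd


def local_users(passwd_path: str = "/etc/passwd") -> set:
    """User names defined in the local passwd file only (ignores LDAP/SSSD)."""
    names = set()
    with open(passwd_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if line.strip() and not line.startswith("#"):
                names.add(line.split(":", 1)[0])
    return names


def where_defined(user: str, passwd_path: str = "/etc/passwd") -> str:
    """Return 'local', 'nss-only' (e.g. LDAP through SSSD), or 'missing'."""
    if user in local_users(passwd_path):
        return "local"
    try:
        pwd.getpwnam(user)  # asks NSS, which may include LDAP/AD via SSSD
        return "nss-only"
    except KeyError:
        return "missing"
```

Running `where_defined("kafka")` on each host should report `local` once the parcel has been activated with that setting enabled, and `nss-only` would indicate the account comes from the directory instead.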

A4: I'm afraid I don't completely understand the question, so I'll answer generally. As long as your client has the proper configuration and credentials to authenticate, it should work.
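For the impyla case in Q4, "proper configuration" usually means telling the client to use Kerberos (GSSAPI) and which service name to expect. A minimal sketch, assuming impyla's `impala.dbapi.connect` with its `auth_mechanism` and `kerberos_service_name` parameters, Impala's default client port 21050, a ticket already obtained with kinit, and a hypothetical host name. On Windows, the SASL/GSSAPI libraries impyla relies on may also need to be installed; this sketch does not cover that.

```python
def impala_kerberos_kwargs(host: str, port: int = 21050,
                           service_name: str = "impala") -> dict:
    """Keyword arguments for impyla's connect() using Kerberos (GSSAPI).

    Assumes a valid ticket already exists in the client's credential cache
    (e.g. obtained with kinit); no password or keytab is passed here.
    Use the fully qualified host name that appears in the service principal.
    """
    return {
        "host": host,
        "port": port,
        "auth_mechanism": "GSSAPI",
        "kerberos_service_name": service_name,
    }


def run_query(host: str, sql: str):
    """Connect with Kerberos, run one query, and return all rows."""
    # Imported lazily so the helper above can be used without impyla installed.
    from impala.dbapi import connect

    conn = connect(**impala_kerberos_kwargs(host))
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()
```

For example, `run_query("impalad-host.example.com", "SELECT 1")` would be the call to try from PyCharm after kinit; the host name is a placeholder.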

I hope that all helps.

Regards,

Ben

Rising Star

Thanks, bgooley.

Q1: I've tried Ansible to maintain users and groups, and it works.

Q4: What I mean is that I would like to use Python with PyCharm on Windows to get an interactive shell. In both of the following cases, I've run kinit using MIT Kerberos for Windows.

Case 1: impyla in Python scripts.

https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/connect-kerberozied-cluster-from-impyl...

Case 2: pyspark in Python scripts.

https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/connect-kerberozied-cluster-from-p...

Master Guru

For the impyla issue, I believe GitHub is a good place to look for assistance too.

I see there is already a discussion on GitHub: https://github.com/cloudera/impyla/issues/233

Rising Star
OK. Actually, it was me who started that issue on GitHub 🙂