12-25-2016 11:55 PM
Hi, I'm trying to enable an authorization system in Cloudera.
I'm reading this page: https://www.cloudera.com/documentation/enterprise/5-3-x/topics/cm_sg_ldap_grp_mappings.html
Q0: Why shouldn't we use LdapGroupsMapping in a production environment? I would like to use Apache Zeppelin with Apache Spark, and I would like to use LDAP as a unified account system.
Q1: If I use org.apache.hadoop.security.ShellBasedUnixGroupsMapping, should I create users and groups on EVERY host in my cluster?
Q2: If I use org.apache.hadoop.security.LdapGroupsMapping, will new users and groups be synced to EVERY host in my cluster when they are created?
Q3: When I add a new service in Cloudera Manager, for example Kafka, will a `kafka` user be created both in the LDAP database and on EVERY host in my cluster?
Q4: I've enabled MIT Kerberos in my cluster. Can I submit jobs from an IDE on Windows using the proper Kerberos keytab files? For example, using impyla in Python on a Windows machine.
12-28-2016 11:36 PM
A0: If you use LDAP as a "unified account system", Cloudera recommends against relying on Hadoop's LDAP group mapping. I'll repost the note from the page you mentioned:
Important: Cloudera strongly recommends against using Hadoop's LdapGroupsMapping provider. LdapGroupsMapping should only be used in cases where OS-level integration is not possible. Production clusters require an identity provider that works well with all applications, not just Hadoop. Hence, often the preferred mechanism is to use tools such as SSSD, VAS or Centrify to replicate LDAP groups.
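The idea is to let tools that were designed to integrate Unix accounts with LDAP/Active Directory (such as SSSD, VAS, or Centrify) expose those users and groups to the whole OS, so every application sees them, not just Hadoop.
You could enable LDAP group mapping for HDFS, but then only HDFS would know about those users and groups; the OS would not.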
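To make the "OS-level" point concrete, here is a minimal, Linux-only sketch of how a host resolves a user's groups through the standard NSS lookups (Python's `pwd`, `grp`, and `os.getgrouplist`). The assumption is that when a host is configured with SSSD (or VAS/Centrify), these same calls are answered from LDAP/AD, so every application on the host, Hadoop included, sees the same groups. `os_groups_for` is a hypothetical helper name, not part of any Hadoop or Cloudera API.

```python
import grp
import os
import pwd


def os_groups_for(user):
    """Return the sorted group names the OS resolves for `user` through NSS.

    If nsswitch.conf points at SSSD (or VAS/Centrify), these lookups are answered
    from LDAP/AD; otherwise they come from the local /etc/passwd and /etc/group.
    Raises KeyError if the OS cannot resolve the user at all.
    """
    entry = pwd.getpwnam(user)                  # user record (uid, primary gid, ...)
    gids = os.getgrouplist(user, entry.pw_gid)  # primary gid plus supplementary gids
    names = set()
    for gid in gids:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            names.add(str(gid))                 # gid with no group-name entry
    return sorted(names)
```

For example, `os_groups_for("hdfs")` on a cluster host shows the groups the OS reports for the hdfs user. With LdapGroupsMapping alone, Hadoop may know about groups that a call like this would not return.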
A1: Yes, each host should have the same set of users. Two common methods of managing this, without having to manually update every host's passwd and group files, are:
- Tools such as SSSD, VAS, and Centrify let hosts retrieve user information from one location. As long as each host in the cluster is configured to use the tool, every host resolves the same single entry in LDAP (the hdfs user, for instance).
- Puppet, Chef, or other automation tools can be used to push out passwd/group changes to all hosts.
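Whichever method you use, it can help to verify that the hosts really agree. Below is a minimal sketch that compares /etc/passwd contents gathered from several hosts. It assumes you have already collected each host's file as text (for example with one of the automation tools above). It also flags UID/GID mismatches; treating those as a problem is an extra assumption here, not something stated above. The host names, UIDs, and user names in the example are hypothetical.

```python
def parse_passwd(text):
    """Map user name -> (uid, gid) from /etc/passwd-formatted text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 7:
            continue  # skip malformed lines
        name, _password, uid, gid = fields[:4]
        users[name] = (int(uid), int(gid))
    return users


def find_inconsistencies(passwd_by_host, required_users):
    """Report users that are missing on some host or whose uid/gid differ across hosts."""
    parsed = {host: parse_passwd(text) for host, text in passwd_by_host.items()}
    problems = []
    for user in required_users:
        ids_to_hosts = {}
        for host in sorted(parsed):
            entry = parsed[host].get(user)
            if entry is None:
                problems.append(f"{user}: missing on {host}")
            else:
                ids_to_hosts.setdefault(entry, []).append(host)
        if len(ids_to_hosts) > 1:
            detail = ", ".join(
                f"{hosts} -> uid/gid {ids}" for ids, hosts in sorted(ids_to_hosts.items())
            )
            problems.append(f"{user}: uid/gid differ ({detail})")
    return problems


if __name__ == "__main__":
    # Hypothetical sample data: two hosts whose passwd files have drifted apart.
    hosts = {
        "node1": "hdfs:x:994:992::/var/lib/hadoop-hdfs:/bin/bash\n"
                 "alice:x:1001:1001::/home/alice:/bin/bash",
        "node2": "hdfs:x:993:991::/var/lib/hadoop-hdfs:/bin/bash",
    }
    for problem in find_inconsistencies(hosts, ["hdfs", "alice"]):
        print(problem)
```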
A2: No. There is no "syncing" for LDAP Groups Mapping; rather, there is one LDAP entry that services will reference.
A3: By default, Cloudera Manager has "Create Users and Groups, and Apply File Permissions for Parcels" enabled. With that setting on, when a parcel is activated, the agent on each host managed by that Cloudera Manager creates the local users and groups. It won't create them in LDAP, though.
A4: I'm afraid I don't completely understand the question, so I'll answer generally: as long as your client has the proper configuration and credentials to authenticate, it should work.
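Because the agent creates *local* accounts, it can be useful to check on a given host whether each expected service account is defined in the local passwd file and whether the OS resolves it at all. An account that resolves but is not local usually comes from another NSS source, such as LDAP through SSSD. This is a minimal Linux-only sketch. `account_sources` is a hypothetical helper name, and the service account names in the usage line are illustrative assumptions, not a list of what Cloudera Manager actually creates.

```python
import pwd


def account_sources(users, passwd_path="/etc/passwd"):
    """For each user name, report whether it is defined in the local passwd file
    and whether the OS can resolve it at all (local files, SSSD, etc.)."""
    local = set()
    with open(passwd_path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                local.add(line.split(":", 1)[0])  # first field is the user name
    report = {}
    for user in users:
        try:
            pwd.getpwnam(user)  # goes through NSS, so it also sees LDAP users via SSSD
            resolvable = True
        except KeyError:
            resolvable = False
        report[user] = {"local": user in local, "resolvable": resolvable}
    return report
```

Usage, with hypothetical account names: `account_sources(["hdfs", "yarn", "kafka"])`.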
I hope that all helps.
12-28-2016 11:48 PM - edited 12-29-2016 12:11 AM
Q1. I've tried using Ansible to maintain users and groups, and it works.
Q4. What I mean is this: I would like to use Python with PyCharm on Windows to get an interactive shell. In both of the following cases, I've already run kinit using MIT Kerberos for Windows.
case1: impyla in Python scripts.
case2: pyspark in Python scripts.
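For case1, a minimal sketch of a Kerberos-authenticated impyla connection might look like the following. It assumes a valid ticket already exists from kinit, and it uses impyla's `impala.dbapi.connect` with `auth_mechanism="GSSAPI"` and `kerberos_service_name`. The host name `impalad.example.com` is a placeholder. Port 21050 is Impala's usual port for HiveServer2-protocol clients such as impyla. Whether the Python process on Windows actually picks up the ticket from MIT Kerberos for Windows depends on how impyla's SASL/GSSAPI dependencies are installed, which this sketch does not cover. Case2 (pyspark) is not covered here.

```python
def impala_kerberos_params(host, port=21050, service_name="impala", use_ssl=False):
    """Build keyword arguments for impala.dbapi.connect() using Kerberos (GSSAPI).

    Assumes a Kerberos ticket was already obtained with kinit.
    `service_name` is the service part of the impalad principal (e.g. impala/host@REALM).
    """
    return {
        "host": host,
        "port": port,
        "auth_mechanism": "GSSAPI",            # SASL GSSAPI, i.e. Kerberos
        "kerberos_service_name": service_name,
        "use_ssl": use_ssl,
    }


def run_query(host, sql):
    """Run one query against Impala and return all rows (requires `pip install impyla`)."""
    from impala.dbapi import connect  # imported lazily so the helper above works without impyla

    conn = connect(**impala_kerberos_params(host))
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()
```

Usage, with a placeholder host: `run_query("impalad.example.com", "SHOW DATABASES")`.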