05-04-2017 07:52 PM
I am attempting to configure a BDR backup from a secured CDH 5.9.0 cluster (Kerberos and Sentry, with HDFS permission synchronization enabled) to S3. I can successfully use BDR to back up my own data (e.g. /users/myname), but now I want to back up some Hive/Impala data that is protected by Sentry. I am using HDFS rather than Hive replication (I don't believe this is material to the question).
If I configure BDR to run using my own userid, which has full access according to Sentry permissions, the job fails with an AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied: user=myuser, access=READ, inode="/data":hive:hive:drwxrwx--x
I would have thought that the fact that Sentry has been configured to synchronize HDFS permissions would have meant that I could run this.
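To make the denial above concrete, here is a minimal sketch of the basic owner/group/other check applied to the mode string in the exception. It deliberately ignores ACLs and Sentry synchronization, and it assumes myuser is not a member of the hive group (consistent with the error, but not confirmed). The function name `perms_for` and the user `backupuser` are hypothetical.

```shell
#!/usr/bin/env bash
# Return the rwx triplet that applies to a caller, given an ls-style mode
# string such as "drwxrwx--x". Models only owner/group/other; no ACLs, no Sentry.
perms_for() {
  local mode="$1" owner="$2" group="$3" user="$4" user_groups="$5"
  if [ "$user" = "$owner" ]; then
    echo "${mode:1:3}"                       # owner bits
  elif [[ " $user_groups " == *" $group "* ]]; then
    echo "${mode:4:3}"                       # group bits
  else
    echo "${mode:7:3}"                       # other bits
  fi
}

# myuser, assumed not to be in the hive group, falls into "other"
perms_for "drwxrwx--x" hive hive myuser "myuser"
# a hypothetical account that is in the hive group gets the group bits
perms_for "drwxrwx--x" hive hive backupuser "backupuser hive"
```

Under those assumptions the first call yields `--x`, which has no read bit and so matches access=READ being denied on /data, while the second yields `rwx`.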
According to https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cm_bdr_hive_replication.html when Kerberos is in use it is necessary to use a user with an ID greater than 1000, so this rules out the hdfs and hive users.
It also states that read and execute permissions are needed on the source cluster for BDR to operate.
So if my user cannot be used, this would mean I need to create a BDR user account that has these permissions.
The directories I want to back up are protected with Sentry, so as per https://www.cloudera.com/documentation/enterprise/5-9-x/topics/sg_sentry_service_config.html#concept... these directories have permissions as follows:
$ hdfs dfs -chown hive:hive /data
$ hdfs dfs -chmod 771 /data
Continuing down this path, to be able to use BDR I will need to use an extended ACL to assign rx permissions on the relevant directories to the user.
To cater for new directories that come along I am thinking that it would also be necessary to add the sticky bit on this operation. Does the following seem reasonable (running as hdfs user with relevant keytab)?
$ hdfs dfs -setfacl -R -m group:backup_users:r-xt /data
Information on using the sticky bit is thin on the ground; is this even supported and supported for extended ACLs?
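For reference, a rough sketch of how the ACL idea is normally expressed in HDFS, using the backup_users group from the command above. The sticky bit is a mode bit set with chmod that restricts who may delete or rename entries in a directory; it is not part of an ACL entry, so `r-xt` is not a valid ACL permission string. What makes newly created children pick up an entry is a default ACL. The `run` helper is my own addition and only prints each command unless APPLY=1 is set.

```shell
#!/usr/bin/env bash
# Print each command by default; set APPLY=1 to execute it for real
# (needs an HDFS superuser ticket, e.g. the hdfs keytab).
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Access entry: r-x for the group on everything that exists under /data now.
# Default entry: copied onto children created under these directories later.
run hdfs dfs -setfacl -R -m group:backup_users:r-x,default:group:backup_users:r-x /data

# Inspect the result on the top-level directory
run hdfs dfs -getfacl /data
```

This only illustrates the syntax; whether ACLs are the right tool on Sentry-managed paths is a separate question.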
Is there something I am missing that makes BDR with a Kerberos-enabled cluster easier than this?
05-26-2017 06:28 AM
The trick is to create a BDR-specific user and add it to the hive group or the supergroup group, as appropriate for Hive or general HDFS backups. No extended ACLs or sticky bits are required.
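A minimal sketch of that setup, assuming a Linux account is used and that HDFS resolves groups from the operating system on the NameNode host (the default shell-based mapping; LDAP-backed setups differ). The account name `bdruser`, the UID 1500, and the `run` helper are placeholders of mine, not anything BDR requires. The UID is simply chosen above the 1000 threshold mentioned in the question.

```shell
#!/usr/bin/env bash
BDR_USER="bdruser"   # hypothetical account name
BDR_UID=1500         # assumed value; anything above the 1000 threshold
BDR_GROUP="hive"     # "hive" for Hive data, "supergroup" for general HDFS backups

# Print each command by default; set APPLY=1 to execute it for real (needs root).
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Create the account with the chosen UID and supplementary group
run useradd --uid "$BDR_UID" --groups "$BDR_GROUP" "$BDR_USER"

# Ask HDFS which groups it resolves for the new user
run hdfs groups "$BDR_USER"

# Confirm the directory mode that the group membership is meant to match
run hdfs dfs -ls -d /data
```

With /data at hive:hive and mode 771, membership in hive gives the account the group bits (rwx), so it can write as well as read. That is worth keeping in mind when deciding who may use the account.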