Created 10-31-2016 02:34 PM
Hi,
I have a fundamental question about how permissions work in Hadoop.
We are setting up a cluster with master nodes, data nodes and edge nodes. The edge nodes are the ones exposed to the outside world, and all Hadoop clients are installed on them. External applications stage their data on the edge nodes first and then load it into Hadoop. We are adding security to our clusters and plan to define data ownership and permissions for app-usr through Ranger policies, for both HDFS and Hive data.
So if an application user app-usr is only given login access to the edge nodes (through Active Directory groups), will that user be able to own any data in Hadoop? For example, can I have an HDFS directory or Hive table that is owned by app-usr even though the user is not available on the master nodes or data nodes, only on the edge nodes? Will this allow me to configure Ranger policies for that user? Or does the user need to be able to log in to all the nodes in the cluster?
Looking for ideas on the best strategy around this. Thanks
Created 10-31-2016 03:17 PM
I believe you need to integrate your Hadoop cluster with AD, including Ranger usersync, to define policies for app-usr.
Created 10-31-2016 03:22 PM
Thanks for your response. Yes, the cluster is integrated with AD and ranger-usersync is enabled. My question is whether app-usr needs to be able to log in to master nodes and edge nodes, or whether it is enough for the user to be visible from these nodes. For security reasons, we want to prevent application users from logging in to master nodes and data nodes.
Created 10-31-2016 03:24 PM
@bigdata.neophyte - I think login access to the edge node is enough. The other nodes will have information about this user from AD, so logically it should work.
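If it helps, here is a rough way to check the "visible but not able to log in" part on a given node. This is only a sketch, and it assumes the node resolves AD accounts through the operating system's name service (for example via SSSD) and that Python 3 is available there. The function name `resolve_user` is made up for illustration. It asks the OS whether the account and its groups can be looked up, which is separate from whether the account is allowed to open a login session.

```python
import grp
import os
import pwd


def resolve_user(username):
    """Return (uid, sorted group names) if this node can resolve the
    account through its name service, or None if it cannot.

    This checks only name resolution, not whether the account is
    permitted to log in to the node.
    """
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        return None  # the name service on this node does not know the user

    group_names = set()
    for gid in os.getgrouplist(username, entry.pw_gid):
        try:
            group_names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            pass  # GID with no named group on this node; skip it
    return entry.pw_uid, sorted(group_names)


if __name__ == "__main__":
    # "app-usr" is the account name used in this thread.
    result = resolve_user("app-usr")
    if result is None:
        print("app-usr is not resolvable on this node")
    else:
        uid, groups = result
        print("app-usr resolves with uid", uid, "and groups", groups)
```

Running this on a master node, where app-usr has no login access, should show whether the account is still known there. Whether the cluster itself takes group membership from the OS or directly from LDAP depends on how group mapping is configured, so treat this as a quick sanity check only.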
Created 10-31-2016 05:07 PM
Thanks @Kuldeep Kulkarni