Created 08-19-2016 03:13 PM
Recently I was given a Hadoop ecosystem to support. This system has no Ranger, LDAP, etc., and access was granted directly on the boxes. Could you please suggest some ways to get the list of users who use our Hadoop system?
Created 08-19-2016 03:54 PM
First, the easy part. Let's assume Kerberos is enabled. Run "listprincs" in "kadmin" to list the principals, which include user principals as well as service principals. With Kerberos enabled and no LDAP, these principals are the users who have access to your cluster.
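A minimal sketch of how that principal listing might be narrowed down to people, assuming the common MIT Kerberos convention that service principals look like name/host@REALM while ordinary users look like name@REALM. The function name user_principals is made up for illustration. Admin-style principals such as alice/admin@REALM also contain a slash, so they are dropped too and would need a separate look.

```shell
# Hypothetical helper: reads principal names on stdin (one per line)
# and prints only those without an instance component ("/").
# Assumption: service principals follow the name/host@REALM pattern.
user_principals() {
  grep -v '/' | sort -u
}
```

On the KDC host it could be fed with something like: kadmin.local -q "listprincs" | user_principals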
If Kerberos is not enabled, then pretty much all users on your cluster machines should be able to access your cluster.
Created 08-23-2016 03:53 PM
@mqureshi Kerberos is not enabled.
Is it enough if I pull the users from the name node and the admin node?
Created 08-23-2016 03:53 PM
Without Kerberos, pretty much anyone can access your cluster. The list of users who can access the cluster is everyone who has access to the Linux machines where the cluster is running.
Created 08-23-2016 04:15 PM
We can find the list of users in Linux and in Hadoop as well:
1) List of users in Linux:
awk -F':' '{ print $1}' /etc/passwd
2) List of users in Hadoop:
Go to Hue and look for the user admin tab, next to the Hive and Pig tabs.
3) Check the users in HDFS:
Ideally, we can find a directory for each user in HDFS as well: /usr/<userID>/
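The awk one-liner in step 1 prints every account, including system accounts. A rough sketch of one way to narrow it to likely human users is to filter on UID. The helper name human_users and the default threshold of 1000 are assumptions (the real cut-off is usually UID_MIN in /etc/login.defs and is 500 on some older distributions). Hadoop service accounts such as hdfs or hive may fall on either side of the threshold depending on how they were created.

```shell
# Hypothetical helper: print login names whose UID is at or above a
# threshold, skipping the conventional "nobody" account.
# $1 = passwd-format file (default: /etc/passwd)
# $2 = minimum UID (default: 1000, an assumption; check /etc/login.defs)
human_users() {
  awk -F: -v min="${2:-1000}" \
    '$3 >= min && $1 != "nobody" { print $1 }' "${1:-/etc/passwd}"
}
```

Called with no arguments it reads /etc/passwd on the local machine; human_users /etc/passwd 500 would lower the threshold.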
Created 08-23-2016 04:53 PM
+1, with a small correction: the HDFS directories will be under "/user", not "/usr":
hdfs dfs -ls /user
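As a sketch, the listing can be reduced to directory name and owner. This assumes the usual eight-column output of the ls command (permissions, replication, owner, group, size, date, time, path) and paths without spaces. The function name hdfs_home_dirs is made up for illustration. Note that a directory under /user only suggests that someone has used HDFS or had a home provisioned; it does not by itself define who can access the cluster.

```shell
# Hypothetical helper: reads `hdfs dfs -ls /user` output on stdin and
# prints "<directory name><TAB><owner>" for each directory entry.
# Assumption: standard 8-column listing, no spaces in paths.
hdfs_home_dirs() {
  awk '$1 ~ /^d/ {
    n = split($NF, parts, "/")   # last path component is the dir name
    print parts[n] "\t" $3       # column 3 is the owner
  }'
}
```

For example: hdfs dfs -ls /user | hdfs_home_dirs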
Created 08-23-2016 10:28 PM
Assuming that you are only interested in who has access to Hadoop services, extract all OS users from all nodes by checking the content of /etc/passwd. Some of them are legitimate users needed by Hadoop tools, e.g. hive, hdfs, etc. HDFS users will have a /user/username folder in HDFS. You can see those with hadoop fs -ls /user, executed as a user who is a member of the hadoop group. Users who have access to the Hive client are also able to perform DDL and DML actions in Hive.
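One possible way to combine the per-node results, sketched under the assumption that a copy of /etc/passwd from each node has already been collected into local files (for example with scp; the file names are up to you). The helper name users_by_node_count is hypothetical. It prints each account name with the number of collected files it appears in, which makes accounts that exist on only one or two boxes easy to spot.

```shell
# Hypothetical helper: given passwd-format files (one per node), print
# "<user><TAB><number of files containing that user>", sorted by user.
# Assumption: each login name appears at most once per file.
users_by_node_count() {
  cut -d: -f1 "$@" | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}
```

For example: users_by_node_count passwd.namenode passwd.admin passwd.datanode1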
The above will allow you to understand the current state; however, this is your opportunity to improve security even without the bells and whistles of Kerberos/LDAP/Ranger. You can force users to access Hadoop ecosystem client services via a few client/edge nodes where only client services are running, e.g. the Hive client. Users other than power users should not have accounts on the name node, admin node, or data nodes. Any user who can access the nodes where client services are running can access those services, e.g. HDFS or Hive.
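To check a master node against that policy, a small sketch: compare the accounts found on the node with a list of approved power users and print the difference. Both input files and the function name unexpected_accounts are hypothetical; each file is assumed to hold one account name per line.

```shell
# Hypothetical helper: print accounts present on a node that are not
# on the approved list.
# $1 = file of account names found on the node, one per line
# $2 = file of approved power users, one per line
unexpected_accounts() {
  # -x: whole-line match, -F: fixed strings, -f: patterns from file,
  # -v: keep only lines that match none of the approved names
  grep -vxFf "$2" "$1" | sort -u
}
```

Any name it prints is a candidate for moving to an edge node instead.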