How to get the list of users

Expert Contributor

Recently I was given a Hadoop ecosystem to support. This system has no Ranger, LDAP, etc., and access was given directly to the boxes. Could you please suggest some ways to get the list of users who use our Hadoop system?


Super Guru
@Kumar Veerappan

First the easy part. Let's assume Kerberos is enabled. Run "listprincs" in "kadmin" to list the principals, including the service principals. Without LDAP but with Kerberos enabled, these are the users who have access to your cluster.
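As a rough sketch of the Kerberos case, the principal list can be narrowed to entries that look like people rather than services. The function name `user_principals` is hypothetical, and the filter assumes that service principals contain a "/" (service/host@REALM); that is a common convention rather than a rule, so admin instances such as name/admin would be dropped too.

```shell
#!/bin/sh
# Keep only lines that look like plain user principals (name@REALM).
# Assumption: anything containing "/" is a service or admin instance.
# Lines with whitespace (kadmin banner text) are skipped as well.
user_principals() {
    awk '!/[[:space:]]/ && /@/ && !/\// { print }'
}

# Possible usage on the KDC host (requires kadmin privileges):
#   kadmin.local -q "listprincs" | user_principals
```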

If Kerberos is not enabled, then pretty much all users on your cluster machines should be able to access your cluster.

Expert Contributor

@mqureshi Kerberos is not enabled.

Is it enough if I pull the users from the name node and admin node?

Super Guru

@Kumar Veerappan

Without Kerberos, pretty much anyone can access your cluster. The list of users who can access the cluster is anyone who has access to the Linux machines where the cluster is running.


@Kumar Veerappan

We can find the list of users in Linux and in Hadoop as well:

1) List of users in Linux:

awk -F':' '{ print $1}' /etc/passwd

2) List of users in Hadoop:

Go to Hue:

Look for the User Admin tab, alongside the Hive and Pig tabs in Hue.

3) Check the users in HDFS:

Ideally, each user also has a home directory in HDFS under /usr/<userID>/
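For item 1, the raw /etc/passwd list also contains system and service accounts. A minimal sketch of one way to narrow it down is shown here; the function name `list_login_users`, the UID threshold of 1000 (older distributions often start at 500), and the shell-based filter are all assumptions about local policy, not something stated in this thread.

```shell
#!/bin/sh
# Print account names whose UID is at or above a threshold and whose
# login shell is not a "no login" shell.
# $1: passwd-format file (default /etc/passwd)
# $2: minimum UID to treat as a regular account (default 1000, an assumption)
list_login_users() {
    passwd_file="${1:-/etc/passwd}"
    min_uid="${2:-1000}"
    awk -F':' -v min="$min_uid" '
        $3 >= min && $7 !~ /(nologin|false)$/ { print $1 }
    ' "$passwd_file"
}

# Possible usage:
#   list_login_users              # local /etc/passwd, UID >= 1000
#   list_login_users /etc/passwd 500
```

Service accounts such as hdfs or hive usually fall below the threshold, so lowering the second argument is one way to see them as well.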

Expert Contributor

+1, with a small correction: the HDFS directories will be under "/user", not "/usr":

hdfs dfs -ls /user
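If only the directory names and their owners are needed, the listing can be reduced with awk. This is a sketch that assumes the usual eight-column "hdfs dfs -ls" layout (permissions, replication, owner, group, size, date, time, path) and paths without spaces; `hdfs_user_dirs` is a hypothetical helper name.

```shell
#!/bin/sh
# Read "hdfs dfs -ls" output on stdin and print "<dirname><TAB><owner>"
# for each directory entry. The "Found N items" header is skipped
# because its first field does not start with "d".
hdfs_user_dirs() {
    awk '$1 ~ /^d/ && NF >= 8 {
        n = split($8, parts, "/")   # last path component is the dir name
        print parts[n] "\t" $3      # $3 is the owner column
    }'
}

# Possible usage:
#   hdfs dfs -ls /user | hdfs_user_dirs
```

A directory whose owner differs from its name may be worth a closer look, though that is only a heuristic.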

Super Guru

@Kumar Veerappan

Assuming that you are only interested in who has access to Hadoop services, extract all OS users from all nodes by checking the content of the /etc/passwd file. Some of them are legitimate service users needed by Hadoop tools, e.g. hive, hdfs, etc. For HDFS, users will have a /user/username folder in HDFS. You can see that with hadoop fs -ls /user, executed as a user who is a member of the hadoop group. If they have access to the Hive client, they are also able to perform DDL and DML actions in Hive.
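Collecting /etc/passwd from every node could be scripted along these lines. This is only a sketch: it assumes passwordless ssh to each node and a plain text file with one hostname per line, and the names `collect_users` and `FETCH` are hypothetical (FETCH exists so the remote command can be swapped out, for example in a dry run).

```shell
#!/bin/sh
# Print "<user> <host>" for every account on every host listed in $1.
# FETCH is the command used to reach a host; it defaults to ssh.
collect_users() {
    hosts_file="$1"
    while read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue          # skip blank lines
        # stdin is redirected so the remote command cannot eat the host list
        ${FETCH:-ssh} "$host" "cut -d: -f1 /etc/passwd" < /dev/null |
            while read -r user; do
                printf '%s %s\n' "$user" "$host"
            done
    done < "$hosts_file"
}

# Possible usage: unique account names across all nodes
#   collect_users hosts.txt | cut -d' ' -f1 | sort -u
```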

The above will allow you to understand the current state. However, this is your opportunity to improve security even without the bells and whistles of Kerberos/LDAP/Ranger. You can force users to access Hadoop ecosystem client services via a few client/edge nodes where only client services are running, e.g. the Hive client. Users, other than power users, should not have accounts on the name node, admin node, or data nodes. Any user who can access the nodes where client services are running can access those services, e.g. HDFS or Hive.