
HDFS default permissions work weirdly, CDH 5.1

Explorer

I read the documentation about permissions in HDFS, and it says:

1) By default, Hadoop checks the output of the Linux "groups" command for the user.

2) "supergroup" is the default superuser group.

 

My root directory looks like this:

 

drwxr-xr-x - hdfs supergroup 0 2015-01-27 23:08 /

 

So I would assume only hdfs and users belonging to supergroup would be able to create a directory under it. But there is no "supergroup" group on any of my boxes! There is only a "hadoop" group, which contains hdfs, yarn, and mapred.

 

And basically any user that I create can execute HDFS commands, like hdfs dfs -mkdir /blabla, and do whatever he wants. The files created will have him set as the owner and supergroup as the group, even though he belongs neither to "supergroup" nor to "hadoop".

 

How does it work then? And is there some simple way to prevent this and make it work as described in the docs, i.e. make Hadoop respect Linux permissions? (The only access to the cluster is through a box managed by us anyway, so this would be enough.)
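
For reference, a quick way to see what HDFS is actually resolving (a sketch, not from the original post; "alice" is a hypothetical user, and a CDH 5 client is assumed):

hdfs groups alice                                       # groups HDFS resolves for this user
hdfs getconf -confKey dfs.permissions.enabled           # is permission checking on?
hdfs getconf -confKey dfs.permissions.superusergroup    # name of the superuser group

Note that the getconf values come from the local client configuration; as the replies below explain, the NameNode's server-side values are what actually count.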


5 REPLIES

Explorer

Well, this is embarrassing: I just saw in my cluster's Cloudera Manager that security (dfs.permissions) is set to false... That explains everything.

 

HOWEVER: the reason I was confused is that I couldn't see this property set in any of the conf files (grep dfs.permissions /etc/hadoop/conf/*.xml), and according to the documentation its default value is true. Could anyone please let me know where this property gets overridden?

Mentor
The dfs.permissions property is a server-side (NameNode) one. It cannot be overridden by clients.

If you use CM, then /etc/hadoop/conf/ holds only your client properties. The server-side generated properties are maintained in isolation elsewhere, which you can view, for example, by going to CM -> HDFS -> Instances -> NameNode -> Processes -> hdfs-site.xml.

A good doc for understanding the CM approach can be found at http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_intro_primer.html#co...
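
Another way to confirm what the running NameNode actually uses (a sketch, not part of the original reply; the host name node9 and the CDH 5 default web port 50070 are assumptions) is the daemon's /conf servlet:

curl -s http://node9:50070/conf | grep -A1 'dfs.permissions'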

Explorer

Good read!

 

"Administrators are sometimes surprised that modifying /etc/hadoop/conf and then restarting HDFS has no effect."

 

Oh yes. OK, I think I understand now where the server-side configuration comes from. I can find it in CM, although I still have a bit of a problem finding it on the filesystem. When I go to my NameNode host I see this:

root@node9:/var/run/cloudera-scm-agent/process# ls -qlrt | grep NAME
drwxr-x--x 3 hdfs      hdfs      420 Sep 28 12:40 4035-hdfs-NAMENODE
drwxr-x--x 3 hdfs      hdfs      420 Sep 28 13:17 4091-hdfs-NAMENODE-refresh
drwxr-x--x 3 hdfs      hdfs      420 Sep 28 13:18 4092-hdfs-NAMENODE-monitor-decommissioning
drwxr-x--x 3 hdfs      hdfs      420 Sep 28 13:24 4097-hdfs-NAMENODE-refresh
drwxr-x--x 3 hdfs      hdfs      420 Sep 28 13:25 4098-hdfs-NAMENODE-monitor-decommissioning
drwxr-x--x 3 hdfs      hdfs      420 Sep 28 13:25 4100-hdfs-NAMENODE-refresh
drwxr-x--x 3 hdfs      hdfs      420 Sep 28 13:30 4149-hdfs-NAMENODE-refresh
drwxr-x--x 3 hdfs      hdfs      420 Sep 28 13:30 4150-hdfs-NAMENODE-monitor-decommissioning
drwxr-x--x 3 hdfs      hdfs      420 Sep 28 13:30 4152-hdfs-NAMENODE-refresh
drwxr-x--x 3 hdfs      hdfs      420 Sep 28 13:46 4167-hdfs-NAMENODE-createdir
drwxr-x--x 3 hdfs      hdfs      420 Sep 28 13:50 4185-hdfs-NAMENODE
drwxr-x--x 3 hdfs      hdfs      420 Jan 26 16:19 4785-hdfs-NAMENODE-refresh
drwxr-x--x 3 hdfs      hdfs      420 Jan 26 16:19 4787-hdfs-NAMENODE-monitor-decommissioning

So, which of these contains the hdfs-site.xml of my currently running NameNode?


Mentor (accepted solution)
The UI approach is easier, and it serves the same file from the location you're looking under. But to answer your question: look for the latest (by number or mtime) directory that carries just the role name. For NameNodes, the pattern is:

NNNN-hdfs-NAMENODE

The other dirs named NNNN-hdfs-NAMENODE-xyz belong to specific sub-commands that run on top of a live NN or similar, but the daemon itself runs from the plain dir format mentioned above.

Likewise for other role types.
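
As a concrete sketch of the above (not from the reply itself; paths match the listing earlier in the thread):

cd /var/run/cloudera-scm-agent/process
# Keep only dirs whose names end in exactly "-hdfs-NAMENODE" (no sub-command
# suffix) and take the highest-numbered one: the live NameNode's config dir.
latest=$(ls -d [0-9]*-hdfs-NAMENODE | sort -n | tail -1)
grep -A1 'dfs.permissions' "$latest/hdfs-site.xml"

In the listing above, this would select 4185-hdfs-NAMENODE.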

Expert Contributor

But this still doesn't answer the original question: why does dfs.permissions.superusergroup default to "supergroup" when CM doesn't create that group in Linux?
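
A common workaround (a sketch, not from this post; "alice" is a hypothetical admin user, and the commands must run on the NameNode host, since group resolution happens server-side by default):

groupadd supergroup
usermod -aG supergroup alice           # grant alice HDFS superuser rights
sudo -u alice hdfs dfsadmin -report    # verify: dfsadmin requires superuser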

 

In our case, we also found discrepancies between the default Hadoop users/groups that were created and the documentation (Guide to Special Users in the Hadoop Environment). For example, the hdfs user is not assigned to the hdfs group, and the mapred group is not created at all. We are running CDH 5.2.0 on Debian.

 

I've done my share of research and still find the HDFS user/group permission mechanism confusing. For a plain Linux CDH installation without Kerberos (the majority, I believe), HDFS relies on the Unix user/group permission mechanism but interprets it in its own way. Hence the confusing and unintuitive behaviors:

 

  1. Unix root has less privilege in HDFS than the 'hdfs' user; it is treated like an (uninitialized) regular user.
  2. There are no built-in user-admin commands for HDFS similar to Linux useradd, userdel, gpasswd, etc.
  3. There is no tool to 'migrate' existing Linux users to HDFS in bulk.
  4. Hadoop app users that need to create HDFS files (e.g. mapred, flume, etc.) are not automatically set up in /user (see the sketch after this list).
  5. There is no pre-defined Unix group that includes *all* Hadoop app users needing HDFS superuser access.
  6. Regular users cannot run 'hdfs fsck /', since the staging dir /tmp/logs/hdfs is 770.
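
For point 4, a minimal sketch of the manual setup (the user names are illustrative, not from the post):

# Create /user home dirs by hand; only the HDFS superuser may chown them.
for u in alice bob flume; do
  sudo -u hdfs hdfs dfs -mkdir -p /user/$u
  sudo -u hdfs hdfs dfs -chown $u:$u /user/$u
done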

Perhaps Cloudera can write a more understandable adaptation of the Apache HDFS document.

 

Thanks,

Miles