Group permissions for files written using Crunch Pipeline

The reducer part files in Hive that are written by Apache Crunch pipelines have the correct user name, but the group name is shown as 'supergroup', a group the user does not belong to.

How does Apache Crunch derive group membership? Also, is there a way to specify the group to use for those part files in the Apache Crunch configuration?

Re: Group permissions for files written using Crunch Pipeline

@Amirdha

In general, if your environment is Kerberized and you kinit with hdfs.keytab, any folder you create will show hdfs as the owner and supergroup as the group. The name supergroup comes from your setting in Cloudera Manager -> HDFS -> Configuration -> dfs.permissions.superusergroup.

 

Ex:
drwxr-xr-x - hdfs supergroup 0 2016-10-02 11:45 /user
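
To check which group name plays the superuser role on a given cluster, the value of dfs.permissions.superusergroup can be read from the client configuration with hdfs getconf. A minimal sh sketch, assuming an hdfs client on PATH and that the client's configuration matches the NameNode's; the function names are hypothetical:

```shell
# Sketch (hypothetical helper names): print the HDFS superuser group name as
# seen by this client. ASSUMPTION: the client-side configuration matches the
# NameNode's; if they are out of sync, the NameNode's value is what counts.
superuser_group() {
  local g
  g=$(hdfs getconf -confKey dfs.permissions.superusergroup 2>/dev/null)
  if [ $? -eq 0 ] && [ -n "$g" ]; then
    printf '%s\n' "$g"
  else
    # Lookup failed: fall back to Hadoop's default value for this property.
    printf '%s\n' supergroup
  fi
}

# Succeeds (exit status 0) if the given group name is the superuser group.
is_superuser_group() {
  [ "$1" = "$(superuser_group)" ]
}
```

For example, is_superuser_group supergroup && echo 'superuser group' prints the message on a cluster that keeps the default name.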

 

However, the folder beneath it should have your user ID/batch ID as both owner and group.

Ex:
drwxr-xr-x - abc123 abc123 0 2017-08-11 16:38 /user/abc123

In some cases, though, the user ID/batch ID folder itself has supergroup as its group. Files created inside that folder then belong to supergroup as well, because HDFS gives a newly created file or directory the group of its parent directory (the BSD rule) rather than the creating user's primary group. Please work with your admin to change the group of the parent folder, or change the group of the file as follows:

Ex:

sudo -u hdfs hdfs dfs -chown abc123:abc123 /user/abc123/file1.txt
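
Because a new file takes the group of its parent directory, checking that group before a job writes output can catch the problem early. A minimal sh sketch, assuming an hdfs client on PATH and the usual hdfs dfs -ls -d column layout; the helper names are hypothetical:

```shell
# Sketch (hypothetical helper names): report the group that new files in an
# HDFS directory will inherit. ASSUMES the usual "hdfs dfs -ls -d" layout:
# perms, replication ("-" for dirs), owner, group, size, date, time, path.

# Print the group column (4th field) of one line of "hdfs dfs -ls" output.
group_from_ls_line() {
  printf '%s\n' "$1" | awk '{ print $4 }'
}

# Print the group of an HDFS directory (needs an hdfs client on PATH).
hdfs_dir_group() {
  local line
  line=$(hdfs dfs -ls -d "$1") || return 1
  group_from_ls_line "$line"
}

# Usage: check_inherited_group DIR EXPECTED_GROUP
# Returns 0 if DIR's group matches, 2 if it differs, 1 if the lookup fails.
check_inherited_group() {
  local dir group
  dir=$1
  group=$(hdfs_dir_group "$dir") || return 1
  if [ "$group" != "$2" ]; then
    echo "WARN: $dir has group '$group'; new files will inherit it, not '$2'"
    return 2
  fi
  echo "OK: $dir has group '$group'"
}
```

Running check_inherited_group /user/abc123 abc123 prints OK when the folder's group is abc123; otherwise it prints a warning and returns 2.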

 

 

Re: Group permissions for files written using Crunch Pipeline

Thanks Saranvisa.

On closer inspection, the issue turned out to be the temporary folders used by Crunch pipelines. By default, Crunch uses the /tmp directory to store intermediate and final output, and at the end copies the output to the actual destination. The group of /tmp is supergroup, so when the output is copied to the actual destination, it keeps that group.

The solution we are planning is to point the temporary directory to another directory that has the correct group owner.
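
A minimal sh sketch of that plan follows. It assumes that crunch.tmp.dir is the property Crunch reads for its temporary directory (RuntimeParameters.TMP_DIR in the Crunch source, defaulting to /tmp; please verify for your Crunch version) and that the driver class accepts generic -D options, for example because it runs through Hadoop's ToolRunner. The jar, class, group and paths are hypothetical. With DRY_RUN=1 the script only prints the commands.

```shell
# Sketch (hypothetical names): give Crunch a temp directory whose group is the
# user's own, then launch the pipeline with crunch.tmp.dir pointing at it.
# ASSUMPTIONS: "crunch.tmp.dir" is the key Crunch reads for its temp dir, and
# the driver parses generic -D options (e.g. runs through ToolRunner).

DRY_RUN=${DRY_RUN:-0}   # set DRY_RUN=1 to print commands instead of running them

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: run_with_crunch_tmp TMP_DIR GROUP JAR MAIN_CLASS [ARGS...]
run_with_crunch_tmp() {
  local tmp_dir group jar main_class
  tmp_dir=$1; group=$2; jar=$3; main_class=$4
  shift 4
  # Create the temp dir and set its group; directories and files Crunch
  # creates under it then inherit that group instead of supergroup.
  run hdfs dfs -mkdir -p "$tmp_dir" || return 1
  run hdfs dfs -chgrp "$group" "$tmp_dir" || return 1
  # Generic -D options must come before the application's own arguments.
  run hadoop jar "$jar" "$main_class" -Dcrunch.tmp.dir="$tmp_dir" "$@"
}
```

If the pipeline is configured in code instead, the same key can be set on the Hadoop Configuration before the MRPipeline object is constructed; Crunch appears to create its temp directory when the pipeline is built, so setting it afterwards may have no effect.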
