Created 08-17-2016 04:59 PM
This is a kerberized instance of HDFS. Is it possible for flowfiles themselves to specify the HDFS user, or can the user only be set on a per-processor basis?
Some of our use cases call for allowing business users to define their own ingest streams and push to varying HDFS directories owned by different users, and generating and distributing keytabs to business users seems less than appealing.
Created 08-17-2016 05:22 PM
You can set the "Remote Owner" property to the user you want to own the files in HDFS. You can set "Remote Group" as well. Both of these are set at the processor level and do not support Expression Language, so flowfiles cannot override them. You could use a RouteOnAttribute processor to determine which user should own the files in HDFS and route each flowfile to the matching PutHDFS processor, but this will be more cumbersome than distributing keytabs to the users.
In a secure environment, the users would likely need their own keytabs to write to HDFS anyway, since you have to authenticate somehow and there is currently no way to pass a Kerberos ticket to NiFi.
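To make the RouteOnAttribute idea above concrete, here is a rough sketch in Python of the routing decision only. It is not NiFi code: the attribute name `hdfs.owner`, the owner names, and the processor names are all hypothetical, and in a real flow each entry would correspond to a RouteOnAttribute relationship wired to its own PutHDFS processor.

```python
# Illustrative model only; this is not NiFi code.
# Hypothetical mapping from the value of an "hdfs.owner" flowfile attribute
# to the name of a PutHDFS processor configured with that Remote Owner.
ROUTES = {
    "finance": "PutHDFS_finance",
    "marketing": "PutHDFS_marketing",
}


def route_on_attribute(attributes):
    """Return the target processor name, or "unmatched" if no route applies."""
    owner = attributes.get("hdfs.owner")
    return ROUTES.get(owner, "unmatched")
```

Every new owner means another route and another PutHDFS processor, which is where the cumbersome part comes from.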
Created 08-17-2016 09:08 PM
Thanks for the suggestions. On further thought, what about using a single PutHDFS processor with the Directory property set from flowfile attributes?
Ranger HDFS permissions are set to allow the NiFi user to write to specific ingest directories, and downstream consumers should have Ranger HDFS read permissions on the ingest directories their applications need.
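Since business users would be supplying the directory, it may be worth checking the value before it reaches PutHDFS, with Ranger remaining the actual enforcement point on the HDFS side. Below is a minimal Python sketch of such a check, not NiFi code. The attribute name `hdfs.directory` and the directory list are assumptions made up for illustration; in a flow, the equivalent check could presumably sit in a routing step ahead of PutHDFS.

```python
import posixpath

# Hypothetical ingest directories the NiFi user is allowed to write to.
ALLOWED_INGEST_DIRS = ("/ingest/finance", "/ingest/marketing")


def resolve_directory(attributes):
    """Return the normalized target directory, or None if it is not allowed.

    "hdfs.directory" is a hypothetical flowfile attribute name.
    """
    raw = attributes.get("hdfs.directory", "")
    # Normalize so that ".." segments cannot escape an allowed directory.
    path = posixpath.normpath(raw)
    for allowed in ALLOWED_INGEST_DIRS:
        if path == allowed or path.startswith(allowed + "/"):
            return path
    return None
```

A flowfile whose directory resolves to None would be routed to a failure path instead of being written.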