- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
NiFi PutHDFS Processor - Remote Owner and Remote Group
Created ‎11-05-2015 02:20 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When writing data to HDFS in the PutHDFS NiFi Processor, the data is owned by "anonymous". I'm trying to find a good way to control the ownership of data landed via this processor.
I looked into Remote Owner and Remote Group, however, those require that the NiFi server is running as the "hdfs" user. This seems like a bad idea to me.
I'm curious why this processor doesn't leverage Hadoop Proxy Users, versus enforcing that the NiFi server runs as hdfs?
Any other workarounds? My initial thought was to stage the data in HDFS with NiFi and use Falcon to move it to it's final location, however, this seems overkill for users that simply want to ingest the data into its final location.
Am I missing something obvious here?
Created ‎11-05-2015 03:11 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Shane, only the 'hdfs' user can change ownership of the files, there's no way around it. In a real production environment one would have security in place with Kerberos, at which point you can specify the Kerberos principal which will be used to write to HDFS.
Without security in place the discussion of data ownership is, IMO, pointless.
Hope this helps.
Created ‎11-05-2015 03:11 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Shane, only the 'hdfs' user can change ownership of the files, there's no way around it. In a real production environment one would have security in place with Kerberos, at which point you can specify the Kerberos principal which will be used to write to HDFS.
Without security in place the discussion of data ownership is, IMO, pointless.
Hope this helps.
Created ‎11-09-2015 04:01 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I don't necessarily agree with this answer. We could avoid needing to change ownership through leveraging proxy users. I hope to find time to write a patch to demonstrate this.
I'd also be interested in how many clusters are actually kerberos enabled. I expect it's lower than you think. Data ownership does matter and provides at least rudimentary controls when the user does not or can not enable Kerberos.
