Support Questions

Find answers, ask questions, and share your expertise

NiFi PutHDFS Processor - Remote Owner and Remote Group

avatar
Rising Star

When writing data to HDFS in the PutHDFS NiFi Processor, the data is owned by "anonymous". I'm trying to find a good way to control the ownership of data landed via this processor.

I looked into Remote Owner and Remote Group, however, those require that the NiFi server is running as the "hdfs" user. This seems like a bad idea to me.

I'm curious why this processor doesn't leverage Hadoop Proxy Users, versus enforcing that the NiFi server runs as hdfs?

Any other workarounds? My initial thought was to stage the data in HDFS with NiFi and use Falcon to move it to it's final location, however, this seems overkill for users that simply want to ingest the data into its final location.

Am I missing something obvious here?

1 ACCEPTED SOLUTION

avatar

Shane, only the 'hdfs' user can change ownership of the files, there's no way around it. In a real production environment one would have security in place with Kerberos, at which point you can specify the Kerberos principal which will be used to write to HDFS.

Without security in place the discussion of data ownership is, IMO, pointless.

Hope this helps.

View solution in original post

2 REPLIES 2

avatar

Shane, only the 'hdfs' user can change ownership of the files, there's no way around it. In a real production environment one would have security in place with Kerberos, at which point you can specify the Kerberos principal which will be used to write to HDFS.

Without security in place the discussion of data ownership is, IMO, pointless.

Hope this helps.

avatar
Rising Star

I don't necessarily agree with this answer. We could avoid needing to change ownership through leveraging proxy users. I hope to find time to write a patch to demonstrate this.

I'd also be interested in how many clusters are actually kerberos enabled. I expect it's lower than you think. Data ownership does matter and provides at least rudimentary controls when the user does not or can not enable Kerberos.