NiFi PutHDFS Processor - Remote Owner and Remote Group

Contributor

When writing data to HDFS in the PutHDFS NiFi Processor, the data is owned by "anonymous". I'm trying to find a good way to control the ownership of data landed via this processor.

I looked into Remote Owner and Remote Group; however, those require that the NiFi server run as the "hdfs" user. This seems like a bad idea to me.

I'm curious why this processor doesn't leverage Hadoop proxy users instead of requiring that the NiFi server run as hdfs.

Any other workarounds? My initial thought was to stage the data in HDFS with NiFi and use Falcon to move it to its final location. However, this seems like overkill for users who simply want to ingest the data into its final location.

Am I missing something obvious here?

1 ACCEPTED SOLUTION

Re: NiFi PutHDFS Processor - Remote Owner and Remote Group

Shane, only the HDFS superuser (typically the 'hdfs' user) can change ownership of files; there's no way around it. In a real production environment one would have security in place with Kerberos, at which point you can specify the Kerberos principal that will be used to write to HDFS.

Without security in place, the discussion of data ownership is, IMO, pointless.

Hope this helps.

Re: NiFi PutHDFS Processor - Remote Owner and Remote Group

Contributor

I don't necessarily agree with this answer. We could avoid needing to change ownership by leveraging proxy users. I hope to find time to write a patch to demonstrate this.

I'd also be interested in how many clusters are actually Kerberos-enabled. I expect it's lower than you think. Data ownership does matter and provides at least rudimentary controls when the user does not or cannot enable Kerberos.
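
For readers unfamiliar with the proxy-user idea raised in this thread, here is a minimal sketch of what impersonation looks like over the WebHDFS REST API on an unsecured cluster. It is not how PutHDFS works and not the patch mentioned above; it only illustrates the mechanism. The function name `webhdfs_create_url`, the host, the port, the path, and the user names are all hypothetical. It assumes the cluster has been configured to let the service account impersonate others (the `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups` settings in core-site.xml).

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(host, port, path, proxy_user, end_user, overwrite=False):
    """Build a WebHDFS CREATE URL in which proxy_user acts on behalf of end_user.

    'user.name' identifies the caller under simple (non-Kerberos) auth, and
    'doas' names the user being impersonated. The file created by the request
    is then owned by end_user, so no later chown is needed.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({
        "op": "CREATE",
        "user.name": proxy_user,  # the service account, e.g. the NiFi user
        "doas": end_user,         # the user who should own the data
        "overwrite": "true" if overwrite else "false",
    })
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(webhdfs_create_url("namenode.example.com", 9870,
                             "/data/landing/file.csv", "nifi", "shane"))
```

The point of the sketch is that the file is written as the end user from the start, so the service never needs superuser rights to change ownership afterwards. On an unsecured cluster the `user.name` value is still taken on trust, which is the limitation the accepted answer is pointing at.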
