How to write files into WebHDFS with NiFi?

Rising Star

I want to write to HDFS with NiFi, but NiFi is on a different network, so I have to go through WebHDFS (via Knox). I'm trying to use the InvokeHTTP processor and am testing with a simple upstream GetFile. I've tried setting Follow Redirects to true and including the file in the PUT body, but it fails, presumably because the processor can't follow the redirect in the way described at https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE. So I am going down the path of two InvokeHTTP calls: the first creates the inode (Follow Redirects set to false, no body in the PUT), and the second PUTs the body to the Location returned in the first response.

The first call works, and I get a Location header naming the datanode that will write my file. But I can't figure out how to pull that Location string out of the response. The response code (307) and a few other fields are accessible, but the Location string is not, since it's in the header and the body is empty. The only reason I know it's coming back is that I turned on NiFi debug logging and pored over nifi-app.log (i.e. it's definitely not an attribute on the flowfile).
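For reference, the two-step exchange described above can be sketched outside NiFi in plain Python. This is only an illustration of the protocol, not a NiFi fix: the function names, the example Knox base URL, and the `auth_headers` parameter are all hypothetical, and it assumes the gateway answers the first PUT with a 307 plus a Location header and the second PUT with a 201, as the WebHDFS CREATE documentation describes.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def build_create_url(base_url, hdfs_path, overwrite=False):
    """Build the step-1 CREATE URL (hypothetical helper; no body is sent to it)."""
    query = urlencode({"op": "CREATE", "overwrite": str(overwrite).lower()})
    return f"{base_url.rstrip('/')}/{quote(hdfs_path.lstrip('/'))}?{query}"


def put(url, body=None, headers=None):
    """Send one PUT and return (status, lower-cased response headers).

    http.client never follows redirects, so a 307 comes back to the caller.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc)
    try:
        target = parts.path + ("?" + parts.query if parts.query else "")
        conn.request("PUT", target, body=body, headers=headers or {})
        resp = conn.getresponse()
        resp.read()  # drain the (normally empty) body
        return resp.status, {k.lower(): v for k, v in resp.getheaders()}
    finally:
        conn.close()


def write_file(base_url, hdfs_path, data, send=put, auth_headers=None):
    """Two-step WebHDFS CREATE: read Location from the 307, then PUT the data there."""
    # Step 1: no body, redirects not followed; the target comes back in Location.
    status, headers = send(build_create_url(base_url, hdfs_path), None, auth_headers)
    if status != 307 or "location" not in headers:
        raise RuntimeError(f"expected 307 with a Location header, got {status}")
    location = headers["location"]

    # Step 2: send the file content to the URL taken from the Location header.
    status, _ = send(location, data, auth_headers)
    if status != 201:
        raise RuntimeError(f"expected 201 Created, got {status}")
    return location
```

A call might look like `write_file("https://knox.example:8443/gateway/default/webhdfs/v1", "/tmp/a.txt", b"hello", auth_headers={"Authorization": "Basic ..."})`, where the host, topology path, and credentials are placeholders. The `send` parameter exists only so the flow can be exercised without a live cluster.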

This is NiFi 1.0, HDP 2.2, and Java 1.8.0_77.

Any ideas?

1 ACCEPTED SOLUTION

Master Guru

There was a discussion about this at one point which resulted in this JIRA:

https://issues.apache.org/jira/browse/NIFI-1924

It was determined that rather than creating new processors, it should be possible to change the scheme of the filesystem from hdfs:// to webhdfs:// and still use the existing processors.

It is unclear to me whether this ended up fully working or not.

7 REPLIES

Master Guru

I have not tried this yet, so this is just a suggestion. @Oliver Meyn, have you tried the PutHDFS processor? Pull 'core-site.xml' and 'hdfs-site.xml' from the target cluster, store them somewhere on your NiFi cluster, and reference them in the processor. Verify that DNS resolves the cluster hostnames; if it cannot, use IP addresses in the site XML files.

Rising Star

Because NiFi is in a different network, the access rules block it from even seeing the cluster machines. On top of that, it can't see the KDC for the cluster network, so it couldn't authenticate. It has to go through Knox (which means WebHDFS). I'm surprised this appears to be an edge case; I would have thought many organizations have different, heavily firewalled networks talking to their clusters.

Rising Star

Nice find @Bryan Bende - not an instant solution but there is hope.

New Contributor

@Oliver Meyn: I'm facing the same problem: NiFi -> WebHDFS. Did you find a solution?

Rising Star

Sadly no, @Tilmann Piffl. We ended up with one NiFi outside the HDP cluster network and one inside the cluster network. Then we had the two talk to each other over Site-to-Site and the internal one could write to HDFS directly with PutHDFS.

New Contributor

Thanks, @Oliver Meyn, I hadn't thought about this approach. It might be a last resort for us, too.