How to write files into WebHDFS with Nifi?
Labels: Apache Hadoop, Apache NiFi
Created ‎10-03-2016 08:04 PM
I want to write to HDFS with NiFi, but NiFi is on a different network, so I have to go through WebHDFS (via Knox). I'm trying to use the InvokeHTTP processor and am testing with a simple upstream GetFile. I've tried setting Follow Redirects to true and including the file in the PUT body, but it fails, presumably because the processor can't follow the redirect properly, as outlined in https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE. So I am going down the path of two InvokeHTTP calls: the first to create the inode (with Follow Redirects set to false and no body in the PUT), and the second to PUT the body to the Location returned in the first response.
The first call works, and I get a Location header naming the datanode that will write my file. But I can't figure out how to pull that Location string out of the response header. The response code (307) and a few other fields are accessible, but not the Location string (it's in the header, and the body is empty). The only reason I know it's coming back is that I turned on NiFi debug logging and pored over nifi-app.log (i.e. it's definitely not an attribute on the flowfile).
This is NiFi 1.0, HDP 2.2, and Java 1.8.0_77.
Any ideas?
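For reference, here is the two-step CREATE exchange described above as a minimal Python sketch, outside NiFi. It only illustrates the protocol from the WebHDFS docs (PUT with no body, read the Location header from the 307, PUT the data to that Location); it is not a NiFi solution. The function names and the Knox-style base URL in the usage note are made up for illustration, and any authentication headers Knox requires would have to be passed in by the caller.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def build_create_url(base_url, hdfs_path, overwrite=False):
    """Build the step-1 URL: <base_url>/<hdfs_path>?op=CREATE&overwrite=..."""
    query = urlencode({"op": "CREATE", "overwrite": str(overwrite).lower()})
    return f"{base_url.rstrip('/')}/{quote(hdfs_path.lstrip('/'))}?{query}"


def extract_location(status, headers):
    """Return the Location header of a 307 response (header keys lowercased)."""
    if status != 307:
        raise RuntimeError(f"expected 307 from CREATE, got {status}")
    if "location" not in headers:
        raise RuntimeError("307 response carried no Location header")
    return headers["location"]


def _put(url, body=None, headers=None):
    """Send one PUT; http.client never follows redirects on its own."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc)
    try:
        target = parts.path + ("?" + parts.query if parts.query else "")
        conn.request("PUT", target, body=body, headers=headers or {})
        resp = conn.getresponse()
        resp.read()  # drain the (normally empty) body
        return resp.status, {k.lower(): v for k, v in resp.getheaders()}
    finally:
        conn.close()


def webhdfs_create(base_url, hdfs_path, data, headers=None):
    """Two-step WebHDFS CREATE; returns the Location the data was sent to."""
    # Step 1: PUT with no body, expect 307 plus a Location header.
    status, resp_headers = _put(build_create_url(base_url, hdfs_path),
                                headers=headers)
    location = extract_location(status, resp_headers)
    # Step 2: PUT the file content to the returned Location, expect 201.
    status, _ = _put(location, body=data, headers=headers)
    if status != 201:
        raise RuntimeError(f"expected 201 from data PUT, got {status}")
    return location
```

Hypothetical usage, with a placeholder gateway address: `webhdfs_create("https://knox.example.com:8443/gateway/default/webhdfs/v1", "/tmp/test.txt", b"hello")`.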
Created ‎10-05-2016 01:53 PM
There was a discussion about this at one point which resulted in this JIRA:
https://issues.apache.org/jira/browse/NIFI-1924
It was determined that rather than creating new processors, it should be possible to change the scheme of the filesystem from hdfs:// to webhdfs:// and still use the existing processors.
It is unclear to me whether this ended up fully working or not.
Created ‎10-05-2016 04:51 AM
I have not tried this yet, so this is just a suggestion. @Oliver Meyn, have you tried using the PutHDFS processor? Pull 'core-site.xml' and 'hdfs-site.xml' from the target cluster, store them in a location on your NiFi cluster, and reference them in the processor. Verify that DNS resolves the cluster hostnames. If DNS cannot resolve them, use IP addresses in the site XML files.
Created ‎10-05-2016 01:11 PM
Because NiFi is in a different network, the access rules block it from even seeing the cluster machines. On top of that, it can't see the KDC for the cluster network, so it couldn't authenticate. It has to go through Knox (which means WebHDFS). I'm surprised this appears to be an edge case; I would have thought many orgs have different, heavily firewalled networks talking to their clusters.
Created ‎10-05-2016 02:42 PM
Nice find, @Bryan Bende. Not an instant solution, but there is hope.
Created ‎10-09-2017 12:06 PM
@Oliver Meyn: I'm facing the same problem: NiFi -> WebHDFS. Did you find a solution?
Created ‎10-09-2017 05:24 PM
Sadly no, @Tilmann Piffl. We ended up with one NiFi outside the HDP cluster network and one inside the cluster network. Then we had the two talk to each other over Site-to-Site and the internal one could write to HDFS directly with PutHDFS.
Created ‎10-10-2017 06:36 AM
Thanks, @Oliver Meyn, I hadn't thought about this approach. It might be a last resort for us, too.
