Using NiFi to load data from localFS to HDFS
Labels: Apache NiFi
Created on ‎05-29-2018 06:31 AM - edited ‎08-17-2019 08:51 PM
I'm trying to load a CSV file from the local file system into HDFS using NiFi, but I can't figure out what I'm doing wrong.
Based on this thread from @Matt Clarke:
I've configured the GetFile and PutHDFS processors as shown:
And here's the file system:
I feel like there's something simple I'm missing, but I'm struggling with it nonetheless. Can anyone spot where I've gone wrong?
Created on ‎05-29-2018 06:35 AM - edited ‎08-17-2019 08:51 PM
NiFi UI view:
Created ‎05-29-2018 02:05 PM
Not sure what issue you are exactly seeing here based on your screenshots.
Your dataflow consists of two processors and I see no queued FlowFiles.
Is the GetFile processor retrieving your 2008.csv file? Based on your configuration, it should be retrieving it nonstop (Keep Source File = true).
Which user owns the running NiFi process on this local machine? (# ps -ef | grep nifi)
If you become that user, can you access the file you are trying to consume with NiFi?
What do you see in the nifi-app.log when you "start" the GetFile processor? (Any WARN or ERROR logs)
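The access question above can be checked with a small script. This is only a sketch: `check_readable` is a hypothetical helper name, and it tests the account that runs it, so it would need to be run as whichever user owns the NiFi process.

```shell
# Hypothetical helper: report whether the current user can reach and read a file.
# Every parent directory needs execute (search) permission; the file needs read.
check_readable() {
  local target dir
  target=$1
  dir=$(dirname -- "$target")
  # Walk up the directory chain until the root (or "." for relative paths).
  while true; do
    if [ ! -x "$dir" ]; then
      echo "cannot traverse: $dir"
      return 1
    fi
    if [ "$dir" = "/" ] || [ "$dir" = "." ]; then
      break
    fi
    dir=$(dirname -- "$dir")
  done
  if [ -r "$target" ]; then
    echo "readable: $target"
  else
    echo "not readable: $target"
    return 1
  fi
}
```

Call it with the full path to the CSV, for example `check_readable /path/to/2008.csv` (the path here is a placeholder, not the one from the screenshots).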
-
Thanks,
Matt
Created ‎05-30-2018 12:47 PM
*** Important Forum tip: Please try to avoid responding to an existing "Answer" by starting a new "Answer". Instead, use "Add comment" to respond to an existing "Answer". The forum offers no guaranteed order for answers, which can make following a conversation difficult.
Created ‎05-30-2018 09:24 PM
Per your suggestion, I turned on debug logging for the nifi-app.log file and captured about 10 seconds of a run. Files are still not being ingested, and I'm starting to get really frustrated.
Log file attached: nifi-applog.zip
Created ‎05-31-2018 10:19 AM
Does the ListFile processor exhibit the same behavior, or does it list your file correctly?
-
The fact that the logs show the processor yielding tells me it found no work to do (meaning no files to list). It yields so that it does not consume CPU nonstop looking for work that does not exist.
-
Did you check your properties for leading or trailing whitespace?
Did you try removing the "\" from your file filter?
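A file filter can be tried against the file name outside of NiFi. The sketch below assumes the filter has to match the whole file name, and it uses `grep -E`, which only approximates the Java regular expressions NiFi evaluates, so treat the result as a hint for simple patterns. `filter_matches` and the example patterns are hypothetical.

```shell
# Hypothetical helper: succeed if the pattern matches the entire file name.
# -E selects extended regular expressions, -x requires a whole-line match,
# -q suppresses output so only the exit status is used.
filter_matches() {
  local name pattern
  name=$1
  pattern=$2
  printf '%s\n' "$name" | grep -Eqx -- "$pattern"
}

# Example usage with made-up patterns:
#   filter_matches "2008.csv" '.*\.csv'  && echo "match"
#   filter_matches "2008.csv" '.*\.csv ' || echo "no match (trailing space)"
```

Note that a trailing space in the pattern is enough to make the match fail.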
-
Thanks,
Matt
Created ‎05-30-2018 12:59 PM
Did you try becoming the nifi user (sudo su - nifi) and then navigating to the 2008.csv file and viewing it?
-
Another option would be to enable debug logging on the GetFile processor to get more details on what is going on here. Doing so requires you to add a new line to the NiFi logback.xml file:
-
<logger name="org.apache.nifi.processors.standard.GetFile" level="DEBUG"/>
-
There is no need to restart NiFi after editing the logback.xml file (it is one of the few NiFi configuration files you can edit without a restart).
-
Wait 30 seconds after adding this line, then start tailing the NiFi app log:
# tail -F ../logs/nifi-app.log
-
Go to your canvas and start (or stop, then start) your GetFile processor. What output do you see in nifi-app.log?
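To cut the log down to the interesting lines, something like the following may help. It is a sketch that assumes each log line carries its level (WARN, ERROR, DEBUG) as a separate word and mentions the processor name; `log_lines_for` is a hypothetical helper.

```shell
# Hypothetical helper: print WARN/ERROR/DEBUG lines that mention a processor.
# $1 is the log file, $2 is a fixed string such as GetFile.
log_lines_for() {
  local logfile name
  logfile=$1
  name=$2
  # -w matches the level as a whole word; -F treats the name as a literal string.
  grep -Ew 'WARN|ERROR|DEBUG' -- "$logfile" | grep -F -- "$name"
}

# Example usage:
#   log_lines_for ../logs/nifi-app.log GetFile
```

The function exits with a non-zero status when no matching lines are found.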
-
Thanks,
Matt
Created ‎05-30-2018 01:05 PM
You may also want to check the Input Directory and File Filter properties in your processor configuration to make sure they have no leading or trailing spaces. NiFi treats spaces as valid characters, which can result in NiFi not finding the file or even the directory.
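Stray whitespace is hard to see in a text field, so it can help to paste the value into a quick check. A minimal sketch, with `has_edge_whitespace` as a hypothetical helper name and a made-up directory value:

```shell
# Hypothetical helper: succeed if the value starts or ends with whitespace.
has_edge_whitespace() {
  printf '%s' "$1" | grep -Eq '^[[:space:]]|[[:space:]]$'
}

# Example usage: brackets make any hidden spaces visible.
#   value='/data/in '
#   printf '[%s]\n' "$value"
#   has_edge_whitespace "$value" && echo "leading or trailing whitespace found"
```

Spaces inside the value are not flagged, only those at either end.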
Created on ‎05-29-2018 10:40 PM - edited ‎08-17-2019 08:51 PM
Your observations are correct, the processors seem to be configured fine, but the GetFile doesn't retrieve the CSV.
I ran the ps command, but I'm not sure where to find the user that owns NiFi (see screenshot).
I've attached the latest log file: nifi-applog.zip. I don't see any WARN or ERROR logs.
Created on ‎05-29-2018 11:22 PM - edited ‎08-17-2019 08:51 PM
Quick update:
I tried changing the owner:group to nifi:nifi based on the output of ps -ef | grep nifi.
I'm still not seeing any queued FlowFiles, nor am I seeing anything in HDFS.
