Created on 05-29-2018 06:31 AM - edited 08-17-2019 08:51 PM
I'm trying to load a CSV from a 'local FS' into HDFS using NiFi, but I can't seem to figure out what I'm doing wrong.
Based on this thread from @Matt Clarke-
I've configured the GetFile and PutHDFS procs as shown-
And here's the file system-
I feel like there's something simple I'm missing, but I'm struggling with it nonetheless. Can anyone spot where I've gone wrong?
Created on 05-29-2018 06:35 AM - edited 08-17-2019 08:51 PM
NiFi UI view-
Created 05-29-2018 02:05 PM
I'm not sure exactly what issue you are seeing, based on your screenshots.
Your dataflow consists of two processors and I see no queued FlowFiles.
Is the GetFile processor retrieving your 2008.csv file? Based on your configuration, it should be retrieving it nonstop (Keep Source File = true).
What user owns the running NiFi process on this same local machine? (# ps -ef | grep nifi)
If you become that user, can you access the file you are trying to consume with NiFi?
What do you see in the nifi-app.log when you "start" the GetFile processor? (Any WARN or ERROR logs)
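-
A rough sketch of the first two checks, in case it helps. The function names nifi_owner and check_access are made up for illustration, and it assumes the NiFi java command line contains "org.apache.nifi". The real input path is only visible in the screenshots, so substitute your own.

```shell
# Print the account(s) running NiFi, given "user command..." lines on stdin.
# Assumption: the NiFi java command line contains "org.apache.nifi".
nifi_owner() {
    awk '/org\.apache\.nifi/ { print $1 }' | sort -u
}

# Report whether the current user can read a file and traverse every
# directory above it. Run this as the user that owns the NiFi process.
check_access() {
    target="$1"
    if [ ! -e "$target" ]; then
        echo "missing: $target"
        return 1
    fi
    if [ ! -r "$target" ]; then
        echo "not readable: $target"
        return 1
    fi
    dir=$(dirname "$target")
    while :; do
        if [ ! -x "$dir" ]; then
            echo "cannot traverse: $dir"
            return 1
        fi
        # Stop at the filesystem root, or at "." for a relative path.
        case "$dir" in
            /|.) break ;;
        esac
        dir=$(dirname "$dir")
    done
    echo "ok: $target"
}
```

Feed the first function with "ps -e -o user= -o args= | nifi_owner"; the first column of "ps -ef" output shows the same owner. check_access has to be defined and run in a shell belonging to that user, with the full path to 2008.csv as its argument.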
-
Thanks,
Matt
Created 05-30-2018 12:47 PM
*** Important Forum tip: Please try to avoid responding to an existing "Answer" by starting a new "Answer". Instead, use "Add comment" to respond to an existing "Answer". The forum offers no guaranteed order to answers, which can make following a conversation difficult.
Created 05-30-2018 09:24 PM
Per your suggestion, I turned on debugging in the nifi-app.log file and captured about 10 seconds of a run. Files are still not being ingested and I'm starting to get really frustrated.
Log file attached: nifi-applog.zip
Created 05-31-2018 10:19 AM
Does the ListFile processor exhibit the same behavior, or does it list your file correctly?
-
The fact that the log shows the processor yielding tells me it found no work to do (meaning no files to list). It yields so that it does not consume CPU nonstop looking for work that does not exist.
-
Did you check your properties for leading or trailing whitespace?
Did you try removing the "\" from your file filter?
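-
The File Filter is a regular expression matched against the file name, so one rough way to try a filter outside NiFi is with grep -E, as sketched below. NiFi uses Java regular expressions, so this only approximates simple patterns. The filter strings are examples, not necessarily what is configured in the screenshots, and filter_matches is a hypothetical helper.

```shell
# $1 = filter regex, $2 = file name.
# -x requires the whole name to match, -q suppresses output.
filter_matches() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# Try a few candidate filters against the file name from this thread.
# The third one carries a trailing space on purpose.
for filter in '2008\.csv' '2008.csv' '2008\.csv '; do
    if filter_matches "$filter" '2008.csv'; then
        printf 'match:    [%s]\n' "$filter"
    else
        printf 'no match: [%s]\n' "$filter"
    fi
done
```

The first two filters match 2008.csv. The third does not, because the trailing space becomes part of the pattern.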
-
Thanks,
Matt
Created 05-30-2018 12:59 PM
Did you try becoming the nifi user (sudo su - nifi) and then navigating to the 2008.csv file and viewing it?
-
Another option would be to enable debug logging on the GetFile processor to get more details on what is going on here. Doing so requires you to add a new line to the NiFi logback.xml file:
-
<logger name="org.apache.nifi.processors.standard.GetFile" level="DEBUG"/>
-
No need to restart NiFi when editing the logback.xml file (this is one of the few configuration files in NiFi that you can edit without requiring a restart).
-
Wait 30 seconds after adding this line. Then start tailing the NiFi app log:
# tail -F ../logs/nifi-app.log
-
Go to your canvas and start (or stop, then start) your GetFile processor. What output do you see in the nifi-app.log?
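-
If the tail output is too noisy, it could be narrowed to the relevant lines with something like the sketch below. It assumes the default NiFi log layout, where the level appears as a separate word and the logger name ends in GetFile. getfile_lines is a made-up name.

```shell
# Keep only lines that mention GetFile or carry a WARN/ERROR level.
# --line-buffered makes matches appear immediately when reading from tail -F.
getfile_lines() {
    grep -E --line-buffered 'GetFile| (WARN|ERROR) '
}

# Typical use, from the NiFi bin directory:
#   tail -F ../logs/nifi-app.log | getfile_lines
```

The DEBUG lines from the GetFile logger added above pass through this filter, along with any warnings or errors from other components.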
-
Thanks,
Matt
Created 05-30-2018 01:05 PM
You may also want to check your processor configuration for both Input Directory and File Filter to make sure you do not have any leading or trailing spaces. NiFi treats spaces as valid characters, which can result in NiFi not finding the file or even the directory.
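A quick way to spot stray whitespace is to paste the property value into a shell check like the sketch below. Both function names are made up, and the directory values in the usage note are placeholders rather than the ones from the screenshots.

```shell
# Succeeds when the value starts or ends with whitespace (space, tab, ...).
has_edge_whitespace() {
    case "$1" in
        [[:space:]]*|*[[:space:]]) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the value between brackets so a stray space is easy to see.
show_value() {
    if has_edge_whitespace "$1"; then
        printf 'has stray whitespace: [%s]\n' "$1"
    else
        printf 'clean: [%s]\n' "$1"
    fi
}
```

For example, show_value '/data/in ' reports stray whitespace, while show_value '/data/in' reports clean.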
Created on 05-29-2018 10:40 PM - edited 08-17-2019 08:51 PM
Your observations are correct, the processors seem to be configured fine, but the GetFile doesn't retrieve the CSV.
I ran the ps command, but I'm not sure where to find the user that owns NiFi (see screenshot).
I've attached the latest log file (nifi-applog.zip). I don't see any WARN or ERROR logs.
Created on 05-29-2018 11:22 PM - edited 08-17-2019 08:51 PM
Quick update:
I tried changing the owner:group to nifi:nifi based on the output of ps -ef | grep nifi.
I'm still not seeing any queued FlowFiles, nor am I seeing anything in HDFS.