Using NiFi to load data from localFS to HDFS

Contributor

I'm trying to load a csv from a 'local FS' into HDFS using NiFi. But I can't seem to figure out what I'm doing wrong.

Based on this thread from @Matt Clarke-

https://community.hortonworks.com/questions/77019/how-to-load-data-from-local-system-file-to-hdfs-us...

I've configured the GetFile and PutHDFS procs as shown-

77405-screen-shot-2018-05-29-at-11444-am.png

And here's the file system-

I feel like there's something simple I'm missing, but I'm struggling with it nonetheless. Can anyone spot where I've gone wrong?

77407-screen-shot-2018-05-29-at-22737-am.png

77406-screen-shot-2018-05-29-at-11541-am.png

Contributor

NiFi UI view-

77408-screen-shot-2018-05-29-at-23430-am.png

Master Mentor
@Mike Wong

I'm not sure exactly what issue you are seeing here based on your screenshots.

Your dataflow consists of two processors and I see no queued FlowFiles.

Is the GetFile processor retrieving your 2008.csv file? Based on your configuration, it should be retrieving it nonstop (Keep Source File = true).

What user owns the running NiFi process on this same local machine? (# ps -ef | grep nifi)

If you become that user, can you access the file you are trying to consume with NiFi?

What do you see in the nifi-app.log when you "start" the GetFile processor? (Any WARN or ERROR logs)
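A rough sketch of the ownership and access checks above, in bash. The helper name check_access and the path /path/to/2008.csv are placeholders, not anything NiFi provides, and the "nifi" account name is an assumption; use whatever account the ps output actually shows.

```shell
#!/usr/bin/env bash
# Sketch only: check_access is a hypothetical helper, not a NiFi tool.
# It reports whether the *current* user can read a file and traverse
# every directory above it, which is what GetFile needs at a minimum.
check_access() {
  local target="$1" dir
  if [ ! -e "$target" ]; then
    echo "missing: $target"; return 1
  fi
  if [ ! -r "$target" ]; then
    echo "not readable: $target"; return 1
  fi
  dir="$(dirname "$target")"
  while :; do
    # A directory needs the execute bit to be traversed.
    if [ ! -x "$dir" ]; then
      echo "cannot traverse: $dir"; return 1
    fi
    if [ "$dir" = "/" ] || [ "$dir" = "." ]; then break; fi
    dir="$(dirname "$dir")"
  done
  echo "ok: $target"
}

# Typical use (placeholders; adjust the account and path):
#   1. Find the account that owns the NiFi process (first column):
#        ps -ef | grep -i '[n]ifi'
#   2. Run the check as that account:
#        sudo -u nifi bash -c "$(declare -f check_access); check_access /path/to/2008.csv"
```

If the helper prints "cannot traverse" for a parent directory, changing ownership of the file alone will not help; the directory permissions need to be fixed as well.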

-

Thanks,

Matt

Master Mentor

@Mike Wong

*** Important forum tip: Please try to avoid responding to an existing "Answer" by starting a new "Answer". Instead, use "Add comment" to respond to an existing "Answer". The forum offers no guaranteed order for answers, which can make following a conversation difficult.

Contributor

@Matt Clarke

Per your suggestion, I turned on debugging in the nifi-app.log file and captured about 10 seconds of a run. Files are still not being ingested, and I'm starting to get really frustrated.

Log file attached: nifi-applog.zip

Master Mentor

@Mike Wong

Does the ListFile processor exhibit the same behavior, or does it list your file correctly?
-
The fact that the logs show the processor yielding tells me it found no work to do (meaning no files to list). It yields so that it does not consume CPU nonstop looking for work that does not exist.

-

Did you check your properties for leading or trailing whitespace?
Did you try removing the "\" from your file filter?

-

Thanks,

Matt

Master Mentor

@Mike Wong

Did you try becoming the nifi user (sudo su - nifi) and then navigating to the 2008.csv file and viewing it?

-

Another option would be to enable debug logging on the GetFile processor to get more details on what is going on here. Doing so requires you to add a new line to the NiFi logback.xml file:

-

<logger name="org.apache.nifi.processors.standard.GetFile" level="DEBUG"/>

-

There is no need to restart NiFi after editing the logback.xml file (it is one of the few NiFi conf files you can edit without a restart).

-

Wait 30 seconds after adding this line. Then start tailing the NiFi app log:

# tail -F ../logs/nifi-app.log

-

Go to your canvas and start (or stop, then start) your GetFile processor. What output do you see in the nifi-app.log?
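Since nifi-app.log gets noisy once DEBUG is on, it can help to narrow the output to the GetFile lines. A minimal sketch in bash; getfile_log_lines is a hypothetical helper name, and it assumes the log level appears as a separate word on each log line.

```shell
#!/usr/bin/env bash
# Sketch only: getfile_log_lines is a hypothetical helper.
# Prints lines from a log file that mention GetFile at the given level
# (DEBUG by default). Assumes the level is a standalone word on the line.
getfile_log_lines() {
  local logfile="$1" level="${2:-DEBUG}"
  grep -F "GetFile" "$logfile" | grep -w "$level"
}

# Typical use, with the same relative log path as the tail command above:
#   getfile_log_lines ../logs/nifi-app.log DEBUG
#   getfile_log_lines ../logs/nifi-app.log WARN
#
# To filter the live tail instead of a finished file:
#   tail -F ../logs/nifi-app.log | grep --line-buffered GetFile
```

If this prints nothing at DEBUG after the processor has been started, the logger line in logback.xml has most likely not been picked up yet.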

-

Thanks,

Matt

Master Mentor

@Mike Wong

You may also want to check your processor configuration for both the Input Directory and File Filter properties to make sure they contain no leading or trailing spaces. Spaces are treated as valid characters by NiFi, which can result in NiFi not finding the file or even the directory.
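Stray spaces are hard to spot in the UI, so one option is to copy the property value out of the processor and test it in a shell. A minimal sketch in bash; has_edge_whitespace is a hypothetical helper, and the example value is a placeholder rather than your actual Input Directory.

```shell
#!/usr/bin/env bash
# Sketch only: has_edge_whitespace is a hypothetical helper.
# Succeeds (exit 0) when the value starts or ends with whitespace.
has_edge_whitespace() {
  case "$1" in
    [[:space:]]*|*[[:space:]]) return 0 ;;
    *) return 1 ;;
  esac
}

# Typical use: paste the copied property value between the quotes.
value="/path/to/input "   # placeholder with a trailing space
if has_edge_whitespace "$value"; then
  # Brackets make the stray whitespace visible.
  printf 'leading/trailing whitespace found: [%s]\n' "$value"
else
  printf 'value looks clean: [%s]\n' "$value"
fi
```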

Contributor

@Matt Clarke

Your observations are correct: the processors seem to be configured fine, but GetFile doesn't retrieve the CSV.

I ran the ps command, but I'm not sure where to find the user that owns NiFi (see screenshot).

77426-screen-shot-2018-05-29-at-63216-pm.png

I've attached the latest log file (nifi-applog.zip). I don't see any WARN or ERROR logs.

Contributor

@Matt Clarke

Quick update:

I tried changing the owner:group to nifi:nifi based on the output of ps -ef | grep nifi.

I'm still not seeing any queued FlowFiles, nor am I seeing anything in HDFS.

77428-screen-shot-2018-05-29-at-72125-pm.png