
NiFi GetHDFS Warning - Could not remove from HDFS


I have created a flow in NiFi that gets a file from one folder, compresses it, and deletes the uncompressed file. I used:

GetHDFS: to get the file and delete it from the folder (Keep Source File is set to False)

PutHDFS: to compress the file and save it in a second folder

The flow seems to work: the file is no longer in the first folder, and the compressed file is in the second folder.

The problem is that a warning message is displayed:

Could not remove <file path> from HDFS. Not ingesting this file...

So I suspect the uncompressed file may still be somewhere in HDFS, but I don't know where.

What does this warning message mean?

1 ACCEPTED SOLUTION


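Nine times out of ten, this message appears because GetHDFS is running on multiple nodes.

Both nodes see the file and may both try to pick it up, but only one of them can delete it. The node that fails to delete it logs this warning and skips the file, which fits what you observe: the file is gone from the first folder and was processed once.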
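To make the race concrete, here is a minimal sketch in Python. It is only an analogy on the local filesystem, not NiFi or HDFS code: the function names, the node names, and the `data.txt` file are all made up for illustration, and the message text is simply copied from the warning in the question.

```python
import os
import tempfile


def try_claim(node, path):
    """Mimic a get-and-delete step: a node only ingests a file it managed to delete."""
    try:
        os.remove(path)
    except FileNotFoundError:
        # Another node already removed the file, so this node skips it.
        return f"{node}: Could not remove {path}. Not ingesting this file..."
    return f"{node}: ingested {path}"


def simulate_two_nodes():
    """Let two hypothetical nodes list the same folder, then both try to claim the file."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "data.txt")
        with open(path, "w") as handle:
            handle.write("payload")

        # Both nodes list the folder before either of them has deleted anything.
        listings = {
            node: [os.path.join(folder, name) for name in os.listdir(folder)]
            for node in ("node-1", "node-2")
        }

        # Each node now works through its own (stale) listing.
        return [
            try_claim(node, seen_path)
            for node, seen in listings.items()
            for seen_path in seen
        ]


if __name__ == "__main__":
    for line in simulate_two_nodes():
        print(line)
```

In this sketch the first node deletes and ingests the file, and the second node hits the missing file and reports the warning. Nothing is lost or left behind; the file was simply claimed by one node only.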

 

In older versions of NiFi, you can fix this by setting GetHDFS to run only on the primary node.

However, that will of course put more load on the primary node than it should carry.

So in recent versions (and likely yours) you will find the ListHDFS and FetchHDFS processors (and similar pairs for other data sources). The lightweight List processor can run on the primary node only, and its output can be load-balanced across all nodes, which then do the Fetch.
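The idea behind the List/Fetch split can be sketched in a few lines of Python. Again, this is an illustration under assumptions, not NiFi code: `list_files` and `distribute` are hypothetical helpers, the `/in/...` paths are made up, and round-robin is assumed here as one possible way to balance the load.

```python
from itertools import cycle


def list_files(folder_listing):
    """Primary-node step: emit one lightweight reference per file, no content."""
    return sorted(folder_listing)


def distribute(references, nodes):
    """Round-robin the references so each file is assigned to exactly one node."""
    assignment = {node: [] for node in nodes}
    for reference, node in zip(references, cycle(nodes)):
        assignment[node].append(reference)
    return assignment


if __name__ == "__main__":
    references = list_files(["/in/c", "/in/a", "/in/b"])
    for node, files in distribute(references, ["node-1", "node-2"]).items():
        # Each node would fetch (and later delete) only the files assigned to it.
        print(node, files)
```

Because the listing happens in one place and every reference goes to a single node, no two nodes ever compete to delete the same file, while the heavy work of fetching the content is still spread over the cluster.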


- Dennis Jaheruddin

