Created 01-28-2021 03:26 AM
I have created a flow in NiFi that gets a file from a first folder, compresses it, and deletes the uncompressed file. I used:
GetHDFS: to get the file and delete it from the folder (Keep Source File is set to False)
PutHDFS: to compress the file and save it in a second folder
The flow seems to work: the file is no longer in the first folder, and the compressed file is in the second folder.
The problem is that a warning message is displayed:
Could not remove <file path> from HDFS. Not ingesting this file...
So I suspect that the uncompressed file is still somewhere in HDFS, but I don't know where.
What does the warning message mean?
Created 02-01-2021 06:27 AM
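Nine times out of ten, this message appears because GetHDFS is running on multiple nodes.
Both nodes see the file and may even both try to pick it up, but only one of them can delete it. The node that loses the race logs this warning and skips the file, so the message alone does not mean an uncompressed copy was left behind.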
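To illustrate the race, here is a minimal sketch in Python. It is only an analogy on a local temporary file, not NiFi's code or the HDFS API; the names `try_claim` and `simulate_two_nodes` are made up for this example. Two "nodes" try to delete the same file, and only one of them can succeed.

```python
import os
import tempfile
import threading

def try_claim(path, node, results):
    """Try to delete the file; record whether this node won the race."""
    try:
        os.remove(path)
        results[node] = True      # this node deleted the file and may ingest it
    except FileNotFoundError:
        results[node] = False     # the other node got there first: skip the file

def simulate_two_nodes():
    # Create one file that both hypothetical nodes will see.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    results = {}
    threads = [
        threading.Thread(target=try_claim, args=(path, node, results))
        for node in ("node-1", "node-2")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    print(simulate_two_nodes())
```

Whichever node loses is in the same position as the GetHDFS instance that prints the warning: the file is already gone, so it has nothing to ingest.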
In older versions of NiFi, you can fix this by setting GetHDFS to run only on the primary node.
However, that will of course burden the primary node more than it should.
So in recent versions (and likely yours) you will find the ListHDFS and FetchHDFS processors (and similar pairs for other data sources). The lightweight List processor can then run on the primary node only, and its output can be load-balanced across all nodes, which then do the Fetch.
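The idea behind the List/Fetch split can be sketched as follows. This is a simplified Python model, not NiFi code: the function `distribute` and the node names are hypothetical, and round-robin is assumed here as the load-balancing strategy. A single lister produces the paths, and each path is handed to exactly one node, so no two nodes ever compete for the same file.

```python
from itertools import cycle

def distribute(listing, nodes):
    """Assign each listed path to exactly one node, round-robin."""
    assignment = {node: [] for node in nodes}
    for path, node in zip(listing, cycle(nodes)):
        assignment[node].append(path)
    return assignment

if __name__ == "__main__":
    # The listing is produced once, as if by a List step on the primary node.
    listing = ["/in/a.csv", "/in/b.csv", "/in/c.csv"]
    for node, paths in distribute(listing, ["node-1", "node-2"]).items():
        print(node, "fetches", paths)
```

Because only one node lists and every path has a single owner, the heavy work (fetching) is still spread over the cluster while the race on delete disappears.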