Do you know how to use DeleteHDFS to remove empty directories?
FlowFiles generated by the ListHDFS processor all have a "path" attribute created on them.
That attribute could be used to trigger your directory deletion via the DeleteHDFS processor.
What is difficult here is determining when all data has been successfully pulled from an HDFS directory before deleting the directory itself.
You could try using two DeleteHDFS processors in series with one another. The first DeleteHDFS deletes the files from the target "path" of the incoming FlowFiles and the second deletes the directory (Recursive property set to false).
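The two-stage idea above can be sketched outside NiFi. This is only an illustration, assuming the local filesystem as a stand-in for HDFS: os.rmdir refuses to remove a non-empty directory, much like a delete with Recursive set to false. The function name delete_files_then_directory is hypothetical and is not part of NiFi or Hadoop.

```python
import os
import tempfile


def delete_files_then_directory(file_paths, directory):
    """Stage 1 deletes the given files; stage 2 tries a non-recursive
    delete of the directory. Returns True only if the directory was removed."""
    # Stage 1: plays the role of the first DeleteHDFS (file deletion).
    for path in file_paths:
        if os.path.exists(path):
            os.remove(path)

    # Stage 2: plays the role of the second DeleteHDFS (Recursive = false).
    # os.rmdir raises OSError while anything is still left in the directory.
    try:
        os.rmdir(directory)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Local stand-in for an HDFS directory holding two files.
    target = tempfile.mkdtemp()
    first = os.path.join(target, "part-0")
    second = os.path.join(target, "part-1")
    for name in (first, second):
        open(name, "w").close()

    print(delete_files_then_directory([first], target))
    print(delete_files_then_directory([second], target))
```

Calling it once per file mirrors what happens in the flow: the directory delete keeps failing until the last file is gone, and only then succeeds.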
@matt: it partially worked, but we received errors about the directory not being empty:
2017-02-21 15:00:28,938 WARN [Timer-Driven Process Thread-8] o.a.nifi.processors.hadoop.DeleteHDFS DeleteHDFS[id=85d330b2-6cdd-1d81-a764-460fe51ef064] Error processing delete for file or directory org.apache.hadoop.ipc.RemoteException: `/user/ml/apply/toto/03 is non empty': Directory is not empty
That was the intent... It would only be successful after all files were deleted first. So only after the last file was removed would the directory deletion be successful.
You can reduce or even eliminate the WARN messages by placing a MergeContent processor between your first and second DeleteHDFS processors, with "path" set as the value of the "Correlation Attribute Name" property. The resulting merged FlowFile(s) would still have the same "path", which the second DeleteHDFS would use to remove your directory.
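To show why merging on "path" cuts down the warnings, here is a rough sketch of the grouping idea only. It assumes FlowFiles can be modelled as plain dicts with a "path" key; merge_by_path and merged_count are hypothetical names, and the real MergeContent also bins by entry counts and bin age, so it may emit more than one merged FlowFile per path.

```python
from collections import defaultdict


def merge_by_path(flowfiles):
    """Group FlowFile-like dicts by their "path" attribute and return
    one merged record per distinct path."""
    groups = defaultdict(list)
    for flowfile in flowfiles:
        groups[flowfile["path"]].append(flowfile)

    # Each merged record keeps the shared "path", so a later step can
    # still use it to delete the directory.
    return [
        {"path": path, "merged_count": len(members)}
        for path, members in groups.items()
    ]


if __name__ == "__main__":
    # Hypothetical FlowFiles left over after the file-deletion stage.
    deleted = [
        {"path": "/data/in/01", "filename": "a.csv"},
        {"path": "/data/in/01", "filename": "b.csv"},
        {"path": "/data/in/02", "filename": "c.csv"},
    ]
    for merged in merge_by_path(deleted):
        print(merged)
```

With three incoming FlowFiles across two directories, the directory-deletion step would be attempted twice instead of three times, and each attempt happens after that group's files were handled.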
@Matt thanks for your help.
Do you know if it is possible to move on to the next FlowFile and send the failed FlowFile to the next processor?
I'm trying to send one of the failed FlowFiles from DeleteHDFS to RouteText, but nothing reaches it.
One thing you could do is set "FlowFile Expiration" on the connection containing the "merged" relationship, and select "NewestFlowFileFirstPrioritizer" from the "Available Prioritizers". FlowFile expiration is measured against the age of the FlowFile (from creation time to now), not how long it has been in a particular connection. If the FlowFile age exceeds the configured value, it is purged from the queue.
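The expiration rule described above can be sketched as follows. This is a simplified model rather than NiFi's implementation: it assumes each FlowFile is a dict with a "created" timestamp in seconds, and surviving_newest_first is a hypothetical helper name.

```python
def surviving_newest_first(flowfiles, now, expiration_seconds):
    """Drop FlowFile-like dicts whose age (now - created) exceeds the
    expiration, then order the rest newest first."""
    # Age is measured from creation time, not from queue entry time.
    alive = [
        flowfile for flowfile in flowfiles
        if now - flowfile["created"] <= expiration_seconds
    ]
    # Newest first: the largest creation timestamp is processed first.
    return sorted(alive, key=lambda flowfile: flowfile["created"], reverse=True)


if __name__ == "__main__":
    # Hypothetical queue contents, timestamps in seconds.
    queue = [
        {"name": "old", "created": 0},
        {"name": "middle", "created": 50},
        {"name": "new", "created": 90},
    ]
    for flowfile in surviving_newest_first(queue, now=100, expiration_seconds=60):
        print(flowfile["name"])
```

In this model a FlowFile that keeps failing eventually ages out of the queue, while newer FlowFiles are handled ahead of it in the meantime.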