NIFI : deleteHDFS

Explorer

Hi all,

Do you know how to use DeleteHDFS to remove empty directories?

thanks

1 ACCEPTED SOLUTION

Re: NIFI : deleteHDFS

Master Guru
@mayki wogno

Make sure the user your NiFi is running as is authorized to delete files and directories in your target HDFS.

The DeleteHDFS processor properties are as follows:

[Screenshot: 12757-screen-shot-2017-02-21-at-81354-am.png]

Thanks,

Matt

11 REPLIES

Re: NIFI : deleteHDFS

Master Guru

@mayki wogno

FlowFiles generated by the ListHDFS processor all have a "path" attribute created on them:

[Screenshot: 12758-screen-shot-2017-02-21-at-83810-am.png]

That attribute could be used to trigger your directory deletion via the DeleteHDFS processor.

What is difficult here is determining when all data has been successfully pulled from an HDFS directory before deleting the directory itself.

You could try using two DeleteHDFS processors in series with one another. The first DeleteHDFS deletes the files from the target "path" of the incoming FlowFiles and the second deletes the directory (Recursive property set to false).
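The ordering described above can be sketched outside NiFi. This is only a local-filesystem analogy in Python, not NiFi or HDFS code: `os.remove` stands in for the first DeleteHDFS, and `os.rmdir` (which refuses non-empty directories) stands in for the second one with Recursive set to false. The function name and the sample file names are hypothetical.

```python
import os
import tempfile


def delete_files_then_directory(file_paths):
    """Delete each file, then try a non-recursive delete of its parent directory."""
    removed_dirs = []
    refused_attempts = 0
    for file_path in file_paths:
        os.remove(file_path)                 # stage 1: delete the file itself
        parent = os.path.dirname(file_path)
        try:
            os.rmdir(parent)                 # stage 2: only succeeds on an empty directory
            removed_dirs.append(parent)
        except OSError:
            refused_attempts += 1            # rmdir refused; here the directory still holds files
    return removed_dirs, refused_attempts


# Build a throwaway directory with three files to exercise the sketch.
root = tempfile.mkdtemp()
target = os.path.join(root, "03")
os.mkdir(target)
files = []
for name in ("a.txt", "b.txt", "c.txt"):
    path = os.path.join(target, name)
    open(path, "w").close()
    files.append(path)

removed_dirs, refused_attempts = delete_files_then_directory(files)
print(removed_dirs, refused_attempts)
os.rmdir(root)
```

With three files, the directory delete is refused twice and succeeds only after the last file is gone, so expect one refused attempt per file except the last.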

Matt

Re: NIFI : deleteHDFS

Explorer

@Matt: thanks, I've already used this processor for deleting files, but how do I use it with ListHDFS to delete empty directories?

Re: NIFI : deleteHDFS

Explorer

@Matt: it partially worked, but we received errors for directories that are not empty:

2017-02-21 15:00:28,938 WARN [Timer-Driven Process Thread-8] o.a.nifi.processors.hadoop.DeleteHDFS DeleteHDFS[id=85d330b2-6cdd-1d81-a764-460fe51ef064] Error processing delete for file or directory
org.apache.hadoop.ipc.RemoteException: `/user/ml/apply/toto/03 is non empty': Directory is not empty

Re: NIFI : deleteHDFS

Master Guru

That was the intent... It would only be successful after all files were deleted first. So only after the last file was removed would the directory deletion be successful.

Re: NIFI : deleteHDFS

Explorer

It worked, but it is not clean to have warnings in the log file.

Re: NIFI : deleteHDFS

Master Guru

@mayki wogno

You can reduce or even eliminate the WARN messages by placing a MergeContent processor between your first and second DeleteHDFS processors that merges using "path" as the value of the "Correlation Attribute Name" property. The resulting merged FlowFile(s) would still have the same "path", which the second DeleteHDFS would use to remove your directory.
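To illustrate why merging on "path" cuts down the warnings, here is a rough Python model of the grouping step. It is a sketch only, not NiFi code: FlowFiles are represented as plain dicts, and `merge_by_path` and the sample file names are hypothetical. A real MergeContent also closes bins based on its entry-count and bin-age settings, so a directory can still produce more than one merged FlowFile.

```python
def merge_by_path(flowfiles):
    """Collapse FlowFiles that share a "path" attribute into one entry per path."""
    merged = {}
    for flowfile in flowfiles:
        path = flowfile["path"]
        # One bundle per distinct path; it keeps the shared "path" attribute.
        bundle = merged.setdefault(path, {"path": path, "filenames": []})
        bundle["filenames"].append(flowfile["filename"])
    return list(merged.values())


# Three deleted files in one directory, one in another (names are made up).
deleted = [
    {"path": "/user/ml/apply/toto/03", "filename": "a.csv"},
    {"path": "/user/ml/apply/toto/03", "filename": "b.csv"},
    {"path": "/user/ml/apply/toto/04", "filename": "c.csv"},
    {"path": "/user/ml/apply/toto/03", "filename": "d.csv"},
]

bundles = merge_by_path(deleted)
for bundle in bundles:
    print(bundle["path"], len(bundle["filenames"]))
```

In this model the second delete stage sees two bundles instead of four FlowFiles, so it attempts each directory once rather than once per file.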

Matt

Re: NIFI : deleteHDFS

Explorer

@Matt thanks for your help.

Do you know if it is possible to move on to the next FlowFile and send the failed FlowFile to the next processor?

I'm trying to send a failed FlowFile from DeleteHDFS to RouteText, but nothing arrives there.

[Screenshot: 12782-deletehdfs.jpg]

Re: NIFI : deleteHDFS

Master Guru

@mayki wogno

One thing you could do is set "FlowFile Expiration" on the connection containing the "merged" relationship, and select "NewestFlowFileFirstPrioritizer" from the "Available Prioritizers". FlowFile expiration is measured against the age of the FlowFile (from creation time to now), not how long it has been in a particular connection. If the FlowFile age exceeds the configured value, it is purged from the queue.
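The combined effect of expiration and newest-first ordering can be modeled in a few lines of Python. This is a simplified sketch of the behavior described above, not how NiFi implements its queues; the `poll_queue` function, the `created` field, and the timestamps are hypothetical.

```python
def poll_queue(flowfiles, now, expiration_seconds):
    """Purge FlowFiles whose age exceeds the expiration, then order newest first."""
    # Age is measured from the FlowFile's creation time, not from when it was queued.
    alive = [ff for ff in flowfiles if now - ff["created"] <= expiration_seconds]
    return sorted(alive, key=lambda ff: ff["created"], reverse=True)


# Creation times in seconds; "now" is 100 and the expiration is 60 seconds.
queue = [
    {"id": "old", "created": 0},
    {"id": "middle", "created": 50},
    {"id": "new", "created": 90},
]

remaining = poll_queue(queue, now=100, expiration_seconds=60)
print([ff["id"] for ff in remaining])
```

Here the oldest FlowFile (age 100 seconds) is purged, and the newest of the remaining two is handed out first.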
