How can we overcome the known bug related to file filtering on recursive listing in ListHDFS?

We are trying to select specific files within the subdirectories returned by a recursive directory listing.

We have tested the bug in NiFi 1.5 and 1.7.

We are using ListHDFS to find any file that has been added under the parentdir directory tree and filter for *.csv files.

There is a NiFi bug that prevents filtering on files within subdirectories. Has anyone been able to work around it?
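One plausible explanation of the symptom, offered as an assumption and not as NiFi's actual code: if the File Filter regex is tested against directory names as well as file names while ListHDFS recurses, a filter such as `.*\.csv` rejects `subdir1` before any file inside it is seen. The Python model below uses a hypothetical in-memory tree (not HDFS and not any NiFi API) built from the example paths in this question, and contrasts that assumed behaviour with applying the filter to file names only.

```python
import re

# Hypothetical in-memory tree standing in for HDFS (NOT NiFi's implementation).
# Dicts are directories, None marks a file; names mirror the example paths.
TREE = {
    "subdir1": {
        "subdir1": {"subdir1": {"file1.txt": None, "file2.txt": None,
                                "xfile1.csv": None, "xfile2.csv": None}},
        "subdir2": {"subdir6": {"file1.txt": None, "file2.txt": None,
                                "xfile1.csv": None, "xfile2.csv": None}},
    }
}


def list_recursive(tree, prefix, name_filter, filter_directories):
    """Yield file paths under `tree`, descending into subdirectories.

    If filter_directories is True, the regex is also tested against directory
    names before descending (the ASSUMED cause of the reported symptom).
    """
    for name, child in sorted(tree.items()):
        path = f"{prefix}/{name}"
        is_dir = isinstance(child, dict)
        # Whole-name match, like a regex that must match the entire name.
        if (filter_directories or not is_dir) and not name_filter.fullmatch(name):
            continue  # entry rejected by the filter
        if is_dir:
            yield from list_recursive(child, path, name_filter, filter_directories)
        else:
            yield path


csv_filter = re.compile(r".*\.csv")
buggy = list(list_recursive(TREE, "/parentdir", csv_filter, filter_directories=True))
files_only = list(list_recursive(TREE, "/parentdir", csv_filter, filter_directories=False))
print(buggy)       # [] because "subdir1" does not match .*\.csv
print(files_only)  # the four xfile*.csv paths
```

Under this assumption, filtering during the listing returns nothing, while testing only file names returns the four `xfile*.csv` paths. Filtering after the listing, rather than during it, sidesteps the problem.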

Goal: get the list of all *.csv files under the parentdir directory.

Example:

/parentdir/subdir1/subdir1/subdir1/file1.txt
/parentdir/subdir1/subdir1/subdir1/file2.txt
/parentdir/subdir1/subdir1/subdir1/xfile1.csv
/parentdir/subdir1/subdir1/subdir1/xfile2.csv
/parentdir/subdir1/subdir2/subdir6/file1.txt
/parentdir/subdir1/subdir2/subdir6/file2.txt
/parentdir/subdir1/subdir2/subdir6/xfile1.csv
/parentdir/subdir1/subdir2/subdir6/xfile2.csv

Please help!

Re: How can we overcome the known bug related to file filtering on recursive listing in ListHDFS?

@Kristine N

Since you are using the `ListHDFS` processor, note that it adds a `filename` attribute to each flowfile it emits.

Add a RouteOnAttribute processor after ListHDFS and check that attribute.

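Add a new property named Csvfiles with the value:

${filename:substringAfterLast('.'):equals("csv")}

If some files use an upper-case extension such as `.CSV`, use `equalsIgnoreCase("csv")` in place of `equals("csv")`.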

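With RouteOnAttribute's default Routing Strategy (Route to Property name), this property gets its own Csvfiles relationship. Connect Csvfiles to a FetchHDFS processor, and auto-terminate the unmatched relationship so the non-CSV listings are dropped.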
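To make the RouteOnAttribute expression concrete, here is a small Python emulation of `${filename:substringAfterLast('.'):equals("csv")}`. It is a sketch, not NiFi code: `substring_after_last` and `is_csv` are hypothetical helpers, and they assume that `substringAfterLast` returns the whole subject when the delimiter is absent.

```python
def substring_after_last(subject: str, delimiter: str) -> str:
    """Emulate substringAfterLast: the text after the last delimiter, or the
    whole subject if the delimiter does not occur (assumed NiFi behaviour)."""
    index = subject.rfind(delimiter)
    return subject if index == -1 else subject[index + len(delimiter):]


def is_csv(filename: str) -> bool:
    # Python equivalent of ${filename:substringAfterLast('.'):equals("csv")}
    return substring_after_last(filename, ".") == "csv"


print(is_csv("xfile1.csv"))   # True
print(is_csv("file1.txt"))    # False
print(is_csv("REPORT.CSV"))   # False: the comparison is case-sensitive
print(is_csv("csv"))          # True: no "." so the whole name is compared
```

Under that assumption, a file literally named `csv` would also be routed to Csvfiles; an expression such as `${filename:endsWith('.csv')}` avoids that edge case.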

Flow:

1. ListHDFS // list all files recursively under the parent directory
2. RouteOnAttribute // keep only the csv files
3. FetchHDFS // fetch the csv files from HDFS
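The three-stage flow above can be sketched locally in Python to check which files end up fetched. This is only an illustration under assumptions: a temporary local directory created with `tempfile` stands in for HDFS, and `list_files`, `route_csv`, and `fetch` are hypothetical stand-ins for the processors, not NiFi APIs.

```python
import os
import tempfile


def list_files(root):
    """Stands in for ListHDFS: every file under root, recursively, with no filter."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def route_csv(paths):
    """Stands in for RouteOnAttribute: split listings into Csvfiles and unmatched."""
    matched, unmatched = [], []
    for path in paths:
        filename = os.path.basename(path)  # like ListHDFS's filename attribute
        target = matched if filename.rsplit(".", 1)[-1] == "csv" else unmatched
        target.append(path)
    return matched, unmatched


def fetch(paths):
    """Stands in for FetchHDFS: read the content of each routed path."""
    contents = {}
    for path in paths:
        with open(path, "rb") as f:
            contents[path] = f.read()
    return contents


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "parentdir")
    # Recreate the example tree from the question.
    for sub in ("subdir1/subdir1/subdir1", "subdir1/subdir2/subdir6"):
        os.makedirs(os.path.join(root, sub))
        for name in ("file1.txt", "file2.txt", "xfile1.csv", "xfile2.csv"):
            with open(os.path.join(root, sub, name), "w") as f:
                f.write(name)
    csv_paths, other_paths = route_csv(list_files(root))
    fetched = fetch(csv_paths)  # unmatched listings are simply dropped
    for path in sorted(fetched):
        print(os.path.relpath(path, tmp))
```

Because the extension check runs after an unfiltered listing, every subdirectory is traversed and only the four CSV files reach the fetch stage.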

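With this approach, RouteOnAttribute filters out every non-CSV listing, so FetchHDFS fetches only the required CSV files from the HDFS directories. Because the extension check happens after the listing rather than in ListHDFS's own file filter, the subdirectories are still traversed.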
