NIFI Scan attribute using dictionary

New Contributor

Hi,

I have been searching, but examples are a little thin on the ground (for me, anyway), so I haven't made much headway.
Simply put, I have a flow that decompresses a compressed file (tar.gz). It is actually double-compressed, as that is the format I receive it in.
Sometimes the file is corrupt, which I can detect by following the failure path. However, I want to put the failures into a 'Failure' folder so I can check them later. The problem is that some files in the flow are regular zip files, and I want to ignore those.
So I am using the ScanAttribute processor, with a regular expression in the dictionary to match against.

The dictionary text file just contains: /.*.tar.gz/

(Do I need the forward slashes here?)

The properties are set:

Dictionary File: D:\NIFI\nifi-1.6.0\Dictionary\Matches.txt
Attribute Pattern: .*
Match Criteria: At least 1 must match
Dictionary Filter Pattern: empty string set

Is there something I am missing here?
I have tried various changes to the above but am still not getting a match!

Thanks,

Paul

4 Replies

Re: NIFI Scan attribute using dictionary

New Contributor

Ah yes, and the flow file name is: 837c9563-a138-446a-bb86-29763e60b95eTestPrint.tar.gz
where 837c9563-a138-446a-bb86-29763e60b95e is the UUID, added to make the flow file ID unique. This is necessary for the bigger picture I am working on.


Re: NIFI Scan attribute using dictionary

Super Guru
@Paul Burger

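The ScanAttribute processor matches attribute values exactly against the terms in the Dictionary File, according to the Match Criteria value. The dictionary entries are treated as literal terms, not regular expressions. Since your dictionary file contains /.*.tar.gz/, a flow file only goes to the matched relationship if an attribute value is literally /.*.tar.gz/.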

If you want to filter out the filenames that contain .tar.gz, use the RouteOnAttribute processor and add a new property to check the files:

${filename:contains(".tar.gz")} // checks whether the filename value contains .tar.gz
${filename:substringAfter("."):equals("tar.gz")} // takes the part of the filename after the first "." and compares it with tar.gz

Either of the above expressions checks for .tar.gz in the filename value, and matching flow files are routed to the relationship named after the newly added property.
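A slightly stricter variant, offered as a sketch rather than a tested flow: `endsWith` only looks at the end of the filename, whereas `substringAfter(".")` splits at the first dot, so a name such as `TestPrint.v2.tar.gz` would yield `v2.tar.gz` and fail the `equals` check. The property name `tar_gz` is a hypothetical example; with the Routing Strategy set to "Route to Property name", matching flow files go to a relationship with that name.

```
Routing Strategy: Route to Property name
tar_gz: ${filename:endsWith(".tar.gz")}
```

With only this property defined, flow files that do not match (such as the regular zip files) are sent to RouteOnAttribute's built-in `unmatched` relationship.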

If you want to negate the above expression, use the :not() function.
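For example, to give the non-.tar.gz files their own named relationship instead, the same check can be wrapped with `not()`. The property name `not_tar_gz` is again only a hypothetical example:

```
not_tar_gz: ${filename:endsWith(".tar.gz"):not()}
```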

For more details, refer to this link on the NiFi Expression Language.

Re: NIFI Scan attribute using dictionary

New Contributor

OK, so I couldn't get it to work with the ScanAttribute processor and the dictionary, even when I put the exact file name in the dictionary! (I removed the UUID for test purposes, so I was left with just TestPrint.tar.gz.)
However, the RouteOnAttribute processor worked nicely for my purposes, so I will go with that and tackle dictionaries at some point in the future.

Paul

Re: NIFI Scan attribute using dictionary

New Contributor

Hi @Paul Burger, please ping me if you would like me to connect you with some of the engineers. I am responsible for Nokia and would be very happy to try and help out. Kind regards, Andy agriffin@hortonworks.com