
NiFi: fixed-format file parsing and routing based on record type


Hi,

I'm relatively new to NiFi and am trying the following scenario.

The fixed-width file contains multiple record types in the following format:

Header1
Row1
Row N
Trailer1
Header2
Row1
Row N
Trailer2
Header3
Row1
Row N
Trailer3
Header4
Row1
Row N
Trailer4

A sample of the file looks like this:

1AXXXXXXX2019269          
AAA121212 XXXX DDDDD SSSSSS
SSSS12233 ASAS AWWWW SSSSSS
9000000002          
1BAXXXXXXX2019269
AAA121212 XXXX DDDDD SSSSSS
SSSS12233 ASAS AWWWW SSSSSS
AAA121212 XXXX DDDDD SSSSSS
SSSS12233 ASAS AWWWW SSSSSS
9000000004 
1CAXXXXXXX2019269
AAA121212 XXXX DDDDD SSSSSS
SSSS12233 ASAS AWWWW SSSSSS
9000000002 
1DAXXXXXXX2019269
9000000000
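For reference, the same layout can also be grouped line by line without a regex. This is a minimal plain-Java sketch, not a NiFi processor, and it rests on assumptions: a header starts with "1" followed by the record-type letter, a trailer is "9" plus nine digits, and (inferred only from the sample, where 9000000002 closes two rows and 9000000004 closes four) those nine digits are the row count. RecordGrouper, Block, group and countMatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordGrouper {

    // Lines from the sample file (trailing padding omitted).
    public static final List<String> SAMPLE = Arrays.asList(
        "1AXXXXXXX2019269",
        "AAA121212 XXXX DDDDD SSSSSS",
        "SSSS12233 ASAS AWWWW SSSSSS",
        "9000000002",
        "1BAXXXXXXX2019269",
        "AAA121212 XXXX DDDDD SSSSSS",
        "SSSS12233 ASAS AWWWW SSSSSS",
        "AAA121212 XXXX DDDDD SSSSSS",
        "SSSS12233 ASAS AWWWW SSSSSS",
        "9000000004",
        "1CAXXXXXXX2019269",
        "AAA121212 XXXX DDDDD SSSSSS",
        "SSSS12233 ASAS AWWWW SSSSSS",
        "9000000002",
        "1DAXXXXXXX2019269",
        "9000000000");

    // One header..trailer group from the file.
    public static final class Block {
        public final String header;
        public final List<String> rows = new ArrayList<>();
        public String trailer;

        Block(String header) {
            this.header = header;
        }

        // Assumption: the record type is the character right after the leading '1'.
        public char type() {
            return header.charAt(1);
        }

        // Assumption (inferred from the sample): the nine digits after '9' are the row count.
        public boolean countMatches() {
            return trailer != null && Integer.parseInt(trailer.substring(1)) == rows.size();
        }
    }

    public static List<Block> group(List<String> lines) {
        List<Block> blocks = new ArrayList<>();
        Block current = null;
        for (String raw : lines) {
            String line = raw.replaceAll("\\s+$", ""); // drop trailing fixed-width padding
            if (line.isEmpty()) {
                continue;
            }
            if (current == null) {
                // Outside a block, the next line must be a header.
                if (!line.startsWith("1")) {
                    throw new IllegalArgumentException("Expected a header: " + line);
                }
                current = new Block(line);
            } else if (line.matches("9\\d{9}")) {
                // A line that is exactly '9' plus nine digits closes the block.
                current.trailer = line;
                blocks.add(current);
                current = null;
            } else {
                current.rows.add(line);
            }
        }
        if (current != null) {
            throw new IllegalArgumentException("No trailer after " + current.header);
        }
        return blocks;
    }

    public static void main(String[] args) {
        for (Block b : group(SAMPLE)) {
            System.out.println(b.type() + ": " + b.rows.size()
                + " rows, count matches trailer: " + b.countMatches());
        }
    }
}
```

On the sample this reports record types A, B, C and D with 2, 4, 2 and 0 rows, each agreeing with its trailer. Tracking "inside a block" as state means a header is only expected right after a trailer, which a single regex does not enforce.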

I'm trying to identify each record type and route it accordingly using a regex.

For example, for the first record type I'm using the following regex to extract everything from the header through the trailer record:

/(1A)([\S\s\w\d])*?(9[0-9]{9})/

Expected outcome:

1AXXXXXXX2019269         
AAA121212 XXXX DDDDD SSSSSS
SSSS12233 ASAS AWWWW SSSSSS

9000000002

This works when tested on regexr.com, but in NiFi all of the records end up in this route, not just this record type.
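One way to see why regexr.com and NiFi can disagree: NiFi's regex-based processors use Java regular expressions (java.util.regex), where the surrounding /.../ are not delimiters but literal slash characters, so they should not be typed into the processor property. Without the slashes, the posted pattern is found in the content, and because [\S\s] matches any character it can also match the entire file whenever the file starts with 1A and ends with a trailer; the lazy *? only picks the shortest match when searching, it does not stop a full-content match. If the flow uses RouteOnContent (an assumption, since the post does not name the processor), that processor routes a copy of the whole FlowFile to each relationship whose regex matches, with its two Match Requirement options roughly corresponding to find() and matches() here; it does not cut out the matched text, which would be consistent with every record landing in one route. The sketch only demonstrates the Java matching behaviour on a shortened excerpt of the sample; WholeContentMatch is a hypothetical name.

```java
import java.util.regex.Pattern;

public class WholeContentMatch {

    // Shortened excerpt of the sample: the 1A block followed by the 1D block.
    public static final String CONTENT = String.join("\n",
        "1AXXXXXXX2019269",
        "AAA121212 XXXX DDDDD SSSSSS",
        "SSSS12233 ASAS AWWWW SSSSSS",
        "9000000002",
        "1DAXXXXXXX2019269",
        "9000000000");

    // Pattern typed with the /.../ delimiters: Java treats both slashes as literals.
    public static final Pattern WITH_SLASHES =
        Pattern.compile("/(1A)([\\S\\s\\w\\d])*?(9[0-9]{9})/");

    // The same pattern without delimiters.
    public static final Pattern POSTED =
        Pattern.compile("(1A)([\\S\\s\\w\\d])*?(9[0-9]{9})");

    public static void main(String[] args) {
        // The content contains no '/', so nothing is found.
        System.out.println("with slashes, find():       " + WITH_SLASHES.matcher(CONTENT).find());
        // Found somewhere in the content ("contains" semantics).
        System.out.println("without slashes, find():    " + POSTED.matcher(CONTENT).find());
        // Also matches the whole content: when a full match is required, the
        // any-character class stretches from the first 1A to the final trailer.
        System.out.println("without slashes, matches(): " + POSTED.matcher(CONTENT).matches());
    }
}
```

This prints false, true and true. In other words, once the slashes are gone, the posted pattern says yes to the whole excerpt under either kind of check, so a processor that decides where a whole FlowFile goes would send all of it to the same route.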

I've tried multiple variations of the regex but am still not getting the expected outcome from NiFi.
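As one variation to try, here is a minimal Java sketch that anchors the header and trailer to line starts with (?m) and replaces ([\S\s\w\d])*? with [\s\S]*?. The class [\S\s] already matches every character, so \w and \d add nothing, and a repeated capturing group keeps only its last iteration, so group 2 of the posted pattern ends up holding a single character. The class and method names are hypothetical, and the record type is assumed to be the letter after the leading 1. This only extracts each block's text in Java; it does not by itself split a FlowFile in NiFi.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecordTypeRegex {

    // Shortened excerpt of the sample: the 1A block followed by the 1D block.
    public static final String SAMPLE = String.join("\n",
        "1AXXXXXXX2019269",
        "AAA121212 XXXX DDDDD SSSSSS",
        "SSSS12233 ASAS AWWWW SSSSSS",
        "9000000002",
        "1DAXXXXXXX2019269",
        "9000000000");

    // The posted pattern (without /.../). Group 2 repeats a one-character group,
    // so after a match it holds only the last character consumed.
    public static final Pattern POSTED =
        Pattern.compile("(1A)([\\S\\s\\w\\d])*?(9[0-9]{9})");

    // (?m): ^ and $ apply per line. A header is '1' plus a type letter at a line start;
    // [\s\S]*? lazily crosses line breaks up to the first line that is only a trailer.
    public static final Pattern BLOCK =
        Pattern.compile("(?m)^1([A-Z])[\\s\\S]*?^9\\d{9}[ \\t]*$");

    // Collects each header..trailer block under its record-type letter;
    // blocks of a repeated type are joined with a newline.
    public static Map<String, String> blocksByType(String content) {
        Map<String, String> byType = new LinkedHashMap<>();
        Matcher m = BLOCK.matcher(content);
        while (m.find()) {
            byType.merge(m.group(1), m.group(), (a, b) -> a + "\n" + b);
        }
        return byType;
    }

    public static void main(String[] args) {
        Matcher posted = POSTED.matcher(SAMPLE);
        if (posted.find()) {
            System.out.println("posted pattern, group 2 length: " + posted.group(2).length());
        }
        for (Map.Entry<String, String> e : blocksByType(SAMPLE).entrySet()) {
            System.out.println("--- record type " + e.getKey());
            System.out.println(e.getValue());
        }
    }
}
```

The trailing [ \t]*$ requires the trailer to be alone on its line apart from padding, so a data row that merely begins with 9 and nine digits would not close a block early. To extract only the first record type, the same idea reduces to (?m)^1A[\s\S]*?^9\d{9}[ \t]*$.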

Any suggestions? 

 


NiFi: regex not filtering as desired


Hi,

I'm trying to extract multiple record types from a fixed-width file.

The file format is the following:

Header1
Row1
Row N
Trailer1
Header2
Row1
Row N
Trailer2
Header3
Row1
Row N
Trailer3
Header4
Row1
Row N
Trailer4

A sample of the file looks like this:

1AXXXXXXX2019269          
AAA121212 XXXX DDDDD SSSSSS
SSSS12233 ASAS AWWWW SSSSSS
9000000002          
1BAXXXXXXX2019269
AAA121212 XXXX DDDDD SSSSSS
SSSS12233 ASAS AWWWW SSSSSS
AAA121212 XXXX DDDDD SSSSSS
SSSS12233 ASAS AWWWW SSSSSS
9000000004 
1CAXXXXXXX2019269
AAA121212 XXXX DDDDD SSSSSS
SSSS12233 ASAS AWWWW SSSSSS
9000000002 
1DAXXXXXXX2019269
9000000000

I'm trying to identify each record type and route it accordingly using a regex.

For example, for the first record type I'm using the following regex to extract everything from the header through the trailer record:

/(1A)([\S\s\w\d])*?(9[0-9]{9})/

Expected outcome:

1AXXXXXXX2019269          
AAA121212 XXXX DDDDD SSSSSS
SSSS12233 ASAS AWWWW SSSSSS

9000000002

This works when tested on regexr.com, but in NiFi all of the records end up in only one route.

[Screenshots: overall flow; properties for adding sample data; sample data; regex-based routing]

I've tried multiple variations of the regex but am still not getting the expected outcome from NiFi.

Any suggestions on how to make this regex work?

 
