Using ExtractText to filter rows with a regular expression does not work

I have created a NiFi data flow that fetches an Excel file and copies its contents into a table in SQL Server. I use the ExtractText processor to filter rows with a regular expression, following the post at https://community.cloudera.com/t5/Community-Articles/NiFi-ETL-Removing-columns-filtering-rows-changi...

I have validated the regular expression with the Java regex tester at https://www.freeformatter.com/java-regex-tester.html#ad-output

However, in the flow the filter matches no rows, and no data is written to SQL Server.

Without the filter the flow works fine, so the problem is clearly in the ExtractText processor.
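
The expression itself is only visible in the screenshot, so what follows is a general diagnostic rather than a diagnosis. A regex that passes an online tester can still fail inside a processor when the processor sees different input or applies the pattern differently: a whole-input match versus a search, a line terminator left at the end of each split row, or a header line kept in every split (for example when SplitText's Header Line Count is set). The Java sketch below uses plain java.util.regex, not NiFi code; the pattern REGEX and the sample rows are hypothetical. As far as I understand, ExtractText searches the content (similar to Matcher.find()) and exposes an "Enable Multiline Mode" property, but please check both against the documentation for your NiFi version.

```java
import java.util.regex.Pattern;

class RegexInputCheck {
    // Hypothetical pattern (the real one is only in the screenshot): rows that start with a 4-digit year.
    static final String REGEX = "^\\d{4},.*$";
    static final Pattern PLAIN = Pattern.compile(REGEX);
    // Same pattern with MULTILINE, so ^ and $ also anchor at line breaks inside the content.
    static final Pattern MULTI = Pattern.compile(REGEX, Pattern.MULTILINE);

    public static void main(String[] args) {
        String row = "2021,Widget,42";                                // a row as pasted into an online tester
        String rowWithNewline = row + "\n";                           // the same row with its line terminator
        String headerAndRow = "Year,Product,Qty\n" + rowWithNewline;  // a split that also carries a header line

        // matches() must consume the whole input; find() only searches for the pattern.
        System.out.println(PLAIN.matcher(row).matches());             // true
        System.out.println(PLAIN.matcher(rowWithNewline).matches());  // false: the trailing \n is never consumed
        System.out.println(PLAIN.matcher(rowWithNewline).find());     // true: $ can match just before a final \n
        // Without MULTILINE, ^ only anchors at the start of the content, i.e. on the header line.
        System.out.println(PLAIN.matcher(headerAndRow).find());       // false
        System.out.println(MULTI.matcher(headerAndRow).find());       // true
    }
}
```

If the content reaching ExtractText looks like the header-plus-row case, enabling multiline mode (or dropping the ^ anchor) would be the first thing I would try.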

I have searched this and other forums for hints, without success. Any idea what might be wrong?

Below are my complete configuration, the source data, and the target SQL table.

Thanks a lot in advance!

Flow:

1_flow.png

GetFile processor:

2_get_file.png

ConvertExcelToCSVProcessor:

3_convert_to_CSV.png

SplitText processor:

4_split_rows.png

ExtractText processor:

5_filter_rows.png

PutDatabaseRecord processor:

6_put_DB_record.png

DBCPConnectionPool controller service:

7_DB_Connector.png

CSVReader controller service:

8_CSV_reader.png

Source data:

9_source_data.png

Target table:

10_target_table.png

Reply:

I have managed to solve the issue by using the RouteText processor.

I added a property containing a regular expression that matches the rows I want to omit, and I configured the corresponding relationship to be automatically terminated.
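
For comparison, the RouteText filtering described above behaves roughly like the Java sketch below. It assumes the Matching Strategy is "Matches Regular Expression" (which, as I understand it, requires the whole line to match) and the "Route to each matching Property Name" routing strategy, where lines matching the property go to a relationship named after it (auto-terminated here) and all other lines continue through "unmatched". The OMIT pattern and the rows are made up for illustration; this is plain java.util.regex, not NiFi code.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RouteTextFilterSketch {
    // Hypothetical "omit" rule: blank lines and lines whose first column is empty.
    static final Pattern OMIT = Pattern.compile("^\\s*$|^,.*");

    // Lines that fully match OMIT would go to the auto-terminated relationship;
    // every other line continues downstream, so only those lines are returned.
    static List<String> keep(List<String> lines) {
        return lines.stream()
                .filter(line -> !OMIT.matcher(line).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = List.of("2021,Widget,42", ",Subtotal,42", "", "2022,Gadget,7");
        System.out.println(keep(rows)); // [2021,Widget,42, 2022,Gadget,7]
    }
}
```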

11_Route_Text.png

12_Route_Text.png

Still, it would be nice to know how to implement the same functionality with the ExtractText processor.
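
One way to get the same result with ExtractText is to rely on its routing rather than its attributes. ExtractText does not remove text from a FlowFile; it adds attributes and, as I read its documentation, routes the FlowFile to "matched" if any configured expression matches and to "unmatched" otherwise. With SplitText emitting one row per FlowFile, you could connect "matched" to PutDatabaseRecord and auto-terminate "unmatched", which means the expression must describe the rows to keep (the inverse of the RouteText property); alternatively, keep an "omit" expression and forward "unmatched" instead. The Java sketch below models only that routing decision; the route helper, the keep pattern (with a capture group in case your version requires one), and the rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ExtractTextRoutingSketch {
    // Simplified model of the routing for one FlowFile: "matched" if any configured
    // pattern is found in the content, otherwise "unmatched".
    // Attribute extraction is deliberately left out.
    static String route(List<Pattern> patterns, String content) {
        for (Pattern p : patterns) {
            if (p.matcher(content).find()) {
                return "matched";
            }
        }
        return "unmatched";
    }

    public static void main(String[] args) {
        // Hypothetical rule describing the rows to KEEP (RouteText was given the rows to omit).
        List<Pattern> keepRules = List.of(Pattern.compile("^(\\d{4}),"));
        // One FlowFile per row, as SplitText would emit them (made-up sample data).
        List<String> flowFiles = List.of("2021,Widget,42\n", ",Subtotal,42\n", "2022,Gadget,7\n");

        List<String> toDatabase = new ArrayList<>();
        for (String content : flowFiles) {
            // "matched" is wired to PutDatabaseRecord; "unmatched" is auto-terminated.
            if (route(keepRules, content).equals("matched")) {
                toDatabase.add(content.strip());
            }
        }
        System.out.println(toDatabase); // [2021,Widget,42, 2022,Gadget,7]
    }
}
```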