How to process corrupted CSV data with NiFi

New Contributor

My NiFi flow fails when it encounters a CSV file in which a column value contains a double quote inside the string, for example:
"Protection from Abuse Order on file against dad [name removed]." NO CONTACT WITH DAD. 12/16/2014 kb.
The error is occurring at the Record Reader stage. Has anyone else successfully handled CSV data with embedded double quotes?
Sample from the CSV:

[Screenshot: sample of the CSV data]

My record reader configuration:

[Screenshot: CSVReader configuration]
Accepted solution

Super Guru

Hi,

First, if the data you posted contains real personal information, I recommend removing it and using dummy data instead. Posting personal information violates the community guidelines (see point 7 of the community guidelines).

Regarding the error: you are getting it because the CSVReader service has the property Quote Character set to ". This setting means that when a value contains one of the reserved CSV characters, such as the comma (,) that separates columns or the newline (\n) that separates records, and you don't or can't use the escape character (\), you can instead surround the whole value with double quotes at both ends. As a result, no other characters may follow the closing quote in the same column. For more information, see https://csv-loader.com/csv-guide/why-quotation-marks-are-used-in-csv

Since the line you posted has more characters after the closing " (the text NO CONTACT WITH DAD. 12/16/2014 kb.), the reader reports the illegal character error.
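
To make the quoting rule concrete, here is a small sketch that uses Python's standard csv module as a stand-in for NiFi's CSVReader. The two parsers are different implementations, so error messages will not match; the rows are hypothetical, modelled on the question with the name removed. With strict=True, the malformed row is rejected because text follows the closing quote, while the RFC 4180 form (wrap the whole value in quotes and double every embedded quote) parses into a single value.

```python
import csv
import io

# Hypothetical rows modelled on the question (personal name removed).
# Malformed: the value opens with a quote, but more text follows the closing quote.
MALFORMED = '1,"Protection from Abuse Order on file." NO CONTACT WITH DAD. 12/16/2014 kb.\n'
# RFC 4180 form: the whole value is quoted and each embedded quote is doubled.
WELL_FORMED = '1,"""Protection from Abuse Order on file."" NO CONTACT WITH DAD. 12/16/2014 kb."\n'


def parse_strict(text):
    # strict=True makes Python's parser raise csv.Error on bad quoting
    # instead of silently accepting the stray characters.
    return list(csv.reader(io.StringIO(text), strict=True))


try:
    parse_strict(MALFORMED)
except csv.Error as exc:
    print("rejected:", exc)

print(parse_strict(WELL_FORMED))
```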

To Resolve:

You have two options:

1- Use a ReplaceText processor to replace every double quote (") with \" so that each quote is escaped. However, this may not be efficient if you have a large CSV file.

[Screenshot: ReplaceText processor configuration]

2- A more efficient option is to change the Quote Character in the CSVReader to something other than ". However, you have to make sure that the new character never appears in any of the CSV values. Possible choices: $, %, ^
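
Both workarounds can be sketched with Python's csv module, again only as a stand-in for the CSVReader and assuming a hypothetical sample row like the one in the question. The str.replace call merely mimics a literal ReplaceText replacement; it is not NiFi code, and the function names are made up for illustration.

```python
import csv
import io

# Hypothetical sample row modelled on the question (personal name removed).
RAW = '1,"Protection from Abuse Order on file." NO CONTACT WITH DAD. 12/16/2014 kb.\n'


def escape_quotes_then_parse(text):
    # Option 1 (hypothetical helper): prefix every double quote with a
    # backslash, similar in spirit to a literal ReplaceText step, then parse
    # with backslash as the escape character so each \" is read as a plain ".
    escaped = text.replace('"', '\\"')
    return list(csv.reader(io.StringIO(escaped), escapechar="\\", strict=True))


def parse_with_unused_quote_char(text, candidates=("$", "%", "^")):
    # Option 2 (hypothetical helper): pick the first candidate quote character
    # that does not occur in the text, so double quotes become ordinary data.
    quote = next((c for c in candidates if c not in text), None)
    if quote is None:
        raise ValueError("every candidate quote character appears in the data")
    return list(csv.reader(io.StringIO(text), quotechar=quote, strict=True))


print(escape_quotes_then_parse(RAW))
print(parse_with_unused_quote_char(RAW))
```

In this sketch, option 1 escapes every quote, including quotes that legitimately wrap a value containing a comma, so such a value would then be split at that comma. Option 2's check only covers the text it is given, so a candidate absent from a sample could still appear elsewhere in the full data.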

If this helps, please accept the solution.

Thanks

 

New Contributor

I tried to delete the data you mentioned, but I don't know how to edit the topic. Thank you very much for your support.