How to remove an ETX character from a JSON value?

Explorer

Hello,

I cannot split a JSON document because one of its values contains a special ETX character. The SplitJson processor returns an error saying that it is not valid JSON.

The JSON looks like this:
[ {"key_data":"val_data"},
 {"key_data":"val_ETXdata"},
 {"key_data":"val_data"},
 {"key_data":"val_data"},... ]
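A likely explanation for the error: the JSON grammar (RFC 8259) does not allow unescaped control characters (U+0000 to U+001F) inside strings, and ETX is U+0003. The sketch below reproduces this with Python's standard `json` module rather than with SplitJson itself. The payload is a hypothetical stand-in for the sample above, and `try_parse` is a helper made up for the illustration.

```python
import json

ETX = "\x03"  # U+0003, END OF TEXT

# Hypothetical stand-in for the real payload: the second value holds a raw ETX.
raw = '[{"key_data":"val_data"},{"key_data":"val_' + ETX + 'data"}]'


def try_parse(text):
    """Return the parsed document, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


broken = try_parse(raw)                  # raw control character inside a string
fixed = try_parse(raw.replace(ETX, ""))  # same text with the ETX removed
```

If the real files behave the same way, removing (or escaping) the ETX before SplitJson should be enough to make them parse.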

 

[Screenshot: Ghilani_1-1668002267091.png]

I used the following regular expressions to replace it, but neither works:

\x03(?=[^"]*")
and also
[\x03](?=[^"]*")

Can someone show me a trick to remove it, please? I would appreciate it.

10 REPLIES

Super Collaborator

Hi,

I think you are having this issue because there are carriage returns or line feeds (\r\n) in the JSON. Try a regex replace for the following as well: [\r\n]

Hope that helps. If it does, please accept the solution.

Thanks

Explorer

No, that is not the problem. The problem is caused by the ETX symbol. I am already filtering \n, \r, \t and \" in other processors. I am processing more than 10000 JSON files, and the error happens only with the 4 files that contain ETX.

Super Collaborator

Can you send me sample JSON data that produces the error? The one you posted seems to be valid, and I am able to split it.

Explorer

Please share your email.

Super Collaborator

Hi,

I tried the following pattern on the file you sent, and it worked: [\x03]+

 

 

[Screenshot: SAMSAL_1-1668097298495.png]
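To compare the patterns mentioned in this thread, here is a small sketch using Python's `re` module. NiFi's ReplaceText uses Java regular expressions, not Python's, but the constructs involved (`\x03`, character classes, lookahead) behave the same way in both. The sample string is hypothetical.

```python
import re

ETX = "\x03"

# Hypothetical sample with two consecutive ETX characters inside a value.
sample = '[{"key_data":"val_data"},{"key_data":"val_' + ETX + ETX + 'data"}]'
expected = '[{"key_data":"val_data"},{"key_data":"val_data"}]'

patterns = [
    r'\x03(?=[^"]*")',    # from the question: ETX followed later by a quote
    r'[\x03](?=[^"]*")',  # same idea, written as a character class
    r'[\x03]+',           # from the reply: one or more ETX characters
]

# Replace every match with the empty string, once per pattern.
cleaned = [re.sub(p, "", sample) for p in patterns]
all_match = all(c == expected for c in cleaned)
```

On this sample all three patterns remove the ETX characters, which suggests the original failure may come from how the processor was configured or from the size of the content rather than from the pattern. The simpler `[\x03]+` has no lookahead and does not depend on where the quotes are.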

 

 

 

Explorer

I tried both solutions and got this error:

[Screenshot: Ghilani_0-1668098136435.png]

Failed to process session due to null; Processor Administratively Yielded for 1 sec: java.nio.BufferOverflowException

 

Super Collaborator

Can you share the configuration of the ReplaceText processor? Also, how big is the JSON file?

Explorer

[Screenshot: Ghilani_0-1668108638985.png]

The properties are the same as in the picture you shared.

Super Collaborator

I am not sure how big your JSON is or whether it is formatted across multiple lines. Make sure Evaluation Mode is set to Line-by-Line. You can also increase the Maximum Buffer Size in case the text being processed is larger than 1 MB. Also, what version of NiFi are you using? There seems to be a bug around this as well, where the flowfile remains in the upstream queue and the overflow error is thrown: https://issues.apache.org/jira/browse/NIFI-10154
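The idea behind line-by-line evaluation can be sketched in Python. This is an analogy, not NiFi's implementation: the input is read one line at a time, so memory use depends on the longest line rather than on the whole content. The function name `strip_etx_by_line` and the sample payload are hypothetical.

```python
import io

ETX = "\x03"


def strip_etx_by_line(src, dst):
    """Copy src to dst one line at a time, dropping ETX characters.

    Only a single line is held in memory at once. Returns the number
    of ETX characters that were removed.
    """
    removed = 0
    for line in src:
        removed += line.count(ETX)
        dst.write(line.replace(ETX, ""))
    return removed


# Hypothetical multi-line payload standing in for one flowfile.
src = io.StringIO(
    '[ {"key_data":"val_data"},\n {"key_data":"val_' + ETX + 'data"} ]\n'
)
dst = io.StringIO()
count = strip_etx_by_line(src, dst)
```

Note that this only helps when the JSON really spans multiple lines. If the whole document sits on a single line, each "line" is the entire content, and a larger buffer is the remaining option.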

Explorer

I think everything else works fine. The JSON flowfiles are responses to an InvokeHTTP request to an API. I receive more than 5000 flowfiles, each containing 200 records.
Problems happen only with the 4 files that contain the ETX symbol.
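Since only a handful of files are affected, it may help to locate the offending characters before choosing a fix. The sketch below scans text for control characters other than tab, line feed and carriage return, and reports their offsets. The file names and payloads are hypothetical, and `find_control_chars` is a helper made up for the illustration.

```python
import re

# Control characters U+0000 to U+001F, excluding tab (\x09), line feed (\x0a)
# and carriage return (\x0d), which are legal JSON whitespace between tokens.
CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_control_chars(text):
    """Return (offset, code point) for every suspicious control character."""
    return [(m.start(), ord(m.group())) for m in CONTROL.finditer(text)]


# Hypothetical payloads keyed by a made-up file name.
payloads = {
    "clean.json": '[{"key_data":"val_data"}]',
    "dirty.json": '[{"key_data":"val_\x03data"}]',
}

# Keep only the payloads that contain at least one control character.
suspects = {}
for name, text in payloads.items():
    hits = find_control_chars(text)
    if hits:
        suspects[name] = hits
```

A scan like this would also reveal whether the problem files contain other control characters besides ETX, in which case a broader character class than `[\x03]+` might be needed.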