Created on 11-24-2016 04:46 PM - edited 08-19-2019 03:30 AM
I am working with the ReplaceText processor to replace only some instances of a double quote character (") in a FlowFile and I am having difficulty with my Regex syntax.
Background:
I am pulling an XML column from our database using ExecuteSQL, which converts the results to Avro format. I run this through a ConvertAvroToJSON processor, but the JSON it produces does not escape the double quotes found in my DB columns. I am converting to JSON because my end goal is to have the XML values in a FlowFile, one per line.
Example:
[ { "XML": "<MyXML> This is a "test" XML </MyXML>" } ]
As you can see, the quotes surrounding "test" are invalid and need to be escaped like this:
[ { "XML": "<MyXML> This is a \"test\" XML </MyXML>" } ]
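A quick way to confirm that the first example is invalid JSON and the escaped version is valid is to run both through a JSON parser. A minimal check using only Python's standard json module:

```python
import json

# The two strings from the example above. Raw strings keep the backslashes
# in the escaped version so the JSON parser sees \" rather than ".
broken = r'[ { "XML": "<MyXML> This is a "test" XML </MyXML>" } ]'
fixed = r'[ { "XML": "<MyXML> This is a \"test\" XML </MyXML>" } ]'

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(broken))  # False: the quote before "test" ends the string early
print(is_valid_json(fixed))   # True

# Once parsed, the escaped quotes come back as plain quotes in the value.
print(json.loads(fixed)[0]["XML"])  # <MyXML> This is a "test" XML </MyXML>
```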
I am trying to do this with the ReplaceText processor. Using a regex, I can correctly match all the text between the <MyXML> tags, but I am unable to single out the double quotes for replacement.
I have attempted to use back-references to replace the value in the middle capturing group, but that does not appear to work. Can this be done with ReplaceText, or do I need to look at an ExecuteScript processor and do it in Python/Groovy?
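For comparison, the escaping logic itself is short in a scripting language. The sketch below is plain Python, not the ExecuteScript API (a real script would also have to read and write the FlowFile content through the session). It assumes each element is delimited by literal <MyXML> and </MyXML> tags, and uses re.sub with a callback so that only the double quotes between those tags are escaped; escape_quotes_in_xml is a hypothetical helper name:

```python
import re

# Opening tag, element body (non-greedy, allowed to span newlines), closing tag.
ELEMENT = re.compile(r"(<MyXML>)(.*?)(</MyXML>)", re.DOTALL)

def escape_quotes_in_xml(text):
    """Escape every double quote that appears between <MyXML> and </MyXML>."""
    def fix(match):
        opening, body, closing = match.groups()
        # '\\"' is a backslash followed by a double quote.
        return opening + body.replace('"', '\\"') + closing
    # The callback runs once per element, so several elements per line work too.
    return ELEMENT.sub(fix, text)

line = '[ { "XML": "<MyXML> This is a "test" XML </MyXML>" } ]'
print(escape_quotes_in_xml(line))
# [ { "XML": "<MyXML> This is a \"test\" XML </MyXML>" } ]
```

Note that this only handles quotes; backslashes already present in the XML would also need escaping for the result to be fully valid JSON.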
Created on 11-24-2016 07:24 PM - edited 08-19-2019 03:30 AM
You could split it into two steps: an ExtractText processor with a dynamic property "json" set to "(.*?<MyXML>)(.*?)(<\/MyXML>.*)", followed by a ReplaceText processor with the replacement value "${json.1}${json.2:replace('"', '\\\\"')}${json.3}" (i.e. with 4 backslashes).
This will produce:
[ { "XML": "<MyXML> This is a \"test\" XML </MyXML>" } ]
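To see what the two processors are doing, here is a rough Python equivalent. It uses the same regex as the ExtractText property, treats groups 1 to 3 as the json.1, json.2 and json.3 attributes, and reassembles them the way the ReplaceText expression does. re.DOTALL is an assumption about multi-line content (ExtractText has its own "Enable DOTALL Mode" setting), and rebuild is a hypothetical name. The Python literal needs only 2 backslashes because it goes through one layer of escaping instead of two:

```python
import re

# Same pattern as the ExtractText dynamic property "json".
# re.DOTALL lets "." match newlines too (an assumption about the content).
PATTERN = re.compile(r"(.*?<MyXML>)(.*?)(<\/MyXML>.*)", re.DOTALL)

def rebuild(content):
    """Mimic ${json.1}${json.2:replace('"', ...)}${json.3} in plain Python."""
    match = PATTERN.match(content)
    if match is None:
        return content  # no <MyXML> element: leave the content unchanged
    before, body, after = match.group(1), match.group(2), match.group(3)
    # Only the middle group (the XML body) has its quotes escaped.
    return before + body.replace('"', '\\"') + after

content = '[ { "XML": "<MyXML> This is a "test" XML </MyXML>" } ]'
print(rebuild(content))
```

Because group 1 is non-greedy and group 3 swallows everything after the first closing tag, only the first <MyXML> element in the content is escaped, which fits the one-value-per-FlowFile goal from the question.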
Created 11-28-2016 03:54 PM
Thank you! I was having difficulty with the replace function. I had not thought to first use the ExtractText processor.