Support Questions
Find answers, ask questions, and share your expertise

ReplaceText Regex to replace double quotes in a string

Solved Go to solution
Highlighted

ReplaceText Regex to replace double quotes in a string

New Contributor

I am working with the ReplaceText processor to replace only some instances of a double quote character (") in a FlowFile and I am having difficulty with my Regex syntax.

Background:

I am pulling an XML column from our database using ExecuteSQL which converts the results to Avro format. I run this through an AvroToJson processor but the JSON produced does not correctly escape double quotes found in my DB columns. I am converting to JSON because my end goal is to have the XML values in a FlowFile, line by line.

Example:

[ { "XML": "<MyXML> This is a "test" XML </MyXML>" } ]

As you can see the quotes surrounding "test" are invalid and need to be escaped to be:

[ { "XML": "<MyXML> This is a \"test\" XML </MyXML>" } ]

I am trying to achieve this with the ReplaceText Processor. Using Regex I can correctly retrieve all the text between the <MyXML> tags but I am unable to single out the double quotes for replacement.

I have attempted to use back-references to replace the value in the middle capturing group, but that does not appear to work. Am I able to achieve this or do I need to be looking at an ExecuteScript processor and attempting it with Python/Groovy?

Sample processor config:

9760-screen-shot-2016-11-24-at-44503-pm.png

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: ReplaceText Regex to replace double quotes in a string

You could split it into ExtractText with a dynamic property "json" as "(.*?<MyXML>)(.*?)(<\/MyXML>.*)"

9771-extracttext.png

and ReplaceText as follows: "${json.1}${json.2:replace('"', '\\"')}${json.3}" (i.e. with 4 backslashes)

9772-replacetext.png

This will create

[ { "XML": "<MyXML> This is a \"test\" XML </MyXML>" } ]

View solution in original post

2 REPLIES 2
Highlighted

Re: ReplaceText Regex to replace double quotes in a string

You could split it into ExtractText with a dynamic property "json" as "(.*?<MyXML>)(.*?)(<\/MyXML>.*)"

9771-extracttext.png

and ReplaceText as follows: "${json.1}${json.2:replace('"', '\\"')}${json.3}" (i.e. with 4 backslashes)

9772-replacetext.png

This will create

[ { "XML": "<MyXML> This is a \"test\" XML </MyXML>" } ]

View solution in original post

Highlighted

Re: ReplaceText Regex to replace double quotes in a string

New Contributor

Thank you! I was having difficulty with the replace function. I had not thought to first use the ExtractText processor.