Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

ReplaceText Regex to replace double quotes in a string

avatar
New Member

I am working with the ReplaceText processor to replace only some instances of a double quote character (") in a FlowFile and I am having difficulty with my Regex syntax.

Background:

I am pulling an XML column from our database using ExecuteSQL which converts the results to Avro format. I run this through an AvroToJson processor but the JSON produced does not correctly escape double quotes found in my DB columns. I am converting to JSON because my end goal is to have the XML values in a FlowFile, line by line.

Example:

[ { "XML": "<MyXML> This is a "test" XML </MyXML>" } ]

As you can see the quotes surrounding "test" are invalid and need to be escaped to be:

[ { "XML": "<MyXML> This is a \"test\" XML </MyXML>" } ]

I am trying to achieve this with the ReplaceText Processor. Using Regex I can correctly retrieve all the text between the <MyXML> tags but I am unable to single out the double quotes for replacement.

I have attempted to use back-references to replace the value in the middle capturing group, but that does not appear to work. Am I able to achieve this or do I need to be looking at an ExecuteScript processor and attempting it with Python/Groovy?

Sample processor config:

9760-screen-shot-2016-11-24-at-44503-pm.png

1 ACCEPTED SOLUTION

avatar

You could split it into ExtractText with a dynamic property "json" as "(.*?<MyXML>)(.*?)(<\/MyXML>.*)"

9771-extracttext.png

and ReplaceText as follows: "${json.1}${json.2:replace('"', '\\"')}${json.3}" (i.e. with 4 backslashes)

9772-replacetext.png

This will create

[ { "XML": "<MyXML> This is a \"test\" XML </MyXML>" } ]

View solution in original post

2 REPLIES 2

avatar

You could split it into ExtractText with a dynamic property "json" as "(.*?<MyXML>)(.*?)(<\/MyXML>.*)"

9771-extracttext.png

and ReplaceText as follows: "${json.1}${json.2:replace('"', '\\"')}${json.3}" (i.e. with 4 backslashes)

9772-replacetext.png

This will create

[ { "XML": "<MyXML> This is a \"test\" XML </MyXML>" } ]

avatar
New Member

Thank you! I was having difficulty with the replace function. I had not thought to first use the ExtractText processor.