Support Questions

d_muldoon · ‎11-24-2016

I am working with the ReplaceText processor to replace only some instances of a double quote character (") in a FlowFile and I am having difficulty with my Regex syntax.

Background:

I am pulling an XML column from our database using ExecuteSQL which converts the results to Avro format. I run this through an AvroToJson processor but the JSON produced does not correctly escape double quotes found in my DB columns. I am converting to JSON because my end goal is to have the XML values in a FlowFile, line by line.

Example:

[ { "XML": "<MyXML> This is a "test" XML </MyXML>" } ]

As you can see the quotes surrounding "test" are invalid and need to be escaped to be:

[ { "XML": "<MyXML> This is a \"test\" XML </MyXML>" } ]

I am trying to achieve this with the ReplaceText Processor. Using Regex I can correctly retrieve all the text between the <MyXML> tags but I am unable to single out the double quotes for replacement.

I have attempted to use back-references to replace the value in the middle capturing group, but that does not appear to work. Am I able to achieve this or do I need to be looking at an ExecuteScript processor and attempting it with Python/Groovy?

Sample processor config:

bwalter1 · ‎11-24-2016

You could split it into ExtractText with a dynamic property "json" as "(.*?<MyXML>)(.*?)(<\/MyXML>.*)"

and ReplaceText as follows: "${json.1}${json.2:replace('"', '\\"')}${json.3}" (i.e. with 4 backslashes)

This will create

[ { "XML": "<MyXML> This is a \"test\" XML </MyXML>" } ]

View solution in original post

bwalter1 · ‎11-24-2016

You could split it into ExtractText with a dynamic property "json" as "(.*?<MyXML>)(.*?)(<\/MyXML>.*)"

and ReplaceText as follows: "${json.1}${json.2:replace('"', '\\"')}${json.3}" (i.e. with 4 backslashes)

This will create

[ { "XML": "<MyXML> This is a \"test\" XML </MyXML>" } ]

d_muldoon · ‎11-28-2016

Thank you! I was having difficulty with the replace function. I had not thought to first use the ExtractText processor.

Cloudera Community

Support Questions

ReplaceText Regex to replace double quotes in a string