Hi, I have a large number of XML files stored in HBase. The files contain embedded binary data such as PDF and Word documents.
The column contents holds the content of the XML file.
I want to replace the binary value in the XML tag DokumentFilIndhold with the value "Content Removed":
REGEXP_REPLACE(contents,"(?s)<ns0:DokumentFilIndhold[^>]*>.*?</ns0:DokumentFilIndhold>", "Content Removed")
The regular expression seems to work exactly as expected when I test it with https://regexr.com/
But when I run the query on my data, it cuts off the contents, so the result is no longer a valid XML file.
Does the REGEXP_REPLACE function have some limitation, or is my expression wrong? The value is up to 65,000 characters long.
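To make the expected behaviour concrete, here is a minimal sketch in Python that applies the same pattern to a small sample. The sample document, the urn:example namespace and the function names are made up for illustration, and Python's re module is not the engine behind REGEXP_REPLACE, so this only shows what I expect the expression to do, not what the query actually does. Note that the expression as written replaces the whole element, tags included; the second variant is a possible alternative that keeps the tags and replaces only the payload.

```python
import re

# Same pattern as in the query. (?s) lets "." match newlines,
# which matters when the encoded payload spans several lines.
PATTERN = re.compile(
    r"(?s)<ns0:DokumentFilIndhold[^>]*>.*?</ns0:DokumentFilIndhold>"
)

# Variant with capture groups around the opening and closing tags,
# so only the text between them is replaced.
PATTERN_KEEP_TAGS = re.compile(
    r"(?s)(<ns0:DokumentFilIndhold[^>]*>).*?(</ns0:DokumentFilIndhold>)"
)


def strip_element(xml: str) -> str:
    """Replace the whole element (tags included) with the marker text."""
    return PATTERN.sub("Content Removed", xml)


def strip_payload(xml: str) -> str:
    """Keep the element's tags and replace only its content."""
    return PATTERN_KEEP_TAGS.sub(r"\g<1>Content Removed\g<2>", xml)


# Hypothetical sample document; the real files are far larger.
SAMPLE = (
    '<ns0:Dokument xmlns:ns0="urn:example">'
    "<ns0:Titel>a.pdf</ns0:Titel>"
    "<ns0:DokumentFilIndhold>JVBERi0x\nLjQK</ns0:DokumentFilIndhold>"
    "</ns0:Dokument>"
)

if __name__ == "__main__":
    print(strip_element(SAMPLE))
    print(strip_payload(SAMPLE))
```

In both variants everything after the closing DokumentFilIndhold tag should survive, which is what I do not see when I run the query on the real data.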
It's urgent for me to find a solution, so any ideas will be very well received.