nifi regex replace special characters

Super Guru

Hi

I have a JSON file which has some special characters like the "$" and "@" symbols. I would like to get rid of these characters while keeping everything else the way it is. So, for example, "$type" should become "type" and "@version" should become "version".

The way I am currently doing it is using the ReplaceText processor twice with literal replace. It works and solves my problem. However, I would prefer to use a regex. I have tried \$ but that doesn't work because the string has a lot more than just the $ symbol. I am very bad with regular expressions, so I need help figuring out the regex to solve this.

7 REPLIES

Master Guru

@mqureshi

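You have tons of options. For example:

[^0-9a-zA-Z]+ which is a whitelist: it matches any run of characters that are not ASCII letters or digits

or

[^\w\d] which matches any single character that is not a letter, digit, or underscore (\w already includes the digits and the underscore, so the \d is redundant)

There are many other ways. The first one works for me to remove special characters.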

Super Guru

Thanks @Sunile Manjee. This helps. I was wondering how to make sure my ReplaceText processor does not replace everything that matched with an empty string. Apparently it takes care of this, but it also gets rid of the curly braces and the double quotes that are part of the JSON. I would like to keep those. I will update here once I figure it out.

Master Guru

@mqureshi To keep the curly braces, use this:

[^0-9a-zA-Z{}]+
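
To see what this whitelist does to a JSON document, here is a minimal Java sketch (NiFi's regex support is Java-based). The class name, method names, and the sample JSON are made up for illustration. The second method is an assumption about what the asker probably wants: the same whitelist with the JSON punctuation added, so quotes, colons, commas, and square brackets survive.

```java
// Hypothetical demo class; the sample JSON below is made up for illustration.
class WhitelistDemo {
    // Removes every run of characters that is not an ASCII letter, digit, or curly brace.
    static String keepWhitelisted(String input) {
        return input.replaceAll("[^0-9a-zA-Z{}]+", "");
    }

    // Same whitelist, extended with JSON punctuation: [ ] " : ,
    static String keepJsonPunctuation(String input) {
        return input.replaceAll("[^0-9a-zA-Z{}\\[\\]\":,]+", "");
    }

    public static void main(String[] args) {
        String json = "{\"$type\":\"a\",\"@version\":\"1\"}";
        System.out.println(keepWhitelisted(json));     // {typeaversion1}
        System.out.println(keepJsonPunctuation(json)); // {"type":"a","version":"1"}
    }
}
```

Note that any whitelist of this kind also strips spaces, dots, hyphens, and similar characters from the values, so it is only safe when the values contain nothing but letters and digits.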

Master Guru

@mqureshi did this answer your question?

Super Guru

@mqureshi

I assume that you are asking about Java regex. Regex flavors vary by language, e.g. Java, C#, VB etc. The snippets here use JavaScript-style regex literals with the global flag; in Java the equivalent call is str.replaceAll("[$@]", "").

str.replace(/[@]/g,"")

str.replace(/[$]/g,"")

or

str.replace(/[$@]/g,"") if you want to handle both in one pass. Since you want every occurrence of those characters replaced at once, in Java you could use str.replaceAll.

Keep in mind that $ is also a special character in regex: outside a character class it matches the end of the line (or of the input). Inside square brackets, as in [$@], it is treated as a literal character. If you use it outside a character class, where there could be ambiguity between $ as a character and the end of the line, escape it as \$ to indicate that you really mean the $ character.
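
Here is a rough Java sketch of the blacklist approach. The class and method names are hypothetical and the sample input is invented. The first method removes every $ and @ anywhere in the text; the second is a narrower variant, offered as an assumption about the use case, that only strips those characters from the start of a JSON key (a quoted string followed by a colon) and leaves values such as email addresses alone. It does not handle keys that contain escaped quotes.

```java
// Hypothetical helper class; names and sample data are for illustration only.
class MarkerStripper {
    // Removes every '$' and '@' in the input. Inside [...] the '$' is literal.
    static String stripAll(String input) {
        return input.replaceAll("[$@]", "");
    }

    // Removes leading '$' / '@' only from JSON keys, i.e. "…" followed by ':'.
    // Group 1 is the rest of the key, group 2 is the optional whitespace and colon.
    static String stripFromKeys(String input) {
        return input.replaceAll("\"[$@]+([^\"]*)\"(\\s*:)", "\"$1\"$2");
    }

    public static void main(String[] args) {
        String json = "{\"$type\":\"a@b.com\",\"@version\":\"1\"}";
        System.out.println(stripAll(json));      // {"type":"ab.com","version":"1"}
        System.out.println(stripFromKeys(json)); // {"type":"a@b.com","version":"1"}
    }
}
```

In a ReplaceText processor the same patterns would go without the Java string escaping, for example [$@] as the search value with an empty replacement.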

A good testing tool for your patterns: http://regexr.com/

Super Guru

@mqureshi

Your options are either to replace the unwanted characters, as I specified above, or to keep only the wanted characters, as Sunile suggested in his answer. The former lets you replace the unwanted characters with something else if you wish; the latter keeps only the whitelisted characters from the existing text.

Explorer

You can try this ${message:unescapeXml()}

This function unescapes a string containing XML entity escapes to a string containing the actual Unicode characters corresponding to the escapes.