nifi regex replace special characters

Super Guru

Hi

I have a JSON file that contains special characters such as "$" and "@". I would like to remove these characters while leaving everything else unchanged. For example, "$type" should become "type" and "@version" should become "version".

I currently do this by running the ReplaceText processor twice with a literal replace. That works and solves my problem, but I would prefer to use a single regex. I tried \$, but that doesn't work because the content contains a lot more than just the $ symbol. I am not good with regular expressions, so I need help working out a regex that solves this.

7 REPLIES

Re: nifi regex replace special characters

Super Guru

@mqureshi

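You have plenty of options. For example:

[^0-9a-zA-Z]+ matches any run of characters outside the whitelist of letters and digits

or

[^\w\d] which matches any non-word character. It is equivalent to \W, because \w already covers digits; note that \w also covers the underscore, so underscores are kept.

There are many other ways. The first one works for me to remove special characters.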
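To see what the two patterns above actually remove, here is a minimal Java sketch. The class name, method names, and sample JSON are made up for illustration; only the two patterns come from this reply. It also shows why a bare whitelist is too aggressive for JSON: quotes, braces, and colons are stripped along with "$" and "@".

```java
class WhitelistDemo {
    // Removes every run of characters that is not an ASCII letter or digit.
    static String stripNonAlphanumeric(String input) {
        return input.replaceAll("[^0-9a-zA-Z]+", "");
    }

    // \W is the negation of \w, and \w covers [a-zA-Z_0-9],
    // so underscores survive this variant.
    static String stripNonWord(String input) {
        return input.replaceAll("\\W+", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample content, not taken from the original question.
        String json = "{\"$type\":\"user_event\",\"@version\":\"1\"}";
        System.out.println(stripNonAlphanumeric(json)); // typeusereventversion1
        System.out.println(stripNonWord(json));         // typeuser_eventversion1
    }
}
```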

Re: nifi regex replace special characters

Super Guru

Thanks @Sunile Manjee. This helps. I was wondering how to make sure my ReplaceText processor does not replace the entire content with an empty string, but apparently it takes care of that. However, it also removes the curly braces and double quotes that are part of the JSON, and I would like to keep those. I will update here once I figure it out.

Re: nifi regex replace special characters

Super Guru

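@mqureshi to keep the curly braces, add them to the character class:

[^0-9a-zA-Z{}]+

Any other character you want to keep, such as the double quote, can be added inside the brackets in the same way.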
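A rough Java sketch of how the whitelist grows as you add characters to keep. The first method uses the pattern from this reply; the second, wider class (quotes, colons, commas, square brackets) is my own assumption about what a JSON document would need, and the class and method names are hypothetical. Note that even the wider class still strips spaces, dots, hyphens, and underscores from values, so this approach only suits simple content.

```java
class KeepJsonPunctuation {
    // Pattern from the reply: keeps letters, digits, and curly braces only.
    static String keepBraces(String input) {
        return input.replaceAll("[^0-9a-zA-Z{}]+", "");
    }

    // Assumed extension: also keeps double quotes, colons, commas,
    // and square brackets so the JSON structure stays intact.
    static String keepJsonSyntax(String input) {
        return input.replaceAll("[^0-9a-zA-Z{}\\[\\]\":,]+", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample content.
        String json = "{\"$type\":\"event\",\"@version\":\"1\"}";
        System.out.println(keepBraces(json));     // {typeeventversion1}
        System.out.println(keepJsonSyntax(json)); // {"type":"event","version":"1"}
    }
}
```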

Re: nifi regex replace special characters

Super Guru

@mqureshi did this answer your question?

Re: nifi regex replace special characters

@mqureshi

I assume you are asking about Java regex. Regex flavors differ between languages, e.g. Java, C#, VB etc. The snippets below are written in JavaScript-style /pattern/g notation; the patterns themselves work the same way in Java.

str.replace(/[@]/g,"")

str.replace(/[$]/g,"")

or

str.replace(/[$@]/g,"") if you want to handle both in one pass. I assume that you want every occurrence replaced at once; in Java that is what str.replaceAll does.

Keep in mind that $ is also a special character in regex: outside a character class it matches the end of the line. Inside square brackets, as in [$@], it is treated as a literal character. Anywhere else, where $ could be ambiguous between the character and the end of the line, escape it as \$ to indicate that you really mean the $ character.
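For reference, a small Java sketch of the same idea using String.replaceAll. The class name, method names, and sample string are hypothetical. It compares the character-class form, the escaped form, and an unescaped $, which only matches the empty position at the end of the input and therefore removes nothing.

```java
class DollarAtRemoval {
    // Inside a character class, $ is a literal, so no escaping is needed.
    static String viaCharacterClass(String input) {
        return input.replaceAll("[$@]", "");
    }

    // Outside a character class, $ must be escaped (\$ in the regex,
    // written as "\\$" in a Java string literal).
    static String viaEscapes(String input) {
        return input.replaceAll("\\$|@", "");
    }

    // An unescaped $ is an end-of-input anchor: it matches an empty
    // position, so the text comes back unchanged.
    static String unescapedDollar(String input) {
        return input.replaceAll("$", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample content.
        String json = "{\"$type\":\"a\",\"@version\":\"1\"}";
        System.out.println(viaCharacterClass(json)); // {"type":"a","version":"1"}
        System.out.println(viaEscapes(json));        // {"type":"a","version":"1"}
        System.out.println(unescapedDollar(json));   // unchanged input
    }
}
```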

A good testing tool for your patterns: http://regexr.com/

Re: nifi regex replace special characters

@mqureshi

You can either replace the unwanted characters, as I described above, or keep only the wanted characters, as Sunile suggested in his answer. The first approach lets you substitute unwanted characters with wanted ones; the latter keeps only the wanted characters from the existing content.
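If the goal is only to clean up the key names, the "replace unwanted" approach can be narrowed with a capture group so that "$" and "@" inside values are left alone. The sketch below is an assumption on my part rather than something tested in NiFi: the class name, method name, and sample data are hypothetical, and the pattern is plain text matching, not a JSON parser, so keys or values containing escaped quotes may not be handled correctly. In ReplaceText with a regex replacement strategy, the equivalent would presumably be the same search pattern with "$1"$2 as the replacement value.

```java
class KeyPrefixCleaner {
    // Matches a quoted key that starts with one or more $ or @ characters
    // and is followed by a colon. Group 1 is the rest of the key name,
    // group 2 is the optional whitespace plus the colon.
    static String cleanKeys(String input) {
        return input.replaceAll("\"[$@]+([^\"]*)\"(\\s*:)", "\"$1\"$2");
    }

    public static void main(String[] args) {
        // Hypothetical sample content with $ and @ in both keys and values.
        String json = "{\"$type\":\"a\",\"@version\":\"1\","
                + "\"email\":\"bob@example.com\",\"price\":\"$5\"}";
        // Keys lose their prefix; the email address and price are untouched.
        System.out.println(cleanKeys(json));
    }
}
```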

Re: nifi regex replace special characters

Explorer

You can try this, where message is the name of the attribute to process: ${message:unescapeXml()}

This function unescapes a string containing XML entity escapes to a string containing the actual Unicode characters corresponding to the escapes.