NiFi regex replace special characters
Labels: Apache NiFi
Created ‎08-15-2016 11:40 PM
Hi
I have a JSON file that contains some special characters, such as "$" and "@". I would like to remove these characters while leaving everything else unchanged. For example, "$type" should become "type" and "@version" should become "version".
Currently I use the ReplaceText processor twice with a literal replace. This works and solves my problem, but I would prefer to use a single regex. I have tried \$, but that doesn't work because the string contains much more than just the $ symbol. I am not good with regular expressions, so I need help figuring out the regex for this.
Created ‎08-16-2016 02:29 AM
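You have plenty of options. For example:
[^0-9a-zA-Z]+ which whitelists letters and digits and matches everything else
or
[^\w\d] which matches any character that is not a letter, digit, or underscore (\w already includes digits, so this is the same as \W)
There are many other ways. The first one works for me to remove special characters.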
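To see how the two patterns differ, here is a minimal sketch in Python. Python is an assumption made for illustration only: NiFi evaluates Java regex, but these character classes behave the same way in Python's `re` module for ASCII input. The sample string is made up.

```python
import re

# Made-up sample containing the characters from the question plus an underscore.
sample = '"$type": "user_1", "@version": 2'

# Whitelist: drop every run of characters that is not an ASCII letter or digit.
whitelist = re.sub(r'[^0-9a-zA-Z]+', '', sample)

# [^\w\d] is the same as \W: \w already covers digits and the underscore,
# so the underscore survives here but not with the whitelist above.
non_word = re.sub(r'[^\w\d]', '', sample)

print(whitelist)  # typeuser1version2
print(non_word)   # typeuser_1version2
```

Both patterns also strip quotes, colons, commas, and spaces, so neither keeps the JSON structure intact on its own.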
Created ‎08-16-2016 04:05 AM
Thanks @Sunile Manjee, this helps. I was wondering how to make sure my ReplaceText processor does not replace everything with an empty string, but apparently it takes care of this. However, it also removes the curly braces and the double quotes that are part of the JSON, and I would like to keep those. I will update here once I figure it out.
Created ‎08-16-2016 04:42 AM
@mqureshi, to keep the curly braces, use this:
[^0-9a-zA-Z{}]+
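As a rough illustration of what this pattern keeps and drops, here is a short Python sketch. Python is used only for demonstration; NiFi uses Java regex, where these character classes behave the same. The second pattern is a hypothetical extension that is not from the thread: it also whitelists double quotes, colons, commas, square brackets, and whitespace.

```python
import json
import re

# Made-up sample document.
doc = '{"$type": "event", "@version": "1"}'

# Pattern from the reply above: keeps letters, digits, and curly braces only.
braces_kept = re.sub(r'[^0-9a-zA-Z{}]+', '', doc)

# Hypothetical extension (not from the thread): also keep the characters JSON
# needs for its structure, plus whitespace.
json_safe = re.sub(r'[^0-9a-zA-Z{}\[\]":,\s]+', '', doc)

print(braces_kept)  # {typeeventversion1}
print(json_safe)    # {"type": "event", "version": "1"}

# For this sample the extended result still parses as JSON.
parsed = json.loads(json_safe)
```

A whitelist like this still removes characters such as "." or "-" from values, so matching only the unwanted characters may be the safer choice when values must stay untouched.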
Created ‎08-17-2016 03:27 AM
@mqureshi did this answer your question?
Created ‎08-16-2016 04:00 AM
I assume you are asking about Java regex, which is what NiFi uses. Regex flavors vary by language, e.g. Java, C#, VB, etc.
In JavaScript syntax:
str.replace(/[$]/g,"")
or
str.replace(/[$@]/g,"") if you want to handle both in one pass. In Java, the equivalent is str.replaceAll("[$@]", ""), which replaces every occurrence at once.
Keep in mind that $ is also a special character in regex: it matches the end of a line. Inside a character class such as [$@] it is treated literally, but anywhere else there is ambiguity between $ as a character and the end of the line, so escape it as \$ to indicate that you really mean the dollar sign.
A good testing tool for your patterns: http://regexr.com/
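Here is a minimal sketch of the targeted approach, written in Python for illustration only. The patterns themselves are plain regex and should carry over to the Java flavor that NiFi uses. The sample document and the `keys_only` variant are assumptions added for this example, not something from the thread.

```python
import re

# Made-up sample: "$" and "@" appear in keys, and "$" also appears in a value.
doc = '{"$type": "event", "@version": "1", "price": "$5"}'

# Inside a character class, $ is a literal dollar sign, so no escape is needed.
all_removed = re.sub(r'[$@]', '', doc)

# Outside a character class, $ must be escaped to mean a literal dollar sign.
escaped = re.sub(r'\$|@', '', doc)

# Hypothetical narrower variant: strip a leading $ or @ only from keys, i.e.
# from a quoted string that is followed by a colon. The rest of the key is
# captured and put back.
keys_only = re.sub(r'"[$@]([^"]*"\s*:)', r'"\1', doc)

print(all_removed)  # {"type": "event", "version": "1", "price": "5"}
print(escaped)      # {"type": "event", "version": "1", "price": "5"}
print(keys_only)    # {"type": "event", "version": "1", "price": "$5"}
```

The first two forms remove every "$" and "@" in the content, including those inside values. The key-only variant leaves values such as "$5" alone.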
Created ‎08-17-2016 06:26 PM
Created ‎11-11-2020 01:20 AM
You can try this: ${message:unescapeXml()} (assuming message is the attribute that holds the text).
This function unescapes a string containing XML entity escapes into a string containing the actual Unicode characters corresponding to the escapes.
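To illustrate what XML unescaping does, here is a small Python sketch that uses the standard library's `xml.sax.saxutils.unescape` as a stand-in. It is an analogy only, not NiFi's implementation, and the sample string is made up. Unescaping converts entity escapes back to characters; it does not remove literal "$" or "@" characters.

```python
from xml.sax.saxutils import unescape

# Made-up sample containing XML entity escapes.
escaped_text = '&lt;note id=&quot;1&quot;&gt;Tom &amp; Jerry&lt;/note&gt;'

# unescape() handles &amp;, &lt; and &gt; by default; the quote entities are
# supplied explicitly through the second argument.
plain = unescape(escaped_text, {'&quot;': '"', '&apos;': "'"})

print(plain)  # <note id="1">Tom & Jerry</note>
```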
