Removing Special Characters from JSON in NiFi
Labels: Apache NiFi
Created ‎08-12-2021 07:38 PM
Hi Support Team,
I have JSON input for a NiFi flow that contains some special characters. Could someone help me remove the special characters from the following payload? We need only the value, without the array brackets and escaped double quotes.
Input JSON:
{
"TOT_NET_AMT" : "[\"55.00\"]",
"H_OBJECT" : "File",
"H_GROSS_AMNT" : "[\"55.00\"]",
"TOT_TAX_AMT" : "[9.55]"
}
Expected result:
{
"TOT_NET_AMT" : "55.00",
"H_OBJECT" : "File",
"H_GROSS_AMNT" : "55.00",
"TOT_TAX_AMT" : "9.55"
}
Created ‎08-13-2021 07:33 AM
@smartraman
You can use a ReplaceText processor to remove these special characters from your JSON.
Using your example, I could produce your desired output using the following Java regular expression:
(\\")|[\Q[\E]|[\Q]\E]
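The expression above can also be tried outside NiFi. The following is a minimal sketch in plain Java, assuming the processor replaces every match with an empty string; the class name `StripSpecialChars` and the method `strip` are made up for illustration and are not part of NiFi.

```java
import java.util.regex.Pattern;

class StripSpecialChars {
    // Same pattern as quoted above, written as a Java string literal:
    // it matches an escaped quote (\"), a "[" or a "]".
    static final Pattern SPECIAL =
        Pattern.compile("(\\\\\")|[\\Q[\\E]|[\\Q]\\E]");

    // Hypothetical helper: removes every match, i.e. a regex replace
    // with an empty replacement value.
    static String strip(String content) {
        return SPECIAL.matcher(content).replaceAll("");
    }

    public static void main(String[] args) {
        // One line of the example payload, with literal backslashes.
        String line = "\"TOT_NET_AMT\" : \"[\\\"55.00\\\"]\",";
        System.out.println(strip(line));
    }
}
```

Note that this pattern removes every `[`, `]` and `\"` in the content, so it is only suitable when those characters never appear in values that should be kept.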
Source:
My ReplaceText processor was configured as follows:
result:
If you found this response addressed your query, please take a moment to log in and click on "Accept as Solution".
Thank you,
Matt
Created on ‎08-16-2021 07:44 PM - edited ‎08-16-2021 08:10 PM
Hi Support,
Thanks for the previous help. Sometimes the input payload has multiple entries in the list, and after the regex replacement we get the output below. For H_GROSS_AMNT we need only the last value of the array or string.
Current output:
{
"TOT_NET_AMT" : "55.00",
"H_OBJECT" : "File",
"H_GROSS_AMNT" : "55.00,58.00",
"TOT_TAX_AMT" : "9.55"
}
Expected value:
{
"TOT_NET_AMT" : "55.00",
"H_OBJECT" : "File",
"H_GROSS_AMNT" : "58.00",
"TOT_TAX_AMT" : "9.55"
}
Much appreciated in advance.
Created ‎08-24-2021 10:59 AM
@smartraman
This can also be accomplished through a different and more complex configuration of the ReplaceText processor.
Using the input content example below:
{
"TOT_NET_AMT" : "[\"55.00\"]",
"H_OBJECT" : "File",
"H_GROSS_AMNT" : "[\"55.00,58.00\"]",
"TOT_TAX_AMT" : "[9.55]"
}
I would set up the ReplaceText processor as follows:
Instead of just searching for those character patterns and replacing them with nothing, I break the entire input, line by line, into a series of capture groups. That way I can omit the capture groups matching the patterns you want removed ([ or [\" or \"] or ]) and then manipulate the capture group containing a possible comma-separated list, so that only the last value in that list is returned.
I used the Java regular expression below, which results in 5 capture groups:
(.*?)([\Q[\E]\\\"|[\Q[\E])(.*?)(\\\"[\Q]\E]|[\Q]\E])(.*?)$
I then used the following Replacement Value, in which I applied NiFi Expression Language to the 3rd capture group. If that capture group does not contain any commas, the entire string is returned.
With the example above and this configuration, you end up with the following new content:
{
"TOT_NET_AMT" : "55.00",
"H_OBJECT" : "File",
"H_GROSS_AMNT" : "58.00",
"TOT_TAX_AMT" : "9.55"
}
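The line-by-line approach described above can be sketched in plain Java to check the capture groups outside NiFi. This is only an illustration under assumptions: the Replacement Value from the original post is not reproduced in the text, so the "last value" step is done here with `lastIndexOf`, which returns the whole string when no comma is present. The class name `LastValueExtractor` and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastValueExtractor {
    // The 5-group pattern quoted above, written as a Java string literal.
    static final Pattern LINE = Pattern.compile(
        "(.*?)([\\Q[\\E]\\\\\\\"|[\\Q[\\E])(.*?)(\\\\\\\"[\\Q]\\E]|[\\Q]\\E])(.*?)$");

    // Rewrites one line: keeps groups 1 and 5, drops groups 2 and 4,
    // and keeps only the last comma-separated value of group 3.
    static String rewriteLine(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return line; // lines without brackets are left unchanged
        }
        String list = m.group(3);
        // lastIndexOf returns -1 when there is no comma, so the whole
        // string is kept in that case.
        String last = list.substring(list.lastIndexOf(',') + 1);
        return m.group(1) + last + m.group(5);
    }

    // Applies rewriteLine to each line, mimicking a line-by-line evaluation.
    static String rewrite(String content) {
        String[] lines = content.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            out.append(rewriteLine(lines[i]));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String content = "{\n"
            + "\"H_OBJECT\" : \"File\",\n"
            + "\"H_GROSS_AMNT\" : \"[\\\"55.00,58.00\\\"]\",\n"
            + "\"TOT_TAX_AMT\" : \"[9.55]\"\n"
            + "}";
        System.out.println(rewrite(content));
    }
}
```

Because groups 2 and 4 are simply left out of the result, the brackets and escaped quotes disappear without a separate removal pass.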
If you found this helped with your latest query, please take a moment to log in and click on "Accept as Solution" below this response.
Thank you,
Matt
