Replace Text using Regex with large file in NIFI
- Labels: Apache NiFi
Created ‎02-15-2017 08:28 AM
Hi,
I'm using HDP 2.5 with NiFi 0.6. I have a 10GB CSV file, and I want to replace the double quotes with a pipe delimiter. Is this possible with the NiFi ReplaceText processor, or do I need to write an external script in something like Groovy or Lua? Please point me to any reference links on how to implement either the scripting approach or the ReplaceText processor for a large file.
Created on ‎02-15-2017 01:14 PM - edited ‎08-19-2019 04:54 AM
The ReplaceText processor has two Evaluation Modes: Line-by-Line and Entire text. Entire text is the default, and it reads the entire content of your FlowFile into NiFi's JVM heap memory for evaluation. With such a large file, this strategy is not ideal and could lead to out-of-memory conditions in your NiFi. If the content of your FlowFile spans multiple lines, you could switch to the Line-by-Line evaluation mode, which uses less heap memory but ultimately produces the same modified content in the outgoing FlowFile.
So you might want to try a ReplaceText processor configuration like the following:
Thanks,
Matt
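To illustrate the difference between the two evaluation modes, here is a minimal Python sketch of the line-by-line idea. It is not NiFi code and does not show how ReplaceText is implemented; the function name `replace_line_by_line` and the sample data are made up for illustration. The point is only that each line is read, rewritten, and written out before the next one is loaded, so memory use depends on the longest line rather than on the total file size.

```python
import io
import re
from typing import TextIO


def replace_line_by_line(src: TextIO, dst: TextIO, pattern: str, replacement: str) -> int:
    """Apply a regex replacement one line at a time and return the number of lines seen.

    Hypothetical helper for illustration only; it is not part of NiFi.
    """
    regex = re.compile(pattern)
    lines_seen = 0
    # Iterating over a file object yields one line at a time,
    # so only the current line is held in memory.
    for line in src:
        dst.write(regex.sub(replacement, line))
        lines_seen += 1
    return lines_seen


if __name__ == "__main__":
    # In-memory streams stand in for the large CSV file and its rewritten copy.
    source = io.StringIO('"a","b"\n"c","d"\n')
    target = io.StringIO()
    replace_line_by_line(source, target, r'"', "|")
    print(target.getvalue(), end="")
```

Calling `src.read()` first and running the substitution on the whole string would be the rough analogue of the Entire text mode, which is where a 10GB input becomes a problem.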
Created ‎02-15-2017 04:35 PM
Thanks for the response. I will try this. I have a regex like 's/","/\|/g; s/"//g'. How do I use it in the ReplaceText processor?
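For reference, the sed expression above performs two substitutions in order: it first turns every `","` into `|`, then deletes any remaining double quotes. The Python sketch below reproduces that logic on a single line so the intended result is easy to check. The function name `quotes_to_pipes` is hypothetical. One way this might map onto NiFi is two ReplaceText processors in sequence, one per substitution, each with its own Search Value and Replacement Value; that mapping is an assumption and is not confirmed in this thread.

```python
import re


def quotes_to_pipes(line: str) -> str:
    """Mimic sed 's/","/\\|/g; s/"//g' on one line (illustrative helper)."""
    # Step 1: the quote-comma-quote separator between fields becomes a pipe.
    line = re.sub(r'","', "|", line)
    # Step 2: any double quotes that are left (line start, line end) are removed.
    return re.sub(r'"', "", line)


if __name__ == "__main__":
    print(quotes_to_pipes('"a","b","c"'))  # prints a|b|c
```

This is a naive text substitution, not a CSV parser, so a field that itself contains `","` or an escaped quote would be altered as well.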
Created ‎02-16-2017 04:28 AM
It works!!!!
Created ‎02-16-2017 11:53 AM
Hello,
Anshu here.
We have a requirement to anonymize IP addresses. We want to identify each IP address and replace its last part with some arbitrary value.
We have tried the following regexes for that (with "Regex Replace" as the replacement strategy and "Line-by-Line" as the evaluation mode):
(1) \\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}
(2) ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$
(3) ((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
None of these work.
Can you help me with the right regex to use, and the replacement value as well?
Thank you in advance!
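As a starting point for this kind of replacement, here is a minimal Python sketch that keeps the first three octets in a capture group and swaps the last one. The names `IPV4`, `DOUBLE_ESCAPED`, and `anonymize_ips` are hypothetical, and the pattern does not check that each octet is in the 0-255 range. It also hints at two possible reasons the attempts above do not match, assuming they were entered exactly as shown: a doubled backslash such as `\\d` asks the regex engine for a literal backslash followed by `d`, and a pattern wrapped in `^...$` only matches when the IP address is the entire line.

```python
import re

# First three octets (with their dots) are captured; the last octet is not.
IPV4 = re.compile(r"\b((?:\d{1,3}\.){3})\d{1,3}\b")

# Same idea written with doubled backslashes, as in attempt (1) above.
DOUBLE_ESCAPED = re.compile(r"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}")


def anonymize_ips(line: str, last_octet: str = "0") -> str:
    """Replace the final octet of every IPv4-looking token in the line."""
    # A function is used as the replacement so that a numeric last_octet
    # cannot be misread as part of a group reference.
    return IPV4.sub(lambda m: m.group(1) + last_octet, line)


if __name__ == "__main__":
    sample = "client 192.168.10.57 connected"
    print(anonymize_ips(sample))
    # The doubled-backslash pattern finds nothing in ordinary text.
    print(DOUBLE_ESCAPED.search(sample))
```

In a regex-replace setting the same approach would rely on a back-reference to the captured group in the replacement value; the exact ReplaceText property values are not given in this thread and would need to be verified against the processor documentation.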
Created ‎02-16-2017 02:40 PM
Please open a new question for this issue rather than attaching it to a question with an existing answer. Although it deals with the same processor, the problem is different, and it benefits the community to have it addressed under its own question.
Thanks,
Matt
Created ‎02-16-2017 02:48 PM
Tip: Make sure you tag your question with NiFi, so that the community of NiFi followers is notified about your new question.
Created ‎02-16-2017 03:02 PM
Thanks! I will do that. Sorry for any inconvenience.
Created ‎02-16-2017 03:04 PM
No inconvenience at all. We just want to keep it as easy as possible for community members to find similar issues and solutions. I would have moved it myself if I could have. 🙂
