Created 02-15-2017 08:28 AM
Hi,
I'm using HDP 2.5 with NiFi 0.6. I have a 10 GB CSV file and I want to replace double quotes with a pipe delimiter. Is this possible with the NiFi ReplaceText processor, or do I need to write an external script in something like Groovy or Lua? Please share any reference links on how to implement either the scripting approach or the ReplaceText processor for a large file.
Created on 02-15-2017 01:14 PM - edited 08-19-2019 04:54 AM
The ReplaceText processor has two Evaluation Modes: Line-by-Line and Entire text. Entire text is the default; it reads the entire content of your FlowFile into NiFi's JVM heap memory for evaluation. With such a large file this strategy is not ideal and could lead to out-of-memory conditions for your NiFi. If the content of your FlowFile spans multiple lines, you could switch to the Line-by-Line evaluation mode, which uses less heap memory but ultimately produces the same modified content in the outgoing FlowFile.
So you might want to try a ReplaceText processor configuration like the following:
Thanks,
Matt
Created 02-15-2017 04:35 PM
Thanks for the response. I will try this. I have a regex like 's/","/\|/g; s/"//g'. How do I use it in the ReplaceText processor?
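For reference, here is a minimal Python sketch of what those two sed substitutions do when applied one line at a time, which is the same idea as the Line-by-Line evaluation mode: only the current line is held in memory. The function name `replace_quotes` and the sample rows are made up for illustration; this is not NiFi code. In NiFi itself, one possible approach (an assumption, not confirmed anywhere in this thread) would be two ReplaceText processors in sequence, one per substitution.

```python
import io


def replace_quotes(src, dst):
    """Copy src to dst one line at a time, applying both substitutions."""
    for line in src:
        line = line.replace('","', '|')  # equivalent of s/","/\|/g
        line = line.replace('"', '')     # equivalent of s/"//g
        dst.write(line)


# Hypothetical sample data standing in for the large CSV file.
sample = io.StringIO('"a","b","c"\n"1","2","3"\n')
out = io.StringIO()
replace_quotes(sample, out)
print(out.getvalue(), end="")
```

The same function works unchanged on real file objects opened with `open(...)`, since iterating a file yields one line at a time.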
Created 02-16-2017 04:28 AM
It works!!!!
Created 02-16-2017 11:53 AM
Hello,
Anshu here.
We have a requirement to anonymize IP addresses, so we would identify each IP address and replace the last part with some arbitrary value.
We have tried the following regexes for that (with "Regex Replace" as the replacement strategy and "Line-by-Line" as the evaluation mode):
(1) \\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}
(2) ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$
(3) ((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
None of these work.
Can you help me with the right regex to use, and the replacement value as well?
Thank you in advance!
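As a rough illustration of the goal described above, here is a Python sketch that keeps the first three octets of each IPv4-looking token and replaces the last one. The names `IP_PATTERN` and `mask_last_octet`, the filler value, and the sample log line are all hypothetical, and the pattern does not check that each octet is in the 0-255 range. Two observations that may be relevant, offered as guesses rather than a confirmed diagnosis: patterns (1) and (3) use doubled backslashes, which is Java string-literal escaping and would likely match a literal backslash if typed directly into a property field, and pattern (2) is anchored with `^` and `$`, so it can only match when the IP address is the entire line.

```python
import re

# Capture the first three octets (with their dots); the fourth octet is
# matched but not captured, so it is dropped by the replacement.
IP_PATTERN = re.compile(r"\b((?:\d{1,3}\.){3})\d{1,3}\b")


def mask_last_octet(line, filler="0"):
    """Replace the last octet of every IPv4-looking token with filler."""
    return IP_PATTERN.sub(lambda m: m.group(1) + filler, line)


# Hypothetical log line for illustration.
print(mask_last_octet("10.1.2.33 GET /index.html from 192.168.7.254"))
```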
Created 02-16-2017 02:40 PM
Please open a new question for this issue rather than attaching it to a question with an existing answer. Although it deals with the same processor, the problem is different, and it benefits the community to have it addressed under its own question.
Thanks,
Matt
Created 02-16-2017 02:48 PM
Tip: Make sure you tag your question with NiFi, so that the community of NiFi followers is notified about your new question.
Created 02-16-2017 03:02 PM
Thanks! I will do that. Sorry for any inconvenience.
Created 02-16-2017 03:04 PM
No inconvenience at all. We just want to keep it as easy as possible for community members to find similar issues and solutions. I would have moved it myself if I could have. 🙂