Replace Text using Regex with large file in NIFI

avatar
Expert Contributor

Hi,

I'm using HDP 2.5 with NiFi 0.6. I have a 10 GB CSV file and want to replace the double quotes with a pipe delimiter. Is this possible with NiFi's ReplaceText processor, or do I need to write an external script in something like Groovy or Lua? Please point me to any reference links on how to implement this for a large file, with either scripting or the ReplaceText processor.

1 ACCEPTED SOLUTION

avatar
Super Mentor

@Varun R

The ReplaceText processor has two Evaluation Modes (Line-by-Line and Entire text). Entire text is the default, and it reads the entire content of your FlowFile into NiFi's JVM heap memory for evaluation. With such a large file this strategy is not ideal and could lead to out-of-memory conditions in your NiFi. If the content of your FlowFile spans multiple lines, you could switch to the Line-by-Line evaluation mode, which uses less heap memory but ultimately produces the same modified content in the outgoing FlowFile.

So you might want to try a ReplaceText processor configuration like the following:

[Screenshot: ReplaceText processor configuration (12488-screen-shot-2017-02-15-at-81839-am.png)]
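To illustrate why Line-by-Line mode needs less memory, here is a minimal Python sketch of the same idea outside NiFi. It is not NiFi code: the function name `replace_line_by_line` and the sample data are made up for illustration. The point is only that each line is read, rewritten, and written out before the next one is loaded, so memory use depends on the longest line rather than on the file size.

```python
import io
import re
from typing import TextIO


def replace_line_by_line(src: TextIO, dst: TextIO, pattern: str, replacement: str) -> int:
    """Apply a regex replacement to one line at a time.

    Returns the number of lines processed. Only the current line is held
    in memory, which mirrors the idea behind Line-by-Line evaluation.
    """
    regex = re.compile(pattern)
    count = 0
    for line in src:  # a text stream yields one line per iteration
        dst.write(regex.sub(replacement, line))
        count += 1
    return count


# Small in-memory demo; with a real file you would pass open(...) handles.
source = io.StringIO('"a","b"\n"c","d"\n')
target = io.StringIO()
lines_done = replace_line_by_line(source, target, r'","', "|")
print(lines_done, repr(target.getvalue()))
```

An "entire text" approach would instead call `src.read()` and run the regex over one huge string, which is where a 10 GB file becomes a problem.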

Thanks,

Matt


8 REPLIES

avatar
Expert Contributor

Thanks for the response. I will try this. I have a regex like 's/","/\|/g; s/"//g'. How do I use it in the ReplaceText processor?
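For reference, the two sed expressions above are two substitutions applied in order: first every `","` becomes `|`, then any remaining `"` is removed. A minimal Python sketch of that logic follows; the function name `quotes_to_pipes` is hypothetical. In NiFi, one way to get the same effect might be two ReplaceText processors in series, one per substitution, but that is an assumption and not something confirmed in this thread.

```python
import re


def quotes_to_pipes(line: str) -> str:
    """Apply the two sed-style substitutions in order to a single line."""
    # Equivalent of s/","/\|/g : the quote-comma-quote separator becomes a pipe.
    line = re.sub(r'","', "|", line)
    # Equivalent of s/"//g : strip whatever double quotes are left.
    return re.sub(r'"', "", line)


sample = '"1","Smith, John","NY"'
converted = quotes_to_pipes(sample)
print(converted)
```

The order matters: removing the quotes first would leave nothing for the `","` pattern to match, and commas inside quoted fields could no longer be told apart from separators.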

avatar
Expert Contributor

It works!!!!

avatar

Hello,

Anshu here.

We have a requirement to anonymize IP addresses, so we want to identify each IP address and replace its last part with some arbitrary value.

We have tried the following regexes for that (with "Regex Replace" as the replacement strategy and "Line-by-Line" as the evaluation mode):

(1) \\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}

(2) ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$

(3) ((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

None of these work.

Can you help me with the right regex and replacement value to use?

Thank you in advance!
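A possible explanation, offered as a sketch and not a confirmed diagnosis: in patterns (1) and (3) the doubled backslashes make the regex look for a literal backslash followed by `d`, so they cannot match a plain IP address, and pattern (2) is anchored with `^` and `$`, so it only matches when the whole line is an IP address. The Python example below shows one way to keep the first three octets in a capture group and replace the last one. The names `IP_PATTERN` and `anonymize_ips` and the choice of `0` as the replacement are assumptions. NiFi uses Java regular expressions, where, as far as I know, the replacement value would refer to the group as `$1`.

```python
import re

# Hypothetical pattern: group 1 captures the first three octets with their
# dots; the fourth octet is matched but not captured, so it gets replaced.
IP_PATTERN = re.compile(r"\b((?:\d{1,3}\.){3})\d{1,3}\b")

# Pattern (1) from the post, taken literally: "\\d" means a backslash
# character followed by the letter d, not a digit.
DOUBLED = re.compile(r"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}")


def anonymize_ips(line: str, last_octet: str = "0") -> str:
    """Replace the last octet of every IPv4-looking token in the line."""
    return IP_PATTERN.sub(lambda m: m.group(1) + last_octet, line)


log_line = "GET /index 192.168.1.57 200"
print(anonymize_ips(log_line))
print(DOUBLED.search("192.168.1.57"))
```

This pattern does not validate octet ranges (it would also accept 999.999.999.999), which may or may not matter for anonymization.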

avatar
Super Mentor

@Anshuman Ghosh

Please open a new question for this issue rather than attaching it to a question with an existing answer. While it deals with the same processor, the problem is different, and it benefits the community to have it addressed under its own question.

Thanks,

Matt

avatar
Super Mentor

@Anshuman Ghosh

Tip: Make sure you tag your question with NiFi, so that the community of NiFi followers is notified about your new question.

avatar

@Matt Clarke

Thanks! I will do that. Sorry for any inconvenience.

avatar
Super Mentor

@Anshuman Ghosh

No inconvenience at all. We just want to keep it as easy as possible for community members to find similar issues and solutions. I would have moved it myself if I could have. 🙂