Replace Text using Regex with large file in NIFI

Contributor

Hi,

I'm using HDP 2.5 with NiFi 0.6. I have a 10 GB CSV file, and I want to replace the double quotes with a pipe delimiter. Is this possible with the NiFi ReplaceText processor, or do I need to write an external script in something like Groovy or Lua? Please point me to any reference links on how to implement either the scripting approach or the ReplaceText processor for a large file.

Accepted Solutions

Re: Replace Text using Regex with large file in NIFI

Master Guru

@Varun R

The ReplaceText processor has two Evaluation Modes (Line-by-Line and Entire Text). Entire Text is the default, and it reads the entire content of your FlowFile into NiFi's JVM heap memory for evaluation. With such a large file this strategy is not ideal and could lead to out-of-memory conditions for your NiFi. If the content of your FlowFile spans multiple lines, you could switch to the Line-by-Line evaluation mode, which uses less heap memory but ultimately produces the same modified content in the outgoing FlowFile.

So you might want to try a ReplaceText processor configuration like the following:

[Screenshot: ReplaceText processor configuration (12488-screen-shot-2017-02-15-at-81839-am.png)]
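To illustrate why Line-by-Line evaluation keeps memory usage low, here is a minimal Python sketch of the same idea outside NiFi. It is not NiFi code: the function name `replace_line_by_line` and the sample data are made up for illustration. The point is only that each line is read, transformed, and written before the next one is loaded, so memory use depends on the longest line rather than on the file size.

```python
import io
import re


def replace_line_by_line(src, dst, pattern, replacement):
    """Apply a regex replacement one line at a time.

    Only the current line is held in memory, which mirrors the idea
    behind a line-by-line evaluation mode (hypothetical helper, not a NiFi API).
    """
    regex = re.compile(pattern)
    for line in src:  # file-like objects yield one line per iteration
        dst.write(regex.sub(replacement, line))


# Small in-memory stand-in for the large CSV file.
sample = io.StringIO('"a","b"\n"c","d"\n')
out = io.StringIO()

# Replace every double quote with a pipe, as asked in the question.
replace_line_by_line(sample, out, r'"', "|")
print(out.getvalue())
```

With a real file you would pass handles opened with `open(path)` and `open(path, "w")` instead of the `StringIO` objects.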

Thanks,

Matt

Re: Replace Text using Regex with large file in NIFI

Contributor

Thanks for the response. I will try this. I have a regex like 's/","/\|/g; s/"//g'. How do I use it in the ReplaceText processor?
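For reference, the sed expression above performs two substitutions in order: first every `","` becomes `|`, then every remaining `"` is removed. A minimal Python sketch of that logic follows; the function name `sed_equivalent` is hypothetical. In NiFi, one plausible mapping (an assumption, not confirmed in this thread) is two ReplaceText processors in sequence, one per substitution.

```python
import re


def sed_equivalent(line):
    """Mimic s/","/\\|/g; s/"//g for a single line (hypothetical helper)."""
    # Step 1: turn the quote-comma-quote separator into a pipe.
    line = re.sub(r'","', "|", line)
    # Step 2: strip the remaining leading and trailing quotes.
    return re.sub(r'"', "", line)


print(sed_equivalent('"a","b","c"'))
```

Note that the order matters: removing all quotes first would leave plain commas, and the separator pattern would no longer match.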

Re: Replace Text using Regex with large file in NIFI

Contributor

It works!!!!

Re: Replace Text using Regex with large file in NIFI

New Contributor

Hello,

Anshu here.

We have a requirement to anonymize IP addresses: we need to identify each IP address and replace its last part with some arbitrary value.

We have tried the following regexes for that (with "Regex Replace" as the replacement strategy and "Line-by-Line" as the evaluation mode):

(1) \\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}

(2) ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$

(3) ((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

None of these work.

Can you help me with the right regex to use, and the replacement value as well?

Thank you in advance!
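As a general regex illustration (not a tested NiFi configuration), the sketch below keeps the first three octets in a capture group and swaps in a fixed last octet. The names `IPV4` and `anonymize_ip` and the replacement value `0` are assumptions made for the example. Two observations about the patterns above, based only on regex semantics: when a pattern is typed as plain text, a doubled backslash such as `\\d` matches a literal backslash followed by `d` rather than a digit, and pattern (2) is anchored with `^` and `$`, so it matches only when the IP address is the entire line.

```python
import re

# Capture the first three octets (with their dots); match but do not keep the last one.
# This loose pattern does not validate octet ranges (0-255).
IPV4 = re.compile(r"\b((?:\d{1,3}\.){3})\d{1,3}\b")


def anonymize_ip(line, last_octet="0"):
    """Replace the last octet of every IPv4-looking token (hypothetical helper)."""
    # A function replacement avoids ambiguity between group references and digits.
    return IPV4.sub(lambda m: m.group(1) + last_octet, line)


print(anonymize_ip("client 192.168.10.57 connected"))
```

The stricter octet-range alternation from patterns (2) and (3) could be substituted for `\d{1,3}` if validation is needed.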

Re: Replace Text using Regex with large file in NIFI

Master Guru

@Anshuman Ghosh

Please open a new question for this issue rather than attaching it to a question with an existing answer. Although it deals with the same processor, the problem is different, and it benefits the community to have it addressed under its own question.

Thanks,

Matt

Re: Replace Text using Regex with large file in NIFI

Master Guru

@Anshuman Ghosh

Tip: Make sure you tag your question with NiFi, so that the community of NiFi followers is notified about your new question.

Re: Replace Text using Regex with large file in NIFI

New Contributor

@Matt Clarke

Thanks! I will do that. Sorry for any inconvenience.

Re: Replace Text using Regex with large file in NIFI

Master Guru

@Anshuman Ghosh

No inconvenience at all. We just want to keep it as easy as possible for community members to find similar issues and solutions. I would have moved it myself if I could have. :)
