Member since: 04-29-2016
Posts: 192
Kudos Received: 20
Solutions: 2

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 1632 | 07-14-2017 05:01 PM |
| | 2772 | 06-28-2017 05:20 PM |
01-12-2017
02:40 PM
1 Kudo
Also, I'm in the process of having the socket buffer (for ListenTCP) increased to 4 MB, the maximum the Unix admins can set it to.
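For reference, a quick way to verify what the OS will actually grant once the admins raise the limit is to request a 4 MB receive buffer on a throwaway socket and read back SO_RCVBUF; on Linux the cap comes from net.core.rmem_max. This is a minimal standalone sketch (not NiFi code), with the 4 MB figure taken from this thread:

```python
import socket

# Ask the OS for a 4 MB receive buffer and read back what was actually granted.
# On Linux the kernel caps the request at net.core.rmem_max (and reports roughly
# double the value it books internally), so a small number here means the
# admin-side limit is still in effect.
REQUESTED = 4 * 1024 * 1024  # 4 MB, the size being requested from the admins

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()

print(f"requested {REQUESTED} bytes, OS granted {granted} bytes")
```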
01-12-2017
02:39 PM
1 Kudo
Thanks a lot @Timothy Spann. I'm going to work with our admin on the JVM settings and on the number of cores we have. The flowfiles are small, about 5 KB each or less. The ListenTCP processor is throwing these errors: "Internal queue at maximum capacity, could not queue event", and messages are queuing up on the source system side. Below are the memory settings I set for ListenTCP.
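At roughly 1,500 messages/second and about 5 KB each, the stream is only about 7.5 MB/s, so the "Internal queue at maximum capacity" errors point at ListenTCP's internal message queue backing up behind downstream backpressure rather than raw network throughput. One hedged way to confirm that is to drive the listener with a standalone load generator while the downstream validation is temporarily pointed at a fast sink; the host, port, and message shape below are assumptions for illustration:

```python
import socket
import time

# Hypothetical standalone load generator: send newline-delimited ~5 KB messages
# at roughly 1,500 messages/second to the ListenTCP port, to check whether the
# listener alone can keep up when nothing downstream is stalling it.
HOST, PORT = "localhost", 9999   # assumed ListenTCP host/port
RATE = 1500                      # messages per second, from the thread
PAYLOAD = b"x" * (5 * 1024 - 1) + b"\n"   # ~5 KB per message, newline-delimited

with socket.create_connection((HOST, PORT)) as sock:
    interval = 1.0 / RATE
    next_send = time.monotonic()
    for _ in range(RATE * 60):   # run for about one minute
        sock.sendall(PAYLOAD)
        next_send += interval
        sleep_for = next_send - time.monotonic()
        if sleep_for > 0:
            time.sleep(sleep_for)
```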
01-12-2017
03:54 AM
1 Kudo
Hi guys, I have a use case where we need to load near real-time streaming data into HDFS. The incoming data is high volume, about 1,500 messages per second. I have a NiFi dataflow where the ListenTCP processor ingests the streaming data, but the requirement is to check the incoming messages for the required structure. So messages from ListenTCP go to a custom processor that does the structure checking, and only messages with the right structure move forward to the MergeContent processor and on to PutHDFS. Right now the validation processor has become a bottleneck, and the backpressure from it is causing ListenTCP to queue messages at the source system (the one sending the messages). Since the validation processor cannot handle the incoming data fast enough, I'm thinking of writing the messages from ListenTCP to the file system first and then letting the validation processor pick them up from there and continue forward. Is this the right approach, or are there suggestions for alternatives? Thanks in advance.
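To illustrate the kind of check the custom processor is doing (the actual record format isn't given in the post, so the rules below are assumptions), a per-message structure test can be as cheap as one pre-compiled regex, which at ~5 KB per message normally keeps pace with ~1,500 messages/second when the processor gets enough concurrent tasks:

```python
import re

# Hypothetical structure check: assume each message is a pipe-delimited record
# with exactly 8 fields, the first being a 14-digit timestamp.
RECORD = re.compile(rb"^\d{14}\|(?:[^|]*\|){6}[^|]*$")

def is_valid(message: bytes) -> bool:
    """Return True if the message matches the expected structure."""
    return RECORD.match(message.rstrip(b"\r\n")) is not None

# Example: split valid and invalid messages, mirroring the custom processor's
# valid / invalid routing.
samples = [b"20170112035400|a|b|c|d|e|f|g\n", b"garbage line\n"]
valid = [m for m in samples if is_valid(m)]
invalid = [m for m in samples if not is_valid(m)]
print(len(valid), "valid,", len(invalid), "invalid")
```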
Labels:
- Apache NiFi
01-11-2017
07:20 PM
1 Kudo
Hello, from using NiFi's PutHDFS processor, it seems it creates missing directories in HDFS by default, but the documentation doesn't specify whether it does or doesn't, and there is no property for it (like there is for PutFile, I believe). Does anyone know more about this "undocumented" feature, if I can call it that? My only concern is, if this behavior is not documented, will creating missing directories work reliably in a production environment? Did I miss this in the documentation? Thanks.
Labels:
- Apache NiFi
01-11-2017
02:16 PM
Awesome explanation @Matt, thanks. Earlier I saw examples of people retrying failed flowfiles 3 times, etc., but I wasn't sure where that would make sense; now I see where retrying is appropriate. Besides flowfiles that fail on network-related errors, what other processors or scenarios would call for retrying failed flowfiles? Since we have two types of scenarios, one where you want to retry flowfiles and another where you want to log them, etc., I was thinking of having two process groups to accommodate these two scenarios. If I have a lot of processors with potential for failure, I would collect the two kinds of failed flowfiles (those to retry and those to log) and send them to the appropriate process group. Would that approach work? Thanks in advance.
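As a sketch of the retry-side routing (plain Python modelling the logic, not NiFi API; the "retry.count" attribute name and the threshold of 3 are assumptions), the usual pattern increments a counter attribute on each failure and loops the flowfile back until the threshold is hit, after which it goes to the log-and-alert group:

```python
# Model of the retry-vs-log decision commonly built with UpdateAttribute +
# RouteOnAttribute: bump a retry counter on each failure, loop back while it
# is at or below the threshold, otherwise hand off to logging.
MAX_RETRIES = 3

def route_failed_flowfile(attributes: dict) -> str:
    """Return 'retry' or 'log', updating the retry counter in place."""
    count = int(attributes.get("retry.count", "0")) + 1
    attributes["retry.count"] = str(count)
    return "retry" if count <= MAX_RETRIES else "log"

# Example: the fourth failure of the same flowfile falls through to logging.
attrs = {"filename": "example.txt"}
for _ in range(4):
    decision = route_failed_flowfile(attrs)
print(decision, attrs["retry.count"])   # -> log 4
```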
01-10-2017
09:13 PM
2 Kudos
Hi all, I would appreciate it if you could point me to best practices for error handling in NiFi. Below is how I'm envisioning handling errors in my workflows; would you suggest any enhancements or better ways to do it? My error handling requirements are simple: log the errored flowfiles to the file system and send an alert. So all the processors in the dataflow that have a "failure" relationship would send their failed flowfiles to a funnel, and from there they would go to an error handling process group that does the logging and alerting. Thanks
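For the error-handling process group itself, in NiFi this is typically PutFile for the payload plus something like PutEmail for the alert; the sketch below just models those two steps outside NiFi, and the directory and alert mechanism are assumptions for illustration:

```python
import json
import time
from pathlib import Path

# Hypothetical model of the error-handling process group's two jobs: persist the
# failed flowfile with its attributes, then raise an alert.
ERROR_DIR = Path("/var/nifi/errors")   # assumed error-landing directory

def handle_failure(content: bytes, attributes: dict) -> Path:
    ERROR_DIR.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    name = attributes.get("filename", "flowfile")
    payload_path = ERROR_DIR / f"{stamp}-{name}"
    payload_path.write_bytes(content)
    # Keep the attributes alongside the payload so the failure can be diagnosed later.
    attrs_path = ERROR_DIR / f"{stamp}-{name}.attrs.json"
    attrs_path.write_text(json.dumps(attributes, indent=2))
    send_alert(f"Flowfile {name} failed; payload written to {payload_path}")
    return payload_path

def send_alert(message: str) -> None:
    # Placeholder: wire this to email, Slack, or whatever monitoring is in use.
    print("ALERT:", message)
```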
Labels:
- Apache NiFi
01-10-2017
07:39 PM
Greetings, I am not sure I understand why you would create multiple input and output ports for a process group (PG). What purpose would the additional ports serve? I am thinking that if you want to "call" (or send/receive data to/from) the same PG from many different processors in NiFi, you would use a different port for each processor, to avoid mixing the data the PG receives from the different processors. As an example (please see the image below), if the PG has a MergeContent processor, I would not want to merge flowfiles coming into the PG from different processors. Is that one of the reasons (if any) for having multiple ports in a PG?
Labels:
- Apache NiFi
11-22-2016
06:14 PM
@jfrazee @Andrew Grande @jwitt thank you all for the ideas and information.