About dhieru

bbende · 10-26-2017

Here is the description of the Kafka properties from their source code:

max.request.size: The maximum size of a request. This is also effectively a cap on the maximum record size. Note that the server has its own cap on record size which may be different from this. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests.

buffer.memory: The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are sent faster than they can be delivered to the server, the producer will either block or throw an exception based on the preference specified by block.on.buffer.full. This setting should correspond roughly to the total memory the producer will use, but is not a hard bound since not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests.

For your case, I don't think you really need to change either of these values from the defaults since you are sending 4 KB messages. Usually you would increase max.request.size if you have a single message that is larger than 1 MB.
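The following minimal Python sketch shows when these two settings actually matter. It is a hypothetical helper and is not part of any Kafka client. It makes two assumptions:

- the Java producer defaults apply (max.request.size = 1048576 bytes, buffer.memory = 33554432 bytes);
- each record needs an arbitrary 512-byte allowance for overhead.

Given the size of the largest record, it returns only the properties that would need overriding.

```python
# Java Kafka producer defaults, in bytes (assumed; check them for your client version).
DEFAULT_MAX_REQUEST_SIZE = 1_048_576   # max.request.size (1 MB)
DEFAULT_BUFFER_MEMORY = 33_554_432     # buffer.memory (32 MB)


def producer_overrides(largest_record_bytes: int, overhead_bytes: int = 512) -> dict:
    """Hypothetical helper: return the producer properties that must be raised.

    max.request.size effectively caps a single record, so it only needs to grow
    when the largest record (plus an assumed overhead allowance) exceeds it.
    A record larger than buffer.memory can never be buffered either, so that
    setting is raised in the same situation.
    """
    needed = largest_record_bytes + overhead_bytes
    overrides = {}
    if needed > DEFAULT_MAX_REQUEST_SIZE:
        overrides["max.request.size"] = str(needed)
    if needed > DEFAULT_BUFFER_MEMORY:
        overrides["buffer.memory"] = str(needed)
    return overrides


if __name__ == "__main__":
    print(producer_overrides(4 * 1024))   # 4 KB messages: defaults are fine
    print(producer_overrides(2_000_000))  # ~2 MB message: raise max.request.size
```

With 4 KB messages the helper returns an empty dict, which matches the advice above to keep the defaults.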

dhieru · 10-26-2017

@Abdelkrim Hadjidj @Matt Burgess The use case here is fetching files from a mainframe once each day. However, I learned that there is no concept of a directory structure on mainframes (I have no idea how mainframes work), so the processors are not able to list the files (GetFTP as well as ListFTP and FetchFTP). Is there any other way to get around this? I read some blogs and answers which suggested using Syncsort or Informatica PowerCenter.

We tried our current approach of running a shell script that goes and fetches the files. We can run the script using ExecuteProcess and save the files on one of the nodes (the primary node). However, the primary node keeps changing: yesterday it was one node, today it is a different one. In addition, mounting a directory shared across the nodes would be against our policies (too much admin work). Any help or thoughts?
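If the daily file names are predictable, one possible way to avoid listing is to retrieve each file directly by name. Here is a sketch of that idea using Python's standard ftplib. The following are hypothetical:

- the host and credentials;
- the date-stamped dataset naming pattern (PROD.EXTRACT.Dyymmdd).

It also assumes that the mainframe FTP server accepts a fully qualified, single-quoted dataset name in RETR, as z/OS FTP generally does. Text (ASCII) mode is used so that the server can translate EBCDIC to ASCII.

```python
import ftplib
from datetime import date


def daily_dataset_name(day: date, prefix: str = "PROD.EXTRACT") -> str:
    """Build a hypothetical fully qualified dataset name such as 'PROD.EXTRACT.D171026'.

    The surrounding single quotes ask the server to treat the name as fully
    qualified rather than relative to the current high-level qualifier.
    """
    return f"'{prefix}.D{day:%y%m%d}'"


def fetch_dataset(host: str, user: str, password: str, dataset: str, local_path: str) -> None:
    """Retrieve one dataset by name in ASCII mode, without issuing LIST or NLST."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        with open(local_path, "w", encoding="utf-8") as out:
            # retrlines sends TYPE A first, so the server converts EBCDIC text to ASCII;
            # each line arrives without its trailing CRLF, so add a newline back.
            ftp.retrlines(f"RETR {dataset}", lambda line: out.write(line + "\n"))
```

Example call, with placeholder values: fetch_dataset("mainframe.example.com", "user", "secret", daily_dataset_name(date.today()), "extract.txt").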

dhieru · 10-25-2017

@Shu Thanks, I had just done that before I saw this answer, and it worked. I appreciate it. I used the website you referred me to yesterday (http://www.cronmaker.com/). Thanks again!

dhieru · 10-25-2017

@Bryan Bende Awesome, thanks! I appreciate it.

dhieru · 10-25-2017

@Wynner Thanks a lot; I always appreciate your help.

dhieru · 10-24-2017

@Wynner Thanks for the reply, appreciate it

ahadjidj · 10-24-2017

Hi @dhieru singh, you need to do two things.

First, you need good capacity planning to evaluate the infrastructure required to handle your data flows. Plan for the worst-case scenario so that you have room to grow and the capacity to absorb bursts. There are several resources out there that can help you:
https://community.hortonworks.com/articles/135337/nifi-sizing-guide-deployment-best-practices.html
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.0/bk_command-line-installation/content/hdf_isg_hardware.html

Second, as you said, you need to monitor your system at different levels. Pierre has a set of articles on this topic that I recommend reading: https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/
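Here is a back-of-the-envelope example of the worst-case sizing step. The hypothetical helper below does three things:

- converts a peak event rate and an average event size into a sustained throughput;
- converts the same inputs into a daily volume;
- applies an assumed burst multiplier to get a peak throughput.

The inputs are illustrative. The 10,000 events per second and 4 KB figures echo numbers from the neighbouring threads, and the burst factor of 2 is an assumption. The output covers raw data rates only, with no replication, provenance or repository overhead. The sizing guides remain the reference for turning such numbers into hardware.

```python
from dataclasses import dataclass

MIB = 1024 * 1024
GIB = 1024 * MIB
SECONDS_PER_DAY = 86_400


@dataclass
class Estimate:
    avg_mib_per_s: float   # steady-state throughput
    peak_mib_per_s: float  # worst case, including the burst multiplier
    gib_per_day: float     # volume accumulated at the steady-state rate


def estimate_load(events_per_s: int, avg_event_bytes: int, burst_factor: float = 2.0) -> Estimate:
    """Hypothetical sizing helper: raw data rates only, no repository or replication overhead."""
    bytes_per_s = events_per_s * avg_event_bytes
    return Estimate(
        avg_mib_per_s=bytes_per_s / MIB,
        peak_mib_per_s=bytes_per_s * burst_factor / MIB,
        gib_per_day=bytes_per_s * SECONDS_PER_DAY / GIB,
    )


if __name__ == "__main__":
    e = estimate_load(10_000, 4 * 1024)
    print(f"{e.avg_mib_per_s:.1f} MiB/s average, {e.peak_mib_per_s:.1f} MiB/s peak, "
          f"{e.gib_per_day:.0f} GiB/day")
```

For 10,000 events of 4 KB per second, this works out to about 39 MiB/s sustained, 78 MiB/s at the assumed burst, and roughly 3.2 TiB per day.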

dhieru · 10-23-2017

@Shu Thanks for the explanation. It is very helpful; I appreciate it. Dhieru

mburgess · 10-23-2017

You can use the Run Schedule property on the Scheduling tab of the processor to set the interval at which it will be scheduled to run. For 10k events per second, that interval is 1/10,000 of a second (100 microseconds), so you can set it to "100000 nanos".
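The arithmetic behind that setting can be expressed as a small hypothetical Python helper. The interval is one second divided by the number of runs needed per second, expressed in nanoseconds so it can be typed into the Run Schedule field.

It assumes each scheduled run emits a fixed number of events (one by default, e.g. a Batch Size of 1). The result is only an upper bound on how often the processor is triggered; actual throughput also depends on the processor and on available threads.

```python
NANOS_PER_SECOND = 1_000_000_000


def run_schedule_for(events_per_second: int, events_per_run: int = 1) -> str:
    """Return a Run Schedule string such as "100000 nanos" for a target event rate.

    Each scheduled run is assumed to emit `events_per_run` events, so the number
    of runs per second is events_per_second / events_per_run, and the interval
    between runs is the inverse of that.
    """
    if events_per_second <= 0 or events_per_run <= 0:
        raise ValueError("rates must be positive")
    interval_ns = NANOS_PER_SECOND * events_per_run // events_per_second
    return f"{interval_ns} nanos"


if __name__ == "__main__":
    print(run_schedule_for(10_000))                      # one event per run
    print(run_schedule_for(10_000, events_per_run=100))  # batches of 100 events
```

With batches of 100 events per run, the same 10k events per second needs only one run every "10000000 nanos" (10 ms), which is far less demanding on the scheduler.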

MattWho · 10-20-2017

@Shu @dhieru singh By default there is no guaranteed order in which FlowFiles are pulled from the queue feeding any given processor. This is because NiFi favors performance over order. If you want to enforce some sort of order in which FlowFiles are pulled from an inbound queue, you must add a "Prioritizer" to the inbound connection. By default, no prioritizers are added. To apply a prioritizer, simply drag the desired prioritizer(s) to the "Selected Prioritizers" box.

Regardless of the strategy used in your DistributeLoad processor (Round Robin or Next Available), there will not be a continuous order to the FlowFiles queued to either MergeContent processor.

Thanks,
Matt
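To see why, here is a toy Python model. It is not NiFi code. It deals numbered FlowFiles across two outbound connections Round Robin style, then reads each connection oldest-first, as a FirstInFirstOutPrioritizer would.

Even in this idealized case, each MergeContent sees every other FlowFile rather than a contiguous run. The model ignores Next Available and back pressure. It also ignores the fact that, without a prioritizer, even the per-queue order is not guaranteed.

```python
from collections import deque
from itertools import cycle


def round_robin(flowfile_ids, n_connections: int) -> list:
    """Toy model of a Round Robin split: deal FlowFile IDs into N queues in turn."""
    queues = [deque() for _ in range(n_connections)]
    for ff_id, queue in zip(flowfile_ids, cycle(queues)):
        queue.append(ff_id)
    return queues


def drain_fifo(queue: deque) -> list:
    """Pull everything from one queue oldest-first, as a FIFO prioritizer would."""
    return [queue.popleft() for _ in range(len(queue))]


if __name__ == "__main__":
    merge_a, merge_b = round_robin(range(1, 7), 2)
    print("MergeContent A receives", drain_fifo(merge_a))  # 1, 3, 5
    print("MergeContent B receives", drain_fifo(merge_b))  # 2, 4, 6
```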

Last Visited	10-10-2019 12:00 AM

Member Since	06-26-2017 06:54 PM
Posts	191
Kudos received	10

Cloudera Community

Re: ListHDFS - Hadoop User Login, Non-Kerberos

Re: Significance and impact of Max Request Size on...

Re: fetchFtp is it mandatory to have upstream conn...

Re: Use ExecuteProcess to a shell script file

Re: ListenTCP, PublishKafka throughput performanc...

Re: Setting the time in the Run Schedule of a proc...

Re: listSFTP and fetchSFTP is secure?

Re: Is there any problem with the JVM of NiFi, is ...

Re: Nifi cluster, list files from one of the nodes...

Re: Generate flow file processor, can we configure...

Re: using distribute load processor to connect to ...