Member since
06-26-2017
191
Posts
10
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1912 | 09-22-2017 07:13 PM |
10-26-2017
02:53 PM
Here is the description of the Kafka properties from their source code:
max.request.size: The maximum size of a request. This is also effectively a cap on the maximum record size. Note that the server has its own cap on record size which may be different from this. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests.
buffer.memory: The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are sent faster than they can be delivered to the server the producer will either block or throw an exception based on the preference specified by block.on.buffer.full. This setting should correspond roughly to the total memory the producer will use, but is not a hard bound since not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests.
For your case I don't think you need to change either of these values from the defaults, since you are sending 4 KB messages. Usually you would increase max.request.size only if you have a single message that is larger than 1 MB.
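To make the sizing argument concrete, here is a rough sketch in Python. It uses the documented Kafka producer defaults (max.request.size = 1048576 bytes, buffer.memory = 33554432 bytes). The helper names are made up for illustration, and the check ignores protocol overhead and compression, so treat the numbers as approximate.

```python
# Documented Kafka producer defaults, in bytes.
DEFAULT_MAX_REQUEST_SIZE = 1_048_576   # max.request.size (1 MiB)
DEFAULT_BUFFER_MEMORY = 33_554_432     # buffer.memory (32 MiB)


def fits_in_request(record_bytes, max_request_size=DEFAULT_MAX_REQUEST_SIZE):
    """Return True if one record of this size is within the request cap.

    Hypothetical helper: ignores per-record and per-request overhead.
    """
    return record_bytes <= max_request_size


def records_buffered(record_bytes, buffer_memory=DEFAULT_BUFFER_MEMORY):
    """Approximate how many records of this size fit in the producer buffer.

    Hypothetical helper: ignores memory used for compression and
    in-flight requests, which the property description says also count.
    """
    return buffer_memory // record_bytes


message_size = 4 * 1024  # the 4 KB messages from the question
print(fits_in_request(message_size))   # True
print(records_buffered(message_size))  # 8192
```

With 4 KB messages, a single record is far below the request cap and the default buffer holds thousands of them, which is why the defaults are likely fine here.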
10-26-2017
01:12 PM
@Abdelkrim Hadjidj @Matt Burgess The use case here is getting files from a mainframe once each day. However, I found out that there is no concept of a directory structure on mainframes (I have no idea how mainframes work), so NiFi is not able to list the files (GetFTP as well as ListFTP and FetchFTP). Is there any other way to get around this? I read some blogs and answers which suggested using Syncsort or Informatica PowerCenter. We tried our current approach of running a shell script which goes and fetches the files. We can run the script using ExecuteProcess and save the files on one of the nodes (the primary node), however the primary node keeps changing: yesterday it was one node, today it is a different one. In addition, if we mount a shared directory across the nodes, it will be against the policies (too much admin work). Any help or thoughts?
10-25-2017
06:47 PM
1 Kudo
@Shu Thanks, I just did that before I saw this answer, and it worked. Appreciate it. I used the website you referred to yesterday, http://www.cronmaker.com/ Thanks again
10-25-2017
04:03 PM
@Bryan Bende Awesome, thanks, I appreciate it
10-25-2017
12:18 PM
@Wynner Thanks a lot and appreciate your help always
10-24-2017
04:44 AM
Hi @dhieru singh You need to do two things:
First, you need good capacity planning to evaluate the infrastructure required to handle your data flows. Consider the worst-case scenario so that you have room for growth and the capacity to absorb bursts. There are several resources out there that can help you:
https://community.hortonworks.com/articles/135337/nifi-sizing-guide-deployment-best-practices.html
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.0/bk_command-line-installation/content/hdf_isg_hardware.html
Second, as you said, you need to monitor your system at different levels. Pierre has a set of articles on this topic that I recommend you read:
https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/
10-23-2017
09:49 PM
@Shu Thanks for the explanation. It is very helpful, appreciate it. Dhieru
10-23-2017
05:06 PM
1 Kudo
You can use the Run Schedule property on the Scheduling tab of the processor to set the interval at which it will be scheduled to run. For 10k events per second, assuming one event per run, that is one run every 100 microseconds, so you can set it to "100000 nanos".
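As a rough check of the arithmetic, the interval for a target rate is one second divided by the number of runs per second. The sketch below assumes each scheduled run produces a fixed number of events (one by default), which is an assumption about the flow, and the helper name run_schedule_nanos is made up for illustration. The rate actually achieved also depends on how long each run takes.

```python
NANOS_PER_SECOND = 1_000_000_000


def run_schedule_nanos(events_per_second, events_per_run=1):
    """Return the interval between runs, in nanoseconds, for a target rate.

    Hypothetical helper: assumes every scheduled run emits exactly
    `events_per_run` events and that a run finishes within the interval.
    """
    return NANOS_PER_SECOND * events_per_run // events_per_second


print(f"{run_schedule_nanos(10_000)} nanos")  # 100000 nanos
```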
10-20-2017
07:53 PM
@Shu @dhieru singh By default there is no guaranteed order in which FlowFiles are pulled from the queue feeding any given processor. This is because NiFi favors performance over order. If you want to enforce some sort of order in which FlowFiles are pulled from an inbound queue, you must add a "Prioritizer" to the inbound connection. By default, no prioritizers are added. To apply a prioritizer, simply drag the desired prioritizer(s) to the "Selected Prioritizers" box. Regardless of the strategy used in your DistributeLoad processor (round robin or next available), there will not be a continuous order to the FlowFiles queued to either MergeContent processor.
Thanks, Matt
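As a conceptual illustration only (this is not NiFi code), a prioritizer can be thought of as a sort key applied to the queue. The sketch below models a first-in-first-out style ordering with a heap; the class QueuedFlowFile, its fields, and drain_in_order are hypothetical names invented for this example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedFlowFile:
    """Hypothetical stand-in for a queued FlowFile, ordered by entry time."""
    entry_time: int                    # the only field used for ordering
    name: str = field(compare=False)   # payload label, ignored when sorting


def drain_in_order(flowfiles):
    """Return names in the order a first-in-first-out key would release them."""
    heap = list(flowfiles)
    heapq.heapify(heap)
    return [heapq.heappop(heap).name for _ in range(len(heap))]


# Items arrive on the connection in an arbitrary order.
arrivals = [
    QueuedFlowFile(3, "c"),
    QueuedFlowFile(1, "a"),
    QueuedFlowFile(2, "b"),
]
print(drain_in_order(arrivals))  # ['a', 'b', 'c']
```

Without such a key, the consumer would simply see whatever order the queue happens to hold, which is the default behavior described above.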