Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4309 | 12-03-2018 02:26 PM |
| | 3250 | 10-16-2018 01:37 PM |
| | 4344 | 10-03-2018 06:34 PM |
| | 3210 | 09-05-2018 07:44 PM |
| | 2448 | 09-05-2018 07:31 PM |
10-30-2017 01:53 PM
Nice article! You could also use the "Message Demarcator" property in PublishKafka (set to a newline); that way you never have to split up your flow file. PublishKafka will stream the large flow file and read it based on the demarcator, so each line is still sent as an individual message to Kafka.
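To illustrate the idea (a hypothetical Python sketch of the concept, not NiFi's actual implementation): splitting a streamed payload on a demarcator yields one message per line without ever materializing the whole file or creating intermediate flow files:

```python
import io

def messages_from_stream(stream, demarcator=b"\n", chunk_size=8192):
    """Yield one message per demarcator-separated record, reading the
    stream incrementally rather than loading it all into memory."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        while demarcator in buffer:
            message, buffer = buffer.split(demarcator, 1)
            if message:
                yield message
    if buffer:  # trailing record with no final demarcator
        yield buffer

payload = io.BytesIO(b"line1\nline2\nline3")
print(list(messages_from_stream(payload)))  # [b'line1', b'line2', b'line3']
```

Each yielded message would correspond to one record published to Kafka, while the source stays a single large flow file.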
10-30-2017 01:48 PM
Hello, this post is for ListenUDP, ListenTCP, ListenSyslog, and ListenRELP. The ListenWebSocket processor is implemented differently and does not necessarily follow what is described here. I'm not familiar with the websocket processor, but maybe others have experience with tuning it.
10-26-2017 02:53 PM
Here is the description of the Kafka properties from their source code:

**max.request.size** — The maximum size of a request. This is also effectively a cap on the maximum record size. Note that the server has its own cap on record size, which may be different from this. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests.

**buffer.memory** — The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are sent faster than they can be delivered to the server, the producer will either block or throw an exception, based on the preference specified by `block.on.buffer.full`. This setting should correspond roughly to the total memory the producer will use, but it is not a hard bound, since not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests.

For your case I don't think you really need to change either of these values from the defaults, since you are sending 4 KB messages. Usually you would only increase max.request.size if a single message is larger than 1 MB.
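As a rough sanity check on those sizes (a sketch using the producer defaults and the 4 KB message size mentioned above):

```python
# Default Kafka producer values (from the property descriptions above)
MAX_REQUEST_SIZE = 1_048_576   # 1 MB, default max.request.size
BUFFER_MEMORY = 33_554_432     # 32 MB, default buffer.memory

message_size = 4 * 1024        # the 4 KB messages in question

# A single 4 KB record is nowhere near the per-request cap...
print(message_size < MAX_REQUEST_SIZE)        # True
# ...and the default buffer can hold thousands of them awaiting delivery
print(BUFFER_MEMORY // message_size)          # 8192
```

This is why the defaults are fine here: max.request.size only starts to matter once an individual record approaches 1 MB.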
10-25-2017 03:40 PM
Yes, this is correct! It is hard to say what batch size is best, but as long as you are batching together at least a couple of thousand messages, it will be significantly better than 1 message per flow file. So maybe start with 10,000 and tune from there.
10-24-2017 01:56 PM
It would probably be best to start a new question for this, but... If you have three different CAs that were used to generate three different certs for your standalone nodes, then you will need to create a single truststore that contains the public keys of all three CAs, and each NiFi node will need to use that truststore in order to trust the other nodes. The initial admin identity must be the same on all nodes. If you have a client cert that worked on one of your standalone nodes, and you do what I described above with the truststore, then you can still use that client cert. Each node will need all the node identities listed. You will also need to delete users.xml and authorizations.xml from each node so that it starts over.
10-11-2017 02:12 PM
I have no idea how Kerberos works with MSSQL, but in general you would need a way to tell the driver which principal and keytab to use. The DBCPConnectionPool does not have properties for this because it is driver-specific and not part of the JDBC specification. The only way I could see the driver obtaining that information would be if it were somehow passed in the connection URL, or if the driver looked for a JAAS config file set through a system property.
10-03-2017 12:50 PM
1 Kudo
You could improve the performance significantly by using the record-oriented capabilities introduced in Apache NiFi 1.2.0. You would use ConsumeKafkaRecord_0_10 with a JsonTreeReader and an AvroRecordSetWriter, and set the batch size to something like 1000 (or more). This would produce one flow file coming out of ConsumeKafkaRecord_0_10 that already contains the Avro records, so you could eliminate ConvertJSONToAvro, and possibly eliminate MergeContent as well, since you would already have many records in each flow file.
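A rough sketch of why this helps (hypothetical Python, not NiFi internals): with record-oriented processors, N Kafka messages travel as one flow file instead of N separate ones:

```python
import json

# A hypothetical batch of 1000 JSON messages, as ConsumeKafkaRecord_0_10
# would pull from Kafka and parse with a JsonTreeReader
raw_messages = [json.dumps({"id": i, "value": i * 2}) for i in range(1000)]

# Record-oriented handling: parse every record and emit ONE output
# payload (newline-delimited JSON here, standing in for the Avro
# container an AvroRecordSetWriter would actually produce)
records = [json.loads(m) for m in raw_messages]
one_flow_file = "\n".join(json.dumps(r) for r in records)

# 1000 messages -> 1 flow file, instead of 1000 flow files that would
# each need ConvertJSONToAvro and then MergeContent downstream
print(len(records), "records in 1 flow file")
```

The per-flow-file overhead (repository writes, provenance events, per-processor scheduling) is paid once instead of a thousand times, which is where the speedup comes from.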
09-27-2017 01:12 PM
I believe it was mentioned - "Remote Process Group to distribute the listings to all the NiFi nodes, then a FetchFile for each node to retrieve the listings."
09-20-2017 01:33 PM
If you want to have a default value of null, then the type of your field needs to be a union of null and the real type, with null listed first (the default value's type must match the first branch of the union). For example, for timestamp you would need: "type": ["null", "long"]
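For example, a minimal illustrative schema with a nullable timestamp field (the record and field names are hypothetical; note that Avro expects the default's type, null here, to be the first branch of the union):

```python
import json

# Minimal illustrative Avro schema: "timestamp" may be null and
# defaults to null. The default value's type must match the FIRST
# branch of the union, so "null" comes before "long".
schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "timestamp", "type": ["null", "long"], "default": None}
    ]
}
print(json.dumps(schema, indent=2))
```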
09-19-2017 01:31 PM
You probably also need to increase Minimum Number of Entries to something greater than 1.