Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4309 | 12-03-2018 02:26 PM |
| | 3250 | 10-16-2018 01:37 PM |
| | 4344 | 10-03-2018 06:34 PM |
| | 3210 | 09-05-2018 07:44 PM |
| | 2448 | 09-05-2018 07:31 PM |
10-30-2017 01:53 PM
Nice article! You could also use the "Message Demarcator" property in PublishKafka (set to a newline); that way you never have to split up your flow file. PublishKafka will stream the large flow file and read it based on the demarcator, so each line is still sent as an individual message to Kafka.
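To illustrate the idea (a hypothetical Python sketch of the concept, not NiFi's actual implementation): splitting a streamed payload on a demarcator yields one message per line without ever materializing the whole file or creating intermediate flow files:

```python
import io

def messages_from_stream(stream, demarcator=b"\n", chunk_size=8192):
    """Yield one message per demarcator-separated record, reading the
    stream incrementally rather than loading it all into memory."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        while demarcator in buffer:
            message, buffer = buffer.split(demarcator, 1)
            if message:
                yield message
    if buffer:  # trailing record with no final demarcator
        yield buffer

payload = io.BytesIO(b"line1\nline2\nline3")
print(list(messages_from_stream(payload)))  # [b'line1', b'line2', b'line3']
```

Each yielded message would correspond to one record published to Kafka, while the source stays a single large flow file.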
10-30-2017 01:48 PM
Hello, this post is for ListenUDP, ListenTCP, ListenSyslog, and ListenRELP. The ListenWebSocket processor is implemented differently and does not necessarily follow what is described here. I'm not familiar with the websocket processor, but maybe others have experience with tuning it.
10-26-2017 02:53 PM
Here is the description of the Kafka properties from their source code:

**max.request.size** — The maximum size of a request. This is also effectively a cap on the maximum record size. Note that the server has its own cap on record size, which may be different from this. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests.

**buffer.memory** — The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are sent faster than they can be delivered to the server, the producer will either block or throw an exception, based on the preference specified by `block.on.buffer.full`. This setting should correspond roughly to the total memory the producer will use, but it is not a hard bound, since not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests.

For your case I don't think you really need to change either of these values from the defaults, since you are sending 4 KB messages. Usually you would only increase max.request.size if a single message is larger than 1 MB.
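As a rough sanity check on those sizes (a sketch using the producer defaults and the 4 KB message size mentioned above):

```python
# Default Kafka producer values (from the property descriptions above)
MAX_REQUEST_SIZE = 1_048_576   # 1 MB, default max.request.size
BUFFER_MEMORY = 33_554_432     # 32 MB, default buffer.memory

message_size = 4 * 1024        # the 4 KB messages in question

# A single 4 KB record is nowhere near the per-request cap...
print(message_size < MAX_REQUEST_SIZE)        # True
# ...and the default buffer can hold thousands of them awaiting delivery
print(BUFFER_MEMORY // message_size)          # 8192
```

This is why the defaults are fine here: max.request.size only starts to matter once an individual record approaches 1 MB.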
10-25-2017 03:40 PM
Yes, this is correct! It is hard to say what batch size is best, but as long as you are batching together at least a couple of thousand messages, it will be significantly better than 1 message per flow file. So maybe start with 10,000 and tune from there.
10-24-2017 01:56 PM
It would probably be best to start a new question for this, but... If you have three different CAs that were used to generate three different certs for your standalone nodes, then you will need to create a single truststore that contains the public keys of all three CAs, and each NiFi node will need to use that truststore in order to trust the other nodes. The initial admin identity must be the same on all nodes. If you have a client cert that worked on one of your standalone nodes, and you do what I described above with the truststore, then you can still use that client cert. Each node will need all the node identities listed. You will also need to delete users.xml and authorizations.xml from each node so that it starts over.
10-11-2017 02:12 PM
I have no idea how Kerberos works with MSSQL, but in general you would need a way to tell the driver which principal and keytab to use. The DBCPConnectionPool does not have properties for this because it is driver-specific and not part of the JDBC specification. The only way I could see the driver obtaining that information would be if it were somehow passed in the connection URL, or if the driver looked for a JAAS config file set through a system property.
10-03-2017 12:50 PM
1 Kudo
You could improve the performance significantly by using the record-oriented capabilities introduced in Apache NiFi 1.2.0. You would use ConsumeKafkaRecord_0_10 with a JsonTreeReader and an AvroRecordSetWriter, and set the batch size to something like 1000 (or more). This would produce one flow file coming out of ConsumeKafkaRecord_0_10 that already contains the Avro records, so you could eliminate ConvertJSONToAvro, and possibly eliminate MergeContent as well, since you would already have many records in each flow file.
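A rough sketch of why this helps (hypothetical Python, not NiFi internals): with record-oriented processors, N Kafka messages travel as one flow file instead of N separate ones:

```python
import json

# A hypothetical batch of 1000 JSON messages, as ConsumeKafkaRecord_0_10
# would pull from Kafka and parse with a JsonTreeReader
raw_messages = [json.dumps({"id": i, "value": i * 2}) for i in range(1000)]

# Record-oriented handling: parse every record and emit ONE output
# payload (newline-delimited JSON here, standing in for the Avro
# container an AvroRecordSetWriter would actually produce)
records = [json.loads(m) for m in raw_messages]
one_flow_file = "\n".join(json.dumps(r) for r in records)

# 1000 messages -> 1 flow file, instead of 1000 flow files that would
# each need ConvertJSONToAvro and then MergeContent downstream
print(len(records), "records in 1 flow file")
```

The per-flow-file overhead (repository writes, provenance events, per-processor scheduling) is paid once instead of a thousand times, which is where the speedup comes from.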
09-27-2017 01:12 PM
I believe it was mentioned - "Remote Process Group to distribute the listings to all the NiFi nodes, then a FetchFile for each node to retrieve the listings."
09-20-2017 01:33 PM
If you want to have a default value of null, then the type of your field needs to be a union of null and the real type, with null listed first (the default value's type must match the first branch of the union). For example, for timestamp you would need: "type": ["null", "long"]
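For example, a minimal illustrative schema with a nullable timestamp field (the record and field names are hypothetical; note that Avro expects the default's type, null here, to be the first branch of the union):

```python
import json

# Minimal illustrative Avro schema: "timestamp" may be null and
# defaults to null. The default value's type must match the FIRST
# branch of the union, so "null" comes before "long".
schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "timestamp", "type": ["null", "long"], "default": None}
    ]
}
print(json.dumps(schema, indent=2))
```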
09-19-2017 01:31 PM
You probably also need to increase Minimum Number of Entries to something greater than 1.