Created 08-16-2016 10:13 AM
Created 08-16-2016 01:59 PM
In general, concurrent tasks is the number of threads calling onTrigger for an instance of a processors. In a cluster, if you set concurrent tasks to 4, then it is 4 threads on each node of your cluster.
I am not as familiar with all the ins and outs of the kafka processors, but for GetKafka it does something like this:
int concurrentTaskToUse = context.getMaxConcurrentTasks(); final Map<String, Integer> topicCountMap = new HashMap<>(1); topicCountMap.put(topic, concurrentTaskToUse); final Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
The consumer is from the Kafka 0.8 client, so it is creating a message-stream for each concurrent task.
Then when the processor is triggered it takes one of those message-streams and consumes a message, and since multiple concurrent tasks are trigger the processor, it is consuming from each of those streams in parallel.
As far as rebalancing, I think the Kafka client does that transparently to NiFi, but I am not totally sure.
Messages that have already been pulled into a NiFi node will stay there until the node is back up and processing.
Created 08-16-2016 01:59 PM
In general, concurrent tasks is the number of threads calling onTrigger for an instance of a processors. In a cluster, if you set concurrent tasks to 4, then it is 4 threads on each node of your cluster.
I am not as familiar with all the ins and outs of the kafka processors, but for GetKafka it does something like this:
int concurrentTaskToUse = context.getMaxConcurrentTasks(); final Map<String, Integer> topicCountMap = new HashMap<>(1); topicCountMap.put(topic, concurrentTaskToUse); final Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
The consumer is from the Kafka 0.8 client, so it is creating a message-stream for each concurrent task.
Then when the processor is triggered it takes one of those message-streams and consumes a message, and since multiple concurrent tasks are trigger the processor, it is consuming from each of those streams in parallel.
As far as rebalancing, I think the Kafka client does that transparently to NiFi, but I am not totally sure.
Messages that have already been pulled into a NiFi node will stay there until the node is back up and processing.