Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4261 | 12-03-2018 02:26 PM |
| | 3203 | 10-16-2018 01:37 PM |
| | 4309 | 10-03-2018 06:34 PM |
| | 3165 | 09-05-2018 07:44 PM |
| | 2425 | 09-05-2018 07:31 PM |
11-14-2016 04:46 PM
3 Kudos
I guess it depends on what you are trying to achieve. You are right that you would get an email per node in your cluster, but each email would cover only the errors on that node, so it's not as if you are getting 8 copies of the same email. If you really want only one email, it would probably be easiest to use something in between as a buffer for all your errors. For example, have a Kafka topic like "errors" and have each node in your cluster publish to it with PublishKafka. Then have a ConsumeKafka that runs only on the primary node, merge together some amount of errors (for example with MergeContent), and send the result to PutEmail. You could do the same thing with JMS, a shared filesystem, or anywhere else you can put the errors and later retrieve them. See the sketch below.
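As a standalone sketch of that buffering idea (inside NiFi, ConsumeKafka + MergeContent + PutEmail would do the equivalent for you), here is roughly what the primary-node consumer side does; the broker address, topic name, batch size, and sendOneEmail() helper are all illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ErrorDigest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // illustrative broker
        props.put("group.id", "error-digest");         // one group reads everyone's errors
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // every node publishes its errors to this topic via PublishKafka
            consumer.subscribe(Collections.singletonList("errors"));
            List<String> batch = new ArrayList<>();
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(1000)) {
                    batch.add(record.value());
                }
                if (batch.size() >= 50) {                   // merge "some amount" of errors...
                    sendOneEmail(String.join("\n", batch)); // ...and send a single email
                    batch.clear();
                }
            }
        }
    }

    // placeholder for whatever mail client you use (PutEmail handles this in NiFi)
    private static void sendOneEmail(String body) {
        System.out.println("EMAIL:\n" + body);
    }
}
```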
11-14-2016 02:58 PM
2 Kudos
ConvertCSVToAvro attempts to convert each record. If at least one record was converted successfully, it transfers a flow file to "success"; if any records could not be converted, it also transfers a flow file to "incompatible" that has all of the original CSV in its content and a summary of the incompatible records in an attribute called "errors". I think you would need to implement a custom version of this processor that only transfers the flow file to success when the error count equals 0.
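If it helps, here is a rough sketch of what that stricter routing could look like; this is not the actual Kite-based implementation, and convertRecords()/ConversionResult are stand-ins for the real conversion logic:

```java
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class StrictConvertCSVToAvro extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success").description("All records converted").build();
    static final Relationship REL_INCOMPATIBLE = new Relationship.Builder()
            .name("incompatible").description("One or more records failed").build();

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        FlowFile original = session.get();
        if (original == null) {
            return;
        }

        // Stand-in for the Kite CSV-to-Avro conversion; assume it returns
        // the converted flow file plus a count of records that failed.
        ConversionResult result = convertRecords(session, original);

        if (result.failedRecords == 0) {
            session.transfer(result.converted, REL_SUCCESS);
            session.remove(original);
        } else {
            // Unlike the stock processor, do NOT also send a partial success;
            // route the original CSV to "incompatible" whenever anything failed.
            session.remove(result.converted);
            FlowFile flagged = session.putAttribute(original, "errors",
                    result.failedRecords + " record(s) failed conversion");
            session.transfer(flagged, REL_INCOMPATIBLE);
        }
    }

    // --- hypothetical helper types below ---
    private static class ConversionResult {
        FlowFile converted;
        long failedRecords;
    }

    private ConversionResult convertRecords(ProcessSession session, FlowFile in) {
        throw new UnsupportedOperationException("conversion logic elided");
    }
}
```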
11-11-2016 07:19 PM
2 Kudos
In existing releases you cannot use CRON scheduling together with Primary Node Only. An enhancement to support this was merged to master recently and will be included in the Apache NiFi 1.1 release; the JIRA ticket is https://issues.apache.org/jira/browse/NIFI-401
11-10-2016 08:59 PM
ConsumeKafka is implemented with "at-least-once" guarantees: it writes the data to a flow file and commits the session for that flow file before committing the offsets, to ensure no data loss. If an error happens while committing the offsets, as can happen with a rebalance, the flow file has already been committed and transferred to success and can't be undone. If it were done in the reverse order (commit offsets first, then commit the flow file session) you could route to failure when committing the offsets failed, but you would also risk data loss if NiFi crashed after committing the offsets but before committing the flow file session; that would be "at-most-once" guarantees.
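A condensed sketch of that ordering (not the actual ConsumeKafka source, just the two commits in the order described):

```java
import org.apache.kafka.clients.consumer.CommitFailedException;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;

class AtLeastOnceSketch {
    static final Relationship REL_SUCCESS = new Relationship.Builder().name("success").build();

    void pollOnce(KafkaConsumer<byte[], byte[]> consumer, ProcessSession session) {
        ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
        if (records.isEmpty()) {
            return;
        }

        FlowFile flowFile = session.create();
        flowFile = session.write(flowFile, out -> {
            for (ConsumerRecord<byte[], byte[]> record : records) {
                out.write(record.value());
                out.write('\n');
            }
        });
        session.transfer(flowFile, REL_SUCCESS);
        session.commit();          // 1) data is durable in NiFi's repositories first

        try {
            consumer.commitSync(); // 2) only then advance the Kafka offsets
        } catch (CommitFailedException e) {
            // A rebalance landed between the two commits. The flow file is
            // already committed and cannot be undone, so the same records may
            // arrive again: duplicates, but never loss (at-least-once).
        }
    }
}
```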
11-10-2016 05:34 PM
ConsumeKafka commits the offsets to Kafka right after the data has been written to a flow file and the session for that flow file has been committed. This way there is no chance of the data being lost before the offsets are committed to Kafka, because the data has already been persisted to NiFi's repositories. Currently there is no concept of treating a series of processors as one operation. Right now you can think of it as two separate transfers of data: the first from Kafka to NiFi, the second from NiFi to HDFS.
11-10-2016 01:52 PM
If it fails to commit the batch, then the data is still in Kafka because the offset wasn't updated, so it would be pulled again next time after the rebalance.
11-10-2016 01:51 PM
ConsumeKafka keeps a pool of consumers behind the scenes, equal to the number of concurrent tasks for that processor instance. So in a simple case with ConsumeKafka having 1 concurrent task, the first time it executes it will ask the pool for a consumer; there will be none the first time through, so it will create a new one, consume the data from Kafka, and then put it back in the pool for next time. From reading Kafka's documentation (https://kafka.apache.org/documentation) I would expect the session timeout and heartbeat to apply to the consumer while it is sitting in the pool. So with the configuration you described, I think the consumer in the pool would send heartbeats and stay active for 5 minutes, then when the processor executed 5 minutes later it would have to create a new consumer from scratch.
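A rough sketch of that pooling behavior (the names here are illustrative, not NiFi's actual internal classes):

```java
import java.util.Properties;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.KafkaConsumer;

class SimpleConsumerPool {
    private final ConcurrentLinkedQueue<Consumer<byte[], byte[]>> pool = new ConcurrentLinkedQueue<>();
    private final Properties config;

    SimpleConsumerPool(Properties config) {
        this.config = config;
    }

    Consumer<byte[], byte[]> borrow() {
        Consumer<byte[], byte[]> consumer = pool.poll();
        // First execution (or an expired consumer) creates one from scratch,
        // which means rejoining the consumer group and triggering a rebalance.
        return consumer != null ? consumer : new KafkaConsumer<>(config);
    }

    void giveBack(Consumer<byte[], byte[]> consumer) {
        // Between onTrigger executions the consumer sits here; whether it keeps
        // heart-beating while idle depends on the client version and settings.
        pool.offer(consumer);
    }
}
```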
11-08-2016 07:59 PM
3 Kudos
The easiest way is to use the processor archetype described here: https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions
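That page boils down to one Maven command; the version numbers below are illustrative and should match your NiFi version:

```bash
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.nifi \
  -DarchetypeArtifactId=nifi-processor-bundle-archetype \
  -DarchetypeVersion=1.0.0 \
  -DnifiVersion=1.0.0
```

Maven will then prompt for the groupId, artifactId, version, artifactBaseName, and package of your new bundle.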
11-08-2016 05:24 PM
In AttributesToJSON, is the Destination property set to flow file content or flow file attribute? It would need to be set to flow file content for it to work with PutHBaseJson.
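With Destination set to flow file content, AttributesToJSON writes a flat JSON object of the selected attributes into the content, along these lines (the attribute names here are made up):

```json
{
  "id" : "row-1234",
  "temperature" : "72",
  "status" : "OK"
}
```

PutHBaseJson can then map each field to a column, with one field (e.g. "id", named in its Row Identifier Field Name property) used as the row id.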
11-08-2016 03:30 PM
The answer here describes what needs to be done: https://community.hortonworks.com/questions/63180/error-in-nifi-flow.html#answer-63240
There shouldn't be much difference between clustered and standalone, other than having to repeat the same steps on multiple nodes when clustered.