Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4261 | 12-03-2018 02:26 PM |
| | 3203 | 10-16-2018 01:37 PM |
| | 4309 | 10-03-2018 06:34 PM |
| | 3165 | 09-05-2018 07:44 PM |
| | 2425 | 09-05-2018 07:31 PM |
11-14-2016 04:46 PM
3 Kudos
I guess it depends on what you are trying to achieve. You are right that you would get an email per node in your cluster, but each email would cover only the errors on that node, so it's not as if you are getting 8 copies of the same email. If you really want only one email, it would probably be easiest to use something in between as a buffer for all your errors. For example, have a Kafka topic like "errors" and have each node in your cluster publish to it with PublishKafka. Then have a ConsumeKafka that runs only on the primary node, merge together some amount of errors (for example with MergeContent), and send the result to PutEmail. You could do the same thing with JMS, a shared filesystem, or anywhere else you can put the errors and later retrieve them. See the sketch below.
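As a standalone sketch of that buffering idea (inside NiFi, ConsumeKafka + MergeContent + PutEmail would do the equivalent for you), here is roughly what the primary-node consumer side does; the broker address, topic name, batch size, and sendOneEmail() helper are all illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ErrorDigest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // illustrative broker
        props.put("group.id", "error-digest");         // one group reads everyone's errors
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // every node publishes its errors to this topic via PublishKafka
            consumer.subscribe(Collections.singletonList("errors"));
            List<String> batch = new ArrayList<>();
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(1000)) {
                    batch.add(record.value());
                }
                if (batch.size() >= 50) {                   // merge "some amount" of errors...
                    sendOneEmail(String.join("\n", batch)); // ...and send a single email
                    batch.clear();
                }
            }
        }
    }

    // placeholder for whatever mail client you use (PutEmail handles this in NiFi)
    private static void sendOneEmail(String body) {
        System.out.println("EMAIL:\n" + body);
    }
}
```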
11-14-2016 02:58 PM
2 Kudos
ConvertCSVToAvro attempts to convert each record. If at least one record was converted successfully, it transfers a flow file to "success"; if any records could not be converted, it also transfers a flow file to "incompatible" that has all of the original CSV in its content and a summary of the incompatible records in an attribute called "errors". I think you would need to implement a custom version of this processor that only transfers the flow file to success when the error count equals 0.
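If it helps, here is a rough sketch of what that stricter routing could look like; this is not the actual Kite-based implementation, and convertRecords()/ConversionResult are stand-ins for the real conversion logic:

```java
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class StrictConvertCSVToAvro extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success").description("All records converted").build();
    static final Relationship REL_INCOMPATIBLE = new Relationship.Builder()
            .name("incompatible").description("One or more records failed").build();

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        FlowFile original = session.get();
        if (original == null) {
            return;
        }

        // Stand-in for the Kite CSV-to-Avro conversion; assume it returns
        // the converted flow file plus a count of records that failed.
        ConversionResult result = convertRecords(session, original);

        if (result.failedRecords == 0) {
            session.transfer(result.converted, REL_SUCCESS);
            session.remove(original);
        } else {
            // Unlike the stock processor, do NOT also send a partial success;
            // route the original CSV to "incompatible" whenever anything failed.
            session.remove(result.converted);
            FlowFile flagged = session.putAttribute(original, "errors",
                    result.failedRecords + " record(s) failed conversion");
            session.transfer(flagged, REL_INCOMPATIBLE);
        }
    }

    // --- hypothetical helper types below ---
    private static class ConversionResult {
        FlowFile converted;
        long failedRecords;
    }

    private ConversionResult convertRecords(ProcessSession session, FlowFile in) {
        throw new UnsupportedOperationException("conversion logic elided");
    }
}
```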
11-11-2016 07:19 PM
2 Kudos
In existing releases you cannot use CRON scheduling together with Primary Node Only. An enhancement to support this was merged to master recently and will be included in the Apache NiFi 1.1 release; the JIRA ticket is https://issues.apache.org/jira/browse/NIFI-401
11-10-2016 08:59 PM
ConsumeKafka is implemented with "at-least-once" guarantees: it writes the data to a flow file and commits the session for that flow file before committing the offsets, to ensure no data loss. If an error happens while committing the offsets, as can happen with a rebalance, the flow file has already been committed and transferred to success and can't be undone. If it were done in the reverse order (commit offsets first, then commit the flow file session) you could route to failure when committing the offsets failed, but you would also risk data loss if NiFi crashed after committing the offsets but before committing the flow file session; that would be "at-most-once" guarantees.
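A condensed sketch of that ordering (not the actual ConsumeKafka source, just the two commits in the order described):

```java
import org.apache.kafka.clients.consumer.CommitFailedException;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;

class AtLeastOnceSketch {
    static final Relationship REL_SUCCESS = new Relationship.Builder().name("success").build();

    void pollOnce(KafkaConsumer<byte[], byte[]> consumer, ProcessSession session) {
        ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
        if (records.isEmpty()) {
            return;
        }

        FlowFile flowFile = session.create();
        flowFile = session.write(flowFile, out -> {
            for (ConsumerRecord<byte[], byte[]> record : records) {
                out.write(record.value());
                out.write('\n');
            }
        });
        session.transfer(flowFile, REL_SUCCESS);
        session.commit();          // 1) data is durable in NiFi's repositories first

        try {
            consumer.commitSync(); // 2) only then advance the Kafka offsets
        } catch (CommitFailedException e) {
            // A rebalance landed between the two commits. The flow file is
            // already committed and cannot be undone, so the same records may
            // arrive again: duplicates, but never loss (at-least-once).
        }
    }
}
```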
11-10-2016 05:34 PM
ConsumeKafka commits the offsets to Kafka right after the data has been written to a flow file and the session for that flow file has been committed. This way there is no chance of the data being lost before the offsets are committed to Kafka, because the data has already been persisted to NiFi's repositories. Currently there is no concept of treating a series of processors as one operation. Right now you can think of it as two separate transfers of data: the first from Kafka to NiFi, the second from NiFi to HDFS.
11-10-2016 01:52 PM
If it fails to commit the batch, then the data is still in Kafka because the offset wasn't updated, so it would be pulled again next time after the rebalance.
11-10-2016 01:51 PM
ConsumeKafka keeps a pool of consumers behind the scenes, equal to the number of concurrent tasks for that processor instance. So in a simple case with ConsumeKafka having 1 concurrent task, the first time it executes it will ask the pool for a consumer; there will be none the first time through, so it will create a new one, consume the data from Kafka, and then put it back in the pool for next time. From reading Kafka's documentation (https://kafka.apache.org/documentation) I would expect the session timeout and heartbeat to apply to the consumer while it is sitting in the pool. So with the configuration you described, I think the consumer in the pool would send heartbeats and stay active for 5 minutes, then when the processor executed 5 minutes later it would have to create a new consumer from scratch.
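A rough sketch of that pooling behavior (the names here are illustrative, not NiFi's actual internal classes):

```java
import java.util.Properties;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.KafkaConsumer;

class SimpleConsumerPool {
    private final ConcurrentLinkedQueue<Consumer<byte[], byte[]>> pool = new ConcurrentLinkedQueue<>();
    private final Properties config;

    SimpleConsumerPool(Properties config) {
        this.config = config;
    }

    Consumer<byte[], byte[]> borrow() {
        Consumer<byte[], byte[]> consumer = pool.poll();
        // First execution (or an expired consumer) creates one from scratch,
        // which means rejoining the consumer group and triggering a rebalance.
        return consumer != null ? consumer : new KafkaConsumer<>(config);
    }

    void giveBack(Consumer<byte[], byte[]> consumer) {
        // Between onTrigger executions the consumer sits here; whether it keeps
        // heart-beating while idle depends on the client version and settings.
        pool.offer(consumer);
    }
}
```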
11-08-2016 07:59 PM
3 Kudos
The easiest way is to use the processor archetype described here: https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions
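That page boils down to one Maven command; the version numbers below are illustrative and should match your NiFi version:

```bash
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.nifi \
  -DarchetypeArtifactId=nifi-processor-bundle-archetype \
  -DarchetypeVersion=1.0.0 \
  -DnifiVersion=1.0.0
```

Maven will then prompt for the groupId, artifactId, version, artifactBaseName, and package of your new bundle.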
11-08-2016 05:24 PM
In AttributesToJSON, is the Destination property set to flow file content or flow file attribute? It would need to be set to flow file content for it to work with PutHBaseJson.
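With Destination set to flow file content, AttributesToJSON writes a flat JSON object of the selected attributes into the content, along these lines (the attribute names here are made up):

```json
{
  "id" : "row-1234",
  "temperature" : "72",
  "status" : "OK"
}
```

PutHBaseJson can then map each field to a column, with one field (e.g. "id", named in its Row Identifier Field Name property) used as the row id.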
11-08-2016 03:30 PM
The answer here describes what needs to be done: https://community.hortonworks.com/questions/63180/error-in-nifi-flow.html#answer-63240
There shouldn't be much difference between clustered and standalone, other than having to repeat the same steps on multiple nodes when clustered.