About bbende

bbende · ‎10-11-2016

This is most likely a dependency issue between your custom NAR and the controller service. Please see this wiki page: https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions#MavenProjectsforExtensions-LinkingProcessorsandControllerServices and this example: https://github.com/bbende/nifi-dependency-example

bbende · ‎10-07-2016

The Sandbox comes with Java 7, and HDF 2.0 requires Java 8. I can't speak for all the components in HDF, but for NiFi, if you extract a JDK 8 somewhere on the sandbox, you can set NiFi's JAVA_HOME in bin/nifi-env.sh.

bbende · ‎10-06-2016

The way to set this up in Apache NiFi 1.0.0 is to use a single NiFi cluster with process groups for each team, and restrict the permissions appropriately so that members of a team can only work within their given process group. See the "Multi-Tenancy" section of this post for an example: http://bryanbende.com/development/2016/08/17/apache-nifi-1-0-0-authorization-and-multi-tenancy

bbende · ‎10-06-2016

I'm not sure I understand the question. Are you asking if you can run two NiFi clusters on the same physical hardware? If so, what would be the reason for doing this?

bbende · ‎10-06-2016

There could be many other things wrong, but in your ZooKeeper properties, the servers should be the same on every node. If instance #1 is pointing to 2888:3888 for all three ZK servers, then instance #2 and #3 should also be pointing to the same thing.
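
As a rough illustration of that check, here is a minimal Python sketch that compares the `server.N` entries from each node's ZooKeeper properties. The hostnames, the sample file contents, and the helper names `zk_servers` and `nodes_agree` are all hypothetical; only the `server.N=host:port:port` line format comes from ZooKeeper itself.

```python
def zk_servers(properties_text):
    """Return the server.N entries of a ZooKeeper properties file as a dict."""
    servers = {}
    for line in properties_text.splitlines():
        line = line.strip()
        # Keep only server.N=host:port:port lines; comments and other keys are skipped.
        if line.startswith("server.") and "=" in line:
            key, value = line.split("=", 1)
            servers[key.strip()] = value.strip()
    return servers


def nodes_agree(*properties_texts):
    """True when every node lists exactly the same ZooKeeper servers."""
    parsed = [zk_servers(text) for text in properties_texts]
    return all(p == parsed[0] for p in parsed)


# Hypothetical ZooKeeper properties for two nodes (hostnames are made up).
NODE_1 = """
clientPort=2181
server.1=nifi-host-1:2888:3888
server.2=nifi-host-2:2888:3888
server.3=nifi-host-3:2888:3888
"""
# Same file, but one node points server.2 at different ports.
NODE_2_BAD = NODE_1.replace(
    "server.2=nifi-host-2:2888:3888", "server.2=nifi-host-2:2889:3889"
)

print(nodes_agree(NODE_1, NODE_1))      # True: identical server lists
print(nodes_agree(NODE_1, NODE_2_BAD))  # False: one node disagrees
```

If the second call's situation matches your setup, fix the node that differs so all three list the same servers.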

bbende · ‎10-05-2016

Yup, if you specify the Message Demarcator, it will write all of the messages received in a single poll from Kafka to a single FlowFile, so it would be a maximum of "Max Poll Records" per FlowFile, but could be fewer if it received fewer from Kafka.
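
To make the batching concrete, here is a small Python model of that behavior. It is only a sketch of the idea, not NiFi's actual code: the function name `flowfile_contents`, the sample messages, and the poll sizes are made up for illustration.

```python
def flowfile_contents(polls, demarcator="\n"):
    """Model one FlowFile per poll: that poll's messages joined by the demarcator."""
    # An empty poll produces no FlowFile in this model.
    return [demarcator.join(messages) for messages in polls if messages]


# Hypothetical polls with "Max Poll Records" assumed to be 3: the first poll
# is full, the second returned fewer records because Kafka had fewer ready.
polls = [["msg-1", "msg-2", "msg-3"], ["msg-4"]]

for content in flowfile_contents(polls):
    print(repr(content))
# 'msg-1\nmsg-2\nmsg-3'
# 'msg-4'
```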

bbende · ‎10-05-2016

I think there are two different things here: one is understanding the performance, and the other is validating the data/correctness. For performance, you should be able to get a good idea by looking at the various statistics in NiFi. There are stats on each processor and process group, and on the Summary page in the global menu, and they all show things like FlowFiles in/out and bytes in/out. For validating the data, I don't know that there is a great way to do this other than checking what ends up in HDFS. Most of the time a dataflow is a continuous stream of data and not really a discrete set where you know how many records to expect. One last thing to mention: do you really want to write 1 million small files to HDFS? You would probably get a lot better performance by using the batching capability on ConsumeKafka to write a couple of hundred, or even a couple of thousand, messages to a single flow file.

bbende · ‎10-05-2016

There was a discussion about this at one point which resulted in this JIRA: https://issues.apache.org/jira/browse/NIFI-1924 It was determined that rather than creating new processors, it should be possible to change the scheme of the filesystem from hdfs:// to webhdfs:// and still use the existing processors. It is unclear to me whether this ended up fully working or not.

bbende · ‎10-04-2016

This blog has an explanation of how to scale ConsumeKafka across a cluster: http://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka If you have a 4 node NiFi cluster with 4 ConsumeKafka processors for the same topic and group, then you have 16 consumers for a single topic with 3 partitions, so only 3 out of 16 are actually doing anything. If you are going to stick with 3 partitions, then you should have 1 ConsumeKafka processor, which means 4 consumers in total (one per node). If you want to write to 4 different HDFS directories, you can connect the success relationship of the 1 ConsumeKafka to 4 different PutHDFS processors. If you wanted more parallelism, you would have to increase the number of partitions.
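
The arithmetic above can be sketched in a few lines of Python. This assumes one concurrent task per processor and relies on the fact that, within a consumer group, each partition is assigned to at most one consumer; the function name `consumer_counts` is made up for illustration.

```python
def consumer_counts(nodes, processors_per_node, partitions):
    """Return (total, active, idle) consumers for one topic and one group.

    Assumes one concurrent task per processor, so each processor on each
    node counts as a single Kafka consumer.
    """
    total = nodes * processors_per_node
    # A partition is read by only one consumer in the group, so at most
    # `partitions` consumers can be doing work.
    active = min(total, partitions)
    return total, active, total - active


print(consumer_counts(nodes=4, processors_per_node=4, partitions=3))  # (16, 3, 13)
print(consumer_counts(nodes=4, processors_per_node=1, partitions=3))  # (4, 3, 1)
```

Even with a single processor, one of the four consumers sits idle until the topic has at least 4 partitions.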

bbende · ‎10-03-2016

This is a known issue in the 1.0.0 release when running a single-node cluster: https://issues.apache.org/jira/browse/NIFI-2777 You can set nifi.cluster.is.node to false to change to standalone mode.

Last Visited	‎09-10-2020 01:23 PM

Member Since	‎09-29-2015 04:02 PM
Posts	871
Kudos received	709

Cloudera Community

Re: Using nifi registry in a nifi cluster.

Re: Is there a way to enable a stateful status upd...

Re: Automated Start/Stop of a NiFi Processor

Re: PublishKafkaRecord_0_10 1.2.0.3.0.1.1-5 Error:...

Re: how to configure mergecontent processor

Re: Custom Nifi Processor can't see existing Contr...

Re: I am getting . "org/apache/nifi/bootstrap/RunN...

Re: NIFI : Cluster with Multiple Instance

Re: NIFI : Cluster with Multiple Instance

Re: NIFI : Cluster with Multiple Instance

Re: Best way to check nifi cluster performance.

Re: Best way to check nifi cluster performance.

Re: How to write files into WebHDFS with Nifi?

Re: Best way to check nifi cluster performance.

Re: Nifi: how to set the cluster node identifier?