10-11-2016
01:28 PM
3 Kudos
This is most likely a dependency issue between your custom NAR and the controller service. Please see this Wiki page: https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions#MavenProjectsforExtensions-LinkingProcessorsandControllerServices and this example: https://github.com/bbende/nifi-dependency-example
10-07-2016
08:50 PM
1 Kudo
The Sandbox comes with Java 7 and HDF 2.0 requires Java 8. I can't speak for all the components in HDF, but for NiFi, if you extract a JDK 8 somewhere on the sandbox, you can set NiFi's JAVA_HOME in bin/nifi-env.sh.
10-06-2016
01:25 PM
The way to set this up in Apache NiFi 1.0.0 is to use a single NiFi cluster with a process group for each team, and restrict the permissions appropriately so that members of a team can only work within their given process group. See the "Multi-Tenancy" section of this post for an example: http://bryanbende.com/development/2016/08/17/apache-nifi-1-0-0-authorization-and-multi-tenancy
10-06-2016
12:36 PM
I'm not sure I understand the question. Are you asking if you can run two NiFi clusters on the same physical hardware? If so, what would be the reason for doing this?
10-06-2016
12:21 PM
There could be many other things wrong, but in your ZooKeeper properties, the servers should be the same on every node. If instance #1 is pointing to 2888:3888 for all three ZK servers, then instance #2 and #3 should also be pointing to the same thing.
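As a rough way to check this, the sketch below compares the `server.N` entries across the ZooKeeper properties of each node. It assumes the entries use the usual `server.N=host:port:port` form; the function names, node names, and hostnames are hypothetical and only there for illustration.

```python
def zk_servers(properties_text):
    """Return the server.N entries of a ZooKeeper properties file as a dict."""
    servers = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("server."):
            servers[key] = value.strip()
    return servers


def mismatched_nodes(configs):
    """Given {node_name: properties_text}, return the sorted names of the
    nodes whose server list differs from the first node's list."""
    names = list(configs)
    reference = zk_servers(configs[names[0]])
    return sorted(n for n in names[1:] if zk_servers(configs[n]) != reference)


if __name__ == "__main__":
    # Hypothetical hosts; node3 has drifted from the other two.
    good = "server.1=host1:2888:3888\nserver.2=host2:2888:3888\nserver.3=host3:2888:3888\n"
    bad = "server.1=host1:2888:3888\nserver.2=host2:2889:3889\nserver.3=host3:2888:3888\n"
    print(mismatched_nodes({"node1": good, "node2": good, "node3": bad}))
```

An empty result means every node lists the same ZK servers, which is what you want.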
10-05-2016
03:19 PM
Yup, if you specify the Message Demarcator it will write all of the messages received in a single poll from Kafka to a single FlowFile, so it would be a maximum of "Max Poll Records" per flow file, but could be fewer if it received fewer from Kafka.
10-05-2016
02:26 PM
I think there are two different things here: one is understanding the performance, and the other is validating the data/correctness. For performance, you should be able to get a good idea by looking at the various statistics in NiFi. There are stats on each processor and process group, and on the Summary page in the global menu, and they all show things like FlowFiles in/out and bytes in/out. For validating the data, I don't know that there is a great way to do this other than checking what ends up in HDFS. Most of the time a dataflow is a continuous stream of data and not really a discrete set where you know how many records to expect. One last thing to mention: do you really want to write 1 million small files to HDFS? You would probably get much better performance by using the batching capability of ConsumeKafka to write a couple of hundred, or even a thousand, messages to a single flow file.
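To put a number on the small-files point, here is a back-of-the-envelope sketch. It assumes a Message Demarcator is set, so each flow file holds at most "Max Poll Records" messages; the function name is hypothetical, and the result is only a lower bound because a poll can return fewer messages than the maximum.

```python
def min_flow_files(total_messages, max_poll_records):
    """Lower bound on the number of flow files (and so HDFS files) produced
    when each flow file can hold at most max_poll_records messages."""
    if total_messages < 0 or max_poll_records < 1:
        raise ValueError("total_messages must be >= 0 and max_poll_records >= 1")
    # Integer ceiling division, avoids floating point rounding.
    return -(-total_messages // max_poll_records)


if __name__ == "__main__":
    # One message per flow file versus batches of up to 1000 messages.
    print(min_flow_files(1_000_000, 1))
    print(min_flow_files(1_000_000, 1000))
```

With one message per flow file you get 1,000,000 files; with batches of up to 1000 messages the lower bound drops to 1000.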
10-05-2016
01:53 PM
1 Kudo
There was a discussion about this at one point which resulted in this JIRA: https://issues.apache.org/jira/browse/NIFI-1924 It was determined that rather than creating new processors, it should be possible to change the scheme of the filesystem from hdfs:// to webhdfs:// and still use the existing processors. It is unclear to me whether this ended up fully working or not.
10-04-2016
04:43 PM
2 Kudos
This blog has an explanation of how to scale ConsumeKafka across a cluster: http://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka If you have a 4 node NiFi cluster with 4 ConsumeKafka processors for the same topic and group, then you have 16 consumers for a single topic with 3 partitions, so only 3 out of 16 are actually doing anything. If you are going to stick with 3 partitions, then you should have 1 ConsumeKafka processor, which means 4 consumers in total (one per node). If you want to write to 4 different HDFS directories, you can connect the success relationship of the 1 ConsumeKafka to 4 different PutHDFS processors. If you wanted more parallelism, you would have to increase the number of partitions.
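The arithmetic above can be sketched as follows. It relies on the fact that, within one consumer group, Kafka assigns each partition to at most one consumer, so any consumers beyond the partition count sit idle. The function name is hypothetical, and the `concurrent_tasks` parameter is an assumption added for completeness; the numbers in the post correspond to 1 concurrent task per processor.

```python
def consumer_usage(nodes, processors_per_node, partitions, concurrent_tasks=1):
    """Count total, active, and idle consumers in one consumer group.

    Every ConsumeKafka processor runs on every node of the cluster, so the
    total is nodes * processors_per_node * concurrent_tasks. At most one
    consumer per partition can be active.
    """
    total = nodes * processors_per_node * concurrent_tasks
    active = min(total, partitions)
    return {"total": total, "active": active, "idle": total - active}


if __name__ == "__main__":
    # 4 nodes, 4 processors, 3 partitions: the setup from the question.
    print(consumer_usage(nodes=4, processors_per_node=4, partitions=3))
    # 4 nodes, 1 processor, 3 partitions: the suggested setup.
    print(consumer_usage(nodes=4, processors_per_node=1, partitions=3))
```

Even with a single processor, one of the four consumers stays idle while the topic has only 3 partitions.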
10-03-2016
03:58 PM
3 Kudos
This is a known issue in the 1.0.0 release when running a single-node cluster: https://issues.apache.org/jira/browse/NIFI-2777 You can set nifi.cluster.is.node to false to change to standalone mode.