Member since: 09-29-2015
Posts: 871
Kudos Received: 721
Solutions: 255
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2634 | 12-03-2018 02:26 PM |
| | 1728 | 10-16-2018 01:37 PM |
| | 3119 | 10-03-2018 06:34 PM |
| | 1861 | 09-05-2018 07:44 PM |
| | 1467 | 09-05-2018 07:31 PM |
04-27-2017
12:41 PM
1 Kudo
When to use "primary node only" depends on whether the operation is something that makes sense to happen on all nodes, or something that only makes sense to happen once. Here are some examples:

- ListHDFS - this should be primary node only, because otherwise you are going to perform the same listing on all nodes.
- ConsumeKafka - this can run on all nodes, because each one will be consuming different data.
- GetFile - this can run on all nodes, because each node will pick up different data from a local directory.

In your Kafka scenario, the number of instances of a processor equals what you see on the graph times the number of nodes in the cluster, so if you have a two-node cluster with one ConsumeKafka_0_10 on the canvas, then there are two instances of ConsumeKafka_0_10. If you increase concurrent tasks to 3, then there are 3 threads executing each instance on each node, so 6 total. Since you have 6 partitions, each of these 6 threads should consume from a separate partition.
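The instance/thread arithmetic above can be sketched as a toy calculation (this is not NiFi API code; the numbers match the two-node, three-concurrent-task example):

```java
// Toy calculation of total consuming threads in a NiFi cluster:
// one processor on the canvas becomes one instance per node, and
// each instance runs "concurrent tasks" threads.
public class ThreadMath {
    public static void main(String[] args) {
        int nodes = 2;            // nodes in the cluster
        int concurrentTasks = 3;  // Concurrent Tasks setting on the processor
        int totalThreads = nodes * concurrentTasks;
        System.out.println(totalThreads); // with 6 Kafka partitions, one per thread
    }
}
```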
04-25-2017
02:21 PM
3 Kudos
You may also want to check out ListDatabaseTables, which periodically performs a listing of all the database tables to query: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ListDatabaseTables/index.html Each flow file will get an attribute "db.table.name", and you would have to figure out how to create the appropriate SQL for each table and pass it to ExecuteSQL, referencing ${db.table.name} in the SQL.
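As a hypothetical sketch of the per-table SQL you would end up generating: only the db.table.name attribute comes from ListDatabaseTables; the table names and query shape below are illustrative.

```java
// Illustrative only: builds the kind of query ExecuteSQL would run when
// its SQL property references ${db.table.name}. In NiFi this substitution
// is done by the expression language, not by your own code.
public class PerTableSql {
    static String buildQuery(String tableName) {
        return "SELECT * FROM " + tableName;
    }

    public static void main(String[] args) {
        // example values the db.table.name attribute might hold
        String[] tables = {"customers", "orders"};
        for (String t : tables) {
            System.out.println(buildQuery(t));
        }
    }
}
```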
04-21-2017
09:15 AM
2 Kudos
I believe your understanding is correct... The flow file repository is like a transaction log that records where every flow file is in the flow. The session object provided to processors lets them perform a transaction.

Processors in the middle of the flow have a queue leading into them... when they execute, they take one or more flow files off the queue, operate on them, and transfer them to a relationship. If everything worked successfully, session.commit() is called, which updates the flow file repository and places these flow files into their next queue. If an error happened, the session is rolled back and the flow files end up back in the original queue. If NiFi shuts down or crashes while the processor is operating on flow files, but before session.commit(), they end up back in the original queue like any other error.

For source or destination processors, a lot depends on the source or destination system and what protocol is being used to exchange data. NiFi can only provide guarantees that are as good as the protocol provides. For example, in the ListenTCP case, if NiFi crashes at the exact moment that it has read a message off the socket, but before it has written it to a flow file, then this message is lost, because as far as the TCP connection is concerned the delivery was successful. This is why application protocols like RELP were built on top of TCP to offer two-phase commits, so that the receiver can acknowledge not only that it read the message off the socket, but also that it performed any additional operations successfully.
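The commit/rollback behavior described above can be modeled with plain Java queues. This is a toy model, not the real NiFi framework: poll() stands in for session.get(), and the two branches stand in for commit and rollback.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of NiFi session semantics: a flow file is taken off the input
// queue inside a transaction; on success it moves to the next queue, and on
// failure (or a crash before commit) it ends up back on the input queue.
public class SessionModel {
    static Deque<String> inputQueue = new ArrayDeque<>();
    static Deque<String> nextQueue = new ArrayDeque<>();

    static void process(boolean fail) {
        String flowFile = inputQueue.poll();   // like session.get()
        if (flowFile == null) return;
        if (fail) {
            inputQueue.addFirst(flowFile);     // like session.rollback()
        } else {
            nextQueue.add(flowFile);           // like transfer + session.commit()
        }
    }

    public static void main(String[] args) {
        inputQueue.add("ff-1");
        process(true);   // error: flow file goes back to the original queue
        System.out.println(inputQueue.size() + "," + nextQueue.size());
        process(false);  // success: flow file moves to the next queue
        System.out.println(inputQueue.size() + "," + nextQueue.size());
    }
}
```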
04-20-2017
09:16 PM
3 Kudos
This is normal behavior; typically you just call session.get(), and if the flow file returned is null, you return from the processor: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutFile.java#L181-L184 The most common reason for this is when concurrent tasks is set to more than 1 for a processor with an input queue. In that case, the framework might trigger both threads to execute, but thread 1 might grab the only flow file (or all the available flow files), and by the time thread 2 executes there is nothing left.
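A runnable toy version of that race (worker numbers and queue contents are made up; poll() plays the role of session.get()):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Toy illustration of why session.get() can return null: two workers are
// triggered, but the queue holds only one item, so the second worker sees
// null and should simply return, just like the PutFile code linked above.
public class NullGetDemo {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        queue.add("flowfile-1");

        for (int worker = 1; worker <= 2; worker++) {
            String flowFile = queue.poll();   // analogous to session.get()
            if (flowFile == null) {
                System.out.println("worker " + worker + ": nothing to do, returning");
                continue;                     // the processor would just return
            }
            System.out.println("worker " + worker + ": processing " + flowFile);
        }
    }
}
```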
04-20-2017
09:08 PM
Basically, I'm just looking for a thread that could be stuck and preventing something from happening. Unfortunately, there isn't anything specific I can tell you to look for; maybe a stack that looks very different from all the other threads, or one that always seems to be in the exact same spot every time you run the dump. You could look for RouteOnAttribute in the dump, but since you said the processor itself doesn't show any threads running, I doubt that will show up. Still worth a try.
04-20-2017
07:44 PM
When it's stuck, can you run "bin/nifi.sh dump", grab the entire thread dump from nifi-bootstrap.log, and attach it? Also, what version of NiFi?
04-20-2017
04:53 PM
Using ReplaceText with the Replacement Strategy set to Prepend and the Evaluation Mode set to Entire Text will put the Replacement Value at the beginning of the content. The same thing can be done with a Replacement Strategy of Append to place the replacement at the end. Alternatively, if you are using MergeContent (if I remember correctly), you can set the Delimiter Strategy to Text and use the Header or Footer property to enter a new line. You can use shift+enter as the property value for the Header or Footer to create a new line.
04-20-2017
01:38 PM
2 Kudos
I wrote this on a different question yesterday, but it relates to the same issue. Regarding PutHDFS and appending, I believe this is expected behavior... PutHDFS has no idea what it is writing to HDFS; it's just writing bytes, which may or may not represent text. If you were appending parts of an image or video, there would be no such thing as new lines. If you want a new line when you start appending, then you need the previously written data to end with a new line, or the next data to start with one. This should be easy to do by manipulating the data in the flow before PutHDFS.
04-19-2017
01:46 PM
This is described in the user guide here: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Controller_Services Controller services for processors must be defined from the context palette on the left.
04-12-2017
03:54 PM
2 Kudos
This means you have set -Xms (the initial heap size) larger than -Xmx (the maximum heap size).
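For reference, NiFi's heap sizes are set in conf/bootstrap.conf via the java.arg.* properties; a configuration like the following would produce that error (the values here are illustrative):

```
# conf/bootstrap.conf (illustrative values)
java.arg.2=-Xms1024m   # initial heap size
java.arg.3=-Xmx512m    # max heap size -- smaller than -Xms, so the JVM refuses to start
```

Make -Xmx greater than or equal to -Xms and restart NiFi.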