Member since: 07-30-2019
Posts: 3406
Kudos Received: 1621
Solutions: 1006

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 27 | 12-17-2025 05:55 AM |
|  | 88 | 12-15-2025 01:29 PM |
|  | 43 | 12-15-2025 06:50 AM |
|  | 199 | 12-05-2025 08:25 AM |
|  | 339 | 12-03-2025 10:21 AM |
03-29-2018
11:34 AM
@Carrick That property does support Expression Language, but you need to understand what the property does with the EL you provided.

${trigger.type} is an Expression Language statement. It asks NiFi to look for an attribute named trigger.type on the incoming FlowFile, then in the variables of the enclosing Process Group, then in the custom variable registry file, then in the JVM system properties, and finally in the NiFi user's environment variables (checked in that order; the first match ends the search). If nothing is found, nothing is returned. If a match is found, the value assigned to that attribute is returned.

So in your case, what is being returned is 'mouse' or 'keyboard', and the MergeContent processor then uses 'mouse' or 'keyboard' as the correlation attribute name. Since there is no attribute named 'mouse' or 'keyboard' on your FlowFiles, they all correlate on a null value and end up in the same bin. As soon as you removed the EL and simply named the attribute to use for correlation, it worked.

Thanks, Matt
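To make the difference concrete, a minimal sketch of the two configurations (the attribute name trigger.type and the values mouse/keyboard come from this thread):

```
Correlation Attribute Name : trigger.type      <- literal name; bins by the attribute's value
Correlation Attribute Name : ${trigger.type}   <- EL evaluates first, returning "mouse" or
                                                  "keyboard"; MergeContent then looks for an
                                                  attribute with THAT name, finds none, and
                                                  bins every FlowFile together
```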
03-27-2018
01:12 PM
@Carrick NiFi will merge a bin that has met its minimum as part of a thread execution. Let's assume a steady stream of FlowFiles is entering the incoming connection queue feeding the MergeContent processor. When MergeContent runs (obtains a thread), it looks at the incoming queue and grabs from the active queue only those FlowFiles which are there at that exact moment in time. That thread will not know about any FlowFiles that enter the queue after that moment. Upon placing those FlowFiles in one or more bins, that same thread will assess whether a bin has satisfied the minimum requirements, and if so, the bin will be merged.

Now consider a MergeContent processor with a run schedule of 0 sec (the default). It will be requesting tasks as fast as possible, which means each executed thread may see as few as one new FlowFile in the incoming queue when it runs. That means you could end up with merged FlowFiles that consist of only one FlowFile.

Now let's assume MergeContent runs only every 1 minute, and in between two executions 10,000 new FlowFiles queue on the incoming connection. On the next run, the MergeContent thread sees 10,000 new FlowFiles. It will allocate 6,000 to one bin (because you set a max) and place the other 4,000 in another bin. At the end of the thread, both bins are eligible to be merged because they both met the min, but as you can see, one did end up with 6,000 FlowFiles in it.

- If the intent is never to have one FlowFile in a merge, do not set min to 1.
- If the flow feeding MergeContent is slow, change the run schedule so it does not run as often, allowing more FlowFiles to queue between executions.
- If setting min to any value beyond 1, make sure you also set Max Bin Age. This setting makes sure that a bin will eventually be merged even if it never meets the configured minimums.

Hope this clarifies how this processor works; a configuration sketch follows below. Thanks, Matt
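A minimal MergeContent configuration sketch for the guidance above; the values are illustrative, not taken from this thread:

```
# MergeContent scheduling and binning (illustrative values)
Run Schedule              : 60 sec   # let FlowFiles accumulate between executions
Minimum Number of Entries : 100      # never merge a bin of one
Maximum Number of Entries : 6000     # cap on bin size
Max Bin Age               : 5 min    # merge the bin eventually even if min is never met
```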
03-27-2018
12:33 PM
@Jayendra Patil You currently have 120 set as your "Maximum Timer Driven Thread Count". Multiply that by the number of nodes in your NiFi cluster to get the maximum number of usable threads cumulatively across your cluster (see the worked example below). Then look at the info bar across the top of your canvas: does it look like your dataflow is using all the threads you have allocated?

You may need to adjust your processor configurations to maximize thread usage. Look for bottlenecks in your dataflow (queues built up in front of processors). What kinds of processors are reading from these built-up queues? How have they been configured? Just because you allocated more available threads does not mean NiFi processors will automatically start using them, or even be allowed to use them.
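As a worked example (the node count here is assumed for illustration):

```
Max Timer Driven Thread Count : 120 per node
Nodes in cluster (assumed)    : 3
Usable threads, cluster-wide  : 120 x 3 = 360
```

To let a single bottleneck processor use more of those threads, you can raise its Concurrent Tasks setting on the processor's Scheduling tab.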
03-23-2018
02:31 PM
@Mark Lin
Did you look at the MonitorActivity processor? You can set a threshold after which it will send out an inactive message (example message: "Have not seen any failed FlowFiles for X amount of time"). Then, when data starts failing again, it will trigger an activity.restored message (example message: "Seeing failed FlowFiles now"). This processor can be configured to create the above messages only once, and it could fit into a failure loop as shown above. Thanks, Matt
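A minimal MonitorActivity sketch for this pattern (the threshold value is illustrative):

```
# MonitorActivity (illustrative threshold)
Threshold Duration        : 5 min    # how long without FlowFiles before "inactive" fires
Continually Send Messages : false    # emit the inactive message once, not on every run
# Route the "inactive" relationship to a PutEmail for the alert, and the
# "activity.restored" relationship to a PutEmail for the all-clear message.
```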
03-23-2018
12:35 PM
1 Kudo
@Mark Lin Your PutMongo processor could route to failure for many reasons (it may not even be an issue with MongoDB itself), for example a network outage or network issue during transfer. With a stopped dummy processor, you end up stalling delivery of files that would otherwise be successful on a retry.

I suggest a slightly more involved failure loop design: one where you retry the FlowFiles X number of times before triggering an email or directing the FlowFiles to a holding queue. Inside the "Retry Check Loop" process group I have the following flow (see the sketch below): simply leave the "Reset retry counter" UpdateAttribute processor stopped so that FlowFiles queue in front of it after 3 delivery attempts have been made. Running that processor resets the counter to zero and passes those FlowFiles back out to the PutMongo processor again.

Here is a template of the above "Retry Check Loop" process group: retry-loop-example.xml. You can import this template directly into your NiFi.

Hope this helps, Matt
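A minimal sketch of the counter logic inside such a loop, using UpdateAttribute and RouteOnAttribute; the attribute name retry.count is illustrative:

```
# UpdateAttribute "Increment retry counter" (illustrative attribute name)
retry.count = ${retry.count:replaceNull(0):plus(1)}

# RouteOnAttribute "Retry Check" -- send to email/holding queue after 3 attempts
over.limit  = ${retry.count:ge(3)}

# UpdateAttribute "Reset retry counter" (left stopped; start it manually to re-drive)
retry.count = 0
```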
03-22-2018
08:44 PM
1 Kudo
@Sami Ahmad You will notice a small number in the upper-right corner of the PutHiveStreaming processor. This indicates that there is an active thread in progress. "In" shows the number of FlowFiles that were processed off an inbound connection in the last 5 minutes; a number will not be reported there until processing completes (successfully or otherwise). FlowFiles remain on the inbound connection until they have been successfully processed (this is so NiFi can recover if it dies mid-processing).

You can collect a NiFi thread dump to analyze what is going on with this PutHiveStreaming thread:

./nifi.sh dump <name of dumpfile>

Thanks, Matt
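For example, from the NiFi installation directory (the dump filename here is illustrative):

```
./bin/nifi.sh dump thread-dump.txt
# then look for the stuck thread in the dump
grep -n "PutHiveStreaming" thread-dump.txt
```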
03-22-2018
02:55 PM
@ANKIT PATEL A NiFi FlowFile (this is what moves from processor to processor in NiFi) consists of two parts: FlowFile content (the actual data) and FlowFile attributes (key/value metadata about the FlowFile). Different processors that create FlowFiles generate different attributes, which are assigned to the FlowFile. Attributes include things like filename, fileSize, uuid, path, etc.

According to the documentation for the GetFTP processor, a set of FlowFile attributes (including filename and path) is written on each FlowFile produced. The NiFi Expression Language (EL) can then be used to do things with these key/value pairs, for example setting a specific target directory for writing a FlowFile's content.

Assuming you are writing FlowFiles out of NiFi using, say, PutFile: the EL ${path} will return the value of the FlowFile attribute named "path" and use it as the target directory path for writing the content. You can even add to that path if you like, for example /mynewdir/nifi/${path}, which appends the value of path to the end of /mynewdir/nifi/.

Thank you, Matt
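A minimal sketch of the PutFile Directory property using EL (the /mynewdir/nifi prefix is the example from above):

```
# PutFile "Directory" property (illustrative)
Directory : /mynewdir/nifi/${path}
# A FlowFile from GetFTP carrying path = reports/2018 would have its
# content written under /mynewdir/nifi/reports/2018
```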
03-22-2018
12:53 PM
1 Kudo
@Utkal Sinha When you start NiFi, it kicks off the bootstrap process, which then kicks off the NiFi service (there are actually two Java processes associated with a running NiFi). Java process one is the bootstrap, and it logs to nifi-bootstrap.log. It is responsible for kicking off the other Java process and monitoring it to make sure it has not died; if it does die, the bootstrap will restart it automatically. The main Java process logs via nifi-app.log, and if you tail this log after starting NiFi you will see everything this process does during startup (reading configuration files, building repos, unpacking NARs, loading up your dataflows, starting processor components, etc.). The NiFi UI will not be available until all of this has completed successfully. The key lines you are looking for in nifi-app.log are:

2018-03-22 12:46:05,076 INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:
2018-03-22 12:46:05,076 INFO [main] org.apache.nifi.web.server.JettyServer http://<hostname>:9090/nifi
2018-03-22 12:46:05,083 INFO [main] org.apache.nifi.BootstrapListener Successfully initiated communication with Bootstrap
2018-03-22 12:46:05,083 INFO [main] org.apache.nifi.NiFi Controller initialization took 99393967121 nanoseconds (99 seconds)

There may be multiple URLs listed depending on your configuration. Once you see the line telling you NiFi has started and is available at the listed URLs, you can try to access the NiFi UI. Thank you, Matt
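To watch for those lines during startup, from the NiFi installation directory:

```
tail -f logs/nifi-app.log
# or wait specifically for the "started" line:
tail -f logs/nifi-app.log | grep --line-buffered "NiFi has started"
```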
03-22-2018
12:27 PM
1 Kudo
@ANKIT PATEL I am not sure which NiFi processors you are using in your dataflow here, but take a look at the FlowFile attributes being created on the FlowFiles containing the files from your database. You could likely use one of those attributes to re-create the directory structure you are looking for when writing the files back out of NiFi. Thank you, Matt
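As an illustration only (the attribute names below are hypothetical; check which attributes your source processors actually write):

```
# PutFile "Directory" property (hypothetical attribute names)
Directory : /export/${db.schema}/${db.table.name}
```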
03-21-2018
06:47 PM
@Mark Lin ZooKeeper is responsible for electing both the cluster coordinator and the primary node for a NiFi cluster. Reasons why ZK may elect a new primary node include the current primary node not having heartbeated to ZK:

- possibly because of network issues;
- possibly because the current cluster coordinator and/or primary node is having an issue that prevents the heartbeat from being sent, such as Java garbage collection. GC is a stop-the-world event, so heartbeats are not sent out while it is running; by the time GC ends, ZK may have already elected a new cluster coordinator and/or primary node. The node would be notified of the change the next time it successfully talked to ZK, so you may never see the node actually become disconnected from the cluster.

Nodes send heartbeats to the currently elected cluster coordinator. As long as those heartbeats arrive within the configured timeouts, nodes stay connected to the cluster. Thank you, Matt
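The relevant timeouts live in nifi.properties; a sketch with default-style values (verify against your own installation):

```
# nifi.properties (illustrative values; confirm against your installation)
nifi.zookeeper.connect.string=zk1:2181,zk2:2181,zk3:2181
nifi.zookeeper.session.timeout=3 secs
nifi.cluster.node.connection.timeout=5 secs
nifi.cluster.node.read.timeout=5 secs
```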