Member since: 07-30-2019
Posts: 3406
Kudos Received: 1622
Solutions: 1008

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 152 | 12-17-2025 05:55 AM |
| | 213 | 12-15-2025 01:29 PM |
| | 149 | 12-15-2025 06:50 AM |
| | 264 | 12-05-2025 08:25 AM |
| | 443 | 12-03-2025 10:21 AM |
03-04-2020
09:57 AM
@MahipalRathore The bulletin is not going to include anything more than what would be found in the nifi-app.log. A bulletin produced as a result of some failure while processing a FlowFile will include the FlowFile's assigned UUID and filename, as well as its size and the content claim in which the FlowFile's content can be found. You could then use NiFi data provenance to obtain details on the FlowFile, including all of its FlowFile attributes. Some bulletins are for exceptions that occur unrelated to any FlowFile and thus will contain no FlowFile info. But if a FlowFile is routed to failure as a result of, for example, an exception thrown by the PutHDFS processor, details on that FlowFile record should be included in the bulletin. Note: if the bulletin is produced by a processor that creates FlowFiles and the failure occurred before a FlowFile was created, there is no FlowFile from which to get FlowFile details. Hope this helps, Matt
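As a rough sketch of following up a bulletin programmatically, you could search NiFi data provenance by the FlowFile UUID the bulletin reported. The endpoint path and payload shape below are assumptions about the NiFi REST API and may differ in your version; the helper only builds the request body, so verify it against the REST API docs before relying on it.

```python
# Hypothetical helper: build a NiFi provenance search payload for a FlowFile
# UUID taken from a bulletin. The payload shape is an assumption -- check the
# REST API documentation for your NiFi version.
def provenance_query(flowfile_uuid, max_results=100):
    return {
        "provenance": {
            "request": {
                "maxResults": max_results,
                "searchTerms": {"FlowFileUUID": flowfile_uuid},
            }
        }
    }

# You would POST this JSON to /nifi-api/provenance and then poll the
# returned provenance query resource for results.
```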
03-04-2020
09:42 AM
@saivenkatg55 Analyzing thread dump output is a cumbersome and time-consuming process that I cannot do for you here. If you have a support contract with Cloudera, I recommend opening a support case; they can help you get to the root cause of your issue. You can obtain a thread dump by executing the nifi.sh command as follows: ./nifi.sh dump /<path to write dump output>/nifinode<num>-dump-<num>.txt First, you want to identify which node in your NiFi cluster has the long-running or hung thread. To do this you can use the NiFi Summary UI found under the global menu. Here is an example: In the example above, I can see an ExecuteStreamCommand processor that is in a stopped state but shows (2) active threads. If I click on the three-stacked-boxes icon at the far right side of that row, I get a per-node breakdown for the processor: From that breakdown I can see both active threads are on my cluster node 44. This is the node where I would take my 5 thread dumps for inspection. Hope this helps you identify where your issue is. Matt
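A small sketch of the dump-collection step described above. The nifi.sh location and output directory are assumptions (point them at your own install); the helper only builds the five commands you would run, waiting roughly five minutes between each.

```python
# Sketch: build the five `nifi.sh dump` commands described above.
# NIFI_SH and OUT_DIR are assumptions -- adjust for your own install.
NIFI_SH = "/opt/nifi/bin/nifi.sh"
OUT_DIR = "/tmp"

def dump_commands(node_num, count=5):
    """Return the dump commands to run, one every ~5 minutes."""
    return [
        f"{NIFI_SH} dump {OUT_DIR}/nifinode{node_num}-dump-{i}.txt"
        for i in range(1, count + 1)
    ]

for cmd in dump_commands(44):
    print(cmd)  # run each on the affected node, ~5 minutes apart
```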
03-04-2020
09:28 AM
@NickH If the ListSFTP processor fails during listing, no FlowFiles should have been output and the state should not have been updated. Are you seeing failures during the ListSFTP processor's execution? If you are seeing FlowFiles routed to one of the failure relationships from the FetchSFTP processor, you can always loop that connection back to the same FetchSFTP processor so another attempt is made to fetch the content for that FlowFile. There is currently no way to clear just a single cached entry from the DistributedMapCacheServer controller service. I encourage you to open an Apache NiFi Jira for a new processor that can remove cache entries: https://issues.apache.org/jira You could try looking at this example for removing a cache entry via a script: https://gist.github.com/ijokarumawak/14d560fec5a052b3a157b38a11955772 Hope this helps, Matt
03-03-2020
02:19 PM
@Gubbi Bottom line is that NiFi processors in a dataflow do not execute sequentially. They each execute based on their configured run schedule. Each processor that is given a thread to execute can potentially utilize a CPU until that thread completes. Generally speaking, most threads are very short-lived, resulting in very minimal impact on your system's CPU. In your dataflow, I would expect that the FetchFile (actually retrieving the content of your 3 FlowFiles) and the putS3 (reading and sending the content of your 3 FlowFiles) would hold threads the longest. While both were executing at the same time, they could be using 200% (2 CPUs). Also keep in mind that the NiFi core is using threads as well, so seeing NiFi use over 100% is pretty much what I would expect anytime it is not sitting idle. Hope the information provided helps you, Matt
03-03-2020
02:02 PM
@Alexandros Securing NiFi and NiFi Registry will always require TLS certificates. There are then numerous options for authentication into those secured services. Both NiFi and NiFi Registry offer:
1. User certificate authentication. You would need to create a user certificate for each user who will access NiFi or NiFi Registry.
2. SPNEGO. This requires that you have a KDC and that your users have SPNEGO enabled in their browser.
3. LDAP/AD user authentication. You would need your own LDAP/AD setup which you can use to authenticate your users.
4. Kerberos login provider. This would require you to set up your own KDC as well.
NiFi also supports OpenID Connect compatible service-based authentication; however, the same is not offered in NiFi Registry. The Jira for adding OpenID Connect capability to NiFi Registry is still open here: https://issues.apache.org/jira/browse/NIFIREG-313 So based on the options above, and depending on the number of users you want to give access to, your best options are either issuing each of your users a user/client certificate or setting up a simple LDAP server or KDC. Hope this helps, Matt
03-03-2020
01:48 PM
@domR Option 2: The good news is that as of Apache NiFi 1.10 you can create remote input and output ports at any process group level (https://issues.apache.org/jira/browse/NIFI-2933). This is probably your best option, as it handles distribution across all available nodes in your cluster. If a node goes down, the Remote Process Group (RPG) will distribute FlowFiles to only the remaining nodes that are available. Option 3: Here you would use a PostHTTP processor and a ListenHTTP processor. The downside to this option over option 2 is that the PostHTTP processor can only be configured to send to one URL endpoint, so if the target NiFi node is down, it will fail to send. Of course you could connect the failure relationship from one PostHTTP to another, and so on, adding as many PostHTTP processors as you have nodes, but this does not scale well. Hope this helps, Matt
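To illustrate why the chained-PostHTTP approach scales poorly, here is a conceptual sketch in plain Python (not NiFi code; the node URLs are made up). It mimics a chain of failure relationships: try node 1, fall through to node 2 on failure, and so on. Every new cluster node means another processor in the chain, whereas an RPG handles node discovery and load distribution for you.

```python
# Conceptual illustration only -- this is not NiFi code. It mimics a chain
# of PostHTTP processors whose failure relationships feed the next one.
NODES = [
    "http://nifi-node1:8080",  # hypothetical node URLs
    "http://nifi-node2:8080",
    "http://nifi-node3:8080",
]

def pick_target(nodes, is_up):
    """Return the first reachable node, like failure relationships chained
    from one PostHTTP processor to the next."""
    for node in nodes:
        if is_up(node):
            return node
    return None  # all nodes down: the FlowFile stays queued

# Example: node 1 is unreachable, so the "chain" falls through to node 2.
target = pick_target(NODES, lambda n: "node1" not in n)
```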
03-03-2020
01:25 PM
1 Kudo
@asfou NiFi does not contain any processors that support Hive version 2.x. The latest versions of Apache NiFi offer Hive 1.x and Hive 3.x client-based processor components. To support Hive 2.x, you would need to build your own custom processors using the Hive 2.x client. Matt
03-03-2020
01:20 PM
@saivenkatg55 By "hung", do you mean the processor shows active thread(s) in the upper right corner continuously and never produces any ERROR or WARN log output? Active threads show as a small number in parentheses, e.g. (1). If that is the case, you would need to get a series of thread dumps from the NiFi node where the active thread is executing. 5 dumps taken with 5 minutes between each thread dump is usually pretty good. Then you will want to inspect those thread dumps for the active thread associated with your hung processor type. Look to see if the thread changes between any of the thread dumps. If the thread output in the dumps is changing, then the thread is not hung, but rather just taking a long time to execute. If the thread is identical across all thread dumps, you'll want to look at the thread to see if it identifies what the thread is waiting on. Hope this helps, Matt
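A minimal sketch of the comparison step: pull the named thread's stack out of each dump and check whether it ever changes. The dump format assumed here (a `"Thread-Name" ...` header line followed by frame lines, terminated by a blank line) matches typical JVM thread dumps, but verify against your own nifi.sh dump output.

```python
# Sketch: extract one thread's stack from each dump and flag it as
# potentially hung if the stack is identical across every dump.
def thread_stack(dump_text, thread_name):
    """Return the stack frame lines for the named thread from one dump."""
    stack, capturing = [], False
    for line in dump_text.splitlines():
        if line.startswith(f'"{thread_name}"'):
            capturing = True
            continue
        if capturing:
            if line.strip() == "":  # blank line ends this thread's entry
                break
            stack.append(line.strip())
    return stack

def looks_hung(dumps, thread_name):
    """True if the thread's stack never changed across the dumps."""
    stacks = [thread_stack(d, thread_name) for d in dumps]
    return all(s == stacks[0] for s in stacks)
```

An unchanged stack is only a hint, not proof; as noted above, check what the thread is waiting on before concluding it is hung.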
03-03-2020
01:14 PM
@TVGanesh The following statement is not accurate: "I read that typically the number of concurrent tasks is roughly equal to 2 or 4 times the cores" The general recommendation is that the "Max Timer Driven Thread Count" is set to 2 to 4 times the number of cores. This setting is relative to the other processes running on your server (or your Mac in this case). The "Max Timer Driven Thread Count" setting establishes the max number of threads that can be handed out to requesting components that want to execute. (This is a soft limit; there are some scenarios where a thread can be obtained even when the number of active threads has reached this configured max.) The "Max Timer Driven Thread Count" is configured under the NiFi Global Menu --> Controller Settings --> General (tab). When you adjust this value, monitor your CPU usage and adjust accordingly. Keep in mind that adding additional concurrent tasks to your processor will not improve the processing of a single FlowFile. The concurrency allows the processor to work on different FlowFiles pulled from the inbound connection queue concurrently. In the case of the ExecuteStreamCommand processor, the ability to execute the same command concurrently also depends on the command you are executing. A small number displayed in the upper right corner of the processor shows the number of currently active threads in use by that processor at the time of the last browser refresh (the NiFi browser auto-refresh default is every 30 seconds). Hope this helps, Matt
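The 2x-4x guideline above can be sketched as a quick calculation (plain Python arithmetic, nothing NiFi-specific):

```python
import os

def suggested_thread_range(cores):
    """Return the (low, high) "Max Timer Driven Thread Count" range from the
    2x-to-4x-cores rule of thumb described above."""
    return (2 * cores, 4 * cores)

cores = os.cpu_count() or 1
low, high = suggested_thread_range(cores)
print(f"{cores} cores -> try Max Timer Driven Thread Count between {low} and {high}")
```

Treat the result as a starting point: as noted above, tune it while watching actual CPU usage on the host.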
03-02-2020
07:03 AM
@saivenkatg55 My first suggestion is to always make sure you have configured the "validation query" property when using the DBCPConnectionPool controller service. When you enable the DBCPConnectionPool, it does not immediately establish the configured number of connections for the pool. Upon receiving the first request from a processor for a connection, the entire pool is created (with the default configuration, the pool would consist of 8 connections). Subsequent requests by processors for a connection result in one of the previously established connections being handed off to the requesting processor. If and only if the validation query is configured, the connection is first validated to make sure it is still good before being passed to the requesting processor. Oftentimes we see this issue when the validation query is not set and the connections are no longer any good. Connections can go bad for many reasons (network issues, idle too long, etc.). You should keep your validation query very simple so it can execute very fast; "select 1" is often used to validate a connection. Also make sure you are using the latest version of NiFi. There have been numerous bug fixes around these processors and the DBCPConnectionPool between Apache NiFi 1.0 and 1.11. Hope this helps, Matt
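To illustrate the principle (run a cheap query and confirm it returns before handing a connection out), here is a stand-alone sketch using Python's sqlite3 as a stand-in for a real pooled JDBC connection. This is not DBCPConnectionPool code, just the same validation idea in miniature:

```python
import sqlite3

def is_connection_valid(conn):
    """Run the cheap "select 1" validation query before handing a connection
    out, mirroring what a pool does when a validation query is configured."""
    try:
        cur = conn.execute("select 1")
        return cur.fetchone() == (1,)
    except Exception:
        return False  # stale/broken connection: discard it and open a new one

conn = sqlite3.connect(":memory:")
print(is_connection_valid(conn))   # a live connection validates
conn.close()
print(is_connection_valid(conn))   # a dead one fails, instead of being handed out
```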