Member since: 07-30-2019
Posts: 3469
Kudos Received: 1641
Solutions: 1018
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 157 | 05-06-2026 09:16 AM |
| | 247 | 05-04-2026 05:20 AM |
| | 236 | 05-01-2026 10:15 AM |
| | 467 | 03-23-2026 05:44 AM |
| | 352 | 02-18-2026 09:59 AM |
10-04-2021
11:12 AM
@Phanikondeti When NiFi starts, the bootstrap process launches a child process that may take some time to fully start, depending on the size of the flowfile_repository and the size of the flow.xml.gz being loaded. You will want to search the nifi-app.log for the following lines:

2021-10-04 18:05:57,212 INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:
2021-10-04 18:05:57,212 INFO [main] org.apache.nifi.web.server.JettyServer https://<nifi-hostname or IP>:<nifi port>/nifi
2021-10-04 18:05:57,216 INFO [main] org.apache.nifi.BootstrapListener Successfully initiated communication with Bootstrap

Until you see these lines, NiFi is still coming up and the UI will not yet be reachable. If you do see these lines, make sure that the host where you launched your browser can reach the hostname/IP logged in the message above. You should also check which network interface your NiFi bound to on startup if you have multiple interfaces available.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
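The wait-for-startup check described above can be scripted. This is a minimal sketch, assuming your nifi-app.log lives in the standard logs/ directory of your install (adjust the path for your environment); it simply looks for the JettyServer startup banner quoted in the post.

```python
import re
from pathlib import Path

# Matches the banner JettyServer logs once the UI is reachable.
STARTED_RE = re.compile(r"JettyServer NiFi has started\. The UI is available")

def nifi_started(log_path: str) -> bool:
    """Return True once the startup banner appears in nifi-app.log."""
    path = Path(log_path)
    if not path.exists():
        return False
    with path.open(errors="replace") as fh:
        return any(STARTED_RE.search(line) for line in fh)
```

You could call this in a loop (with a sleep between attempts) before pointing a health check or browser at the UI.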
09-30-2021
01:17 PM
@Ven5 The individual components are responsible for handling any kind of timeout related to the execution of the component's code. NiFi connections hold FlowFiles that the downstream processor component will consume. Typically, the FlowFile being executed upon by the downstream component is not removed from the inbound connection until that component's execution is complete and has resulted in the creation of its outbound FlowFile (which may be the same as the ingested FlowFile, depending on what that processor does). This is done to protect against data loss in the event the NiFi service crashes or shuts down while a component is still executing. When the NiFi service starts back up, the FlowFile is loaded back into the same upstream connection, allowing the downstream processor to start over executing on the same FlowFile(s).

From your description, it sounds like these custom components (not part of the Apache NiFi or Cloudera distributions) are becoming hung while executing against a FlowFile from the upstream connection. When your flow is in this hung state:
- Does the MarkLogic processor show a small number in the upper right corner of the processor, indicating it has an active thread? Or are you saying the MarkLogic processor is not actively executing a thread and is not getting a thread to work on the next queued FlowFile?
- Did you execute "nifi.sh dump <dump-filename>" to verify there are no MarkLogic processor class threads executing?
- Is it possible that your entire "Max Timer Driven Thread pool" is at times consumed by other components on your canvas, preventing this processor from being able to get a thread to run?

If this turns out not to be a thread starvation issue, you may need to reach out to the author of these custom components for suggestions.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
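A dump file produced by "nifi.sh dump" can be large, so a small script helps with the check above. This is a minimal sketch, assuming thread entries in the dump are separated by blank lines and that "MarkLogic" appears in the custom processor's class name (both are assumptions; substitute the actual class name from your dump).

```python
def count_threads_mentioning(dump_text: str, needle: str) -> int:
    """Count thread entries in a dump whose stack trace mentions `needle`."""
    # Thread entries are assumed to be separated by blank lines.
    blocks = [b for b in dump_text.split("\n\n") if b.strip()]
    return sum(1 for b in blocks if needle in b)
```

A count of zero while FlowFiles sit queued would point toward thread starvation rather than a hung processor thread.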
09-30-2021
06:10 AM
2 Kudos
@DSan You run into some unique challenges with your specific use case, since the properties where you want NEL enabled expect Java regular expressions. The reserved characters that would tell NiFi that a NEL statement is being used are special characters in Java regular expressions. You may want to raise your change/improvement request as a Jira in the Apache NiFi Jira project: https://issues.apache.org/jira/browse/NIFI

There may be others in the community who like this idea and have suggestions for working around the challenge I shared above.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
09-30-2021
05:48 AM
@TRSS_Cloudera Your use case is not completely clear to me. Each node in a NiFi cluster executes its own copy of the dataflow against its own set of FlowFiles (FlowFiles are what the NiFi components execute upon). NiFi components can be processors, controller services, reporting tasks, input/output ports, RPGs, etc. Each node maintains its own set of repositories. Two of those repositories (flowfile_repository and content_repository) hold the parts that make up a FlowFile. In a NiFi cluster, one node will always be elected as the Cluster Coordinator and one as the Primary Node (sometimes one node is elected for both of these roles), and which node is elected to either role can change at any time. The GenerateFlowFile processor you have configured to execute on "Primary Node" only will produce FlowFile(s) only on the currently elected primary node. Your description does not cover how your dataflow writes the files to the server on which you will then run an ExecuteStreamCommand Python script.

What is the best way to handle producing files that can be accessed by all nodes?
Answer: Since each node operates on its own FlowFiles, one node will not have access to FlowFiles on the other nodes. A clearer use case as to why you would want every node processing the same FlowFile might be helpful here.

Is there a way to specify the node a process will run on? (using "run on primary" is not working as the primary node cycles over the process)
Answer: Only processors that are responsible for creating the FlowFile should ever be scheduled to execute on the "Primary Node". Any processor that accepts an inbound connection should always be executing on all nodes. So if node A is the current primary node and a FlowFile is produced by a primary-node-only configured processor, the FlowFile would still be processed downstream in the dataflow even if a primary node change happens.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
09-27-2021
10:57 AM
1 Kudo
@paygb You are absolutely correct, the Cloudera Manager configuration property within the NiFi configurations for logback.xml does require you to input the entire logback.xml and then add your modifications. Adding a new logger for the ControllerStatusReportingTask would require adding a logger like this:

<logger name="org.apache.nifi.controller.ControllerStatusReportingTask" level="DEBUG"/>

You can add this logger inline along with the other loggers already in the logback.xml. The result would be that all logging from this class at DEBUG level and below would go into the nifi-app.log. You could optionally add an additional appender (NiFi has three existing appenders now, for the nifi-app.log, nifi-user.log, and nifi-bootstrap.log files). You can copy and modify one of those, and then your logger would need to look like this:

<logger name="org.apache.nifi.controller.ControllerStatusReportingTask" level="DEBUG" additivity="false">
    <appender-ref ref="NEW_APPENDER"/>
</logger>

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
09-27-2021
10:44 AM
@DSan Not all components (processor, controller service, reporting task, etc.) support the use of the NiFi Expression Language (NEL). Dynamic properties added to the ExtractText processor do not support NEL. For any existing property, you can float your cursor over the question mark next to the property name to see whether it supports full NEL, partial NEL, or no NEL. When adding a dynamic property to a processor that supports this capability, you will also see an icon depicting NEL support. The dynamic properties in the ExtractText processor use Java regular expressions to extract text from the content of the inbound FlowFile.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
09-24-2021
11:14 AM
@Submar MiNiFi is not capable of receiving data over S2S sent to it by another NiFi instance running a Remote Process Group (RPG). It is not something that is supported, nor would it be recommended even if it were. S2S was built as a way to smartly distribute FlowFiles being sent from one NiFi instance to a cluster of NiFi instances. Since MiNiFi is not a cluster, the overhead associated with fetching S2S details makes no sense. Now, you could set up an RPG on your MiNiFi that pulls from a remote output port on your NiFi, but again, that is not an ideal or recommended solution. It also would not result in the same data landing on every MiNiFi, since data is consumed by whichever MiNiFi connects first to the NiFi output port.

If the goal here is to send "FlowFiles" (FlowFile metadata/attributes and FlowFile content) from NiFi to MiNiFi, the better option would be to use the ListenHTTP processor on each of your MiNiFi instances and then have a separate PostHTTP processor for each of your 3 MiNiFi instances. Then have the same FlowFiles routed to each of those three PostHTTP processors. The PostHTTP processor also has a property "Send as FlowFile" which, when set to true, will send the entire FlowFile and not just the content to the target MiNiFi. The flows on all 3 of your MiNiFi instances would still be identical, with each listening on the same configured ListenHTTP port.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
09-24-2021
10:48 AM
@samarsimha The ZooKeeper client version used in Apache NiFi versions prior to 1.13 does not support TLS. You'll need to upgrade your NiFi from 1.10 to 1.13 to take advantage of the new TLS connectivity to ZooKeeper.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
09-24-2021
10:34 AM
1 Kudo
@Cloud_era Let me see if I understand your use case fully. You are using the ListFTP and FetchFTP processors, or the GetFTP processor, to pull in files from your FTP server. In the ListFTP or GetFTP processor you have configured the path as "/user/Mahesh/test" and have set "Search Recursively" to true so that you pull files found in sub-directories, including "202100923" and "20210924". If you are using the GetFTP processor, you should switch to the List and Fetch processors if running on a NiFi cluster. Also keep in mind that the List/Fetch FTP processors are much newer and provide more configuration options/capabilities not found in the legacy GetFTP processor.

The GetFTP processor creates a FlowFile attribute "absolute.path" that contains the full path to the file that is consumed. The ListFTP processor creates a FlowFile attribute "path" that contains the full path to the file that will be consumed by FetchFTP. So in your example you end up with the above attributes set to:

/user/Mahesh/test/202100923
/user/Mahesh/test/20210924

Using the NiFi Expression Language statement "${absolute.path:getDelimitedField('4','/')}" and the above examples, what you would have returned is "test", since that is the 4th delimited field:

Field 1 = blank
Field 2 = user
Field 3 = Mahesh
Field 4 = test
Field 5 = 20210924

Field 1 is blank because you set your delimiter as "/" and the string starts with a "/". So setting this to "${absolute.path:getDelimitedField('5','/')}" based on your examples would return either "202100923" or "20210924". The problem here is what happens if your absolute.path values are not always 4 directories deep, for example:

1. /user/Mahesh/test/202100923/subdir1
2. /user/Mahesh/test/20210924/subdir1
3. /user/Mahesh/test/202100923/subdir1/subdir2
4. /user/Mahesh/test/20210924/subdir1/subdir2

Your expression would still return just "202100923" or "20210924".

I don't know how or where you are using this folder information later in your dataflow(s), so it is hard to give recommendations on what to do. But assuming the new examples I gave above, here are some other NEL options:

${absolute.path:substringAfterLast('/')} would return:
1. subdir1
2. subdir1
3. subdir2
4. subdir2

${absolute.path:substringAfter('/user/Mahesh/test')} would return:
1. /202100923/subdir1
2. /20210924/subdir1
3. /202100923/subdir1/subdir2
4. /20210924/subdir1/subdir2

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
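The 1-based field numbering described above can be checked outside of NiFi before you commit to an expression. This is a minimal Python sketch of the behavior (not NiFi code); it mirrors how getDelimitedField counts fields and why field 1 comes back blank when the path starts with the delimiter.

```python
def delimited_field(value: str, index: int, delimiter: str = "/") -> str:
    """Mirror ${attr:getDelimitedField(index, delimiter)} with 1-based indexing."""
    fields = value.split(delimiter)
    # A leading delimiter produces an empty first field, which is why
    # "/user/Mahesh/test/20210924" puts "user" at field 2, not field 1.
    return fields[index - 1] if 1 <= index <= len(fields) else ""
```

The last path segment (the substringAfterLast('/') case) is simply `value.rsplit("/", 1)[-1]` in the same model.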
09-24-2021
09:24 AM
1 Kudo
@hegdemahendra That is a possibility. The 'Maximum Timer Driven Thread Count' setting sets a thread pool that is used by the NiFi controller to hand out threads to dataflow components when they execute. The general recommendation is to set this value to 2 to 4 times the number of cores present on a single NiFi instance (if you are running a NiFi cluster, this setting is applied per node and is not a max across the entire cluster). This does not mean that you cannot set the thread pool much higher, like you have, but you need to do that cautiously and monitor CPU usage over extended periods of time, as your dataflows may fluctuate between periods of high and low CPU demand. It is the cycles of high CPU usage that can become problematic.

What you have in your scenario is 8 cores trying to service threads (up to 300) for your dataflows, NiFi core-level threads (not part of that thread pool), and threads associated with any other services on the host and the OS. So I suspect you have many threads often in CPU wait, waiting for their time on a core. You could also have a scenario where one thread is WAITING on another thread which is itself WAITING on something else. So as the system cycles through all these threads, you end up with periods of what appears to be a hung system.

The dataflow components used, how they are configured, and the volumes of data all play into the overall CPU usage and the length of time a thread is actively executing. It is interesting that you stated that all logging stops as well; that makes me wonder if, with so many threads, some core threads get left in CPU wait long enough to impact logging. Have you tried getting thread dumps from NiFi when it is in this hung state? Examining a series of thread dumps might help pinpoint whether you get into a state where you have threads waiting on other threads that are not progressing.

You may also want to take a close look at disk IOPS for all NiFi repos, which can affect performance with regard to how long a thread takes to complete. Also keep in mind that large dataflows and large volumes of FlowFiles can lead to a need for many open file handles. Make sure your NiFi service user has access to a LOT of file handles (999,999 for example). Your dataflows may also be spinning off a lot of processes, so make sure your NiFi service user also has a high process limit.

Hope this helps you look for areas to dig into your issue, Matt
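The sizing guidance above is simple arithmetic, but it is easy to lose sight of when tuning; a quick sketch of the recommended range per node (2x to 4x the core count, as stated in the post):

```python
def recommended_pool_range(cores: int) -> tuple:
    """Return the (low, high) recommended 'Maximum Timer Driven Thread Count'
    for a single NiFi node, per the 2x-4x guidance above."""
    return (2 * cores, 4 * cores)
```

For the 8-core node in this scenario that gives a range of 16 to 32 threads, which puts the configured value of 300 far outside the guidance.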