Member since: 07-30-2019
Posts: 2908
Kudos Received: 1443
Solutions: 845
10-21-2022
12:56 PM
@rangareddyy What is important to understand is that NiFi component processors are not executed by the user authenticated into NiFi (assuming a secured NiFi), but rather by the NiFi service user. So let's say your NiFi service is owned by a "nifiservice" Linux account. Whatever umask is configured for that user will be applied to directories and files created by that user. Now, if your script is using sudo, it is changing the user that executes your script, resulting in ownership and permissions different from the "nifiservice" user. Subsequent component processors will still execute as the "nifiservice" user and then not have access to those files and directories.

You'll need to take this into account as you build your scripts. Make sure your scripts adjust permissions on the directory tree and files as needed so your "nifiservice" user (or all users) can access the files needed downstream in your dataflows. So in your case it sounds like the script executed by the ExecuteScript processor is creating a sh file that either is not owned by the "nifiservice" user or does not have execute permission set on it. The ExecuteStreamCommand processor will attempt to execute that sh command on disk as the "nifiservice" user only.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt
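As a minimal sketch of the idea, here is what a script run from ExecuteScript might do before handing off to ExecuteStreamCommand. The directory, file name, and script content are all hypothetical; the point is setting the umask and explicitly granting read/execute so the "nifiservice" user can use the file downstream:

```python
import os
import stat
import tempfile

# Hypothetical output location; in a real flow this would be the
# directory your downstream ExecuteStreamCommand reads from.
OUTPUT_DIR = os.path.join(tempfile.gettempdir(), "nifi_generated")
SCRIPT_PATH = os.path.join(OUTPUT_DIR, "generated.sh")

# Relax the umask so files created below are group/world readable.
os.umask(0o022)

os.makedirs(OUTPUT_DIR, exist_ok=True)
with open(SCRIPT_PATH, "w") as f:
    f.write("#!/bin/bash\necho hello\n")

# Explicitly set 0755 so any user (including "nifiservice") can read
# and execute the script later.
os.chmod(SCRIPT_PATH, 0o755)
```

If the script must run under sudo for other reasons, an explicit `chown` back to the service account would be needed as well.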
10-21-2022
12:41 PM
1 Kudo
@Jagapriyan Since this is a daily job, I may suggest you tackle this differently. You know your source files are written between 8am and 9am each day, so I would configure your ListSFTP to run on a cron schedule so it runs every second from 9am-10am to make sure all files are listed. Then, knowing that your files may number 90+ (max unknown), I would configure your "Minimum Number of Entries" to some value you know the count will never reach, and make sure "Maximum Number of Entries" is set to a value higher than that. Then configure the "Max Bin Age" to some duration, say 30 minutes. This allows MergeContent to continue allocating FlowFiles to a bin for 30 minutes, at which time the bin is forced to merge even if the minimum value has not been reached. Doing this makes sure you get only one FlowFile out per bin per node. That single FlowFile can then be used to trigger your PutEmail processor used for notification. Additionally, the merged FlowFile will have a "merge.count" attribute added that you can use in your email body to report the number of FlowFiles that were ingested.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt
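As a rough configuration sketch of the above (all values illustrative; pick thresholds that fit your actual daily counts):

```
ListSFTP:
  Scheduling Strategy        : CRON driven
  Run Schedule               : * * 9 * * ?     (every second during the 9am hour)

MergeContent:
  Correlation Attribute Name : <your state attribute, if binning per state>
  Minimum Number of Entries  : 10000           (higher than any expected daily count)
  Maximum Number of Entries  : 20000           (must exceed the minimum)
  Max Bin Age                : 30 min          (forces the merge after 30 minutes)
```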
10-21-2022
12:28 PM
@Fredi A screenshot of the configuration of your UpdateAttribute processor including main configuration and configuration in the "Advanced" UI would be very helpful in understanding your setup and issue. Thanks, Matt
10-21-2022
12:23 PM
1 Kudo
@DGaboleiro I am not the assignee on jira https://issues.apache.org/jira/browse/NIFI-8043, but that Matt is an awesome guy: @mburgess. Thanks, Matt
10-21-2022
12:21 PM
@RRosa I am not clear on what you mean by "migrating the flow files". A NiFi FlowFile is the object that traverses the connections between NiFi component processors on the NiFi canvas. Are you talking about migrating your actively queued FlowFiles from NiFi cluster 1 (Apache NiFi 1.12.1) to NiFi cluster 2 (Apache NiFi 1.17.0)? Or are you talking about migrating the flow.xml.gz file (which contains everything you have configured on the canvas of your NiFi) from the old cluster to the new?

General guidance for upgrading Apache NiFi can be found in the admin guide here: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#upgrading_nifi

The only thing I see NOT covered in that guidance is the preservation of component state. Within a cluster, component state may be stored, depending on the component, either in a local state directory on each node (each node holds only state for that node) or in cluster state (written to ZooKeeper and shared across all nodes). Now, if you are installing the new version of NiFi on the same hosts where the old NiFi nodes were running, simply preserve the state configuration, and the new nodes, when started with a copy of the flow.xml.gz, will continue to read and use the same state. The same goes for new nodes using the same external ZooKeeper that the previous nodes used (stop the old hosts before starting the new ones).

While the documentation recommends that you process out all queued FlowFiles from cluster 1 before starting cluster 2, that is not required. If the new nodes point at the same content, flowfile, and provenance repositories as the previous nodes, that data will get loaded back in on startup and processing will continue where it left off. Remember that each node's repositories are unique to that node (meaning you can't combine them, and they don't all contain the same content).

Another thing to review is the release notes: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes You'll want to review all the release notes between 1.12 and 1.17. Apache NiFi is known to deprecate and remove some components (processors, controller services, reporting tasks, etc.) from time to time, so you'll want to check whether any components you use in your current dataflows have been removed. Additionally, some components may have changed, typically resulting in additional properties being added. When you start the newer version of NiFi, it will load your existing flow.xml.gz (1.17 will actually generate a flow.json.gz file from your flow.xml.gz) and upgrade all your components to use the newer 1.17 versions of the component classes. So you'll want to review your flow after the upgrade to make sure none of the components that were previously valid have become invalid because a new property exists that must be configured. NOTE: 1.17 will start using the flow.json.gz once upgraded, as the flow.xml.gz format is deprecated.

If you found this response assisted you with your query, please take a moment to login and click on "Accept as Solution" below this response.

Thank you,
Matt
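As a small sketch of the preservation step when the new install lives on the same host, something like the following could copy the flow definition and local state across. The directory layout assumed here is the default NiFi layout; if your nifi.properties points the state or flow locations elsewhere, adjust accordingly:

```python
import os
import shutil

def migrate_nifi_artifacts(old_home: str, new_home: str) -> None:
    """Copy the flow definition and local component state from an old
    NiFi install directory to a new one (same host, default layout)."""
    # Carry the existing flow over; NiFi 1.17 generates flow.json.gz
    # from this flow.xml.gz on first startup.
    shutil.copy2(os.path.join(old_home, "conf", "flow.xml.gz"),
                 os.path.join(new_home, "conf"))

    # Preserve local component state so stateful processors on this
    # node resume where they left off.
    old_state = os.path.join(old_home, "state", "local")
    if os.path.isdir(old_state):
        shutil.copytree(old_state,
                        os.path.join(new_home, "state", "local"),
                        dirs_exist_ok=True)
```

Cluster state in ZooKeeper needs no copying as long as the new nodes point at the same external ZooKeeper.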
10-19-2022
08:25 AM
@orekxl @biblio_gr The following community article will help you understand what really happens when a user chooses to click "terminate" on a stopped NiFi processor with active threads: https://community.cloudera.com/t5/Community-Articles/Understanding-NiFi-s-quot-Terminate-quot-option-on-running/ta-p/355433 If you found this assisted you with your query, please take a moment to login and click "Accept as Solution" below this response. Thank you, Matt
10-19-2022
08:20 AM
2 Kudos
The intent of this article is to cover exactly what happens when a user clicks the "terminate" button on a processor component that has an actively running task.

Before we can discuss the "terminate" option, we need to understand a few basics about the NiFi application and a bit of history:

1. NiFi is a Java application, and the execution of any component (processors, controller services, reporting tasks, funnels, input/output ports, etc.) happens within that single Java Virtual Machine (JVM) process. NiFi does not create a child process for the execution of each component.
2. Since NiFi operates within a single JVM, it is not possible to "kill" a thread for an individual component without killing the entire JVM.
3. NiFi consists of well over 400 unique components, and many of them do not execute native NiFi code. Many use client libraries not managed or controlled by NiFi. Others can be configured to execute commands external to NiFi (ExecuteStreamCommand, ExecuteProcess, ExecuteScript, etc.). Processors that invoke something external to NiFi's code base will result in a child process being created with its own pid. Keep in mind that processors of this type do not limit what is being invoked externally, so they take a generic approach to handling those child processes: the JVM invokes the external command and waits for it to report completion.
4. Historically, NiFi did not offer a terminate option, since killing a thread in the NiFi JVM is not possible. So when a component misbehaved (usually due to an issue external to NiFi code, such as the network, a hung client library, or a hung external command), that NiFi component processor would get stuck, with the JVM thread waiting on that client library or external process to return. As such, the processor's concurrent-task JVM thread is blocked. While you could stop the processor, that would not help users get past the hung or long-running thread. The processor transitions to a "stopping" state, where it remains until the library call or task it is waiting on completes. Until that happens, users cannot modify the configuration or restart the component. This meant that for truly hung threads, the component was blocked until the NiFi JVM was restarted.
5. As a result of the inconvenience/impact a hung thread causes, NiFi introduced the "terminate" option on a "stopped" component with an active thread.

What actually happens when a user clicks "terminate":

1. "Terminate" is only possible after a processor has been asked to stop and that stopped processor still has an associated JVM thread running.
2. Since we know that killing a JVM thread is not possible without killing the entire JVM process (NiFi), the "terminate" option takes a different approach. When a processor executes, it typically does so in response to an inbound queued FlowFile as the trigger. That means the inbound FlowFile is tied to the JVM thread that is executing. When the thread completes, that FlowFile (or a modified, cloned, or new FlowFile, depending on the processor's function) is moved to the appropriate outbound relationship of the processor.
3. So what the "terminate" function really does is release the FlowFile associated with that running JVM thread back to the inbound connection, make a request to the client library or external command to abort/exit, and then isolate that thread so that if it does actually complete post-terminate, all returns are just sent to null.
4. When "terminate" has been selected, the UI renders the processor's active threads differently to indicate that the processor has JVM threads that have been terminated but are still active. NOTE: The number within the parentheses denotes the current number of terminated threads still active.
5. If the client or external command responds to the request to exit, the active "terminated" thread will disappear. If not, it will continue to exist until the thread finally completes or the entire NiFi JVM is restarted. NOTE: A terminated thread has little impact on resources, since a hung thread isn't consuming CPU. A long-running CPU-intensive thread, however, may have an impact.
6. Now that this "terminated" JVM thread has been isolated and any FlowFile(s) tied to it have been released to the originating connection, users can modify the processor configuration and start the component processor again. When started again, the processor will execute again on the FlowFile(s) that once belonged to the terminated thread. So no data loss is incurred as a result of using "terminate".

The "terminate" capability allows users to move on without needing to restart their NiFi JVM, reducing downtime and impact to other dataflows running on the NiFi canvas. If you have a processor that constantly has hung-thread issues or very long-running threads, it is time to start looking at your source FlowFile(s), processor configuration, external command, or the external service the processor may be waiting on as possible sources of the issue.

Reference: Apache NiFi Terminate documentation
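The mechanics above can be illustrated with a small plain-Python sketch (this is not NiFi code, just an analogy): a running thread cannot be killed, but the work item tied to it can be released back to its queue and the thread's eventual result discarded, which is essentially what "terminate" does with a FlowFile:

```python
import queue
import threading
import time

work = queue.Queue()
work.put("flowfile-1")          # a queued "FlowFile" on the inbound connection

terminated = threading.Event()
results = []                    # stands in for the outbound relationship

def process(item):
    time.sleep(0.2)             # stands in for a hung client-library call
    if terminated.is_set():
        return                  # isolated thread: its output goes nowhere
    results.append(item)        # normal path: transfer downstream

item = work.get()               # thread takes the FlowFile as its trigger
worker = threading.Thread(target=process, args=(item,), daemon=True)
worker.start()

# User clicks "terminate": release the FlowFile back to the inbound
# connection and isolate the thread; no data is lost.
terminated.set()
work.put(item)

worker.join()
```

After the "terminate", the item is back on the queue ready for a restarted processor, and whatever the isolated thread eventually returns is discarded.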
10-03-2022
05:54 AM
1 Kudo
@leandrolinof I see no reason why using UpdateAttribute to establish the needed path and filename values for the FetchSFTP processor would not work. FetchSFTP has no dependency on ListSFTP. ListSFTP just serves as a mechanism for obtaining a list of files from a target SFTP server and recording state; it simply creates a FlowFile with the needed attributes set for each file found on the target. So if you have another method built that can set those attributes, then you are good to go.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt
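As a rough sketch of that substitution (attribute names and values here are illustrative; the key point is that FetchSFTP just evaluates Expression Language against whatever attributes arrive on the FlowFile):

```
UpdateAttribute (sets the attributes FetchSFTP will read):
  path     = /incoming/data           (illustrative remote directory)
  filename = report.csv               (illustrative remote file name)

FetchSFTP:
  Remote File = ${path}/${filename}
```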
09-30-2022
02:44 PM
@Jagapriyan Your described flow above does not mention the MergeContent processor, which is what would be needed to merge multiple FlowFiles with matching attribute values into one output FlowFile. Please share your MergeContent processor configuration. Additionally, the ListSFTP processor does not download the content of the files from the remote server; it is only used to list the files on the remote server and set attributes on the FlowFile that are then used by the FetchSFTP processor to actually download the content. How do you know when you have all the files for a given state? Is this a continuous feed of files? Is this a daily job? While file count differs per state, is the count per state consistent? What are the highest and lowest counts? Thanks, Matt
09-30-2022
02:35 PM
@Kushisabishii What are you seeing in the nifi-user.log when you make this import attempt? You may be getting the 403 because the user is not authorized properly to perform the import call. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt