Member since: 07-30-2019
Posts: 1975
Kudos Received: 1173
Solutions: 545
My Accepted Solutions
Views | Posted
---|---
124 | 04-12-2021 06:35 AM
91 | 04-12-2021 06:16 AM
71 | 04-12-2021 05:49 AM
240 | 03-29-2021 11:35 AM
313 | 03-24-2021 06:35 AM
04-16-2021
06:06 AM
@Vickey The "File Filter" property of the UnpackContent processor takes a Java regular expression and applies when unpacking a tar or zip file. In your UnpackContent processor, set "Packaging Format" to either "ZIP" or "TAR", depending on the package format of your source file. Then set a Java regular expression such as the one below to extract only the files within that package whose names end with the .csv, .txt, or .xml extension:

.*\.(txt|xml|csv)

Hope this helps,
Matt
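As a sketch, the two relevant UnpackContent property values (assuming a zip source file) would look like this:

```
Packaging Format: ZIP
File Filter: .*\.(txt|xml|csv)
```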
04-13-2021
10:50 AM
@Law While Jolt transforms are not NiFi-specific and not something I am strong with myself, you may find these links helpful:

https://intercom.help/godigibee/en/articles/3096940-simple-if-else-with-jolt
https://community.cloudera.com/t5/Community-Articles/Jolt-quick-reference-for-Nifi-Jolt-Processors/ta-p/244350

Hope this helps,
Matt
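As a minimal if/else sketch of my own (the input field "status" and the output values are hypothetical, not from those links), a Jolt shift spec can branch on a field's value: an input of {"status": "active"} produces {"state": "on"}, and any other status value produces {"state": "off"}:

```
[
  {
    "operation": "shift",
    "spec": {
      "status": {
        "active": { "#on": "state" },
        "*": { "#off": "state" }
      }
    }
  }
]
```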
04-12-2021
06:35 AM
1 Kudo
@ram_g With all 100 FlowFiles committed to the success relationship of your custom processor at the same time, how would you want NiFi to determine their priority order? If you can output some attributes on each FlowFile that your custom processor is creating, those attribute values could be used to set the processing order downstream.

Hope this helps,
Matt
04-12-2021
06:16 AM
1 Kudo
@john I have an HDF 3.4.1.1 cluster (based on NiFi 1.11.4) set up with version-controlled PGs, and I can change processors from started to stopped to disabled without triggering a local change. However, HDF 3.4.1.1 ships with NiFi-Registry 0.3, not 0.8. I have another HDF 3.5.2 cluster (based on NiFi 1.12.1) which ships with NiFi-Registry 0.8. In that cluster, changing a processor from started to stopped to disabled does trigger a local change. I see someone filed a Jira about this change in behavior:

https://issues.apache.org/jira/browse/NIFI-8160

The tracking of enabled and disabled state in NiFi-Registry was added as part of:

https://issues.apache.org/jira/browse/NIFI-6025

Hope this helps,
Matt
04-12-2021
05:49 AM
@AnkushKoul Since more than 30 seconds have passed since the last execution, the processor is eligible to be scheduled immediately once a thread becomes available. So the second thread would not wait until the 60-second mark; this setting is the minimum wait between executions. Other factors come into play that can affect component execution scheduling. NiFi hands out threads to processors from the "Max Timer Driven Thread Count" resource pool, set via Controller Settings under the global menu in the upper right corner. Naturally you will have more components on your canvas than the size of this resource pool (which should initially be set to only 2-4 times the number of cores on a single node, since the setting applies per node). NiFi hands these available threads out to processors requesting CPU time to execute. Most component threads execute in the range of milliseconds, but some can be more resource intensive and take longer to complete. Before increasing this resource pool, you should monitor the CPU impact/usage with all your dataflows running, then make small increments if resources exist.

Hope this answers your questions. If so, please take a moment to accept the answer(s) that helped.

Matt
04-12-2021
05:35 AM
@Masi The exception does not appear to be related to load-balanced connections in NiFi. LB connections use NiFi Site-To-Site (S2S) in the background, which does not use MySQL.

Matt
04-12-2021
05:28 AM
1 Kudo
@Jarinek With the few details provided, it sounds like this exception is related to storing the peer details returned by a Remote Process Group (RPG) fetching Site-To-Site (S2S) details from a target NiFi cluster. Did you run out of disk space on any of your local disks, or on a disk of the target NiFi cluster of your RPG? If so, did you free up space and restart your NiFi to see if the repository could checkpoint and correct the issue?

Hope this helps,
Matt
04-08-2021
08:51 AM
@John_Wise @TimA Let me make sure I understand exactly what change you are making. I have Process Groups (PG) that are version controlled in my NiFi Registry, and I have both NiFi 1.11.4 and NiFi 1.12.1 clusters set up. If I import a flow from the registry and then modify the state (start, stop, disable, enable) of any processor, my PGs do not change to say local changes exist. The state of a processor does not track as a local change, so I suspect some other local change is being made in addition to the state change. If you right-click on the PG and, under "Version" in the displayed context menu, select "Show local changes", what tracked changes are being reported?

Hope this helps,
Matt
04-08-2021
08:41 AM
@AnkushKoul Since you only have 1 concurrent task configured, another thread cannot be started while that concurrent task's thread is in use. So even with a Run Schedule of 0 secs, another task can't start until the thread tied to that concurrent task is released, making another execution possible. At 30 secs, it will only be allowed to execute again 30 seconds later, and only if there is an available concurrent task not already in use on the processor. Setting 30 seconds can create an artificial delay in your dataflow when tasks take less than 30 seconds to complete.

Note: While the processor is executing a task, you will see a small number displayed in the upper right corner of the processor.
04-08-2021
08:35 AM
1 Kudo
@ram_g @Magudeswaran Guaranteeing order in NiFi can be challenging. As far as the prioritizers on the connection go:

FirstInFirstOutPrioritizer: Given two FlowFiles, the one that reached the connection first will be processed first. This looks at the timestamp recorded when the FlowFile entered this connection. In your case, you have a custom processor that takes in 1 FlowFile and may output 1 or more FlowFiles. Typically with such processors all output FlowFiles are committed to the downstream connection at the same time, which makes using this prioritizer a challenge if that is the case. But generally, processors that produce multiple FlowFiles from a single FlowFile also set FlowFile attributes that identify the fragments. Take a look at the attributes written by the SplitRecord processor as an example.

OldestFlowFileFirstPrioritizer: Given two FlowFiles, the one that is oldest in the dataflow will be processed first. This is the default scheme used if no prioritizers are selected. This looks at the FlowFile creation timestamp. In your case, your custom processor takes in 1 FlowFile and may output 1 or more FlowFiles. Are all output FlowFiles created as new?

Now you may want to look at the following prioritizer:

PriorityAttributePrioritizer: Given two FlowFiles, an attribute called "priority" will be extracted, and the one that has the lowest priority value will be processed first. Note that an UpdateAttribute processor should be used to add the "priority" attribute to the FlowFiles before they reach a connection that has this prioritizer set. If only one FlowFile has that attribute, it will go first. Values for the "priority" attribute can be alphanumeric, where "a" will come before "z" and "1" before "9". If the "priority" attribute cannot be parsed as a long, unicode string ordering will be used. For example: "99" and "100" will be ordered so the FlowFile with "99" comes first, but "A-99" and "A-100" will sort so the FlowFile with "A-100" comes first.

Assuming your custom processor writes some unique attribute(s) to the FlowFiles it outputs, you may be able to use those attributes to enforce ordering downstream via the above prioritizer.

Also keep in mind that NiFi connection backpressure thresholds are "soft" limits. If you were to set the backpressure object threshold on the connection outbound from your custom processor to 1, and an execution of your processor produced 6 FlowFiles, they would all get committed to that connection. Only then does backpressure kick in and prevent your custom processor from being scheduled again until the queue drops below the backpressure threshold. This is a good way of making sure only one "batch" of FlowFiles lands in the downstream connection at a time, but it will not help enforce the order of the FlowFiles within that batch.

Hope this helps,
Matt
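As a minimal sketch (assuming your custom processor writes the standard fragment.index attribute on each FlowFile it emits, as the split-style processors do), an UpdateAttribute processor ahead of the prioritized connection would add one dynamic property:

```
priority = ${fragment.index}
```

Since those values parse as longs, the PriorityAttributePrioritizer will order them numerically.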
04-08-2021
08:02 AM
@AnkushKoul By only having 1 concurrent task configured, you are effectively forcing that task to complete before the next can execute. With your Run Schedule set to "30 sec", NiFi will only schedule this component to execute every 30 seconds. So if task 1 takes only 20 seconds to complete, task 2 would not get started until 10 seconds later. If you set the Run Schedule to the default 0 secs, that tells NiFi to schedule this component to execute as often as possible; as soon as task 1 completes, task 2 will then execute.

You can think of concurrent tasks as a way to parallelize execution within a single component. So instead of having two processors, you have one with 2 concurrent tasks. Each task gets scheduled independently (in parallel) of the other concurrent task(s), and each concurrent task will work on different FlowFile(s) from the inbound connection(s). Some components will not support multiple concurrent tasks (the component source code limits them to 1).

So to me it sounds like you want tasks to kick off as fast as possible, one after another. In that case, leave the Run Schedule at 0 secs and concurrent tasks at 1.

If you found this answer addressed your question, please take a moment to accept the answer.

Hope this helps,
Matt
04-05-2021
05:16 AM
@Masi There have been many bug fixes to LB connections in NiFi 1.9, 1.10, and 1.11, but no particular bug comes to mind that explains what you are seeing. Is your NiFi secured? If so, are you seeing authorization or SSL exceptions in your NiFi logs that might explain an issue with one node sending FlowFiles to another node?
04-01-2021
02:42 PM
@nmargosian The swap file in question would contain FlowFiles that belong to a connection with the UUID 7cde3c5c-016b-1000-0000-00004c82c4b2. From your Flow Configuration History, found under the global menu icon in the upper right corner, can you search for that UUID to see if there is any history on it? Do you see it existing at some point in time? Do you see a "Remove" event on it? If you see it in history but there is no "Remove" action, yet it is now gone, then the flow.xml.gz loaded on restart did not have this connection in it. If this connection no longer exists on the canvas, NiFi cannot swap these FlowFiles back in.

Everything you see on the canvas resides in heap memory and is also written to disk within a flow.xml.gz file. When you stop and start or restart NiFi, NiFi loads the flow back into heap memory from the flow.xml.gz (each node has a copy of this flow.xml.gz, and all nodes must have matching flow.xml.gz files or nodes will not rejoin the cluster).

Things I suggest you verify:

1. Make sure that NiFi can successfully write to the directory where the flow.xml.gz file is located. Make a change on the canvas and verify the existing flow.xml.gz was moved to the archive directory and a new flow.xml.gz was created. If this process fails, then any changes you made would be lost when NiFi is restarted. For example, the connection was created and data was queued on it, but NiFi failed to write a new flow.xml.gz because it could not archive the current flow.xml.gz (space issues, permissions/ownership issues, etc.). This would block NiFi from creating a new flow.xml.gz, while the flow in memory would still have your change. All these directories and files should be owned and readable/writable by your NiFi service user.

2. Did your cluster nodes' flows mismatch at some point in history? For example, a change was made on the canvas of a node that was disconnected from the cluster at the time, and that node's flow was then copied to the other nodes to bring all nodes in sync.

3. Was an archived flow reloaded back into NiFi at some point? This requires manual user action to copy a flow.xml.gz out of the archive and use it to replace the existing flow.xml.gz.

NiFi restarts will not just remove connections from your dataflows. Some other condition occurred, and it may not even have been recent. If you have enough app.log history covering multiple restarts, do you see this same exact WARN log line with each of those restarts?

Hope this helps,
Matt
03-29-2021
11:35 AM
@Garyy You are correct. Since NiFi does not use sessions, as mentioned in my last response, the client must authenticate every action performed. When you "login" to NiFi, the result is a bearer token being issued to the user, which your browser stores and reuses in all subsequent requests to the NiFi endpoints. At the same time, a server-side token for your user is also stored on the specific NiFi node you logged in to. The configuration of your NiFi login provider dictates how long those bearer tokens are good for. With your setting of 1 hour, you would be forced to re-login every hour.

Thanks,
Matt
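For reference, a sketch of where that duration is configured (assuming the ldap-provider in login-identity-providers.xml):

```
<property name="Authentication Expiration">1 hour</property>
```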
03-29-2021
11:30 AM
@vi The more details you provide, the more likely you are to get responses in the community. Since I know you are dealing with GetFTP, and the files consumed by that processor are eating away at your limited network bandwidth, I can offer the following feedback:

I assume the ~60 GB of files consumed by your GetFTP every hour is many files?

The GetSFTP processor is deprecated in favor of the ListSFTP --> FetchSFTP processor design. The SFTP protocol is not a cluster-friendly protocol for a NiFi cluster (and you should always have a NiFi cluster for redundancy and load handling). Running GetSFTP or ListSFTP on all nodes in the cluster would result in every node competing for the same files, so these processors should always be scheduled for "primary node" only (the primary node option does not exist in a standalone NiFi setup).

The ListSFTP processor does not return the content of the listed files from the target SFTP server. It simply generates a list of files that need to be fetched, and each of those listed files becomes its own FlowFile in NiFi. ListSFTP is then connected to a FetchSFTP processor, which fetches the content for each of the FlowFiles produced by ListSFTP. The connection between the ListSFTP and FetchSFTP processors would be configured to load balance the FlowFiles across all nodes in your cluster. This spreads the workload of retrieving that content across all your cluster nodes.

While there is no configuration option in the GetSFTP or FetchSFTP processor to limit bandwidth (feel free to open an Apache NiFi Jira in the community for such an improvement), the ListSFTP to FetchSFTP design does give you some control. You can configure the Run Schedule on the FetchSFTP to some value other than the default 0 secs (which means run as often as possible), which would place a pause between each execution (between each FlowFile fetching its content). While each fetch of content will still happen as fast as allowed, this places a break between fetches, giving other operations time on your constrained network.

Hope this helps,
Matt
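As a sketch of that design ("Round robin" being one of the available Load Balance Strategy values on the connection):

```
ListSFTP (primary node only) --> connection (Load Balance Strategy: Round robin) --> FetchSFTP --> <rest of dataflow>
```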
03-29-2021
11:24 AM
@nmargosian If you search your NiFi canvas for the UUID 7cde3c5c-016b-1000-0000-00004c82c4b2, do you find that connection? This is the connection that this swap file would get swapped back in to. If this connection does not exist, then the swap file cannot be loaded back in to it.

Any chance someone removed a connection from the canvas while this node was not connected to the cluster? Did you recently upgrade from an older NiFi version? Did you copy a flow.xml.gz from a different node in your cluster to this node because of a flow mismatch exception? I am just looking for reasons why this connection would be missing.

Does the NiFi flow archive directory exist? Does the NiFi service user have proper permissions to read and write to that archive directory? Does NiFi have proper ownership and permissions to write to the flow.xml.gz file? When you make a change on the canvas, NiFi makes that change in the in-memory flow, archives the current flow.xml.gz, and then writes a new flow.xml.gz. I am wondering if perhaps the above connection was added to the canvas and the in-memory flow updated, but NiFi was for some reason unable to write a new copy of the in-memory flow out to a flow.xml.gz. On NiFi restart, the flow from the flow.xml.gz is what is loaded back into memory.

Hope this helps,
Matt
03-24-2021
06:35 AM
@Garyy The first place you may want to start is opening the developer tools in your browser, then trying to connect to your NiFi UI and taking note of which calls take the longest to return and which one it eventually times out on. You may also want to enable garbage collection logging within your NiFi JVM (do this by adding GC logging Java args in the NiFi bootstrap.conf file). If your JVM is encountering long and/or frequent GC (all GC events are stop-the-world events), this can result in timeouts in the UI.

It is also not entirely clear to me what you mean by your UI access getting terminated. NiFi does not use sessions. What are you observing? Can you provide more detail and screenshots?

Matt
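As a sketch of enabling GC logging (assuming Java 8 flags and unused java.arg indexes in bootstrap.conf):

```
java.arg.20=-verbose:gc
java.arg.21=-XX:+PrintGCDetails
java.arg.22=-XX:+PrintGCDateStamps
java.arg.23=-Xloggc:./logs/gc.log
```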
03-24-2021
06:09 AM
@dzbeda Have you tried specifying the index in the configured Query within the GetSplunk NiFi processor? NiFi is not going to be able to provide you a list of indexes from your Splunk to choose from; you would need to know what indexes exist on your Splunk server.

Hope this helps,
Matt
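As a sketch, the GetSplunk "Query" property can reference the index directly (the index and sourcetype names here are hypothetical):

```
search index=my_index sourcetype=access_combined
```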
03-22-2021
05:40 AM
1 Kudo
@Garyy That depends on your definition of a "standalone" NiFi server. A one-node NiFi cluster and a standalone NiFi are two different things, and I have seen many users with a one-node NiFi cluster refer to it as standalone. A true standalone NiFi does not use ZooKeeper and does not need to replicate requests to other nodes in a NiFi cluster, so none of the "cluster" configuration properties are used. Additionally, a standalone NiFi has no dependency on ZooKeeper at the core level. NiFi clusters use ZK for Cluster Coordinator and Primary Node election (these roles only exist in a cluster setup) and for cluster-wide state storage (a standalone NiFi has no need to share state with other nodes, so all state is simply stored locally).

You can tell whether your NiFi is a true standalone by checking the following property in the nifi.properties file: nifi.cluster.is.node. If it is set to true, then this NiFi is configured to operate as a cluster even if there is only one node. If it is set to false, the node is truly standalone.

Hope this helps,
Matt
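For example, in nifi.properties:

```
# a true standalone instance (a one-node cluster would have this set to true)
nifi.cluster.is.node=false
```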
03-17-2021
06:02 AM
@sambeth NiFi authorization lookups require an exact, case-sensitive match between the resulting authenticated user string or associated group string (the user group providers configured in authorizers.xml are responsible for determining associations between user strings and group strings within NiFi) and the user/group strings to which the policies are assigned. So if the user identity string that results after the authentication process is "CN=John, OU=Doe", then that exact case-sensitive string must be what the policies are authorized against.

NiFi does provide the ability to use Java regular expressions post-authentication to manipulate the authenticated string before it is passed on for authorization. These identity mapping pattern, value, and transform properties can be added to the nifi.properties file:

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#identity-mapping-properties

For example:

nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?)$
nifi.security.identity.mapping.value.dn=$1
nifi.security.identity.mapping.transform.dn=LOWER

Now if authentication resulted in the string "CN=John, OU=Doe", the above regex would match and the resulting user client string would be "john" (capture group 1 is the value used, transformed to all lowercase).

You can create as many of these mapping pattern sets of properties as you like, as long as each property name is unique in its last field:

nifi.security.identity.mapping.pattern.dn2=
nifi.security.identity.mapping.pattern.kerb=
nifi.security.identity.mapping.pattern.kerb2=
nifi.security.identity.mapping.pattern.username=
etc.

IMPORTANT note: These patterns are evaluated against every authenticated string (this includes mutual TLS authentication, such as between NiFi nodes using the NiFi keystore) in alphanumeric order. The first Java regular expression to match will have its value applied and transformed, so making sure your properties are built in order from most specific regex to most generic regex is very important.

Hope this helps,
Matt
03-15-2021
04:04 PM
2 Kudos
@sambeth The hash (#) character is reserved as a delimiter to separate the URI of an object from a fragment identifier, and Registry has a number of different fragment identifiers. A fragment identifier represents a part of, fragment of, or sub-function within an object. It follows the "/#/" in the URL and can represent fragments in text documents by line and character range, in graphics by coordinates, or in structured documents using ladders. For example, the "grid-list" of flows displayed when you access the NiFi-Registry UI. No, you cannot remove the # from the URL. Are you encountering an issue?

Hope this helps,
Matt
03-15-2021
03:53 PM
@sambeth Your NiFi instance proxies the request for your NiFi user to NiFi-Registry when you try to start version control, change version, etc., on a process group on the NiFi canvas. All your NiFi hosts must therefore be authorized for the "Can proxy user requests" special privilege in NiFi-Registry. You can authorize these special privileges from the NiFi-Registry UI by clicking on the wrench icon in the upper right corner and then clicking the pencil (edit) icon to the right of each of your NiFi hosts in the list of users.

This allows your NiFi hosts to read all the buckets, but your NiFi user, for whom the request is being proxied, must still be authorized for whichever specific buckets you want that user to have access to. Typically an admin user will be responsible for creating buckets in NiFi-Registry and then authorizing specific user access to those buckets. Again through the wrench icon, the admin would create the new bucket and then click on the pencil (edit) icon next to it. As an example, for a bucket "user-bucket" I created using the NiFi-Registry admin user, I then authorized my users group and the nifisme1 user the ability to read (can import flows from the bucket and see flows in the bucket), write (can version control a new PG to this bucket and commit new versions of an existing version-controlled PG), and delete (can delete version-controlled flows from the bucket via the NiFi-Registry UI).

If you found this solution helped resolve your issue, please take a moment to login and accept it.

Hope this helps,
Matt
03-15-2021
03:35 PM
@alexwillmer NiFi does not support using wildcards in all scenarios, since access decisions include authorization against specific endpoints. Access decisions that do not work with wildcards may show up as some buttons remaining greyed out. So if a NiFi Resource Identifier is not giving you the expected result with a wildcard, try setting the policy explicitly and see if the desired outcome is observed. The following article provides insight into the expected access provided by each NiFi Resource Identifier:

https://community.cloudera.com/t5/Community-Articles/NiFi-Ranger-based-policy-descriptions/ta-p/246586

NiFi actually downloads the policy definitions from Ranger, and all authorizations are done based on the last downloaded set of policies (NiFi runs a background thread to check for updated policy definitions from Ranger). NiFi does not send a request to Ranger itself to verify authorization.

Hope this helps,
Matt
03-09-2021
10:01 AM
@nishantgupta101 There is no reason you could not write your own custom script that connects to an FTPS endpoint to retrieve a file, and call it via the ExecuteStreamCommand processor. There are also other script-based processors that you can use.
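As a sketch of that approach (the host, credentials, and path are hypothetical; curl's --ssl-reqd flag requires TLS on the FTP connection), ExecuteStreamCommand could be configured with:

```
Command Path: /usr/bin/curl
Command Arguments: -sS;--ssl-reqd;-u;myuser:mypass;ftp://ftps.example.com/data/file.csv
```

The arguments are separated by the processor's default ";" Argument Delimiter, and the fetched file's bytes (curl's stdout) become the content of the output FlowFile.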
03-08-2021
12:48 PM
1 Kudo
@pacman In addition to what @ckumar already shared: NiFi purposely leaves components visible to everyone on the canvas, but unless a user is authorized to view those components, they will display as "ghost" implementations. "Ghosted" components will not show any component names or classes; they will only show stats. Unauthorized users will be unable to view or modify the configuration, and will also be unable to list or view data in connections (they only see the numbers of FlowFiles queued on a connection).

The reason NiFi shows these ghosted components is to prevent multiple users from building their dataflows on top of one another. It is very common to have multiple teams building their own dataflows, but then also have monitoring teams that may be authorized as "operators" across all dataflows, or some users who are members of multiple teams. Without ghosting, the users who can see more would potentially be left with components layered on top of one another, making management very difficult.

The stats are there so that even if a user cannot view or modify a component, they can see where FlowFile backlogs are happening. Since NiFi operates within a single JVM and every dataflow, no matter which user/team built it, is executed as the NiFi service user, everything must share the same system resources (the various repos, heap memory, disk I/O, CPU, etc.). These stats provide useful information that one team can use to communicate with another team should resource utilization become an issue.

NiFi's authorization model allows very granular access decisions for every component. Authorizations are inherited from the parent process group unless more granular policies are set up on a child component (processor, controller service, input/output port, sub-process group, etc.).

Hope this helps,
Matt
03-03-2021
06:28 AM
1 Kudo
@Pavitran This does present a challenge. Typically the ListFile processor is used to list files from a local file system. That processor component is designed to record state (by default based on last modified timestamp) so that only newer files are consumed, but the first run would result in listing all files by default. Also, looking at your example, your latest directory does not correspond to the current day.

ListFile (which does not actually consume the content) generates a 0-byte FlowFile for each file listed, along with some attributes/metadata about the source file. The FetchFile processor would then be used to fetch the actual content; with large listings, this allows you to redistribute those 0-byte FlowFiles across all nodes in your cluster before consuming the content (provided the same local file system is mounted across all nodes; if the files differ per node, do not load balance between the processors). So you could do a first run that lists everything and just delete those 0-byte FlowFiles. That would establish state, and from that point on ListFile would only list the newest files created.
Pros:
1. State allows this processor to be unaffected by outages; the processor will still consume all the latest non-previously-listed files after an outage.
Cons:
1. The initial run would create potentially a lot of 0-byte FlowFiles to get rid of in order to establish state.
2. After an extended outage, on restart of the flow it may consume more than just the latest files, since it will consume all files with timestamps newer than the timestamp last stored in state.

Other options:

A: The ListFile processor has an optional "Maximum File Age" property, which limits the listing to files no older than a set amount of time.
Pros to setting this property:
1. Reduces or eliminates the massive listing on first run.
Cons to setting this property:
1. Under an extended outage, where the outage exceeds the configured "Maximum File Age", a file you wanted listed may be skipped.

B: Since FetchFile uses attributes/metadata from the incoming FlowFile to fetch the actual content, you could craft the source FlowFile on your own and send it to the FetchFile processor. For example, use an ExecuteStreamCommand processor to execute a bash script on disk that gets the list of files from only the latest directory. Then use UpdateAttribute to add the other attributes needed by FetchFile to get the actual content, and use SplitText to split that listing into individual FlowFiles before the FetchFile processor (see the sketch after this list).
Pros:
1. You are in control of what is being listed.
Cons:
1. Depending on how often a new directory is created and how often you run your ExecuteStreamCommand processor, you may end up listing the same source files again, since you will not have a state option with ExecuteStreamCommand. You may be able to handle this via a DetectDuplicate processor in your flow design.
2. If the listed directory has a new file added to it after a previous listing by ExecuteStreamCommand, the next run will list all the previous files again along with the new ones for the same directory. Again, you might be able to handle this with the DetectDuplicate processor.

Hope this helps give you some ideas,
Matt
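A sketch of option B (the script path is hypothetical; the attribute-setting step could equally be an ExtractText that pulls each split line into attributes, assuming FetchFile's default "File to Fetch" of ${absolute.path}/${filename}):

```
ExecuteStreamCommand (/scripts/list-latest-dir.sh) --> SplitText (one listed file per FlowFile) --> UpdateAttribute (set absolute.path and filename) --> FetchFile --> <rest of dataflow>
```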
03-01-2021
05:53 AM
@IAMSID I think you are asking two different questions here. In order for the community to help, it would be useful if you gave more detail around each of your issues. Your example is not clear to me; not knowing anything about your source data, what characters are you not expecting? Providing the following always helps:
1. A dataflow template showing what you have done.
2. A sample input file.
3. The desired output file based on the above sample.
For query 2:
1. How is data being ingested into NiFi?
2. What is the configuration of the processor components used to ingest the data (ConsumeKafka<version>, ConsumeKafkaRecord<version>, record writer, etc.)?
3. What other processors does the FlowFile pass through in this dataflow (flow template)?

Thanks,
Matt
02-11-2021
07:57 AM
@adhishankarit When moving on to a new issue, I recommend always starting a new query for better visibility (for example, someone else in the community may have more experience with the new issue than me).

As far as your new query goes, your screenshots do not show any stats on the processor, so we cannot get an idea of what we are talking about here in terms of performance. How many fragments are getting merged? How large is each of these fragments? NiFi nodes are only aware of, and have access to, the FlowFiles on the individual node. So if node "a" is "out" (I am not sure what that means), any FlowFiles still on node "a" that are part of the same fragment set will not yet be transferred to node "b" or "c" to get binned for merge. The bin cannot be merged until all fragments are present on the same node. Since you mention that the merge eventually happens after 10 minutes, that tells me all fragments do eventually make it onto the same node. I suggest the first thing to address here is the space issue on your nodes.

Also keep in mind that while you have noticed node "a" has always been your elected primary node, there is no guarantee that will always be the case. A new Cluster Coordinator and Primary Node can be elected by ZooKeeper at any time. If you shut down or disconnect the currently elected primary node "a", you should see another node get elected as primary node; adding node "a" back in will not force ZK to elect it as primary node again. So don't build your flow around a dependency on any specific node always being the primary node.

Matt
02-10-2021
10:27 AM
1 Kudo
@bsivalingam83 The ability to "ignore" properties in various NiFi config files was added with the CFM 1.0.1 release. With older CFM versions (1.0.0) you can set a safety valve to overwrite the currently set java.arg.13 value with something else: a key=value pair that would simply not be used by the NiFi bootstrap. The end result is NiFi no longer using G1GC and instead using the default garbage collector for the version of Java being used.

Hope this helps,
Matt
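As a sketch of such an override (the placeholder system property name is hypothetical; it just needs to be something harmless that replaces the G1GC argument):

```
java.arg.13=-Dgc.placeholder=unused
```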
02-10-2021
09:55 AM
@has The ListFile processor does not accept an inbound connection. If you know the filename of the file being created and the path where that file is created, all you need is the FetchFile processor:

HandleHttpRequest --> <flow to execute jar> --> UpdateAttribute (set path and filename attributes) --> FetchFile --> HandleHttpResponse --> <rest of dataflow>

The HandleHttpRequest processor does not itself return anything to the client; the response for the original request (which has not yet been responded to) is sent later by the HandleHttpResponse processor, where you control the status code sent and any custom headers in the response.

Hope this helps,
Matt