Member since
07-30-2019
2901
Posts
1438
Kudos Received
844
Solutions
03-28-2024
10:10 AM
1 Kudo
@DeepakDonde The issue you are describing was caused by a change in the Apache NiFi InvokeHTTP processor that tries to URL encode the URL entered. https://issues.apache.org/jira/browse/NIFI-12513

The fix for this is in https://issues.apache.org/jira/browse/NIFI-12785, which will be part of the Apache NiFi 1.26 and Apache NiFi 2.0.0-M3 releases. Since the change that caused this issue was added in Apache NiFi 1.25 and Apache NiFi 2.0.0-M2, you could use an earlier version like Apache NiFi 1.24 or Apache NiFi 2.0.0-M1 to get around the issue until the two above-mentioned versions are released.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
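To see why encoding an already-encoded URL breaks a request, here is a small illustration (plain Python for demonstration, not NiFi's actual code path): percent-encoding a URL path a second time mangles any characters the user had already encoded.

```python
from urllib.parse import quote

# A URL path the user already percent-encoded themselves:
url_path = "/api/files/report%20Q1.csv"

# If the processor encodes the URL a second time, the '%' itself is
# encoded as '%25', so %20 becomes %2520 and the request target breaks.
double_encoded = quote(url_path)
print(double_encoded)  # /api/files/report%2520Q1.csv
```

This is why the fix matters for any flow whose URLs contain pre-encoded characters.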
03-28-2024
07:43 AM
@TreantProtector Everything the user adds to the canvas, including controller services and reporting tasks, is auto-saved in the flow.json.gz. Each time a change is made, the current flow.json.gz is archived and a new flow.json.gz is generated. Within the flow.json.gz are all components (processors, connections, controller services, reporting tasks, funnels, process groups, ports, parameters, etc.) and their configurations.

Any configuration property that is "sensitive" (passwords) is going to be encrypted in the flow.json.gz file. So in order to load that flow.json.gz in another NiFi, you would need to know the nifi.sensitive.props.algorithm and nifi.sensitive.props.key used by the original NiFi it came from. Encrypted Passwords in Flows

If you don't have that info, the flow.json.gz can still be loaded on another NiFi after manually editing the file to remove all the "enc{...}" values. Once the flow.json.gz loads, an authorized user would need to re-enter all passwords in all components where they are needed via the NiFi UI.

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
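As a rough sketch of the manual edit described above (a hypothetical helper, not an official NiFi tool — always work on a copy of the file and keep a backup), stripping the "enc{...}" values could look like this:

```python
import gzip
import json
import re

# Hypothetical helper: blank out encrypted property values ("enc{...}") so a
# flow.json.gz can be loaded on a NiFi with a different sensitive props key.
# Passwords will then need to be re-entered via the UI, as described above.
def strip_encrypted_values(in_path: str, out_path: str) -> int:
    with gzip.open(in_path, "rt", encoding="utf-8") as f:
        text = f.read()
    # Encrypted values look like enc{<payload>}; replace them with empty strings.
    stripped, count = re.subn(r"enc\{[^}]*\}", "", text)
    json.loads(stripped)  # sanity check: the result must still be valid JSON
    with gzip.open(out_path, "wt", encoding="utf-8") as f:
        f.write(stripped)
    return count  # number of encrypted values removed
```

The JSON sanity check matters because a malformed flow.json.gz will prevent NiFi from starting.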
03-28-2024
07:27 AM
@C1082 The DBCPConnectionPool is a controller service that an end user would have added via the NiFi UI. The configuration of this controller service is done by the user, and one of the properties specifies the user-defined location of the database driver, which the user must provide and which is not included with NiFi. The dataflow components added to the NiFi canvas have no relationship to UI access issues.

The "javax.net.ssl.SSLException: Connection reset" exception when trying to access the UI is an issue with the TLS exchange between your client (browser) and NiFi. You'll need to look closer at the nifi-app.log and nifi-user.log for this exception and review the entire stack trace that goes with it.

Without knowing the specifics of your NiFi setup, I can't say whether your NiFi is enforcing a mutual TLS exchange or only a one-way TLS exchange. A securely configured NiFi, depending on configuration, will either "REQUIRE" the client to provide a trusted clientAuth certificate in the TLS exchange or "WANT" a trusted clientAuth certificate in the response. A connection reset may happen if the TLS exchange was not successful, which could be a trust chain issue, a network issue, or a missing clientAuth certificate when the NiFi configuration required it in the TLS response.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
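For illustration only (this is generic Python, not NiFi's implementation, and the commented file names are hypothetical), the difference between a server that "WANTs" and one that "REQUIREs" a client certificate comes down to the verify mode on the TLS context:

```python
import ssl

# Illustrative sketch of one-way vs. mutual TLS from the server's perspective.
def server_context(require_client_cert: bool) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # ctx.load_cert_chain("server.crt", "server.key")  # server's own identity
    # ctx.load_verify_locations("truststore.pem")      # CAs trusted for clients
    if require_client_cert:
        # "REQUIRE" / mutual TLS: the handshake fails (seen as a connection
        # reset by the client) if no trusted clientAuth cert is presented.
        ctx.verify_mode = ssl.CERT_REQUIRED
    else:
        # "WANT": a client certificate is requested but not mandatory.
        ctx.verify_mode = ssl.CERT_OPTIONAL
    return ctx
```

In the "REQUIRE" case, a browser with no clientAuth certificate configured will never complete the handshake, which matches the reset symptom described above.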
03-28-2024
06:46 AM
@s198 Great to hear the suggestions I provided solved the question in this community thread. We encourage our community members to start new threads for unrelated questions, so that what solved the issue in a given thread remains clear to other community users that may come across it.

That being said, my understanding of this new query is: how do you take a dataflow that starts from a single FlowFile produced by your Sqoop job, which then becomes many FlowFiles, but requires only a single FlowFile post PutSFTP for downstream processing of job completion? That could be solved using the Wait and Notify processors, which can be complicated to set up, or by using the "FlowFile Concurrency" capability on a Process Group. I shared a similar solution in a few other community posts on how this works:

https://community.cloudera.com/t5/Support-Questions/How-to-detect-all-branches-in-a-NiFi-flow-have-finished/m-p/383475#M244918
https://community.cloudera.com/t5/Support-Questions/NiFi-Trigger-a-Processor-once-after-the-Queue-gets-empty-for/m-p/381801#M244416

Please help the community grow and assist others in finding solutions that help or solve issues by taking a moment to login and click "Accept as Solution" below any response(s) that helped you.

Thank you, Matt
03-27-2024
12:49 PM
2 Kudos
@s198 The List<abc> type processors are source-based processors that do not accept inbound connections, since they are designed to create FlowFiles, not to modify existing FlowFiles.

I am not clear on what "So we used Sqoop completion" does to create a FlowFile in your NiFi dataflow, which is then passed to RouteOnAttribute (assuming this is the processor you are referring to by "Router Attribute") via a connection. What attributes exist on the FlowFile being processed by the RouteOnAttribute processor? Are there any FlowFile attributes on this FlowFile about the specific file needing to be fetched by the FetchHDFS processor (like filename and path)?

-----

If the Sqoop job output produced 1 FlowFile for each HDFS file to be fetched, and each of those FlowFiles has attributes for the path and filename of the HDFS file content to be fetched, you could do the following: set the default NiFi Expression Language statement "${path}/${filename}" in the "HDFS File Name" property of the FetchHDFS processor. Those two FlowFile attributes are expected to be in the format:

filename - The name of the file that will be read from HDFS.
path - The path is set to the absolute path of the file's directory on HDFS. For example, "/tmp/abc/1/2/3".

Attribute names are case sensitive.

-----

If the Sqoop job simply outputs 1 FlowFile from which you expect to fetch a lot of HDFS files, that is not how FetchHDFS functions. FetchHDFS expects one FlowFile for each HDFS file whose content is being fetched. FetchHDFS does not create new FlowFiles; it only adds content to an existing FlowFile. If this matches your scenario, you may be able to use the GetHDFSFileInfo processor, which does accept an inbound connection. It can be configured with just a path. If you set "Group Results = None" and "Destination = Attributes", you could send the produced FlowFiles to FetchHDFS to get the content for each FlowFile output.

You would still need your RouteOnAttribute processor to make sure only FlowFiles where "${hdfs.type} = file" were routed to FetchHDFS and other types are discarded. You would probably also want an UpdateAttribute processor so you could set the filename of the FlowFile to the hdfs.objectName (done by adding the dynamic property filename = ${hdfs.objectName}). Then feed those FlowFiles to your FetchHDFS processor configured to use the ${hdfs.path}${hdfs.objectName} NiFi Expression statement in the "HDFS File Name" property.

------

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
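As a toy illustration of how FetchHDFS builds the full HDFS path from those two attributes (a simplified stand-in, not NiFi's actual Expression Language engine):

```python
import re

# Toy stand-in for NiFi Expression Language attribute substitution, showing
# how an expression like ${path}/${filename} resolves against a FlowFile's
# attributes. Attribute names are case sensitive, as in NiFi.
def resolve(expression: str, attributes: dict) -> str:
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: attributes.get(m.group(1), ""),
                  expression)

flowfile_attrs = {"path": "/tmp/abc/1/2/3", "filename": "data_0001.csv"}
print(resolve("${path}/${filename}", flowfile_attrs))
# /tmp/abc/1/2/3/data_0001.csv
```

Each FlowFile carries its own attribute map, which is why FetchHDFS needs one FlowFile per HDFS file to fetch.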
03-27-2024
12:30 PM
@jpalmer From the image you shared, the bottleneck is actually in the custom (non Apache NiFi out-of-the-box) PutGeoMesa 4.0.4 processor.

A connection has backpressure settings to limit the number of FlowFiles that can be queued (it is a soft limit, which means backpressure gets applied once a connection backpressure threshold is reached or exceeded). Once backpressure is applied, it will not be released until the queue drops back below the configured thresholds. Backpressure, when applied, prevents the upstream processor from being scheduled to execute until that backpressure is removed. The connection turns red when backpressure is being applied, and since the connection after PutGeoMesa 4.0.4 is not red, no backpressure is being applied on that processor. So your issue is that PutGeoMesa 4.0.4 is not able to process the FlowFiles being queued to it fast enough, thus causing the backup in every upstream connection leading to the source processor.

Since it is a custom processor, I can't speak to its performance or tuning capabilities. I also don't know if the PutGeoMesa 4.0.4 processor supports concurrent executions, but you could try the following: right-click on the PutGeoMesa 4.0.4 processor, select Configure, and then select the SCHEDULING tab. Within the Scheduling tab you can set "CONCURRENT TASKS". The default is 1, and this custom processor might ignore this property. What concurrent tasks do is allow the processor to execute multiple times concurrently (so think of each additional concurrent task as creating another identical processor).

A processor component is scheduled to request a thread to execute based on the configured Run Schedule (for the Timer Driven scheduling strategy, the default of 0 secs means schedule as fast as possible). When it is scheduled, the processor requests a thread from the NiFi Timer Driven thread pool. That thread is then used to execute the processor's code against the source connection's FlowFile(s). The scheduler will then try to schedule it again based on the Run Schedule, and if concurrent tasks is still set to 1 and the previous execution is still running, it will not execute again until the in-use thread finishes. But if you set concurrent tasks to, say, 3, the processor could potentially execute 3 threads concurrently (each thread working on different FlowFile(s) from the source connection).

Again, I don't know if this custom processor will ignore this property or support it. Nor do I know if this processor was coded in a thread-safe manner, meaning that concurrent thread executions would not cause issues. So even if this appears to improve throughput, verify your data integrity coming out of the processor.

Also keep in mind that adding concurrent tasks to a processor (especially a processor like this that appears to have long-running threads; we can see it only processed 23 FlowFiles using 4.5 minutes of CPU time, which is pretty slow) can quickly lead to this processor using all the available threads from the Max Timer Driven Thread pool, resulting in other processors appearing to perform slower as they get an available thread to execute less often. You can increase the size of the Max Timer Driven Thread pool from the NiFi global menu in the upper right corner, but you need to do so carefully while monitoring CPU load average and memory usage as you slowly increase the setting.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
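The effect of concurrent tasks can be sketched with a generic thread pool (an analogy only, not NiFi code): the pool size plays the role of the Concurrent Tasks setting, capping how many "executions" run at once against a backlog of FlowFiles.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Analogy: each call is one processor execution against one FlowFile;
# max_workers plays the role of the Concurrent Tasks setting.
def process_flowfile(ff: int) -> int:
    time.sleep(0.01)  # stand-in for a slow per-FlowFile operation
    return ff

flowfiles = list(range(12))

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=1) as pool:   # Concurrent Tasks = 1
    one_task = list(pool.map(process_flowfile, flowfiles))
serial = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:   # Concurrent Tasks = 3
    three_tasks = list(pool.map(process_flowfile, flowfiles))
concurrent = time.perf_counter() - start

assert one_task == three_tasks   # same results; only the scheduling differs
print(serial > concurrent)       # more workers usually drain the backlog sooner
```

The same analogy shows the risk: if this one "processor" grabs most of the pool's workers, everything else sharing the pool waits longer for a thread.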
03-27-2024
07:17 AM
@2ir I doubt it is related to the Java version being used. Of course, we would always recommend using the latest update version of a NiFi-supported Java version.

As far as using G1GC, it is commented out because Java 8 has many issues when using G1GC, and the Java community decided to address those bugs and improvements in Java 9+ versions. Since you are using Java 11, G1GC would be a better option. With that line commented out, NiFi does not specify a GC for your Java, and whatever default GC is defined within your Java release will be used. That line allows you to override your Java default and specify the GC you want to use.

Memory issues are often attributed to issues in custom components added to the NiFi deployment or to dataflow design choices, hence all the dataflow-related input I provided previously. You never mentioned whether you were encountering any out of memory (OOM) error logs in your NiFi logs. If not, do you see any OOMs if you decrease your heap memory setting, which you have set rather high already? I also recommend setting both Xms and Xmx to the same value.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
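As an illustrative sketch (the java.arg numbers vary by NiFi release and the heap sizes here are examples, not recommendations), the relevant lines in conf/bootstrap.conf look roughly like this:

```properties
# conf/bootstrap.conf (illustrative values only)
# Set Xms and Xmx to the same value so the heap does not resize at runtime.
java.arg.2=-Xms8g
java.arg.3=-Xmx8g

# Uncomment to explicitly select G1GC (a good choice on Java 11);
# left commented, the JVM's default collector is used.
#java.arg.13=-XX:+UseG1GC
```

After editing bootstrap.conf, NiFi must be restarted for the new JVM arguments to take effect.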
03-27-2024
06:58 AM
1 Kudo
@jpalmer We'll need some more details to help here:

1. Is this a standalone single NiFi instance or a NiFi multi-instance cluster setup?
2. How many partitions on your source NiFi Kafka topic?
3. How do you have your MergeContent processor configured?
4. When you say the connection quickly fills up, what are the settings on the connection?
5. With your flow running and processing FlowFiles through the dataflow connections, what is the CPU load average? You can find these details within NiFi's UI, from either the cluster UI under the global menu in the upper right corner or the system diagnostics UI found in the controller UI, also under the global menu.
6. Do you have a lot of other dataflows also running within this same NiFi? MergeContent can be CPU and heap memory intensive depending on its configuration.

There are likely ways to improve your dataflow once we know the above details.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
03-27-2024
06:47 AM
@s198 The FetchHDFS processor is by default designed to be used in conjunction with the ListHDFS processor. The ListHDFS processor is designed to connect to HDFS and generate a NiFi FlowFile for each file listed from HDFS, without getting the content of that HDFS file. The produced 0-byte FlowFiles contain FlowFile attributes that are then used by the FetchHDFS processor to obtain the actual content and insert it into the FlowFile's content.

NiFi has numerous list/fetch sets of processors. They were designed for sources that are not NiFi-cluster friendly (meaning the client does not support a distributed fetch capability, so fetching from every node would result in data duplication). So in a NiFi cluster, the List<abc> processor would be configured to run on the NiFi cluster's primary node only, so that only one node in the NiFi cluster gets the metadata about all the source files to be ingested by the NiFi cluster. The List<abc> processor would then be connected via a NiFi connection to the Fetch<abc> processor. The connection between these two processors would be configured to load-balance the 0-byte FlowFiles across all nodes in the NiFi cluster. Then the Fetch<abc> processor could run on all nodes. Since each node in the cluster has a subset of the listed files, there is no duplication, and the load/work is now distributed across the NiFi cluster.

If you are not using a NiFi cluster and only a standalone single NiFi instance, you could use the GetHDFS processor instead. But if you plan to ever expand to a NiFi cluster, it is best to build your dataflows now with that in mind to avoid extra work later.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
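The list/load-balance/fetch division of labor can be sketched as a toy simulation (not NiFi code; the node names, file names, and round-robin strategy are made up for illustration):

```python
from itertools import cycle

# Toy simulation of the List/Fetch pattern: the primary node performs the
# listing once, and the 0-byte "listing" FlowFiles are load-balanced across
# the cluster so each file is fetched by exactly one node.
listed_files = [f"/data/file_{i}.csv" for i in range(7)]   # primary node only
nodes = ["node1", "node2", "node3"]

assignments = {n: [] for n in nodes}
for node, f in zip(cycle(nodes), listed_files):            # round-robin balance
    assignments[node].append(f)

# Each node fetches only its own subset; combined, every file is fetched
# once with no duplication.
fetched = [f for per_node in assignments.values() for f in per_node]
assert sorted(fetched) == sorted(listed_files)
```

Contrast this with running a Get-style processor on every node, where each node would independently see (and ingest) the same files.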
03-25-2024
12:55 PM
@saquibsk Additional settings? With a secured NiFi (which you should always be using), there is authentication and authorization involved with any REST API request.

The simplest approach is to generate a clientAuth certificate that is trusted via the truststore your secured NiFi is configured to use in the nifi.properties file. Then that certificate is added to a keystore. The InvokeHTTP processor can be configured to use a StandardRestrictedSSLContextService that you configure with the keystore you created and the truststore that NiFi already uses, which can trust that certificate.

On the NiFi side, you would need to add that client as a user entity so you can assign authorization policies to it. You can then authorize that client/user identity for the policies needed to start and stop specific processor components. That policy would be the "operate the component" policy, which you can set on just the QueryDatabaseTable processor or any other specific processor you want to automate. component-level-access-policies

Yes, there are some initial steps to set up the keystore and truststore needed, but those can then be used over and over for all the automation within NiFi you want to achieve.

NiFi processors execute based on the individual processor's configured scheduling. There is no other option to stop or start individual processors except manually through the UI by an authorized user or via the REST API. NiFi was designed with an always-running type of architecture in mind. Stepping out of that architecture requires extra steps or redesigning your dataflows to operate within that architecture style. If your executions always happen at set times, you could use cron scheduling, but that is not going to be an optimal design for performance.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
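For illustration, a REST API call to stop a processor could be sketched like this in Python (the host, certificate file names, client ID, and revision version below are hypothetical; the run-status endpoint and payload shape follow NiFi's REST API, where the revision must be read from a prior GET of the processor):

```python
import json

NIFI = "https://nifi.example.com:8443/nifi-api"  # hypothetical host

# Build the JSON body for NiFi's PUT /processors/{id}/run-status call.
# The revision comes from a prior GET of the processor; a stale revision
# causes the request to be rejected with a conflict.
def run_status_payload(revision: dict, state: str) -> str:
    assert state in ("RUNNING", "STOPPED")
    return json.dumps({"revision": revision, "state": state})

# A real call would look roughly like this, using the PEM-form keystore and
# truststore material discussed above ("requests" assumed installed):
#
#   import requests
#   resp = requests.put(
#       f"{NIFI}/processors/{processor_id}/run-status",
#       data=run_status_payload({"clientId": "automation", "version": 5}, "STOPPED"),
#       headers={"Content-Type": "application/json"},
#       cert=("client.pem", "client.key"),  # the clientAuth certificate
#       verify="ca.pem",                    # CA that signed NiFi's server cert
#   )
```

The same pattern with state "RUNNING" starts the processor again, which is how a scheduler outside NiFi could bracket a QueryDatabaseTable run.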