Member since: 07-30-2019
Posts: 3391
Kudos Received: 1618
Solutions: 999
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 248 | 11-05-2025 11:01 AM |
| | 485 | 10-20-2025 06:29 AM |
| | 625 | 10-10-2025 08:03 AM |
| | 402 | 10-08-2025 10:52 AM |
| | 460 | 10-08-2025 10:36 AM |
04-26-2024
06:02 AM
@s198 Back pressure thresholds are configured on NiFi connections between processors. There are two types of back pressure threshold:

1. Object Threshold - Back pressure is applied once the number of queued FlowFiles reaches or exceeds the setting (default is 10,000 FlowFiles). Applied per node, not across all nodes in a NiFi cluster.
2. Size Threshold - Back pressure is applied once the total data size of queued FlowFiles reaches or exceeds the setting (default is 1 GB). Applied per node, not across all nodes in a NiFi cluster.

When back pressure is being applied on a connection, it prevents the processor that feeds data into that connection from being scheduled to execute until the back pressure is no longer being applied. Since back pressure is a soft limit, this explains your two scenarios:

1. 20 FlowFiles are transferred to the connection feeding your MergeContent processor. Initially that connection is empty, so no back pressure is applied. The preceding processor keeps adding FlowFiles to that connection until the Size Threshold of 1 GB is reached, at which point back pressure is applied, preventing the preceding processor from being scheduled to process the remaining 6 files. The Max Bin Age set on your MergeContent processor then forces the bin containing the first 14 FlowFiles to merge after 5 minutes, removing the back pressure and allowing the next 6 files to be processed by the upstream processor.
2. The connection between the FetchHDFS and PutSFTP processors has no back pressure being applied (neither the object threshold nor the size threshold has been reached or exceeded), so FetchHDFS is scheduled to execute. That execution results in a single FlowFile larger than the 1 GB size threshold, so back pressure is applied as soon as that 100 GB file is queued. As soon as PutSFTP successfully executes and moves the FlowFile to one of its downstream relationships, FetchHDFS is allowed to be scheduled again.

There are also processors that execute on batches of files in a single execution. The list- and split-based processors like ListFile and SplitContent are good examples. It is possible for a ListFile execution to produce a listing in excess of the 10,000 object threshold. Since no back pressure is being applied at that moment, the execution succeeds and creates all 10,000+ FlowFiles, which are transferred to the downstream connection. Back pressure is then applied until the number of FlowFiles drops back below the threshold; as soon as it drops to 9,999, back pressure is lifted and ListFile is allowed to execute again.

In your MergeContent example you made the proper edit to the thresholds to allow more FlowFiles to queue in the upstream connection to your MergeContent. If you had left the downstream connection containing the "merged" relationship at the default size threshold, back pressure would have been applied as soon as the merged FlowFile was added to that connection, since its merged size exceeded the 1 GB default size threshold.

PRO TIP: You mentioned that your daily merge size may vary from 10 GB to 300 GB for your MergeContent. How to handle this most efficiently really depends on the number of FlowFiles and not so much on their size. The only thing to keep in mind with size thresholds is the content repository's size limitations.
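If you need to adjust these thresholds on many connections at once, they can also be set through the NiFi REST API instead of the UI. Below is a minimal Python sketch of that idea, assuming an unsecured NiFi instance; the base URL, connection id, and threshold values are placeholders, not taken from this thread (a secured instance would also need an authentication token).

```python
import requests

# Placeholders: an unsecured NiFi instance and a known connection UUID.
NIFI_API = "http://localhost:8080/nifi-api"
CONNECTION_ID = "your-connection-uuid"

# Fetch the current connection entity; its revision is required for updates.
conn = requests.get(f"{NIFI_API}/connections/{CONNECTION_ID}").json()

update = {
    "revision": conn["revision"],
    "component": {
        "id": CONNECTION_ID,
        # Object Threshold: back pressure after this many queued FlowFiles (per node).
        "backPressureObjectThreshold": 20000,
        # Size Threshold: back pressure after this much queued content (per node).
        "backPressureDataSizeThreshold": "10 GB",
    },
}

resp = requests.put(f"{NIFI_API}/connections/{CONNECTION_ID}", json=update)
resp.raise_for_status()
print("updated connection", resp.json()["id"])
```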
The total disk usage of the content repository is also not equal to the size of the actively queued FlowFiles on the canvas, because content is immutable once created and NiFi stores FlowFile content in claims. NiFi holds FlowFile attributes/metadata in heap memory for better performance (swapping thresholds exist to help prevent out-of-memory issues, but performance suffers while swapping is happening). The default object threshold is 10,000 because swapping does not happen at that size. When merging very large numbers of FlowFiles, you can get better performance from two MergeContent processors in series instead of just one.

To help you understand the above, I recommend reading the following two articles:

https://community.cloudera.com/t5/Community-Articles/Dissecting-the-NiFi-quot-connection-quot-Heap-usage-and/ta-p/248166
https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
04-23-2024
07:56 AM
Thank you, @MattWho, for providing timely responses and quick solutions to the queries. You are really helping the community grow. Hats off to you. Appreciate it.
04-22-2024
12:58 PM
@MattWho You will see this behavior even if you don't use Docker and only use the standard NiFi binaries. I validated this by copying the directories below from a 1.25.0 installation to a 2.0.0-M2 installation, both without Docker:

conf/
content_repository/
database_repository/
flowfile_repository/
provenance_repository/
state/

Thank you again
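For anyone reproducing this migration, here is a minimal Python sketch of the copy step, under the assumption that both NiFi instances are stopped; the installation paths are placeholders for illustration.

```python
import shutil
from pathlib import Path

# Placeholder install locations; adjust to your environment.
OLD = Path("/opt/nifi-1.25.0")
NEW = Path("/opt/nifi-2.0.0-M2")

# The directories carried over from the old installation.
DIRS = [
    "conf",
    "content_repository",
    "database_repository",
    "flowfile_repository",
    "provenance_repository",
    "state",
]

for name in DIRS:
    src, dst = OLD / name, NEW / name
    if dst.exists():
        shutil.rmtree(dst)  # replace the new install's copy entirely
    shutil.copytree(src, dst)
    print(f"copied {src} -> {dst}")
```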
04-22-2024
06:13 AM
@manishg I'm not sure what version of Apache NiFi you are using here, but I would not recommend using the InferAvroSchema processor. Depending on your use case there may be better options; most record readers (CSVReader, for example) have the ability to infer a schema.

From the output provided, you have a CSV file that is 44 bytes in size. According to the InferAvroSchema processor documentation:

When inferring from CSV data a "header definition" must be present either as the first line of the incoming data or the "header definition" must be explicitly set in the property "CSV Header Definition". A "header definition" is simply a single comma separated line defining the names of each column. The "header definition" is required in order to determine the names that should be given to each field in the resulting Avro definition.

Does your content here meet the requirements of the InferAvroSchema processor? Do you see the same issue if you try to infer the schema via the CSVReader controller service? These two components do not infer schemas in the same way. The InferAvroSchema processor is not part of core Apache NiFi and utilizes the Kite SDK, which is no longer being maintained.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
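As an illustration of what a usable "header definition" looks like, here is a small Python sketch (plain Python, not NiFi or Kite code) that checks for a header line and derives field names from it; the sample data is hypothetical.

```python
import csv
import io

# Hypothetical CSV content whose first line is a proper header definition.
sample = "id,name,amount\n1,alpha,10.5\n2,beta,20.0\n"

sniffer = csv.Sniffer()
# has_header() heuristically decides whether the first row is a header.
if sniffer.has_header(sample):
    header = next(csv.reader(io.StringIO(sample)))
    # A trivial stand-in for schema inference: name each field from the header.
    print("inferred field names:", header)
else:
    # Without a header line, the "CSV Header Definition" property must be set.
    print("no header definition found in the data")
```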
04-18-2024
06:15 AM
1 Kudo
@whoknows While there is no exact date yet in the Apache NiFi community, I have seen discussions around it as recently as April 8th suggesting it will be happening very soon, possibly within the next week or two.

Thank you,
Matt
04-17-2024
02:41 PM
1 Kudo
Thank you, @MattWho. Your recommendation worked for me. I updated the bootstrap.conf file in NiFi and was able to successfully enable the DB controller service and persist data into the Ignite database from NiFi.

Details: I copied the JVM parameters available in the file \apache-ignite-2.16.0-bin\bin\include\jvmdefaults.sh to the \nifi-2.0.0-M2\conf\bootstrap.conf file. Here is the format and the exact list of Java arguments added to the NiFi bootstrap:

java.arg.21=--add-opens=java.base/jdk.internal.access=ALL-UNNAMED
java.arg.22=--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED
java.arg.23=--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
java.arg.24=--add-opens=java.base/sun.util.calendar=ALL-UNNAMED
java.arg.25=--add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED
java.arg.26=--add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED
java.arg.27=--add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED
java.arg.28=--add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED
java.arg.29=--add-opens=java.base/java.io=ALL-UNNAMED
java.arg.30=--add-opens=java.base/java.nio=ALL-UNNAMED
java.arg.31=--add-opens=java.base/java.net=ALL-UNNAMED
java.arg.32=--add-opens=java.base/java.util=ALL-UNNAMED
java.arg.33=--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
java.arg.34=--add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED
java.arg.35=--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
java.arg.36=--add-opens=java.base/java.lang=ALL-UNNAMED
java.arg.37=--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
java.arg.38=--add-opens=java.base/java.math=ALL-UNNAMED
java.arg.39=--add-opens=java.sql/java.sql=ALL-UNNAMED
java.arg.40=--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
java.arg.41=--add-opens=java.base/java.time=ALL-UNNAMED
java.arg.42=--add-opens=java.base/java.text=ALL-UNNAMED
java.arg.43=--add-opens=java.management/sun.management=ALL-UNNAMED
java.arg.44=--add-opens=java.desktop/java.awt.font=ALL-UNNAMED

Just make sure the java.arg.<number> indexes you add are not already used in the bootstrap.conf file you are working with. Thanks again!
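To avoid index collisions, here is a small Python sketch that scans a bootstrap.conf for the highest java.arg.<number> already in use and appends new arguments after it; the path and the (shortened) argument list are placeholders for illustration.

```python
import re
from pathlib import Path

# Placeholder path; point this at your actual NiFi bootstrap.conf.
BOOTSTRAP = Path("/opt/nifi-2.0.0-M2/conf/bootstrap.conf")

# The JVM flags to append; shortened here for brevity.
NEW_ARGS = [
    "--add-opens=java.base/jdk.internal.access=ALL-UNNAMED",
    "--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED",
]

text = BOOTSTRAP.read_text()
# Collect every java.arg.<number> index already present in the file.
used = [int(m) for m in re.findall(r"^java\.arg\.(\d+)=", text, flags=re.MULTILINE)]
next_index = max(used, default=0) + 1

lines = [f"java.arg.{next_index + i}={arg}" for i, arg in enumerate(NEW_ARGS)]
BOOTSTRAP.write_text(text.rstrip("\n") + "\n" + "\n".join(lines) + "\n")
print("appended:", *lines, sep="\n")
```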
04-16-2024
06:30 PM
@tmarkfeld Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
04-15-2024
09:34 AM
1 Kudo
@Ytch All components on the NiFi canvas are executed as the NiFi service user and not as the user currently authenticated into the NiFi UI. So what you should do, from each host in your NiFi cluster (do this on every host, since any one of them can be elected as the primary node at any given time), is open a command prompt/console window, become the user that owns the NiFi process, and manually ssh/sftp to the target SFTP server. You will likely be prompted to add the target SFTP server to the known_hosts file for the NiFi service user. The NiFi SFTP processors have no way of performing this interactive step.

After successfully adding the SFTP server to the known_hosts file for the NiFi service user, go back and try to start the GetSFTP or ListSFTP processors again to see if your issue is resolved. If not, please share your GetSFTP and ListSFTP processor configurations. Also check the nifi-app.log for any exceptions or log output related to these processors. If there is no log output, you could also try enabling debug logging in the NiFi logback.xml for these processor classes to see what additional output may be produced. The classes for these processors are:

org.apache.nifi.processors.standard.GetSFTP
org.apache.nifi.processors.standard.ListSFTP

The new logger lines you would add to logback.xml look like this:

<logger name="org.apache.nifi.processors.standard.GetSFTP" level="DEBUG"/>
<logger name="org.apache.nifi.processors.standard.ListSFTP" level="DEBUG"/>

Simply add them in logback.xml where you see similar lines already.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
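To confirm the manual step worked, here is a minimal Python sketch that checks whether the SFTP host appears in the NiFi service user's known_hosts file; run it as that user. The hostname is a placeholder, and hashed known_hosts entries will not match a plain-text scan like this.

```python
from pathlib import Path

# Placeholders: the target SFTP server and the NiFi service user's home.
SFTP_HOST = "sftp.example.com"
KNOWN_HOSTS = Path.home() / ".ssh" / "known_hosts"

def host_is_known(host: str, known_hosts: Path) -> bool:
    """Return True if a plain-text known_hosts entry mentions the host."""
    if not known_hosts.exists():
        return False
    for line in known_hosts.read_text().splitlines():
        # The first field is a comma-separated list of host patterns.
        hosts_field = line.split(" ", 1)[0]
        if host in hosts_field.split(","):
            return True
    return False

if host_is_known(SFTP_HOST, KNOWN_HOSTS):
    print(f"{SFTP_HOST} already present in {KNOWN_HOSTS}")
else:
    print(f"{SFTP_HOST} not found; ssh/sftp to it once as the NiFi service user")
```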
04-15-2024
05:31 AM
@EddyChan NiFi should only generate a keystore and truststore on startup if you have not manually configured NiFi's nifi.properties file to use your own keystore and truststore files. Even if they are generated, NiFi would still use your configured keystore and truststore files.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
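As a quick way to confirm which stores NiFi is configured to use, here is a minimal Python sketch that reads nifi.properties and reports whether the standard keystore/truststore properties point at files that exist; the configuration path is a placeholder.

```python
from pathlib import Path

# Placeholder path to your NiFi configuration directory.
PROPS = Path("/opt/nifi/conf/nifi.properties")

# Standard NiFi security properties that select the TLS stores.
KEYS = ["nifi.security.keystore", "nifi.security.truststore"]

props = {}
for line in PROPS.read_text().splitlines():
    if "=" in line and not line.lstrip().startswith("#"):
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()

for key in KEYS:
    value = props.get(key, "")
    if not value:
        print(f"{key} is unset; NiFi may generate a store on startup")
    else:
        exists = Path(value).exists()
        print(f"{key} -> {value} ({'found' if exists else 'missing on disk'})")
```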
04-09-2024
05:23 AM
Hi Matt, we configured 24 GB for the Xms and Xmx parameters. For now we are seeing normal memory use without OOM errors. Thank you, Fortunato