Member since: 07-30-2019
Posts: 3470
Kudos Received: 1642
Solutions: 1018
My Accepted Solutions
| Views | Posted |
|---|---|
| 295 | 05-06-2026 09:16 AM |
| 481 | 05-04-2026 05:20 AM |
| 354 | 05-01-2026 10:15 AM |
| 522 | 03-23-2026 05:44 AM |
| 391 | 02-18-2026 09:59 AM |
12-16-2022
11:07 AM
@PradNiFi1236 NiFi is designed to be data agnostic. Content that NiFi ingests is preserved in binary format, wrapped in a NiFi FlowFile. It is then the responsibility of any individual processor that needs to operate against the content to understand the content type. The MergeContent processor does not care about the content type. This processor supports numerous merge formats (its Merge Format property):

- Binary Concatenation simply writes the binary data of one FlowFile to the end of the binary data of another. There is no specific handling based on the content type. Combining two PDF files in this manner leaves you with data that is not a valid combined PDF, which explains why, even though the merged FlowFile's content is larger, you still only see the first PDF in the merged binary.
- TAR and ZIP combine multiple pieces of content into a single archive. You can later untar or unzip it to extract the separate pieces of content it contains, so these would preserve both independent PDF files.
- FlowFile Stream is unique to NiFi and merges multiple NiFi FlowFiles (a FlowFile consists of content and FlowFile metadata). This format is only used to preserve that NiFi metadata with the content for future access by another NiFi.
- Avro expects the content being merged to already be in Avro format. It properly merges Avro data into a single new Avro content.

So the first question is how you would accomplish merging two PDFs outside of NiFi. Then investigate how to accomplish the same within NiFi, if possible. TAR and ZIP will get you one output file; however, if your use case is to produce 1 new PDF from 2 original PDFs, MergeContent is not going to do that.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
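To make the difference between Binary Concatenation and an archive format concrete, here is a small Python sketch. It does not use NiFi at all; it only mimics the two behaviors on in-memory byte strings, and the two "PDF" payloads are placeholder bytes, not real PDF files.

```python
import io
import zipfile


def binary_concatenate(contents: list[bytes]) -> bytes:
    """Mimic Binary Concatenation: append each payload's bytes to the previous one."""
    return b"".join(contents)


def zip_merge(named_contents: dict[str, bytes]) -> bytes:
    """Mimic a ZIP merge: each payload becomes a separate entry in one archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in named_contents.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def unzip(archive_bytes: bytes) -> dict[str, bytes]:
    """Extract every entry back out of an in-memory ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Placeholder payloads standing in for two PDF files (not real PDFs).
pdf_a = b"%PDF-1.4 ...first document... %%EOF\n"
pdf_b = b"%PDF-1.4 ...second document... %%EOF\n"

concatenated = binary_concatenate([pdf_a, pdf_b])
# All bytes are present, but nothing records where the first file ends.
print(concatenated == pdf_a + pdf_b)  # True

archived = zip_merge({"a.pdf": pdf_a, "b.pdf": pdf_b})
restored = unzip(archived)
print(sorted(restored))  # ['a.pdf', 'b.pdf']
print(restored["a.pdf"] == pdf_a and restored["b.pdf"] == pdf_b)  # True
```

The archive keeps each file as its own entry, so both come back byte-for-byte; the concatenated blob is one larger payload with no boundary between the two originals.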
12-16-2022
08:53 AM
@Mosoa Before upgrading Apache NiFi, you should read through the migration guide, covering all releases between your current release and the release you are upgrading to: https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance

The Hive nar was removed as of Apache NiFi 1.17.0, so I am guessing your previous version was 1.16 or older.

Downloading and adding additional nars to NiFi is very easy:

1. Go to https://search.maven.org/ in your browser.
2. Search for "Apache NiFi Hive nar".
3. A list of artifacts will be shown. You'll need to click on all those you need, but I would start with "nifi-hive-nar" and "nifi-hive-services-api-nar", by clicking on the version number below each.
4. On the nar-specific page you will see a "Downloads" option in the upper right corner of the page.
5. When you click on it, three options appear. Select "nar".
6. Place the downloaded nar files into the "lib" directory of your NiFi 1.19.1 installation. You'll notice that this directory already contains nars for the other component classes included with the base download.
7. Make sure ownership and permissions on these new nar files match the other nars in the "lib" directory.
8. Start your NiFi 1.19.1.

Now you will see the Hive components available in your NiFi UI.

Thank you, Matt
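If you prefer scripting the download and steps 6 and 7 over clicking through the UI, the following Python sketch may help. It assumes the standard Maven repository layout on Maven Central (`repo1.maven.org/maven2/<group path>/<artifactId>/<version>/<artifactId>-<version>.nar`) and the `org.apache.nifi` group id; the version string is a placeholder, so use whichever version you picked on the artifact page. `install_nar` is a hypothetical helper, not part of NiFi, and changing ownership to another user normally requires root.

```python
import os
import shutil
from pathlib import Path

MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def nar_download_url(artifact_id: str, version: str, group_id: str = "org.apache.nifi") -> str:
    """Build a download URL using the standard Maven repository layout."""
    group_path = group_id.replace(".", "/")
    return f"{MAVEN_CENTRAL}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.nar"


def install_nar(nar_file: Path, lib_dir: Path) -> Path:
    """Copy a nar into NiFi's lib directory, then copy the permission bits and
    owner/group of a nar already there (raises StopIteration if lib has none)."""
    reference = next(lib_dir.glob("*.nar"))  # any existing nar serves as the model
    target = lib_dir / nar_file.name
    shutil.copy2(nar_file, target)
    ref_stat = reference.stat()
    os.chmod(target, ref_stat.st_mode & 0o7777)
    if hasattr(os, "chown"):  # POSIX only
        os.chown(target, ref_stat.st_uid, ref_stat.st_gid)
    return target


print(nar_download_url("nifi-hive-nar", "1.16.3"))
# https://repo1.maven.org/maven2/org/apache/nifi/nifi-hive-nar/1.16.3/nifi-hive-nar-1.16.3.nar
```

Run it as the user that owns the NiFi installation (or root) so the ownership change succeeds, then start NiFi as in step 8.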
12-13-2022
07:07 AM
@sathish3389 Routing based on a sensitive value is an unusual use case, and I'd love to hear more about it. Ultimately the RouteOnAttribute processor expects a boolean NiFi Expression Language statement. So you want a sensitive parameter value that is evaluated against something else (another attribute on the inbound FlowFile) and, if true, routes to a new relationship. Is what you are comparing this sensitive parameter value against also sensitive? If so, how are you protecting it, given that attributes on FlowFiles are not sensitive and are stored in plaintext?

The ability to use sensitive parameters in dynamic properties (component properties that are not password-specific) was added in Apache NiFi 1.17.0 via https://issues.apache.org/jira/browse/NIFI-9957. While this change created the foundation for dynamic property support for sensitive parameters, individual components need to be updated to use this new capability. As you can imagine, with well over 300 components available in NiFi, this is a huge undertaking, so what I see in the Apache community are changes driven by specific use-case requests. I'd recommend creating an Apache NiFi Jira detailing your use case and working with the Apache community to get the RouteOnAttribute processor updated to support sensitive parameters in its dynamic properties.

Thank you, Matt
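As a rough mental model of what RouteOnAttribute does with its dynamic properties (using the "Route to Property name" strategy as I understand it), here is a Python simulation. It is not NiFi code: each Python predicate stands in for one boolean Expression Language statement, the property name doubles as the relationship name, and the `customer.tier` attribute is purely hypothetical.

```python
from typing import Callable, Dict, List

Attributes = Dict[str, str]          # FlowFile attributes: plain, unencrypted strings
Predicate = Callable[[Attributes], bool]


def route_on_attribute(attributes: Attributes, rules: Dict[str, Predicate]) -> List[str]:
    """Return the relationship(s) a FlowFile would be routed to.

    Each rule stands in for one dynamic property holding a boolean Expression
    Language statement. A FlowFile matching several rules goes to each of them;
    one matching none goes to 'unmatched'.
    """
    matched = [name for name, predicate in rules.items() if predicate(attributes)]
    return matched or ["unmatched"]


# Hypothetical rule, roughly equivalent to ${customer.tier:equals('gold')}
rules = {"gold": lambda a: a.get("customer.tier") == "gold"}

print(route_on_attribute({"customer.tier": "gold"}, rules))    # ['gold']
print(route_on_attribute({"customer.tier": "silver"}, rules))  # ['unmatched']
```

Note that whatever the rule compares against lives in the attributes map, which is exactly the plaintext exposure raised above.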
12-13-2022
06:35 AM
@MaarufB You must have a lot of logging enabled if you expect multiple 10MB app.log files per minute. Was NiFi ever rolling files?

Check your NiFi app.log for any Out of Memory (OOM) exceptions. It does not matter which class is throwing the OOM(s); once the NiFi process is having memory issues, it impacts everything within that service. If this is the case, you'll need to make changes to your dataflow(s) or increase the NiFi heap memory.

Secondly, check to make sure you have sufficient file handles for your NiFi process user. For example, if your NiFi service is owned by the "nifi" user, make sure the open file limit is set to a very large value for this user (999999). A restart of the NiFi service is required before the change to file handles will be applied.

Thank you, Matt
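Both checks can be scripted. The Python sketch below is a minimal example under a few assumptions: that the log lives at NiFi's default `logs/nifi-app.log` (adjust the path for your install), that OOMs show up as lines containing `OutOfMemoryError`, and that you are on Linux, where `/proc/<pid>/limits` shows the limits of the running NiFi JVM itself (which matters because a limit change only applies after a restart).

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def find_oom_lines(lines: Iterable[str]) -> List[str]:
    """Return log lines that mention a Java OutOfMemoryError."""
    return [line.rstrip("\n") for line in lines if "OutOfMemoryError" in line]


def scan_app_log(path: Path = Path("logs/nifi-app.log")) -> List[str]:
    """Scan a NiFi app log file for OOM lines (the default path is an assumption)."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return find_oom_lines(handle)


def max_open_files(limits_text: str) -> Tuple[str, str]:
    """Parse (soft, hard) from the 'Max open files' row of Linux /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()  # ['Max', 'open', 'files', soft, hard, 'files']
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' row found")
```

For example, `max_open_files(Path(f"/proc/{pid}/limits").read_text())`, with `pid` set to the NiFi JVM's process id, should report `('999999', '999999')` once the new limit is in effect.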
12-12-2022
06:01 AM
@Onkar_Gagre The Max Timer Driven Thread pool setting is applied to each node individually. NiFi nodes configured as a cluster are expected to be running on the same hardware configuration. The guidance of 2 to 4 times the number of cores as a starting point is based on the cores of a single node in the cluster, not on the cumulative cores across all NiFi cluster nodes.

You can only reduce CPU wait time by reducing load on the CPU. In most cases, threads given to NiFi processors execute for only milliseconds, but some processors operating against the content can take several seconds or much longer, depending on the function of the processor and/or the size of the content. When the CPU is saturated, these threads take even longer to complete because the CPU is giving time to each active thread. Knowing that only 8 threads at a time per node can actually execute concurrently, a thread only gets a short slice of time before yielding to another. The pauses in between are CPU wait time, as queued threads wait for their turn to execute. So reducing the Max Timer Driven Thread count (a reduction requires a restart to be applied) would reduce the maximum number of threads sent to the CPU concurrently, which would reduce CPU wait time. Of course, that means less concurrency in NiFi.

Sometimes you can reduce CPU load through different flow designs, which is a much bigger discussion than can be handled efficiently via the community forums. Other times, your dataflow simply needs more CPU to handle the volumes and rates you are looking to achieve. CPU and disk I/O are the biggest causes of slowed data processing.

Thank you, Matt
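The per-node sizing rule is simple arithmetic, but the common mistake is feeding it the cluster's total core count. A small Python sketch, assuming the 4-node, 8-cores-per-node cluster discussed in this thread (the function name is illustrative only):

```python
def timer_driven_pool_range(cores_per_node: int) -> tuple[int, int]:
    """Starting range for the Max Timer Driven Thread pool on ONE node:
    2x to 4x that node's cores. The setting applies per node, so size it
    from a single node's cores, never from the cluster total."""
    if cores_per_node < 1:
        raise ValueError("cores_per_node must be positive")
    return 2 * cores_per_node, 4 * cores_per_node


nodes, cores_per_node = 4, 8
print(timer_driven_pool_range(cores_per_node))          # (16, 32) per node
# Wrong: using cumulative cores (4 x 8 = 32) suggests 64-128 threads per node.
print(timer_driven_pool_range(nodes * cores_per_node))  # (64, 128)
```

Treat the result only as a starting point; the adjustments after that are driven by observed CPU usage.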
12-09-2022
02:00 PM
@F_Amini @Green_ is absolutely correct here. You should be careful when increasing concurrent tasks, as blindly increasing them everywhere can have the opposite effect on throughput. I recommend stopping the affected processors and setting concurrent tasks back to 1, or maybe 2, on all the processors where you have adjusted them away from the default of 1 concurrent task. Then look for the processor further downstream in your dataflow that has a red (backpressured) input connection but black (no backpressure) outbound connections. As @Green_ mentioned, this processor is the one causing all your upstream backlog. You'll want to monitor your CPU usage as you make small incremental adjustments to this processor's concurrent tasks until you see the upstream backlog start to come down.

If, while monitoring CPU, you see it spike pretty consistently to 100% usage across all your cores, then your dataflow has pretty much reached the max throughput it can handle for your specific dataflow design. At this point you need to look at other options, like setting up a NiFi cluster where this workload can be spread across multiple servers, or designing your dataflow differently with different processors that accomplish the same use case with a lesser impact on CPU (not always a possibility).

Thanks, Matt
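The "red input, black outputs" rule for spotting the bottleneck can be written down as a tiny Python function. This is only a sketch: it assumes you have already read each processor's connection status (for example from the UI) into a list ordered from upstream to downstream, and the `ProcessorStatus` type and processor names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessorStatus:
    name: str
    input_backpressured: bool            # inbound connection shown red
    outputs_backpressured: List[bool]    # one flag per outbound connection


def find_bottleneck(flow: List[ProcessorStatus]) -> Optional[ProcessorStatus]:
    """Return the furthest-downstream processor whose input is backpressured
    while none of its outputs are; `flow` is ordered upstream to downstream."""
    for proc in reversed(flow):
        if proc.input_backpressured and not any(proc.outputs_backpressured):
            return proc
    return None


flow = [
    ProcessorStatus("ConsumeKafka", False, [True]),
    ProcessorStatus("UpdateAttribute", True, [True]),
    ProcessorStatus("QueryRecord", True, [False, False]),
    ProcessorStatus("PutFile", False, []),
]
print(find_bottleneck(flow).name)  # QueryRecord
```

Processors upstream of the bottleneck show red on both sides, which is why the check requires all outputs to be clear; that is the one to tune in small steps while watching CPU.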
12-09-2022
01:28 PM
@Onkar_Gagre Let's take a look at concurrent tasks here. You have an 8-core machine. You have a ConsumeKafka configured with 8 concurrent tasks across 4 nodes. I hope this means your Kafka topic has 32 partitions, because that processor creates a consumer group in which the 8 consumers from each node are members. Kafka assigns each partition to at most one consumer within a consumer group, so having more consumers than partitions gains you nothing but can cause performance issues due to rebalancing.

Then you have a QueryRecord with 40 concurrent tasks per node. Each allocated thread across your entire dataflow needs time on the CPU, so between these two processors alone you are scheduling up to 48 concurrent threads that must be handled by only 8 cores. Based on your description of data volume, it sounds like there is a lot of CPU wait when you enable this processor, as each thread only gets a fraction of time on the CPU and thus takes longer to complete its task. It sounds like you need more cores to handle your dataflow, and this is not necessarily an issue specific to the QueryRecord processor.

While you may be scheduling concurrent tasks too high for your system on the QueryRecord processor, the scheduled threads come from the Max Timer Driven Thread pool set in your NiFi. The default is 10, and I assume you increased it to accommodate the concurrent tasks you have been assigning to your individual processors. The general starting recommendation for the Max Timer Driven Thread pool setting is 2 to 4 times the number of cores on your node, so with an 8-core machine that recommendation would be 16 to 32. The decision/ability to set it even higher depends on your dataflow behavior along with your data volumes. It requires you to monitor CPU usage and adjust the pool size in small increments. Once CPU is maxed, there is not much more we can do without adding more CPU.
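The two calculations above (idle Kafka consumers and threads competing per core) are easy to sanity-check with a short Python sketch. The function names are illustrative; the 16-partition case is a hypothetical comparison, not something from your setup.

```python
def idle_kafka_consumers(tasks_per_node: int, nodes: int, partitions: int) -> int:
    """Consumers in the group that receive no partition, since each partition
    is assigned to at most one consumer in a consumer group."""
    consumers = tasks_per_node * nodes
    return max(0, consumers - partitions)


def threads_per_core(scheduled_threads: int, cores: int) -> float:
    """Average number of scheduled threads competing for each core on one node."""
    return scheduled_threads / cores


print(idle_kafka_consumers(8, 4, 32))  # 0   (32 consumers, 32 partitions)
print(idle_kafka_consumers(8, 4, 16))  # 16  (half the consumers sit idle)
print(threads_per_core(8 + 40, 8))     # 6.0 (ConsumeKafka + QueryRecord on 8 cores)
```

Six runnable threads per core means each thread only gets a fraction of CPU time, which is where the CPU wait comes from.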
Thank you, Matt
12-09-2022
01:09 PM
@Techie123 Can you provide more detail around your requirement that "the FFs order is also important"?

My initial thought here would be a two-phase merge. In the first merge, you use a correlation FlowFile attribute that you create on each FlowFile based on the employee ID extracted from the record, setting the minimum number of entries to 7 and the maximum to 10. Then you take these per-employee merged records and merge them together into larger FlowFiles using MergeRecord.

The question is whether 100 records per FlowFile is a hard limit, because MergeRecord does not enforce one: its maximum number of records is a soft limit. Let's assume we set it to 100. Say one of your merged employee FlowFiles arrives at MergeRecord with 7 records for that employee ID, yet the bin already holds 98 records. Since the bin has not yet reached 100 records, this merged FlowFile still gets added, resulting in a merged FlowFile with 105 records. If you must keep it at or under 100 records per FlowFile, set the max records to 94. If, after adding a set of merged employee records, the bin holds fewer than 94, another merged employee set would be added, and since you stated each set of merged employee records could be up to 7, this keeps you at or below 100 in that single merged record.

Thank you, Matt
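The soft-limit arithmetic can be checked with a simplified Python model. It assumes a bin keeps accepting the next per-employee group as long as it currently holds fewer than the configured maximum, and closes once it reaches or passes it; the real MergeRecord also considers bin age, minimums, and other settings, so treat this as a sketch of the 94-versus-100 reasoning only.

```python
import random
from typing import List


def merge_into_bins(group_sizes: List[int], max_records: int) -> List[int]:
    """Simplified soft-limit model: a bin accepts each incoming group while it
    holds fewer than max_records, so one group can push it past the limit.
    Groups are taken in arrival order; returns each bin's record count."""
    bins: List[int] = []
    current = 0
    for size in group_sizes:
        current += size
        if current >= max_records:  # bin is full (possibly overfull): close it
            bins.append(current)
            current = 0
    if current:
        bins.append(current)  # last, partially filled bin
    return bins


print(merge_into_bins([49, 49, 7], 100))  # [105]: 98 < 100, so the 7 still fit

random.seed(42)
sizes = [random.randint(1, 7) for _ in range(10_000)]  # groups of 1 to 7 records
print(max(merge_into_bins(sizes, 94)) <= 100)  # True: worst case is 93 + 7
```

With a max of 94, a bin can only accept another group while it holds at most 93 records, so a group of up to 7 never takes it past 100.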
12-06-2022
12:48 PM
@Onkar_Gagre
1. What is the CPU and memory usage of your NiFi instances when the QueryRecord processor is stopped?
2. How is your QueryRecord processor configured, including scheduling and concurrent task configurations? What other processors were introduced as part of this new dataflow?
3. What does disk I/O look like while this processor is running?

NiFi documentation does not mention any CPU- or memory-specific resource considerations when using this processor.

Thanks, Matt
12-05-2022
11:33 AM
@Ghilani NiFi stores templates in the flow.xml.gz file. The flow.xml.gz is just a compressed copy of the dataflow(s), which reside inside NiFi's heap memory while NiFi is running. It is not recommended to keep templates in your NiFi. NiFi templates are also deprecated and will go away in the next major release.

It is recommended to use NiFi Registry to store version-controlled flows. If you are not using NiFi Registry, download flow definitions instead of creating templates, and store them safely somewhere outside of NiFi itself. A flow definition can be downloaded by right-clicking on a process group in NiFi and selecting "Download flow definition"; a JSON file of that flow is generated and downloaded. Flow definitions can be uploaded to NiFi by dragging the Process Group icon onto the canvas and selecting the option to upload a flow definition.

Thank you, Matt
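Before cleaning up, it can help to see which templates are sitting in your flow.xml.gz. The Python sketch below assumes templates appear in the XML as `<template>` elements with a `<name>` child; that matches my understanding of the NiFi 1.x layout, but verify it against your own file. Because `iter()` walks the whole tree, it does not matter which process group a template is nested under.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def template_names(flow_xml_gz: Path) -> List[str]:
    """List the names of templates stored in a gzipped flow.xml, in document order.

    Assumes each template is a <template> element with a <name> child.
    """
    with gzip.open(flow_xml_gz, "rb") as handle:
        root = ET.parse(handle).getroot()
    return [t.findtext("name", default="(unnamed)") for t in root.iter("template")]
```

For example, `template_names(Path("conf/flow.xml.gz"))` run from the NiFi home directory (the default location of the file) lists the templates you may want to export as flow definitions and then delete.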