Member since: 02-07-2019
Posts: 1949
Kudos Received: 131
Solutions: 26

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 276 | 02-01-2024 10:51 PM |
| | 2341 | 01-22-2024 08:42 PM |
| | 891 | 10-18-2023 10:07 PM |
| | 1315 | 07-24-2023 10:27 PM |
| | 2357 | 05-08-2023 12:28 AM |
04-01-2024
02:36 AM
1 Kudo
@frbelotto, @ZainK Did the response assist in resolving your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future.
03-28-2024
08:30 AM
Thank you @MattWho . You are awesome!
03-27-2024
12:30 PM
@jpalmer From the image you shared, the bottleneck is actually the custom PutGeoMesa 4.0.4 processor (not an out-of-the-box Apache NiFi processor).

A connection has backpressure settings that limit the number of FlowFiles that can be queued. This is a soft limit: backpressure is applied once the connection's backpressure threshold is reached or exceeded, and it is not released until the queue drops back below the configured thresholds. While backpressure is applied, the upstream processor is not scheduled to execute. The connection turns red when backpressure is being applied, and since the connection after PutGeoMesa 4.0.4 is not red, no backpressure is being applied to that processor. So your issue is that PutGeoMesa 4.0.4 cannot process the FlowFiles queued to it fast enough, which causes the backup in every upstream connection all the way to the source processor.

Since it is a custom processor, I can't speak to its performance or tuning capabilities. I also don't know whether the PutGeoMesa 4.0.4 processor supports concurrent execution, but you could try the following: right-click on the PutGeoMesa 4.0.4 processor, select Configure, and open the SCHEDULING tab. There you can set "CONCURRENT TASKS". The default is 1, and this custom processor might ignore the property. Concurrent tasks allow the processor to execute multiple times concurrently (think of each additional concurrent task as another identical copy of the processor). A processor component is scheduled to request a thread based on the configured Run Schedule (for the Timer Driven scheduling strategy, the default of 0 secs means schedule as fast as possible). When scheduled, the processor requests a thread from the NiFi Timer Driven thread pool, and that thread is used to execute the processor's code against FlowFile(s) from the source connection.

The scheduler will then try to schedule the processor again based on the Run Schedule. If concurrent tasks is still set to 1 and the previous execution is still running, it will not execute again until the in-use thread finishes. But if you set concurrent tasks to, say, 3, the processor could potentially run 3 threads concurrently (each thread working on different FlowFile(s) from the source connection). Again, I don't know whether this custom processor ignores or supports the property, nor whether it was coded in a thread-safe manner (meaning concurrent thread executions would not cause issues), so even if this appears to improve throughput, verify the integrity of the data coming out of the processor.

Also keep in mind that adding concurrent tasks to a processor (especially one like this that appears to have long-running threads; we can see it processed only 23 FlowFiles while using 4.5 minutes of CPU time, which is pretty slow) can quickly lead to it consuming all the available threads from the Max Timer Driven Thread pool, making other processors appear slower because they get an available thread less often. You can increase the size of the Max Timer Driven Thread pool from the NiFi global menu in the upper right corner, but do so carefully while monitoring CPU load average and memory usage as you slowly increase the setting.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
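The soft-limit backpressure behavior described above can be modeled in a few lines of Python. This is purely an illustrative sketch, not NiFi's implementation; the `Connection` class and the threshold value are invented for the example:

```python
class Connection:
    """Toy model of a NiFi connection's object-count backpressure (soft limit)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.queue = 0  # number of queued FlowFiles

    def backpressure_applied(self):
        # Backpressure engages once the queue reaches or exceeds the threshold,
        # and is released only when the queue drops back below it.
        return self.queue >= self.threshold

    def enqueue(self, n):
        # Soft limit: a single upstream execution may push the queue past the
        # threshold; the FlowFiles are not rejected.
        self.queue += n

    def dequeue(self, n):
        self.queue = max(0, self.queue - n)


conn = Connection(threshold=10)
conn.enqueue(9)
print(conn.backpressure_applied())  # False: still below threshold
conn.enqueue(5)                     # batch overshoots: queue is now 14 (soft limit)
print(conn.backpressure_applied())  # True: upstream processor no longer scheduled
conn.dequeue(5)                     # queue drops to 9, back below threshold
print(conn.backpressure_applied())  # False: backpressure released
```

While `backpressure_applied()` is true, a real NiFi instance simply stops scheduling the upstream processor, which is why the backlog propagates connection by connection toward the source.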
03-26-2024
11:57 AM
Hi @datafiber, it seems like your NameNode is in safe mode. I am not sure why it went into safe mode, but you can try taking it out manually, then retry the operation and monitor the logs. Run the commands below from the NameNode:

# hdfs dfsadmin -safemode leave
# hdfs dfsadmin -safemode get
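If you are scripting this check, the state can be read from the output of `hdfs dfsadmin -safemode get`. Below is a hedged Python sketch for parsing that output; the exact wording can vary by Hadoop version, so treat the sample strings as assumptions and verify against your cluster:

```python
def safemode_is_on(status_line):
    """Return True if a `hdfs dfsadmin -safemode get` status line reports ON.

    Assumes output of the form "Safe mode is ON" / "Safe mode is OFF"
    (possibly with trailing punctuation or extra detail).
    """
    tokens = [t.strip(".") for t in status_line.upper().split()]
    return "ON" in tokens


print(safemode_is_on("Safe mode is ON"))   # True
print(safemode_is_on("Safe mode is OFF"))  # False
```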
03-26-2024
11:51 AM
Hi, @user2024. I don't think the canary file is going to cause this issue. The blocks that are corrupt/missing are now lost and cannot be recovered. You can manually delete the affected files after identifying them with the command below, and then run the HDFS balancer so that the NameNode balances the new blocks across the cluster:

# hdfs fsck -list-corruptfileblocks

You can also refer to the article below. https://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hdfs-files
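If you want to script the cleanup, the fsck output can be parsed for the affected paths. A hedged Python sketch, assuming the listing contains lines of the form `blk_<id><TAB><path>` (the sample text and paths below are invented; verify the actual output format on your cluster before relying on this):

```python
# Hypothetical sample of `hdfs fsck -list-corruptfileblocks` output.
SAMPLE = """\
The filesystem under path '/' has 2 CORRUPT files
blk_1073741825\t/user/data/part-00000
blk_1073741830\t/user/data/part-00007
"""


def corrupt_paths(fsck_output):
    """Extract file paths from lines that name a corrupt block."""
    paths = []
    for line in fsck_output.splitlines():
        # Keep only block lines; the path is the column after the tab.
        if line.startswith("blk_") and "\t" in line:
            paths.append(line.split("\t", 1)[1])
    return paths


print(corrupt_paths(SAMPLE))  # ['/user/data/part-00000', '/user/data/part-00007']
```

The resulting list could then be fed to `hdfs dfs -rm` one path at a time before re-running fsck to confirm the filesystem is healthy.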
03-24-2024
11:26 PM
@bowen, Did the response assist in resolving your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future.
03-22-2024
06:21 AM
1 Kudo
@Chetankc From a NiFi perspective, there is not much guidance that can be given with so little information.

What does "10 Billion Load" mean? Is it the number of unique files being ingested into NiFi? What is the average size? What is the rate of ingest?

What is "15,000 process"? Is this the number of NiFi processors added to the NiFi canvas? What types of processors are being used? Does your dataflow(s) do a lot of content modification?

Have you done throughput testing and any performance tuning? 15,000 processors is a lot of execution scheduling against your CPU cores. In your load testing, what was your CPU load average? What was your memory impact?

You also have custom NiFi components. Are you referring to these custom components as using many threads, or to the totality of the 15,000 components using a lot of threads? What does "a lot of threads" mean here? Are any of these long-running threads, or are they all millisecond thread executions?

What kind of performance and throughput are you achieving now, and on what type of setup (how many nodes in your NiFi cluster, number of CPU cores, JVM heap settings, type of disk, etc.)?

Thank you, Matt
03-22-2024
06:08 AM
1 Kudo
@hidden Welcome to the world of Apache NiFi. The first recommendation I'd make is to download the latest version of the Apache NiFi 1.x branch. The 1.12 branch is more than 5 years old now, and there have been many improvements, bug fixes, and security updates since its release. The new Apache NiFi 2.x branch has also been released recently. Since you are new to NiFi, you might consider using the 2.x version instead to avoid the hassle of migrating to this new major release branch down the road; the 1.x branch will cease to release new versions soon. When sharing exceptions for help, it is best to make sure you have also inspected the NiFi Registry logs produced in the log directory configured in the logback.xml file. They may provide more detailed stack traces and/or logging to help fully understand the issue you encountered. Thank you, Matt
03-22-2024
03:30 AM
1 Kudo
@lv_antel What is the complete command you are using? Can you please share the complete stack trace?
03-20-2024
01:05 PM
@EFasdfSDfaSDFG Took some time to do a feasibility study and found a couple of things from our internal resources.

1. FYI: the native Ozone REST API is completely superseded by the S3 REST API. Therefore, there is no native REST API to manage Ozone.
2. Since I am unsure of your use case, I would suggest you test this with either the Java API or the HttpFS Gateway interface. [0a]

[0a] https://ozone.apache.org/docs/1.4.0/interface.html

HTH. If you find this answers your question, please mark it as Accept as Solution. Also, you may thank me by clicking the thumbs-up! Cheers!