NiFi process consuming >100% CPU while executing dataflow

Contributor

I'm new to NiFi and have developed a simple dataflow that reads 3 files from an on-prem server and uploads them to an S3 bucket. I am using the ListFile, FetchFile, and PutS3 processors for this.

When I trigger the dataflow, CPU consumption spikes above 100% and NiFi crashes. After a few minutes it automatically comes back to normal. This happens frequently. My on-prem server runs Linux and has 8 CPUs. Any idea what's going on here?

2 ACCEPTED SOLUTIONS

Super Guru

A couple of things you should do:

1) Set the run schedule of ListFile to 1 min or 5 min. If it's 0 sec, it is always running, which isn't necessary during testing.

2) Tail /var/log/nifi/nifi-app.log while troubleshooting the flow to see errors. Address them individually.

I also suspect the NiFi node does not have enough resources (RAM/cores/disk), so look into 3 dedicated NiFi nodes with enough cores, RAM, and disk configured for each repository type (see documentation) to allow you to operate NiFi in a stable manner.

Super Guru

@Gubbi please share your NiFi info:

How many nodes?

How much RAM and how many cores per node?

What are the min/max memory settings in NiFi?

Have you done anything to configure NiFi for performance? For example:

  1. Increasing min/max RAM?
  2. Disk partitioning?
  3. Changing the Max Thread Count in Controller Settings?
  4. Setting Concurrent Tasks in the processor Scheduling tab?
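
For the min/max memory question, a minimal sketch of how the heap settings could be read out of NiFi's bootstrap.conf. It assumes the `java.arg.N=-Xms...` / `java.arg.N=-Xmx...` line style used in NiFi 1.x; the sample text and the helper name `heap_settings` are illustrative, not taken from the poster's system.

```python
import re

# Sample text in the style of NiFi's conf/bootstrap.conf.
# The values are an assumption for illustration, not the poster's settings.
SAMPLE_BOOTSTRAP = """\
# JVM memory settings
java.arg.2=-Xms512m
java.arg.3=-Xmx512m
"""

def heap_settings(text):
    """Return the -Xms / -Xmx values found in java.arg.N lines."""
    found = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip commented-out settings
        m = re.match(r"java\.arg\.\d+=-(Xms|Xmx)(\S+)$", line)
        if m:
            found[m.group(1)] = m.group(2)
    return found

print(heap_settings(SAMPLE_BOOTSTRAP))
```

To check a real install, read the file's contents and pass them in place of `SAMPLE_BOOTSTRAP`.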

12 REPLIES

Master Mentor

@Gubbi

It is impossible to say what is going on here without more detail.
From where are you determining that NiFi is using greater than 100% CPU?

If you are looking at top, with 8 CPUs you would have 800% CPU available (100% for each of the 8 CPUs). So 100%+ may be normal and expected for the NiFi Java process. You have the NiFi core process running, plus each processor can execute concurrently. For example, FetchFile may be fetching the content of FlowFile 2 while PutS3 is putting FlowFile 1 to S3 at the same time. Since NiFi is designed to be multi-threaded, it is possible for multiple threads to execute concurrently, with each of those threads being handled by a different CPU.

By default, the configured "Max Timer Driven Thread Count" in NiFi is set to 10. This means that across all processors 10 threads can be requested concurrently. This is a soft limit, so there are scenarios where the number of active threads can extend beyond the max configured thread count. In addition, core-level threads are used that do not come from this thread pool, which is used only by NiFi components added to the canvas.
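
As a rough illustration of the 800% point, here is a small sketch that converts a per-process %CPU reading from top into a share of the whole machine. It assumes top is reporting in its per-core mode, where 100% means one fully busy CPU; the function name `share_of_machine` is made up for this example.

```python
def share_of_machine(top_cpu_percent, cpu_count):
    """Convert a per-process %CPU reading (100% = one busy CPU)
    into a percentage of the machine's total CPU capacity."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return top_cpu_percent / cpu_count

# On an 8-CPU server, the ceiling for one process is 800%.
for reading in (100.0, 200.0, 800.0):
    pct = share_of_machine(reading, 8)
    print(f"top shows {reading:.0f}% -> {pct:.1f}% of the machine")
```

Read this way, a reading just over 100% on an 8-CPU server is roughly one eighth of the available capacity.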

What other processes are consuming CPU on your system?

How is your NiFi configured, what are all the processors on your canvas, how are they configured, how large are the 3 files you are processing, what do the tasks/time stats on your processors show when this dataflow is executing against your 3 FlowFiles, etc?

Hope this helps give you some direction to investigate.

Matt

Contributor

@MattWho: There are no other processes running on our server. I am using the following processors on my canvas: ListFile --> FetchFile --> PutS3 --> LogAttribute.

All 3 files combined are not more than 6 MB. I check CPU consumption using the top command.

Also, my NiFi crashes when CPU% goes above 100%, and after a few minutes it comes back up. If I trigger the workflow when CPU% is down, the entire task execution doesn't take much time to copy the files from on-prem to S3.

So I am not sure how to fix this issue; I don't want NiFi to crash every time I trigger the dataflow.

Below is the error message I get when it's down:

Unable to communicate with NiFi
  • home
Please ensure the application is running and check the logs for any errors.

Contributor

Also, adding to this, I noticed that Tasks/Time varies drastically when a new file is placed. Tasks spiked up to 300. Not sure what this means!

Master Mentor

@Gubbi

Not sure what you mean by spikes up to 300. A screenshot may be helpful to understand what you are observing.

Contributor

ListFile.jpg

Since you suggested checking the Tasks/Time reported by the ListFile processor, I mentioned 300.

If you look at the screenshot above, Tasks shows 297.

NiFi is still crashing and I am clueless about what's going on here.

Master Mentor

@Gubbi 


What that stat is telling you is that the processor executed a total of 297 threads in the past 5 minutes. You can see that the cumulative time for all 297 threads that executed was only 0.044 seconds. So these 297 threads consumed very little time on the CPU in total. What this also tells us is that the processor executed about once every 1 sec.

This is either because your configured run schedule is every 1 sec (0 sec is the default) or because no new files were found on each execution. To prevent a processor from consuming excessive CPU when the run schedule is set to 0 sec, NiFi will yield a processor after a thread runs that produces no results.

What you have shown looks to be expected behavior.
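
The arithmetic behind that reading can be sketched as follows, using only the numbers quoted above (297 tasks, 0.044 seconds of cumulative time, a 5-minute stats window). The variable names are illustrative.

```python
# Figures quoted from the ListFile Tasks/Time stat in this thread.
tasks = 297            # executions in the stats window
busy_seconds = 0.044   # cumulative time spent across all executions
window_seconds = 5 * 60

seconds_between_runs = window_seconds / tasks
avg_task_ms = busy_seconds / tasks * 1000
busy_fraction = busy_seconds / window_seconds

print(f"one execution about every {seconds_between_runs:.2f} s")
print(f"average execution time {avg_task_ms:.3f} ms")
print(f"processor busy {busy_fraction:.4%} of the window")
```

The busy fraction comes out far below 1% of a single CPU, which is why this processor on its own does not explain a CPU spike.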

Thanks,

Matt

Master Mentor

@Gubbi

Bottom line is that NiFi processors in a dataflow do not execute sequentially. They each execute based on their configured run schedule. Each processor that is given a thread to execute can potentially utilize a CPU until that thread completes. Generally speaking, most threads are very short-lived, resulting in very minimal impact on your system's CPU. In your dataflow, I would expect that FetchFile (actually retrieving the content of your 3 FlowFiles) and PutS3 (reading and sending the content of your 3 FlowFiles) would hold threads the longest. While both were executing at the same time, NiFi could be using 200% (2 CPUs). Also keep in mind that the NiFi core is using threads as well, so seeing NiFi use over 100% is pretty much what I would expect anytime it is not sitting idle.

Hope the information provided helps you,

Matt

Contributor

@stevenmatison @MattWho: I am still facing the issue where NiFi crashes when CPU goes above 100%. When checking the logs, I see only this error. Once CPU% comes down, the GUI comes back again, but when I trigger the flow the same issue occurs. By the way, the dataflow just transfers files (<50 MB) from Linux to S3 with no complex logic.

Could you please suggest what to do?

2020-04-01 23:39:02,002 ERROR [Framework Task Thread Thread-4] o.a.nifi.groups.StandardProcessGroup Failed to synchronize StandardProcessGroup[identifier=0170100e-e28a-1807-249c-3f5fee9fdb0e] with Flow Registry because could not retrieve version 1 of flow with identifier 42614870-ace3-42fa-a719-01148884d252 in bucket 0509399b-3d5c-4c8e-8a43-a898b8c58163 due to: Connection refused (Connection refused)
2020-04-01 23:39:02,002 ERROR [Framework Task Thread Thread-1] o.a.nifi.groups.StandardProcessGroup Failed to synchronize StandardProcessGroup[identifier=01701029-e28a-1807-8de5-812b10e812f4] with Flow Registry because could not retrieve version 1 of flow with identifier 043dc40c-7f7a-4ed3-b3b5-dffcc716a332 in bucket 0509399b-3d5c-4c8e-8a43-a898b8c58163 due to: Connection refused (Connection refused)
2020-04-01 23:39:02,003 ERROR [Framework Task Thread Thread-3] o.a.nifi.groups.StandardProcessGroup Failed to synchronize StandardProcessGroup[identifier=0170101b-e28a-1807-55d0-ba1fc7c21ae5] with Flow Registry because could not retrieve version 1 of flow with identifier 555068bb-a5b6-4506-bbe4-25023010c5e5 in bucket 0509399b-3d5c-4c8e-8a43-a898b8c58163 due to: Connection refused (Connection refused)
2020-04-01 23:39:02,002 ERROR [Framework Task Thread Thread-2] o.a.nifi.groups.StandardProcessGroup Failed to synchronize StandardProcessGroup[identifier=01701000-e28a-1807-386c-9c1498a492f7] with Flow Registry because could not retrieve version 1 of flow with identifier 13163161-190b-40fb-8dd9-b9adf6c55245 in bucket 0509399b-3d5c-4c8e-8a43-a898b8c58163 due to: Connection refused (Connection refused)
2020-04-01 23:39:02,345 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 21.25% of the way finished recovering journal ./flowfile_repository/journals/2460243665.journal, having recovered 69338 updates
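
When working through nifi-app.log as suggested earlier in the thread, it can help to group ERROR lines by logger and cause instead of reading them one by one. The sketch below does that for lines in the format shown above; the regular expression and the `summarize_errors` helper are assumptions written for this example, and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Matches lines shaped like: "<timestamp> <LEVEL> [<thread>] <logger> <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<thread>[^\]]+)\] (?P<logger>\S+) (?P<msg>.*)$"
)

def summarize_errors(lines):
    """Count ERROR lines by (logger, cause), where cause is the text
    after the last 'due to: ' or the whole message if that is absent."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("level") != "ERROR":
            continue
        msg = m.group("msg")
        cause = msg.rsplit("due to: ", 1)[-1]
        counts[(m.group("logger"), cause)] += 1
    return counts

SAMPLE = [
    "2020-04-01 23:39:02,002 ERROR [Framework Task Thread Thread-4] o.a.nifi.groups.StandardProcessGroup Failed to synchronize StandardProcessGroup[identifier=0170100e-e28a-1807-249c-3f5fee9fdb0e] with Flow Registry because could not retrieve version 1 of flow with identifier 42614870-ace3-42fa-a719-01148884d252 in bucket 0509399b-3d5c-4c8e-8a43-a898b8c58163 due to: Connection refused (Connection refused)",
    "2020-04-01 23:39:02,002 ERROR [Framework Task Thread Thread-1] o.a.nifi.groups.StandardProcessGroup Failed to synchronize StandardProcessGroup[identifier=01701029-e28a-1807-8de5-812b10e812f4] with Flow Registry because could not retrieve version 1 of flow with identifier 043dc40c-7f7a-4ed3-b3b5-dffcc716a332 in bucket 0509399b-3d5c-4c8e-8a43-a898b8c58163 due to: Connection refused (Connection refused)",
    "2020-04-01 23:39:02,345 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 21.25% of the way finished recovering journal ./flowfile_repository/journals/2460243665.journal, having recovered 69338 updates",
]

for (logger, cause), n in summarize_errors(SAMPLE).items():
    print(f"{n} x {logger}: {cause}")
```

On this sample, both ERROR lines collapse into a single Flow Registry "Connection refused" entry, and the INFO line from the main thread is ignored.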