Support Questions

Find answers, ask questions, and share your expertise

yarn out of memory issue

Explorer

Can anyone advise why we are getting the following issue even though the systems are configured correctly?

 

java.lang.OutOfMemoryError: Unable to create new native thread. 

The YARN job failed, and when I checked the logs I saw this error.

 

Thanks

4 REPLIES

Guru

Hi @saihadoop ,

 

The error "java.lang.OutOfMemoryError: unable to create new native thread" usually indicates that the OS could not meet the new thread creation request from the process (in your case is YARN). You may want to check if there was any spikes in the load on the nodes.

 

Please check the /etc/security/limits.conf file to find the configured maximum limits for nproc and nofile and whether those values are sufficient to handle the load for the service user. If the values are too low, edit /etc/security/limits.conf on all nodes in the cluster and add the following at the bottom of the file:

user soft nproc 30000
user hard nproc 30000
user hard nofile 30000
user soft nofile 30000

Where the "user" is the specific user running the YARN job. After that, save the file and try to run the job again.

 

Thanks and hope this helps,

Li

Li Wang, Technical Solution Manager



Explorer

Thank you for your response. I will check the settings.

 

This is the first time we have had this issue. We have jobs scheduled to run every day. Any idea why YARN failed on that particular day?

 

Thanks

Guru

@saihadoop ,

 

If you can isolate the issue to a particular day, you may want to check whether any cron jobs run on that day.
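
For example, on the affected nodes you could list the scheduled jobs for the likely users and the system-wide cron locations (a simple check; adjust the user names to your environment):

   # Per-user crontabs (here checking root and the yarn service user)
   $ crontab -l -u root
   $ crontab -l -u yarn

   # System-wide cron entries
   $ cat /etc/crontab
   $ ls /etc/cron.d /etc/cron.daily /etc/cron.weekly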

 

Thanks,

Li

Li Wang, Technical Solution Manager



Contributor

These items could also help.

 

- Check the jobs that were running at the time of the incident. You can also check via the CLI on the master node using the command below and observe which jobs are running (see also the listing commands after this list).

   $ yarn top

- If there are too many jobs running at once, reduce the concurrency: schedule the jobs so that fewer of them run at the same time.

- If the cluster is still unable to handle your jobs, you may need to expand the cluster for better performance.
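
As a supplement to yarn top, you can also list the applications that are currently running and inspect a specific one (standard YARN CLI commands; the state filter and the application ID placeholder are just examples):

   # List applications currently in the RUNNING state
   $ yarn application -list -appStates RUNNING

   # Show the status and resource usage of one application
   $ yarn application -status <application_id>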

 

Thanks.