Created on 01-16-2020 06:56 AM - last edited on 01-16-2020 09:15 AM by VidyaSargur
Can anyone advise why we get the following issue even though the systems are configured correctly?
java.lang.OutOfMemoryError: Unable to create new native thread.
YARN failed, and when I checked the logs I saw this error.
Thanks
Created 01-16-2020 01:06 PM
Hi @saihadoop ,
The error "java.lang.OutOfMemoryError: unable to create new native thread" usually indicates that the OS could not meet the new thread creation request from the process (in your case is YARN). You may want to check if there was any spikes in the load on the nodes.
Please check the /etc/security/limits.conf file to find the configured maximum limits for nproc and nofile, and whether those values are sufficient to handle the load for the service user. If the values are too low, you can edit /etc/security/limits.conf on all nodes in the cluster and add the following at the bottom of the file:
user soft nproc 30000
user hard nproc 30000
user hard nofile 30000
user soft nofile 30000
Where the "user" is the specific user running the YARN job. After that, save the file and try to run the job again.
Thanks and hope this helps,
Li
Li Wang, Technical Solution Manager
Created 01-16-2020 03:19 PM
Thank you for your response. I will check the settings.
This is the first time we have had this issue. We have jobs scheduled to run every day. Any idea why YARN failed on that particular day?
Thanks
Created 01-22-2020 02:41 PM
If you can isolate the issue to a particular day, you may want to check whether any cron jobs run on that day, for example:
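Something like the following could surface cron activity (assuming a Linux node; "user" is the service user, and the log path varies by distribution):

$ crontab -l -u user                 # per-user crontab
$ ls /etc/cron.d /etc/cron.daily     # system-wide cron jobs
$ grep CRON /var/log/cron            # /var/log/syslog on Debian-based systems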
Thanks,
Li
Li Wang, Technical Solution Manager
Created 01-22-2020 07:02 PM
These items could also help:
- Check which jobs were running at the time of the incident. You can also check from the CLI on the master node using the command below and observe which jobs are running:
$ yarn top
- If too many jobs are running at once, reduce the concurrency: schedule the jobs so that they run at staggered times (see the sketch after this list).
- If the cluster still cannot handle your workload, you may need to expand the cluster for better performance.
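As a rough sketch of the first two points: you can list the applications currently running with the YARN CLI, and stagger daily job start times with cron (the script paths below are just placeholders):

$ yarn application -list -appStates RUNNING    # list applications currently running

# illustrative crontab entries staggering two daily jobs an hour apart
0 1 * * * /path/to/job_a.sh
0 2 * * * /path/to/job_b.sh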
Thanks.