YARN ResourceManager halts with java.lang.OutOfMemoryError: unable to create new native thread

Explorer

The YARN ResourceManager halts with "OutOfMemoryError: unable to create new native thread", and the job fails over to the standby ResourceManager to complete the task.

How can I get this resolved?

Error message:

2018-03-22 02:30:09,637 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e189_1521451854044_2288_01_000002 Container Transitioned from ALLOCATED to ACQUIRED

2018-03-22 02:30:10,413 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e189_1521451854044_2288_01_000002 Container Transitioned from ACQUIRED to RUNNING

2018-03-22 02:30:10,695 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate...

2018-03-22 02:30:19,354 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: hue is accessing unchecked http://server1:43045/ws/v1/mapreduce/jobs/job_1521451854044_2288 which is the app master GUI of application_1521451854044_2288 owned by edh_srv_prod

2018-03-22 02:30:30,212 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: hue is accessing unchecked http://server1:43045/ws/v1/mapreduce/jobs/job_1521451854044_2288 which is the app master GUI of application_1521451854044_2288 owned by edh_srv_prod

2018-03-22 02:30:34,090 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[2101925946@qtp-1878992188-14302,5,main] threw an Error. Shutting down now...

java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:714)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1095)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
    at org.mortbay.jetty.security.SslSocketConnector$SslConnection.run(SslSocketConnector.java:723)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

2018-03-22 02:30:34,093 INFO org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException

yarn application -status application_1521451854044_2288

Application Report :

Application-Id : application_1521451854044_2288

Application-Name : oozie:launcher:T=shell:W=OS_Changes_incremental_workflow:A=shell-b8b2:ID=0006766-180222181315002-oozie-oozi-W

Application-Type : MAPREDUCE

User : edh_srv_prod

Queue : root.edh_srv_prod

Start-Time : 1521710999557

Finish-Time : 1521711593154

Progress : 100%

State : FINISHED

Final-State : SUCCEEDED

Tracking-URL : https://server1:19890/jobhistory/job/job_1521451854044_2288

RPC Port : 40930

AM Host : server3

Aggregate Resource Allocation : 1809548 MB-seconds, 1181 vcore-seconds

Log Aggregation Status : SUCCEEDED

Diagnostics : Attempt recovered after RM restart

4 REPLIES

Expert Contributor

It's likely that the host has run out of PIDs, which is why the RM can't create a new thread (on Linux, every thread takes an ID from the same pool as process IDs). Here are some commands that can help you confirm whether this is the issue and raise the maximum number of PIDs allowed.

Check the number of threads running:

ps -elfT | wc -l

Check the current pid_max:

sysctl kernel.pid_max

Increase the pid_max:

sysctl -w kernel.pid_max=4194304
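The three checks above can be combined into one small script. This is only a sketch, assuming a Linux host with /proc mounted. The function names (`count_threads`, `read_pid_max`, `usage_percent`) are made up for illustration, and the script reads /proc directly instead of calling `ps` and `sysctl`, so the count may differ slightly from `ps -elfT | wc -l`.

```shell
#!/bin/sh
# Sketch: compare the number of live threads with kernel.pid_max (Linux only).
# Function names are illustrative, not part of any Hadoop or system tool.

# Count every thread (task) currently visible under /proc.
count_threads() {
    ls -d /proc/[0-9]*/task/[0-9]* 2>/dev/null | wc -l
}

# Read the system-wide ceiling on process/thread IDs.
read_pid_max() {
    cat /proc/sys/kernel/pid_max
}

# Integer percentage of the limit in use: usage_percent <threads> <pid_max>
usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

if [ -r /proc/sys/kernel/pid_max ]; then
    threads=$(count_threads)
    pid_max=$(read_pid_max)
    echo "threads=$threads pid_max=$pid_max used=$(usage_percent "$threads" "$pid_max")%"
else
    echo "kernel.pid_max is not readable on this host" >&2
fi
```

If the percentage is nowhere near 100, the per-user process limit (`ulimit -u` for the account running the RM) may be the constraint instead, since it can produce the same error. Also note that `sysctl -w` only changes the running kernel; to keep the new value after a reboot, the setting would normally also go into /etc/sysctl.conf or a file under /etc/sysctl.d/.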

Expert Contributor

Another command that might be informative is checking the last PID assigned:

sysctl kernel.ns_last_pid
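If the counters show the host is close to the limit, it can also help to see which processes own the most threads. The following is a rough sketch, assuming Linux with /proc mounted. `top_thread_owners` is a hypothetical helper name, and the output format (thread count, PID, first word of the process name) is just one possible choice.

```shell
#!/bin/sh
# Sketch: list the processes with the most threads by reading /proc/<pid>/status.
# top_thread_owners [N] prints up to N lines of "<threads> <pid> <name>" (default 5).
top_thread_owners() {
    for status in /proc/[0-9]*/status; do
        # A process may exit while we iterate, so read errors are ignored.
        awk '/^Name:/    {name = $2}
             /^Pid:/     {pid = $2}
             /^Threads:/ {t = $2}
             END         {if (t) print t, pid, name}' "$status" 2>/dev/null
    done | sort -rn | head -n "${1:-5}"
}

top_thread_owners 5
```

If the ResourceManager's JVM is at the top of this list with a thread count in the thousands, that suggests the RM itself is leaking threads and that raising pid_max would only delay the failure.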

Explorer

Thank you. I will make the required changes and keep a watch on it.

Expert Contributor

Please accept the answer if it fixes your problem.