
Yarn Resource Manager Halts with java.lang.OutOfMemoryError: unable to create new native thread

The YARN Resource Manager halts with an OOM error (unable to create new native thread), and the job fails over to the standby Resource Manager to complete the task.

 

Could you please let us know the root cause of this issue?

 

Error message:

 

2018-03-22 02:30:09,637 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e189_1521451854044_2288_01_000002 Container Transitioned from ALLOCATED to ACQUIRED

2018-03-22 02:30:10,413 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e189_1521451854044_2288_01_000002 Container Transitioned from ACQUIRED to RUNNING

2018-03-22 02:30:10,695 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 

2018-03-22 02:30:19,354 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: hue is accessing unchecked http://server1:43045/ws/v1/mapreduce/jobs/job_1521451854044_2288 which is the app master GUI of application_1521451854044_2288 owned by edh_srv_prod

2018-03-22 02:30:30,212 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: hue is accessing unchecked http://server1:43045/ws/v1/mapreduce/jobs/job_1521451854044_2288 which is the app master GUI of application_1521451854044_2288 owned by edh_srv_prod

2018-03-22 02:30:34,090 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[2101925946@qtp-1878992188-14302,5,main] threw an Error.  Shutting down now...

java.lang.OutOfMemoryError: unable to create new native thread

               at java.lang.Thread.start0(Native Method)

               at java.lang.Thread.start(Thread.java:714)

               at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1095)

               at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)

               at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)

               at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)

               at org.mortbay.jetty.security.SslSocketConnector$SslConnection.run(SslSocketConnector.java:723)

               at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

2018-03-22 02:30:34,093 INFO org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException

 

 

yarn application -status application_1521451854044_2288

Application Report : 

               Application-Id : application_1521451854044_2288

               Application-Name : oozie:launcher:T=shell:W=OS_Changes_incremental_workflow:A=shell-b8b2:ID=0006766-180222181315002-oozie-oozi-W

               Application-Type : MAPREDUCE

               User : edh_srv_prod

               Queue : root.edh_srv_prod

               Start-Time : 1521710999557

               Finish-Time : 1521711593154

               Progress : 100%

               State : FINISHED

               Final-State : SUCCEEDED

               Tracking-URL : https://server1:19890/jobhistory/job/job_1521451854044_2288

               RPC Port : 40930

               AM Host : server3

               Aggregate Resource Allocation : 1809548 MB-seconds, 1181 vcore-seconds

               Log Aggregation Status : SUCCEEDED

               Diagnostics : Attempt recovered after RM restart

 


Re: Yarn Resource Manager Halts with java.lang.OutOfMemoryError: unable to create new native thread

What CDH version are you using? If it is equal to or lower than 5.9.1 or 5.8.3, and you use a KMS service in the cluster (for HDFS Transparent Encryption Zone features), you may be hitting https://issues.apache.org/jira/browse/HADOOP-13838, which has been fixed in the bug-fix releases of CDH 5.8.4, 5.9.2, and 5.10.0 onwards.

Re: Yarn Resource Manager Halts with java.lang.OutOfMemoryError: unable to create new native thread


I'm currently on CDH 5.8.2 with a KMS service, but what are your thoughts on the OS running out of PIDs, as the error message seems to suggest?

 

 


Re: Yarn Resource Manager Halts with java.lang.OutOfMemoryError: unable to create new native thread

Thank you for confirming the CDH version. Do you also have a KMS service in the cluster? If yes, you're definitely hitting the aforementioned bug.

You're partially right about the "OS running out of PID". More specifically, the YARN RM process runs into its 'number of processes' (nproc) ulimit, which should be set to a high default (32k processes) if you are running Cloudera Manager. There's no reason for YARN to normally use anywhere near 32k threads.
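
To check this on a live node, the thread count of the RM process can be compared against its nproc limit. Below is a minimal Python sketch, assuming a Linux host where /proc/<pid>/status and /proc/<pid>/limits are readable. The function names are hypothetical helpers written for this illustration and are not part of YARN or Cloudera Manager. Note that on Linux the nproc limit is counted across all processes and threads of the same user, so the count for a single process is only part of the picture.

```python
import re
from pathlib import Path


def parse_thread_count(status_text):
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    match = re.search(r"^Threads:\s+(\d+)", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def parse_max_processes(limits_text):
    """Return the soft 'Max processes' limit from the text of /proc/<pid>/limits.

    Returns None if the line is missing or the limit is 'unlimited'.
    """
    match = re.search(r"^Max processes\s+(\S+)", limits_text, re.MULTILINE)
    if not match or match.group(1) == "unlimited":
        return None
    return int(match.group(1))


def thread_usage(pid):
    """Read both values for a running process (Linux only)."""
    proc_dir = Path("/proc") / str(pid)
    threads = parse_thread_count((proc_dir / "status").read_text())
    limit = parse_max_processes((proc_dir / "limits").read_text())
    return threads, limit
```

Calling thread_usage() with the RM's PID returns a (threads, limit) pair. A thread count that keeps climbing toward the limit over time would be consistent with a thread leak rather than a limit that is simply set too low.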

Re: Yarn Resource Manager Halts with java.lang.OutOfMemoryError: unable to create new native thread

Yes, we do have a KMS service in the cluster. Thanks for clarifying the "OS running out of PID" point.