
ResourceManager is not working and the process exits unexpectedly. Memory issue

Explorer

Hi All,

 

I have a RHEL 7.5 AWS m5.xlarge instance: a 16 GB, 4-core machine.

I did the Path B manual installation of CDH using Cloudera Manager, and the installation was successful.

 

After installation, the ResourceManager in YARN is down. I have tried some memory settings, but it is not working. I keep running into YARN tuning issues: it works for a couple of days and then goes down.

 

Can someone help me with the following memory allocations for a 16 GB single-node cluster:

 

yarn.app.mapreduce.am.resource.mb
mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
mapreduce.job.heap.memory-mb.ratio
Client Java Heap Size in Bytes
Java Heap Size of JobHistory Server in Bytes
memory.soft_limit_in_bytes
Java Heap Size of NodeManager in Bytes
Container Memory - yarn.nodemanager.resource.memory-mb
Java Heap Size of ResourceManager in Bytes
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.increment-allocation-mb
yarn.scheduler.maximum-allocation-mb
 
It's a single-node cluster; a rough sketch of how these settings interact follows.
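
For context on how these knobs interact: each map or reduce task runs in a container of mapreduce.map.memory.mb / mapreduce.reduce.memory.mb; the task JVM heap is that container size times mapreduce.job.heap.memory-mb.ratio (0.8 by default in CDH); every container request is clamped between yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb and rounded up in steps of yarn.scheduler.increment-allocation-mb; and the sum of all running containers cannot exceed yarn.nodemanager.resource.memory-mb. A minimal yarn-site.xml/mapred-site.xml sketch with illustrative numbers (assumptions for a 16 GB host, not tested recommendations):

<!-- Illustrative sketch only: leave several GB free for the OS, HDFS,
     and the Cloudera Manager daemons on a 16 GB single-node host. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value> <!-- cap on the sum of all containers on this node -->
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value> <!-- 1024 MB container => ~819 MB task heap at ratio 0.8 -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value> <!-- reducers are typically sized larger than mappers -->
</property>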
 
Thanks in advance.

 

12 REPLIES

Expert Contributor

Hi @Riteshk,

 

Can you post some logs from the ResourceManager?

 

Thanks.

 

Regards,

Manu.

Explorer

/************************************************************
STARTUP_MSG: Starting ResourceManager
STARTUP_MSG: user = yarn
STARTUP_MSG: host = ip-172-31-25-185.ap-south-1.compute.internal/172.31.25.185
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.6.0-cdh5.15.0
STARTUP_MSG: classpath = /run/cloudera-scm-agent/process/276-yarn-RESOURCEMANAGER:/run/cloudera-scm-agent/process/276-yarn-RESOURCEMANAGER:/run/cloudera-scm-agent/process/276-yarn-
STARTUP_MSG: build = http://github.com/cloudera/hadoop -r e3cb23a1cb2b89d074171b44e71f207c3d6ffa50; compiled by 'jenkins' on 2018-05-24T11:19Z
STARTUP_MSG: java = 1.8.0_144
************************************************************/
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1532505341498_0002_000001 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
2018-07-25 07:56:02,246 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1532505341498_0002_000001
2018-07-25 07:56:02,249 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1532505341498_0002_01_000001, NodeId: ip-172-31-25-185.ap-south-1.compute.internal:8041, NodeHttpAddress: ip-172-31-25-185.ap-south-1.compute.internal:8042, Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 172.31.25.185:8041 }, ] for AM appattempt_1532505341498_0002_000001
2018-07-25 07:56:02,249 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1532505341498_0002_000001
2018-07-25 07:56:02,249 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1532505341498_0002_000001

Explorer

Above is the ResourceManager log.

Should I provide any more logs?

Expert Contributor

Hi @Riteshk,

 

There is no ERROR trace in the log you provided. You started the ResourceManager service today, and no error entries have been registered since.

 

Do you have any RM logs with an ERROR trace?

 

Regards,

Manu.

Explorer
Wed Jul 25 06:42:31 UTC 2018
JAVA_HOME=/usr/java/jdk1.8.0_144
using /usr/java/jdk1.8.0_144 as JAVA_HOME
using 5 as CDH_VERSION
using /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop-yarn as CDH_YARN_HOME
using /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop-mapreduce as CDH_MR2_HOME
using /run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER as CONF_DIR
CONF_DIR=/run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER
CMF_CONF_DIR=/etc/cloudera-scm-agent
Wed Jul 25 08:29:22 UTC 2018
JAVA_HOME=/usr/java/jdk1.8.0_144
using /usr/java/jdk1.8.0_144 as JAVA_HOME
using 5 as CDH_VERSION
using /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop-yarn as CDH_YARN_HOME
using /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop-mapreduce as CDH_MR2_HOME
using /run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER as CONF_DIR
CONF_DIR=/run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER
CMF_CONF_DIR=/etc/cloudera-scm-agent
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 724828160 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER/hs_err_pid16884.log
Wed Jul 25 08:29:24 UTC 2018
JAVA_HOME=/usr/java/jdk1.8.0_144
using /usr/java/jdk1.8.0_144 as JAVA_HOME
using 5 as CDH_VERSION
using /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop-yarn as CDH_YARN_HOME
using /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop-mapreduce as CDH_MR2_HOME
using /run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER as CONF_DIR
CONF_DIR=/run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER
CMF_CONF_DIR=/etc/cloudera-scm-agent
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 724828160 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER/hs_err_pid16973.log
Wed Jul 25 08:29:26 UTC 2018
JAVA_HOME=/usr/java/jdk1.8.0_144
using /usr/java/jdk1.8.0_144 as JAVA_HOME
using 5 as CDH_VERSION
using /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop-yarn as CDH_YARN_HOME
using /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop-mapreduce as CDH_MR2_HOME
using /run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER as CONF_DIR
CONF_DIR=/run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER
CMF_CONF_DIR=/etc/cloudera-scm-agent
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 724828160 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER/hs_err_pid17063.log
Wed Jul 25 08:29:29 UTC 2018
JAVA_HOME=/usr/java/jdk1.8.0_144
using /usr/java/jdk1.8.0_144 as JAVA_HOME
using 5 as CDH_VERSION
using /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop-yarn as CDH_YARN_HOME
using /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop-mapreduce as CDH_MR2_HOME
using /run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER as CONF_DIR
CONF_DIR=/run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER
CMF_CONF_DIR=/etc/cloudera-scm-agent
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 724828160 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER/hs_err_pid17153.log

Explorer

It is related to memory; the logs show the JVM could not reserve 724,828,160 bytes (about 691 MB). I am not sure what values to set.

Expert Contributor

Well @Riteshk,

 

You are clearly asking for a lot more than is physically available on your system.

Try with this configuration:

 

 yarn.app.mapreduce.am.resource.mb = 2048 MB (2 GB)
 yarn.nodemanager.resource.memory-mb = 2048 MB (2 GB)
 yarn.scheduler.minimum-allocation-mb = 1024 MB (1 GB)
 yarn.scheduler.maximum-allocation-mb = 2048 MB (2 GB)

 

If it works but is slow, increase these parameters while watching memory usage in top.
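
If you are editing the configuration files directly rather than using Cloudera Manager, the same values would look roughly like this (a sketch; in Cloudera Manager you would set them in the YARN service configuration instead):

<!-- yarn-site.xml (values in MB) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value> <!-- total memory the NodeManager offers to containers -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value> <!-- smallest container the scheduler will grant -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value> <!-- largest single container request allowed -->
</property>
<!-- mapred-site.xml: the ApplicationMaster container size -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2048</value>
</property>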

 

Regards, 

Manu.

Explorer

New logs after changing the memory settings:

 

9:45:27.031 AM INFO RMAppImpl Application application_1532511906743_0001 failed 2 times due to AM Container for appattempt_1532511906743_0001_000002 exited with exitCode: 143 For more detailed output, check application tracking page: http://ip-172-31-25-185.ap-south-1.compute.internal:8088/proxy/application_1532511906743_0001/ Then, click on links to logs of each attempt. Diagnostics: Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143 Killed by external signal Failing this attempt. Failing the application.
9:45:27.032 AM INFO RMAppImpl application_1532511906743_0001 State change from FINAL_SAVING to FAILED on event = APP_UPDATE_SAVED
9:45:27.033 AM WARN RMAuditLogger USER=dr.who OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1532511906743_0001 failed 2 times due to AM Container for appattempt_1532511906743_0001_000002 exited with exitCode: 143 For more detailed output, check application tracking page: http://ip-172-31-25-185.ap-south-1.compute.internal:8088/proxy/application_1532511906743_0001/ Then, click on links to logs of each attempt. Diagnostics: Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143 Killed by external signal Failing this attempt. Failing the application. APPID=application_1532511906743_0001
9:45:27.035 AM INFO RMAppManager$ApplicationSummary appId=application_1532511906743_0001,name=hadoop,user=dr.who,queue=root.users.dr_dot_who,state=FAILED,trackingUrl=http://ip-172-31-25-185.ap-south-1.compute.internal:8088/cluster/app/application_1532511906743_0001,..., vCores:0>

Expert Contributor

Exit code 143 means the container was killed (SIGTERM, 128 + 15), which is usually memory related. Your default mapper/reducer memory settings may not be sufficient to run a large data set. Try setting higher AM, map, and reduce memory when a large YARN job is invoked.
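
As a sketch, these are the mapred-site.xml properties to raise (illustrative values, not tested recommendations; each must stay within yarn.scheduler.maximum-allocation-mb and the NodeManager total):

<!-- mapred-site.xml: illustrative values only -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2048</value> <!-- ApplicationMaster container -->
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value> <!-- container for each map task -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value> <!-- container for each reduce task -->
</property>

The same settings can also be passed per job, for example -Dmapreduce.map.memory.mb=2048 on the hadoop jar command line.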

 

Regards, 

Manu.