Created on 07-25-2018 12:41 AM - edited 09-16-2022 06:30 AM
Hi All,
I have a RHEL 7.5 AWS m5.xlarge instance (16 GB RAM, 4 cores).
I did the Path B manual installation of CDH using Cloudera Manager, and the installation was successful.
After installation, the ResourceManager in YARN is down. I have tried some memory settings, but it is not working. I keep running into YARN tuning issues: it works for a couple of days and then goes down.
Can someone help me with the following memory allocations for a 16 GB single-node cluster:
yarn.app.mapreduce.am.resource.mb
mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
mapreduce.job.heap.memory-mb.ratio
Client Java Heap Size in Bytes
Java Heap Size of JobHistory Server in Bytes
memory.soft_limit_in_bytes
Java Heap Size of NodeManager in Bytes
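For context, a rough way to size these on a single 16 GB node is to reserve memory for the OS and the non-YARN CDH daemons and divide the rest among containers. The arithmetic below is only an illustrative sketch; the reserved amount and the container sizes are assumptions, not official Cloudera recommendations:

```shell
# Illustrative memory arithmetic for a 16 GB single-node cluster.
# Assumptions (not official guidance): ~8 GB reserved for the OS,
# Cloudera Manager agents, HDFS, and other CDH roles; the remainder
# goes to YARN NodeManager containers.

TOTAL_MB=16384
RESERVED_MB=8192                      # OS + non-YARN daemons (assumption)
NM_MB=$((TOTAL_MB - RESERVED_MB))     # yarn.nodemanager.resource.memory-mb

AM_MB=1024                            # yarn.app.mapreduce.am.resource.mb (assumption)
MAP_MB=1024                           # mapreduce.map.memory.mb (assumption)
REDUCE_MB=2048                        # mapreduce.reduce.memory.mb (assumption)

# JVM heap per container is typically ~80% of the container size
# (mapreduce.job.heap.memory-mb.ratio defaults to 0.8).
MAP_HEAP_MB=$((MAP_MB * 8 / 10))

echo "NodeManager container pool: ${NM_MB} MB"
echo "Map container: ${MAP_MB} MB, map heap: ${MAP_HEAP_MB} MB"
```

With these assumptions the node could run several 1 GB map containers concurrently; the point is only that the sum of concurrent containers must stay under the NodeManager pool, which itself must leave room for everything else on the box.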
Created 07-25-2018 12:48 AM
Created on 07-25-2018 01:06 AM - edited 07-25-2018 02:28 AM
/************************************************************
STARTUP_MSG: Starting ResourceManager
STARTUP_MSG: user = yarn
STARTUP_MSG: host = ip-172-31-25-185.ap-south-1.compute.internal/172.31.25.185
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.6.0-cdh5.15.0
STARTUP_MSG: classpath = /run/cloudera-scm-agent/process/276-yarn-RESOURCEMANAGER:/run/cloudera-scm-agent/process/276-yarn-RESOURCEMANAGER:/run/cloudera-scm-agent/process/276-yarn-
STARTUP_MSG: build = http://github.com/cloudera/hadoop -r e3cb23a1cb2b89d074171b44e71f207c3d6ffa50; compiled by 'jenkins' on 2018-05-24T11:19Z
STARTUP_MSG: java = 1.8.0_144
************************************************************/
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1532505341498_0002_000001 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
2018-07-25 07:56:02,246 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1532505341498_0002_000001
2018-07-25 07:56:02,249 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1532505341498_0002_01_000001, NodeId: ip-172-31-25-185.ap-south-1.compute.internal:8041, NodeHttpAddress: ip-172-31-25-185.ap-south-1.compute.internal:8042, Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 172.31.25.185:8041 }, ] for AM appattempt_1532505341498_0002_000001
2018-07-25 07:56:02,249 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1532505341498_0002_000001
2018-07-25 07:56:02,249 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1532505341498_0002_000001
Created 07-25-2018 01:07 AM
Above is the ResourceManager log.
Should I provide any more logs?
Created 07-25-2018 01:19 AM
Hi @Riteshk,
There is no ERROR trace in the log you provided. You started the ResourceManager service today, and no errors have been logged since then.
Do you have any logs from the RM with an ERROR trace?
Regards,
Manu.
Created 07-25-2018 01:32 AM
Wed Jul 25 06:42:31 UTC 2018
JAVA_HOME=/usr/java/jdk1.8.0_144
using /usr/java/jdk1.8.0_144 as JAVA_HOME
using 5 as CDH_VERSION
using /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop-yarn as CDH_YARN_HOME
using /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop-mapreduce as CDH_MR2_HOME
using /run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER as CONF_DIR
CONF_DIR=/run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER
CMF_CONF_DIR=/etc/cloudera-scm-agent

Wed Jul 25 08:29:22 UTC 2018
(same JAVA_HOME/CDH environment lines as above)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 724828160 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER/hs_err_pid16884.log

Wed Jul 25 08:29:24 UTC 2018
(same environment lines)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 724828160 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER/hs_err_pid16973.log

Wed Jul 25 08:29:26 UTC 2018
(same environment lines)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 724828160 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER/hs_err_pid17063.log

Wed Jul 25 08:29:29 UTC 2018
(same environment lines)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 724828160 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /run/cloudera-scm-agent/process/278-yarn-RESOURCEMANAGER/hs_err_pid17153.log
Created 07-25-2018 01:33 AM
The logs show it is related to memory; I am not sure what values to set.
Created 07-25-2018 02:29 AM
Well @Riteshk,
You are clearly asking for a lot more than is physically available on your system.
Try this configuration:
yarn.app.mapreduce.am.resource.mb = 2 GB (2048 MB)
yarn.nodemanager.resource.memory-mb = 2 GB (2048 MB)
yarn.scheduler.minimum-allocation-mb = 1 GB (1024 MB)
yarn.scheduler.maximum-allocation-mb = 2 GB (2048 MB)
If it works but is slow, increase these parameters while watching memory usage in top.
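On a Cloudera Manager-managed cluster these values are edited in the CM UI (YARN service → Configuration), not in config files by hand. Purely for reference, the equivalent property fragments would look roughly like the sketch below; note that the first property is a MapReduce setting (mapred-site.xml) while the other three belong in yarn-site.xml, and the MB values simply mirror the suggestion above:

```shell
# Write an illustrative property snippet matching the suggested values.
# On CM-managed clusters, do NOT hand-edit these files; CM regenerates
# them. This is only to show what the settings correspond to.
cat <<'EOF' > /tmp/yarn-memory-snippet.xml
<!-- mapred-site.xml -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2048</value>
</property>
<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
EOF
PROP_COUNT=$(grep -c '<property>' /tmp/yarn-memory-snippet.xml)
echo "$PROP_COUNT properties written"
```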
Regards,
Manu.
Created 07-25-2018 02:48 AM
New Logs after changing the memory:
9:45:27.031 AM INFO RMAppImpl Application application_1532511906743_0001 failed 2 times due to AM Container for appattempt_1532511906743_0001_000002 exited with exitCode: 143 For more detailed output, check application tracking page: http://ip-172-31-25-185.ap-south-1.compute.internal:8088/proxy/application_1532511906743_0001/ Then, click on links to logs of each attempt. Diagnostics: Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143 Killed by external signal Failing this attempt. Failing the application.
9:45:27.032 AM INFO RMAppImpl application_1532511906743_0001 State change from FINAL_SAVING to FAILED on event = APP_UPDATE_SAVED
9:45:27.033 AM WARN RMAuditLogger USER=dr.who OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1532511906743_0001 failed 2 times due to AM Container for appattempt_1532511906743_0001_000002 exited with exitCode: 143 (same diagnostics as above) APPID=application_1532511906743_0001
9:45:27.035 AM INFO RMAppManager$ApplicationSummary appId=application_1532511906743_0001,name=hadoop,user=dr.who,queue=root.users.dr_dot_who,state=FAILED,trackingUrl=http://ip-172-31-25-185.ap-south-1.compute.internal:8088/cluster/app/application_1532511906743_0001,..., vCores:0>
Created 07-25-2018 02:57 AM
Exit code 143 is related to memory issues. Your default mapper/reducer memory settings may not be sufficient for a large data set. Try setting higher AM, map, and reduce memory when a large YARN job is invoked.
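As a sketch, per-job memory overrides can be passed on the command line with -D generic options instead of changing the cluster defaults. The jar name, driver class, and paths below are placeholders, and the 2048/4096 MB container sizes with ~80% -Xmx heaps are illustrative assumptions:

```shell
# Build per-job memory overrides for a MapReduce job.
# Heap (-Xmx) is kept to roughly 80% of each container so the JVM
# fits inside the YARN allocation; all sizes here are assumptions.
MEM_OPTS="-D yarn.app.mapreduce.am.resource.mb=2048 \
-D mapreduce.map.memory.mb=2048 \
-D mapreduce.map.java.opts=-Xmx1638m \
-D mapreduce.reduce.memory.mb=4096 \
-D mapreduce.reduce.java.opts=-Xmx3276m"

# Hypothetical invocation (jar, class, and paths are placeholders;
# -D options go after the driver class for GenericOptionsParser tools):
# hadoop jar my-job.jar MyDriver $MEM_OPTS /input /output
echo "$MEM_OPTS"
```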
Regards,
Manu.