Created 07-06-2018 06:20 AM
I have a cluster of 6 data nodes, and the NodeManager goes down on every data node shortly after being started. I checked the logs at /var/log/hadoop-yarn/yarn and there is no error message.
These are the warning messages I got:
2018-07-06 01:53:37,256 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-06 01:53:39,261 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-06 01:53:44,913 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(223)) - Exit code from container container_1530855405184_0121_02_000001 is : 143
2018-07-06 01:53:44,933 WARN nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=dr.who OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1530855405184_0121 CONTAINERID=container_1530855405184_0121_02_000001
2018-07-06 01:56:11,578 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
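As far as I can tell, the rollingMonitorInterval warnings only say that rolling log aggregation is disabled (the default of -1), so they look informational rather than fatal. If one wanted that interval enabled anyway, a minimal yarn-site.xml sketch would be something like the following (the 3600-second value is just an illustrative assumption, not a tuned setting):

<!-- Sketch only: enable rolling log aggregation; 3600 is an assumed example value -->
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>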
Created 07-06-2018 07:38 AM
The log trace says that the container exited with code 143 (128 + SIGTERM, i.e. it was killed); there is no trace of the NodeManager itself going down. Please check the logs again or attach them here.
Created 07-09-2018 09:32 AM
I checked the logs again; this time I got the errors below, along with the code 143 error:
2018-07-09 04:20:14,067 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1083)) - Event EventType: FINISH_APPLICATION sent to absent application application_1531115940804_0668
2018-07-09 04:20:15,985 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-09 04:20:16,705 WARN localizer.ResourceLocalizationService (ResourceLocalizationService.java:update(1023)) - { hdfs://ip-172-31-17-251.ec2.internal:8020/tmp/hive/root/_tez_session_dir/41b28f9d-f7ae-4652-b9a7-ffed72220a41/.tez/application_1531115940804_0607/tez.session.local-resources.pb, 1531123411553, FILE, null } failed: File does not exist: hdfs://ip-172-31-17-251.ec2.internal:8020/tmp/hive/root/_tez_session_dir/41b28f9d-f7ae-4652-b9a7-ffed72220a41/.tez/application_1531115940804_0607/tez.session.local-resources.pb
2018-07-09 04:20:16,719 WARN nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=root OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: LOCALIZATION_FAILED APPID=application_1531115940804_0607 CONTAINERID=container_1531115940804_0607_02_000001
2018-07-09 04:20:16,719 WARN ipc.Client (Client.java:call(1446)) - interrupted waiting to send rpc request to server
2018-07-09 04:20:52,186 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-09 04:25:34,693 WARN containermanager.AuxServices (AuxServices.java:serviceInit(130)) - The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for class class org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config.
2018-07-09 04:25:34,761 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:serviceInit(154)) - NodeManager configured with 61.4 G physical memory allocated to containers, which is more than 80% of the total physical memory available (62.9 G). Thrashing might happen.
2018-07-09 04:25:35,592 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1067)) - Event EventType: KILL_CONTAINER sent to absent container container_1531115940804_0760_01_000001
2018-07-09 04:25:35,592 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1083)) - Event EventType: FINISH_APPLICATION sent to absent application application_1531115940804_0760
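Side note on the AuxServices warning above: it refers to the shuffle service configured in yarn-site.xml. The usual wiring it is complaining about (stock Hadoop property names, shown only as a reference sketch, not copied from my cluster) looks like this:

<!-- Reference sketch of the standard shuffle aux-service configuration -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>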
I have 6 data nodes (62 GB RAM, 16 vcores each).
mapred-site.xml
mapreduce.map.java.opts = -Xmx6656m
mapreduce.map.memory.mb = 10000
mapreduce.reduce.java.opts = -Xmx12800m
mapreduce.reduce.memory.mb = 16000
yarn-site.xml
yarn.nodemanager.resource.memory-mb = 62900
yarn.scheduler.minimum-allocation-mb = 6656
yarn.scheduler.maximum-allocation-mb = 62900
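Setting yarn.nodemanager.resource.memory-mb to 62900 on a 62.9 GB node is what triggers the "more than 80% of the total physical memory" warning in the log above, since it leaves no headroom for the NodeManager, DataNode, and OS. A minimal yarn-site.xml sketch that leaves headroom might look like the following (the 50176 MB figure is only an assumed example, not a measured recommendation):

<!-- Sketch only: reserve memory for the OS and the daemons running on the node.
     50176 MB (49 GB) is an assumed example value. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>50176</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>50176</value>
</property>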
Created 07-08-2018 03:55 PM
Attachment: yarn-hdfs-nodemanagerlog.tar.gz. I have the same problem as @Punkit Kumar. I have 5 Amazon VMs (8 GB RAM, 100 GB HDD for data, 4 vCPUs) and installed HDP 2.6 and HDP 2.5; everything is fine except that one NodeManager automatically stops after a few seconds.
At first I thought the problem was a wrong Ambari configuration, but after trying manual installations of Hadoop 2.7.1, 2.7.3, and 2.8.4 I hit the same problem. The Hadoop install looks fine and the NameNode, DataNodes, and ResourceManager are all running, but when I test with a MapReduce job (https://github.com/asmith26/python-mapreduce-examples) the NameNode automatically stops. Please see the attached haddop.tar.gz config file.
I have also tried jdk1.7.0_67 and jdk1.8.0_112; same problem 😞