Member since 01-24-2017 · 26 Posts · 0 Kudos Received · 0 Solutions
07-09-2018
10:48 AM
I have a 6-DN cluster, and within a few seconds of starting, every NodeManager goes down. I posted the log at https://community.hortonworks.com/questions/202914/node-manager-is-getting-down-after-few-seconds.html, and now the reducer jobs are failing too.

2018-07-09 06:25:26,262 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-09 06:25:26,616 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(223)) - Exit code from container container_1531130193317_0195_02_000001 is : 143
2018-07-09 06:25:26,624 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(223)) - Exit code from container container_1531130193317_0197_01_000001 is : 143
2018-07-09 06:25:26,712 WARN nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=dr.who	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE	APPID=application_1531130193317_0195	CONTAINERID=container_1531130193317_0195_02_000001
2018-07-09 06:25:26,819 WARN nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=dr.who	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE	APPID=application_1531130193317_0197	CONTAINERID=container_1531130193317_0197_01_000001
2018-07-09 06:25:30,271 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-09 06:25:30,534 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(223)) - Exit code from container container_1531130193317_0198_02_000001 is : 143
2018-07-09 06:25:30,600 WARN nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=dr.who	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE	APPID=application_1531130193317_0198	CONTAINERID=container_1531130193317_0198_02_000001
2018-07-09 06:25:31,258 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-09 06:25:31,422 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(223)) - Exit code from container container_1531130193317_0200_02_000001 is : 143
2018-07-09 06:25:31,486 WARN nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=dr.who	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE	APPID=application_1531130193317_0200	CONTAINERID=container_1531130193317_0200_02_000001
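As a side note on the repeated "Exit code ... is : 143" lines: container exit codes above 128 conventionally mean the process was killed by a signal, where signal number = exit code - 128, so 143 is SIGTERM (128 + 15), i.e. the containers were terminated externally rather than crashing on their own. A minimal sketch of the decoding:

```python
# Decode a container exit code: values above 128 usually mean the
# process died from a signal (exit_code - 128).
import signal

def decode_exit_code(exit_code: int) -> str:
    if exit_code > 128:
        sig = exit_code - 128
        name = signal.Signals(sig).name  # e.g. SIGTERM for 15
        return f"killed by signal {sig} ({name})"
    return f"exited normally with status {exit_code}"

print(decode_exit_code(143))  # -> killed by signal 15 (SIGTERM)
```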
Labels:
- Apache Hadoop
- Apache YARN
07-09-2018
09:32 AM
@Sandeep Nemuri I checked the log again; this time I got this error, along with the exit code 143 errors:

2018-07-09 04:20:14,067 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1083)) - Event EventType: FINISH_APPLICATION sent to absent application application_1531115940804_0668
2018-07-09 04:20:15,985 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-09 04:20:16,705 WARN localizer.ResourceLocalizationService (ResourceLocalizationService.java:update(1023)) - { hdfs://ip-172-31-17-251.ec2.internal:8020/tmp/hive/root/_tez_session_dir/41b28f9d-f7ae-4652-b9a7-ffed72220a41/.tez/application_1531115940804_0607/tez.session.local-resources.pb, 1531123411553, FILE, null } failed: File does not exist: hdfs://ip-172-31-17-251.ec2.internal:8020/tmp/hive/root/_tez_session_dir/41b28f9d-f7ae-4652-b9a7-ffed72220a41/.tez/application_1531115940804_0607/tez.session.local-resources.pb
2018-07-09 04:20:16,719 WARN nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=root	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: LOCALIZATION_FAILED	APPID=application_1531115940804_0607	CONTAINERID=container_1531115940804_0607_02_000001
2018-07-09 04:20:16,719 WARN ipc.Client (Client.java:call(1446)) - interrupted waiting to send rpc request to server
2018-07-09 04:20:52,186 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-09 04:25:34,693 WARN containermanager.AuxServices (AuxServices.java:serviceInit(130)) - The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for class class org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config.
2018-07-09 04:25:34,761 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:serviceInit(154)) - NodeManager configured with 61.4 G physical memory allocated to containers, which is more than 80% of the total physical memory available (62.9 G). Thrashing might happen.
2018-07-09 04:25:35,592 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1067)) - Event EventType: KILL_CONTAINER sent to absent container container_1531115940804_0760_01_000001
2018-07-09 04:25:35,592 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1083)) - Event EventType: FINISH_APPLICATION sent to absent application application_1531115940804_0760

I have 6 DNs (62 GB RAM, 16 vcores each).

mapred-site.xml:
mapreduce.map.java.opts -Xmx6656m
mapreduce.map.memory.mb 10000
mapreduce.reduce.java.opts -Xmx12800m
mapreduce.reduce.memory.mb 16000

yarn-site.xml:
yarn.nodemanager.resource.memory-mb 62900
yarn.scheduler.minimum-allocation-mb 6656
yarn.scheduler.maximum-allocation-mb 62900
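The "Thrashing might happen" warning above is just these two numbers compared: the NodeManager warns when the memory allocated to containers exceeds 80% of the physical RAM it detects. A rough reconstruction of that arithmetic (my own sketch, assuming the 62.9 G total the NM reports):

```python
# Rough reconstruction of the NodeManager ContainersMonitorImpl check:
# it warns when container memory exceeds 80% of total physical memory.
total_physical_mb = 62.9 * 1024          # 62.9 G reported by the NM
configured_mb = 62900                     # yarn.nodemanager.resource.memory-mb
threshold_mb = 0.8 * total_physical_mb    # the 80% guideline

print(configured_mb > threshold_mb)       # True -> the warning fires
print(round(threshold_mb))                # staying at or below this avoids it
```

So with 62 GB nodes, a value around 51000-52000 MB (or lower, to leave room for the OS and DataNode daemons) would keep the NodeManager under the 80% guideline.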
07-06-2018
06:20 AM
I have a cluster of 6 data nodes, and on starting the NodeManager it goes down on every data node. I checked the logs at /var/log/hadoop-yarn/yarn and there is no error message; this is the warning message I got:

2018-07-06 01:53:37,256 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-06 01:53:39,261 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-06 01:53:44,913 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(223)) - Exit code from container container_1530855405184_0121_02_000001 is : 143
2018-07-06 01:53:44,933 WARN nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=dr.who	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE	APPID=application_1530855405184_0121	CONTAINERID=container_1530855405184_0121_02_000001
2018-07-06 01:56:11,578 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
Labels:
- Apache YARN
09-17-2017
05:42 PM
Hi, guys. I don't know why disk usage differs between du and df. 'hdfs dfs -du -h -s /' gives around 2 TB in total, while 'hadoop fs -df' gives:

Filesystem       Size    Used  Available  Use%
hdfs://ip:8020   15.8 T  12.6 T  2.3 T    80%

and 'sudo -u hdfs hdfs fsck /' gives me this:

Total size: 2158294971710 B (Total open files size: 341851 B)
Total dirs: 627169
Total files: 59276
Total symlinks: 0 (Files currently being written: 13)
Total blocks (validated): 23879 (avg. block size 90384646 B) (Total open file blocks (not validated): 13)
Minimally replicated blocks: 23879 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 8 (0.03350224 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0228233
Corrupt blocks: 0
Missing replicas: 32 (0.066204615 %)
Number of data-nodes: 6
Number of racks: 1

Let me know why, and how I can reclaim my mysteriously used space. Thanks.
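For what it's worth, part of the gap is just replication: du reports logical file size, while df reports raw usage across all replicas, plus anything non-HDFS stored on the same partitions. A back-of-the-envelope using the fsck numbers above:

```python
# du shows logical bytes; raw DFS usage is roughly logical * replication.
# Anything beyond that in df is not live HDFS block data.
logical_bytes = 2158294971710            # "Total size" from fsck
avg_replication = 2.0228233              # "Average block replication" from fsck

raw_tb = logical_bytes * avg_replication / 1024**4
print(round(raw_tb, 2))                  # -> 3.97, expected raw DFS usage in TB
# df reports 12.6 T used, so the remaining ~8.6 T is something other than
# live HDFS blocks (non-DFS files on the data disks, trash, stale block
# files left behind by a previous cluster, etc.).
```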
Labels:
- Apache Hadoop
01-25-2017
07:30 AM
Thanks @Jay SenSharma, it's working now. But when I run a query in Hue it doesn't show any output.
01-25-2017
07:03 AM
When I start Hue I get this issue:

hadoop.hdfs_clusters.default.webhdfs_url — Current value: http://namenode:50070/webhdfs/v1
Filesystem root '/' should be owned by 'hdfs'
Labels:
- Cloudera Hue
01-24-2017
04:12 PM
@Jay SenSharma Thanks, it's working now.
01-24-2017
10:52 AM
@Jay SenSharma now I am getting this error in datanode.log:

2017-01-24 03:39:19,891 INFO common.Storage (Storage.java:tryLock(715)) - Lock on /mnt/disk1/hadoop/hdfs/data/in_use.lock acquired by nodename 1491@datanode.ec2.internal
2017-01-24 03:39:19,902 INFO common.Storage (Storage.java:tryLock(715)) - Lock on /mnt/disk2/hadoop/hdfs/data/in_use.lock acquired by nodename 1491@datanode.ec2.internal
2017-01-24 03:39:19,903 FATAL datanode.DataNode (BPServiceActor.java:run(840)) - Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to namenode.ec2.internal/ namenode:8020. Exiting. java.io.IOException: Incompatible clusterIDs in /mnt/disk1/hadoop/hdfs/data: namenode clusterID = CID-297a140f-7cd6-4c73-afc8-bd0a7d01c0ee; datanode clusterID = CID-7591e6bd-ce9b-4b14-910c-c9603892a0f1
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:646)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:320)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:403)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:422)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1311)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1276)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:828)
at java.lang.Thread.run(Thread.java:745)
2017-01-24 03:39:19,904 WARN datanode.DataNode (BPServiceActor.java:run(861)) - Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to ip-172-31-17-251.ec2.internal/172.31.17.251:8020
2017-01-24 03:39:20,005 INFO datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool <registering> (Datanode Uuid unassigned)
2017-01-24 03:39:22,005 WARN datanode.DataNode (DataNode.java:secureMain(2392)) - Exiting Datanode
2017-01-24 03:39:22,007 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
2017-01-24 03:39:22,008 INFO datanode.DataNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at datanode.ec2.internal/datanode
************************************************************/
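"Incompatible clusterIDs" usually means the NameNode was reformatted while the DataNodes kept their old storage directories. One common fix is to point each DataNode data dir's VERSION file at the NameNode's clusterID and restart the DataNode. A sketch under assumptions — the temp-file demo below is illustrative (the real file lives under each dfs.datanode.data.dir, e.g. /mnt/disk1/hadoop/hdfs/data/current/VERSION), and the IDs are the ones from the log above:

```shell
#!/bin/sh
# Sketch: align a DataNode's clusterID with the NameNode's.
# Demonstrated on a temp copy of VERSION so the steps are verifiable;
# on a real node, stop the DataNode first and edit every data dir.
NN_CID="CID-297a140f-7cd6-4c73-afc8-bd0a7d01c0ee"   # NameNode clusterID from the log

WORK=$(mktemp -d)
cat > "$WORK/VERSION" <<'EOF'
storageID=DS-demo
clusterID=CID-7591e6bd-ce9b-4b14-910c-c9603892a0f1
storageType=DATA_NODE
EOF

# Replace the stale DataNode clusterID in place.
sed -i "s/^clusterID=.*/clusterID=$NN_CID/" "$WORK/VERSION"
grep '^clusterID=' "$WORK/VERSION"
```

Alternatively, if the DataNode holds no data worth keeping, wiping its data directories and letting it re-register achieves the same result.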
01-24-2017
10:40 AM
Yeah, that was right:

total 4
drwxrwxrwt. 2 hdfs hadoop 4096 Nov 19 2014 cache
srw-rw-rw-. 1 hdfs hadoop 0 Jan 24 03:39 dn_socket
Actually, none of my datanode hosts are working. Is that a memory issue?
01-24-2017
10:11 AM
@Jay SenSharma so the error in the log file is about permissions?