Member since: 01-24-2017
Posts: 26
Kudos Received: 0
Solutions: 0
07-10-2018
05:59 PM
@Vinicius Higa Murakami After fixing the sticky-bit error I was again unable to start the NodeManager, and there was no error message in the log. When I tried to start it once more, the container failed (log of that attached). I also have other development Hadoop clusters on AWS that were working previously, but now the NodeManager is going down on every cluster.
07-10-2018
01:14 PM
@Vinicius Higa Murakami I hit the same issue and added that sticky bit. It worked for a few days, but the NodeManager is going down again.
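For reference, setting the sticky bit looks like the sketch below. It is demonstrated on a temp directory; on the cluster the actual targets would be the NodeManager local/log dirs (something like /hadoop/yarn/local, but that path is an assumption; check yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs):

```shell
# Set and verify the sticky bit on a directory (demo on a temp dir).
d=$(mktemp -d)
chmod 1777 "$d"      # world-writable plus sticky bit, like /tmp
stat -c '%a' "$d"    # prints 1777
rmdir "$d"
```

The sticky bit (the leading 1) lets any user write into the directory while only the owner of a file can delete it, which is what the container executor expects of shared local dirs.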
07-09-2018
10:48 AM
I have a 6-DN cluster and all the NodeManagers keep going down within seconds; I posted that log in https://community.hortonworks.com/questions/202914/node-manager-is-getting-down-after-few-seconds.html. Now the reducer jobs are failing as well:
2018-07-09 06:25:26,262 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-09 06:25:26,616 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(223)) - Exit code from container container_1531130193317_0195_02_000001 is : 143
2018-07-09 06:25:26,624 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(223)) - Exit code from container container_1531130193317_0197_01_000001 is : 143
2018-07-09 06:25:26,712 WARN nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=dr.who  OPERATION=Container Finished - Failed  TARGET=ContainerImpl  RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  APPID=application_1531130193317_0195  CONTAINERID=container_1531130193317_0195_02_000001
2018-07-09 06:25:26,819 WARN nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=dr.who  OPERATION=Container Finished - Failed  TARGET=ContainerImpl  RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  APPID=application_1531130193317_0197  CONTAINERID=container_1531130193317_0197_01_000001
2018-07-09 06:25:30,271 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-09 06:25:30,534 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(223)) - Exit code from container container_1531130193317_0198_02_000001 is : 143
2018-07-09 06:25:30,600 WARN nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=dr.who  OPERATION=Container Finished - Failed  TARGET=ContainerImpl  RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  APPID=application_1531130193317_0198  CONTAINERID=container_1531130193317_0198_02_000001
2018-07-09 06:25:31,258 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-09 06:25:31,422 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(223)) - Exit code from container container_1531130193317_0200_02_000001 is : 143
2018-07-09 06:25:31,486 WARN nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=dr.who  OPERATION=Container Finished - Failed  TARGET=ContainerImpl  RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  APPID=application_1531130193317_0200  CONTAINERID=container_1531130193317_0200_02_000001
Labels:
- Apache Hadoop
- Apache YARN
07-09-2018
09:32 AM
@Sandeep Nemuri I checked the log again; this time I got this error, along with the exit code 143 errors:
2018-07-09 04:20:14,067 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1083)) - Event EventType: FINISH_APPLICATION sent to absent application application_1531115940804_0668
2018-07-09 04:20:15,985 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-09 04:20:16,705 WARN localizer.ResourceLocalizationService (ResourceLocalizationService.java:update(1023)) - { hdfs://ip-172-31-17-251.ec2.internal:8020/tmp/hive/root/_tez_session_dir/41b28f9d-f7ae-4652-b9a7-ffed72220a41/.tez/application_1531115940804_0607/tez.session.local-resources.pb, 1531123411553, FILE, null } failed: File does not exist: hdfs://ip-172-31-17-251.ec2.internal:8020/tmp/hive/root/_tez_session_dir/41b28f9d-f7ae-4652-b9a7-ffed72220a41/.tez/application_1531115940804_0607/tez.session.local-resources.pb
2018-07-09 04:20:16,719 WARN nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=root  OPERATION=Container Finished - Failed  TARGET=ContainerImpl  RESULT=FAILURE  DESCRIPTION=Container failed with state: LOCALIZATION_FAILED  APPID=application_1531115940804_0607  CONTAINERID=container_1531115940804_0607_02_000001
2018-07-09 04:20:16,719 WARN ipc.Client (Client.java:call(1446)) - interrupted waiting to send rpc request to server
2018-07-09 04:20:52,186 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-09 04:25:34,693 WARN containermanager.AuxServices (AuxServices.java:serviceInit(130)) - The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for class class org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config.
2018-07-09 04:25:34,761 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:serviceInit(154)) - NodeManager configured with 61.4 G physical memory allocated to containers, which is more than 80% of the total physical memory available (62.9 G). Thrashing might happen.
2018-07-09 04:25:35,592 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1067)) - Event EventType: KILL_CONTAINER sent to absent container container_1531115940804_0760_01_000001
2018-07-09 04:25:35,592 WARN containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1083)) - Event EventType: FINISH_APPLICATION sent to absent application application_1531115940804_0760

I have 6 DNs (62 GB RAM, 16 vcores each).

mapred-site.xml:
mapreduce.map.java.opts       -Xmx6656m
mapreduce.map.memory.mb       10000
mapreduce.reduce.java.opts    -Xmx12800m
mapreduce.reduce.memory.mb    16000

yarn-site.xml:
yarn.nodemanager.resource.memory-mb    62900
yarn.scheduler.minimum-allocation-mb   6656
yarn.scheduler.maximum-allocation-mb   62900
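As a quick sanity check on these numbers (the values are copied from the settings above; the "heap around 80% of the container" rule of thumb is a general guideline, not something from this thread):

```shell
# Containers per node at the minimum allocation, and the
# heap-to-container ratios implied by the settings above.
node_mb=62900
min_alloc=6656
map_mb=10000
map_heap=6656
red_mb=16000
red_heap=12800

echo $(( node_mb / min_alloc ))                                      # 9 containers
awk -v h="$map_heap" -v c="$map_mb" 'BEGIN{printf "%.2f\n", h/c}'    # 0.67
awk -v h="$red_heap" -v c="$red_mb" 'BEGIN{printf "%.2f\n", h/c}'    # 0.80
```

Also note the NodeManager's own warning above: giving containers 61.4 G out of 62.9 G physical leaves almost nothing for the OS and daemons, which matches the "Thrashing might happen" message.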
07-06-2018
06:20 AM
I have a cluster of 6 data nodes, and on starting the NodeManager it goes down on every data node. I checked the logs at /var/log/hadoop-yarn/yarn and there is no error message; this is the warning message I got:
2018-07-06 01:53:37,256 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-06 01:53:39,261 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2018-07-06 01:53:44,913 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(223)) - Exit code from container container_1530855405184_0121_02_000001 is : 143
2018-07-06 01:53:44,933 WARN nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=dr.who  OPERATION=Container Finished - Failed  TARGET=ContainerImpl  RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  APPID=application_1530855405184_0121  CONTAINERID=container_1530855405184_0121_02_000001
2018-07-06 01:56:11,578 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(182)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
Labels:
- Apache YARN
04-09-2018
07:22 PM
@Sergey Soldatov I tried that and I'm still getting this error: bash: zookeeper_host:2181:/hbase-unsecure: No such file or directory
04-09-2018
06:54 PM
I'm able to connect to Phoenix with /usr/hdp/current/phoenix-client/bin/sqlline.py zookeeper_host:2181:/hbase-unsecure,
but when I try it with just zookeeper_host:2181:/hbase-unsecure (without the full sqlline.py path) I'm unable to connect.
Apparently I have to add something to the classpath for that; I went through this link but didn't find the exact answer.
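The bash error in the follow-up ("No such file or directory") suggests the connect string was being executed as a command on its own; it has to be passed as an argument to sqlline.py. If the goal is only to avoid typing the long path, a sketch like this might be enough (the HDP install location is taken from the working command above; no extra classpath changes are assumed):

```shell
# Assumption: sqlline.py lives in /usr/hdp/current/phoenix-client/bin,
# as shown by the working full-path invocation above.
export PATH="$PATH:/usr/hdp/current/phoenix-client/bin"
sqlline.py zookeeper_host:2181:/hbase-unsecure
```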
Labels:
- Apache HBase
- Apache Phoenix
09-17-2017
05:42 PM
Hi, guys. I don't know why disk usage differs between du and df. 'hdfs dfs -du -h -s /' gives around 2 TB in total, while 'hadoop fs -df' gives:

Filesystem       Size    Used    Available  Use%
hdfs://ip:8020   15.8 T  12.6 T  2.3 T      80%

and 'sudo -u hdfs hdfs fsck /' gives me this:

 Total size:    2158294971710 B (Total open files size: 341851 B)
 Total dirs:    627169
 Total files:   59276
 Total symlinks:                0 (Files currently being written: 13)
 Total blocks (validated):      23879 (avg. block size 90384646 B) (Total open file blocks (not validated): 13)
 Minimally replicated blocks:   23879 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       8 (0.03350224 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     2.0228233
 Corrupt blocks:                0
 Missing replicas:              32 (0.066204615 %)
 Number of data-nodes:          6
 Number of racks:               1

Let me know why, and how I can get my mysteriously used space back. Thanks.
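Part of the gap is replication: 'hdfs dfs -du' reports the logical (pre-replication) size, while the 12.6 T "Used" figure counts every replica on disk. A quick check with the fsck numbers above:

```shell
# Expected raw DFS usage = logical size x average block replication
# (both values copied from the fsck output above).
awk 'BEGIN{printf "%.1f TB\n", 2158294971710 * 2.0228233 / 1024^4}'   # prints 4.0 TB
```

So replication only explains about 4 TB; where the remaining roughly 8 T of "Used" comes from can't be determined from this output alone. Snapshots, .Trash, and blocks pending deletion are common suspects, and the per-DataNode breakdown from 'hdfs dfsadmin -report' would help narrow it down.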
Labels:
- Apache Hadoop
01-27-2017
08:27 AM
@gnovak It also has some URI issues in the configuration files. Should I change hdfs-site.xml?
01-27-2017
08:25 AM
@gnovak Here is my /mnt/disk1/hadoop/hdfs/namesecondary:
total 36
drwxr-xr-x. 2 hdfs hadoop 28672 Jan 26 05:13 current
-rw-r--r-- 1 hdfs hadoop 34 Jan 27 02:43 in_use.lock
01-27-2017
08:03 AM
@gnovak This is what I get when running that command:
17/01/27 02:51:16 INFO mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@ip.ec2.internal:50070
17/01/27 02:51:16 WARN common.Util: Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
17/01/27 02:51:16 WARN common.Util: Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
17/01/27 02:51:16 WARN namenode.FSNamesystem: !!! WARNING !!!
The NameNode currently runs without persistent storage.
Any changes to the file system meta-data may be lost.
Recommended actions:
- shutdown and restart NameNode with configured "dfs.namenode.edits.dir.required" in hdfs-site.xml;
- use Backup Node as a persistent and up-to-date storage of the file system meta-data.
17/01/27 02:51:16 WARN namenode.FSNamesystem: Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
17/01/27 02:51:16 WARN namenode.FSNamesystem: Only one namespace edits storage directory (dfs.namenode.edits.dir) configured. Beware of data loss due to lack of redundant storage directories!
17/01/27 02:51:16 WARN common.Util: Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
17/01/27 02:51:16 WARN common.Util: Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
17/01/27 02:51:16 WARN common.Storage: set restore failed storage to true
17/01/27 02:51:16 INFO namenode.FSNamesystem: No KeyProvider found.
...
17/01/27 02:51:16 INFO common.Storage: Lock on /hadoop/hdfs/namenode/in_use.lock acquired by nodename 21282@ip-172-31-17-251.ec2.internal
17/01/27 02:51:16 INFO namenode.FSImage: Storage directory /hadoop/hdfs/namenode is not formatted.
17/01/27 02:51:16 INFO namenode.FSImage: Formatting ...
17/01/27 02:51:16 WARN common.Util: Path /mnt/disk1/hadoop/hdfs/namesecondary should be specified as a URI in configuration files. Please update hdfs configuration.
17/01/27 02:51:16 WARN common.Util: Path /mnt/disk1/hadoop/hdfs/namesecondary should be specified as a URI in configuration files. Please update hdfs configuration.
17/01/27 02:51:16 WARN common.Storage: set restore failed storage to true
17/01/27 02:51:16 WARN common.Storage: Storage directory /mnt/disk1/hadoop/hdfs/namesecondary does not exist
17/01/27 02:51:16 WARN namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /mnt/disk1/hadoop/hdfs/namesecondary is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:313)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:202)
at org.apache.hadoop.hdfs.server.namenode.FSImage.doImportCheckpoint(FSImage.java:515)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1022)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:741)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:536)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:595)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
17/01/27 02:51:16 INFO mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@ip-.ec2.internal:50070
17/01/27 02:51:16 INFO impl.MetricsSystemImpl: Stopping NameNode metrics system...
17/01/27 02:51:16 INFO impl.MetricsSinkAdapter: ganglia thread interrupted.
17/01/27 02:51:16 INFO impl.MetricsSystemImpl: NameNode metrics system stopped.
17/01/27 02:51:16 INFO impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
17/01/27 02:51:16 FATAL namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /mnt/disk1/hadoop/hdfs/namesecondary is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:313)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:202)
at org.apache.hadoop.hdfs.server.namenode.FSImage.doImportCheckpoint(FSImage.java:515)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1022)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:741)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:536)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:595)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
17/01/27 02:51:16 INFO util.ExitUtil: Exiting with status 1
17/01/27 02:51:16 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip-.ec2.internal/
************************************************************/
01-27-2017
07:11 AM
The NameNode cluster ID changed, so I dropped the ../namenode/current dir, which contains the fsimage and edit logs. Now I want to recover my NN from the secondary NN, which holds an exact copy of the /current dir. What I did was just copy the fsimage from the SNN to the NN, but now when I try to start the NN it asks me to format the NameNode. I want to know whether I should format the NN or whether there is some other way to recover it. namenode.log:
2017-01-27 01:07:10,184 INFO util.GSet (LightWeightGSet.java:computeCapacity(354)) - Computing capacity for map NameNodeRetryCache
2017-01-27 01:07:10,184 INFO util.GSet (LightWeightGSet.java:computeCapacity(355)) - VM type = 64-bit
2017-01-27 01:07:10,184 INFO util.GSet (LightWeightGSet.java:computeCapacity(356)) - 0.029999999329447746% max memory 2.9 GB = 922.8 KB
2017-01-27 01:07:10,185 INFO util.GSet (LightWeightGSet.java:computeCapacity(361)) - capacity = 2^17 = 131072 entries
2017-01-27 01:07:10,188 INFO namenode.NNConf (NNConf.java:<init>(62)) - ACLs enabled? true
2017-01-27 01:07:10,188 INFO namenode.NNConf (NNConf.java:<init>(66)) - XAttrs enabled? true
2017-01-27 01:07:10,188 INFO namenode.NNConf (NNConf.java:<init>(74)) - Maximum size of an xattr: 16384
2017-01-27 01:07:10,202 INFO common.Storage (Storage.java:tryLock(715)) - Lock on /hadoop/hdfs/namenode/in_use.lock acquired by nodename 7391@ip.ec2.internal
2017-01-27 01:07:10,204 WARN namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(743)) - Encountered exception loading fsimage
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:212)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1022)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:741)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:536)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:595)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
2017-01-27 01:07:10,209 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@ip.ec2.internal:50070
2017-01-27 01:07:10,310 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(210)) - Stopping NameNode metrics system...
2017-01-27 01:07:10,310 INFO impl.MetricsSinkAdapter (MetricsSinkAdapter.java:publishMetricsFromQueue(135)) - ganglia thread interrupted.
2017-01-27 01:07:10,311 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(216)) - NameNode metrics system stopped.
2017-01-27 01:07:10,311 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(605)) - NameNode metrics system shutdown complete.
2017-01-27 01:07:10,311 FATAL namenode.NameNode (NameNode.java:main(1509)) - Failed to start namenode.
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:212)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1022)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:741)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:536)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:595)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
2017-01-27 01:07:10,313 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2017-01-27 01:07:10,314 INFO namenode.NameNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip.ec2.internal/nn
************************************************************/
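Instead of formatting, HDFS has a documented recovery path for exactly this situation: start the NameNode with -importCheckpoint, which loads the image from the checkpoint directory (fs.checkpoint.dir, dfs.namenode.checkpoint.dir in newer configs). A sketch only, assuming the SNN copy sits at /mnt/disk1/hadoop/hdfs/namesecondary as shown earlier in this thread:

```shell
# Sketch -- run on the NameNode host. Preconditions per the HDFS docs:
# dfs.namenode.name.dir (/hadoop/hdfs/namenode here) must be empty, and
# the checkpoint dir must contain the SNN's current/ contents.
sudo -u hdfs hdfs namenode -importCheckpoint
```

This is why copying only the fsimage by hand fails with "NameNode is not formatted": the name dir is missing the VERSION and related metadata that importCheckpoint reconstructs.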
Labels:
- Apache Hadoop
01-25-2017
07:30 AM
Thanks @Jay SenSharma, it's working now. But when I run a query in Hue, it doesn't show any output.
01-25-2017
07:03 AM
When I start Hue I get this issue:
hadoop.hdfs_clusters.default.webhdfs_url (Current value: http://namenode:50070/webhdfs/v1)
Filesystem root '/' should be owned by 'hdfs'
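The second warning is about ownership inside HDFS (not the local filesystem). If that is the actual problem, the fix would be a one-liner like this sketch, run as the HDFS superuser on any host with the hdfs client:

```shell
# Make the HDFS root owned by the hdfs superuser, which is what
# Hue's configuration check expects.
sudo -u hdfs hdfs dfs -chown hdfs /
```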
Tags:
- Hadoop Core
- hue
Labels:
- Cloudera Hue
01-24-2017
04:12 PM
@Jay SenSharma Thanks, it's working now.
01-24-2017
10:52 AM
@Jay SenSharma Now I'm getting this error in datanode.log:
2017-01-24 03:39:19,891 INFO common.Storage (Storage.java:tryLock(715)) - Lock on /mnt/disk1/hadoop/hdfs/data/in_use.lock acquired by nodename 1491@datanode.ec2.internal
2017-01-24 03:39:19,902 INFO common.Storage (Storage.java:tryLock(715)) - Lock on /mnt/disk2/hadoop/hdfs/data/in_use.lock acquired by nodename 1491@datanode.ec2.internal
2017-01-24 03:39:19,903 FATAL datanode.DataNode (BPServiceActor.java:run(840)) - Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to namenode.ec2.internal/ namenode:8020. Exiting. java.io.IOException: Incompatible clusterIDs in /mnt/disk1/hadoop/hdfs/data: namenode clusterID = CID-297a140f-7cd6-4c73-afc8-bd0a7d01c0ee; datanode clusterID = CID-7591e6bd-ce9b-4b14-910c-c9603892a0f1
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:646)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:320)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:403)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:422)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1311)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1276)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:828)
at java.lang.Thread.run(Thread.java:745)
2017-01-24 03:39:19,904 WARN datanode.DataNode (BPServiceActor.java:run(861)) - Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to ip-172-31-17-251.ec2.internal/172.31.17.251:8020
2017-01-24 03:39:20,005 INFO datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool <registering> (Datanode Uuid unassigned)
2017-01-24 03:39:22,005 WARN datanode.DataNode (DataNode.java:secureMain(2392)) - Exiting Datanode
2017-01-24 03:39:22,007 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
2017-01-24 03:39:22,008 INFO datanode.DataNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at datanode.ec2.internal/datanode
************************************************************/
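The incompatible-clusterIDs error usually means the NameNode was reformatted while the DataNodes kept their old storage. Two common ways out, sketched against the paths from the log above (which one to use depends on whether the block data on this DN is expendable):

```shell
# Option 1: data on this DN is expendable -- clear the data dirs so the
# DataNode re-registers under the NameNode's new clusterID:
rm -rf /mnt/disk1/hadoop/hdfs/data/current /mnt/disk2/hadoop/hdfs/data/current

# Option 2: keep the blocks -- inspect each data dir's VERSION file and
# set its clusterID to the NameNode's value from the error
# (CID-297a140f-7cd6-4c73-afc8-bd0a7d01c0ee), then restart the DataNode:
grep clusterID /mnt/disk1/hadoop/hdfs/data/current/VERSION
```

Option 2 only makes sense if the NameNode still knows about these blocks; after a full NN reformat the old blocks are orphans and Option 1 is the usual answer.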
01-24-2017
10:40 AM
Yeah, that was right:
total 4
drwxrwxrwt. 2 hdfs hadoop 4096 Nov 19  2014 cache
srw-rw-rw-. 1 hdfs hadoop    0 Jan 24 03:39 dn_socket
Actually, none of my datanode hosts are working.
Is that a memory issue?
01-24-2017
10:11 AM
@Jay SenSharma So the error in the log file is about permissions.
01-24-2017
10:09 AM
top:
top - 05:03:36 up 4:41, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 186 total, 1 running, 185 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.4%sy, 0.0%ni, 99.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32877652k total, 1678960k used, 31198692k free, 335884k buffers
Swap: 0k total, 0k used, 0k free, 517928k cached
01-24-2017
10:08 AM
df -h:
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1 30G 9.9G 19G 36% /
tmpfs 16G 0 16G 0% /dev/shm
/dev/xvdf 1.1T 905G 75G 93% /mnt/disk1
/dev/xvdg 1.1T 890G 90G 91% /mnt/disk2
01-24-2017
10:06 AM
Output of datanode.log:
2017-01-24 04:59:13,837 INFO datanode.DataNode (DataNode.java:shutdown(1720)) - Shutdown complete.
2017-01-24 04:59:13,839 FATAL datanode.DataNode (DataNode.java:secureMain(2385)) - Exception in secureMain
java.io.IOException: the path component: '/var/lib/hadoop-hdfs' is owned by a user who is not root and not you. Your effective user id is 0; the path is owned by user id 508, and its permissions are 0751. Please fix this or select a different socket path.
at org.apache.hadoop.net.unix.DomainSocket.validateSocketPathSecurity0(Native Method)
at org.apache.hadoop.net.unix.DomainSocket.bindAndListen(DomainSocket.java:189)
at org.apache.hadoop.hdfs.net.DomainPeerServer.<init>(DomainPeerServer.java:40)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:892)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:858)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1056)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:415)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2268)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2155)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2402)
2017-01-24 04:59:13,841 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2017-01-24 04:59:13,843 INFO datanode.DataNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at datanode.ec2.internal/datanode
************************************************************/
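This error means the DataNode was launched as root (effective uid 0) while /var/lib/hadoop-hdfs, where the short-circuit domain socket lives, is owned by the hdfs user (uid 508 here); the domain-socket security check requires every path component to be owned by root or the effective user. Starting the daemon as hdfs, the way Ambari does, avoids it; this is the same command that appears in the output-30684.txt excerpt elsewhere in this thread:

```shell
# Start the DataNode as the hdfs user instead of root
# (command copied from the Ambari-generated output in this thread):
su -s /bin/bash - hdfs -c 'export HADOOP_LIBEXEC_DIR=/usr/hdp/current/hadoop-client/libexec && \
  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode'
```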
01-24-2017
09:55 AM
@Jay SenSharma 3. The other components on the agent are running without any issues; the only problem is the DataNode, which goes down after a few seconds. 4. Running 'top' shows the agent has enough free memory.
01-24-2017
09:52 AM
@Jay SenSharma 1. I didn't get any error in the datanode log. 2. ambari-server.log:
22:53:19,873 WARN [Thread-1] HeartbeatMonitor:150 - Heartbeat lost from host datanode.ec2.internal
22:53:19,874 WARN [Thread-1] HeartbeatMonitor:150 - Heartbeat lost from host datanode.ec2.internal
22:53:19,874 WARN [Thread-1] HeartbeatMonitor:165 - Setting component state to UNKNOWN for component GANGLIA_MONITOR on datanode.ec2.internal
22:53:19,874 WARN [Thread-1] HeartbeatMonitor:165 - Setting component state to UNKNOWN for component DATANODE on datanode.ec2.internal
22:53:19,874 WARN [Thread-1] HeartbeatMonitor:165 - Setting component state to UNKNOWN for component NODEMANAGER on datanode.ec2.internal
22:53:19,890 WARN [Thread-1] HeartbeatMonitor:150 - Heartbeat lost from host datanode.ec2.internal
22:53:19,890 WARN [Thread-1] HeartbeatMonitor:165 - Setting component state to UNKNOWN for component GANGLIA_MONITOR on datanode.ec2.internal
01-24-2017
09:44 AM
That was not an error; that was the output of the file. I got nothing in error-30684.txt. Output of command-30684.txt:
"namenode.ec2.internal"
],
"hs_host": [
"namenode.ec2.internal"
],
"hive_server_host": [
"namenode.ec2.internal"
]
}
}
01-24-2017
09:40 AM
@Jay SenSharma Hi Jay, thanks for the reply. This is what I got in output-30684.txt:
2017-01-24 03:39:17,877 - File['/etc/hadoop/conf/slaves'] {'content': Template('slaves.j2'), 'owner': 'hdfs'}
2017-01-24 03:39:17,877 - Directory['/var/lib/hadoop-hdfs'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 0751, 'recursive': True}
2017-01-24 03:39:17,893 - Host contains mounts: ['/', '/proc', '/sys', '/dev/pts', '/dev/shm', '/mnt/disk1', '/mnt/disk2', '/proc/sys/fs/binfmt_misc'].
2017-01-24 03:39:17,894 - Mount point for directory /mnt/disk1/hadoop/hdfs/data is /mnt/disk1
2017-01-24 03:39:17,894 - Mount point for directory /mnt/disk2/hadoop/hdfs/data is /mnt/disk2
2017-01-24 03:39:17,895 - Directory['/var/run/hadoop/hdfs'] {'owner': 'hdfs', 'recursive': True}
2017-01-24 03:39:17,895 - Directory['/var/log/hadoop/hdfs'] {'owner': 'hdfs', 'recursive': True}
2017-01-24 03:39:17,896 - File['/var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'] {'action': ['delete'], 'not_if': 'ls /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid` >/dev/null 2>&1'}
2017-01-24 03:39:17,919 - Deleting File['/var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid']
2017-01-24 03:39:17,919 - Execute['ulimit -c unlimited; su -s /bin/bash - hdfs -c 'export HADOOP_LIBEXEC_DIR=/usr/hdp/current/hadoop-client/libexec && /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode''] {'not_if': 'ls /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid` >/dev/null 2>&1'}
01-24-2017
08:59 AM
The DataNode automatically goes down a few seconds after being started from Ambari. I checked that the ambari-agent is working: the agent exchanges heartbeats with the server but receives no commands. ambari-agent log file:
INFO 2017-01-24 03:44:59,747 PythonExecutor.py:118 - Result: {'structuredOut': {}, 'stdout': '', 'stderr': '', 'exitcode': 1}
INFO 2017-01-24 03:45:07,970 Heartbeat.py:78 - Building Heartbeat: {responseId = 210, timestamp = 1485247507970, commandsInProgress = False, componentsMapped = True}
INFO 2017-01-24 03:45:08,129 Controller.py:214 - Heartbeat response received (id = 211)
INFO 2017-01-24 03:45:08,129 Controller.py:249 - No commands sent from ip-172-31-17-251.ec2.internal
INFO 2017-01-24 03:45:18,130 Heartbeat.py:78 - Building Heartbeat: {responseId = 211, timestamp = 1485247518130, commandsInProgress = False, componentsMapped = True}
INFO 2017-01-24 03:45:18,274 Controller.py:214 - Heartbeat response received (id = 212)
INFO 2017-01-24 03:45:18,274 Controller.py:249 - No commands sent from NAMENODE.ec2.internal
Labels:
- Apache Ambari
- Apache Hadoop