Created 04-29-2016 09:16 PM
This may be a very basic question; I am just getting started with DevOps in Ambari. I am unable to start the HDFS service in Ambari. I tried manually restarting all the components of HDFS from the 'Service Actions' widget. Attached screenshots: screen-shot-2016-04-29-at-14858-pm.png, screen-shot-2016-04-29-at-21145-pm.png, screen-shot-2016-04-29-at-21436-pm.png.
Created 04-29-2016 09:23 PM
From the third screenshot, I see that the NN hasn't restarted (it was running before the restart). Take a look at /var/log/hadoop/hdfs/hadoop-hdfs-namenode-sandbox.hortonworks.com.log, which is your NN log. It should have some information on why it hasn't started back up.
Created on 04-29-2016 09:51 PM - edited 08-19-2019 03:17 AM
Looks like a connection exception. What does this mean?
Created 04-29-2016 09:55 PM
It means the NameNode is not running. You need to paste the full NN log so we can see why it hasn't started. Please upload either the full NN log or at least the last 20 lines from it.
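Something like this should pull them out (a sketch; it assumes the default HDP sandbox log directory and hostname, so adjust the file name if your NameNode log is named differently):
# Last 20 lines of the NameNode log on the sandbox (path is an assumption based on your earlier output)
tail -n 20 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-sandbox.hortonworks.com.log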
Created 04-29-2016 10:04 PM
2016-04-29 20:57:08,371 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateHeartbeatState(451)) - Number of failed storage changes from 0 to 0
2016-04-29 20:57:08,405 WARN hdfs.DFSClient (DFSOutputStream.java:run(857)) - DFSOutputStream ResponseProcessor exception for block BP-1014530610-10.0.2.15-1456769265896:blk_1073786547_45771
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2293)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:749)
2016-04-29 20:57:08,407 INFO provider.BaseAuditHandler (BaseAuditHandler.java:logStatus(312)) - Audit Status Log: name=hdfs.async.multi_dest.batch.hdfs, interval=03.058 seconds, events=1, succcessCount=1, totalEvents=9406, totalSuccessCount=9403, totalDeferredCount=3
2016-04-29 20:57:08,407 INFO queue.AuditFileSpool (AuditFileSpool.java:stop(321)) - Stop called, queueName=hdfs.async.multi_dest.batch, consumer=hdfs.async.multi_dest.batch.hdfs
2016-04-29 20:57:08,407 INFO provider.BaseAuditHandler (BaseAuditHandler.java:logStatus(312)) - Audit Status Log: name=hdfs.async.multi_dest.batch, finalDestination=hdfs.async.multi_dest.batch.hdfs, interval=21.059 seconds, events=30, succcessCount=9, stashedCount=3, totalEvents=1057848, totalSuccessCount=9295, totalStashedCount=3
2016-04-29 20:57:08,407 INFO queue.AuditFileSpool (AuditFileSpool.java:runDoAs(877)) - Caught exception in consumer thread. Shutdown might be in progress
2016-04-29 20:57:08,407 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runDoAs(373)) - Exiting consumerThread.run() method. name=hdfs.async.multi_dest.batch
2016-04-29 20:57:08,439 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(954)) - Failed to close inode 96024
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.0.2.15:50010,DS-c63e2550-a18a-4035-8bb3-7f8f2b4dd607,DISK] are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1146)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2016-04-29 20:57:08,565 INFO namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at sandbox.hortonworks.com/10.0.2.15
************************************************************/
[root@sandbox ~]#
Created 04-29-2016 10:12 PM
The NN has stopped with this error:
'java.io.IOException: All datanodes DatanodeInfoWithStorage[10.0.2.15:50010,DS-c63e2550-a18a-4035-8bb3-7f8f2b4dd607,DISK] are bad. Aborting...'
One possible reason is that you are hitting the ulimit (open file / process limits).
Please post:
the ulimit -a output
and the DataNode log (from the same folder as the NN log).
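Roughly something like this should gather all of it (a sketch only; it assumes the HDFS daemons run as the hdfs user and that the DataNode log sits next to the NN log on the sandbox):
# ulimit as seen by the hdfs service account (can differ from root's shell)
su - hdfs -c 'ulimit -a'
# limits of the running DataNode process, if it is still up (matches on the DataNode main class)
cat /proc/$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode | head -1)/limits
# tail of the DataNode log (default HDP sandbox path)
tail -n 50 /var/log/hadoop/hdfs/hadoop-hdfs-datanode-sandbox.hortonworks.com.log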
Created 04-29-2016 10:37 PM
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 46763
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 46763
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
This is the ulimit result
Created 04-30-2016 12:54 AM
ulimit -n 8096
Try this, then restart the DN and NN and see if that works. I haven't seen your DN logs, but it looks like you are running into a max-open-files issue (your current limit is only 1024).
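If raising it helps, you will probably also want to make it stick across logins and reboots. A minimal sketch, assuming the HDFS daemons run as the hdfs user (the limit value is just an example):
# /etc/security/limits.conf (or a file under /etc/security/limits.d/)
hdfs  soft  nofile  8096
hdfs  hard  nofile  8096
# verify from a fresh login shell for that user
su - hdfs -c 'ulimit -n'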
Created 04-29-2016 10:46 PM
Hi @ganne!
Try to do "stop all" and "start all" in Ambari. Some of the services rely on each other, so they must be started in a certain order. If you stop everything and start everything, Ambari will make sure everything starts in the right order.
Let me know if that doesn't work.
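If it's easier, the same stop-all/start-all can also be driven through Ambari's REST API. A rough sketch, assuming default admin/admin credentials, Ambari on localhost:8080, and a cluster named Sandbox (adjust to your environment):
# Stop all services (move them to the INSTALLED state)
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop All Services"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://localhost:8080/api/v1/clusters/Sandbox/services
# Start all services again
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start All Services"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  http://localhost:8080/api/v1/clusters/Sandbox/services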
Created 04-29-2016 11:20 PM
Hi Ryan, yeah, I did that; I also had to increase my node's memory and take the SNN out of safe mode. Thanks!
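For reference, taking HDFS out of safe mode was just the usual dfsadmin call (run as the hdfs user):
su - hdfs -c 'hdfs dfsadmin -safemode leave'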