Created 04-29-2016 09:16 PM
This may be a very basic question; I am just getting started with DevOps in Ambari. I am unable to start the HDFS service in Ambari. I tried manually restarting all the components of HDFS from the 'Service Actions' widget. Attached screenshots: screen-shot-2016-04-29-at-14858-pm.png, screen-shot-2016-04-29-at-21145-pm.png, screen-shot-2016-04-29-at-21436-pm.png.
Created 04-29-2016 09:23 PM
From the third screenshot, I see that the NN hasn't restarted (it was running before the restart). Take a look at /var/log/hadoop/hdfs/hadoop-hdfs-namenode-sandbox.hortonworks.com.log, which is your NN log. It should have some information on why it hasn't started back up.
Created on 04-29-2016 09:51 PM - edited 08-19-2019 03:17 AM
Looks like a connection exception. What does this mean?
Created 04-29-2016 09:55 PM
It means the NameNode is not running. You need to paste the full NN log so we can see why it hasn't started. Please upload either the full NN log or at least the last 20 lines from it.
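Something like this should pull them out (a sketch; it assumes the default HDP sandbox log directory and hostname, so adjust the file name if your NameNode log is named differently):
# Last 20 lines of the NameNode log on the sandbox (path is an assumption based on your earlier output)
tail -n 20 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-sandbox.hortonworks.com.log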
Created 04-29-2016 10:04 PM
2016-04-29 20:57:08,371 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateHeartbeatState(451)) - Number of failed storage changes from 0 to 0
2016-04-29 20:57:08,405 WARN hdfs.DFSClient (DFSOutputStream.java:run(857)) - DFSOutputStream ResponseProcessor exception for block BP-1014530610-10.0.2.15-1456769265896:blk_1073786547_45771
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2293)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:749)
2016-04-29 20:57:08,407 INFO provider.BaseAuditHandler (BaseAuditHandler.java:logStatus(312)) - Audit Status Log: name=hdfs.async.multi_dest.batch.hdfs, interval=03.058 seconds, events=1, succcessCount=1, totalEvents=9406, totalSuccessCount=9403, totalDeferredCount=3
2016-04-29 20:57:08,407 INFO queue.AuditFileSpool (AuditFileSpool.java:stop(321)) - Stop called, queueName=hdfs.async.multi_dest.batch, consumer=hdfs.async.multi_dest.batch.hdfs
2016-04-29 20:57:08,407 INFO provider.BaseAuditHandler (BaseAuditHandler.java:logStatus(312)) - Audit Status Log: name=hdfs.async.multi_dest.batch, finalDestination=hdfs.async.multi_dest.batch.hdfs, interval=21.059 seconds, events=30, succcessCount=9, stashedCount=3, totalEvents=1057848, totalSuccessCount=9295, totalStashedCount=3
2016-04-29 20:57:08,407 INFO queue.AuditFileSpool (AuditFileSpool.java:runDoAs(877)) - Caught exception in consumer thread. Shutdown might be in progress
2016-04-29 20:57:08,407 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runDoAs(373)) - Exiting consumerThread.run() method. name=hdfs.async.multi_dest.batch
2016-04-29 20:57:08,439 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(954)) - Failed to close inode 96024
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.0.2.15:50010,DS-c63e2550-a18a-4035-8bb3-7f8f2b4dd607,DISK] are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1146)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2016-04-29 20:57:08,565 INFO namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at sandbox.hortonworks.com/10.0.2.15
************************************************************/
[root@sandbox ~]#
Created 04-29-2016 10:12 PM
The NN has stopped with this error:
'java.io.IOException: All datanodes DatanodeInfoWithStorage[10.0.2.15:50010,DS-c63e2550-a18a-4035-8bb3-7f8f2b4dd607,DISK] are bad. Aborting...'
One possible reason is that you are hitting the ulimit (open file / process limits).
Please post:
the ulimit -a output
and the DataNode log (from the same folder as the NN log).
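Roughly something like this should gather all of it (a sketch only; it assumes the HDFS daemons run as the hdfs user and that the DataNode log sits next to the NN log on the sandbox):
# ulimit as seen by the hdfs service account (can differ from root's shell)
su - hdfs -c 'ulimit -a'
# limits of the running DataNode process, if it is still up (matches on the DataNode main class)
cat /proc/$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode | head -1)/limits
# tail of the DataNode log (default HDP sandbox path)
tail -n 50 /var/log/hadoop/hdfs/hadoop-hdfs-datanode-sandbox.hortonworks.com.log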
Created 04-29-2016 10:37 PM
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 46763
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 46763
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
This is the ulimit result
Created 04-30-2016 12:54 AM
ulimit -n 8096
Try this, then restart the DN and NN and see if that works. I haven't seen your DN logs, but it looks like you are running into a max-open-files issue (your current limit is only 1024).
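If raising it helps, you will probably also want to make it stick across logins and reboots. A minimal sketch, assuming the HDFS daemons run as the hdfs user (the limit value is just an example):
# /etc/security/limits.conf (or a file under /etc/security/limits.d/)
hdfs  soft  nofile  8096
hdfs  hard  nofile  8096
# verify from a fresh login shell for that user
su - hdfs -c 'ulimit -n'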
Created 04-29-2016 10:46 PM
Hi @ganne!
Try to do "stop all" and "start all" in Ambari. Some of the services rely on each other, so they must be started in a certain order. If you stop everything and start everything, Ambari will make sure everything starts in the right order.
Let me know if that doesn't work.
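If it's easier, the same stop-all/start-all can also be driven through Ambari's REST API. A rough sketch, assuming default admin/admin credentials, Ambari on localhost:8080, and a cluster named Sandbox (adjust to your environment):
# Stop all services (move them to the INSTALLED state)
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop All Services"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://localhost:8080/api/v1/clusters/Sandbox/services
# Start all services again
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start All Services"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  http://localhost:8080/api/v1/clusters/Sandbox/services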
Created 04-29-2016 11:20 PM
Hi Ryan, yeah, I did that; I also had to increase my node's memory and take the SNN out of safe mode. Thanks!
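For reference, taking HDFS out of safe mode was just the usual dfsadmin call (run as the hdfs user):
su - hdfs -c 'hdfs dfsadmin -safemode leave'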