Support Questions

Find answers, ask questions, and share your expertise

HDFS service not starting in Ambari on the sandbox

Expert Contributor

This may be a very basic question. I am just getting started with DevOps in Ambari, and I am unable to start the HDFS service. I tried manually restarting all the components of HDFS from the 'Service Actions' menu. Attached screenshots: screen-shot-2016-04-29-at-14858-pm.png, screen-shot-2016-04-29-at-21145-pm.png, screen-shot-2016-04-29-at-21436-pm.png
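In case it helps to confirm what Ambari itself reports, the service state can also be checked from the shell through the REST API. A minimal sketch, assuming the sandbox defaults of admin/admin credentials, port 8080, and a cluster named Sandbox:

# Query the state Ambari records for the HDFS service (STARTED, INSTALLED, ...)
curl -u admin:admin 'http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services/HDFS?fields=ServiceInfo/state'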

1 ACCEPTED SOLUTION

Guru

From the third screenshot, I see that the NameNode (NN) hasn't restarted (it was running before the restart). You can take a look at /var/log/hadoop/hdfs/hadoop-hdfs-datanode-sandbox.hortonworks.com.log, which is your NN log. It should have some information on why it hasn't started back up.
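One minimal way to dig through that from the sandbox shell; the file name is taken from the answer above, and listing the directory first helps confirm which daemon log you actually want, since NameNode logs normally follow the hadoop-hdfs-namenode-<hostname>.log pattern:

# List the HDFS daemon logs, newest first
ls -lt /var/log/hadoop/hdfs/
# Show the most recent errors from the log mentioned above
grep -iE 'error|fatal' /var/log/hadoop/hdfs/hadoop-hdfs-datanode-sandbox.hortonworks.com.log | tail -n 20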




Expert Contributor

3870-screen-shot-2016-04-29-at-24659-pm.png

Looks like a connection exception. What does this mean?

Guru

It means the NameNode is not running. We need the full NN log to see why it hasn't started. Please upload either the full NN log or the last 20 lines of it.
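If it helps, those last lines can be pulled with tail; this assumes the NameNode log on the sandbox follows the usual hadoop-hdfs-namenode-<hostname>.log naming:

tail -n 20 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-sandbox.hortonworks.com.log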

Expert Contributor

2016-04-29 20:57:08,371 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateHeartbeatState(451)) - Number of failed storage changes from 0 to 0
2016-04-29 20:57:08,405 WARN hdfs.DFSClient (DFSOutputStream.java:run(857)) - DFSOutputStream ResponseProcessor exception for block BP-1014530610-10.0.2.15-1456769265896:blk_1073786547_45771
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2293)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:749)
2016-04-29 20:57:08,407 INFO provider.BaseAuditHandler (BaseAuditHandler.java:logStatus(312)) - Audit Status Log: name=hdfs.async.multi_dest.batch.hdfs, interval=03.058 seconds, events=1, succcessCount=1, totalEvents=9406, totalSuccessCount=9403, totalDeferredCount=3
2016-04-29 20:57:08,407 INFO queue.AuditFileSpool (AuditFileSpool.java:stop(321)) - Stop called, queueName=hdfs.async.multi_dest.batch, consumer=hdfs.async.multi_dest.batch.hdfs
2016-04-29 20:57:08,407 INFO provider.BaseAuditHandler (BaseAuditHandler.java:logStatus(312)) - Audit Status Log: name=hdfs.async.multi_dest.batch, finalDestination=hdfs.async.multi_dest.batch.hdfs, interval=21.059 seconds, events=30, succcessCount=9, stashedCount=3, totalEvents=1057848, totalSuccessCount=9295, totalStashedCount=3
2016-04-29 20:57:08,407 INFO queue.AuditFileSpool (AuditFileSpool.java:runDoAs(877)) - Caught exception in consumer thread. Shutdown might be in progress
2016-04-29 20:57:08,407 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runDoAs(373)) - Exiting consumerThread.run() method. name=hdfs.async.multi_dest.batch
2016-04-29 20:57:08,439 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(954)) - Failed to close inode 96024
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.0.2.15:50010,DS-c63e2550-a18a-4035-8bb3-7f8f2b4dd607,DISK] are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1146)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
2016-04-29 20:57:08,565 INFO namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at sandbox.hortonworks.com/10.0.2.15
************************************************************/
[root@sandbox ~]#

Guru

The NN has stopped with this:

'java.io.IOException: All datanodes DatanodeInfoWithStorage[10.0.2.15:50010,DS-c63e2550-a18a-4035-8bb3-7f8f2b4dd607,DISK] are bad. Aborting...'

One possible reason is that you are hitting a ulimit.

Please post:

the ulimit -a output

and the DataNode log (from the same folder as the NN log)
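Shell limits and daemon limits can differ, so it may also be worth checking what the running processes actually got. A sketch, assuming the daemons run as the hdfs user and that the DataNode JVM command line contains "DataNode":

# Limits applied to the live DataNode process, if one is running
cat /proc/$(pgrep -f DataNode | head -n 1)/limits
# Limits a fresh process started as the hdfs user would inherit
su - hdfs -c 'ulimit -a'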

Expert Contributor

This is the ulimit -a output:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 46763
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 46763
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Guru

ulimit -n 8096

Try this, then restart the DN and NN, and see if it works. I haven't seen your DN logs, but it looks like you are running into a max open files issue.
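If raising the limit helps, one way to make it survive reboots is an entry in /etc/security/limits.conf for the account that runs the HDFS daemons. A sketch, assuming the daemons run as the hdfs user and reusing the 8096 value suggested above:

# /etc/security/limits.conf (assumed hdfs service user; adjust to your setup)
hdfs soft nofile 8096
hdfs hard nofile 8096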


Hi @ganne!

Try doing "stop all" and then "start all" in Ambari. Some of the services rely on each other, so they must be started in a certain order. If you stop everything and then start everything, Ambari will make sure each service starts in the right order.
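For what it's worth, the same stop-all/start-all can also be driven through Ambari's REST API. A sketch, assuming the sandbox defaults of admin/admin credentials, port 8080, and a cluster named Sandbox:

# Stop all services ("INSTALLED" is Ambari's stopped state)
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop All Services"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services
# Start everything again once the stop request has finished
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start All Services"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services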

Let me know if that doesn't work.

Expert Contributor

Hi Ryan, yeah, I did that. I also had to increase my node's memory and take the NN out of safe mode. Thanks!
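For anyone landing here later: safe mode is a NameNode state, and it can be checked and left from the shell. A sketch, assuming the commands are run as the hdfs user:

# Check whether the NameNode is in safe mode
sudo -u hdfs hdfs dfsadmin -safemode get
# Leave safe mode manually (only once the NameNode is otherwise healthy)
sudo -u hdfs hdfs dfsadmin -safemode leave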