FAILED to start HDFS Namenode

Explorer

Hi,
I have a problem when I start my HDFS service. I have 3 nodes (1 master, 2 slaves).
The Secondary NameNode and both DataNodes start successfully, but the NameNode fails to start.
Here are the error messages from stderr:

Can't open /var/run/cloudera-scm-agent/process/117-hdfs-NAMENODE/supervisor.conf: Permission denied.
+ make_scripts_executable
+ find /var/run/cloudera-scm-agent/process/117-hdfs-NAMENODE -regex '.*\.\(py\|sh\)$' -exec chmod u+x '{}' ';'
+ '[' DATANODE_MAX_LOCKED_MEMORY '!=' '' ']'
+ ulimit -l
+ export HADOOP_IDENT_STRING=hdfs
+ HADOOP_IDENT_STRING=hdfs
+ '[' -n '' ']'
+ acquire_kerberos_tgt hdfs.keytab
+ '[' -z hdfs.keytab ']'
+ '[' -n '' ']'
+ '[' validate-writable-empty-dirs = namenode ']'
+ '[' file-operation = namenode ']'
+ '[' bootstrap = namenode ']'
+ '[' failover = namenode ']'
+ '[' transition-to-active = namenode ']'
+ '[' initializeSharedEdits = namenode ']'
+ '[' initialize-znode = namenode ']'
+ '[' format-namenode = namenode ']'
+ '[' monitor-decommission = namenode ']'
+ '[' jnSyncWait = namenode ']'
+ '[' nnRpcWait = namenode ']'
+ '[' -safemode = '' -a get = '' ']'
+ '[' monitor-upgrade = namenode ']'
+ '[' finalize-upgrade = namenode ']'
+ '[' rolling-upgrade-prepare = namenode ']'
+ '[' rolling-upgrade-finalize = namenode ']'
+ '[' nnDnLiveWait = namenode ']'
+ '[' refresh-datanode = namenode ']'
+ '[' mkdir = namenode ']'
+ '[' nfs3 = namenode ']'
+ '[' namenode = namenode -o secondarynamenode = namenode -o datanode = namenode ']'
+ HADOOP_OPTS='-Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true '
+ export 'HADOOP_OPTS=-Dhdfs.audit.logger=INFO,RFAAUDIT -Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true '
+ HADOOP_OPTS='-Dhdfs.audit.logger=INFO,RFAAUDIT -Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true '
+ '[' namenode = namenode -a rollingUpgrade = '' ']'
+ exec /usr/lib/hadoop-hdfs/bin/hdfs --config /var/run/cloudera-scm-agent/process/117-hdfs-NAMENODE namenode


and here is the message from stdout:

(attached screenshot: Screenshot from 2016-09-15 11_17_08.png)

 

Can anybody tell me why this happens?
Thank you.

12 REPLIES

Expert Contributor

As per the log:

Can't open /var/run/cloudera-scm-agent/process/117-hdfs-NAMENODE/supervisor.conf: Permission denied.

Check the permissions on that directory.
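
A quick way to look at it (just a sketch; the 117-hdfs-NAMENODE directory name comes from the stderr output above and changes on every role restart):

# run as root on the NameNode host
ls -ld /var/run/cloudera-scm-agent/process/117-hdfs-NAMENODE
ls -l  /var/run/cloudera-scm-agent/process/117-hdfs-NAMENODE/supervisor.conf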

Explorer

What should it be?

I already changed the permissions, but it still happens.


-rw------- 1 root root 2955 Sep 16 13:42 supervisor.con

 

 

Master Guru

Hello,

 

If you see stderr output, then supervisor.conf was already read. The permissions error is not relevant, I think, since the supervisor runs as root and has permission to access it. The fact that you see stderr information means supervisor.conf was read successfully and the process started, as we can see from the "exec" line.

 

Check your NameNode log (usually in /var/log/hadoop-hdfs) for details about the failure.
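
For example (a sketch; the exact file name depends on your host name and assumes a Cloudera Manager managed install):

# on the NameNode host
ls -lt /var/log/hadoop-hdfs/
tail -n 100 /var/log/hadoop-hdfs/*NAMENODE*.log.out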

 

From what you showed us, it appears the agent/supervisor started the NameNode but then it failed to stay running for more than a few seconds at most.

 

Let us know what you see in the log.

 

 

Explorer
 2016-09-23 10:33:38,220 INFO org.apache.hadoop.util.GSet: Computing capacity for map BlocksMap
2016-09-23 10:33:38,221 INFO org.apache.hadoop.util.GSet: VM type = 64-bit
2016-09-23 10:33:38,224 INFO org.apache.hadoop.util.GSet: 2.0% max memory 3.9 GB = 80.6 MB
2016-09-23 10:33:38,225 INFO org.apache.hadoop.util.GSet: capacity = 2^23 = 8388608 entries
2016-09-23 10:33:38,554 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: dfs.block.access.token.enable=false
2016-09-23 10:33:38,558 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: defaultReplication = 3
2016-09-23 10:33:38,558 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxReplication = 512
2016-09-23 10:33:38,558 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: minReplication = 1
2016-09-23 10:33:38,558 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxReplicationStreams = 20
2016-09-23 10:33:38,558 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: replicationRecheckInterval = 3000
2016-09-23 10:33:38,558 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: encryptDataTransfer = false
2016-09-23 10:33:38,559 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxNumBlocksToLog = 1000
2016-09-23 10:33:38,572 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner = hdfs (auth:SIMPLE)
2016-09-23 10:33:38,573 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup = supergroup
2016-09-23 10:33:38,573 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled = true
2016-09-23 10:33:38,574 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: HA Enabled: false
2016-09-23 10:33:38,579 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Append Enabled: true
2016-09-23 10:33:38,961 INFO org.apache.hadoop.util.GSet: Computing capacity for map INodeMap
2016-09-23 10:33:38,961 INFO org.apache.hadoop.util.GSet: VM type = 64-bit
2016-09-23 10:33:38,962 INFO org.apache.hadoop.util.GSet: 1.0% max memory 3.9 GB = 40.3 MB
2016-09-23 10:33:38,962 INFO org.apache.hadoop.util.GSet: capacity = 2^22 = 4194304 entries
2016-09-23 10:33:38,975 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2016-09-23 10:33:38,988 INFO org.apache.hadoop.util.GSet: Computing capacity for map cachedBlocks
2016-09-23 10:33:38,989 INFO org.apache.hadoop.util.GSet: VM type = 64-bit
2016-09-23 10:33:38,989 INFO org.apache.hadoop.util.GSet: 0.25% max memory 3.9 GB = 10.1 MB
2016-09-23 10:33:38,989 INFO org.apache.hadoop.util.GSet: capacity = 2^20 = 1048576 entries
2016-09-23 10:33:39,000 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2016-09-23 10:33:39,000 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
2016-09-23 10:33:39,001 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
2016-09-23 10:33:39,007 INFO org.apache.hadoop.hdfs.server.namenode.top.metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2016-09-23 10:33:39,007 INFO org.apache.hadoop.hdfs.server.namenode.top.metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2016-09-23 10:33:39,007 INFO org.apache.hadoop.hdfs.server.namenode.top.metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2016-09-23 10:33:39,010 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Retry cache on namenode is enabled
2016-09-23 10:33:39,011 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2016-09-23 10:33:39,015 INFO org.apache.hadoop.util.GSet: Computing capacity for map NameNodeRetryCache
2016-09-23 10:33:39,015 INFO org.apache.hadoop.util.GSet: VM type = 64-bit
2016-09-23 10:33:39,016 INFO org.apache.hadoop.util.GSet: 0.029999999329447746% max memory 3.9 GB = 1.2 MB
2016-09-23 10:33:39,016 INFO org.apache.hadoop.util.GSet: capacity = 2^17 = 131072 entries
2016-09-23 10:33:39,025 INFO org.apache.hadoop.hdfs.server.namenode.NNConf: ACLs enabled? false
2016-09-23 10:33:39,026 INFO org.apache.hadoop.hdfs.server.namenode.NNConf: XAttrs enabled? true
2016-09-23 10:33:39,026 INFO org.apache.hadoop.hdfs.server.namenode.NNConf: Maximum size of an xattr: 16384
2016-09-23 10:33:39,093 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /dfs/nn/in_use.lock acquired by nodename 11727@master1
2016-09-23 10:33:39,098 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.FileNotFoundException: /dfs/nn/current/VERSION (Permission denied)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at org.apache.hadoop.hdfs.server.common.StorageInfo.readPropertiesFile(StorageInfo.java:245)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.readProperties(NNStorage.java:627)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:337)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:213)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1080)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:777)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:613)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:675)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:843)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:822)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1543)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1611)
2016-09-23 10:33:39,146 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@master1:50070
2016-09-23 10:33:39,250 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2016-09-23 10:33:39,251 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2016-09-23 10:33:39,251 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2016-09-23 10:33:39,251 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.FileNotFoundException: /dfs/nn/current/VERSION (Permission denied)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at org.apache.hadoop.hdfs.server.common.StorageInfo.readPropertiesFile(StorageInfo.java:245)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.readProperties(NNStorage.java:627)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:337)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:213)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1080)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:777)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:613)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:675)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:843)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:822)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1543)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1611)
2016-09-23 10:33:39,256 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2016-09-23 10:33:39,259 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master1/10.5.1.160
************************************************************/

That is from the log file /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-master1.log.out.

 

 

Master Guru

Since the exception is

 

2016-09-23 10:33:39,098 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.FileNotFoundException: /dfs/nn/current/VERSION (Permission denied)
at java.io.RandomAccessFile.open(Native Method)

the NameNode cannot start because it cannot load the fsimage, and the fsimage cannot be loaded because the VERSION file cannot be read (the hdfs user does not have permission to open it).

 

I would check the permissions on your HDFS local disk directories on the NameNode. To resolve the issue in the exception, make sure that the VERSION file is owned by the "hdfs" user, like this:

 

-rw-r--r-- 1 hdfs hdfs 172 Nov 7 14:37 /dfs/nn/current/VERSION
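
If the whole metadata directory ended up owned by root, a minimal sketch to put it back (assuming /dfs/nn is your only dfs.namenode.name.dir and that hdfs:hdfs is the correct owner and group; adjust to your configuration):

# run as root on the NameNode host
chown -R hdfs:hdfs /dfs/nn          # assumes dfs.namenode.name.dir = /dfs/nn
ls -l /dfs/nn/current/VERSION       # should now show hdfs as the owner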

 

I hope that is the only issue; fixing this one may surface other permission problems if the ownership was changed more broadly.

If the owner of the file is shown as a number, that would indicate the OS cannot resolve the file's owner ID to a user.
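
A quick check for that case (a sketch; 996 is just a placeholder UID, use the number that ls actually prints):

getent passwd 996        # does this uid map to a user on this host?
id hdfs                  # confirm the hdfs user exists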

 

 

New Contributor

I seem to be having the same problem, but I am not able to change the file permissions.

 

drwxr-xr-x 2 root root 32768 Jan 18 04:35 .
drwxr-xr-x 3 root root 32768 Jan 16 22:53 ..
-rwxr-xr-x 1 root root   321 Jan 16 22:53 fsimage_0000000000000000000
-rwxr-xr-x 1 root root    62 Jan 16 22:53 fsimage_0000000000000000000.md5
-rwxr-xr-x 1 root root     2 Jan 16 22:53 seen_txid
-rwxr-xr-x 1 root root     0 Jan 18 04:35 test
-rwxr-xr-x 1 root root   203 Jan 16 22:53 VERSION
chown: changing ownership of ‘fsimage_0000000000000000000’: Operation not permitted
chown: changing ownership of ‘fsimage_0000000000000000000.md5’: Operation not permitted
chown: changing ownership of ‘seen_txid’: Operation not permitted
chown: changing ownership of ‘test’: Operation not permitted
chown: changing ownership of ‘VERSION’: Operation not permitted
[root@hd-master-1 current]#

Note that I am trying this as root. This is a default Cloudera Manager 5.9.1 installation with the basic Hadoop package, running on CentOS. No luck.

Check whether all the firewall services are turned off, including firewalld, iptables, etc. For me, it worked after I stopped firewalld.
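
What I ran was roughly this (a sketch for CentOS/RHEL 7 hosts; run on every node):

systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld      # confirm it is inactive
iptables -L -n                  # verify no leftover rules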

Contributor

Hi

 

Check whether High Availability is configured properly.
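
If HA is enabled, a quick way to check the NameNode states (a sketch; nn1 and nn2 are example NameNode IDs, substitute the ones from your hdfs-site.xml):

hdfs getconf -confKey dfs.nameservices
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2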

 

Regards,
Shafi


Hi, 

 

Did you find a solution? I am facing the same problem.

 

Thanks