Name node not starting after computer restarts

Explorer

Hi all,

 

I have set up a 3-node Cloudera 5.14 cluster on Azure with Ubuntu 14 machines, and there seems to be an issue when I start the cluster.

 

If I shut down the Azure instances for the day, start them the next morning, and then try to start the Cloudera cluster, all services come up except the NameNode. The NameNode does not start and shows the error message "NameNode is not formatted". The workaround I use is to format the NameNode every morning (yes, it is a test cluster) and then carry on with the rest of the work. Any clue what is going wrong?

 

STARTUP_MSG:   java = 1.7.0_67
************************************************************/
2018-04-08 23:29:41,562 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2018-04-08 23:29:41,565 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode []
2018-04-08 23:29:41,945 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2018-04-08 23:29:42,067 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2018-04-08 23:29:42,067 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
2018-04-08 23:29:42,097 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: fs.defaultFS is hdfs://udp-cdh-node1:8020
2018-04-08 23:29:42,097 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Clients are to use udp-cdh-node1:8020 to access this namenode/service.
2018-04-08 23:29:42,404 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
2018-04-08 23:29:42,418 INFO org.apache.hadoop.hdfs.DFSUtil: Starting Web-server for hdfs at: http://udp-cdh-node1:50070
2018-04-08 23:29:42,468 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2018-04-08 23:29:42,476 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2018-04-08 23:29:42,483 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.namenode is not defined
2018-04-08 23:29:42,495 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2018-04-08 23:29:42,499 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context hdfs
2018-04-08 23:29:42,499 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2018-04-08 23:29:42,499 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2018-04-08 23:29:42,716 INFO org.apache.hadoop.http.HttpServer2: Added filter 'org.apache.hadoop.hdfs.web.AuthFilter' (class=org.apache.hadoop.hdfs.web.AuthFilter)
2018-04-08 23:29:42,718 INFO org.apache.hadoop.http.HttpServer2: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
2018-04-08 23:29:42,734 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 50070
2018-04-08 23:29:42,734 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
2018-04-08 23:29:43,038 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@udp-cdh-node1:50070
2018-04-08 23:29:43,073 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
2018-04-08 23:29:43,073 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one namespace edits storage directory (dfs.namenode.edits.dir) configured. Beware of data loss due to lack of redundant storage directories!
2018-04-08 23:29:43,116 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Edit logging is async:true
2018-04-08 23:29:43,129 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: No KeyProvider found.
2018-04-08 23:29:43,139 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsLock is fair: true
2018-04-08 23:29:43,303 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
2018-04-08 23:29:43,303 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2018-04-08 23:29:43,304 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2018-04-08 23:29:43,305 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: The block deletion will start around 2018 Apr 08 23:29:43
2018-04-08 23:29:43,307 INFO org.apache.hadoop.util.GSet: Computing capacity for map BlocksMap
2018-04-08 23:29:43,307 INFO org.apache.hadoop.util.GSet: VM type       = 64-bit
2018-04-08 23:29:43,310 INFO org.apache.hadoop.util.GSet: 2.0% max memory 1.0 GB = 21.5 MB
2018-04-08 23:29:43,310 INFO org.apache.hadoop.util.GSet: capacity      = 2^21 = 2097152 entries
2018-04-08 23:29:43,317 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: dfs.block.access.token.enable=false
2018-04-08 23:29:43,319 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: defaultReplication         = 3
2018-04-08 23:29:43,319 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxReplication             = 512
2018-04-08 23:29:43,319 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: minReplication             = 1
2018-04-08 23:29:43,320 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxReplicationStreams      = 20
2018-04-08 23:29:43,320 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: replicationRecheckInterval = 3000
2018-04-08 23:29:43,320 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: encryptDataTransfer        = false
2018-04-08 23:29:43,320 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
2018-04-08 23:29:43,327 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner             = hdfs (auth:SIMPLE)
2018-04-08 23:29:43,327 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup          = supergroup
2018-04-08 23:29:43,327 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled = true
2018-04-08 23:29:43,327 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: HA Enabled: false
2018-04-08 23:29:43,329 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Append Enabled: true
2018-04-08 23:29:43,506 INFO org.apache.hadoop.util.GSet: Computing capacity for map INodeMap
2018-04-08 23:29:43,506 INFO org.apache.hadoop.util.GSet: VM type       = 64-bit
2018-04-08 23:29:43,506 INFO org.apache.hadoop.util.GSet: 1.0% max memory 1.0 GB = 10.7 MB
2018-04-08 23:29:43,507 INFO org.apache.hadoop.util.GSet: capacity      = 2^20 = 1048576 entries
2018-04-08 23:29:43,507 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: POSIX ACL inheritance enabled? false
2018-04-08 23:29:43,508 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2018-04-08 23:29:43,514 INFO org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true
2018-04-08 23:29:43,519 INFO org.apache.hadoop.util.GSet: Computing capacity for map cachedBlocks
2018-04-08 23:29:43,520 INFO org.apache.hadoop.util.GSet: VM type       = 64-bit
2018-04-08 23:29:43,520 INFO org.apache.hadoop.util.GSet: 0.25% max memory 1.0 GB = 2.7 MB
2018-04-08 23:29:43,520 INFO org.apache.hadoop.util.GSet: capacity      = 2^18 = 262144 entries
2018-04-08 23:29:43,523 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2018-04-08 23:29:43,523 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 1
2018-04-08 23:29:43,524 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
2018-04-08 23:29:43,527 INFO org.apache.hadoop.hdfs.server.namenode.top.metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2018-04-08 23:29:43,528 INFO org.apache.hadoop.hdfs.server.namenode.top.metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2018-04-08 23:29:43,528 INFO org.apache.hadoop.hdfs.server.namenode.top.metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2018-04-08 23:29:43,532 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Retry cache on namenode is enabled
2018-04-08 23:29:43,532 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2018-04-08 23:29:43,535 INFO org.apache.hadoop.util.GSet: Computing capacity for map NameNodeRetryCache
2018-04-08 23:29:43,535 INFO org.apache.hadoop.util.GSet: VM type       = 64-bit
2018-04-08 23:29:43,535 INFO org.apache.hadoop.util.GSet: 0.029999999329447746% max memory 1.0 GB = 330.2 KB
2018-04-08 23:29:43,535 INFO org.apache.hadoop.util.GSet: capacity      = 2^15 = 32768 entries
2018-04-08 23:29:43,538 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: ACLs enabled? false
2018-04-08 23:29:43,538 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: XAttrs enabled? true
2018-04-08 23:29:43,538 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Maximum size of an xattr: 16384
2018-04-08 23:29:43,548 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /mnt/dfs/nn/in_use.lock acquired by nodename 5705@udp-cdh-node1
2018-04-08 23:29:43,552 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: NameNode is not formatted.
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:232)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1150)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:797)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
2018-04-08 23:29:43,567 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@udp-cdh-node1:50070
2018-04-08 23:29:43,568 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2018-04-08 23:29:43,568 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2018-04-08 23:29:43,569 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2018-04-08 23:29:43,569 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: NameNode is not formatted.
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:232)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1150)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:797)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
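
For anyone triaging a similar log, here is a minimal Python sketch that pulls out the two details that matter: which storage directory the NameNode locked, and whether the "not formatted" exception was raised. The function name `summarize_namenode_log` and the regular expression are illustrative assumptions, not part of Hadoop or Cloudera Manager; the sample text is copied from the log above.

```python
import re

# Matches the storage-lock line and captures the NameNode storage directory.
LOCK_RE = re.compile(r"Lock on (\S+)/in_use\.lock acquired")
NOT_FORMATTED = "java.io.IOException: NameNode is not formatted."


def summarize_namenode_log(log_text):
    """Return (storage_dirs, not_formatted) extracted from NameNode log text."""
    storage_dirs = []
    for match in LOCK_RE.finditer(log_text):
        directory = match.group(1)
        if directory not in storage_dirs:
            storage_dirs.append(directory)
    return storage_dirs, NOT_FORMATTED in log_text


# Three lines taken from the log in this post.
SAMPLE_LOG = """\
2018-04-08 23:29:43,548 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /mnt/dfs/nn/in_use.lock acquired by nodename 5705@udp-cdh-node1
2018-04-08 23:29:43,552 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: NameNode is not formatted.
"""

if __name__ == "__main__":
    dirs, not_formatted = summarize_namenode_log(SAMPLE_LOG)
    print("storage dirs:", dirs)
    print("not formatted:", not_formatted)
```

In this log the only storage directory is /mnt/dfs/nn, so that is the path worth inspecting after a restart.
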
1 ACCEPTED SOLUTION

Explorer
The issue got resolved. It was an Azure volume mount problem: the NameNode (NN) directories pointed to the default location, and their contents were lost each time the instance was restarted.
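
A quick way to confirm this kind of problem is to check, right after a restart, whether the NameNode directory still holds its metadata and which mount it lives on. The Python sketch below is only an illustration under stated assumptions: the path /mnt/dfs/nn comes from the log in the question, the helper names `is_formatted` and `mount_point` are hypothetical, and the idea that /mnt is the Azure temporary disk (which is common on Ubuntu images, and whose contents are not guaranteed to survive a stop/start) is an assumption about this cluster rather than something stated in the thread.

```python
import os


def is_formatted(name_dir):
    """A formatted NameNode storage directory contains current/VERSION."""
    return os.path.isfile(os.path.join(name_dir, "current", "VERSION"))


def mount_point(path):
    """Walk up from path until a mount point is reached (at worst, '/')."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


if __name__ == "__main__":
    # Assumed path: the storage directory reported in the NameNode log.
    name_dir = "/mnt/dfs/nn"
    print("formatted:", is_formatted(name_dir))
    print("mounted under:", mount_point(name_dir))
```

If the directory turns out to sit on a temporary disk, pointing dfs.namenode.name.dir at a directory on a persistent data disk should keep the metadata across restarts.
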

3 REPLIES

Expert Contributor

Glad your problem is solved. Thanks for letting the community know the solution. Hopefully if someone else ends up in your situation they will find your answer helpful.

New Contributor

Hey arunvpy,

Can you briefly explain what you did to get rid of this problem?

Contributor

I had the same issue with Cloudera on Azure.

Quick steps to reproduce:

1) Shut down all the services via Cloudera Manager
2) Then shut down the VMs via the Altus portal

On startup, HDFS (the NameNode) and all dependent services began reporting errors.

The problem went away after I switched to the HA configuration via bootstrap-remote.