Support Questions
Find answers, ask questions, and share your expertise

How to troubleshoot slow NameNode start up?

How to troubleshoot slow NameNode start up?

Explorer

Hi,

I have a fairly small setup (HDP 2.6) with roughly 1429 blocks on a 15 TB HDD. The system has 512 GB RAM and 128 cores (256 threads).

Over last few days, I've seen the startup of entire HDP setup go from about 10 minutes to several hours. The culprit turned out to be the NameNode.When the box was first setup without any data, the entire HDP + HCP setup would startup in about 10 minutes (including data and name nodes). We start testing with large volumes of data and over time our block went over 23 million. At this point the system took around 3 hours to start. This was mostly due to NameNode startup time, which seems understandable given the large number of blocks.

However, even after deleting all the folders/blocks and leaving behind just 1429 blocks, the system is still taking over 50 minutes to start name node and come out of Safe Mode automatically.

 

The DataNode logs pause after the Replica Cache line below and then start displaying "Detected pause in JVM or host machine (eg GC)".

 

 

************************************************************/
2019-10-29 00:30:01,711 INFO  datanode.DataNode (LogAdapter.java:info(47)) - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   user = hdfs
STARTUP_MSG:   host = xxxx.corp.com/scrambled.private.ip.address
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.7.3.2.6.5.1100-53
STARTUP_MSG:   classpath = omitted for clarity
STARTUP_MSG:   build = git@github.com:hortonworks/hadoop.git -r 3091053c59a62c82d82c9f778c48bde5ef0a89a1; compiled by 'jenkins' on 2019-03-13T15:40Z
STARTUP_MSG:   java = 1.8.0_112
************************************************************/
2019-10-29 00:30:01,721 INFO  datanode.DataNode (LogAdapter.java:info(47)) - registered UNIX signal handlers for [TERM, HUP, INT]
2019-10-29 00:30:02,253 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(122)) - Scheduling a check for [DISK]file:/hadoop/hdfs/data/
2019-10-29 00:30:02,405 INFO  impl.MetricsConfig (MetricsConfig.java:loadFirst(112)) - loaded properties from hadoop-metrics2.properties
2019-10-29 00:30:03,985 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(82)) - Initializing Timeline metrics sink.
2019-10-29 00:30:03,987 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(102)) - Identified hostname = xxxx.corp.com, serviceName = datanode
2019-10-29 00:30:04,133 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(128)) - No suitable collector found.
2019-10-29 00:30:04,140 INFO  impl.MetricsSinkAdapter (MetricsSinkAdapter.java:start(206)) - Sink timeline started
2019-10-29 00:30:04,186 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:startTimer(376)) - Scheduled snapshot period at 10 second(s).
2019-10-29 00:30:04,186 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:start(192)) - DataNode metrics system started
2019-10-29 00:30:04,189 INFO  datanode.BlockScanner (BlockScanner.java:<init>(180)) - Initialized block scanner with targetBytesPerSec 1048576
2019-10-29 00:30:04,193 INFO  common.Util (Util.java:isDiskStatsEnabled(111)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2019-10-29 00:30:04,197 INFO  datanode.DataNode (DataNode.java:<init>(444)) - File descriptor passing is enabled.
2019-10-29 00:30:04,197 INFO  datanode.DataNode (DataNode.java:<init>(455)) - Configured hostname is xxxx.corp.com
2019-10-29 00:30:04,197 INFO  common.Util (Util.java:isDiskStatsEnabled(111)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2019-10-29 00:30:04,198 WARN  conf.Configuration (Configuration.java:getTimeDurationHelper(1659)) - No unit for dfs.datanode.outliers.report.interval(1800000) assuming MILLISECONDS
2019-10-29 00:30:04,204 INFO  datanode.DataNode (DataNode.java:startDataNode(1251)) - Starting DataNode with maxLockedMemory = 0
2019-10-29 00:30:04,221 INFO  datanode.DataNode (DataNode.java:initDataXceiver(1028)) - Opened streaming server at /0.0.0.0:50010
2019-10-29 00:30:04,223 INFO  datanode.DataNode (DataXceiverServer.java:<init>(78)) - Balancing bandwith is 6250000 bytes/s
2019-10-29 00:30:04,223 INFO  datanode.DataNode (DataXceiverServer.java:<init>(79)) - Number threads for balancing is 5
2019-10-29 00:30:04,225 INFO  datanode.DataNode (DataXceiverServer.java:<init>(78)) - Balancing bandwith is 6250000 bytes/s
2019-10-29 00:30:04,225 INFO  datanode.DataNode (DataXceiverServer.java:<init>(79)) - Number threads for balancing is 5
2019-10-29 00:30:04,226 INFO  datanode.DataNode (DataNode.java:initDataXceiver(1043)) - Listening on UNIX domain socket: /var/lib/hadoop-hdfs/dn_socket
2019-10-29 00:30:04,296 INFO  mortbay.log (Slf4jLog.java:info(67)) - Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2019-10-29 00:30:04,304 INFO  server.AuthenticationFilter (AuthenticationFilter.java:constructSecretProvider(296)) - Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2019-10-29 00:30:04,308 INFO  http.HttpRequestLog (HttpRequestLog.java:getRequestLog(80)) - Http request log for http.requests.datanode is not defined
2019-10-29 00:30:04,313 INFO  http.HttpServer2 (HttpServer2.java:addGlobalFilter(788)) - Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2019-10-29 00:30:04,315 INFO  http.HttpServer2 (HttpServer2.java:addFilter(763)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2019-10-29 00:30:04,316 INFO  http.HttpServer2 (HttpServer2.java:addFilter(771)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2019-10-29 00:30:04,316 INFO  http.HttpServer2 (HttpServer2.java:addFilter(771)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2019-10-29 00:30:04,316 INFO  security.HttpCrossOriginFilterInitializer (HttpCrossOriginFilterInitializer.java:initFilter(49)) - CORS filter not enabled. Please set hadoop.http.cross-origin.enabled to 'true' to enable it
2019-10-29 00:30:04,337 INFO  http.HttpServer2 (HttpServer2.java:bindListener(986)) - Jetty bound to port 43272
2019-10-29 00:30:04,338 INFO  mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26.hwx
2019-10-29 00:30:04,511 INFO  mortbay.log (Slf4jLog.java:info(67)) - Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:43272
2019-10-29 00:30:07,643 INFO  web.DatanodeHttpServer (DatanodeHttpServer.java:start(233)) - Listening HTTP traffic on /0.0.0.0:50075
2019-10-29 00:30:07,647 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(179)) - Starting JVM pause monitor
2019-10-29 00:30:08,366 INFO  datanode.DataNode (DataNode.java:startDataNode(1277)) - dnUserName = hdfs
2019-10-29 00:30:08,366 INFO  datanode.DataNode (DataNode.java:startDataNode(1278)) - supergroup = hdfs
2019-10-29 00:30:08,579 INFO  ipc.CallQueueManager (CallQueueManager.java:<init>(75)) - Using callQueue: class java.util.concurrent.LinkedBlockingQueue scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2019-10-29 00:30:08,734 INFO  ipc.Server (Server.java:run(821)) - Starting Socket Reader #1 for port 8010
2019-10-29 00:30:09,244 INFO  datanode.DataNode (DataNode.java:initIpcServer(941)) - Opened IPC server at /0.0.0.0:8010
2019-10-29 00:30:09,258 INFO  datanode.DataNode (BlockPoolManager.java:refreshNamenodes(152)) - Refresh request received for nameservices: null
2019-10-29 00:30:09,274 INFO  datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(201)) - Starting BPOfferServices for nameservices: <default>
2019-10-29 00:30:09,430 INFO  datanode.DataNode (BPServiceActor.java:run(761)) - Block pool <registering> (Datanode Uuid unassigned) service to xxxx.corp.com/scrambled.private.ip.address:8020 starting to offer service
2019-10-29 00:30:09,434 INFO  ipc.Server (Server.java:run(1064)) - IPC Server Responder: starting
2019-10-29 00:30:09,434 INFO  ipc.Server (Server.java:run(900)) - IPC Server listener on 8010: starting
2019-10-29 00:30:10,930 INFO  common.Storage (DataStorage.java:getParallelVolumeLoadThreadsNum(384)) - Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2019-10-29 00:30:10,962 INFO  common.Storage (Storage.java:tryLock(776)) - Lock on /hadoop/hdfs/data/in_use.lock acquired by nodename 210295@xxxx.corp.com
2019-10-29 00:30:11,121 INFO  common.Storage (BlockPoolSliceStorage.java:recoverTransitionRead(250)) - Analyzing storage directories for bpid BP-814497463-127.0.0.1-1558792659773
2019-10-29 00:30:11,121 INFO  common.Storage (Storage.java:lock(735)) - Locking is disabled for /hadoop/hdfs/data/current/BP-814497463-127.0.0.1-1558792659773
2019-10-29 00:30:11,139 INFO  datanode.DataNode (DataNode.java:initStorage(1546)) - Setting up storage: nsid=875919329;bpid=BP-814497463-127.0.0.1-1558792659773;lv=-56;nsInfo=lv=-63;cid=CID-49b9105e-fc0d-4ea4-9d2f-caceb95ce4bb;nsid=875919329;c=0;bpid=BP-814497463-127.0.0.1-1558792659773;dnuuid=0aff4a22-3f1a-485b-9aec-46fd881dfab0
2019-10-29 00:30:11,523 INFO  impl.FsDatasetImpl (FsVolumeList.java:addVolume(295)) - Added new volume: DS-ea7ed3be-90ad-4424-a00c-577601814d81
2019-10-29 00:30:11,523 INFO  impl.FsDatasetImpl (FsDatasetImpl.java:addVolume(426)) - Added volume - /hadoop/hdfs/data/current, StorageType: DISK
2019-10-29 00:30:11,527 INFO  impl.FsDatasetImpl (FsDatasetImpl.java:registerMBean(2203)) - Registered FSDatasetState MBean
2019-10-29 00:30:11,711 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(122)) - Scheduling a check for /hadoop/hdfs/data/current
2019-10-29 00:30:11,719 INFO  checker.DatasetVolumeChecker (DatasetVolumeChecker.java:checkAllVolumes(210)) - Scheduled health check for volume /hadoop/hdfs/data/current
2019-10-29 00:30:11,721 INFO  impl.FsDatasetImpl (FsDatasetImpl.java:addBlockPool(2686)) - Adding block pool BP-814497463-127.0.0.1-1558792659773
2019-10-29 00:30:11,722 INFO  impl.FsDatasetImpl (FsVolumeList.java:run(392)) - Scanning block pool BP-814497463-127.0.0.1-1558792659773 on volume /hadoop/hdfs/data/current...
2019-10-29 00:30:11,898 INFO  impl.FsDatasetImpl (BlockPoolSlice.java:loadDfsUsed(251)) - Cached dfsUsed found for /hadoop/hdfs/data/current/BP-814497463-127.0.0.1-1558792659773/current: 414855315456
2019-10-29 00:30:11,901 INFO  impl.FsDatasetImpl (FsVolumeList.java:run(397)) - Time taken to scan block pool BP-814497463-127.0.0.1-1558792659773 on /hadoop/hdfs/data/current: 178ms
2019-10-29 00:30:11,901 INFO  impl.FsDatasetImpl (FsVolumeList.java:addBlockPool(423)) - Total time to scan all replicas for block pool BP-814497463-127.0.0.1-1558792659773: 180ms
2019-10-29 00:30:11,906 INFO  impl.FsDatasetImpl (FsVolumeList.java:run(188)) - Adding replicas to map for block pool BP-814497463-127.0.0.1-1558792659773 on volume /hadoop/hdfs/data/current...
2019-10-29 00:30:11,906 INFO  impl.BlockPoolSlice 
(BlockPoolSlice.java:readReplicasFromCache(738)) - Replica Cache file: /hadoop/hdfs/data/current/BP-814497463-127.0.0.1-1558792659773/current/replicas doesn't exist
/** This is where I suspect the issue is **/
2019-10-29 00:31:24,607 INFO  timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:33:54,200 INFO  timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:37:24,712 INFO  timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:41:24,844 INFO  timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:45:14,801 INFO  timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:49:04,874 INFO  timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.

/** This is followed by several Detected pause in JVM or host machine (eg GC): pause of approximately 1068ms statements **/

 

 

The corresponding NameNode log shows the following and keeps repeating "The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428."

 

 

 

************************************************************/
2019-10-29 00:30:20,165 INFO  namenode.NameNode (LogAdapter.java:info(47)) - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   user = hdfs
STARTUP_MSG:   host = xxxx.corp.com/scrambled.private.ip.address
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.7.3.2.6.5.1100-53
STARTUP_MSG:   classpath = removed for brevity
STARTUP_MSG:   build = git@github.com:hortonworks/hadoop.git -r 3091053c59a62c82d82c9f778c48bde5ef0a89a1; compiled by 'jenkins' on 2019-03-13T15:40Z
STARTUP_MSG:   java = 1.8.0_112
************************************************************/
2019-10-29 00:30:20,173 INFO  namenode.NameNode (LogAdapter.java:info(47)) - registered UNIX signal handlers for [TERM, HUP, INT]
2019-10-29 00:30:20,176 INFO  namenode.NameNode (NameNode.java:createNameNode(1624)) - createNameNode []
2019-10-29 00:30:20,370 INFO  impl.MetricsConfig (MetricsConfig.java:loadFirst(112)) - loaded properties from hadoop-metrics2.properties
2019-10-29 00:30:20,567 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(82)) - Initializing Timeline metrics sink.
2019-10-29 00:30:20,568 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(102)) - Identified hostname = xxxx.corp.com, serviceName = namenode
2019-10-29 00:30:20,682 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(128)) - No suitable collector found.
2019-10-29 00:30:20,685 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(180)) - RPC port properties configured: {8020=client}
2019-10-29 00:30:20,690 INFO  impl.MetricsSinkAdapter (MetricsSinkAdapter.java:start(206)) - Sink timeline started
2019-10-29 00:30:20,743 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:startTimer(376)) - Scheduled snapshot period at 10 second(s).
2019-10-29 00:30:20,743 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:start(192)) - NameNode metrics system started
2019-10-29 00:30:20,747 INFO  namenode.NameNode (NameNode.java:setClientNamenodeAddress(450)) - fs.defaultFS is hdfs://xxxx.corp.com:8020
2019-10-29 00:30:20,748 INFO  namenode.NameNode (NameNode.java:setClientNamenodeAddress(470)) - Clients are to use xxxx.corp.com:8020 to access this namenode/service.
2019-10-29 00:30:20,866 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(179)) - Starting JVM pause monitor
2019-10-29 00:30:20,874 INFO  hdfs.DFSUtil (DFSUtil.java:httpServerTemplateForNNAndJN(1803)) - Starting Web-server for hdfs at: <a href="http://xxxx.corp.com:50070" target="_blank">http://xxxx.corp.com:50070</a>
2019-10-29 00:30:20,913 INFO  mortbay.log (Slf4jLog.java:info(67)) - Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2019-10-29 00:30:20,923 INFO  server.AuthenticationFilter (AuthenticationFilter.java:constructSecretProvider(296)) - Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2019-10-29 00:30:20,927 INFO  http.HttpRequestLog (HttpRequestLog.java:getRequestLog(80)) - Http request log for http.requests.namenode is not defined
2019-10-29 00:30:20,931 INFO  http.HttpServer2 (HttpServer2.java:addGlobalFilter(788)) - Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2019-10-29 00:30:20,933 INFO  http.HttpServer2 (HttpServer2.java:addFilter(763)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context hdfs
2019-10-29 00:30:20,933 INFO  http.HttpServer2 (HttpServer2.java:addFilter(771)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2019-10-29 00:30:20,933 INFO  http.HttpServer2 (HttpServer2.java:addFilter(771)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2019-10-29 00:30:20,934 INFO  security.HttpCrossOriginFilterInitializer (HttpCrossOriginFilterInitializer.java:initFilter(49)) - CORS filter not enabled. Please set hadoop.http.cross-origin.enabled to 'true' to enable it
2019-10-29 00:30:20,953 INFO  http.HttpServer2 (NameNodeHttpServer.java:initWebHdfs(93)) - Added filter 'org.apache.hadoop.hdfs.web.AuthFilter' (class=org.apache.hadoop.hdfs.web.AuthFilter)
2019-10-29 00:30:20,954 INFO  http.HttpServer2 (HttpServer2.java:addJerseyResourcePackage(687)) - addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
2019-10-29 00:30:20,961 INFO  http.HttpServer2 (HttpServer2.java:bindListener(986)) - Jetty bound to port 50070
2019-10-29 00:30:20,962 INFO  mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26.hwx
2019-10-29 00:30:20,986 WARN  mortbay.log (Slf4jLog.java:warn(76)) - Can't reuse /tmp/Jetty_xxxx_corp_com_50070_hdfs____ggu70m, using /tmp/Jetty_xxxx_corp_com_50070_hdfs____ggu70m_2845921744604868870
2019-10-29 00:30:21,121 INFO  mortbay.log (Slf4jLog.java:info(67)) - Started HttpServer2$SelectChannelConnectorWithSafeStartup@xxxx.corp.com:50070
2019-10-29 00:30:21,143 WARN  common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,143 WARN  common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,143 WARN  namenode.FSNamesystem (FSNamesystem.java:checkConfiguration(690)) - Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
2019-10-29 00:30:21,143 WARN  namenode.FSNamesystem (FSNamesystem.java:checkConfiguration(695)) - Only one namespace edits storage directory (dfs.namenode.edits.dir) configured. Beware of data loss due to lack of redundant storage directories!
2019-10-29 00:30:21,148 WARN  common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,148 WARN  common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,153 WARN  common.Storage (NNStorage.java:setRestoreFailedStorage(208)) - set restore failed storage to true
2019-10-29 00:30:21,172 INFO  namenode.FSEditLog (FSEditLog.java:newInstance(225)) - Edit logging is async:false
2019-10-29 00:30:21,176 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(759)) - No KeyProvider found.
2019-10-29 00:30:21,176 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(765)) - Enabling async auditlog
2019-10-29 00:30:21,178 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(769)) - fsLock is fair:false
2019-10-29 00:30:21,204 INFO  blockmanagement.HeartbeatManager (HeartbeatManager.java:<init>(91)) - Setting heartbeat recheck interval to 30000 since dfs.namenode.stale.datanode.interval is less than dfs.namenode.heartbeat.recheck-interval
2019-10-29 00:30:21,207 INFO  common.Util (Util.java:isDiskStatsEnabled(111)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2019-10-29 00:30:21,214 INFO  blockmanagement.DatanodeManager (DatanodeManager.java:<init>(274)) - dfs.block.invalidate.limit=1000
2019-10-29 00:30:21,214 INFO  blockmanagement.DatanodeManager (DatanodeManager.java:<init>(280)) - dfs.namenode.datanode.registration.ip-hostname-check=true
2019-10-29 00:30:21,215 INFO  blockmanagement.BlockManager (InvalidateBlocks.java:printBlockDeletionTime(71)) - dfs.namenode.startup.delay.block.deletion.sec is set to 000:01:00:00.000
2019-10-29 00:30:21,215 INFO  blockmanagement.BlockManager (InvalidateBlocks.java:printBlockDeletionTime(76)) - The block deletion will start around 2019 Oct 29 01:30:21
2019-10-29 00:30:21,217 INFO  util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map BlocksMap
2019-10-29 00:30:21,217 INFO  util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type       = 64-bit
2019-10-29 00:30:21,218 INFO  util.GSet (LightWeightGSet.java:computeCapacity(397)) - 2.0% max memory 1011.3 MB = 20.2 MB
2019-10-29 00:30:21,218 INFO  util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity      = 2^21 = 2097152 entries
2019-10-29 00:30:21,231 INFO  blockmanagement.BlockManager (BlockManager.java:createBlockTokenSecretManager(409)) - dfs.block.access.token.enable=true
2019-10-29 00:30:21,231 INFO  blockmanagement.BlockManager (BlockManager.java:createBlockTokenSecretManager(430)) - dfs.block.access.key.update.interval=600 min(s), dfs.block.access.token.lifetime=600 min(s), dfs.encrypt.data.transfer.algorithm=null
2019-10-29 00:30:21,354 INFO  blockmanagement.BlockManager (BlockManager.java:<init>(395)) - defaultReplication         = 1
2019-10-29 00:30:21,354 INFO  blockmanagement.BlockManager (BlockManager.java:<init>(396)) - maxReplication             = 50
2019-10-29 00:30:21,354 INFO  blockmanagement.BlockManager (BlockManager.java:<init>(397)) - minReplication             = 1
2019-10-29 00:30:21,354 INFO  blockmanagement.BlockManager (BlockManager.java:<init>(398)) - maxReplicationStreams      = 2
2019-10-29 00:30:21,354 INFO  blockmanagement.BlockManager (BlockManager.java:<init>(399)) - replicationRecheckInterval = 3000
2019-10-29 00:30:21,354 INFO  blockmanagement.BlockManager (BlockManager.java:<init>(400)) - encryptDataTransfer        = false
2019-10-29 00:30:21,354 INFO  blockmanagement.BlockManager (BlockManager.java:<init>(401)) - maxNumBlocksToLog          = 1000
2019-10-29 00:30:21,360 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(790)) - fsOwner             = hdfs (auth:SIMPLE)
2019-10-29 00:30:21,360 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(791)) - supergroup          = hdfs
2019-10-29 00:30:21,360 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(792)) - isPermissionEnabled = true
2019-10-29 00:30:21,360 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(803)) - HA Enabled: false
2019-10-29 00:30:21,361 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(843)) - Append Enabled: true
2019-10-29 00:30:21,388 INFO  util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map INodeMap
2019-10-29 00:30:21,388 INFO  util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type       = 64-bit
2019-10-29 00:30:21,388 INFO  util.GSet (LightWeightGSet.java:computeCapacity(397)) - 1.0% max memory 1011.3 MB = 10.1 MB
2019-10-29 00:30:21,389 INFO  util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity      = 2^20 = 1048576 entries
2019-10-29 00:30:21,393 INFO  namenode.FSDirectory (FSDirectory.java:<init>(256)) - ACLs enabled? false
2019-10-29 00:30:21,393 INFO  namenode.FSDirectory (FSDirectory.java:<init>(260)) - XAttrs enabled? true
2019-10-29 00:30:21,393 INFO  namenode.FSDirectory (FSDirectory.java:<init>(268)) - Maximum size of an xattr: 16384
2019-10-29 00:30:21,393 INFO  namenode.NameNode (FSDirectory.java:<init>(321)) - Caching file names occuring more than 10 times
2019-10-29 00:30:21,399 INFO  util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map cachedBlocks
2019-10-29 00:30:21,399 INFO  util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type       = 64-bit
2019-10-29 00:30:21,400 INFO  util.GSet (LightWeightGSet.java:computeCapacity(397)) - 0.25% max memory 1011.3 MB = 2.5 MB
2019-10-29 00:30:21,400 INFO  util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity      = 2^18 = 262144 entries
2019-10-29 00:30:21,402 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(5582)) - dfs.namenode.safemode.threshold-pct = 1.0
2019-10-29 00:30:21,402 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(5583)) - dfs.namenode.safemode.min.datanodes = 0
2019-10-29 00:30:21,402 INFO  namenode.FSNamesystem (FSNamesystem.java:<init>(5584)) - dfs.namenode.safemode.extension     = 30000
2019-10-29 00:30:21,405 INFO  metrics.TopMetrics (TopMetrics.java:logConf(76)) - NNTop conf: dfs.namenode.top.window.num.buckets = 10
2019-10-29 00:30:21,405 INFO  metrics.TopMetrics (TopMetrics.java:logConf(78)) - NNTop conf: dfs.namenode.top.num.users = 10
2019-10-29 00:30:21,405 INFO  metrics.TopMetrics (TopMetrics.java:logConf(80)) - NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2019-10-29 00:30:21,408 INFO  namenode.FSNamesystem (FSNamesystem.java:initRetryCache(971)) - Retry cache on namenode is enabled
2019-10-29 00:30:21,408 INFO  namenode.FSNamesystem (FSNamesystem.java:initRetryCache(979)) - Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2019-10-29 00:30:21,410 INFO  util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map NameNodeRetryCache
2019-10-29 00:30:21,411 INFO  util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type       = 64-bit
2019-10-29 00:30:21,411 INFO  util.GSet (LightWeightGSet.java:computeCapacity(397)) - 0.029999999329447746% max memory 1011.3 MB = 310.7 KB
2019-10-29 00:30:21,411 INFO  util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity      = 2^15 = 32768 entries
2019-10-29 00:30:21,456 INFO  common.Storage (Storage.java:tryLock(776)) - Lock on /home/hadoop/hdfs/namenode/in_use.lock acquired by nodename 211070@xxxx.corp.com
2019-10-29 00:30:21,503 INFO  namenode.FileJournalManager (FileJournalManager.java:recoverUnfinalizedSegments(388)) - Recovering unfinalized segments in /home/hadoop/hdfs/namenode/current
2019-10-29 00:30:21,527 INFO  namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file /home/hadoop/hdfs/namenode/current/edits_inprogress_0000000000199282266 -> /home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266
2019-10-29 00:30:21,532 INFO  namenode.FSImage (FSImage.java:loadFSImageFile(745)) - Planning to load image: FSImageFile(file=/home/hadoop/hdfs/namenode/current/fsimage_0000000000199282232, cpktTxId=0000000000199282232)
2019-10-29 00:30:21,562 INFO  namenode.FSImageFormatPBINode (FSImageFormatPBINode.java:loadINodeSection(257)) - Loading 1993 INodes.
2019-10-29 00:30:21,724 INFO  namenode.FSImageFormatProtobuf (FSImageFormatProtobuf.java:load(184)) - Loaded FSImage in 0 seconds.
2019-10-29 00:30:21,725 INFO  namenode.FSImage (FSImage.java:loadFSImage(911)) - Loaded image for txid 199282232 from /home/hadoop/hdfs/namenode/current/fsimage_0000000000199282232
2019-10-29 00:30:21,725 INFO  namenode.FSImage (FSImage.java:loadEdits(849)) - Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@63fd4873 expecting start txid #199282233
2019-10-29 00:30:21,726 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(142)) - Start loading edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282233-0000000000199282265
2019-10-29 00:30:21,729 INFO  namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream '/home/hadoop/hdfs/namenode/current/edits_0000000000199282233-0000000000199282265' to transaction ID 199282233
2019-10-29 00:30:21,752 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(145)) - Edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282233-0000000000199282265 of size 1048576 edits # 33 loaded in 0 seconds
2019-10-29 00:30:21,752 INFO  namenode.FSImage (FSImage.java:loadEdits(849)) - Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@1e11bc55 expecting start txid #199282266
2019-10-29 00:30:21,752 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(142)) - Start loading edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266
2019-10-29 00:30:21,752 INFO  namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream '/home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266' to transaction ID 199282233
2019-10-29 00:30:21,753 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(145)) - Edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266 of size 1048576 edits # 1 loaded in 0 seconds
2019-10-29 00:30:21,753 INFO  namenode.FSNamesystem (FSNamesystem.java:loadFSImage(1083)) - Need to save fs image? false (staleImage=false, haEnabled=false, isRollingUpgrade=false)
2019-10-29 00:30:21,754 INFO  namenode.FSEditLog (FSEditLog.java:startLogSegment(1294)) - Starting log segment at 199282267
2019-10-29 00:30:21,880 INFO  namenode.NameCache (NameCache.java:initialized(143)) - initialized with 8 entries 214 lookups
2019-10-29 00:30:21,881 INFO  namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(731)) - Finished loading FSImage in 465 msecs
2019-10-29 00:30:22,002 INFO  namenode.NameNode (NameNodeRpcServer.java:<init>(428)) - RPC server is binding to xxxx.corp.com:8020
2019-10-29 00:30:22,007 INFO  ipc.CallQueueManager (CallQueueManager.java:<init>(75)) - Using callQueue: class java.util.concurrent.LinkedBlockingQueue scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2019-10-29 00:30:22,015 INFO  ipc.Server (Server.java:run(821)) - Starting Socket Reader #1 for port 8020
2019-10-29 00:30:22,049 INFO  namenode.FSNamesystem (FSNamesystem.java:registerMBean(6517)) - Registered FSNamesystemState MBean
2019-10-29 00:30:22,050 WARN  common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:22,064 INFO  namenode.LeaseManager (LeaseManager.java:getNumUnderConstructionBlocks(139)) - Number of blocks under construction: 0
2019-10-29 00:30:22,065 INFO  hdfs.StateChange (FSNamesystem.java:reportStatus(5952)) - STATE* Safe mode ON.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:30:22,075 INFO  blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateHeartbeatState(401)) - Number of failed storage changes from 0 to 0
2019-10-29 00:30:22,077 INFO  block.BlockTokenSecretManager (BlockTokenSecretManager.java:updateKeys(222)) - Updating block keys
2019-10-29 00:30:22,095 INFO  ipc.Server (Server.java:run(1064)) - IPC Server Responder: starting
2019-10-29 00:30:22,095 INFO  ipc.Server (Server.java:run(900)) - IPC Server listener on 8020: starting
2019-10-29 00:30:22,115 INFO  namenode.NameNode (NameNode.java:startCommonServices(885)) - NameNode RPC up at: xxxx.corp.com/scrambled.private.ip.address:8020
2019-10-29 00:30:22,116 INFO  namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1191)) - Starting services required for active state
2019-10-29 00:30:22,116 INFO  namenode.FSDirectory (FSDirectory.java:updateCountForQuota(708)) - Initializing quota with 4 thread(s)
2019-10-29 00:30:22,127 INFO  namenode.FSDirectory (FSDirectory.java:updateCountForQuota(717)) - Quota initialization completed in 11 milliseconds
name space=1995
storage space=6473571992
storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0
2019-10-29 00:30:22,131 INFO  blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(161)) - Starting CacheReplicationMonitor with interval 30000 milliseconds
2019-10-29 00:30:22,525 INFO  fs.TrashPolicyDefault (TrashPolicyDefault.java:<init>(228)) - The configured checkpoint interval is 0 minutes. Using an interval of 60 minutes that is used for deletion instead
2019-10-29 00:31:40,751 INFO  timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:31:52,817 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 29 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:55080 Call#143 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/anonymous/393adcb5-cd66-43a7-ab38-e759f5daf88e. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:31:52,823 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 24 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from scrambled.private.ip.address:55080 Call#144 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hive/anonymous/393adcb5-cd66-43a7-ab38-e759f5daf88e. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:31:52,921 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 44 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:55084 Call#149 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/anonymous/4b583a41-b3c9-4a6a-b8ee-afe8411c3861. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:31:52,926 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 68 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from scrambled.private.ip.address:55084 Call#150 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hive/anonymous/4b583a41-b3c9-4a6a-b8ee-afe8411c3861. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:31:54,849 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 38 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:55088 Call#4 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/ambari-qa/c35cf654-0bcf-48ff-8873-0d1447404f8c. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:33:00,753 INFO  timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:34:10,752 INFO  timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:34:52,597 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 24 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:56002 Call#155 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/anonymous/5ba0c62f-b60b-4ecb-be71-42137e023acd. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:34:52,601 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 29 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from scrambled.private.ip.address:56002 Call#156 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hive/anonymous/5ba0c62f-b60b-4ecb-be71-42137e023acd. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:34:52,693 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 68 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:56006 Call#161 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/anonymous/2de6c6f0-f9dc-4702-a350-867f384a3e9f. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:34:52,697 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 18 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from scrambled.private.ip.address:56006 Call#162 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hive/anonymous/2de6c6f0-f9dc-4702-a350-867f384a3e9f. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:34:54,626 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 4 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:56010 Call#4 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/ambari-qa/e051ac6d-5230-464a-a5fa-b3bb58d035f7. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:36:30,748 INFO  timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:37:50,753 INFO  timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:37:52,617 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 29 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:56826 Call#167 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/anonymous/cc6ebafd-18c8-46ea-a6d7-62397735efb0. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:37:52,622 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 49 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from scrambled.private.ip.address:56826 Call#168 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hive/anonymous/cc6ebafd-18c8-46ea-a6d7-62397735efb0. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:37:52,708 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 40 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:56830 Call#173 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/anonymous/6552fb22-a79a-47da-989a-87c029dad677. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:37:52,711 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 25 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from scrambled.private.ip.address:56830 Call#174 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hive/anonymous/6552fb22-a79a-47da-989a-87c029dad677. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:37:54,749 INFO  ipc.Server (Server.java:logException(2428)) - IPC Server handler 59 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:56834 Call#4 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/ambari-qa/1365213f-4766-468a-8a1f-9d5100cc5bad. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.

 

What exactly is going on and how do I go about troubleshooting this? I tried increasing the HeapSize for both NameNode and DataNode as well. The GC message from DataNode disappear but I still see them in NameNode logs when it reads iNODES.

 

Any help will be greatly appreciated.

5 REPLIES 5
Highlighted

Re: How to troubleshoot slow NameNode start up?

Mentor

@Elephanta 

The name node start is usually faster than what you are experiencing frm the logs I am seeing maybe the root cause. During startup the namenode read the fsimage which is the last best cluster images usually a combination of fsimage and the last edits log.  You will need to combine the current fsimage and the edits log in the below steps. 

 

 

Name node is in safe mode.

 

 

When the name node is in this state it's a security future that disables any change that the namenode metadata that it can't register,/record to note all the cluster wise location, state, permissions,ownership etc are stored in the metadata which is stored in the name node. You will need to do the following steps as root user while the cluster is freezing during startup.

# su - hdfs

Get the current state

$ hdfs hdfsadmin -safemode get 

That will confirm that the namenode is in safe mode, you will need to force the creation of a hdfs image point in time by saving the namespace to create a new fsimage
$ hdfs dfsadmin -saveNamespace

confirm safemode is off

$ hdfs hdfsadmin -safemode get 

This time the output should be Off  leave safe mode and next time the startup should be much faster
$ hdfs hdfsadmin -safemode get
You might need to tune the memory allocated to the name node it seems the numebr of files to manage has increased hence the need to reconfigure the memory  have a look at the below link for guidance..
Configuring NameNode Heap Size to estimate the memory required for the namenode 

Highlighted

Re: How to troubleshoot slow NameNode start up?

Explorer

@SheltonThank you for your response. Much appreciated. Thank you for the name node heap tuning link as well. I am busy checking it out and will post an updated.

 

During my investigation yesterday, I also found out there were millions of pending deletes. As I mentioned, I deleted about 23 million blocks down to about 1400. The JMX console says there still are about 11 million blocks still waiting to be deleted - I'm assuming the blockreport would also account for these until they are actually deleted? Could that be a reason why its still slow?

Its deleting about 600 nodes/second which seems okay, but I guess it will be a while until everything is deleted. Is this normal?

 

 

Highlighted

Re: How to troubleshoot slow NameNode start up?

Mentor

@Elephanta 

Any updates?

Why the high number of block deletes are those under replicated blocks?  There is many parameters to evaluate like the number of data nodes, replication factor, and rack awareness!

I would be happy to get your feedback 

Highlighted

Re: How to troubleshoot slow NameNode start up?

Explorer

Yes, it took enormously long to get rid of those deleted block and it only finished a few hours ago.

No..those blocks were essentially a massive folder (almost 1 TB) that was deleted by someone. Its slow because its just a single node and a single machine setup. Basically 1 name node, 1 secondary name node, 1 data node all all on the same box (HDP 2.6 + HCP).

 

Are there any tuning/diagnostic guides available with regards to the parameters that you mentioned?

 

Thank you for your help and assistance so far.

 

 

Highlighted

Re: How to troubleshoot slow NameNode start up?

Mentor

@Elephanta 

 

Okay, now with that information I get a better understanding and picture. By default, HDP 2.6 has a replication factor of 3  so it's looking to place the other 2 copies on different data nodes that the existing unless you create new files with a replication factor of 1 you will continue to get the unreplicated block errors but now that you know it's manageable.

Maybe next time you delete files in hdfs use the -skipTrash option

hdfs dfs -rm -skipTrash /path/to/hdfs/file/to/remove/permanently

or emtpyting the existing .Trash

Options:

Change the replication factor of a file  at creatio:

hdfs dfs –setrep –w 1  /user/hdfs/file.txt

 

Or  change the replication factor of a directory 

hdfs dfs -setrep -R 1  /user/hdfs/your_dir


Changing the replication factor for a directory will only affect the existing files and the new files under the directory will get created with the default replication factor  in dfs.replication from hdfs-site.xml  maybe in your case this is what you should change to 1 as this takes effect cluster-wide for your Dev environment
Happy hadooping