Created on 10-28-2019 04:15 PM - last edited on 10-28-2019 08:49 PM by ask_bill_brooks
Hi,
I have a fairly small setup (HDP 2.6) with roughly 1429 blocks on a 15 TB HDD. The system has 512 GB RAM and 128 cores (256 threads).
Over last few days, I've seen the startup of entire HDP setup go from about 10 minutes to several hours. The culprit turned out to be the NameNode.When the box was first setup without any data, the entire HDP + HCP setup would startup in about 10 minutes (including data and name nodes). We start testing with large volumes of data and over time our block went over 23 million. At this point the system took around 3 hours to start. This was mostly due to NameNode startup time, which seems understandable given the large number of blocks.
However, even after deleting all the folders/blocks and leaving behind just 1429 blocks, the system is still taking over 50 minutes to start name node and come out of Safe Mode automatically.
The DataNode logs pause after the Replica Cache line below and then start displaying "Detected pause in JVM or host machine (eg GC)".
************************************************************/
2019-10-29 00:30:01,711 INFO datanode.DataNode (LogAdapter.java:info(47)) - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: user = hdfs
STARTUP_MSG: host = xxxx.corp.com/scrambled.private.ip.address
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.7.3.2.6.5.1100-53
STARTUP_MSG: classpath = omitted for clarity
STARTUP_MSG: build = git@github.com:hortonworks/hadoop.git -r 3091053c59a62c82d82c9f778c48bde5ef0a89a1; compiled by 'jenkins' on 2019-03-13T15:40Z
STARTUP_MSG: java = 1.8.0_112
************************************************************/
2019-10-29 00:30:01,721 INFO datanode.DataNode (LogAdapter.java:info(47)) - registered UNIX signal handlers for [TERM, HUP, INT]
2019-10-29 00:30:02,253 INFO checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(122)) - Scheduling a check for [DISK]file:/hadoop/hdfs/data/
2019-10-29 00:30:02,405 INFO impl.MetricsConfig (MetricsConfig.java:loadFirst(112)) - loaded properties from hadoop-metrics2.properties
2019-10-29 00:30:03,985 INFO timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(82)) - Initializing Timeline metrics sink.
2019-10-29 00:30:03,987 INFO timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(102)) - Identified hostname = xxxx.corp.com, serviceName = datanode
2019-10-29 00:30:04,133 INFO timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(128)) - No suitable collector found.
2019-10-29 00:30:04,140 INFO impl.MetricsSinkAdapter (MetricsSinkAdapter.java:start(206)) - Sink timeline started
2019-10-29 00:30:04,186 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:startTimer(376)) - Scheduled snapshot period at 10 second(s).
2019-10-29 00:30:04,186 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:start(192)) - DataNode metrics system started
2019-10-29 00:30:04,189 INFO datanode.BlockScanner (BlockScanner.java:<init>(180)) - Initialized block scanner with targetBytesPerSec 1048576
2019-10-29 00:30:04,193 INFO common.Util (Util.java:isDiskStatsEnabled(111)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2019-10-29 00:30:04,197 INFO datanode.DataNode (DataNode.java:<init>(444)) - File descriptor passing is enabled.
2019-10-29 00:30:04,197 INFO datanode.DataNode (DataNode.java:<init>(455)) - Configured hostname is xxxx.corp.com
2019-10-29 00:30:04,197 INFO common.Util (Util.java:isDiskStatsEnabled(111)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2019-10-29 00:30:04,198 WARN conf.Configuration (Configuration.java:getTimeDurationHelper(1659)) - No unit for dfs.datanode.outliers.report.interval(1800000) assuming MILLISECONDS
2019-10-29 00:30:04,204 INFO datanode.DataNode (DataNode.java:startDataNode(1251)) - Starting DataNode with maxLockedMemory = 0
2019-10-29 00:30:04,221 INFO datanode.DataNode (DataNode.java:initDataXceiver(1028)) - Opened streaming server at /0.0.0.0:50010
2019-10-29 00:30:04,223 INFO datanode.DataNode (DataXceiverServer.java:<init>(78)) - Balancing bandwith is 6250000 bytes/s
2019-10-29 00:30:04,223 INFO datanode.DataNode (DataXceiverServer.java:<init>(79)) - Number threads for balancing is 5
2019-10-29 00:30:04,225 INFO datanode.DataNode (DataXceiverServer.java:<init>(78)) - Balancing bandwith is 6250000 bytes/s
2019-10-29 00:30:04,225 INFO datanode.DataNode (DataXceiverServer.java:<init>(79)) - Number threads for balancing is 5
2019-10-29 00:30:04,226 INFO datanode.DataNode (DataNode.java:initDataXceiver(1043)) - Listening on UNIX domain socket: /var/lib/hadoop-hdfs/dn_socket
2019-10-29 00:30:04,296 INFO mortbay.log (Slf4jLog.java:info(67)) - Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2019-10-29 00:30:04,304 INFO server.AuthenticationFilter (AuthenticationFilter.java:constructSecretProvider(296)) - Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2019-10-29 00:30:04,308 INFO http.HttpRequestLog (HttpRequestLog.java:getRequestLog(80)) - Http request log for http.requests.datanode is not defined
2019-10-29 00:30:04,313 INFO http.HttpServer2 (HttpServer2.java:addGlobalFilter(788)) - Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2019-10-29 00:30:04,315 INFO http.HttpServer2 (HttpServer2.java:addFilter(763)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2019-10-29 00:30:04,316 INFO http.HttpServer2 (HttpServer2.java:addFilter(771)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2019-10-29 00:30:04,316 INFO http.HttpServer2 (HttpServer2.java:addFilter(771)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2019-10-29 00:30:04,316 INFO security.HttpCrossOriginFilterInitializer (HttpCrossOriginFilterInitializer.java:initFilter(49)) - CORS filter not enabled. Please set hadoop.http.cross-origin.enabled to 'true' to enable it
2019-10-29 00:30:04,337 INFO http.HttpServer2 (HttpServer2.java:bindListener(986)) - Jetty bound to port 43272
2019-10-29 00:30:04,338 INFO mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26.hwx
2019-10-29 00:30:04,511 INFO mortbay.log (Slf4jLog.java:info(67)) - Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:43272
2019-10-29 00:30:07,643 INFO web.DatanodeHttpServer (DatanodeHttpServer.java:start(233)) - Listening HTTP traffic on /0.0.0.0:50075
2019-10-29 00:30:07,647 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(179)) - Starting JVM pause monitor
2019-10-29 00:30:08,366 INFO datanode.DataNode (DataNode.java:startDataNode(1277)) - dnUserName = hdfs
2019-10-29 00:30:08,366 INFO datanode.DataNode (DataNode.java:startDataNode(1278)) - supergroup = hdfs
2019-10-29 00:30:08,579 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(75)) - Using callQueue: class java.util.concurrent.LinkedBlockingQueue scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2019-10-29 00:30:08,734 INFO ipc.Server (Server.java:run(821)) - Starting Socket Reader #1 for port 8010
2019-10-29 00:30:09,244 INFO datanode.DataNode (DataNode.java:initIpcServer(941)) - Opened IPC server at /0.0.0.0:8010
2019-10-29 00:30:09,258 INFO datanode.DataNode (BlockPoolManager.java:refreshNamenodes(152)) - Refresh request received for nameservices: null
2019-10-29 00:30:09,274 INFO datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(201)) - Starting BPOfferServices for nameservices: <default>
2019-10-29 00:30:09,430 INFO datanode.DataNode (BPServiceActor.java:run(761)) - Block pool <registering> (Datanode Uuid unassigned) service to xxxx.corp.com/scrambled.private.ip.address:8020 starting to offer service
2019-10-29 00:30:09,434 INFO ipc.Server (Server.java:run(1064)) - IPC Server Responder: starting
2019-10-29 00:30:09,434 INFO ipc.Server (Server.java:run(900)) - IPC Server listener on 8010: starting
2019-10-29 00:30:10,930 INFO common.Storage (DataStorage.java:getParallelVolumeLoadThreadsNum(384)) - Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2019-10-29 00:30:10,962 INFO common.Storage (Storage.java:tryLock(776)) - Lock on /hadoop/hdfs/data/in_use.lock acquired by nodename 210295@xxxx.corp.com
2019-10-29 00:30:11,121 INFO common.Storage (BlockPoolSliceStorage.java:recoverTransitionRead(250)) - Analyzing storage directories for bpid BP-814497463-127.0.0.1-1558792659773
2019-10-29 00:30:11,121 INFO common.Storage (Storage.java:lock(735)) - Locking is disabled for /hadoop/hdfs/data/current/BP-814497463-127.0.0.1-1558792659773
2019-10-29 00:30:11,139 INFO datanode.DataNode (DataNode.java:initStorage(1546)) - Setting up storage: nsid=875919329;bpid=BP-814497463-127.0.0.1-1558792659773;lv=-56;nsInfo=lv=-63;cid=CID-49b9105e-fc0d-4ea4-9d2f-caceb95ce4bb;nsid=875919329;c=0;bpid=BP-814497463-127.0.0.1-1558792659773;dnuuid=0aff4a22-3f1a-485b-9aec-46fd881dfab0
2019-10-29 00:30:11,523 INFO impl.FsDatasetImpl (FsVolumeList.java:addVolume(295)) - Added new volume: DS-ea7ed3be-90ad-4424-a00c-577601814d81
2019-10-29 00:30:11,523 INFO impl.FsDatasetImpl (FsDatasetImpl.java:addVolume(426)) - Added volume - /hadoop/hdfs/data/current, StorageType: DISK
2019-10-29 00:30:11,527 INFO impl.FsDatasetImpl (FsDatasetImpl.java:registerMBean(2203)) - Registered FSDatasetState MBean
2019-10-29 00:30:11,711 INFO checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(122)) - Scheduling a check for /hadoop/hdfs/data/current
2019-10-29 00:30:11,719 INFO checker.DatasetVolumeChecker (DatasetVolumeChecker.java:checkAllVolumes(210)) - Scheduled health check for volume /hadoop/hdfs/data/current
2019-10-29 00:30:11,721 INFO impl.FsDatasetImpl (FsDatasetImpl.java:addBlockPool(2686)) - Adding block pool BP-814497463-127.0.0.1-1558792659773
2019-10-29 00:30:11,722 INFO impl.FsDatasetImpl (FsVolumeList.java:run(392)) - Scanning block pool BP-814497463-127.0.0.1-1558792659773 on volume /hadoop/hdfs/data/current...
2019-10-29 00:30:11,898 INFO impl.FsDatasetImpl (BlockPoolSlice.java:loadDfsUsed(251)) - Cached dfsUsed found for /hadoop/hdfs/data/current/BP-814497463-127.0.0.1-1558792659773/current: 414855315456
2019-10-29 00:30:11,901 INFO impl.FsDatasetImpl (FsVolumeList.java:run(397)) - Time taken to scan block pool BP-814497463-127.0.0.1-1558792659773 on /hadoop/hdfs/data/current: 178ms
2019-10-29 00:30:11,901 INFO impl.FsDatasetImpl (FsVolumeList.java:addBlockPool(423)) - Total time to scan all replicas for block pool BP-814497463-127.0.0.1-1558792659773: 180ms
2019-10-29 00:30:11,906 INFO impl.FsDatasetImpl (FsVolumeList.java:run(188)) - Adding replicas to map for block pool BP-814497463-127.0.0.1-1558792659773 on volume /hadoop/hdfs/data/current...
2019-10-29 00:30:11,906 INFO impl.BlockPoolSlice
(BlockPoolSlice.java:readReplicasFromCache(738)) - Replica Cache file: /hadoop/hdfs/data/current/BP-814497463-127.0.0.1-1558792659773/current/replicas doesn't exist
/** This is where I suspect the issue is **/
2019-10-29 00:31:24,607 INFO timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:33:54,200 INFO timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:37:24,712 INFO timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:41:24,844 INFO timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:45:14,801 INFO timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:49:04,874 INFO timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
/** This is followed by several Detected pause in JVM or host machine (eg GC): pause of approximately 1068ms statements **/
The corresponding NameNode log shows the following and keeps repeating "The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428."
************************************************************/
2019-10-29 00:30:20,165 INFO namenode.NameNode (LogAdapter.java:info(47)) - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: user = hdfs
STARTUP_MSG: host = xxxx.corp.com/scrambled.private.ip.address
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.7.3.2.6.5.1100-53
STARTUP_MSG: classpath = removed for brevity
STARTUP_MSG: build = git@github.com:hortonworks/hadoop.git -r 3091053c59a62c82d82c9f778c48bde5ef0a89a1; compiled by 'jenkins' on 2019-03-13T15:40Z
STARTUP_MSG: java = 1.8.0_112
************************************************************/
2019-10-29 00:30:20,173 INFO namenode.NameNode (LogAdapter.java:info(47)) - registered UNIX signal handlers for [TERM, HUP, INT]
2019-10-29 00:30:20,176 INFO namenode.NameNode (NameNode.java:createNameNode(1624)) - createNameNode []
2019-10-29 00:30:20,370 INFO impl.MetricsConfig (MetricsConfig.java:loadFirst(112)) - loaded properties from hadoop-metrics2.properties
2019-10-29 00:30:20,567 INFO timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(82)) - Initializing Timeline metrics sink.
2019-10-29 00:30:20,568 INFO timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(102)) - Identified hostname = xxxx.corp.com, serviceName = namenode
2019-10-29 00:30:20,682 INFO timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(128)) - No suitable collector found.
2019-10-29 00:30:20,685 INFO timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(180)) - RPC port properties configured: {8020=client}
2019-10-29 00:30:20,690 INFO impl.MetricsSinkAdapter (MetricsSinkAdapter.java:start(206)) - Sink timeline started
2019-10-29 00:30:20,743 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:startTimer(376)) - Scheduled snapshot period at 10 second(s).
2019-10-29 00:30:20,743 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:start(192)) - NameNode metrics system started
2019-10-29 00:30:20,747 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(450)) - fs.defaultFS is hdfs://xxxx.corp.com:8020
2019-10-29 00:30:20,748 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(470)) - Clients are to use xxxx.corp.com:8020 to access this namenode/service.
2019-10-29 00:30:20,866 INFO util.JvmPauseMonitor (JvmPauseMonitor.java:run(179)) - Starting JVM pause monitor
2019-10-29 00:30:20,874 INFO hdfs.DFSUtil (DFSUtil.java:httpServerTemplateForNNAndJN(1803)) - Starting Web-server for hdfs at: <a href="http://xxxx.corp.com:50070" target="_blank">http://xxxx.corp.com:50070</a>
2019-10-29 00:30:20,913 INFO mortbay.log (Slf4jLog.java:info(67)) - Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2019-10-29 00:30:20,923 INFO server.AuthenticationFilter (AuthenticationFilter.java:constructSecretProvider(296)) - Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2019-10-29 00:30:20,927 INFO http.HttpRequestLog (HttpRequestLog.java:getRequestLog(80)) - Http request log for http.requests.namenode is not defined
2019-10-29 00:30:20,931 INFO http.HttpServer2 (HttpServer2.java:addGlobalFilter(788)) - Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2019-10-29 00:30:20,933 INFO http.HttpServer2 (HttpServer2.java:addFilter(763)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context hdfs
2019-10-29 00:30:20,933 INFO http.HttpServer2 (HttpServer2.java:addFilter(771)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2019-10-29 00:30:20,933 INFO http.HttpServer2 (HttpServer2.java:addFilter(771)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2019-10-29 00:30:20,934 INFO security.HttpCrossOriginFilterInitializer (HttpCrossOriginFilterInitializer.java:initFilter(49)) - CORS filter not enabled. Please set hadoop.http.cross-origin.enabled to 'true' to enable it
2019-10-29 00:30:20,953 INFO http.HttpServer2 (NameNodeHttpServer.java:initWebHdfs(93)) - Added filter 'org.apache.hadoop.hdfs.web.AuthFilter' (class=org.apache.hadoop.hdfs.web.AuthFilter)
2019-10-29 00:30:20,954 INFO http.HttpServer2 (HttpServer2.java:addJerseyResourcePackage(687)) - addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
2019-10-29 00:30:20,961 INFO http.HttpServer2 (HttpServer2.java:bindListener(986)) - Jetty bound to port 50070
2019-10-29 00:30:20,962 INFO mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26.hwx
2019-10-29 00:30:20,986 WARN mortbay.log (Slf4jLog.java:warn(76)) - Can't reuse /tmp/Jetty_xxxx_corp_com_50070_hdfs____ggu70m, using /tmp/Jetty_xxxx_corp_com_50070_hdfs____ggu70m_2845921744604868870
2019-10-29 00:30:21,121 INFO mortbay.log (Slf4jLog.java:info(67)) - Started HttpServer2$SelectChannelConnectorWithSafeStartup@xxxx.corp.com:50070
2019-10-29 00:30:21,143 WARN common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,143 WARN common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,143 WARN namenode.FSNamesystem (FSNamesystem.java:checkConfiguration(690)) - Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
2019-10-29 00:30:21,143 WARN namenode.FSNamesystem (FSNamesystem.java:checkConfiguration(695)) - Only one namespace edits storage directory (dfs.namenode.edits.dir) configured. Beware of data loss due to lack of redundant storage directories!
2019-10-29 00:30:21,148 WARN common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,148 WARN common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:21,153 WARN common.Storage (NNStorage.java:setRestoreFailedStorage(208)) - set restore failed storage to true
2019-10-29 00:30:21,172 INFO namenode.FSEditLog (FSEditLog.java:newInstance(225)) - Edit logging is async:false
2019-10-29 00:30:21,176 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(759)) - No KeyProvider found.
2019-10-29 00:30:21,176 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(765)) - Enabling async auditlog
2019-10-29 00:30:21,178 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(769)) - fsLock is fair:false
2019-10-29 00:30:21,204 INFO blockmanagement.HeartbeatManager (HeartbeatManager.java:<init>(91)) - Setting heartbeat recheck interval to 30000 since dfs.namenode.stale.datanode.interval is less than dfs.namenode.heartbeat.recheck-interval
2019-10-29 00:30:21,207 INFO common.Util (Util.java:isDiskStatsEnabled(111)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2019-10-29 00:30:21,214 INFO blockmanagement.DatanodeManager (DatanodeManager.java:<init>(274)) - dfs.block.invalidate.limit=1000
2019-10-29 00:30:21,214 INFO blockmanagement.DatanodeManager (DatanodeManager.java:<init>(280)) - dfs.namenode.datanode.registration.ip-hostname-check=true
2019-10-29 00:30:21,215 INFO blockmanagement.BlockManager (InvalidateBlocks.java:printBlockDeletionTime(71)) - dfs.namenode.startup.delay.block.deletion.sec is set to 000:01:00:00.000
2019-10-29 00:30:21,215 INFO blockmanagement.BlockManager (InvalidateBlocks.java:printBlockDeletionTime(76)) - The block deletion will start around 2019 Oct 29 01:30:21
2019-10-29 00:30:21,217 INFO util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map BlocksMap
2019-10-29 00:30:21,217 INFO util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type = 64-bit
2019-10-29 00:30:21,218 INFO util.GSet (LightWeightGSet.java:computeCapacity(397)) - 2.0% max memory 1011.3 MB = 20.2 MB
2019-10-29 00:30:21,218 INFO util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity = 2^21 = 2097152 entries
2019-10-29 00:30:21,231 INFO blockmanagement.BlockManager (BlockManager.java:createBlockTokenSecretManager(409)) - dfs.block.access.token.enable=true
2019-10-29 00:30:21,231 INFO blockmanagement.BlockManager (BlockManager.java:createBlockTokenSecretManager(430)) - dfs.block.access.key.update.interval=600 min(s), dfs.block.access.token.lifetime=600 min(s), dfs.encrypt.data.transfer.algorithm=null
2019-10-29 00:30:21,354 INFO blockmanagement.BlockManager (BlockManager.java:<init>(395)) - defaultReplication = 1
2019-10-29 00:30:21,354 INFO blockmanagement.BlockManager (BlockManager.java:<init>(396)) - maxReplication = 50
2019-10-29 00:30:21,354 INFO blockmanagement.BlockManager (BlockManager.java:<init>(397)) - minReplication = 1
2019-10-29 00:30:21,354 INFO blockmanagement.BlockManager (BlockManager.java:<init>(398)) - maxReplicationStreams = 2
2019-10-29 00:30:21,354 INFO blockmanagement.BlockManager (BlockManager.java:<init>(399)) - replicationRecheckInterval = 3000
2019-10-29 00:30:21,354 INFO blockmanagement.BlockManager (BlockManager.java:<init>(400)) - encryptDataTransfer = false
2019-10-29 00:30:21,354 INFO blockmanagement.BlockManager (BlockManager.java:<init>(401)) - maxNumBlocksToLog = 1000
2019-10-29 00:30:21,360 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(790)) - fsOwner = hdfs (auth:SIMPLE)
2019-10-29 00:30:21,360 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(791)) - supergroup = hdfs
2019-10-29 00:30:21,360 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(792)) - isPermissionEnabled = true
2019-10-29 00:30:21,360 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(803)) - HA Enabled: false
2019-10-29 00:30:21,361 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(843)) - Append Enabled: true
2019-10-29 00:30:21,388 INFO util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map INodeMap
2019-10-29 00:30:21,388 INFO util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type = 64-bit
2019-10-29 00:30:21,388 INFO util.GSet (LightWeightGSet.java:computeCapacity(397)) - 1.0% max memory 1011.3 MB = 10.1 MB
2019-10-29 00:30:21,389 INFO util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity = 2^20 = 1048576 entries
2019-10-29 00:30:21,393 INFO namenode.FSDirectory (FSDirectory.java:<init>(256)) - ACLs enabled? false
2019-10-29 00:30:21,393 INFO namenode.FSDirectory (FSDirectory.java:<init>(260)) - XAttrs enabled? true
2019-10-29 00:30:21,393 INFO namenode.FSDirectory (FSDirectory.java:<init>(268)) - Maximum size of an xattr: 16384
2019-10-29 00:30:21,393 INFO namenode.NameNode (FSDirectory.java:<init>(321)) - Caching file names occuring more than 10 times
2019-10-29 00:30:21,399 INFO util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map cachedBlocks
2019-10-29 00:30:21,399 INFO util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type = 64-bit
2019-10-29 00:30:21,400 INFO util.GSet (LightWeightGSet.java:computeCapacity(397)) - 0.25% max memory 1011.3 MB = 2.5 MB
2019-10-29 00:30:21,400 INFO util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity = 2^18 = 262144 entries
2019-10-29 00:30:21,402 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(5582)) - dfs.namenode.safemode.threshold-pct = 1.0
2019-10-29 00:30:21,402 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(5583)) - dfs.namenode.safemode.min.datanodes = 0
2019-10-29 00:30:21,402 INFO namenode.FSNamesystem (FSNamesystem.java:<init>(5584)) - dfs.namenode.safemode.extension = 30000
2019-10-29 00:30:21,405 INFO metrics.TopMetrics (TopMetrics.java:logConf(76)) - NNTop conf: dfs.namenode.top.window.num.buckets = 10
2019-10-29 00:30:21,405 INFO metrics.TopMetrics (TopMetrics.java:logConf(78)) - NNTop conf: dfs.namenode.top.num.users = 10
2019-10-29 00:30:21,405 INFO metrics.TopMetrics (TopMetrics.java:logConf(80)) - NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2019-10-29 00:30:21,408 INFO namenode.FSNamesystem (FSNamesystem.java:initRetryCache(971)) - Retry cache on namenode is enabled
2019-10-29 00:30:21,408 INFO namenode.FSNamesystem (FSNamesystem.java:initRetryCache(979)) - Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2019-10-29 00:30:21,410 INFO util.GSet (LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map NameNodeRetryCache
2019-10-29 00:30:21,411 INFO util.GSet (LightWeightGSet.java:computeCapacity(396)) - VM type = 64-bit
2019-10-29 00:30:21,411 INFO util.GSet (LightWeightGSet.java:computeCapacity(397)) - 0.029999999329447746% max memory 1011.3 MB = 310.7 KB
2019-10-29 00:30:21,411 INFO util.GSet (LightWeightGSet.java:computeCapacity(402)) - capacity = 2^15 = 32768 entries
2019-10-29 00:30:21,456 INFO common.Storage (Storage.java:tryLock(776)) - Lock on /home/hadoop/hdfs/namenode/in_use.lock acquired by nodename 211070@xxxx.corp.com
2019-10-29 00:30:21,503 INFO namenode.FileJournalManager (FileJournalManager.java:recoverUnfinalizedSegments(388)) - Recovering unfinalized segments in /home/hadoop/hdfs/namenode/current
2019-10-29 00:30:21,527 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file /home/hadoop/hdfs/namenode/current/edits_inprogress_0000000000199282266 -> /home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266
2019-10-29 00:30:21,532 INFO namenode.FSImage (FSImage.java:loadFSImageFile(745)) - Planning to load image: FSImageFile(file=/home/hadoop/hdfs/namenode/current/fsimage_0000000000199282232, cpktTxId=0000000000199282232)
2019-10-29 00:30:21,562 INFO namenode.FSImageFormatPBINode (FSImageFormatPBINode.java:loadINodeSection(257)) - Loading 1993 INodes.
2019-10-29 00:30:21,724 INFO namenode.FSImageFormatProtobuf (FSImageFormatProtobuf.java:load(184)) - Loaded FSImage in 0 seconds.
2019-10-29 00:30:21,725 INFO namenode.FSImage (FSImage.java:loadFSImage(911)) - Loaded image for txid 199282232 from /home/hadoop/hdfs/namenode/current/fsimage_0000000000199282232
2019-10-29 00:30:21,725 INFO namenode.FSImage (FSImage.java:loadEdits(849)) - Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@63fd4873 expecting start txid #199282233
2019-10-29 00:30:21,726 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(142)) - Start loading edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282233-0000000000199282265
2019-10-29 00:30:21,729 INFO namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream '/home/hadoop/hdfs/namenode/current/edits_0000000000199282233-0000000000199282265' to transaction ID 199282233
2019-10-29 00:30:21,752 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(145)) - Edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282233-0000000000199282265 of size 1048576 edits # 33 loaded in 0 seconds
2019-10-29 00:30:21,752 INFO namenode.FSImage (FSImage.java:loadEdits(849)) - Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@1e11bc55 expecting start txid #199282266
2019-10-29 00:30:21,752 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(142)) - Start loading edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266
2019-10-29 00:30:21,752 INFO namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream '/home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266' to transaction ID 199282233
2019-10-29 00:30:21,753 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(145)) - Edits file /home/hadoop/hdfs/namenode/current/edits_0000000000199282266-0000000000199282266 of size 1048576 edits # 1 loaded in 0 seconds
2019-10-29 00:30:21,753 INFO namenode.FSNamesystem (FSNamesystem.java:loadFSImage(1083)) - Need to save fs image? false (staleImage=false, haEnabled=false, isRollingUpgrade=false)
2019-10-29 00:30:21,754 INFO namenode.FSEditLog (FSEditLog.java:startLogSegment(1294)) - Starting log segment at 199282267
2019-10-29 00:30:21,880 INFO namenode.NameCache (NameCache.java:initialized(143)) - initialized with 8 entries 214 lookups
2019-10-29 00:30:21,881 INFO namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(731)) - Finished loading FSImage in 465 msecs
2019-10-29 00:30:22,002 INFO namenode.NameNode (NameNodeRpcServer.java:<init>(428)) - RPC server is binding to xxxx.corp.com:8020
2019-10-29 00:30:22,007 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(75)) - Using callQueue: class java.util.concurrent.LinkedBlockingQueue scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2019-10-29 00:30:22,015 INFO ipc.Server (Server.java:run(821)) - Starting Socket Reader #1 for port 8020
2019-10-29 00:30:22,049 INFO namenode.FSNamesystem (FSNamesystem.java:registerMBean(6517)) - Registered FSNamesystemState MBean
2019-10-29 00:30:22,050 WARN common.Util (Util.java:stringAsURI(57)) - Path /hadoop/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
2019-10-29 00:30:22,064 INFO namenode.LeaseManager (LeaseManager.java:getNumUnderConstructionBlocks(139)) - Number of blocks under construction: 0
2019-10-29 00:30:22,065 INFO hdfs.StateChange (FSNamesystem.java:reportStatus(5952)) - STATE* Safe mode ON.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:30:22,075 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateHeartbeatState(401)) - Number of failed storage changes from 0 to 0
2019-10-29 00:30:22,077 INFO block.BlockTokenSecretManager (BlockTokenSecretManager.java:updateKeys(222)) - Updating block keys
2019-10-29 00:30:22,095 INFO ipc.Server (Server.java:run(1064)) - IPC Server Responder: starting
2019-10-29 00:30:22,095 INFO ipc.Server (Server.java:run(900)) - IPC Server listener on 8020: starting
2019-10-29 00:30:22,115 INFO namenode.NameNode (NameNode.java:startCommonServices(885)) - NameNode RPC up at: xxxx.corp.com/scrambled.private.ip.address:8020
2019-10-29 00:30:22,116 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1191)) - Starting services required for active state
2019-10-29 00:30:22,116 INFO namenode.FSDirectory (FSDirectory.java:updateCountForQuota(708)) - Initializing quota with 4 thread(s)
2019-10-29 00:30:22,127 INFO namenode.FSDirectory (FSDirectory.java:updateCountForQuota(717)) - Quota initialization completed in 11 milliseconds
name space=1995
storage space=6473571992
storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0
2019-10-29 00:30:22,131 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(161)) - Starting CacheReplicationMonitor with interval 30000 milliseconds
2019-10-29 00:30:22,525 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:<init>(228)) - The configured checkpoint interval is 0 minutes. Using an interval of 60 minutes that is used for deletion instead
2019-10-29 00:31:40,751 INFO timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:31:52,817 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 29 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:55080 Call#143 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/anonymous/393adcb5-cd66-43a7-ab38-e759f5daf88e. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:31:52,823 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 24 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from scrambled.private.ip.address:55080 Call#144 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hive/anonymous/393adcb5-cd66-43a7-ab38-e759f5daf88e. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:31:52,921 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 44 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:55084 Call#149 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/anonymous/4b583a41-b3c9-4a6a-b8ee-afe8411c3861. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:31:52,926 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 68 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from scrambled.private.ip.address:55084 Call#150 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hive/anonymous/4b583a41-b3c9-4a6a-b8ee-afe8411c3861. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:31:54,849 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 38 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:55088 Call#4 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/ambari-qa/c35cf654-0bcf-48ff-8873-0d1447404f8c. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:33:00,753 INFO timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:34:10,752 INFO timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:34:52,597 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 24 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:56002 Call#155 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/anonymous/5ba0c62f-b60b-4ecb-be71-42137e023acd. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:34:52,601 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 29 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from scrambled.private.ip.address:56002 Call#156 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hive/anonymous/5ba0c62f-b60b-4ecb-be71-42137e023acd. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:34:52,693 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 68 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:56006 Call#161 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/anonymous/2de6c6f0-f9dc-4702-a350-867f384a3e9f. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:34:52,697 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 18 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from scrambled.private.ip.address:56006 Call#162 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hive/anonymous/2de6c6f0-f9dc-4702-a350-867f384a3e9f. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:34:54,626 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 4 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:56010 Call#4 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/ambari-qa/e051ac6d-5230-464a-a5fa-b3bb58d035f7. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:36:30,748 INFO timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:37:50,753 INFO timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2019-10-29 00:37:52,617 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 29 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:56826 Call#167 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/anonymous/cc6ebafd-18c8-46ea-a6d7-62397735efb0. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:37:52,622 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 49 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from scrambled.private.ip.address:56826 Call#168 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hive/anonymous/cc6ebafd-18c8-46ea-a6d7-62397735efb0. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:37:52,708 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 40 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:56830 Call#173 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/anonymous/6552fb22-a79a-47da-989a-87c029dad677. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:37:52,711 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 25 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from scrambled.private.ip.address:56830 Call#174 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hive/anonymous/6552fb22-a79a-47da-989a-87c029dad677. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2019-10-29 00:37:54,749 INFO ipc.Server (Server.java:logException(2428)) - IPC Server handler 59 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from scrambled.private.ip.address:56834 Call#4 Retry#0: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/ambari-qa/1365213f-4766-468a-8a1f-9d5100cc5bad. Name node is in safe mode.
The reported blocks 0 needs additional 1429 blocks to reach the threshold 1.0000 of total blocks 1428.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
What exactly is going on and how do I go about troubleshooting this? I tried increasing the HeapSize for both NameNode and DataNode as well. The GC message from DataNode disappear but I still see them in NameNode logs when it reads iNODES.
Any help will be greatly appreciated.
Created on 10-28-2019 10:21 PM - edited 10-28-2019 11:24 PM
The name node start is usually faster than what you are experiencing frm the logs I am seeing maybe the root cause. During startup the namenode read the fsimage which is the last best cluster images usually a combination of fsimage and the last edits log. You will need to combine the current fsimage and the edits log in the below steps.
Name node is in safe mode.
When the name node is in this state it's a security future that disables any change that the namenode metadata that it can't register,/record to note all the cluster wise location, state, permissions,ownership etc are stored in the metadata which is stored in the name node. You will need to do the following steps as root user while the cluster is freezing during startup.
# su - hdfs
Get the current state
$ hdfs hdfsadmin -safemode get
That will confirm that the namenode is in safe mode, you will need to force the creation of a hdfs image point in time by saving the namespace to create a new fsimage
$ hdfs dfsadmin -saveNamespace
confirm safemode is off
$ hdfs hdfsadmin -safemode get
This time the output should be Off leave safe mode and next time the startup should be much faster
$ hdfs hdfsadmin -safemode get
You might need to tune the memory allocated to the name node it seems the numebr of files to manage has increased hence the need to reconfigure the memory have a look at the below link for guidance..
Configuring NameNode Heap Size to estimate the memory required for the namenode
Created 10-29-2019 03:02 AM
@SheltonThank you for your response. Much appreciated. Thank you for the name node heap tuning link as well. I am busy checking it out and will post an updated.
During my investigation yesterday, I also found out there were millions of pending deletes. As I mentioned, I deleted about 23 million blocks down to about 1400. The JMX console says there still are about 11 million blocks still waiting to be deleted - I'm assuming the blockreport would also account for these until they are actually deleted? Could that be a reason why its still slow?
Its deleting about 600 nodes/second which seems okay, but I guess it will be a while until everything is deleted. Is this normal?
Created 10-29-2019 01:03 PM
Any updates?
Why the high number of block deletes are those under replicated blocks? There is many parameters to evaluate like the number of data nodes, replication factor, and rack awareness!
I would be happy to get your feedback
Created 10-29-2019 02:29 PM
Yes, it took enormously long to get rid of those deleted block and it only finished a few hours ago.
No..those blocks were essentially a massive folder (almost 1 TB) that was deleted by someone. Its slow because its just a single node and a single machine setup. Basically 1 name node, 1 secondary name node, 1 data node all all on the same box (HDP 2.6 + HCP).
Are there any tuning/diagnostic guides available with regards to the parameters that you mentioned?
Thank you for your help and assistance so far.
Created 10-29-2019 10:19 PM
Okay, now with that information I get a better understanding and picture. By default, HDP 2.6 has a replication factor of 3 so it's looking to place the other 2 copies on different data nodes that the existing unless you create new files with a replication factor of 1 you will continue to get the unreplicated block errors 🙂 but now that you know it's manageable.
Maybe next time you delete files in hdfs use the -skipTrash option
hdfs dfs -rm -skipTrash /path/to/hdfs/file/to/remove/permanently
or emtpyting the existing .Trash
Options:
Change the replication factor of a file at creatio:
hdfs dfs –setrep –w 1 /user/hdfs/file.txt
Or change the replication factor of a directory
hdfs dfs -setrep -R 1 /user/hdfs/your_dir
Changing the replication factor for a directory will only affect the existing files and the new files under the directory will get created with the default replication factor in dfs.replication from hdfs-site.xml maybe in your case this is what you should change to 1 as this takes effect cluster-wide for your Dev environment
Happy hadooping