02-11-2020
08:54 AM
Problem Description (The Crash):
During an HDFS rolling restart, the Standby NameNode (SBNN) failed to load the FsImage, crashing the SBNN and interrupting the rolling restart (which is actually good news). At this stage the SBNN is down (if CM still shows it as "started", stop it manually via the CM Web UI), but the Active NameNode (ANN) is still active and serving HDFS properly.
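Before touching anything, it helps to confirm which NameNode is actually active. A minimal check with the HDFS HA admin CLI (the nameservice `mycluster` and service IDs `nn1`/`nn2` below are placeholders, not from the original post; list your real ones first):

```shell
# Confirm HA roles before acting. "mycluster", "nn1", "nn2" are placeholders.
check_ha() {
  # List the NameNode service IDs for the nameservice (e.g. prints "nn1,nn2")
  hdfs getconf -confKey dfs.ha.namenodes.mycluster
  # Query each NameNode's HA state
  hdfs haadmin -getServiceState nn1   # expected: active
  hdfs haadmin -getServiceState nn2   # expected: standby, or an error since it crashed
}

# Only meaningful on a host with the hdfs CLI configured for the cluster
if command -v hdfs >/dev/null 2>&1; then check_ha; fi
```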
This means the service is still up and clients can issue read/write requests over HDFS.
Do NOT by any means try to restart the ANN.
On the SBNN logs, the following stack trace can be observed:
2020-02-10 13:51:56,845 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: Failed to load image from FSImageFile(file=/data/nn2/dfs/nn/current/fsimage_0000000007432739660, cpktTxId=0000000007432739660)
java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.addChild(INodeDirectory.java:536)
at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:274)
at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:211)
at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:265)
at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:184)
at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
...
2020-02-10 13:51:57,018 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
Root Cause
Although the stack trace shows a NullPointerException, this usually means that the FsImage is corrupted and the SBNN is unable to parse it.
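As a quick sanity check (a sketch, not part of the original procedure), the HDFS Offline Image Viewer can tell you whether a given FsImage is parseable at all; on a corrupted image it typically fails with a similar exception. The image path below is taken from the log above:

```shell
# Try parsing the suspect FsImage offline with the Offline Image Viewer.
# Path copied from the error log above; adjust to your dfs.namenode.name.dir.
IMG=/data/nn2/dfs/nn/current/fsimage_0000000007432739660

if command -v hdfs >/dev/null 2>&1; then
  # Dumps the image to XML; a corrupted image usually aborts with an exception
  hdfs oiv -p XML -i "$IMG" -o /tmp/fsimage.xml
fi
```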
Solution
To resolve this issue, manually "bootstrap" the SBNN based on the ANN: copy the contents of the ANN's dfs.namenode.name.dir to the SBNN's dfs.namenode.name.dir.
If this does not work, the ANN's FsImage is also corrupted and must be repaired before performing the following steps:
1. Put the ANN in safemode to prevent any writes on the FS:
hdfs dfsadmin -fs hdfs://<ANN_FQDN>:<ANN_PORT> -safemode enter
2. Save the namespace of the ANN to create a fresh FsImage:
hdfs dfsadmin -fs hdfs://<ANN_FQDN>:<ANN_PORT> -saveNamespace
3. Check the value of dfs.namenode.name.dir for the ANN and SBNN:
dfs.namenode.name.dir = /data/nn1/dfs/nn,/data/nn2/dfs/nn
4. Take a backup of the contents of the SBNN dfs.namenode.name.dir:
cp -pr /data/nn1/dfs/nn /data/nn1/dfs/nn_bak_<date_time>
cp -pr /data/nn2/dfs/nn /data/nn2/dfs/nn_bak_<date_time>
5. Carefully clear the contents of the SBNN dfs.namenode.name.dir:
rm -rf /data/nn1/dfs/nn
rm -rf /data/nn2/dfs/nn
6. From the ANN, scp the contents of the ANN dfs.namenode.name.dir to the SBNN:
scp -pr /data/nn1/dfs/nn SBNN:/data/nn1/dfs
scp -pr /data/nn2/dfs/nn SBNN:/data/nn2/dfs
7. On the SBNN, delete the lock file in each dfs.namenode.name.dir:
rm -f /data/nn1/dfs/nn/in_use.lock
rm -f /data/nn2/dfs/nn/in_use.lock
8. Start the SBNN from the CM UI.
9. Once the SBNN has started and loaded the FsImage, leave safemode on the ANN:
hdfs dfsadmin -fs hdfs://<ANN_FQDN>:<ANN_PORT> -safemode leave
The SBNN should start normally (if the ZKFC is not started, start it to notify both NameNodes that the SBNN is back on track).
Now, complete the HDFS rolling restart.
If the SBNN fails to start with the same stack trace as above (failed to load the FsImage), the ANN's FsImage is also corrupted and needs to be fixed before redoing the steps above.
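Stitched together, the manual bootstrap above can be sketched as the script below. Hosts, ports, and the two-directory layout are placeholders taken from the steps (run it from the ANN host, assuming passwordless ssh/scp to the SBNN). Note that HDFS also ships `hdfs namenode -bootstrapStandby`, which automates much of this when the standby's state is recoverable:

```shell
# Sketch of the recovery steps; hosts and paths are placeholders.
ANN="hdfs://ann-host.example.com:8020"   # ANN URI (<ANN_FQDN>:<ANN_PORT>)
SBNN="sbnn-host.example.com"             # SBNN hostname
TS=$(date +%Y%m%d_%H%M%S)                # timestamp suffix for backup dirs

recover() {
  # 1-2. Freeze writes and persist a fresh FsImage on the ANN
  hdfs dfsadmin -fs "$ANN" -safemode enter
  hdfs dfsadmin -fs "$ANN" -saveNamespace
  for dir in /data/nn1/dfs /data/nn2/dfs; do
    # 4-5. On the SBNN: back up, then clear, the old name dir
    ssh "$SBNN" "cp -pr $dir/nn ${dir}/nn_bak_$TS && rm -rf $dir/nn"
    # 6-7. Ship the ANN's copy over and drop the stale lock file
    scp -pr "$dir/nn" "$SBNN:$dir"
    ssh "$SBNN" "rm -f $dir/nn/in_use.lock"
  done
  # 8. Start the SBNN from the CM UI here; once its FsImage is loaded:
  hdfs dfsadmin -fs "$ANN" -safemode leave
}

# Only attempt this on a host with the hdfs CLI (i.e. a real cluster)
if command -v hdfs >/dev/null 2>&1; then recover; fi
```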
Conclusion
When this issue occurs during a rolling restart (i.e., only one of the NameNodes is down), it can be solved with minimal downtime (only the safemode window disturbs running applications).
The fact that the FsImage was preserved on the ANN allowed us to "bootstrap" it to the SBNN and let it replay the subsequent edits present in its dfs.namenode.name.dir, bringing it up to date so it could resume its actual job: checkpointing.
12-19-2018
01:34 PM
Hey folks,
I was recently doing some basic benchmarks with our classic DFSIO suite, but I found myself stuck with the following error:
main : run as user is hdfs
main : requested yarn user is hdfs
Requested user hdfs is banned
TestDFSIO's default output dir is /benchmarks/TestDFSIO, and it is automatically created with the following permissions: inode="/benchmarks/TestDFSIO/io_control/in_file_test_io_0":hdfs:hdfs:drwxr-xr-x. So using the hdfs user seemed mandatory. At least that's what I thought... Why is the hdfs user banned? After looking through some configuration files, I found the banned.users property in:
/etc/hadoop/conf/container-executor.cfg
#/*
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements. See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership. The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License. You may obtain a copy of the License at
# *
# * http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */
yarn.nodemanager.local-dirs=/hadoop/yarn/local
yarn.nodemanager.log-dirs=/hadoop/yarn/log
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin
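To check at a glance whether hdfs is banned on a given node, a one-liner like this works (the config path is the standard HDP location shown above; adjust if yours differs):

```shell
# Standard HDP location of the container executor config; adjust if needed
CFG=/etc/hadoop/conf/container-executor.cfg

if [ -f "$CFG" ]; then
  # Print a verdict based on the banned.users line
  if grep '^banned.users' "$CFG" | grep -q 'hdfs'; then
    echo "hdfs is banned on this node"
  else
    echo "hdfs is not banned on this node"
  fi
fi
```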
As we can see, banned.users is populated with hdfs, yarn, mapred, and bin. This value is inherited from the J2 template file: /var/lib/ambari-server/resources/common-services/YARN/2.1.0.2.0/package/templates/container-executor.cfg.j2
yarn.nodemanager.local-dirs={{nm_local_dirs}}
yarn.nodemanager.log-dirs={{nm_log_dirs}}
yarn.nodemanager.linux-container-executor.group={{yarn_executor_container_group}}
banned.users=hdfs,yarn,mapred,bin
min.user.id={{min_user_id}}
So, at this point, there are two solutions to our original issue. The first: remove hdfs from banned.users (not recommended). The second: find a way to change the base dir of TestDFSIO. And this is what we are going to do.
TestDFSIO: is the output dir really hardcoded? If we look closer at the usage function of TestDFSIO, there is no simple option to change the base dir; it seems that the default dir /benchmarks/TestDFSIO is hardcoded in the jar itself. And it WAS! The ability to change the output dir of TestDFSIO was requested in MAPREDUCE-1614 and incorporated in MAPREDUCE-1832. So now it is possible to use: -Dtest.build.data=/path/of_output_dir
Example : hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -Dtest.build.data=/user/ambari-qa/TestDFSIO -write -nrFiles 10 -fileSize 1000 -resFile /root/dfsio_result.log
18/12/18 13:49:59 INFO fs.TestDFSIO: TestDFSIO.1.8
18/12/18 13:49:59 INFO fs.TestDFSIO: nrFiles = 10
18/12/18 13:49:59 INFO fs.TestDFSIO: nrBytes (MB) = 1000.0
18/12/18 13:49:59 INFO fs.TestDFSIO: bufferSize = 1000000
18/12/18 13:49:59 INFO fs.TestDFSIO: baseDir = /user/ambari-qa/TestDFSIO
18/12/18 13:50:00 INFO fs.TestDFSIO: creating control file: 1048576000 bytes, 10 files
18/12/18 13:50:01 INFO fs.TestDFSIO: created control files for: 10 files
...
18/12/18 13:50:02 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: X.X.X.X:8020, Ident: (HDFS_DELEGATION_TOKEN token 33 for ambari-qa)
...
18/12/18 13:50:36 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
18/12/18 13:50:36 INFO fs.TestDFSIO: Date & time: Tue Dec 18 13:50:36 UTC 2018
18/12/18 13:50:36 INFO fs.TestDFSIO: Number of files: 10
18/12/18 13:50:36 INFO fs.TestDFSIO: Total MBytes processed: 10000.0
18/12/18 13:50:36 INFO fs.TestDFSIO: Throughput mb/sec: 145.73223159766246
18/12/18 13:50:36 INFO fs.TestDFSIO: Average IO rate mb/sec: 153.26971435546875
18/12/18 13:50:36 INFO fs.TestDFSIO: IO rate std deviation: 39.996241684601024
18/12/18 13:50:36 INFO fs.TestDFSIO: Test exec time sec: 35.042
18/12/18 13:50:36 INFO fs.TestDFSIO:
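After the write pass, the usual TestDFSIO follow-up is a read pass over the same files, then a cleanup, keeping the same -Dtest.build.data so everything stays under the non-hdfs directory. A sketch reusing the paths from the example above:

```shell
# Paths taken from the write example above; adjust to your environment
JAR=/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar
BASE=/user/ambari-qa/TestDFSIO

run_dfsio() {
  # Always pass the same base dir so we never touch /benchmarks/TestDFSIO
  hadoop jar "$JAR" TestDFSIO -Dtest.build.data="$BASE" "$@"
}

if command -v hadoop >/dev/null 2>&1; then
  # Re-read the 10 files written above, then remove the test data
  run_dfsio -read -nrFiles 10 -fileSize 1000 -resFile /root/dfsio_result.log
  run_dfsio -clean
fi
```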
Conclusion
Basic actions like benchmarks should not require changing the default configuration of your cluster. Always try to tune/customize your basic actions to fit your cluster rather than the opposite.