Cloudera Employee

Hey folks,

I was recently running some basic benchmarks and tried to use our classic TestDFSIO suite, but I got stuck on the following error:

main : run as user is hdfs
main : requested yarn user is hdfs
Requested user hdfs is banned

The default output directory of TestDFSIO is /benchmarks/TestDFSIO, and it is automatically created with the following permissions:

inode="/benchmarks/TestDFSIO/io_control/in_file_test_io_0":hdfs:hdfs:drwxr-xr-x

Only the hdfs user can write there, so running the benchmark as hdfs looks mandatory. At least that's what I thought...

Why is the hdfs user banned?

After looking through some configuration files, I found the banned.users property in:

/etc/hadoop/conf/container-executor.cfg
#/*
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# *     http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */
yarn.nodemanager.local-dirs=/hadoop/yarn/local
yarn.nodemanager.log-dirs=/hadoop/yarn/log
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin

As we can see, banned.users is populated with hdfs,yarn,mapred,bin. This value is inherited from the Ambari Jinja2 (.j2) template:

/var/lib/ambari-server/resources/common-services/YARN/2.1.0.2.0/package/templates/container-executor.cfg.j2

yarn.nodemanager.local-dirs={{nm_local_dirs}}
yarn.nodemanager.log-dirs={{nm_log_dirs}}
yarn.nodemanager.linux-container-executor.group={{yarn_executor_container_group}}
banned.users=hdfs,yarn,mapred,bin
min.user.id={{min_user_id}}
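To check quickly whether an account is listed in banned.users on a given node, a small helper along these lines may be enough. It is only a sketch: the function name is_banned_user is made up for illustration, the default path assumes the usual /etc/hadoop/conf location, and it expects a plain comma-separated value without spaces, as in the file shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): succeed if <user> is in banned.users.
# Usage: is_banned_user <user> [cfg_file]
is_banned_user() {
  local user="$1"
  # Assumed default location of the generated config file.
  local cfg="${2:-/etc/hadoop/conf/container-executor.cfg}"
  local banned
  # Keep the value after "banned.users="; commented lines do not match.
  banned="$(grep -E '^banned\.users=' "$cfg" | tail -n 1 | cut -d= -f2-)" || true
  # Surround with commas so only whole names match ("hdfs", not "hdf").
  case ",${banned}," in
    *",${user},"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_banned_user hdfs && echo "hdfs cannot run containers here"` prints the message on a node configured like the one above.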

So, at this point, there are two ways to solve our original issue.

The first: remove hdfs from banned.users (not recommended).

The second: find a way to change the base directory of TestDFSIO. This is what we are going to do.

TestDFSIO: is the output dir really hardcoded?

If we look closely at the usage message of TestDFSIO, there is no obvious option to change the base directory; the default /benchmarks/TestDFSIO seems to be hardcoded in the jar itself.

And it WAS!

The ability to change the output directory of TestDFSIO was requested in MAPREDUCE-1614 and incorporated in MAPREDUCE-1832. So it is now possible to use:

-Dtest.build.data=/path/of_output_dir

Example :

hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -Dtest.build.data=/user/ambari-qa/TestDFSIO -write -nrFiles 10 -fileSize 1000 -resFile /root/dfsio_result.log

18/12/18 13:49:59 INFO fs.TestDFSIO: TestDFSIO.1.8
18/12/18 13:49:59 INFO fs.TestDFSIO: nrFiles = 10
18/12/18 13:49:59 INFO fs.TestDFSIO: nrBytes (MB) = 1000.0
18/12/18 13:49:59 INFO fs.TestDFSIO: bufferSize = 1000000
18/12/18 13:49:59 INFO fs.TestDFSIO: baseDir = /user/ambari-qa/TestDFSIO
18/12/18 13:50:00 INFO fs.TestDFSIO: creating control file: 1048576000 bytes, 10 files
18/12/18 13:50:01 INFO fs.TestDFSIO: created control files for: 10 files
...
18/12/18 13:50:02 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: X.X.X.X:8020, Ident: (HDFS_DELEGATION_TOKEN token 33 for ambari-qa)
...
18/12/18 13:50:36 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
18/12/18 13:50:36 INFO fs.TestDFSIO:            Date & time: Tue Dec 18 13:50:36 UTC 2018
18/12/18 13:50:36 INFO fs.TestDFSIO:        Number of files: 10
18/12/18 13:50:36 INFO fs.TestDFSIO: Total MBytes processed: 10000.0
18/12/18 13:50:36 INFO fs.TestDFSIO:      Throughput mb/sec: 145.73223159766246
18/12/18 13:50:36 INFO fs.TestDFSIO: Average IO rate mb/sec: 153.26971435546875
18/12/18 13:50:36 INFO fs.TestDFSIO:  IO rate std deviation: 39.996241684601024
18/12/18 13:50:36 INFO fs.TestDFSIO:     Test exec time sec: 35.042
18/12/18 13:50:36 INFO fs.TestDFSIO:
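When comparing several runs, it can be handy to pull just the throughput figure out of the console log or the -resFile output. The helper below is a minimal sketch; dfsio_throughput is a hypothetical name, and it only relies on the "Throughput mb/sec:" label visible in the summary above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every "Throughput mb/sec" value found in a file.
# It splits each line on the label, so the log prefix
# ("18/12/18 13:50:36 INFO fs.TestDFSIO:") may be present or absent.
dfsio_throughput() {
  awk -F'Throughput mb/sec: *' 'NF == 2 { print $2 }' "$1"
}
```

Calling `dfsio_throughput /root/dfsio_result.log` prints one value per recorded run in that file.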

Conclusion

Basic actions such as benchmarks should not require changing the default configuration of your cluster. Always try to tune or customize these basic actions to fit your cluster rather than the opposite.
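In that spirit, the base directory can live in one variable of a small wrapper so that the write, read and clean phases all use the same -Dtest.build.data value. This is a sketch under assumptions: the jar path and base directory are the ones used in this article, while DRY_RUN, dfsio and run_benchmark are hypothetical names of this script, not TestDFSIO options.

```shell
#!/usr/bin/env bash
# Sketch: run TestDFSIO write, read and clean against one base directory.
JAR="${JAR:-/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar}"
BASE_DIR="${BASE_DIR:-/user/ambari-qa/TestDFSIO}"
# Hypothetical switch: 1 prints the commands, anything else executes them.
DRY_RUN="${DRY_RUN:-1}"

dfsio() {
  # Keep -D first, right after the class name, as in the article's example.
  local cmd=(hadoop jar "$JAR" TestDFSIO "-Dtest.build.data=${BASE_DIR}" "$@")
  if [ "$DRY_RUN" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

run_benchmark() {
  local res_file="$1"
  dfsio -write -nrFiles 10 -fileSize 1000 -resFile "$res_file" &&
  dfsio -read  -nrFiles 10 -fileSize 1000 -resFile "$res_file" &&
  dfsio -clean
}
```

With the default DRY_RUN=1, `run_benchmark /root/dfsio_result.log` only prints the three commands, which makes it easy to review them before running with DRY_RUN=0 as a non-banned user.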

Comments
Rising Star

The following always worked for me:

kinit -kt hdfs.keytab hdfs
hadoop fs -mkdir /benchmarks
hadoop fs -chmod 0777 /benchmarks

You can always lock down the directory permissions to only allow a certain group to write to this directory.
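One possible way to do that lock-down is sketched below. The group name benchusers is purely hypothetical, and the HADOOP variable is an addition of this sketch so the commands can be previewed with HADOOP="echo hadoop" before running them for real (after the kinit shown above).

```shell
#!/usr/bin/env bash
# Sketch: let only one group write to /benchmarks instead of leaving it at 0777.
# HADOOP is a convenience variable of this script; it defaults to the real CLI.
HADOOP="${HADOOP:-hadoop}"

lock_down_benchmarks() {
  # "benchusers" is a made-up default; pass your own group as the first argument.
  local group="${1:-benchusers}"
  # $HADOOP is intentionally unquoted so "echo hadoop" splits into two words.
  $HADOOP fs -chgrp "$group" /benchmarks &&
  $HADOOP fs -chmod 0775 /benchmarks
}
```

Mode 0775 keeps the directory readable by everyone while limiting writes to the owner and the chosen group.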