Capacity scheduler - error while executing job

Expert Contributor

Hi. I am trying to use YARN Queue Manager, but I have run into an issue.

I created the group 'developers' (with the groupadd command), then created the user 'mgrabowski' as a member of that group. Then I ran 'build/env/bin/hue useradmin_sync_with_unix'. In YARN Queue Manager I created the queue 'q_developers' and set the 'Queue Mappings' parameter to 'g:developers:q_developers'.

When I try to execute a Hive query in Hue, I get an error.

ERROR : Failed to execute tez graph.
org.apache.tez.dag.api.TezException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1469000725460_0018 to YARN : Failed to submit application application_1469000725460_0018 submitted by user mgrabowski reason: No groups found for user mgrabowski
	at org.apache.tez.client.TezClient.start(TezClient.java:413)
	at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196)
	at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311)
	at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453)
	at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1720)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1477)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1254)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1118)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1113)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
	at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
	at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1469000725460_0018 to YARN : Failed to submit application application_1469000725460_0018 submitted by user mgrabowski reason: No groups found for user mgrabowski
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:271)
	at org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72)
	at org.apache.tez.client.TezClient.start(TezClient.java:408)
	... 23 more

Does anyone know why? Oddly, the user mapping 'u:mgrabowski:q_developers' works fine.

1 ACCEPTED SOLUTION

Expert Contributor

OK, I found the problem. I have a 3-node cluster, but I had created the user accounts only on the Hue node (node 3). After I created the user accounts on nodes 1 and 2 as well, everything works.

But I'm not sure it's a good solution. Do I have to create user accounts and groups on every node in my cluster? What if I have 1000 nodes?
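
A quick way to confirm this kind of diagnosis is to ask every node whether it can resolve the user's group. With Hadoop's default OS-based group mapping, the group lookup happens on the cluster hosts rather than on the Hue node, which would explain why a 'g:' mapping fails while a 'u:' mapping (which needs no group lookup) works. The following is only a sketch: it assumes key-based ssh to each node, and the hosts file argument and the helper names `has_group` and `check_host` are hypothetical.

```shell
#!/bin/bash
# Sketch: report which hosts cannot resolve a user's group membership.
# The helper names and the hosts file are assumptions for illustration.

# Return 0 if the space-separated group list in $1 contains the group $2.
has_group() {
    local group_list=" $1 " wanted="$2"
    [[ "$group_list" == *" $wanted "* ]]
}

# Ask one host for the user's groups via `id -Gn` and print OK or MISSING.
check_host() {
    local host="$1" user="$2" group="$3" group_list
    group_list=$(ssh -n -o BatchMode=yes "$host" id -Gn "$user" 2>/dev/null)
    if has_group "$group_list" "$group"; then
        echo "$host: OK"
    else
        echo "$host: MISSING"
    fi
}

# Only contact the cluster when a hosts file (one hostname per line) is given.
if [[ -n "${1:-}" ]]; then
    while read -r host; do
        [[ -n "$host" ]] && check_host "$host" mgrabowski developers
    done < "$1"
fi
```

Any host reported as MISSING either lacks the account or does not list the user in the group.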

9 REPLIES 9

The error is thrown for user mgrabowski.

Have you allowed this user to submit jobs to YARN, or is this user a member of one of the allowed groups?

Expert Contributor

This user is also a member of the 'developers' group.

When Queue Mappings is set to 'u:mgrabowski:q_developers', everything works fine. But 'g:developers:q_developers' doesn't work.

Master Guru

I heard there is some group caching in HDFS, but it should be refreshed after 5 minutes. The setting is:

hadoop.security.groups.cache.secs

Any chance you can restart HDFS/YARN to make sure that's not the problem?
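
As an alternative to a full restart, Hadoop ships admin commands that flush the user-to-group cache on the NameNode and the ResourceManager. The sketch below wraps them in a hypothetical helper, `refresh_groups`, that only prints the commands unless `DRY_RUN=0` is set (the `DRY_RUN` switch and the `run` wrapper are assumptions for illustration). The commands themselves need Hadoop admin privileges.

```shell
#!/bin/bash
# Sketch: flush Hadoop's cached group mappings and re-check one user.

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [[ "${DRY_RUN:-1}" == 1 ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# Refresh the caches, then show the groups Hadoop resolves for user $1.
refresh_groups() {
    local user="$1"
    run hdfs dfsadmin -refreshUserToGroupsMappings   # NameNode cache
    run yarn rmadmin -refreshUserToGroupsMappings    # ResourceManager cache
    run hdfs groups "$user"                          # groups as seen by the NameNode
}
```

For example, `DRY_RUN=0 refresh_groups mgrabowski` would run the three commands for real. If the last command prints no groups, the account is missing on the NameNode host and no cache refresh will help.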

Rising Star

@Mateusz: In that case you may need to write a small shell script that creates the user or group, or adds the user to a group, on every node, assuming ssh keys are set up. Otherwise you could try the expect command, which can pass the password as well. Hope that clarifies it. If you are satisfied, please rate.
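
A minimal sketch of such a script, assuming key-based root ssh to every node. The helper names `remote_cmd` and `create_everywhere` and the `DRY_RUN` switch are hypothetical; the remote commands (`getent`, `groupadd`, `useradd`, `usermod`) are the standard Linux account tools. By default it only prints what it would run.

```shell
#!/bin/bash
# Sketch: create a group and a member user on every host over key-based ssh.

# Build an idempotent command line to run as root on each host.
remote_cmd() {
    local user="$1" group="$2"
    echo "getent group $group >/dev/null || groupadd $group;" \
         "id -u $user >/dev/null 2>&1 || useradd -m $user;" \
         "usermod -aG $group $user"
}

# Usage: create_everywhere USER GROUP HOST...
# Prints the ssh commands unless DRY_RUN=0 is set.
create_everywhere() {
    local user="$1" group="$2"
    shift 2
    local host
    for host in "$@"; do
        if [[ "${DRY_RUN:-1}" == 1 ]]; then
            echo "ssh root@$host '$(remote_cmd "$user" "$group")'"
        else
            ssh -o BatchMode=yes "root@$host" "$(remote_cmd "$user" "$group")"
        fi
    done
}
```

For example, `DRY_RUN=0 create_everywhere mgrabowski developers node1 node2 node3` would apply the change, where the node names are placeholders. Because each step checks before creating, running it twice should be harmless.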

Master Guru

On big clusters people normally set up an LDAP server. IPA (FreeIPA), for example, is free and simple. Look on GitHub for the security workshops by Ali Bajwa. Or, as said below, use an ssh script, Ansible, or pshell to run commands on all nodes. Note that some more esoteric components of the stack require usernames to have the same UID on all nodes of the cluster.

https://github.com/abajwa-hw

Expert Contributor

Can you tell me which components require usernames to have the same UID on all nodes, and why? It's interesting.

Master Guru

Honestly, if I knew I would have mentioned them :-). I set up a cluster with a simple shell script

ssh-all.sh:

for i in server1 server2 server3; do ssh "$i" "$1"; done

and created users manually on a small cluster before (we only had ~10 users, so it didn't seem worth it to set up LDAP). I never bothered about UIDs and never ran into problems, but we only used standard components such as Oozie and Hive. Other people have told me that some components don't take this well.

Honestly, I am not sure which ones those could be. I am sure that a NameNode HA setup with NFS does not work with mismatched UIDs, because NFS depends on the same UID, but I have trouble thinking of another component that would need the same UIDs in a Hadoop environment. HDFS does not care about UIDs. It cares about usernames.

Explorer

I had the same problem, and I also solved it by adding the user to all controller nodes.

I ran a script to add them from one node to all the others:

______________________________________________________________________________________________

#!/bin/bash
# Run the command given as $1 on every host listed in $SERVERS and collect the output.
# Requires ssh key based login from this Linux/UNIX box to each host.
SERVERS=/root/hadoop_hosts
# SSH user name
USR="root"
# Email settings (only used if the mailx line at the bottom is uncommented)
SUBJECT="Server user login report"
EMAIL="your_e-mail@here"
EMAILMESSAGE="/tmp/sshpool_$(date +%Y%m%d-%H:%M).txt"
# Create (or truncate) the report file
: > "$EMAILMESSAGE"
# Connect to each host, run the command, and append its output to the report
for host in $(cat "$SERVERS")
do
    {
        echo "--------------------------------"
        echo "* HOST: $host"
        echo "--------------------------------"
        ssh -tq -o "BatchMode yes" "$USR@$host" "$1"
    } >> "$EMAILMESSAGE"
done
# Optionally send the report by email
# mailx -s "$SUBJECT" "$EMAIL" < "$EMAILMESSAGE"
echo ">>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<"
echo ">>> check the output file $EMAILMESSAGE"

_________________________________________________________________________________________

Put the DNS names of the servers into /root/hadoop_hosts.

Also, on Linux there is a good command called pssh for running commands in parallel across the machines of a cluster.
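
A short sketch of the pssh route, reusing the same /root/hadoop_hosts file. The wrapper names `find_pssh` and `run_everywhere` and the `HOSTS_FILE` variable are hypothetical. The lookup of two binary names is there because some distributions install the tool as parallel-ssh. The options used are `-h` (hosts file), `-l` (remote user) and `-i` (print each host's output inline).

```shell
#!/bin/bash
# Sketch: run one command on all hosts in parallel with pssh.

# Print the name of the installed binary, or return 1 if none is found.
find_pssh() {
    local name
    for name in pssh parallel-ssh; do
        if command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# Usage: run_everywhere COMMAND [ARGS...]
# Runs the command as root on every host listed in $HOSTS_FILE.
run_everywhere() {
    local bin
    bin=$(find_pssh) || { echo "pssh is not installed" >&2; return 127; }
    "$bin" -h "${HOSTS_FILE:-/root/hadoop_hosts}" -l root -i "$@"
}
```

For example, `run_everywhere id mgrabowski` would show on which nodes the account and its groups exist.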

🙂