MapReduce job failing after kerberos

Expert Contributor

I enabled Kerberos on an HDP 2.3.2 cluster using Ambari 2.1.2.1 and then tried to run a MapReduce job on the edge node as a local user, but the job failed:

Error Message:

Diagnostics: Application application_1456454501315_0001 initialization failed (exitCode=255) with output: main : command provided 0

main : run as user is xxxxx

main : requested yarn user is xxxxx

User xxxxx not found

Failing this attempt. Failing the application.

16/02/25 18:42:28 INFO mapreduce.Job: Counters: 0

Job Finished in 7.915 seconds

My understanding is that the edge node's local user is not needed anywhere else, so I am not sure why my MapReduce job fails because the user does not exist on the other nodes. Please help.

Example MapReduce job:

XXXXX:~# yarn jar /usr/hdp/2.3.2.0-2950/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.1.2.3.2.0-2950.jar pi 16 100000

1 ACCEPTED SOLUTION

Expert Contributor

Jobs are running fine after I added the user to the hadoop group on all the nodes, but I am not sure that adding the user account to the hadoop group is a good idea.


20 REPLIES

Rising Star

@rbalam,

First check the Java version and the Java path environment variable. If they do not match, create a soft link so that /usr/bin/java points to $JAVA_HOME/bin/java.

# java -version

Cross-check steps for reference:

[root@sandbox ~]# which java

/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java

[root@sandbox ~]# ls -l /usr/bin/java

lrwxrwxrwx 1 root root 22 2014-12-16 18:25 /usr/bin/java -> /etc/alternatives/java

[root@sandbox ~]# ls -l /etc/alternatives/java

lrwxrwxrwx 1 root root 46 2014-12-16 18:25 /etc/alternatives/java -> /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java

Check that the Java home path is set properly:

vi /etc/hadoop/conf/hadoop-env.sh

Run the simple pi MapReduce job:

# yarn jar /usr/hdp/2.3.2.0-2950/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 1 1
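
The Java comparison in this reply can be scripted. This is a minimal sketch, assuming a Linux host where `readlink -f` is available; the function name `check_java_link` is made up for illustration, and its argument is whatever JAVA_HOME value is set in hadoop-env.sh.

```shell
# Hypothetical helper: compare the java found on PATH with $1/bin/java.
# Prints "match", "mismatch", or "unknown" (nothing to compare).
check_java_link() {
    java_home="$1"
    path_java=$(command -v java 2>/dev/null) || { echo "unknown"; return 0; }
    [ -x "$java_home/bin/java" ] || { echo "unknown"; return 0; }
    # readlink -f follows every hop, e.g.
    # /usr/bin/java -> /etc/alternatives/java -> the real JVM binary
    if [ "$(readlink -f "$path_java")" = "$(readlink -f "$java_home/bin/java")" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}
```

Calling `check_java_link "$JAVA_HOME"` on each node should print "match" when both paths end at the same binary; "mismatch" suggests the soft link needs attention.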

Rising Star

@rbalam, if the above solution does not work, then check that yarn.application.classpath is set properly in yarn-site.xml (the required lib directories should exist).
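
One way to run that check is sketched below. It assumes yarn-site.xml uses the common layout with `<name>` and `<value>` each on a single line; both function names are hypothetical, and the file location (for example /etc/hadoop/conf/yarn-site.xml) is an assumption based on the conf directory mentioned earlier in this thread.

```shell
# Hypothetical helper: print the <value> of one property from a Hadoop *-site.xml.
# Assumes <name> and <value> are each written on a single line.
get_hadoop_property() {
    awk -v n="<name>$2</name>" '
        index($0, n) { found = 1 }
        found && /<value>/ {
            sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
            print; exit
        }' "$1"
}

# Report whether each absolute directory in yarn.application.classpath exists.
# Entries that contain variables (e.g. $HADOOP_CONF_DIR) are only listed.
check_classpath_dirs() {
    get_hadoop_property "$1" yarn.application.classpath | tr ',' '\n' |
    while IFS= read -r entry; do
        dir=${entry%/\*}                  # drop a trailing /* wildcard
        case "$dir" in
            *'$'*) echo "skipped $entry" ;;
            /*)    if [ -d "$dir" ]; then echo "ok $dir"; else echo "missing $dir"; fi ;;
            *)     echo "skipped $entry" ;;
        esac
    done
}
```

Running `check_classpath_dirs /etc/hadoop/conf/yarn-site.xml` on a worker node lists one line per classpath entry; any "missing" line points at a directory that should exist but does not.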

Rising Star

@rbalam, the third step should be to check that the user has permission to read the classpath directories and the HDFS folder.
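
A minimal sketch of the local half of that permission check, meant to be run as the job's user on a worker node. The function name `check_readable` is hypothetical, and the paths you pass are whatever classpath directories your cluster uses.

```shell
# Hypothetical helper, meant to be run as the job's user on a worker node:
# report whether each path can be read (and, for directories, entered).
check_readable() {
    for p in "$@"; do
        if [ -r "$p" ] && { [ ! -d "$p" ] || [ -x "$p" ]; }; then
            echo "readable $p"
        else
            echo "unreadable $p"
        fi
    done
}
```

For the HDFS side, listing the user's home directory with `hdfs dfs -ls` as that same user (after obtaining a Kerberos ticket) is a comparable quick test.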

Contributor

Hello

Please, how did you add the users?

I am using Active Directory users, and I just added them to the edge node using Samba + Kerberos.

Now that I have enabled Kerberos on the Hortonworks Hadoop cluster, I get the same issue as yours.

So should I add the same user to all nodes? With adduser? To which group? How could this be resolved for an AD user?

Thanks

avatar

@asmarz,

One of our members posted a reply on how to add users in the thread you posted a similar question to later the same day.

As this is an older thread that was previously marked 'Solved', you would have a better chance of receiving a resolution by starting a new thread. A new thread would also let you provide details specific to your environment, including what you did to try to add the relevant user accounts, which could help others give a more relevant and accurate answer.

Bill Brooks, Community Moderator

avatar

I faced the same issue in a Kerberos environment. It was resolved after I created the user on all the nodes.
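
With Kerberos enabled, YARN typically launches containers through the LinuxContainerExecutor, which runs each container as the submitting user's OS account, so that account has to resolve on every NodeManager host. A minimal sketch for checking this on one host follows; the function names are hypothetical.

```shell
# Hypothetical helper: succeed if the account resolves on this host.
# id consults the host's name service (local files, LDAP, SSSD, ...),
# not just /etc/passwd.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Print a one-line summary for the account, including its groups.
report_user() {
    if user_exists "$1"; then
        echo "$1: uid=$(id -u "$1") groups=$(id -Gn "$1")"
    else
        echo "$1: not found"
    fi
}
```

Running `report_user xxxxx` on each worker node shows which hosts would fail with "User xxxxx not found".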

Contributor

@Neeraj Sabharwal

Hi,

I encountered a similar error when running Spark: the user cannot be found. Please help me, thank you!

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-client \
--executor-memory 1G \
--num-executors 1 \
--num-executors 2 \
--driver-memory 1g \
--executor-cores 1 \
--principal kadmin/admin@NGAA.COM \
--keytab   /home/test/sparktest/princpal/sparkjob.keytab \
/opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar 12   

error messages:

17/02/10 13:54:16 INFO security.UserGroupInformation: Login successful for user kadmin/admin@NGAA.COM using keytab file /home/test/sparktest/princpal/sparkjob.keytab
17/02/10 13:54:16 INFO spark.SparkContext: Running Spark version 1.6.0
17/02/10 13:54:16 INFO spark.SecurityManager: Changing view acls to: root,kadmin
17/02/10 13:54:16 INFO spark.SecurityManager: Changing modify acls to: root,kadmin
17/02/10 13:54:16 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, kadmin); users with modify permissions: Set(root, kadmin)
17/02/10 13:54:17 INFO util.Utils: Successfully started service 'sparkDriver' on port 56214.
17/02/10 13:54:17 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/02/10 13:54:17 INFO Remoting: Starting remoting
17/02/10 13:54:18 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.10.100.51:40936]
17/02/10 13:54:18 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@10.10.100.51:40936]
17/02/10 13:54:18 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 40936.
17/02/10 13:54:18 INFO spark.SparkEnv: Registering MapOutputTracker
17/02/10 13:54:18 INFO spark.SparkEnv: Registering BlockManagerMaster
17/02/10 13:54:18 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-cf37cdde-4eab-4804-b84b-b5f937828aa7
17/02/10 13:54:18 INFO storage.MemoryStore: MemoryStore started with capacity 530.3 MB
17/02/10 13:54:18 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/02/10 13:54:19 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/02/10 13:54:19 INFO ui.SparkUI: Started SparkUI at http://10.10.100.51:4040
17/02/10 13:54:19 INFO spark.SparkContext: Added JAR file:/opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar at spark://10.10.100.51:56214/jars/spark-examples.jar with timestamp 1486706059601
17/02/10 13:54:19 INFO yarn.Client: Attempting to login to the Kerberos using principal: kadmin/admin@NGAA.COM and keytab: /home/test/sparktest/princpal/sparkjob.keytab
17/02/10 13:54:19 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/10.10.100.51:8032
17/02/10 13:54:20 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
17/02/10 13:54:20 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
17/02/10 13:54:20 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
17/02/10 13:54:20 INFO yarn.Client: Setting up container launch context for our AM
17/02/10 13:54:20 INFO yarn.Client: Setting up the launch environment for our AM container
17/02/10 13:54:21 INFO yarn.Client: Credentials file set to: credentials-79afe260-414b-4df7-8242-3cd1a279dbc7
17/02/10 13:54:21 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002
17/02/10 13:54:21 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 44 for kadmin on 10.10.100.52:8020
17/02/10 13:54:21 INFO yarn.Client: Renewal Interval set to 86400061
17/02/10 13:54:21 INFO yarn.Client: Preparing resources for our AM container
17/02/10 13:54:21 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002
17/02/10 13:54:21 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 45 for kadmin on 10.10.100.52:8020
17/02/10 13:54:22 INFO hive.metastore: Trying to connect to metastore with URI thrift://hadoop1:9083
17/02/10 13:54:22 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/02/10 13:54:22 INFO hive.metastore: Connected to metastore.
17/02/10 13:54:22 INFO hive.metastore: Closed a connection to metastore, current connections: 0
17/02/10 13:54:23 INFO yarn.Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN Secure Distributed Cache.
17/02/10 13:54:23 INFO yarn.Client: Uploading resource file:/home/test/sparktest/princpal/sparkjob.keytab -> hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002/sparkjob.keytab
17/02/10 13:54:23 INFO yarn.Client: Uploading resource file:/tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128/__spark_conf__4615276915023723512.zip -> hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002/__spark_conf__4615276915023723512.zip
17/02/10 13:54:23 INFO spark.SecurityManager: Changing view acls to: root,kadmin
17/02/10 13:54:23 INFO spark.SecurityManager: Changing modify acls to: root,kadmin
17/02/10 13:54:23 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, kadmin); users with modify permissions: Set(root, kadmin)
17/02/10 13:54:23 INFO yarn.Client: Submitting application 2 to ResourceManager
17/02/10 13:54:23 INFO impl.YarnClientImpl: Submitted application application_1486705141135_0002
17/02/10 13:54:24 INFO yarn.Client: Application report for application_1486705141135_0002 (state: FAILED)
17/02/10 13:54:24 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: Application application_1486705141135_0002 failed 2 times due to AM Container for appattempt_1486705141135_0002_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://hadoop1:8088/proxy/application_1486705141135_0002/Then, click on links to logs of each attempt.
Diagnostics: Application application_1486705141135_0002 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is kadmin
main : requested yarn user is kadmin
User kadmin not found

Failing this attempt. Failing the application.
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: root.users.kadmin
	 start time: 1486706063635
	 final status: FAILED
	 tracking URL: http://hadoop1:8088/cluster/app/application_1486705141135_0002
	 user: kadmin
17/02/10 13:54:24 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1486705141135_0002
17/02/10 13:54:24 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO ui.SparkUI: Stopped Spark web UI at http://10.10.100.51:4040
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Stopped
17/02/10 13:54:25 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/02/10 13:54:25 ERROR util.Utils: Uncaught exception in thread main
java.lang.NullPointerException
	at org.apache.spark.network.shuffle.ExternalShuffleClient.close(ExternalShuffleClient.java:152)
	at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1231)
	at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96)
	at org.apache.spark.SparkContext$$anonfun$stop$12.apply$mcV$sp(SparkContext.scala:1767)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1230)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1766)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:613)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO storage.DiskBlockManager: Shutdown hook called
17/02/10 13:54:25 INFO util.ShutdownHookManager: Shutdown hook called
17/02/10 13:54:25 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128/userFiles-58912a50-d060-42ec-8665-7a74c1be9a7b
17/02/10 13:54:25 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128

Thanks

Contributor

This problem has two causes:

(1) The Linux user ### was not created on every node and added to the yarn user group.

(2) The NodeManager container directory permissions are inconsistent, because the machines are not partitioned uniformly.

To solve it, execute the following on each machine:

useradd -M ###

usermod -a -G supergroup ###

Finally, check that the NodeManager (nm) directory permissions are the same on every node.
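
The per-machine commands in this reply can be previewed for a list of hosts before anything is run. This is only a sketch: the function name `plan_user_setup` and the host names in the usage note are hypothetical, and it assumes root SSH access to each node.

```shell
# Hypothetical helper: print (not run) the account setup commands for each host.
# Usage: plan_user_setup USER GROUP HOST...
plan_user_setup() {
    user="$1"; group="$2"; shift 2
    for host in "$@"; do
        # -M skips home directory creation; -a -G appends a supplementary group
        echo "ssh $host useradd -M $user"
        echo "ssh $host usermod -a -G $group $user"
    done
}
```

For example, `plan_user_setup kadmin supergroup node1 node2` prints four commands. Review them first; piping the output to `sh` then executes them.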

Contributor

@rbalam Please refer to my approach.