
MapReduce job failing after enabling Kerberos

avatar
Contributor

I enabled Kerberos on an HDP 2.3.2 cluster using Ambari 2.1.2.1 and then tried to run a MapReduce job on the edge node as a local user, but the job failed:

Error Message:

Diagnostics: Application application_1456454501315_0001 initialization failed (exitCode=255) with output: main : command provided 0

main : run as user is xxxxx

main : requested yarn user is xxxxx

User xxxxx not found
Failing this attempt. Failing the application.
16/02/25 18:42:28 INFO mapreduce.Job: Counters: 0
Job Finished in 7.915 seconds

My understanding is that the edge node's local user is not needed anywhere else, but I am not sure why my MapReduce job is failing because the user does not exist on the other nodes. Please help.

Example MapReduce job:

XXXXX:~#yarn jar /usr/hdp/2.3.2.0-2950/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.1.2.3.2.0-2950.jar pi 16 100000
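For context, a minimal pre-flight check on the edge node before submitting might look like the following (a sketch only; "xxxxx" is the redacted user placeholder from the output above, and MIT Kerberos client tools are assumed):

# confirm a Kerberos ticket was obtained for the submitting user
klist

# confirm the OS account resolves on this host
id xxxxx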

1 ACCEPTED SOLUTION

avatar
Contributor

Jobs are running fine after I added the user to the hadoop group on all the nodes, but I am not sure whether adding the user account to the hadoop group is a good idea.
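A minimal sketch of that fix, assuming hand-managed local OS accounts (the user name is a placeholder and the right group membership may differ per site):

# run on every node in the cluster
useradd xxxxx                 # create the local account if it does not exist
usermod -a -G hadoop xxxxx    # add it to the hadoop group

# verify the account and its groups now resolve
id xxxxx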


20 REPLIES

avatar
Expert Contributor

My understanding is that in a Kerberos-enabled cluster, the user/principal is required to be present on all the nodes.

Refer to this: https://community.hortonworks.com/questions/15160/adding-a-new-user-to-the-cluster.html
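A hedged way to find the nodes that are missing the account (host names are placeholders; this assumes passwordless SSH from the edge node):

# print which nodes cannot resolve the user; those nodes will produce the
# "User xxxxx not found" container-localization failure
for host in node1 node2 node3; do
  echo "== $host =="
  ssh "$host" 'id xxxxx' || echo "user not found on $host"
done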

avatar
Contributor

@rbalam, when your Hadoop cluster is integrated with Kerberos security, the authenticated user must exist on every node where the task runs. Refer to the link already shared by "rahul pathak".

avatar
Contributor

Could you please confirm this again? If I need to have users on all the nodes in the cluster to run jobs successfully, I could end up with quite a few users on all the nodes, which may become a maintenance headache down the line.

avatar
Contributor

@rbalam, yes I am sure; it will get resolved after adding the user on the other nodes as well. First resolve this issue, then we can think about the other problems.

To start, you can try with a user that already exists on every node and check the output of the MapReduce job.

To reduce the maintenance headache, use a centralized LDAP/directory service along with the Kerberos server for user management.
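For example, once SSSD (or a similar name-service integration) points at the central LDAP/AD, a quick hedged check on any node that the directory user is visible to the OS:

# should print the passwd entry served by LDAP/AD rather than /etc/passwd
getent passwd xxxxx

# shows the uid/gid and group memberships the NodeManager will see
id xxxxx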

avatar
@rbalam

It may not be related to Kerberos. Please share the output of:

yarn logs -applicationId application_1456454501315_0001

avatar
Contributor

@Neeraj Sabharwal

I ran the job again and tried to get the YARN logs. Here is what I see:

xxxxx:~#yarn logs -applicationId application_1456457210711_0002
16/02/26 03:44:26 INFO impl.TimelineClientImpl: Timeline service address: http://yarntimelineserveraddress:8188/ws/v1/timeline/
/app-logs/xxxxx/logs/application_1456457210711_0002 does not have any log files.

Here is what I see on the ResourceManager UI

Application application_1456457210711_0002 failed 2 times due to AM Container for appattempt_1456457210711_0002_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://resourcemanageruri:8088/cluster/app/application_1456457210711_0002 Then, click on links to logs of each attempt.

Diagnostics: Application application_1456457210711_0002 initialization failed (exitCode=255) with output: main : command provided 0

main : run as user is xxxxx

main : requested yarn user is xxxxx

User xxxxx not found
Failing this attempt.

Failing the application.

avatar
Contributor
@Vikas Gadade

I created the user on all the nodes but the job is still failing with the following output

xxxxx:/#yarn jar /usr/hdp/2.3.2.0-2950/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 10000 /user/xxxxx/teraout8
16/02/26 10:52:18 INFO impl.TimelineClientImpl: Timeline service address: http://timelineuri:8188/ws/v1/timeline/
16/02/26 10:52:18 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 37 for rbalam on ha-hdfs:testnnhasvc
16/02/26 10:52:19 INFO security.TokenCache: Got dt for hdfs://testnnhasvc; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:testnnhasvc, Ident: (HDFS_DELEGATION_TOKEN token 37 for rbalam)
16/02/26 10:52:19 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
16/02/26 10:52:20 INFO terasort.TeraSort: Generating 10000 using 2
16/02/26 10:52:21 INFO mapreduce.JobSubmitter: number of splits:2
16/02/26 10:52:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456512672399_0001
16/02/26 10:52:22 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:testnnhasvc, Ident: (HDFS_DELEGATION_TOKEN token 37 for rbalam)
16/02/26 10:52:24 INFO impl.YarnClientImpl: Submitted application application_1456512672399_0001
16/02/26 10:52:24 INFO mapreduce.Job: The url to track the job: http://timelineuri:8188/ws/v1/timeline/
16/02/26 10:52:24 INFO mapreduce.Job: Running job: job_1456512672399_0001
16/02/26 10:52:29 INFO mapreduce.Job: Job job_1456512672399_0001 running in uber mode : false
16/02/26 10:52:29 INFO mapreduce.Job: map 0% reduce 0%
16/02/26 10:52:29 INFO mapreduce.Job: Job job_1456512672399_0001 failed with state FAILED due to: Application application_1456512672399_0001 failed 2 times due to AM Container for appattempt_1456512672399_0001_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://timlineserveruri:8088/cluster/app/application_1456512672399_0001 Then, click on links to logs of each attempt.

Diagnostics: Application application_1456512672399_0001 initialization failed (exitCode=255) with output: main : command provided 0

main : run as user is xxxxx

main : requested yarn user is xxxxx

Failing this attempt. Failing the application.
16/02/26 10:52:29 INFO mapreduce.Job: Counters: 0

avatar
Contributor

@rbalam, your previous problem ("User xxxxx not found. Failing this attempt") is resolved. Here the containers are not launching, but the log should show a reason why, so you have to debug the YARN logs. Usually this problem occurs when you have different Java versions, the classpath is not set properly, or there is a directory permission issue.
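A rough triage sketch along those lines, run on a NodeManager host (the config path assumes the usual /etc/hadoop/conf layout; local-dir paths vary per cluster):

# 1. Java actually on the PATH vs. the JAVA_HOME that hadoop-env.sh points at
java -version
grep JAVA_HOME /etc/hadoop/conf/hadoop-env.sh

# 2. the classpath the Hadoop scripts resolve on this node
hadoop classpath

# 3. where the NodeManager localizes containers (then check those dirs' permissions)
grep -A 1 yarn.nodemanager.local-dirs /etc/hadoop/conf/yarn-site.xml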

avatar
Contributor

Yes, the user-not-found issue is gone after I created the user on all the nodes. Do you know where I can look to find which classpath/jars have a permission issue?

avatar
Contributor

@rbalam,

First check the Java version and the Java path environment variable; if they are not the same, then create a soft link from $JAVA_HOME/bin/java to /usr/bin/java.

#java -version

Cross-check steps for reference:

[root@sandbox ~]# which java

/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java

[root@sandbox ~]# ls -l /usr/bin/java

lrwxrwxrwx 1 root root 22 2014-12-16 18:25 /usr/bin/java -> /etc/alternatives/java

[root@sandbox ~]# ls -l /etc/alternatives/java

lrwxrwxrwx 1 root root 46 2014-12-16 18:25 /etc/alternatives/java -> /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java

Check that the Java home path is set properly:

vi /etc/hadoop/conf/hadoop-env.sh

Run the simple pi MapReduce job:

#yarn jar /usr/hdp/2.3.2.0-2950/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 1 1
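If /usr/bin/java ends up pointing at a different JDK than JAVA_HOME, a hedged way to apply the soft-link fix mentioned above (adjust JAVA_HOME to whatever hadoop-env.sh uses):

# repoint /usr/bin/java at the JDK Hadoop is configured with
ln -sf "$JAVA_HOME/bin/java" /usr/bin/java

# confirm the link chain and the version
ls -l /usr/bin/java
java -version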

avatar
Contributor

@rbalam, if the above solution does not work, then check that yarn.application.classpath is set properly in yarn-site.xml (the required lib directories should exist).
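One hedged way to inspect that setting on a node (the path assumes the standard HDP client config location):

# show the configured yarn.application.classpath value
grep -A 3 'yarn.application.classpath' /etc/hadoop/conf/yarn-site.xml
# then verify that every directory listed in the value exists on this node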

avatar
Contributor

@rbalam, the third step should be to check that the user has permission to read the classpath directories and the HDFS folders.
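Hedged examples of those checks (the local-dir paths below are common HDP defaults; substitute whatever yarn.nodemanager.local-dirs is set to, and the actual user name):

# local filesystem: the YARN local and log dirs must be accessible
ls -ld /hadoop/yarn/local /hadoop/yarn/log

# HDFS: the user needs a home directory it can read and write
hdfs dfs -ls /user/xxxxx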

avatar
Explorer

Hello,

How did you add the users?

I am using Active Directory users, and I just added them on the edge node using Samba + Kerberos.

Now that I have enabled Kerberos on the Hortonworks Hadoop cluster, I am getting the same issue as yours.

So should I add the same user to all nodes? With adduser? Into which group? How can this be resolved for an AD user?

Thanks

avatar

@asmarz,

One of our members posted a reply on how to add users in the thread where you posted a similar question later the same day.

As this is an older thread which was previously marked 'Solved', you would have a better chance of receiving a resolution by starting a new thread. This will also give you the opportunity to provide details specific to your environment about what you did when attempting to add the relevant user accounts, which could help others give a more relevant, accurate answer to your question.

Bill Brooks, Community Moderator

avatar
New Contributor

I faced the same issue in a Kerberos environment. It got resolved after I created the user on all the nodes.

avatar
Contributor

Jobs are running fine after I added the user to the hadoop group on all the nodes, but I am not sure whether adding the user account to the hadoop group is a good idea.

avatar
Rising Star

@Neeraj Sabharwal

Hi,

I encountered a similar error: when running Spark, the user cannot be found. Please help me, thank you!

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-client \
--executor-memory 1G \
--num-executors 1 \
--num-executors 2 \
--driver-memory 1g \
--executor-cores 1 \
--principal kadmin/admin@NGAA.COM \
--keytab   /home/test/sparktest/princpal/sparkjob.keytab \
/opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar 12   

Error messages:

17/02/10 13:54:16 INFO security.UserGroupInformation: Login successful for user kadmin/admin@NGAA.COM using keytab file /home/test/sparktest/princpal/sparkjob.keytab
17/02/10 13:54:16 INFO spark.SparkContext: Running Spark version 1.6.0
17/02/10 13:54:16 INFO spark.SecurityManager: Changing view acls to: root,kadmin
17/02/10 13:54:16 INFO spark.SecurityManager: Changing modify acls to: root,kadmin
17/02/10 13:54:16 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, kadmin); users with modify permissions: Set(root, kadmin)
17/02/10 13:54:17 INFO util.Utils: Successfully started service 'sparkDriver' on port 56214.
17/02/10 13:54:17 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/02/10 13:54:17 INFO Remoting: Starting remoting
17/02/10 13:54:18 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.10.100.51:40936]
17/02/10 13:54:18 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@10.10.100.51:40936]
17/02/10 13:54:18 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 40936.
17/02/10 13:54:18 INFO spark.SparkEnv: Registering MapOutputTracker
17/02/10 13:54:18 INFO spark.SparkEnv: Registering BlockManagerMaster
17/02/10 13:54:18 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-cf37cdde-4eab-4804-b84b-b5f937828aa7
17/02/10 13:54:18 INFO storage.MemoryStore: MemoryStore started with capacity 530.3 MB
17/02/10 13:54:18 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/02/10 13:54:19 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/02/10 13:54:19 INFO ui.SparkUI: Started SparkUI at http://10.10.100.51:4040
17/02/10 13:54:19 INFO spark.SparkContext: Added JAR file:/opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar at spark://10.10.100.51:56214/jars/spark-examples.jar with timestamp 1486706059601
17/02/10 13:54:19 INFO yarn.Client: Attempting to login to the Kerberos using principal: kadmin/admin@NGAA.COM and keytab: /home/test/sparktest/princpal/sparkjob.keytab
17/02/10 13:54:19 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/10.10.100.51:8032
17/02/10 13:54:20 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
17/02/10 13:54:20 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
17/02/10 13:54:20 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
17/02/10 13:54:20 INFO yarn.Client: Setting up container launch context for our AM
17/02/10 13:54:20 INFO yarn.Client: Setting up the launch environment for our AM container
17/02/10 13:54:21 INFO yarn.Client: Credentials file set to: credentials-79afe260-414b-4df7-8242-3cd1a279dbc7
17/02/10 13:54:21 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002
17/02/10 13:54:21 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 44 for kadmin on 10.10.100.52:8020
17/02/10 13:54:21 INFO yarn.Client: Renewal Interval set to 86400061
17/02/10 13:54:21 INFO yarn.Client: Preparing resources for our AM container
17/02/10 13:54:21 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002
17/02/10 13:54:21 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 45 for kadmin on 10.10.100.52:8020
17/02/10 13:54:22 INFO hive.metastore: Trying to connect to metastore with URI thrift://hadoop1:9083
17/02/10 13:54:22 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/02/10 13:54:22 INFO hive.metastore: Connected to metastore.
17/02/10 13:54:22 INFO hive.metastore: Closed a connection to metastore, current connections: 0
17/02/10 13:54:23 INFO yarn.Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN Secure Distributed Cache.
17/02/10 13:54:23 INFO yarn.Client: Uploading resource file:/home/test/sparktest/princpal/sparkjob.keytab -> hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002/sparkjob.keytab
17/02/10 13:54:23 INFO yarn.Client: Uploading resource file:/tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128/__spark_conf__4615276915023723512.zip -> hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002/__spark_conf__4615276915023723512.zip
17/02/10 13:54:23 INFO spark.SecurityManager: Changing view acls to: root,kadmin
17/02/10 13:54:23 INFO spark.SecurityManager: Changing modify acls to: root,kadmin
17/02/10 13:54:23 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, kadmin); users with modify permissions: Set(root, kadmin)
17/02/10 13:54:23 INFO yarn.Client: Submitting application 2 to ResourceManager
17/02/10 13:54:23 INFO impl.YarnClientImpl: Submitted application application_1486705141135_0002
17/02/10 13:54:24 INFO yarn.Client: Application report for application_1486705141135_0002 (state: FAILED)
17/02/10 13:54:24 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: Application application_1486705141135_0002 failed 2 times due to AM Container for appattempt_1486705141135_0002_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://hadoop1:8088/proxy/application_1486705141135_0002/Then, click on links to logs of each attempt.
Diagnostics: Application application_1486705141135_0002 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is kadmin
main : requested yarn user is kadmin
User kadmin not found

Failing this attempt. Failing the application.
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: root.users.kadmin
	 start time: 1486706063635
	 final status: FAILED
	 tracking URL: http://hadoop1:8088/cluster/app/application_1486705141135_0002
	 user: kadmin
17/02/10 13:54:24 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1486705141135_0002
17/02/10 13:54:24 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO ui.SparkUI: Stopped Spark web UI at http://10.10.100.51:4040
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Stopped
17/02/10 13:54:25 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/02/10 13:54:25 ERROR util.Utils: Uncaught exception in thread main
java.lang.NullPointerException
	at org.apache.spark.network.shuffle.ExternalShuffleClient.close(ExternalShuffleClient.java:152)
	at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1231)
	at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96)
	at org.apache.spark.SparkContext$$anonfun$stop$12.apply$mcV$sp(SparkContext.scala:1767)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1230)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1766)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:613)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO storage.DiskBlockManager: Shutdown hook called
17/02/10 13:54:25 INFO util.ShutdownHookManager: Shutdown hook called
17/02/10 13:54:25 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128/userFiles-58912a50-d060-42ec-8665-7a74c1be9a7b
17/02/10 13:54:25 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128

Thanks

avatar
Rising Star

This problem is caused by two things: (1) the ### Linux user was not created on each node and added to the yarn user group; (2) the NodeManager container directory permissions are not consistent, because the machine partitioning is not uniform. Solve it as follows, executing on each machine:

useradd -M ###
usermod -a -G supergroup ###

Finally, check that the NodeManager (NM) directory permissions are the same on every node.
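Spelled out as a sketch (### stands for the user name exactly as in the post above; the local-dir path is a placeholder for whatever yarn.nodemanager.local-dirs is actually set to):

# run on every node
useradd -M ###                  # create the account without a home directory
usermod -a -G supergroup ###    # add it to the supergroup group

# then compare ownership/permissions of the NodeManager local dirs across nodes
ls -ld /yarn/nm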

avatar
Rising Star

@rbalam Please refer to my approach.

avatar
Explorer

Hi @yang jifei, were you able to solve your issue? I am having a similar problem as well. Could you please help me?
