Created 02-26-2016 02:51 AM
I enabled Kerberos on an HDP 2.3.2 cluster using Ambari 2.1.2.1 and then tried to run a MapReduce job on the edge node as a local user, but the job failed:
Error Message:
Diagnostics: Application application_1456454501315_0001 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is xxxxx
main : requested yarn user is xxxxx
User xxxxx not found
Failing this attempt. Failing the application.
16/02/25 18:42:28 INFO mapreduce.Job: Counters: 0
Job Finished in 7.915 seconds
My understanding is that the edge node's local user is not needed anywhere else, so I am not sure why my MapReduce job fails because the user does not exist on the other nodes. Please help.
Example MapReduce job:
XXXXX:~#yarn jar /usr/hdp/2.3.2.0-2950/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.1.2.3.2.0-2950.jar pi 16 100000
Created 02-26-2016 11:19 PM
Jobs are running fine after I added the user to the hadoop group on all the nodes, but I am not sure that adding the user account to the hadoop group is a good idea.
Created 02-26-2016 04:34 AM
My understanding is that on a Kerberos-enabled cluster, the user behind each principal must be present on all the nodes.
Refer to this thread: https://community.hortonworks.com/questions/15160/adding-a-new-user-to-the-cluster.html
Created 02-26-2016 07:04 AM
@rbalam When your Hadoop cluster is integrated with Kerberos security, the authenticated user must exist on every node where a task runs. Refer to the link already shared by "rahul pathak".
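A quick way to confirm this on a given cluster is to check whether the account resolves on every worker node. Here is a minimal sketch, assuming passwordless ssh to the nodes and a host list file named nodes.txt (both are my assumptions, not something from this thread):

```shell
#!/usr/bin/env bash
# Sketch: report the nodes on which a user account does not resolve.
# Assumptions (not from the thread): passwordless ssh to each node and a
# file with one hostname per line.

# Returns 0 if the account resolves on the local machine, whatever the
# source is (local /etc/passwd, LDAP, SSSD, ...).
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Prints every host from the list file on which the user is missing.
report_missing_user() {
    local user="$1" hosts_file="$2" host
    while read -r host; do
        [ -z "$host" ] && continue
        # -n keeps ssh from consuming the rest of the host list on stdin
        if ! ssh -n "$host" id -u "$user" >/dev/null 2>&1; then
            echo "$host: user $user not found"
        fi
    done < "$hosts_file"
}

# Example (nodes.txt is a hypothetical file name):
#   report_missing_user xxxxx nodes.txt
```

If the second function prints nothing, the account is visible on every node in the list, and the "User xxxxx not found" message should not come from a missing account.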
Created 02-26-2016 11:58 AM
Could you please confirm this again? If I need to have users on all the nodes in the cluster to run jobs successfully, I could end up with quite a few users on all the nodes, which may become a maintenance headache down the line.
Created 02-26-2016 06:33 PM
@rbalam, yes, I am sure; it will be resolved after you add the user on the other nodes as well. Resolve this issue first, then we can think about the other problems.
To start, try with one user that exists on every node and check the output of the MapReduce job.
To reduce the maintenance headache, use a centralized LDAP/directory service along with the Kerberos server for user management.
Created 02-26-2016 07:41 AM
Created 02-26-2016 11:46 AM
I ran the job again and tried to get the YARN logs. Here is what I see:
xxxxx:~#yarn logs -applicationId application_1456457210711_0002
16/02/26 03:44:26 INFO impl.TimelineClientImpl: Timeline service address: http://yarntimelineserveraddress:8188/ws/v1/timeline/
/app-logs/xxxxx/logs/application_1456457210711_0002 does not have any log files.
Here is what I see on the ResourceManager UI:
Application application_1456457210711_0002 failed 2 times due to AM Container for appattempt_1456457210711_0002_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://resourcemanageruri:8088/cluster/app/application_1456457210711_0002 Then, click on links to logs of each attempt.
Diagnostics: Application application_1456457210711_0002 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is xxxxx
main : requested yarn user is xxxxx
User xxxxx not found Failing this attempt.
Failing the application.
Created 02-26-2016 07:06 PM
I created the user on all the nodes, but the job is still failing with the following output:
xxxxx:/#yarn jar /usr/hdp/2.3.2.0-2950/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 10000 /user/xxxxx/teraout8
16/02/26 10:52:18 INFO impl.TimelineClientImpl: Timeline service address: http://timelineuri:8188/ws/v1/timeline/
16/02/26 10:52:18 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 37 for rbalam on ha-hdfs:testnnhasvc
16/02/26 10:52:19 INFO security.TokenCache: Got dt for hdfs://testnnhasvc; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:testnnhasvc, Ident: (HDFS_DELEGATION_TOKEN token 37 for rbalam)
16/02/26 10:52:19 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
16/02/26 10:52:20 INFO terasort.TeraSort: Generating 10000 using 2
16/02/26 10:52:21 INFO mapreduce.JobSubmitter: number of splits:2
16/02/26 10:52:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456512672399_0001
16/02/26 10:52:22 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:testnnhasvc, Ident: (HDFS_DELEGATION_TOKEN token 37 for rbalam)
16/02/26 10:52:24 INFO impl.YarnClientImpl: Submitted application application_1456512672399_0001
16/02/26 10:52:24 INFO mapreduce.Job: The url to track the job: http://timelineuri:8188/ws/v1/timeline/
16/02/26 10:52:24 INFO mapreduce.Job: Running job: job_1456512672399_0001
16/02/26 10:52:29 INFO mapreduce.Job: Job job_1456512672399_0001 running in uber mode : false
16/02/26 10:52:29 INFO mapreduce.Job: map 0% reduce 0%
16/02/26 10:52:29 INFO mapreduce.Job: Job job_1456512672399_0001 failed with state FAILED due to: Application application_1456512672399_0001 failed 2 times due to AM Container for appattempt_1456512672399_0001_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://timlineserveruri:8088/cluster/app/application_1456512672399_0001 Then, click on links to logs of each attempt.
Diagnostics: Application application_1456512672399_0001 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is xxxxx
main : requested yarn user is xxxxx
Failing this attempt. Failing the application.
16/02/26 10:52:29 INFO mapreduce.Job: Counters: 0
Created 02-26-2016 07:45 PM
@rbalam, your previous problem ("User xxxxx not found Failing this attempt") is resolved. Now the containers are not launching, and the log should show the reason why, so you have to debug the YARN log. This problem usually occurs when the nodes have different Java versions, the classpath is not set properly, or directory permissions are wrong.
Created 02-26-2016 07:52 PM
Yes, the user-not-found issue is gone after I created the user on all the nodes. Do you know where I can look to find which classpath entries or jars have the permissions issue?
Created 02-26-2016 08:35 PM
First check the Java version and the Java path environment variable. If they are not the same, then create a soft link from $JAVA_HOME/bin/java to /usr/bin/java.
#java -version
Cross-check steps for reference:
[root@sandbox ~]# which java
/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java
[root@sandbox ~]# ls -l /usr/bin/java
lrwxrwxrwx 1 root root 22 2014-12-16 18:25 /usr/bin/java -> /etc/alternatives/java
[root@sandbox ~]# ls -l /etc/alternatives/java
lrwxrwxrwx 1 root root 46 2014-12-16 18:25 /etc/alternatives/java -> /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
Check that the Java home path is set properly:
vi /etc/hadoop/conf/hadoop-env.sh
Run the simple pi MapReduce job:
#yarn jar /usr/hdp/2.3.2.0-2950/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 1 1
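The cross-check above can also be scripted so it is easy to repeat on each node. This is only a sketch under my own assumptions (GNU readlink is available and JAVA_HOME is exported in the shell you run it from); the function names are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: check that the java found on PATH and $JAVA_HOME/bin/java end up
# at the same binary once all symlinks are followed.

# Follows the whole symlink chain (for example /usr/bin/java ->
# /etc/alternatives/java -> ...) and prints the final target.
resolve_path() {
    readlink -f "$1"
}

# Returns 0 when both paths resolve to the same non-empty target.
same_binary() {
    local a b
    a="$(resolve_path "$1")"
    b="$(resolve_path "$2")"
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example, run on each node:
#   same_binary "$(command -v java)" "$JAVA_HOME/bin/java" \
#       || echo "java mismatch on $(hostname)"
```

A mismatch does not prove that Java is the cause of the failure, but it tells you which nodes to look at first.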
Created 02-26-2016 09:03 PM
@rbalam, if the above solution does not work, then check that yarn.application.classpath is set properly in yarn-site.xml (the required lib directories should exist).
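One way to spot missing lib directories is to walk the classpath and test each entry. A minimal sketch, assuming the classpath is passed in as a colon-separated string such as the one printed by the yarn classpath command; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: list classpath entries that do not exist on this node.

# Takes a colon-separated classpath and prints each entry whose file or
# directory is missing. A trailing "/*" wildcard is stripped before the
# check, so "/some/lib/*" is tested as the directory "/some/lib".
missing_classpath_entries() {
    local classpath="$1" entry path
    local -a entries
    # read -a splits on ":" without expanding the "*" wildcards
    IFS=':' read -r -a entries <<< "$classpath"
    for entry in "${entries[@]}"; do
        [ -z "$entry" ] && continue
        path="${entry%/\*}"
        [ -e "$path" ] || echo "$entry"
    done
}

# Example, run on a NodeManager host:
#   missing_classpath_entries "$(yarn classpath)"
```

Entries that still contain unexpanded variables will be reported as missing, so treat the output as a list of things to inspect rather than a verdict.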
Created 02-26-2016 09:06 PM
@rbalam, the third step should be to check that the user has permission to read the classpath directories and the HDFS folder.
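For the local side of that check, something like the following could be run as the job user on a worker node. It is a sketch only; the function name is made up, and the directory in the example is simply the one used in the commands earlier in this thread:

```shell
#!/usr/bin/env bash
# Sketch: test whether the current user can list a local directory.

# Returns 0 if the path is a directory that the current user can both
# read (list entries) and traverse (enter).
can_list_dir() {
    [ -d "$1" ] && [ -r "$1" ] && [ -x "$1" ]
}

# Example, run as the job user:
#   can_list_dir /usr/hdp/2.3.2.0-2950/hadoop-mapreduce || echo "cannot read"
# For the HDFS folder, the equivalent check is a listing as the same user:
#   hdfs dfs -ls /user/xxxxx
```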
Created 02-04-2020 09:17 AM
Hello
Please, how did you add the users?
I am using Active Directory users, and so far I have only added them to the edge node using Samba + Kerberos.
Now I have enabled Kerberos on the Hortonworks Hadoop cluster, and I get the same issue as yours.
So should I add the same user to all nodes? With adduser? In which group? How can this be resolved for an AD user?
Thanks
Created 02-04-2020 04:02 PM
One of our members posted a reply on how to add users in the thread you posted a similar question to later the same day.
As this is an older thread which was previously marked 'Solved', you would have a better chance of receiving a resolution by starting a new thread. This will also provide the opportunity for you to provide details specific to your environment about what you did in an attempt to add the relevant user accounts that could aid others in providing a more relevant, accurate answer to your question.
Created 09-08-2016 11:42 AM
I faced the same issue in a Kerberos environment. It was resolved after I created the user on all the nodes.
Created 02-10-2017 06:24 AM
Hi,
I encountered a similar error when running Spark: the user cannot be found. Please help me, thank you!
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  --executor-memory 1G \
  --num-executors 1 \
  --num-executors 2 \
  --driver-memory 1g \
  --executor-cores 1 \
  --principal kadmin/admin@NGAA.COM \
  --keytab /home/test/sparktest/princpal/sparkjob.keytab \
  /opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar 12
error messages:
17/02/10 13:54:16 INFO security.UserGroupInformation: Login successful for user kadmin/admin@NGAA.COM using keytab file /home/test/sparktest/princpal/sparkjob.keytab
17/02/10 13:54:16 INFO spark.SparkContext: Running Spark version 1.6.0
17/02/10 13:54:16 INFO spark.SecurityManager: Changing view acls to: root,kadmin
17/02/10 13:54:16 INFO spark.SecurityManager: Changing modify acls to: root,kadmin
17/02/10 13:54:16 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, kadmin); users with modify permissions: Set(root, kadmin)
17/02/10 13:54:17 INFO util.Utils: Successfully started service 'sparkDriver' on port 56214.
17/02/10 13:54:17 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/02/10 13:54:17 INFO Remoting: Starting remoting
17/02/10 13:54:18 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.10.100.51:40936]
17/02/10 13:54:18 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@10.10.100.51:40936]
17/02/10 13:54:18 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 40936.
17/02/10 13:54:18 INFO spark.SparkEnv: Registering MapOutputTracker
17/02/10 13:54:18 INFO spark.SparkEnv: Registering BlockManagerMaster
17/02/10 13:54:18 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-cf37cdde-4eab-4804-b84b-b5f937828aa7
17/02/10 13:54:18 INFO storage.MemoryStore: MemoryStore started with capacity 530.3 MB
17/02/10 13:54:18 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/02/10 13:54:19 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/02/10 13:54:19 INFO ui.SparkUI: Started SparkUI at http://10.10.100.51:4040
17/02/10 13:54:19 INFO spark.SparkContext: Added JAR file:/opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar at spark://10.10.100.51:56214/jars/spark-examples.jar with timestamp 1486706059601
17/02/10 13:54:19 INFO yarn.Client: Attempting to login to the Kerberos using principal: kadmin/admin@NGAA.COM and keytab: /home/test/sparktest/princpal/sparkjob.keytab
17/02/10 13:54:19 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/10.10.100.51:8032
17/02/10 13:54:20 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
17/02/10 13:54:20 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
17/02/10 13:54:20 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
17/02/10 13:54:20 INFO yarn.Client: Setting up container launch context for our AM
17/02/10 13:54:20 INFO yarn.Client: Setting up the launch environment for our AM container
17/02/10 13:54:21 INFO yarn.Client: Credentials file set to: credentials-79afe260-414b-4df7-8242-3cd1a279dbc7
17/02/10 13:54:21 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002
17/02/10 13:54:21 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 44 for kadmin on 10.10.100.52:8020
17/02/10 13:54:21 INFO yarn.Client: Renewal Interval set to 86400061
17/02/10 13:54:21 INFO yarn.Client: Preparing resources for our AM container
17/02/10 13:54:21 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002
17/02/10 13:54:21 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 45 for kadmin on 10.10.100.52:8020
17/02/10 13:54:22 INFO hive.metastore: Trying to connect to metastore with URI thrift://hadoop1:9083
17/02/10 13:54:22 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/02/10 13:54:22 INFO hive.metastore: Connected to metastore.
17/02/10 13:54:22 INFO hive.metastore: Closed a connection to metastore, current connections: 0
17/02/10 13:54:23 INFO yarn.Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN Secure Distributed Cache.
17/02/10 13:54:23 INFO yarn.Client: Uploading resource file:/home/test/sparktest/princpal/sparkjob.keytab -> hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002/sparkjob.keytab
17/02/10 13:54:23 INFO yarn.Client: Uploading resource file:/tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128/__spark_conf__4615276915023723512.zip -> hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002/__spark_conf__4615276915023723512.zip
17/02/10 13:54:23 INFO spark.SecurityManager: Changing view acls to: root,kadmin
17/02/10 13:54:23 INFO spark.SecurityManager: Changing modify acls to: root,kadmin
17/02/10 13:54:23 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, kadmin); users with modify permissions: Set(root, kadmin)
17/02/10 13:54:23 INFO yarn.Client: Submitting application 2 to ResourceManager
17/02/10 13:54:23 INFO impl.YarnClientImpl: Submitted application application_1486705141135_0002
17/02/10 13:54:24 INFO yarn.Client: Application report for application_1486705141135_0002 (state: FAILED)
17/02/10 13:54:24 INFO yarn.Client:
    client token: N/A
    diagnostics: Application application_1486705141135_0002 failed 2 times due to AM Container for appattempt_1486705141135_0002_000002 exited with exitCode: -1000
    For more detailed output, check application tracking page: http://hadoop1:8088/proxy/application_1486705141135_0002/ Then, click on links to logs of each attempt.
    Diagnostics: Application application_1486705141135_0002 initialization failed (exitCode=255) with output: main : command provided 0
    main : run as user is kadmin
    main : requested yarn user is kadmin
    User kadmin not found
    Failing this attempt. Failing the application.
    ApplicationMaster host: N/A
    ApplicationMaster RPC port: -1
    queue: root.users.kadmin
    start time: 1486706063635
    final status: FAILED
    tracking URL: http://hadoop1:8088/cluster/app/application_1486705141135_0002
    user: kadmin
17/02/10 13:54:24 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1486705141135_0002
17/02/10 13:54:24 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO ui.SparkUI: Stopped Spark web UI at http://10.10.100.51:4040
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Stopped
17/02/10 13:54:25 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/02/10 13:54:25 ERROR util.Utils: Uncaught exception in thread main
java.lang.NullPointerException
    at org.apache.spark.network.shuffle.ExternalShuffleClient.close(ExternalShuffleClient.java:152)
    at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1231)
    at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96)
    at org.apache.spark.SparkContext$$anonfun$stop$12.apply$mcV$sp(SparkContext.scala:1767)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1230)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1766)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:613)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO storage.DiskBlockManager: Shutdown hook called
17/02/10 13:54:25 INFO util.ShutdownHookManager: Shutdown hook called
17/02/10 13:54:25 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128/userFiles-58912a50-d060-42ec-8665-7a74c1be9a7b
17/02/10 13:54:25 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128
Thanks
Created 11-13-2017 03:53 AM
This problem has two causes:
(1) The Linux user ### was not created on every node and added to the yarn user group.
(2) The NodeManager container directory permissions are inconsistent, because the machines are not partitioned uniformly.
To solve it, execute on each machine:
useradd -M ###
usermod -a -G supergroup ###
Finally, check that the NodeManager (nm) directory permissions are the same on every node!
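For that last check, printing owner, group and mode in one line per node makes differences easy to see. A rough sketch, assuming GNU stat on the nodes; the directory path and the nodes.txt host list in the example are hypothetical, and the real directories are whatever yarn.nodemanager.local-dirs points to on your cluster:

```shell
#!/usr/bin/env bash
# Sketch: print "owner:group mode" for a path so the value can be
# compared between nodes.
dir_perms() {
    stat -c '%U:%G %a' "$1"
}

# Example (hypothetical path and host list, passwordless ssh assumed):
#   while read -r host; do
#       echo "$host $(ssh -n "$host" stat -c '%U:%G %a' /hadoop/yarn/local)"
#   done < nodes.txt
```

Any node whose line differs from the others is the one to fix.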
Created 11-13-2017 03:55 AM
@rbalam Please refer to my approach.
Created 01-31-2018 05:30 PM
Hi @yang jifei, were you able to solve your issue? I am having a similar problem as well. Could you please help me?