Created 02-08-2016 10:34 AM
Say users are allowed to access a cluster from its edge node. If a user wants to run jobs on the cluster, does the user need an account on all the nodes of the cluster, or is having an account on the edge node enough?
Created 02-09-2016 02:30 AM
No, the user should not need an account on all the nodes of the cluster. He should only need an account on the edge node.
For a new user there are 2 types of directories we need to create before the user accesses the cluster:
1- User home directory [created on the Linux filesystem, i.e. /home/<username>]
2- User HDFS home directory [created on the HDFS filesystem, i.e. /user/<username>]
As per Neeraj, you only need to create the HDFS home directory [i.e. /user/<username>] from the edge node. You can still run jobs on the cluster as the new user, even if you haven't created his home directory in Linux.
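A quick way to sanity-check that claim (just an illustration; the example jar path below is the usual HDP location and may differ on your install) is to submit one of the bundled example jobs as the new user from the edge node, even before /home/<username> exists:
#sudo -u <username> hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 2 10
If /user/<username> exists in HDFS with the right ownership, the job should be accepted and complete.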
==============
Below are 2 scenarios -
a. I added a new user on the edge node using the command:
#useradd <username>
Before launching a job on the cluster, I need to create the HDFS directory for the user:
#sudo -u hdfs hadoop fs -mkdir /user/<username>
#sudo -u hdfs hadoop fs -chown -R <username>:<grp_name> /user/<username>
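To confirm the HDFS side (just a sanity check, the names are placeholders), list the home directories and make sure the new entry is owned by the user:
#sudo -u hdfs hadoop fs -ls /user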
b. If the user is coming from an LDAP server, then you only need to make your edge node an LDAP client and create the user's directory in HDFS using the commands below:
#sudo -u hdfs hadoop fs -mkdir /user/<username>
#sudo -u hdfs hadoop fs -chown -R <username>:<grp_name> /user/<username>
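In the LDAP case it is worth verifying that the edge node actually resolves the directory user before creating the HDFS directory (assuming the LDAP client, e.g. SSSD, is already configured; these are standard Linux commands, not Hadoop-specific):
#id <username>
#getent passwd <username>
If these return the user's uid/gid from LDAP, no local useradd is needed on the edge node.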
Let me know if this clears up what you are looking for.
Created 02-17-2016 07:30 AM
Hi @ARUNKUMAR RAMASAMY, @Sagar Shimpi, I verified this by removing the user "vgadade" from a datanode; please find the output below, from a secure Hadoop cluster.
Here is the sample job output:
16/02/17 01:47:24 INFO mapreduce.Job: Job job_1455179809801_0007 running in uber mode : false
16/02/17 01:47:24 INFO mapreduce.Job: map 0% reduce 0%
16/02/17 01:47:26 INFO mapreduce.Job: Task Id : attempt_1455179809801_0007_m_000001_0, Status : FAILED
Application application_1455179809801_0007 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is vgadade
main : requested yarn user is vgadade
User vgadade not found
16/02/17 01:47:29 INFO mapreduce.Job: Task Id : attempt_1455179809801_0007_m_000001_1, Status : FAILED
Application application_1455179809801_0007 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is vgadade
main : requested yarn user is vgadade
User vgadade not found
16/02/17 01:47:30 INFO mapreduce.Job: map 50% reduce 0%
16/02/17 01:47:32 INFO mapreduce.Job: Task Id : attempt_1455179809801_0007_m_000001_2, Status : FAILED
Application application_1455179809801_0007 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is vgadade
main : requested yarn user is vgadade
User vgadade not found
16/02/17 01:47:37 INFO mapreduce.Job: map 100% reduce 0%
16/02/17 01:47:43 INFO mapreduce.Job: map 100% reduce 100%
16/02/17 01:47:44 INFO mapreduce.Job: Job job_1455179809801_0007 completed successfully
Job Counters
Failed map tasks=3
Launched map tasks=5
Launched reduce tasks=1
Other local map tasks=3
Data-local map tasks=1
Rack-local map tasks=1
Job Finished in 28.409 seconds
NodeManager .out log file output:
2016-02-17 01:47:24,944 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_e21_1455179809801_0007_01_000002 startLocalizer is : 255
java.io.IOException: Application application_1455179809801_0007 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is vgadade
main : requested yarn user is vgadade
User vgadade not found
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:262)
2016-02-17 01:47:24,946 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e21_1455179809801_0007_01_000002 transitioned from LOCALIZING to LOCALIZATION_FAILED
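For what it's worth, this matches how the LinuxContainerExecutor behaves on a secure cluster: before localizing the container it resolves the job user on the NodeManager host itself, so the "User vgadade not found" message means that particular host could not resolve the account (locally or via LDAP/SSSD). A quick check on the affected datanode is:
#id vgadade
#getent passwd vgadade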
The vgadade user was then added again on the datanode and the job was executed once more (the container launched successfully on the previously failing datanode).
Here is the sample job output:
16/02/17 01:55:46 INFO mapreduce.Job: Running job: job_1455179809801_0008
16/02/17 01:55:52 INFO mapreduce.Job: Job job_1455179809801_0008 running in uber mode : false
16/02/17 01:55:52 INFO mapreduce.Job: map 0% reduce 0%
16/02/17 01:55:58 INFO mapreduce.Job: map 100% reduce 0%
16/02/17 01:56:04 INFO mapreduce.Job: map 100% reduce 100%
16/02/17 01:56:04 INFO mapreduce.Job: Job job_1455179809801_0008 completed successfully
16/02/17 01:56:05 INFO mapreduce.Job: Counters: 50
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=1
Rack-local map tasks=1
Job Finished in 20.333 seconds
Created 02-18-2016 05:14 AM
Thanks @Vikas Gadade. It helped.