Contributor
Posts: 36
Registered: ‎01-11-2016

MapReduce job on Kerberized CDH 5.9.0 cluster causes YARN "User xxxxx not found"

Hi,

 

I'm trying to run the Cloudera-supplied "wordcount" example job on a CDH 5.9.0 cluster that has been Kerberised.

 

Can anybody provide advice as to what might be the cause?

 

 

[dreeves@{hostname}]$  kinit dreeves@{obfuscated_realm} -k -t /path/to/keytab

 

[dreeves@{hostname}]$  hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /user/dreeves/test_file.txt /user/dreeves/count_of_words_test_file_27012017_1113

 

 

17/01/27 11:13:12 INFO client.RMProxy: Connecting to ResourceManager at {obfuscated_FQDN_of_resource_manager_machine}/10.8.131.79:8032

17/01/27 11:13:13 INFO hdfs.DFSClient: Created token for dreeves: HDFS_DELEGATION_TOKEN owner=dreeves@{obfuscated_realm}, renewer=yarn, realUser=, issueDate=1485475993334, maxDate=1486080793334, sequenceNumber=5, masterKeyId=12 on ha-hdfs:ctx-hp1

17/01/27 11:13:13 INFO security.TokenCache: Got dt for hdfs://ctx-hp1; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ctx-hp1, Ident: (token for dreeves: HDFS_DELEGATION_TOKEN owner=dreeves@{obfuscated_realm}, renewer=yarn, realUser=, issueDate=1485475993334, maxDate=1486080793334, sequenceNumber=5, masterKeyId=12)

17/01/27 11:13:13 INFO input.FileInputFormat: Total input paths to process : 1

17/01/27 11:13:13 INFO mapreduce.JobSubmitter: number of splits:1

17/01/27 11:13:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484891099229_0004

17/01/27 11:13:14 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ctx-hp1, Ident: (token for dreeves: HDFS_DELEGATION_TOKEN owner=dreeves@{obfuscated_realm}, renewer=yarn, realUser=, issueDate=1485475993334, maxDate=1486080793334, sequenceNumber=5, masterKeyId=12)

17/01/27 11:13:14 INFO impl.YarnClientImpl: Submitted application application_1484891099229_0004

17/01/27 11:13:14 INFO mapreduce.Job: The url to track the job: 

http://{obfuscated_FQDN_of_resource_manager_machine}:8088/proxy/application_1484891099229_0004/

17/01/27 11:13:14 INFO mapreduce.Job: Running job: job_1484891099229_0004

17/01/27 11:13:15 INFO mapreduce.Job: Job job_1484891099229_0004 running in uber mode : false

17/01/27 11:13:15 INFO mapreduce.Job:  map 0% reduce 0%

17/01/27 11:13:15 INFO mapreduce.Job: Job job_1484891099229_0004 failed with state FAILED due to: 

Application application_1484891099229_0004 failed 2 times due to 

AM Container for appattempt_1484891099229_0004_000002 exited with  exitCode: -1000

For more detailed output, check application tracking page:

http://{obfuscated_FQDN_of_resource_manager_machine}:8088/proxy/application_1484891099229_0004/

Then, click on links to logs of each attempt.

Diagnostics: Application application_1484891099229_0004 initialization failed (exitCode=255) with output: main : command provided 0

main : run as user is dreeves

main : requested yarn user is dreeves

User dreeves not found

 

Failing this attempt. Failing the application.

17/01/27 11:13:15 INFO mapreduce.Job: Counters: 0

 

 

When I check for the YARN log that corresponds with the job:

 

[dreeves@{hostname}]$  yarn logs -applicationId application_1484891099229_0004


17/01/27 13:12:33 INFO client.RMProxy: Connecting to ResourceManager at {obfuscated_FQDN_of_resource_manager_machine}/10.8.131.79:8032


/tmp/logs/dreeves/logs/application_1484891099229_0004 does not have any log files.

 

 

So I cannot troubleshoot....

 

 

Thanks,

 

Damion.

Posts: 634
Topics: 3
Kudos: 102
Solutions: 66
Registered: ‎08-16-2016

Re: MapReduce job on Kerberized CDH 5.9.0 cluster causes YARN "User xxxxx not found"

The users and groups need to be available on the local OS of all nodes.

This can be through LDAP integration or managed manually.
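As an illustrative sketch (not Cloudera tooling), a quick way to verify this on each node is to check whether the user resolves through the OS's normal NSS lookup, which is the same resolution YARN's container launcher relies on. "dreeves" below is the user from this thread; substitute your own:

```shell
#!/bin/sh
# Minimal sketch: verify that a given user resolves on this node via the
# OS lookup (local /etc/passwd, SSSD, LDAP, etc.). Run on every node.
check_user() {
  if id "$1" >/dev/null 2>&1; then
    echo "OK: '$1' resolves on $(hostname)"
  else
    echo "MISSING: '$1' does not resolve on $(hostname)" >&2
    return 1
  fi
}

check_user root      # sanity check: should always succeed
check_user dreeves || echo "(expected failure on a node missing the mapping)"
```

Running this on every master and worker node quickly shows which hosts are missing the account.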
Contributor
Posts: 36
Registered: ‎01-11-2016

Re: MapReduce job on Kerberized CDH 5.9.0 cluster causes YARN "User xxxxx not found"

Hi,

 

Thanks for the quick response !

 

I have already executed the following commands on all CDH Master and Worker nodes (rather than scp'ing the keytab file to all of them):

 

[root@{obfuscated_hostname}]#  su - dreeves@{obfuscated_realm}

[dreeves@{obfuscated_hostname}]$ ktutil


ktutil:    addent -password -p dreeves@{obfuscated_realm} -k 1 -e RC4-HMAC
Password for dreeves@{obfuscated_realm}:   {obfuscated_password}
ktutil:   wkt dreeves.keytab
ktutil:   exit


[dreeves@{obfuscated_hostname}]$ ls -lrt

-rw------- 1 dreeves@{obfuscated_domain} 1336000512 69 Jan 27 17:01 dreeves.keytab

 

 

 

And my user "dreeves" is created in the MS Active Directory domain: I can search the OU using the "ldapsearch" command and find "dreeves".

 

Not sure what else I need to do?

 

 

 

Cheers,

 

Damion.

Posts: 634
Topics: 3
Kudos: 102
Solutions: 66
Registered: ‎08-16-2016

Re: MapReduce job on Kerberized CDH 5.9.0 cluster causes YARN "User xxxxx not found"

Blast it all, my other response didn't make it.

The short of it is that Hadoop, by default and even with Kerberos, uses a shell-based group mapper. This means it still does a user and group lookup against the local OS regardless of the auth mechanism. There is an LDAP group mapping for Hadoop, but I, and Cloudera, do not recommend it.

Either integrate LDAP at the OS level or manage the accounts manually.
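For reference, the mapper in question is controlled by `hadoop.security.group.mapping` in core-site.xml. A sketch of what the default (JNI/shell-based, OS-backed) setting looks like, shown only to illustrate the point, not as a recommended change:

```xml
<!-- core-site.xml fragment: the default group mapping resolves users and
     groups through the OS (JNI with a shell fallback). Leave it at the
     default and fix user resolution at the OS level instead. -->
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value>
</property>
```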
Contributor
Posts: 36
Registered: ‎01-11-2016

Re: MapReduce job on Kerberized CDH 5.9.0 cluster causes YARN "User xxxxx not found"

Hi,

 

Sorry, but I'm not sure what you mean. Let me explain what I've done.

 

I have integrated the CDH 5.9.0 cluster into an MS Active Directory 2012 R2 environment using Cloudera Manager (Administration -> Security Wizard -> Kerberos Wizard) and can supply you with 20 screenshots of my progression through the CM Kerberos Wizard screens.

 

If I log in to the AD server and open Server Tools -> "Active Directory Users and Computers", I can see the CDH service principals listed (in the Kerberos Wizard I chose a prefix of "hp1-", and about 40 such users appear).

 

Below is the one for the HDFS user:

 

"hp1-AEczKxdmxb" (hdfs/{obfuscated_fqdn_of_master_hadoop_namenode1})

 

 

If I log in to Cloudera Manager, I can see the Kerberos credentials shown in the CM GUI (Administration -> Security -> Kerberos Credentials).

 

These are the credentials for the CDH services (hdfs, hive, mapred, oozie, sqoop, yarn, solr, etc.) and are tied to the hostname each service runs on (in some cases, such as hdfs, there are 8 entries, one per host).

 

 

Now, in terms of non-CDH service accounts/principals, I have created various users directly in Active Directory (Server Tools -> "Active Directory Users and Computers"): me (dreeves) and 4 others who should be able to log on to, say, an edge node and run MapReduce, Hive and Impala jobs.

 

Each of these non-service users (like my dreeves account) has an entry in Active Directory (but not a local RHEL Linux account on each of the nodes in the CDH cluster).

 

I have shown that various commands work from the RHEL command line on an edge node called "ecli001", including:

 

su - dreeves@{obfuscated_realm}

ktutil

kinit dreeves@{obfuscated_realm} -k -t my.keytab

 

 

From the same edge node called "ecli001", I can then run an "ldapsearch" command to traverse my OU and find my LDAP entity:

 

/usr/bin/ldapsearch -v -LLL -H ldap://{obfuscated_fqdn_of_active_directory_machine1}:389 -b OU=Users,OU=Prod,OU=Clusters,OU=cdh,DC=cdh,DC={obfuscated_client_dc},DC=com,DC=au -x -D Administrator@{obfuscated_client_realm} -W userPrincipalName=dreeves@{obfuscated_client_realm}

 

This returns various lines of LDAP stuff including the following which indicates to me LDAP is working:

 

distinguishedName: CN=Damion Reeves,OU=Users,OU=Prod,OU=Clusters,OU=cdh,DC=cdh,DC={obfuscated_client_name},DC=com,DC=au

....

....

primaryGroupID: 512

objectSid:: AQUAAAAAAAUVAAAA9sO31/Jk4ane6hzXWgQAAA==

adminCount: 1

accountExpires: 9223372036854775807

logonCount: 13

sAMAccountName: dreeves

sAMAccountType: 805306368

userPrincipalName: dreeves@{obscured_domain}

 

 

The following LDAP command for the CDH services "hdfs", "yarn" and "mapred" also return LDAP information successfully:

 

/usr/bin/ldapsearch -v -LLL -H ldap://{obfuscated_fqdn_of_active_directory_machine1}:389 -b OU=Users,OU=Prod,OU=Clusters,OU=cdh,DC=cdh,DC={obfuscated_client_dc},DC=com,DC=au -x -D Administrator@{obfuscated_client_realm} -W userPrincipalName=hdfs/{obfuscated_clients_whad001_node}@{obfuscated_client_realm}

 

/usr/bin/ldapsearch -v -LLL -H ldap://{obfuscated_fqdn_of_active_directory_machine1}:389 -b OU=Users,OU=Prod,OU=Clusters,OU=cdh,DC=cdh,DC={obfuscated_client_dc},DC=com,DC=au -x -D Administrator@{obfuscated_client_realm} -W userPrincipalName=yarn/{obfuscated_client_whad001_node}@{obfuscated_client_realm}

 

/usr/bin/ldapsearch -v -LLL -H ldap://{obfuscated_fqdn_of_active_directory_machine1}:389 -b OU=Users,OU=Prod,OU=Clusters,OU=cdh,DC=cdh,DC={obfuscated_client_dc},DC=com,DC=au -x -D Administrator@{obfuscated_client_realm} -W userPrincipalName=mapred/{obfuscated_clients_resource_manager_node}@{obfuscated_client_realm}

 

 

 

 

I can successfully run various commands using the following services:

 

1)  HDFS commands

 

hdfs dfs -put {file_name} /user/dreeves/

 

 

2)  Impala Shell 

 

impala-shell

connect {fqdn_of_whad001_where_impalad_is_running};

describe {database}

show {table}

select col1, col2 from {table}

etc

 

3)  Hive "beeline"

 

I can connect using beeline:

 

beeline -u "jdbc:hive2://{TCPIP_of_ecli001}:10000/default;principal=hive/{obfuscated_fqdn_ecli001}@{obfuscated_realm}"

 

 

And successfully execute commands like:

 

connect {database}

show tables;

describe {table_name}

 

 

But when I try Hive commands that execute DML (like "select count(col1) from {table_name}") they fail with:

 

INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:{obfuscated_hdfs_namespace}, Ident: (token for dreeves: HDFS_DELEGATION_TOKEN owner=dreeves, renewer=yarn, realUser=hive/{obfuscated_machine_name}@{obfuscated_realm}, issueDate=1485494600933, maxDate=1486099400933, sequenceNumber=10, masterKeyId=12)

INFO  : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 07 64 72 65 65 76 65 73 07 64 72 65 65 76 65 73 42 68 69 76 65 2f 63 74 78 2d 68 70 31 2d 65 63 6c 69 30 30 31 2d 61 77 73 2d 73 79 64 2d 62 2e 63 64 68 2e 63 61 6c 74 65 78 2e 63 6f 6d 2e 61 75 40 43 44 48 2e 43 41 4c 54 45 58 2e 43 4f 4d 2e 41 55 8a 01 59 de 60 8f ca 8a 01 5a 02 6d 13 ca 06 07

INFO  : The url to track the job: http://{obfuscated_resource_manager_url}:8088/proxy/application_1484891099229_0008/

INFO  : Starting Job = job_1484891099229_0008, Tracking URL = http://{obfuscated_resource_manager_url}:8088/proxy/application_1484891099229_0008/

INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/bin/hadoop job  -kill job_1484891099229_0008

INFO  : Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0

INFO  : 2017-01-27 16:23:23,125 Stage-1 map = 0%,  reduce = 0%

ERROR : Ended Job = job_1484891099229_0008 with errors

ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

INFO  : MapReduce Jobs Launched:

INFO  : Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 FAIL

INFO  : Total MapReduce CPU Time Spent: 0 msec

INFO  : Completed executing command(queryId=hive_20170127162323_aa4b3477-6312-4bec-a096-709d9aa26466); Time taken: 2.467 seconds

Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)

 

 

MapReduce jobs also don't work (I have already pasted the error for the Cloudera wordcount job at the start of this topic as the initial issue).

 

If I also try to review the YARN logs for these specific attempts, that fails with:

 

yarn logs -applicationId application_1484891099229_0008


17/01/30 11:19:23 INFO client.RMProxy: Connecting to ResourceManager at {obfuscated_resource_manage_node}/{obfuscated_TCPIP_number_of_rm}:8032


/tmp/logs/dreeves/logs/application_1484891099229_0008 does not have any log files.

 

 

 

Any advice and help would be greatly appreciated.

 

 

Thanks,

 

Damion.

 

 

Posts: 634
Topics: 3
Kudos: 102
Solutions: 66
Registered: ‎08-16-2016

Re: MapReduce job on Kerberized CDH 5.9.0 cluster causes YARN "User xxxxx not found"

Go to the node where the mapper or reducer failed and run 'id dreeves'. This needs to return a user; if it does not, the worker is not able to operate as that user.

I don't know why exactly the other commands worked. Did the correct ownership get applied to the file, or does it just show a raw UID and GID? I have seen that happen when a user is present on the client, edge node, and NameNode, but with different UIDs.
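To make the mismatched-UID case concrete, here is a small, hypothetical helper that compares the `id` output captured on two different nodes (for example, gathered over ssh) and flags a differing numeric UID:

```shell
#!/bin/sh
# Hypothetical helper: compare two `id` output lines captured on two
# different nodes and report whether the numeric UID matches. A mismatch
# is exactly the client/edge/NameNode inconsistency described above.
uid_of() {
  printf '%s\n' "$1" | sed -n 's/^uid=\([0-9][0-9]*\).*/\1/p'
}

check_match() {
  a=$(uid_of "$1")
  b=$(uid_of "$2")
  if [ "$a" = "$b" ]; then
    echo "uid consistent ($a)"
  else
    echo "uid MISMATCH ($a vs $b)"
  fi
}

# Example inputs modelled on the `id` output seen in this thread:
check_match "uid=1005(dreeves) gid=1005(dreeves)" \
            "uid=1005(dreeves) gid=1005(dreeves)"      # uid consistent (1005)
check_match "uid=1005(dreeves) gid=1005(dreeves)" \
            "uid=33601114(dreeves) gid=33600512"       # uid MISMATCH (1005 vs 33601114)
```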
Contributor
Posts: 36
Registered: ‎01-11-2016

Re: MapReduce job on Kerberized CDH 5.9.0 cluster causes YARN "User xxxxx not found"

Hi,

 

I was under the impression that we didn't need "local" RHEL Linux users on any nodes if we were using AD/LDAP?

 

As you can see below, local user accounts don't exist:

 

[root@{obfuscated_client_edge_nodename}~]#   id dreeves
id: dreeves: no such user

 

 

But AD accounts do (you can also see there don't appear to be any AD-to-Linux GID/UID mappings, because I haven't worked out how to do this in AD):


[root@{obfuscated_client_edge_nodename}~]#  id dreeves@{obfuscated_realm}


uid=33601114(dreeves@{obfuscated_domain}) gid=33600512 groups=33600512,33601105,33601106,33601107,33601109,33601111,33601112,33601113,33600513,33600572,33601197(datameer_users@{obfuscated_domain}),33601110(prod_hue_users@{obfuscated_domain}),33601108(prod_cm_users@{obfuscated_domain})

 

 

Thanks,

 

Damion.

 

Posts: 634
Topics: 3
Kudos: 102
Solutions: 66
Registered: ‎08-16-2016

Re: MapReduce job on Kerberized CDH 5.9.0 cluster causes YARN "User xxxxx not found"

It is a common misconception. Unless you configure Hadoop to use LDAP to look up users, it will use the default shell-based lookup. So you still need local users, or you need to implement LDAP at the RHEL level. It looks like you have that partially in place: UIDs and GIDs exist in AD. I don't know how you did it, but you need to see how you can present the account without the domain name.
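For the OS-level route, one common approach (an illustrative sketch, not something established in this thread) is SSSD joined to AD, with fully-qualified names turned off so that the account resolves as plain "dreeves" rather than "dreeves@{realm}", which is the form YARN's container launcher looks up:

```ini
# /etc/sssd/sssd.conf (fragment) -- illustrative; the domain name is a placeholder.
[domain/cdh.example.com]
id_provider = ad
access_provider = ad
# Make accounts resolve as "dreeves" rather than "dreeves@cdh.example.com":
use_fully_qualified_names = False
fallback_homedir = /home/%u
default_shell = /bin/bash
```

With this in place, `id dreeves` (without the realm suffix) should succeed on every node, using the UID/GID already stored in AD, so the numbers stay consistent cluster-wide.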
Contributor
Posts: 36
Registered: ‎01-11-2016

Re: MapReduce job on Kerberized CDH 5.9.0 cluster causes YARN "User xxxxx not found"


OK, I have used the following Cloudera doco for the entire process, with an Active Directory setup (not direct to an MIT KDC):

 

http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_s4_kerb_wizard.html

http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_s5_hdfs_principal.html

http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_s6_user_principals.html

 

I've fully completed the Kerberos Wizard (no issues at all).

 

And I've just returned to Step 7 here:

 

http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_s7_prepare_cluster.html

 

....to make sure I correctly added the AD user "dreeves@{obfuscated_realm}".

 

 

I've also now added a local RHEL user to all 17 nodes in the cluster:

 

      useradd -u 1005 -s /bin/bash -G contexti -d /home/dreeves dreeves

 

And have logged in to a few of the nodes as "dreeves".

 

However, I cannot see how the mapping between the AD user "dreeves@{obfuscated_realm}" and the local user "dreeves" occurs?

 

The Cloudera doco is very poor in this respect: it simply jumps to this page, without giving any advice on how to map AD account GIDs/UIDs to local RHEL account GIDs/UIDs, or on converting (or dropping) the @realm:

 

http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_s8_verify_kerb.html
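On the Hadoop side, principal-to-short-name mapping is handled by `hadoop.security.auth_to_local` rules (exposed in Cloudera Manager as "Additional Rules to Map Kerberos Principals to Short Names", if memory serves). The built-in DEFAULT rule already strips `@REALM` for principals in the cluster's default realm, so dreeves@{realm} maps to dreeves there; a rule for an additional realm would look roughly like the following (the realm is a placeholder):

```
RULE:[1:$1@$0](.*@CDH\.EXAMPLE\.COM)s/@.*//
DEFAULT
```

Note this mapping only covers Hadoop's internal view of the principal; the OS-level account with that short name still has to exist (locally or via LDAP/SSSD) on every node.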

 

 

I am able to verify Kerberos for "dreeves@{obfuscated_realm}" (as previously shown):

 

kinit dreeves@{obfuscated_realm} -k -t dreeves.keytab

hdfs dfs -put test_file.txt /user/dreeves/

hdfs dfs -ls /user/dreeves/

 

 

But I am not able to do this for the "dreeves" user:

 

[root@{fqdn_ecli001} ~]#  sudo su - dreeves

 

[dreeves@{fqdn_ecli001} ~]$ id
uid=1005(dreeves) gid=1005(dreeves) groups=1005(dreeves),5000({obfuscated_group})


[dreeves@{fqdn_ecli001} ~]$ ktutil

 

    ktutil: addent -password -p dreeves -k 1 -e RC4-HMAC
    Password for dreeves@{obfuscated_realm}:
    ktutil: wkt dreeves_local.keytab
    ktutil: exit


[dreeves@{fqdn_ecli001}]$ kinit dreeves -k -t dreeves_local.keytab
kinit: Preauthentication failed while getting initial credentials


[dreeves@{fqdn_ecli001}]$ kinit dreeves@{obfuscated_realm} -k -t dreeves_local.keytab
kinit: Preauthentication failed while getting initial credentials


[dreeves@{fqdn_ecli001}]$ kinit dreeves
Password for dreeves@{obfuscated_realm}:
kinit: Preauthentication failed while getting initial credentials


[dreeves@{fqdn_ecli001}]$ kinit dreeves@{obfuscated_realm}
Password for dreeves@{obfuscated_realm}:
kinit: Preauthentication failed while getting initial credentials


[dreeves@{fqdn_ecli001}]$ klist
klist: No credentials cache found (filename: /tmp/krb5cc_1005)

 

 

 

Sorry to be a pain in the.....but is there any in-depth doco on how to do this?

 

 

Thanks,

 

Damion.

Posts: 634
Topics: 3
Kudos: 102
Solutions: 66
Registered: ‎08-16-2016

Re: MapReduce job on Kerberized CDH 5.9.0 cluster causes YARN "User xxxxx not found"

What is default_realm set to in your krb5.conf?

What is in the new keytab?

klist -kt dreeves_local.keytab

And try kinit as follows:

kinit -C username or kinit -C username@domain

From the Cloudera doc you linked.

"Make sure all hosts in the cluster have a Linux user account with the same name as the first component of that user's principal name. For example, the Linux account joe should exist on every box if the user's principal name is joe@YOUR-REALM.COM. You can use LDAP for this step if it is available in your organization."

So if your principal and AD account is dreeves@REALM.COM, you need to ensure that the account dreeves exists on all nodes, either as a local Linux account or via LDAP integration. This account also needs the same UID/GID across all nodes. You have this in place now for dreeves, so we just need to work out authenticating with kinit and you should be good. The output of the above commands should help. The expectation is that either kinit command works; I suspect that a non-valid principal, or no principals at all, were added to the keytab file. You could also take the original keytab you made that works and try it under the dreeves account you made.