
Configuring kerberos for YARN with Isilon

Hello!

 

I'm looking for guidance on which additional security configurations need to be added or updated to enable YARN jobs to run against remote Isilon HDFS storage.

 

The cluster and Isilon are using AD Kerberos authentication. I can access the file system as Kerberos users, but I can't execute sample jobs. I'm fairly sure this is related to incorrect token delegation for YARN, but it's unclear which security settings, and which values, need updating to resolve this in yarn-site.xml, mapred-site.xml, and possibly hdfs-site.xml with the Isilon in the mix.

 

 

bash-4.1$ kinit
Password for cloudera@FOO.COM:
bash-4.1$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples-2.3.0-mr1-cdh5.1.3.jar pi 10 10
Number of Maps = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
15/09/22 15:35:26 INFO client.RMProxy: Connecting to ResourceManager at cdhcm.foo.com/172.16.201.100:8032
15/09/22 15:35:26 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 57 for cloudera on 172.16.201.90:8020
15/09/22 15:35:27 INFO security.TokenCache: Got dt for hdfs://moby2.foo.com:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 172.16.201.90:8020, Ident: (HDFS_DELEGATION_TOKEN token 57 for cloudera)
15/09/22 15:35:27 INFO input.FileInputFormat: Total input paths to process : 10
15/09/22 15:35:27 INFO mapreduce.JobSubmitter: number of splits:10
15/09/22 15:35:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1442878984264_0003
15/09/22 15:35:27 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 172.16.201.90:8020, Ident: (HDFS_DELEGATION_TOKEN token 57 for cloudera)
15/09/22 15:35:28 INFO impl.YarnClientImpl: Submitted application application_1442878984264_0003
15/09/22 15:35:28 INFO mapreduce.Job: The url to track the job: http://cdhcm.foo.com:8088/proxy/application_1442878984264_0003/
15/09/22 15:35:28 INFO mapreduce.Job: Running job: job_1442878984264_0003
15/09/22 15:35:48 INFO mapreduce.Job: Job job_1442878984264_0003 running in uber mode : false
15/09/22 15:35:48 INFO mapreduce.Job: map 0% reduce 0%
15/09/22 15:35:48 INFO mapreduce.Job: Job job_1442878984264_0003 failed with state FAILED due to: Application application_1442878984264_0003 failed 2 times due to AM Container for appattempt_1442878984264_0003_000002 exited with exitCode: -1000 due to: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "cdhcm/172.16.201.100"; destination host is: "moby2.foo.com":8020;
.Failing this attempt.. Failing the application.
15/09/22 15:35:48 INFO mapreduce.Job: Counters: 0
Job Finished in 21.727 seconds
java.io.FileNotFoundException: File does not exist: hdfs://moby2.foo.com:8020/user/cloudera/QuasiMonteCarlo_1442950523487_605654960/out/reduce-out
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1749)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

 

 

Thanks!

1 ACCEPTED SOLUTION

Yes, I was able to resolve this; it turned out to be related to the TTL on the Isilon SmartConnect zone.

 

The compute cluster connects to Isilon via the SmartConnect zone (SCZone) SSIP FQDN, which "load balances" connections across the nodes in the Isilon cluster. If the TTL on the pool is set too low, the Kerberos ticket exchange gets sent to different nodes in the Isilon cluster mid-exchange. This breaks Kerberos authentication, because the client is not maintaining a consistent connection to a single host while completing the exchange.

 

The default TTL on the pool is zero (0). I bumped it to 60, which resolved the issue.
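If it helps to picture the failure mode, here is a deliberately simplified toy model of it in Python. Nothing here is OneFS or Kerberos code; the resolver and the "exchange" are illustrative stand-ins for a SmartConnect round-robin pool and a multi-round authentication handshake.

```python
import itertools

class RoundRobinResolver:
    """Toy stand-in for a SmartConnect pool: rotates through node IPs.

    A positive ttl means the client may cache the first answer for the
    whole exchange; ttl=0 forces a fresh (rotated) answer on every lookup.
    """
    def __init__(self, ips, ttl):
        self._cycle = itertools.cycle(ips)
        self.ttl = ttl
        self._cached = None

    def resolve(self):
        if self.ttl > 0 and self._cached is not None:
            return self._cached
        self._cached = next(self._cycle)
        return self._cached

def kerberos_exchange(resolver, rounds=3):
    """A multi-round exchange only succeeds if every round reaches the
    same node (each node only knows its own in-flight context)."""
    hosts = {resolver.resolve() for _ in range(rounds)}
    return len(hosts) == 1

pool = ["172.16.201.90", "172.16.201.91", "172.16.201.92", "172.16.201.93"]

# ttl=0: each round of the exchange lands on a different node -> failure
print(kerberos_exchange(RoundRobinResolver(pool, ttl=0)))   # False
# ttl=60: the cached answer pins one node for the whole exchange -> success
print(kerberos_exchange(RoundRobinResolver(pool, ttl=60)))  # True
```

With TTL 0 every DNS lookup rotates to the next node, so a three-round handshake touches three different hosts; with a nonzero TTL the client reuses its cached answer and the exchange completes against one host.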

 

 

# isi networks list pools --v

 

subnet0:cloudera
In Subnet: subnet0
Allocation: Dynamic
Ranges: 1
172.16.201.90-172.16.201.93
Pool Membership: 1
1:ext-1 (up)
Aggregation Mode: Link Aggregation Control Protocol (LACP)
Access Zone: cloudera (6)
SmartConnect:
Suspended Nodes : None
Auto Unsuspend ... 0
Zone : moby2.foo.com
Time to Live : 0
Service Subnet : subnet0
Connection Policy: Round Robin

 

 

# isi networks modify pool subnet0:cloudera --ttl=60

 

subnet0:cloudera
In Subnet: subnet0
Allocation: Dynamic
Ranges: 1
172.16.201.90-172.16.201.93
Pool Membership: 1
1:ext-1 (up)
Aggregation Mode: Link Aggregation Control Protocol (LACP)
Access Zone: cloudera (6)
SmartConnect:
Suspended Nodes : None
Auto Unsuspend ... 0
Zone : moby2.foo.com
Time to Live : 60
Service Subnet : subnet0
Connection Policy: Round Robin

 

 

 

Successful kerberized YARN job:

 

bash-4.1$ kinit
Password for cloudera@FOO.COM:
bash-4.1$ yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 10000 /teragenOUT
16/01/11 11:51:32 INFO client.RMProxy: Connecting to ResourceManager at cdhcm.foo.com/172.16.201.100:8032
16/01/11 11:51:32 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 69 for cloudera on 172.16.201.93:8020
16/01/11 11:51:32 INFO security.TokenCache: Got dt for hdfs://moby2.foo.com:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 172.16.201.93:8020, Ident: (HDFS_DELEGATION_TOKEN token 69 for cloudera)
16/01/11 11:51:33 INFO terasort.TeraSort: Generating 10000 using 2
16/01/11 11:51:33 INFO mapreduce.JobSubmitter: number of splits:2
16/01/11 11:51:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1448999039760_0006
16/01/11 11:51:33 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 172.16.201.93:8020, Ident: (HDFS_DELEGATION_TOKEN token 69 for cloudera)
16/01/11 11:51:35 INFO impl.YarnClientImpl: Submitted application application_1448999039760_0006
16/01/11 11:51:35 INFO mapreduce.Job: The url to track the job: http://cdhcm.foo.com:8088/proxy/application_1448999039760_0006/
16/01/11 11:51:35 INFO mapreduce.Job: Running job: job_1448999039760_0006
16/01/11 11:51:52 INFO mapreduce.Job: Job job_1448999039760_0006 running in uber mode : false
16/01/11 11:51:52 INFO mapreduce.Job: map 0% reduce 0%
16/01/11 11:51:59 INFO mapreduce.Job: map 50% reduce 0%
16/01/11 11:52:06 INFO mapreduce.Job: map 100% reduce 0%
16/01/11 11:52:06 INFO mapreduce.Job: Job job_1448999039760_0006 completed successfully
16/01/11 11:52:06 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=223314
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=164
HDFS: Number of bytes written=1000000
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=11069
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=11069
Total vcore-seconds taken by all map tasks=11069
Total megabyte-seconds taken by all map tasks=11334656
Map-Reduce Framework
Map input records=10000
Map output records=10000
Input split bytes=164
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=123
CPU time spent (ms)=1840
Physical memory (bytes) snapshot=327118848
Virtual memory (bytes) snapshot=3071070208
Total committed heap usage (bytes)=379584512
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=21555350172850
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=1000000


9 REPLIES

Master Guru

I believe the Isilon configuration is valid; I was able to run and execute jobs prior to adding Kerberos to Cloudera.

 

The job is writing files to the Isilon file system so this isn't a file system permission issue.

 

 

 

The user running the job is the AD user 'cloudera', and I appear to be able to get the delegation token for user 'yarn', but I think I have some misconfiguration around principals somewhere.

 

15/09/30 11:25:48 INFO client.RMProxy: Connecting to ResourceManager at cdhcm.foo.com/172.16.201.100:8032
15/09/30 11:25:48 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 60 for yarn on 172.16.201.90:8020
15/09/30 11:25:48 INFO security.TokenCache: Got dt for hdfs://moby2.foo.com:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 172.16.201.90:8020, Ident: (HDFS_DELEGATION_TOKEN token 60 for yarn)
15/09/30 11:25:48 INFO input.FileInputFormat: Total input paths to process : 10
15/09/30 11:25:48 INFO mapreduce.JobSubmitter: number of splits:10
15/09/30 11:25:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1442878984264_0006
15/09/30 11:25:48 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 172.16.201.90:8020, Ident: (HDFS_DELEGATION_TOKEN token 60 for yarn)
15/09/30 11:25:49 INFO impl.YarnClientImpl: Submitted application application_1442878984264_0006
15/09/30 11:25:49 INFO mapreduce.Job: The url to track the job: http://cdhcm.foo.com:8088/proxy/application_1442878984264_0006/
15/09/30 11:25:49 INFO mapreduce.Job: Running job: job_1442878984264_0006
15/09/30 11:26:12 INFO mapreduce.Job: Job job_1442878984264_0006 running in uber mode : false
15/09/30 11:26:12 INFO mapreduce.Job: map 0% reduce 0%
15/09/30 11:26:12 INFO mapreduce.Job: Job job_1442878984264_0006 failed with state FAILED due to: Application application_1442878984264_0006 failed 2 times due to AM Container for appattempt_1442878984264_0006_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:511)
at org.apache.hadoop.util.Shell.run(Shell.java:424)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:279)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

main : command provided 1
main : user is yarn
main : requested yarn user is yarn

 

 

 

As an FYI, the Isilon cluster name is moby1.foo.com, but we are accessing the Isilon cluster via a second SmartConnect zone, moby2.foo.com.

 

The SPNs from the cluster are as follows.

 

 

moby1-1# isi auth ads spn list --domain=foo.com
SPNs registered for MOBY1$:
hdfs/moby1.foo.com
nfs/moby2.foo.com
hdfs/moby2.foo.com
HOST/moby2.foo.com
HOST/moby2
HOST/moby.foo.com
HOST/moby
HOST/pivotal-moby1.foo.com
HOST/pivotal-moby1
HOST/moby1
HOST/moby1.foo.com

 

 

 

Thanks for taking a look

 

Explorer

Russ,

 

did you ever get this issue resolved? We are facing exactly the same problem with the PHD distribution.

Could you please share how you solved it, if you did?

 

Regards,

Igor Kiselev.


Explorer

Russ,

 

thank you so much for your help. Unfortunately, it did not work in my case.

Could you please tell me, when you have time, which principals you created on Isilon?

Did you go with the recommended ones, or did you have to create something special?

I will really, really appreciate your response and help.

If one googles this problem, you seem to be the only person who has made it work.

And we are getting pretty desperate over here.

 

Regards,

Igor Kiselev.

Isilon currently only supports the hdfs SPN, so you just need to validate hdfs/<realm fqdn> for Kerberos access. This SPN is not created automatically and needs to be created manually.


moby1-1# isi auth ads spn list --domain=foo.com
SPNs registered for MOBY1$:
hdfs/moby2.foo.com
HOST/moby2
HOST/moby2.foo.com

 

moby2.foo.com is my SmartConnect zone name.

 

hdfs-site.xml

<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/moby2.foo.com@FOO.COM</value>
</property>


<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hdfs/moby2.foo.com@FOO.COM</value>
</property>

 

 

yarn-site.xml

 

<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/moby2.foo.com@FOO.COM</value>
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hdfs/moby2.foo.com@FOO.COM</value>
</property>

 


moby1-1# isi zone zones view --zone=cloudera
Name: cloudera
Path: /ifs/cloudera
Cache Size: 9.54M
Map Untrusted:
Auth Providers: lsa-activedirectory-provider:FOO.COM, lsa-ldap-provider:foo_ldap_AD
NetBIOS Name:
All Auth Providers: No
User Mapping Rules: -
Home Directory Umask: 0077
Skeleton Directory: /usr/share/skel
Audit Success: create, delete, rename, set_security, close
Audit Failure: create, delete, rename, set_security, close
HDFS Authentication: kerberos_only
HDFS Root Directory: /ifs/cloudera/hdfs
WebHDFS Enabled: Yes
HDFS Ambari Server:
HDFS Ambari Namenode:
Syslog Forwarding Enabled: No
Syslog Audit Events: create, delete, rename, set_security
Zone ID: 6

 

 

Keep me posted; it took me a while to work through this.

 

russ

Explorer

Russ,

 

thank you very much for your help. We got it working.

It was the hadoop.security.token.service.use_ip property value. The Isilon documentation explicitly says to set it to false, and that breaks the MapReduce job. We discovered this by accident: somebody forgot to set it on one cluster, and MapReduce worked on that cluster, since the default is true.
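One way to picture why a mismatched use_ip setting can break the job: Hadoop stores each delegation token under a "service" key, and a later lookup must build exactly the same key or the token is never found. The sketch below is a simplified illustrative model, not Hadoop's actual code; the function name and cache are hypothetical.

```python
def service_key(host, ip, port, use_ip):
    """Mimic hadoop.security.token.service.use_ip: key tokens by IP:port
    when true, by hostname:port when false (simplified model)."""
    return f"{ip}:{port}" if use_ip else f"{host}:{port}"

token_cache = {}

# The submitter stores the HDFS delegation token keyed by IP (use_ip=true,
# matching the "Service: 172.16.201.90:8020" lines in the job logs above).
token_cache[service_key("moby2.foo.com", "172.16.201.90", 8020, use_ip=True)] = \
    "HDFS_DELEGATION_TOKEN 69"

# A later lookup with the same setting finds the token; a lookup with the
# other setting builds a different key, misses, and authentication falls
# back to Kerberos, which isn't available inside the task container.
hit = token_cache.get(service_key("moby2.foo.com", "172.16.201.90", 8020, use_ip=True))
miss = token_cache.get(service_key("moby2.foo.com", "172.16.201.90", 8020, use_ip=False))
print(hit)   # HDFS_DELEGATION_TOKEN 69
print(miss)  # None
```

The practical takeaway is simply that the setting used when the token is acquired and the setting used when it is looked up must agree across the whole cluster.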

 

Regards,

Igor Kiselev.

Great news!

 

Can you let me know which Isilon document you found this in?

 

 

Thanks!

russ

 

 

Explorer

docu56048_OneFS-7.2-CLI-Administration-Guide.pdf

 

Configure HDFS authentication properties on the Hadoop client

If you want clients running Hadoop 2.2 and later to connect to an access zone through Kerberos, you must make some modifications to the core-site.xml and hdfs-site.xml files on the Hadoop clients.

Before you begin

Kerberos must be set as the HDFS authentication method and a Kerberos authentication provider must be configured on the cluster.

Procedure

1. Go to the $HADOOP_CONF directory on your Hadoop client.
2. Open the core-site.xml file in a text editor.
3. Set the value of the hadoop.security.token.service.use_ip property to false as shown in the following example:

<property>
<name>hadoop.security.token.service.use_ip</name>
<value>false</value>
</property>

4. Save and close the core-site.xml file.
5. Open the hdfs-site.xml file in a text editor.
6. Set the value of the dfs.namenode.kerberos.principal.pattern property to the Kerberos realm as shown in the following example:

<property>
<name>dfs.namenode.kerberos.principal.pattern</name>
<value>hdfs/*@storage.company.com</value>
</property>

7. Save and close the hdfs-site.xml file.