Created 02-22-2017 09:05 PM
We are facing issues configuring TLS for HDFS/YARN. We have isolated the problem to enabling TLS for HDFS, which causes MR jobs to fail. Cloudera strongly recommends TLS for HDFS encryption, but once we enable it, even simple jobs such as the word-count MapReduce example fail with errors.
Please see the session output below, captured with TLS enabled for HDFS/YARN. The same program runs without any error when TLS is disabled. Any help would be appreciated.
CDH version: 5.7.4
Key Trustee Server version: 5.7.0-1.keytrustee5.7.0.p0.5 (no HA configured; single-node cluster)
Key Trustee KMS version: 5.8.2-5.KEYTRUSTEE5.8.2.p0.1
The cluster is Kerberos-enabled.
A few internal system names have been replaced with generic names for confidentiality.
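For context on the failure below: with TLS enabled cluster-wide, reduce tasks fetch map output from the NodeManager shuffle handler over HTTPS when encrypted shuffle is turned on, so the reducer needs a truststore that can verify the NodeManager certificates. A sketch of the properties involved — the path and password values are placeholders, not values from this cluster:

```xml
<!-- mapred-site.xml: switches the shuffle fetch from HTTP to HTTPS -->
<property>
  <name>mapreduce.shuffle.ssl.enabled</name>
  <value>true</value>
</property>

<!-- ssl-client.xml: truststore the reduce-side fetcher uses to verify
     the NodeManager's certificate (placeholder location/password) -->
<property>
  <name>ssl.client.truststore.location</name>
  <value>/path/to/truststore.jks</value>
</property>
<property>
  <name>ssl.client.truststore.password</name>
  <value>changeit</value>
</property>
```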
[root@ ~]# hadoop jar /opt/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.4.jar wordcount /tmp/ngdb_files.txt /tmp/five
17/02/22 08:17:59 INFO hdfs.DFSClient: Created token for hive: HDFS_DELEGATION_TOKEN owner=hive/192.168.11.222@DOMAINNAME, renewer=yarn, realUser=, issueDate=1487769479000, maxDate=1488374279000, sequenceNumber=398, masterKeyId=25 on ha-hdfs:cemodcluster
17/02/22 08:17:59 INFO security.TokenCache: Got dt for hdfs://cemodcluster; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cemodcluster, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/192.168.11.222@DOMAINNAME, renewer=yarn, realUser=, issueDate=1487769479000, maxDate=1488374279000, sequenceNumber=398, masterKeyId=25)
17/02/22 08:17:59 WARN token.Token: Cannot find class for token kind kms-dt
17/02/22 08:17:59 INFO security.TokenCache: Got dt for hdfs://cemodcluster; Kind: kms-dt, Service: 192.168.11.25:16000, Ident: 00 04 68 69 76 65 04 79 61 72 6e 00 8a 01 5a 65 f8 a8 5f 8a 01 5a 8a 05 2c 5f 8e 01 43 08
17/02/22 08:17:59 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm29
17/02/22 08:17:59 INFO input.FileInputFormat: Total input paths to process : 1
17/02/22 08:17:59 INFO mapreduce.JobSubmitter: number of splits:1
17/02/22 08:18:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1487769357731_0005
17/02/22 08:18:00 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cemodcluster, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/192.168.11.222@DOMAINNAME, renewer=yarn, realUser=, issueDate=1487769479000, maxDate=1488374279000, sequenceNumber=398, masterKeyId=25)
17/02/22 08:18:00 WARN token.Token: Cannot find class for token kind kms-dt
17/02/22 08:18:00 WARN token.Token: Cannot find class for token kind kms-dt
Kind: kms-dt, Service: 192.168.11.25:16000, Ident: 00 04 68 69 76 65 04 79 61 72 6e 00 8a 01 5a 65 f8 a8 5f 8a 01 5a 8a 05 2c 5f 8e 01 43 08
17/02/22 08:18:00 INFO impl.YarnClientImpl: Submitted application application_1487769357731_0005
17/02/22 08:18:00 INFO mapreduce.Job: The url to track the job: https://hadooppassive:8090/proxy/application_1487769357731_0005/
17/02/22 08:18:00 INFO mapreduce.Job: Running job: job_1487769357731_0005
17/02/22 08:18:07 INFO mapreduce.Job: Job job_1487769357731_0005 running in uber mode : false
17/02/22 08:18:07 INFO mapreduce.Job: map 0% reduce 0%
17/02/22 08:18:14 INFO mapreduce.Job: map 100% reduce 0%
17/02/22 08:19:18 INFO mapreduce.Job: Task Id : attempt_1487769357731_0005_r_000000_0, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:366)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:288)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:282)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:323)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
17/02/22 08:20:24 INFO mapreduce.Job: Task Id : attempt_1487769357731_0005_r_000000_1, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:366)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:288)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:282)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:323)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
17/02/22 08:21:29 INFO mapreduce.Job: Task Id : attempt_1487769357731_0005_r_000000_2, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:366)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:288)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:282)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:323)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
17/02/22 08:22:35 INFO mapreduce.Job: map 100% reduce 100%
17/02/22 08:24:17 INFO mapreduce.Job: Job job_1487769357731_0005 failed with state FAILED due to: Task failed task_1487769357731_0005_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1
17/02/22 08:24:17 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=133099
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=59980
HDFS: Number of bytes written=0
HDFS: Number of read operations=3
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Failed reduce tasks=4
Launched map tasks=1
Launched reduce tasks=4
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=10510
Total time spent by all reduces in occupied slots (ms)=1002256
Total time spent by all map tasks (ms)=5255
Total time spent by all reduce tasks (ms)=250564
Total vcore-seconds taken by all map tasks=5255
Total vcore-seconds taken by all reduce tasks=250564
Total megabyte-seconds taken by all map tasks=21524480
Total megabyte-seconds taken by all reduce tasks=2052620288
Map-Reduce Framework
Map input records=596
Map output records=4768
Map output bytes=72441
Map output materialized bytes=6629
Input split bytes=103
Combine input records=4768
Combine output records=693
Spilled Records=693
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
CPU time spent (ms)=1100
Physical memory (bytes) snapshot=1463422976
Virtual memory (bytes) snapshot=4292116480
Total committed heap usage (bytes)=2058354688
File Input Format Counters
Bytes Read=59877
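Side note: the kms-dt Ident bytes in the output above are less opaque than they look. Assuming they follow Hadoop's standard AbstractDelegationTokenIdentifier Writable layout (one version byte, then owner/renewer/realUser as vint-length-prefixed Text, then vlong issueDate/maxDate and vint sequenceNumber/masterKeyId — an assumption from the Hadoop source, not something printed in the log), a short script can decode them:

```python
# Decode the kms-dt "Ident" hex dump from the job output above, assuming the
# AbstractDelegationTokenIdentifier Writable layout described in the lead-in.
IDENT_HEX = ("00 04 68 69 76 65 04 79 61 72 6e 00 8a 01 5a 65 f8 a8 5f "
             "8a 01 5a 8a 05 2c 5f 8e 01 43 08")
DATA = bytes(int(b, 16) for b in IDENT_HEX.split())

def read_vlong(buf, pos):
    # Hadoop WritableUtils.readVLong: the first byte encodes sign and length.
    fb = buf[pos] - 256 if buf[pos] > 127 else buf[pos]
    pos += 1
    if fb >= -112:                       # small values fit in one byte
        return fb, pos
    size = (-119 - fb) if fb < -120 else (-111 - fb)
    value = 0
    for _ in range(size - 1):            # remaining bytes, big-endian
        value = (value << 8) | buf[pos]
        pos += 1
    return (~value if fb < -120 else value), pos

def read_text(buf, pos):
    # org.apache.hadoop.io.Text: vint byte length followed by UTF-8 bytes.
    length, pos = read_vlong(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length

pos = 1                                  # skip the version byte (0x00)
owner, pos = read_text(DATA, pos)        # -> "hive"
renewer, pos = read_text(DATA, pos)      # -> "yarn"
real_user, pos = read_text(DATA, pos)    # -> "" (empty, as in the log line)
issue_date, pos = read_vlong(DATA, pos)  # milliseconds since the epoch
max_date, pos = read_vlong(DATA, pos)
seq_no, pos = read_vlong(DATA, pos)
master_key_id, pos = read_vlong(DATA, pos)
print(owner, renewer, issue_date, max_date, seq_no, master_key_id)
```

The decoded issueDate and maxDate agree (to the second) with the HDFS_DELEGATION_TOKEN fields logged above, so the KMS token itself looks well-formed; the "Cannot find class for token kind kms-dt" warnings appear to be about class resolution on the client, not a corrupt token.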