Created on 10-04-2017 10:59 AM - edited 09-16-2022 05:20 AM
Hi Team,
I have a problem when running queries such as select count(*) from table, select distinct field from table, or select * from table order by field.
When I check the application details in YARN, the following error appears:
2017-10-03 16:22:54,826 INFO [IPC Server handler 12 on 35262] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1506588001647_0368_r_000000_0 is : 0.21282798
2017-10-03 16:22:54,827 FATAL [IPC Server handler 10 on 35262] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1506588001647_0368_r_000000_0 - exited : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:391)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:306)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:294)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:335)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
Can anyone help with this?
Many thanks
Created 10-04-2017 09:25 PM
Could you run the command below on both the master and the slave nodes:
netstat -anp | grep 50060
Also check whether you can ping the slaves from the master and vice versa.
It looks like a connectivity issue between them.
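If you want to script that reachability test, here is a minimal Java sketch. The host names are placeholders for your own nodes, and the port is an assumption: 13562 is the default mapreduce.shuffle.port for the YARN shuffle handler (50060 was the MRv1 TaskTracker HTTP port), so adjust it to whatever your cluster actually uses.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ShufflePortCheck {
    public static void main(String[] args) {
        // Hypothetical host names; replace with your actual cluster nodes.
        String[] hosts = {"master", "slave1", "slave2"};
        // Assumed port: 13562 is the MRv2 default for mapreduce.shuffle.port.
        int port = 13562;
        for (String host : hosts) {
            try (Socket socket = new Socket()) {
                // Fail fast if the shuffle port is blocked or the host is down.
                socket.connect(new InetSocketAddress(host, port), 5000);
                System.out.println(host + ":" + port + " is reachable");
            } catch (IOException e) {
                System.out.println(host + ":" + port + " is NOT reachable: " + e.getMessage());
            }
        }
    }
}
Run it once from the master and once from each slave; if any node cannot reach another on the shuffle port, that matches the Exceeded MAX_FAILED_UNIQUE_FETCHES failure pattern.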
Created on 04-17-2020 10:03 PM - edited 04-17-2020 10:06 PM
You can set the SPNEGO authentication principals in the configuration, for example:
configuration.set("yarn.nodemanager.webapp.spnego-principal", "HTTP/_HOST@DEMO.CN");
configuration.set("yarn.resourcemanager.webapp.spnego-principal", "HTTP/_HOST@DEMO.CN");
Because of cached files in the cluster, the job fetches data from other nodes while the reducer is running, so setting YARN web authentication is needed.
So your full authentication config is:
package demo.utils;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class Auth {

    private String keytab;

    public Auth(String keytab) {
        this.keytab = keytab;
    }

    public void authorization(Configuration configuration) {
        // Point the JVM at the Kerberos client configuration.
        System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
        configuration.set("hadoop.security.authentication", "Kerberos");
        configuration.set("fs.defaultFS", "hdfs://m1.DEMO.CN");
        configuration.set("dfs.namenode.kerberos.principal.pattern", "nn/*@DEMO.CN");
        configuration.set("yarn.nodemanager.principal", "nm/_HOST@DEMO.CN");
        configuration.set("yarn.resourcemanager.principal", "rm/_HOST@DEMO.CN");
        // SPNEGO principals so the shuffle fetchers can authenticate to the
        // NodeManager and ResourceManager web endpoints.
        configuration.set("yarn.nodemanager.webapp.spnego-principal", "HTTP/_HOST@DEMO.CN");
        configuration.set("yarn.resourcemanager.webapp.spnego-principal", "HTTP/_HOST@DEMO.CN");
        UserGroupInformation.setConfiguration(configuration);
        try {
            // Log in from the keytab so subsequent Hadoop calls are authenticated.
            UserGroupInformation.loginUserFromKeytab("user@DEMO.CN", this.keytab);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
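For completeness, a minimal sketch of how the class above might be called; the keytab path and the AuthDemo wrapper are hypothetical and only illustrate the call sequence.
import org.apache.hadoop.conf.Configuration;
import demo.utils.Auth;

public class AuthDemo {
    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        // Hypothetical keytab path; use the keytab for your own principal.
        Auth auth = new Auth("/etc/security/keytabs/user.keytab");
        auth.authorization(configuration);
        // configuration is now Kerberos-enabled and can be passed on to
        // Job.getInstance(configuration) or FileSystem.get(configuration).
    }
}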
Created 10-18-2021 10:59 AM
Usually, the exception java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out is caused by communication issues among Hadoop cluster nodes: the reducers repeatedly fail to fetch map output from other nodes until the shuffle gives up.