Created 02-26-2016 09:27 AM
Hello,
Since we enabled NameNode HA in Ambari 2.1.2.1, we have not been able to run queries in Spark with sparkhive (sqlContext) on our Kerberized cluster.
It keeps raising this error:
Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, [datanode fqdn]): java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "[datanode fqdn]/[datanode 9 address]"; destination host is: "[namenode fqdn]":8020;
The funny thing is that it sometimes works for some users, then fails, then works again, and so on.
Note that we are now using Ambari 2.2 and HDP 2.3.4 and it did not solve the issue.
Any idea?
Created 02-26-2016 10:06 AM
I think that in your case the client system from which you execute the query is not able to communicate with the namenode and datanode systems using Kerberos authentication.
For debugging, check the Kerberos log file. It should show which user executed the query, the FQDNs of the systems involved, and the reason the job failed.
Created 02-29-2016 06:55 PM
Hi,
It seems a few datanodes are not able to communicate with the namenode/Kerberos server to get a ticket. I suggest checking the following:
1. Check that the hostnames of all machines are correct in /etc/hosts.
2. Check the principals and the corresponding hostnames in Kerberos for the datanodes and namenodes.
3. Paste the job logs for more details.
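Checks 1 and 2 above can be partly automated. Below is a minimal Python sketch, not a definitive tool: it assumes you have the text of /etc/hosts and a list of service principals in the usual service/host@REALM form. All function names, hostnames, IPs, and principals in it are hypothetical and only for illustration. It flags names that map to more than one IP and principals whose host part does not appear in the hosts data.

```python
from collections import defaultdict


def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to the set of IPs listed for it."""
    ips_by_name = defaultdict(set)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments, skip blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            ips_by_name[name.lower()].add(ip)
    return ips_by_name


def ambiguous_hosts(ips_by_name):
    """Return names listed with more than one IP, which can cause hostname mismatches."""
    return {name: sorted(ips) for name, ips in ips_by_name.items() if len(ips) > 1}


def principal_host(principal):
    """Extract the host part of a service principal such as service/host@REALM."""
    name = principal.split("@", 1)[0]
    if "/" not in name:
        return None  # user principal, no host component
    return name.split("/", 1)[1].lower()


def principals_with_unknown_host(principals, ips_by_name):
    """Return service principals whose host component is not a name in the hosts data."""
    return [p for p in principals
            if principal_host(p) is not None and principal_host(p) not in ips_by_name]


if __name__ == "__main__":
    # Hypothetical sample data; replace with your own /etc/hosts text and principals.
    sample_hosts = """
    127.0.0.1   localhost
    10.0.0.11   nn1.example.com nn1
    10.0.0.21   dn1.example.com dn1
    10.0.0.22   dn1.example.com   # duplicate entry with a different IP
    """
    sample_principals = [
        "nn/nn1.example.com@EXAMPLE.COM",
        "dn/dn1.example.com@EXAMPLE.COM",
        "dn/dn2.example.com@EXAMPLE.COM",
    ]
    hosts = parse_hosts(sample_hosts)
    print(ambiguous_hosts(hosts))
    print(principals_with_unknown_host(sample_principals, hosts))
```

This only compares strings. It does not contact the KDC or DNS, so a clean result does not prove that authentication will work; it just narrows down obvious hostname mismatches before you dig into the job logs.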