Created 10-05-2017 02:50 PM
Environment:
HDP 2.6 with Kerberos enabled
OS RedHat 7.3
MS SQL Server 2017 with Polybase
Polybase works fine when loading data directly from HDFS.
However, if I use the FORCE EXTERNALPUSHDOWN option (which means Polybase starts an MR job on the cluster), all containers fail with the following error:
2017-10-05 16:29:25,066 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e50_1507209892753_0004_01_000029
2017-10-05 16:29:25,067 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e50_1507209892753_0004_01_000016
2017-10-05 16:29:25,068 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:27 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:29 ContRel:0 HostLocal:25 RackLocal:4
2017-10-05 16:29:25,075 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1507209892753_0004_m_000016_0 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP
2017-10-05 16:29:25,075 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1507209892753_0004_m_000016_0: Exception from container-launch.
Container id: container_e50_1507209892753_0004_01_000029
Exit code: 1
Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Where can I find the root cause?
Created 10-05-2017 03:56 PM
You may have to look at the complete YARN logs for this application ID, which should give more detail about the cause of the failure. The logs can be gathered with the command below:
yarn logs -applicationId application_1507209892753_0004
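The application ID in the command above can be read off any container ID in the error output, since a YARN container ID embeds the cluster timestamp and the application sequence number. A minimal Python sketch of that mapping follows; the function name `application_id_from_container` is hypothetical and only illustrates the naming scheme.

```python
import re

# YARN container IDs look like:
#   container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>
# The "e<epoch>_" part is optional (it is "e50_" in this thread).
_CONTAINER_RE = re.compile(r"^container_(?:e\d+_)?(\d+)_(\d+)_\d+_\d+$")


def application_id_from_container(container_id: str) -> str:
    """Return the application ID that a container ID belongs to.

    Hypothetical helper for illustration; not part of any Hadoop API.
    """
    match = _CONTAINER_RE.match(container_id.strip())
    if match is None:
        raise ValueError(f"not a YARN container ID: {container_id!r}")
    cluster_timestamp, app_sequence = match.groups()
    return f"application_{cluster_timestamp}_{app_sequence}"


# Container ID taken from the error output in the question.
print(application_id_from_container("container_e50_1507209892753_0004_01_000029"))
```

For the container in the question this prints application_1507209892753_0004, which is the ID passed to yarn logs.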
Created 10-30-2017 10:17 AM
Hello!
I can find the same logs through the web UI, but they say nothing about why the job failed. The containers simply report:
Task Transitioned from RUNNING to KILL_WAIT
Created 10-06-2017 08:16 AM
I found the same logs in the YARN RM console.
There is one more suspicious line, logged before the container starts:
2017-10-05 18:20:26,264 WARN [main] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
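Lines like this one are easy to miss in a long aggregated log. As a rough sketch, assuming the yarn logs output has been saved to a text file and read into a list of lines, the following Python filter keeps only lines that look like warnings, errors, or exceptions. The pattern list and the name `find_suspicious_lines` are assumptions for illustration, not a complete set of failure markers.

```python
import re
from typing import Iterable, List, Tuple

# Assumed markers of interest: log levels above INFO, any Java exception
# class name, and the "Exit code" line from container-launch diagnostics.
SUSPICIOUS = re.compile(r"\b(?:WARN|ERROR|FATAL)\b|Exception|Exit code")


def find_suspicious_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for lines matching SUSPICIOUS."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SUSPICIOUS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


# Two lines quoted from this thread: one routine INFO line, one WARN line.
sample = [
    "2017-10-05 16:29:25,066 INFO [RMCommunicator Allocator] "
    "org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: "
    "Received completed container container_e50_1507209892753_0004_01_000029",
    "2017-10-05 18:20:26,264 WARN [main] org.apache.hadoop.ipc.Client: "
    "Exception encountered while connecting to the server : "
    "org.apache.hadoop.security.AccessControlException: "
    "Client cannot authenticate via:[TOKEN, KERBEROS]",
]

for number, text in find_suspicious_lines(sample):
    print(number, text)
```

On the two sample lines, only the second (the AccessControlException warning) is reported.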
Created 10-10-2017 06:48 AM
Did you have a valid Kerberos ticket before submitting the YARN job?
Can you share the output of the 'klist' command?
Created 10-12-2017 01:39 PM
I can't run kinit because the job is initiated by the MS SQL Polybase connector.