Hive on Tez+LLAP tasks hang

New Contributor

Hi,

I run the following SQL statement

SELECT COUNT(*) FROM s9;

on Hive 2.1 CLI with the configurations

hive.execution.mode = llap
hive.llap.execution.mode = all

The job fails because all four task attempts (TaskAttempt 0 through 3) time out.

hive> SELECT COUNT(*) FROM s9;
Query ID = hadoop_20160914170649_b9482763-423b-4f13-b166-beec894a5dad
Total jobs = 1
Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id application_1473650558014_0066)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1                 llap        FAILED      1          0        0        1       4       0
Reducer 2             llap        KILLED      1          0        0        1       0       0
----------------------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 1316.16 s
----------------------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1473650558014_0066_1_00, diagnostics=[Task failed, taskId=task_1473650558014_0066_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Container container_222212222_0066_01_000001 timed out], TaskAttempt 1 failed, info=[Container container_222212222_0066_01_000002 timed out], TaskAttempt 2 failed, info=[Container container_222212222_0066_01_000003 timed out], TaskAttempt 3 failed, info=[Container container_222212222_0066_01_000004 timed out]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1473650558014_0066_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1473650558014_0066_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1473650558014_0066_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1473650558014_0066_1_00, diagnostics=[Task failed, taskId=task_1473650558014_0066_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Container container_222212222_0066_01_000001 timed out], TaskAttempt 1 failed, info=[Container container_222212222_0066_01_000002 timed out], TaskAttempt 2 failed, info=[Container container_222212222_0066_01_000003 timed out], TaskAttempt 3 failed, info=[Container container_222212222_0066_01_000004 timed out]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1473650558014_0066_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1473650558014_0066_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1473650558014_0066_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

After switching hive.execution.mode from "llap" to "container", the same query succeeds.

Any ideas?

Thanks!

2 REPLIES

Re: Hive on Tez+LLAP tasks hang

Rising Star

Is this on some sort of cloud provider? There's a known issue when DNS is not set up so that forward and reverse resolution return matching results.
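
For anyone who wants to test this on their own nodes, here is a minimal sketch in Python of a forward/reverse consistency check. It resolves a hostname to an address, resolves that address back to a name, and reports whether the two agree. The function name check_forward_reverse is made up for illustration, and the check goes through whatever the operating system's resolver is configured to use; it does not reproduce what LLAP does internally, so treat a mismatch as a hint rather than a diagnosis.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname -> IP -> name and report whether they agree.

    Returns (ok, ip, reverse_name). The resolver functions are
    parameters only so the check can be exercised without a network;
    by default the standard socket calls are used.
    """
    ip = forward(hostname)
    # gethostbyaddr returns (primary_name, alias_list, ip_list)
    reverse_name, aliases, _ = reverse(ip)
    known_names = {reverse_name.lower()}
    known_names.update(alias.lower() for alias in aliases)
    # Comparison is case-insensitive and accepts a match on any alias.
    return hostname.lower() in known_names, ip, reverse_name
```

Running check_forward_reverse(socket.getfqdn()) on each node, and again with the hostnames of the other nodes, shows whether every machine sees the same name for the same address. A resolution failure raises OSError (socket.gaierror or socket.herror), which is itself worth noting.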

Re: Hive on Tez+LLAP tasks hang

New Contributor

@gopal, thanks for the response, and sorry for the delay!

We do not run Hive+LLAP on a cloud service but on our own small set of physical nodes, and we use a simple /etc/hosts file for hostname resolution. Since /etc/hosts does not take part in DNS reverse resolution, could that be the cause of the Hive+LLAP query hang?