Support Questions

ocassano · 02-14-2017

Hi Everyone,

I am facing something strange on my HDP cluster while running Hive (Tez) queries.

For queries on specific ORC tables, the YARN application corresponding to the query (1) shows status RUNNING, which implies that the required RAM and CPU were successfully allocated. But the job then keeps waiting: all mappers and reducers stay in PENDING status (2) for a long time (20 to 30 minutes) before the query actually starts.

I know that this issue is very specific, and I have read several posts describing similar situations, but without finding a solution.

My questions are: has anyone faced this situation? Where should I look for further information? I already looked at the hiveserver2 and hivemetastore logs, but nothing special shows up.

Any help on this would be much appreciated.

Regards,

Orlando

(1)
select datekey,period,count(*) 
from A.t 
group by datekey,period 
order by datekey,period;

(2)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 INITED    909          0        0      909       0       0
Reducer 2             INITED      8          0        0        8       0       0
Reducer 3             INITED      1          0        0        1       0       0
--------------------------------------------------------------------------------
VERTICES: 00/03  [>>--------------------------] 0%    ELAPSED TIME: 84.23 s
--------------------------------------------------------------------------------

vgumashta · 02-16-2017

It would be good to look at the app logs for the query to see where the tasks are stuck. Some helpful tips for debugging Tez: https://cwiki.apache.org/confluence/display/TEZ/How+to+Diagnose+Tez+App

ocassano · 02-17-2017

Hi,

First of all, thank you for your help.

I checked the app logs, but unfortunately I don't see anything weird.

The only slightly confusing thing is that the syslog_dag of the first container keeps looping with:

2017-02-17 09:27:46,933 [INFO] [Dispatcher thread {Central}] |util.RackResolver|: Resolved <worker1_IP> to /default-rack
2017-02-17 09:27:46,933 [INFO] [Dispatcher thread {Central}] |util.RackResolver|: Resolved <worker2_IP> to /default-rack
...

What exactly is the RackResolver? Could this be due to an issue in my network configuration?

Best,
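To get a rough idea of how much of the log is taken up by these RackResolver entries, and over what time span, the syslog_dag lines can be tallied per logger. The following is only a sketch: it assumes the log has been saved to a local text file and that every line follows the same layout as the two lines quoted above. The names `parse_line` and `summarize` and the file name in the usage note are illustrative, not part of Tez or YARN.

```python
import re
from collections import Counter
from datetime import datetime

# Layout assumed from the quoted lines:
# <timestamp> [LEVEL] [thread] |logger|: message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<level>\w+)\] "
    r"\[(?P<thread>[^\]]*)\] "
    r"\|(?P<logger>[^|]+)\|: "
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, logger, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group("level"), m.group("logger"), m.group("msg")


def summarize(lines):
    """Count entries per logger and track the earliest and latest timestamps seen."""
    counts = Counter()
    first = last = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip stack traces, blank lines, or other layouts
        ts, _level, logger, _msg = parsed
        counts[logger] += 1
        if first is None or ts < first:
            first = ts
        if last is None or ts > last:
            last = ts
    return counts, first, last


if __name__ == "__main__":
    sample = [
        "2017-02-17 09:27:46,933 [INFO] [Dispatcher thread {Central}] "
        "|util.RackResolver|: Resolved <worker1_IP> to /default-rack",
        "2017-02-17 09:27:46,933 [INFO] [Dispatcher thread {Central}] "
        "|util.RackResolver|: Resolved <worker2_IP> to /default-rack",
    ]
    counts, first, last = summarize(sample)
    for logger, n in counts.most_common():
        print(logger, n)
    print("time span:", first, "to", last)
```

To run it on a real log, the sample list could be replaced with the lines of a saved file, for example `summarize(open("syslog_dag.txt"))`, where `syslog_dag.txt` is a hypothetical local copy of the log. If one logger dominates the whole 20 to 30 minute window, that would at least narrow down where the application master spends its time.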

Orlando

Hive query - Mappers stuck in pending state