Hive query - Mappers stuck in pending state

Explorer

Hi Everyone,

I am facing something strange on my HDP cluster while running Hive (Tez) queries.

For queries on specific ORC tables, the YARN application corresponding to the query (1) has the status RUNNING, which implies that the required RAM and CPU were successfully allocated. But the job then keeps waiting: all mappers and reducers stay in the PENDING state (2) for a long time (20 to 30 minutes), and only then does the query start.

I know this issue is very specific, and I have read several posts describing similar situations, but without finding any solution.

My question is: has anyone faced this situation? Where should I look for further information? I have already looked at the HiveServer2 and Hive Metastore logs, but nothing special shows up.

Any help on this would be very much appreciated.

Regards,

Orlando

(1)
select datekey,period,count(*) 
from A.t 
group by datekey,period 
order by datekey,period;
(2)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 INITED    909          0        0      909       0       0
Reducer 2             INITED      8          0        0        8       0       0
Reducer 3             INITED      1          0        0        1       0       0
--------------------------------------------------------------------------------
VERTICES: 00/03  [>>--------------------------] 0%    ELAPSED TIME: 84.23 s
--------------------------------------------------------------------------------


2 REPLIES

Contributor

It would be good to look at the application logs for the query to see where the tasks are stuck. Some helpful tips for debugging Tez: https://cwiki.apache.org/confluence/display/TEZ/How+to+Diagnose+Tez+App
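
For example, a minimal way to pull the aggregated YARN logs for the Tez application, assuming log aggregation is enabled (<application_id> is a placeholder):

# find the application id of the Tez session/DAG running the query
yarn application -list -appStates RUNNING

# dump the aggregated container logs for that application
yarn logs -applicationId <application_id> > app_logs.txt

# look at vertex/task state transitions in the Tez AM syslog_dag
grep -i "transitioned from" app_logs.txt | less

The vertex state transitions (NEW, INITIALIZING, INITED, RUNNING) in the AM log usually narrow down where the time is being spent.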

Explorer

Hi,

First of all, thank you for your help.

I checked the app logs, but unfortunately I don't see anything weird.

The only thing that is a bit confusing is the syslog_dag of the first container, which keeps looping on lines like:

2017-02-17 09:27:46,933 [INFO] [Dispatcher thread {Central}] |util.RackResolver|: Resolved <worker1_IP> to /default-rack
2017-02-17 09:27:46,933 [INFO] [Dispatcher thread {Central}] |util.RackResolver|: Resolved <worker2_IP> to /default-rack
...
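
For reference, a quick check that I assume is relevant here: the rack mapping should come from the topology script property in core-site.xml, which can be read with the command below (standard property name; an empty value means hosts fall back to /default-rack):

# show the configured rack-awareness script, if any
hdfs getconf -confKey net.topology.script.file.name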

What exactly is the RackResolver? Could this be due to an issue in my network configuration?

Best regards,

Orlando