Created 02-14-2017 09:50 AM
Hi Everyone,
I am facing something strange on my HDP cluster while running Hive (Tez) queries.
For queries on specific ORC tables, the YARN application corresponding to the query (1) is in status RUNNING, which implies that the required RAM and CPU were successfully allocated. But the application then keeps waiting. All mappers and reducers stay in status PENDING (2) for a long time (20 to 30 minutes), and only then does the query start.
I know that this issue is very specific, and I have read several posts describing similar situations, but without finding any solution.
My questions are: Has anyone faced this situation? Where should I look for further information? I already looked at the hiveserver2 and hivemetastore logs, but nothing special shows up.
Any help on this would be very much appreciated.
Regards,
Orlando
(1) select datekey,period,count(*) from A.t group by datekey,period order by datekey,period;
(2)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 INITED    909          0        0      909       0       0
Reducer 2             INITED      8          0        0        8       0       0
Reducer 3             INITED      1          0        0        1       0       0
--------------------------------------------------------------------------------
VERTICES: 00/03  [>>--------------------------] 0%    ELAPSED TIME: 84.23 s
--------------------------------------------------------------------------------
Created 02-16-2017 10:41 PM
It would be good to look at the app logs for the query to see where the tasks are stuck. Some helpful tips to debug tez: https://cwiki.apache.org/confluence/display/TEZ/How+to+Diagnose+Tez+App
Created 02-17-2017 08:33 AM
Hi,
First of all, thank you for your help.
I checked the app logs, but unfortunately I don't see anything weird.
The only thing that is a bit confusing is that the syslog_dag of the first container keeps looping with:
2017-02-17 09:27:46,933 [INFO] [Dispatcher thread {Central}] |util.RackResolver|: Resolved <worker1_IP> to /default-rack
2017-02-17 09:27:46,933 [INFO] [Dispatcher thread {Central}] |util.RackResolver|: Resolved <worker2_IP> to /default-rack
...
What exactly is the RackResolver? Could this be due to an issue in my network configuration?
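To check whether the syslog_dag really contains nothing but the repeating RackResolver lines, one option is to summarize the file by log level and logger. The following is only a minimal sketch in Python: it assumes every log line follows the layout of the excerpt above (timestamp, [LEVEL], [thread], |logger|: message), and it assumes that problems would be logged at level WARN or ERROR. The function name summarize is hypothetical, not part of Tez or Hadoop.

```python
import re
from collections import Counter

# Assumed line layout, copied from the excerpt above:
# 2017-02-17 09:27:46,933 [INFO] [Dispatcher thread {Central}] |util.RackResolver|: Resolved ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<level>\w+)\] "
    r"\[(?P<thread>[^\]]*)\] "
    r"\|(?P<logger>[^|]+)\|: "
    r"(?P<msg>.*)$"
)


def summarize(lines):
    """Count lines per (level, logger) and collect WARN/ERROR lines
    that do not come from the RackResolver logger."""
    counts = Counter()
    notable = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = LINE_RE.match(line)
        if m is None:
            # Lines that do not match the assumed layout
            # (e.g. stack trace continuations) are skipped.
            continue
        counts[(m["level"], m["logger"])] += 1
        # Assumption: anything interesting is logged as WARN or ERROR.
        if m["level"] in ("WARN", "ERROR") and m["logger"] != "util.RackResolver":
            notable.append(line)
    return counts, notable
```

It could be used by opening the downloaded syslog_dag file and passing the file object to summarize, then printing counts.most_common(10) and the notable list. If the RackResolver lines dominate and the notable list is empty, the log itself is probably not where the delay shows up.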
Best,
Orlando