
Query coordinator taking too much time to start

New Contributor

Hello,

 

I have a 16-node cluster running Hive. I am running a query that pulls only 2.1 million records out of another table for down-sampling, but every now and then the query coordinator takes a long time to start. This happens maybe once every 40-60 minutes; it is not consistent.

 

The following snippet shows the problem that I am talking about.

 

2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]:
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Query Execution Summary
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: ----------------------------------------------------------------------------------------------
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: OPERATION DURATION
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: ----------------------------------------------------------------------------------------------
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Compile Query 0.08s
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Prepare Plan 0.04s
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Get Query Coordinator (AM) 443.42s
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Submit Plan 0.02s
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Start DAG 0.01s
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Run DAG 68.02s
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: ----------------------------------------------------------------------------------------------
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]:
2021-11-05 20:22:42,809 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Task Execution Summary
2021-11-05 20:22:42,809 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: ----------------------------------------------------------------------------------------------
2021-11-05 20:22:42,809 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
2021-11-05 20:22:42,809 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: ----------------------------------------------------------------------------------------------
2021-11-05 20:22:42,810 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Map 1 64733.00 160,240 3,109 2,130,834 2,130,834
2021-11-05 20:22:42,810 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Reducer 2 5292.00 53,520 281 2,130,834 0
2021-11-05 20:22:42,810 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: ----------------------------------------------------------------------------------------------
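To put the summary in perspective, here is a quick Python sketch that tallies the stage timings copied from the log above and shows how much of the wall clock the AM wait accounts for (illustration only, not part of any Hive tooling):

```python
# Stage timings copied from the "Query Execution Summary" above.
# This only tallies where the wall-clock time goes; it is an
# illustration, not part of any Hive API.
summary = {
    "Compile Query": 0.08,
    "Prepare Plan": 0.04,
    "Get Query Coordinator (AM)": 443.42,
    "Submit Plan": 0.02,
    "Start DAG": 0.01,
    "Run DAG": 68.02,
}

total = sum(summary.values())
am_share = summary["Get Query Coordinator (AM)"] / total
print(f"total {total:.2f}s, AM wait is {am_share:.0%} of it")
```

Roughly 87% of the query's wall-clock time is spent just waiting for the query coordinator.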


As you can see, "Get Query Coordinator (AM)" is stuck for 443 seconds, while the rest of the stages complete quickly.

 

What could be causing this? I am running 7 queries every 5 minutes; on top of those, I run 7 more once an hour, so at peak about 14 queries run concurrently, plus some compactions happening at the same time.

 

I see that heap is not even at 50% for either HiveServer2 or the metastore, and HDFS seems fine, so I have no idea what this (AM) is waiting for. Any ideas are greatly appreciated.

 

 

Thanks

Mark

 


Contributor

Hi @Korez,

There are a number of Hive properties we can use to keep a pool of AM containers running for the default queues:

set hive.server2.tez.default.queues=queue1,queue2,queue3
set hive.server2.tez.sessions.per.default.queue=3 -- number of AM containers per queue
set hive.server2.tez.initialize.default.sessions=true
set hive.prewarm.enabled=true
set hive.prewarm.numcontainers=2
set tez.am.container.reuse.enabled=true
set tez.am.container.idle.release-timeout-max.millis=20000
set tez.am.container.idle.release-timeout-min.millis=10000
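For context, here is a back-of-the-envelope sketch of the concurrency that pool would need to cover, using the workload described in the question (illustrative numbers only; compactions would add to the peak):

```python
# Rough sizing for the Tez session pool, using the workload the asker
# describes: 7 queries every 5 minutes plus 7 more once an hour, all on
# a single "default" queue. Illustrative only, not a Hive formula.
five_minute_batch = 7   # submitted every 5 minutes
hourly_batch = 7        # submitted once an hour, can overlap the above

peak_concurrent = five_minute_batch + hourly_batch
# The pool (sessions.per.default.queue x number of default queues)
# should cover this peak, or queries will queue up waiting for an AM.
print("peak concurrent queries:", peak_concurrent)
```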

When you submit the job, do you specify the Tez queue name (tez.queue.name) explicitly? If so, the queries might not use the existing AM containers from the pool and instead create new ones.

 

443 seconds is still a very long time to launch a container. Do check whether you have any resource constraints in YARN.

 

New Contributor

@smruti 

 

Thanks for the response.

I never specify a queue when submitting a query; in fact, my default.queues contains only "default".

 

So, what is the recommendation on how many queues to have? Do you think my problem is that I am using only the default queue?

 

I am also not warming up the queue.

 

My YARN cluster seems to be in pretty good shape; I have enough memory and resources. I just think Tez is not fully configured. Any recommendation is greatly appreciated!

 

Thanks

Contributor

Hi @Korez 

Then, please consider setting the properties I mentioned earlier:

set hive.server2.tez.sessions.per.default.queue=3 -- number of AM containers per queue
set hive.server2.tez.initialize.default.sessions=true
set hive.prewarm.enabled=true
set hive.prewarm.numcontainers=2
set tez.am.container.reuse.enabled=true
set tez.am.container.idle.release-timeout-max.millis=20000
set tez.am.container.idle.release-timeout-min.millis=10000

This will help keep AM containers up and ready for Hive queries.