Support Questions

Korez · ‎11-05-2021

Hello,

I have a 16-node cluster running Hive. I am running a query that pulls only 2.1 million records out of another table for down-sampling, but every now and then the query coordinator takes a long time to start. This happens roughly once every 40-60 minutes and is not consistent.

The following snippet shows the problem that I am talking about.

2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]:
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Query Execution Summary
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: ----------------------------------------------------------------------------------------------
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: OPERATION DURATION
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: ----------------------------------------------------------------------------------------------
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Compile Query 0.08s
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Prepare Plan 0.04s
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Get Query Coordinator (AM) 443.42s
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Submit Plan 0.02s
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Start DAG 0.01s
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Run DAG 68.02s
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: ----------------------------------------------------------------------------------------------
2021-11-05 20:22:42,807 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]:
2021-11-05 20:22:42,809 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Task Execution Summary
2021-11-05 20:22:42,809 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: ----------------------------------------------------------------------------------------------
2021-11-05 20:22:42,809 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
2021-11-05 20:22:42,809 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: ----------------------------------------------------------------------------------------------
2021-11-05 20:22:42,810 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Map 1 64733.00 160,240 3,109 2,130,834 2,130,834
2021-11-05 20:22:42,810 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: Reducer 2 5292.00 53,520 281 2,130,834 0
2021-11-05 20:22:42,810 INFO SessionState: [HiveServer2-Background-Pool: Thread-18688845]: ----------------------------------------------------------------------------------------------

As you can see, "Get Query Coordinator (AM)" took 443 seconds, while the rest of the query ran quickly (Run DAG took 68 seconds).
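To spot these stalls without reading each summary by hand, the per-operation durations can be pulled out of the HiveServer2 log. The following is a minimal Python sketch, assuming the log lines look exactly like the snippet above. The function names and the 30-second threshold are illustrative choices, not anything Hive provides.

```python
import re

# Matches summary lines such as
# "...Thread-18688845]: Get Query Coordinator (AM) 443.42s"
# and skips headers, separators, and the per-vertex task rows.
SUMMARY_LINE = re.compile(
    r"\]:\s+(?P<op>[A-Za-z][A-Za-z ()]*?)\s+(?P<secs>\d+(?:\.\d+)?)s\s*$"
)


def parse_execution_summary(log_lines):
    """Return {operation name: duration in seconds} for one query summary."""
    durations = {}
    for line in log_lines:
        match = SUMMARY_LINE.search(line)
        if match:
            durations[match.group("op")] = float(match.group("secs"))
    return durations


def slow_am_acquisition(durations, threshold_secs=30.0):
    """True when getting the query coordinator took longer than the threshold.

    The 30-second default is an arbitrary example value.
    """
    return durations.get("Get Query Coordinator (AM)", 0.0) > threshold_secs
```

Feeding it the lines of one "Query Execution Summary" block gives a dictionary that can be logged or alerted on whenever `slow_am_acquisition` returns True.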

What could be causing this? I run 7 queries every 5 minutes, and on top of those I run 7 more once an hour, so when they all overlap there are about 14 running, plus some compactions happening at the same time.

Heap usage is below 50% for both HiveServer2 and the metastore, and HDFS looks healthy, so I have no idea what the AM is waiting for. Any ideas are greatly appreciated.

Thanks

Mark

smruti · ‎11-14-2021

Hi @Korez,

There are several Hive properties we can use to keep a number of AM containers running for the default queues.

hive.server2.tez.default.queues=queue1,queue2,queue3
-- Number of AM containers per queue
set hive.server2.tez.sessions.per.default.queue=3
set hive.server2.tez.initialize.default.sessions=true
set hive.prewarm.enabled=true
set hive.prewarm.numcontainers=2
set tez.am.container.reuse.enabled=true
set tez.am.container.idle.release-timeout-max.millis=20000
set tez.am.container.idle.release-timeout-min.millis=10000

When you submit the job, do you specify the Tez queue name (tez.queue.name) explicitly? If you do, the query might not use the existing AM containers from the pool and might create a new one instead.

443 seconds is still a very long time to launch a container. Do check whether you have any resource constraints in YARN.
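One way to check for YARN resource constraints is the ResourceManager REST endpoint /ws/v1/cluster/metrics, which reports pending applications and containers along with available memory. A rough Python sketch follows. The ResourceManager URL is a placeholder, and the helper names and the choice of which fields to flag are assumptions made for illustration. With the Capacity Scheduler, a queue can also keep AMs pending once it reaches yarn.scheduler.capacity.maximum-am-resource-percent, even when plenty of memory is free cluster-wide, so pending applications are worth checking alongside memory.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """Fetch cluster metrics from the YARN ResourceManager REST API.

    rm_url is a placeholder such as "http://rm-host:8088" (hypothetical host).
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def yarn_pressure(metrics):
    """Return human-readable findings that hint at resource pressure."""
    findings = []
    if metrics.get("appsPending", 0) > 0:
        findings.append(f"{metrics['appsPending']} application(s) pending")
    if metrics.get("containersPending", 0) > 0:
        findings.append(f"{metrics['containersPending']} container(s) pending")
    total_mb = metrics.get("totalMB", 0)
    if total_mb and metrics.get("availableMB", 0) == 0:
        findings.append("no memory available")
    return findings
```

Running `yarn_pressure(fetch_cluster_metrics(...))` at the moment a query is stuck in "Get Query Coordinator (AM)" is more informative than checking while the cluster is idle.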

Korez · ‎11-16-2021

@smruti

Thanks for the response.

I never specify a queue when sending a query. In fact, my default.queues is only "default".

So what is the recommendation on how many queues to have? Is my problem that I am using only the default queue?

I am also not warming up the queue.

My YARN seems to be in good shape: I have enough memory and resources. I just think Tez is not fully configured. Any recommendation is greatly appreciated!

Thanks

smruti · ‎11-16-2021

Hi @Korez

Then please consider setting the properties that I mentioned earlier.

-- Number of AM containers per queue
set hive.server2.tez.sessions.per.default.queue=3
set hive.server2.tez.initialize.default.sessions=true
set hive.prewarm.enabled=true
set hive.prewarm.numcontainers=2
set tez.am.container.reuse.enabled=true
set tez.am.container.idle.release-timeout-max.millis=20000
set tez.am.container.idle.release-timeout-min.millis=10000

This will help keep AM containers up and ready for a Hive query.
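As a rough sizing check, the number of pooled AMs is the number of default queues multiplied by hive.server2.tez.sessions.per.default.queue. The Python sketch below compares that with the peak of about 14 concurrent queries mentioned earlier in the thread. It assumes every query goes through the pool and holds one session for its whole duration, which is a simplification, and the function names are hypothetical.

```python
def pooled_sessions(num_default_queues, sessions_per_queue):
    """Total AM sessions HiveServer2 keeps in its default pool."""
    return num_default_queues * sessions_per_queue


def queries_waiting(peak_concurrent_queries, num_default_queues, sessions_per_queue):
    """Queries left waiting for an AM at peak, under the assumption above."""
    pool = pooled_sessions(num_default_queues, sessions_per_queue)
    return max(0, peak_concurrent_queries - pool)


# One queue ("default") with 3 sessions, against about 14 overlapping queries.
print(queries_waiting(14, num_default_queues=1, sessions_per_queue=3))  # 11
```

Under these assumptions a single queue with 3 sessions leaves 11 queries waiting at the hourly peak, so the pool size may need to be closer to the peak concurrency, subject to what YARN can actually grant.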

agcuong · ‎08-23-2023

Dear smruti,

I am also facing the same issue: Get Query Coordinator (AM) takes more than 300 seconds while creating the Tez session.

I already tried the settings you mentioned above, but no luck.

Please help me investigate further.

Thanks & regards,

DianaTorres · ‎08-23-2023

@agcuong As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.

Regards,

Diana Torres,
Senior Community Moderator


Query coordinator taking too much time to start