Impala on YARN


I am running a set of queries in Hive and Impala on a Cloudera cluster. As far as I know, Cloudera runs Hive queries on YARN but not Impala queries, and I want to run the Impala queries on YARN as well. I tried this with Llama. After I configured the cluster for Llama, the queries ran, but in Cloudera Manager the corresponding YARN application kept showing as running until I killed it. Also, since making these changes my Hive queries no longer run; they all fail. Can anyone tell me how to do this? Is there any other way to run Impala queries on YARN?

1 ACCEPTED SOLUTION

I don't have a ton of experience with Llama, but I think the
misunderstanding here is that Impala manages the execution of its own
queries, while the MapReduce framework manages the execution of Hive queries.
YARN manages resources for individual MapReduce jobs, and it can manage
resources for the Impala daemons via Llama. The YARN application for Llama
will run as long as Impala does - that's by design, to keep the latency of
Impala queries very low. In the case of Hive, YARN manages the job's
resources only until that job (a single query) is finished.

I'm not sure why your Hive queries would not be running. If this is in the
QuickStart VM, my first guess would be that Llama is still running and
there aren't enough executors / slots left for your Hive queries. YARN in
the QuickStart VM is not configured with a lot of capacity, and it's
not tested with Llama.
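
One way to check that guess is to look at which running YARN applications are holding memory. The sketch below is only an illustration, not something from the original reply: it reads the ResourceManager REST endpoint `/ws/v1/cluster/apps?states=RUNNING` and lists applications by allocated memory. The address in `RM_URL` and the helper names are assumptions; adjust them for your cluster.

```python
import json
from urllib.request import urlopen

# Assumption: ResourceManager web address; 8088 is the usual default port.
RM_URL = "http://localhost:8088"


def fetch_running_apps(rm_url=RM_URL):
    """Fetch RUNNING applications from the ResourceManager REST API."""
    with urlopen(f"{rm_url}/ws/v1/cluster/apps?states=RUNNING") as resp:
        return json.load(resp)


def summarize_running_apps(payload):
    """Return (name, type, allocated MB, elapsed minutes) per app, largest first."""
    # The API returns {"apps": null} when nothing is running.
    apps = (payload.get("apps") or {}).get("app") or []
    rows = [
        (a["name"], a["applicationType"], a["allocatedMB"], a["elapsedTime"] // 60000)
        for a in apps
    ]
    return sorted(rows, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    for name, app_type, mb, minutes in summarize_running_apps(fetch_running_apps()):
        print(f"{name:30} {app_type:12} {mb:>8} MB  running {minutes} min")
```

If a long-running application sits at the top of that list while Hive jobs stay pending, that would be consistent with the capacity explanation above, though it does not prove it.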

I know of no other way to manage Impala resources via YARN, though.

