Hive LLAP fails to start with RUNNING_PARTIAL state

Rising Star

I'm having trouble getting Hive Interactive (LLAP) to work.

I'm working on a kerberized HDP 2.5 cluster.

It seems that only 1 daemon starts. Here's the error:

2016-11-03 15:50:50,084 - Marker index for start of JSON data for 'llapsrtatus' comamnd : 0

2016-11-03 15:50:50,085 - LLAP app 'llap0' in 'RUNNING_PARTIAL' state. Live Instances : '1'. Desired Instances : '4' after 221.781697989 secs.

2016-11-03 15:50:50,085 - LLAP app 'llap0' did not come up after a wait of 221.782001972 seconds.

2016-11-03 15:50:50,087 - LLAP app 'llap0' deployment unsuccessful.

What is causing this?
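
The status line above follows a fixed pattern, so the live and desired daemon counts can be pulled out with a short script when watching repeated start attempts. This is a minimal sketch that assumes the log wording is exactly as quoted in this post; the function name `parse_llap_status` and the regular expression are illustrative, not part of Ambari or Hive.

```python
import re

# Pattern mirrors the status line quoted in the post; other versions may word it differently.
STATUS_RE = re.compile(
    r"LLAP app '(?P<app>[^']+)' in '(?P<state>[^']+)' state\. "
    r"Live Instances : '(?P<live>\d+)'\. "
    r"Desired Instances : '(?P<desired>\d+)'"
)


def parse_llap_status(line):
    """Return (app, state, live, desired), or None if the line does not match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return m.group("app"), m.group("state"), int(m.group("live")), int(m.group("desired"))


line = (
    "2016-11-03 15:50:50,085 - LLAP app 'llap0' in 'RUNNING_PARTIAL' state. "
    "Live Instances : '1'. Desired Instances : '4' after 221.781697989 secs."
)
app, state, live, desired = parse_llap_status(line)
print(f"{app}: {state}, {desired - live} daemon(s) missing")
```

A gap between live and desired instances usually means YARN could not place the remaining daemon containers, which is what the replies below look into.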

1 ACCEPTED SOLUTION

Expert Contributor

@Nasheb Ismaily

Which YARN queue are you using? Can you attach the HSI start logs and the YARN logs for the LLAP application?

In the meantime:

1. If you are using the queue named 'llap', try increasing the "% of cluster capacity" slider under Hive -> Configs -> HSI section, or

2. If you are using any other queue, try decreasing the "Maximum Total Concurrent Queries" slider.

Save the configs and restart HSI.

Rising Star

I increased the capacity and it worked, thanks!

Contributor

I ran into this problem as well. I tried increasing the capacity, but it still doesn't work. It throws an exception:

org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=255

Attachment: yarn-master-log.tar.gz

You can find the details in the attachment. Is there another way to solve this? Thanks.

Expert Contributor

I'm facing the same issue:

YARN:
Memory allocated for all YARN containers on a node = 16G
Minimum Container Size (Memory) = 2G
Maximum Container Size (VCores) = 3

Hive:
% of cluster capacity = 40%
Memory per daemon = 8192
Number of LLAP Daemons = 1 
In-Memory Cache per Daemon = 2048 
Maximum CPUs per Daemon = 3
 

I see this error message in the RM UI:

Diagnostics:  Unstable Application Instance : - failed with 
component LLAP failed 'recently' 6 times (4 in startup); threshold is 5 - last failure: Failure container_e29_1492031103210_0001_01_000007 on host host1.fqdn (0): http://host1.fqdn:19888/jobhistory/logs/host1.fqdn:45454/container_e29_1492031103210_0001_01_000007/...
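
As a rough sanity check on the settings listed above, the daemon's container request can be compared with the queue's configured share of cluster memory. This is only a sketch under stated assumptions: the number of NodeManagers is not given in the post, so `num_nodes=1` is a guess; requests are assumed to be rounded up to a multiple of the minimum container size; and `other_containers_mb` is a hypothetical placeholder for anything else that runs in the same queue (such as application masters), left at 0 here. The function names are illustrative.

```python
import math


def round_up_to_container(mb, min_container_mb):
    """Round a memory request up to the next multiple of the minimum container size."""
    return math.ceil(mb / min_container_mb) * min_container_mb


def llap_fits_in_queue(node_memory_mb, num_nodes, queue_capacity_pct,
                       daemon_memory_mb, num_daemons, min_container_mb,
                       other_containers_mb=0):
    """Compare the memory the daemons need with the queue's configured capacity.

    Returns (fits, needed_mb, queue_mb). Only the configured capacity is
    considered; a queue may be allowed to grow up to its maximum capacity.
    """
    queue_mb = node_memory_mb * num_nodes * queue_capacity_pct / 100.0
    daemon_mb = round_up_to_container(daemon_memory_mb, min_container_mb) * num_daemons
    needed_mb = daemon_mb + other_containers_mb
    return needed_mb <= queue_mb, needed_mb, queue_mb


# Values from the post; num_nodes=1 is an assumption, not stated by the poster.
fits, needed_mb, queue_mb = llap_fits_in_queue(
    node_memory_mb=16 * 1024,
    num_nodes=1,
    queue_capacity_pct=40,
    daemon_memory_mb=8192,
    num_daemons=1,
    min_container_mb=2048,
)
print(fits, needed_mb, queue_mb)
```

Under the single-node assumption, 40% of 16 GB is about 6.4 GB, which is less than the 8 GB daemon, so the request would not fit within the queue's configured capacity. With more nodes the picture changes, so the real node count should be plugged in before drawing conclusions.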

Explorer

If nothing else works, you can try updating your OpenSSL.

Expert Contributor

Hi all, I had the same problem and tried many things. Finally, I changed the maximum capacity of my YARN llap queue from 50% to 100%, and Hive Server Interactive then started successfully. Possible cause in my case: the allocated containers exceeded the llap queue's maximum capacity.