Created on 02-21-2017 07:21 AM - edited 09-16-2022 04:07 AM
I've set up LLAP and it is working fine, but it is not using the IO cache. I've set the properties below in both the CLI and HS2, but Grafana shows no cache usage (and the HDFS NameNode is very busy keeping the edits). Any ideas on what I might be missing?
--hiveconf hive.execution.mode=llap
--hiveconf hive.llap.execution.mode=all
--hiveconf hive.llap.io.enabled=true
--hiveconf hive.llap.daemon.service.hosts=@llap0
Created 02-21-2017 09:38 PM
Is there a typo in your question above? You mention hive.llap.iomemory.mode.cache
The correct form is: set hive.llap.io.memory.mode=cache
Just checking before moving forward.
What makes me believe it is a typo is that you stated the value was null, which is not correct. The default is actually "cache", so I suspect you mistyped the variable name.
Created 02-22-2017 03:22 AM
"In-Memory Cache per Daemon" is set to none by default. Did you allocate anything to it? This setting is also available in hive-interactive-site.
Created 02-22-2017 03:27 PM
@Constantin Stanca Yes, in-memory cache per daemon = 12288 (MB)
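One way to rule out a mistyped property name is to ask Hive to echo the values the session actually sees. The sketch below is only an illustration: it assumes the `hive` CLI is on the PATH, and it assumes that Ambari's "In-Memory Cache per Daemon" field corresponds to `hive.llap.io.memory.size`. A `set <property>;` statement with no value prints the current setting instead of changing it.

```shell
#!/usr/bin/env bash
# Properties to echo back. The mapping of Ambari's "In-Memory Cache per Daemon"
# to hive.llap.io.memory.size is an assumption, not something stated in this thread.
CHECK_SQL='set hive.llap.io.enabled; set hive.llap.io.memory.mode; set hive.llap.io.memory.size;'

# Build the command as an array so the quoting of the SQL string survives.
CHECK_CMD=(hive --hiveconf hive.execution.mode=llap -e "$CHECK_SQL")

# Always show what would be run.
printf '%s\n' "${CHECK_CMD[*]}"

# Only run it where the hive CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
  "${CHECK_CMD[@]}"
fi
```

The same three `set` statements can be pasted into a Beeline session against HS2. Note that this shows the client session's view; the LLAP daemons read their own configuration when they start, so a mismatch between the two is still possible.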
Created 03-05-2017 02:50 AM
Scott mentioned some good practices for memory sizing below.
Created 02-23-2017 06:54 PM
I'm guessing there is a memory sizing issue. Make sure you follow these sizing rules:
In addition, be sure your LLAP queue is set up appropriately and has sufficient capacity:
Created 06-08-2017 01:14 PM
I forgot to update this. Actually, LLAP and the cache were set up correctly. In the queries taken from Teradata, each of the 200k queries performs several joins, and one of the joins was on a column that was null, so the result set was empty. A side benefit, though, was a nice gain in knowledge of all the knobs/levers of this product. And it is a very nice product. My experience is in Teradata/Netezza/HAWQ, and I've found LLAP to be a clear winner in replacing those. Very fast, very robust.
A couple of notes on the things that mattered:
-Use ORC (duh)
-Get PARTITIONED BY / CLUSTERED BY right (use hive --orcfiledump as shown in Sergey's slides to make sure you get 100k records per ORC file)
-Get number of nodes / appmasters right
-Use local Metastore
-Get heaps right
-Get YARN capacity scheduler settings right
-Increase executors
-(probably not supported) clone IS2 JVMs for more throughput if needed.
Pull up the Hive UI or Grafana and sit back, smile and enjoy watching the transactions fly.
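For the ORC check in the list above, a minimal sketch of the `hive --orcfiledump` call might look like the following. The file path is a hypothetical placeholder, not one from this thread; substitute a real ORC file from the table's HDFS directory. The dump includes a `Rows:` line, which is the number to look at per file.

```shell
#!/usr/bin/env bash
# Hypothetical path for illustration only; override ORC_PATH with a real file.
ORC_PATH="${ORC_PATH:-/apps/hive/warehouse/example_db.db/example_table/000000_0}"

# Build the dump command as an array to keep the path safely quoted.
ORC_DUMP_CMD=(hive --orcfiledump "$ORC_PATH")

# Always show what would be run.
printf '%s\n' "${ORC_DUMP_CMD[*]}"

# Only run it where the hive CLI is installed; keep just the row count
# and compression lines from the dump.
if command -v hive >/dev/null 2>&1; then
  "${ORC_DUMP_CMD[@]}" | grep -E '^(Rows|Compression):'
fi
```

Running this against a handful of files in each partition gives a quick view of whether the partitioning and bucketing choices produce reasonably sized files.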
Created 06-08-2017 01:19 PM
Created 04-03-2018 01:02 PM
Hi,
I'm using HDP 2.6.4 with Hive 2 + LLAP, of course.
I've followed all your recommendations here except for your last comment, @James Dinkel,
because you do not explain all the other settings you fixed.
Basically, I run the exact same query twice or more, and the second run is no faster than the first,
so maybe I'm wrong about what is being cached and how to use it. Anyway, how do you know whether you hit the data in cache or not?
Thanks
Created 11-28-2018 12:09 AM
I am seeing the same issue, @Scott Shaw. I followed your recommendation on daemon and heap settings, but still no luck.
Created 11-28-2018 12:10 AM
Is there any way to debug the IO cache component to find out why it's not caching?