
LLAP not using io cache

Explorer

I've set up LLAP and it is working fine, but it is not using the IO cache. I've set the options below in both the CLI and HS2, but Grafana shows no cache used (and the HDFS NameNode stays very busy). Any ideas on what I might be missing?

--hiveconf hive.execution.mode=llap

--hiveconf hive.llap.execution.mode=all

--hiveconf hive.llap.io.enabled=true

--hiveconf hive.llap.daemon.service.hosts=@llap0
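For reference, a minimal sketch of how I'm passing and then verifying these from a beeline session (the JDBC URL is a placeholder; 10500 is the usual HiveServer2 Interactive port on HDP, but check your install):

beeline -u "jdbc:hive2://my-hsi-host:10500/" \
  --hiveconf hive.execution.mode=llap \
  --hiveconf hive.llap.execution.mode=all \
  --hiveconf hive.llap.io.enabled=true \
  --hiveconf hive.llap.daemon.service.hosts=@llap0

-- inside the session, "set" with no value echoes the effective setting
set hive.llap.io.enabled;
set hive.llap.execution.mode;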

1 ACCEPTED SOLUTION

@James Dinkel

Is there a typo in your question above? You mention hive.llap.iomemory.mode.cache

The correct setting is: set hive.llap.io.memory.mode=cache

Just checking before moving forward.

What makes me believe it is a typo is that you stated the value was null, which is not correct; the default is actually "cache". That makes me think you mistyped the variable.
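To verify, running "set" with just the variable name in your Hive session echoes the effective value:

set hive.llap.io.memory.mode;
-- expected (assuming defaults): hive.llap.io.memory.mode=cache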


18 REPLIES

Explorer

Update for those thinking of an answer (and a cheat sheet of good parameters for those browsing): I have tested each setting individually and have response time down to 630 ms and throughput up to 40 queries/sec on a 1 TB dataset, but I need to get the IO cache going to reach the next level and improve throughput (q/sec).

The IO cache is created (12 GB), but is never used.

I am using OOTB LLAP settings except for the changes below, all of which helped drive down response time and increase throughput, except a few that were neutral:

-increased mem for yarn, llap heap, llap concurrency

-disabled CBO (it was adding time to the query during the plan creation and submission phase)

-set hive.llap.iomemory.mode=cache (OOTB this setting was null)

-increased /proc/sys/net/core/somaxconn to 4096 (see the sketch after this list)

-increased metastore heap

-increased yarn threads (yarn.resourcemanager.amlauncher.thread-count)

-increased Yarn Resource Manager heartbeat interval (tez.am-rm.heartbeat.interval-ms.max)

-set hive.llap.daemon.vcpus.per.instance to 32; previously it was an unrecognized variable (Ambari bug) and gave a message at Ambari startup of "WARN conf.HiveConf: HiveConf hive.llap.daemon.vcpus.per.instance expects INT type value"

-added additional HS2 instance.

-added the 4 hiveconf options (hive.execution.mode, hive.llap.execution.mode, hive.llap.io.enabled, hive.llap.daemon.service.hosts) as additional properties in the HS2 custom site.

-increased HDFS heap
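For the somaxconn change in the list above, a quick sketch of how it can be applied (assumes root; the second line keeps the setting across reboots):

sysctl -w net.core.somaxconn=4096
echo "net.core.somaxconn = 4096" >> /etc/sysctl.conf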

Version of HDP is 2.5.3.0-37

Version of Hive is 2.1.0.2.5.3.0-37


Explorer

Yes, typo in post. It is hive.llap.io.memory.mode=cache

Thanks!

@James Dinkel

OK. That still doesn't explain why you saw a default value other than "cache", since that is the default.

What is hive.llap.io.enabled set to? By default it is null. Try setting it to true.

Explorer

(screenshot: hivellapiomemorymode.png)

Yeah, I thought so too. It was literally blank (see the screenshot from the first Ambari version after enabling Interactive Query). Also odd: hive.llap.daemon.vcpus.per.instance contained a variable rather than a number, which generated a WARN at start ("WARN conf.HiveConf: HiveConf hive.llap.daemon.vcpus.per.instance expects INT type value"). So I put 32 in for the vcpus and cache for hive.llap.io.memory.mode. Both saved correctly to hive-site.xml.

The third anomaly is a message that says "WARN conf.HiveConf: HiveConf of name hive.llap.daemon.allow.permanent.fns does not exist". This comes from Ambari; I can remove it from hive-site.xml, but Ambari will put it back.

Everything else was as I would expect.

hive.llap.io.enabled is true.

According to Grafana, I have a 12 GB cache, but there isn't anything in it.

Thanks for your help, I appreciate it.

Explorer

I should also add that everything looks correct, starts correctly, and runs well. It just isn't using the IO cache. I do have millions of these messages in the YARN timeline server log, but from looking at the ORC writer code it doesn't look like the ORC writer uses the timeline server:

2017-02-21 00:20:36,581 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(207)) - Error putting entity: dag_1487651061826_0016_921 (TEZ_DAG_ID): 6

Other than that, everything is clean.

Thanks again, I appreciate it.

@James Dinkel

hive.llap.io.memory.mode is in the Advanced hive-interactive-site configuration tab in the Ambari UI. Could you do me a favor and check that it shows in that tab?

Explorer

Yes, it is in Advanced hive-interactive-site. Also notice there is no set-recommended button (the blue 3/4-circle arrow). Thanks again Constantin.

@James Dinkel

"In-Memory Cache per Daemon", by default is set to none. Did you allocate anything to it? This configuration is also available in hive-interactive-site.

Explorer

@Constantin Stanca yes, in-memory cache per daemon = 12288 (MB)

Scott mentioned below some good practices for memory sizing.

Hi @James Dinkel

I'm guessing there is a memory sizing issue. Make sure you follow these sizing rules (a worked example follows the list):

  • MemPerDaemon (Container Size) > LLAP Heapsize (Java process heap) + CacheSize (off-heap) + headroom
    • Multiple of yarn min allocation
    • Should be less than yarn.nodemanager.resource.memory-mb
    • Headroom is capped at 6GB
  • QueueSize (Yarn Queue) >= MemPerDaemon * num daemons + slider + (tez AM size * concurrency)
  • Cachesize = MemPerDaemon - (hive tez container * num of executors)
  • Num executors per daemon = (MemPerDaemon - cache_size)/hive tez container size
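To make the arithmetic concrete, here is a worked example; every number in it is hypothetical and purely for illustration (4 GB hive tez container size, 2 GB YARN minimum allocation, 10 daemons, query concurrency of 5):

6 executors per daemon x 4 GB container = 24 GB LLAP heap
24 GB heap + 12 GB cache (off-heap) + 2 GB headroom = 38 GB MemPerDaemon (a multiple of the 2 GB min allocation)
QueueSize >= 38 GB x 10 daemons + 1 GB slider AM + (4 GB Tez AM x 5 concurrency) = 401 GB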

In addition, be sure your LLAP queue is set up appropriately and has sufficient capacity (a capacity-scheduler.xml sketch follows the list):

  • <queue>.user-limit-factor = 1
  • <queue>.maximum-am-resource-percent = 1 (it's actually a factor between 0 and 1)
  • <queue>.capacity = 100
  • <queue>.maximum-capacity = 100
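As a sketch, these map to standard Capacity Scheduler entries in capacity-scheduler.xml like the following; this assumes a queue named llap directly under root that takes effectively all the capacity (adjust the queue path and values for your cluster):

<property>
  <name>yarn.scheduler.capacity.root.llap.capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.llap.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.llap.user-limit-factor</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.llap.maximum-am-resource-percent</name>
  <value>1</value>
</property>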

Explorer

I forgot to update this. Actually, LLAP and the cache were set up correctly. In the queries taken from Teradata, each of the 200k queries performs several joins, and one of the joins was on a column that was all nulls, so the result set was empty. A side benefit, though, was a nice gain in knowledge on all the knobs/levers of this product. And it is a very nice product. My experience is in Teradata/Netezza/Hawq, and I've found LLAP to be a clear winner in replacing those. Very fast, very robust.

Couple of Notes on the things that mattered:

-Use ORC (duh)

-Get Partitioned By / Clustered By right (use hive --orcfiledump, as shown in Sergey's slides, to make sure you get 100k records per ORC file; see the example after this list)

-Get number of nodes / appmasters right

-Use local Metastore

-Get heaps right

-Get yarn capacity scheduler settings right

-Increase executors

-(probably not supported) clone HS2 JVMs for more throughput if needed.

Pull up the Hive UI or Grafana, sit back, smile, and enjoy watching the transactions fly.
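For the orcfiledump check in the list above, the invocation is simply the following (the path is a placeholder for one of your table's ORC files); the dump prints stripe statistics and row counts so you can confirm roughly 100k records per file:

hive --orcfiledump /apps/hive/warehouse/mydb.db/mytable/000000_0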


New Contributor

Hi

I'm using HDP 2.6.4 with Hive 2 + LLAP, of course.

I've followed all the recommendations here except for your last comment, @James Dinkel, because you do not explain all the other settings you fixed.

Basically, I run the exact same query twice or more, and the query is not faster the second time than the first.

So maybe I'm wrong about what is cached and how to use it. Anyway, how do you know whether you hit the data in cache or not?

Thanks

Explorer

I am seeing the same issue, @Scott Shaw. I followed your recommendations on daemon and heap settings, but still no luck.

Explorer

Is there any way to debug the IO cache component to find out why it's not caching?
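One avenue (an assumption on my part: that the LLAP IO classes live under the org.apache.hadoop.hive.llap.io package and that the daemon honors its log4j2 properties file) would be raising that logger to DEBUG in the LLAP daemon's log4j2 configuration:

logger.llapio.name = org.apache.hadoop.hive.llap.io
logger.llapio.level = DEBUG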
