Created on 07-09-2018 10:09 PM - edited 08-18-2019 02:08 AM
I'm trying to set up Hive LLAP with IO cache (it's working fine without). I'm using ORC data, my memory settings are fine, and LLAP/IO-specific Hive configurations are:
--hiveconf hive.execution.mode=llap
--hiveconf hive.llap.execution.mode=all
--hiveconf hive.llap.io.enabled=true
--hiveconf hive.llap.daemon.service.hosts=@llap0
--hiveconf hive.llap.io.memory.mode=cache
When I print the LLAP IO summary metrics at the end of a query, all the values come back as 0, which suggests that LlapIOCounters are not being registered and OrcEncodedDataReader is not being called (see attached screenshot).
Any ideas what I might be missing?
Created 07-10-2018 09:55 PM
It could be that the counters are not being propagated correctly; usually, when LLAP IO is not used, this counter section is not displayed at all. You might want to check the LLAP logs for threads with names beginning with "IO-Elevator"; these are the IO threads. Also check that the LLAP daemon logs show IO being initialized correctly (at the beginning, when the daemon starts, there should be lines about cache size, eviction policy, etc.). In addition, the HS2 and LLAP logs may have lines from the "wrapForLlap" method in HiveInputFormat indicating errors when trying to use LLAP IO. Also, is this an ACID table? ACID tables are not able to use the LLAP cache until Hive 3.1 in Apache, or HDP 3.X.
Created 07-10-2018 11:09 PM
Thanks for your reply. ACID is disabled. This is the log message I'm getting in HS2 logs:
io.HiveInputFormat: Not using llap for org.apache.hadoop.mapred.SequenceFileInputFormat@5c6e83c6: supported = false, vectorized = false
These are the commands I ran to ensure my sample data is created using ORC:
SET hive.default.fileformat=ORC;
SET hive.default.fileformat.managed=ORC;
create database tmp;
use tmp;
create table test (id int);
insert into test (id) values (1);
select count(*) from test;
What am I doing wrong here?
Created 07-10-2018 11:18 PM
Doesn't look like the default format is kicking in. Can you create the table as ORC explicitly (stored as ORC)?
Also, depending on what version this is, vectorization may need to be enabled (it's good for performance anyway); alternatively, hive.llap.io.row.wrapper.enabled needs to be set to true (if present in your version).
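Roughly, the two suggestions above could be tried like this. This is only a sketch: test_orc is a hypothetical table name, and it assumes your version uses the usual hive.vectorized.execution.enabled property for vectorization. The DESCRIBE FORMATTED step is just a way to see which InputFormat the table actually got, so you can compare it with the SequenceFileInputFormat shown in your log line.

```sql
-- Turn on vectorized execution (assumed property name; check your version's docs).
SET hive.vectorized.execution.enabled=true;

-- Alternative mentioned above, only if the property exists in your version:
-- SET hive.llap.io.row.wrapper.enabled=true;

-- test_orc is a hypothetical name; the point is the explicit STORED AS ORC clause,
-- so the table does not depend on hive.default.fileformat taking effect.
CREATE TABLE test_orc (id INT) STORED AS ORC;
INSERT INTO test_orc VALUES (1);

-- Inspect the "InputFormat:" line in the output to confirm the storage format.
DESCRIBE FORMATTED test_orc;

-- Re-run the query and look at the "Not using llap" message and the IO counters again.
SELECT COUNT(*) FROM test_orc;
```

If the "Not using llap" message still appears for the new table, the supported and vectorized values in that log line should show which of the two conditions is failing.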
Created 07-10-2018 11:57 PM
I did try creating the table as ORC explicitly, but I'm still facing the same issue. What input format should it ideally be picking up?