Created 10-08-2017 08:16 PM
I am taking part in a PoC project where we are looking for a solution for interactive analytics (Tableau client). Our setup:
1. Apache Hive (version 2.1.0.2.6.1.0-129); driver: Hive JDBC (version 1.2.1000.2.6.1.0-129).
2. We have configured a 3-node HDP cluster with Hive + LLAP. All our test tables are created in ORC format with the "orc.compress"="ZLIB" option.
3. The fact table is PARTITIONED BY (`date` date) with dynamic partitions (see the DDL sketch after this list).
4. Column statistics were collected for all tables.
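For reference, a minimal sketch of what the fact table DDL looks like (the measure/dimension columns here are illustrative; only the partition column and the storage properties reflect our real setup):

-- illustrative columns; partition column and ORC properties as in our setup
CREATE TABLE f_daily_funnel_report (
  user_id BIGINT,
  funnel_step STRING,
  events BIGINT
)
PARTITIONED BY (`date` DATE)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='ZLIB');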
Unfortunately, some of our test queries failed with the following error:
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1507032990279_0050_1_11, diagnostics=[Task failed, taskId=task_1507032990279_0050_1_11_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1507032990279_0050_1_11_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
The query runs with the following parameters specified explicitly:
set tez.queue.name=llap;
set hive.llap.execution.mode=all;
set hive.execution.engine=tez;
set mapred.reduce.tasks=-1;
set hive.exec.parallel=true;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode = nonstrict;
set hive.exec.max.dynamic.partitions.pernode=256;
set hive.exec.max.dynamic.partitions=10000;
set hive.optimize.sort.dynamic.partition=true;
set hive.enforce.sorting=true;
set optimize.sort.dynamic.partitioning=true;
set hive.tez.exec.print.summary=true;
set hive.optimize.ppd=true;
set hive.optimize.ppd.storage=true;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled = true;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.tez.auto.reducer.parallelism=true;
set hive.tez.max.partition.factor=20;
set hive.exec.reducers.bytes.per.reducer=128000000;
set hive.optimize.index.filter=true;
set hive.exec.orc.skip.corrupt.data=true;
set hive.exec.compress.output=true;
set tez.am.container.reuse.enabled=TRUE;
set hive.compute.query.using.stats=true;
set stats.reliable=true;
set hive.merge.tezfiles=true;
Our findings:
1. The query works well on non-partitioned tables.
2. The query works fine with Tez or MR, but fails with LLAP.
3. If I remove "CAST(DATE_ADD(NEXT_DAY(`f_daily_funnel_report`.`date`,'SU'),-7) AS DATE) AS `twk_calculation_1485062019336982529_ok`" from the select list and the group by list, the query starts working (a simplified sketch of this pattern follows the list).
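A simplified sketch of the failing pattern (the real query is in q3.sql; the SUM column here is illustrative):

SELECT
  CAST(DATE_ADD(NEXT_DAY(`f_daily_funnel_report`.`date`,'SU'),-7) AS DATE) AS `twk_calculation_1485062019336982529_ok`,
  SUM(events) AS events_sum
FROM f_daily_funnel_report
GROUP BY CAST(DATE_ADD(NEXT_DAY(`f_daily_funnel_report`.`date`,'SU'),-7) AS DATE);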
In the attachment you will find the following files:
q3.sql - the original queries that failed
q3.err - the full execution log from the Beeline client
Any ideas?
Thank you,
Created 10-09-2017 02:00 AM
Hi @Yevgen Shramko,
"date" is a reserve word in Hive, not sure what is the difference for the parameter "set hive.support.sql11.reserved.keywords=false/true" for LLAP and non-LLAP execution, hence to avoid the ambiguity, can you please change the column n name from "date" to something (like prepending with some prefix or post-pending with suffix).
Hope this helps!
Created 10-09-2017 02:36 PM
Hi @bkosaraju,
Thank you for your answer, but I don't believe that is the case. Anyway, I tested a few cases today:
1. Renamed the column from `date` to `dt`.
2. Changed the column (partition key) type from date to timestamp.
3. Changed the column type from date to string.
4. Changed the ORC partitioned table storage properties to 'orc.compress'='SNAPPY'.
Nothing has helped.
Meanwhile, on a non-partitioned table the queries also fail if I use the "`date` date" specification, until I change the column type to timestamp. With timestamp on a non-partitioned table it works.
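A sketch of the non-partitioned test table (the table name and the other columns here are just for illustration):

-- illustrative non-partitioned test table: declared with TIMESTAMP the query works,
-- with `date` DATE the same query fails under LLAP
CREATE TABLE f_daily_funnel_report_flat (
  user_id BIGINT,
  funnel_step STRING,
  events BIGINT,
  `date` TIMESTAMP
)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='ZLIB');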
Thank you,
Yevgen
Created 10-10-2017 03:34 PM
Finally, I found the solution.
set hive.auto.convert.join.noconditionaltask = true; set hive.auto.convert.join.noconditionaltask.size = 2000000;
By playing with hive.auto.convert.join.noconditionaltask.size I got adequate performance. A value that is too low causes performance degradation.
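Roughly, how I checked the effect of different size values (the dimension table d_users below is just a placeholder; the real query is in q3.sql):

set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=2000000;  -- in bytes; adjust and re-test
-- look for "Map Join Operator" in the plan to see whether the join was converted
EXPLAIN
SELECT f.user_id, SUM(f.events)
FROM f_daily_funnel_report f
JOIN d_users u ON f.user_id = u.user_id
GROUP BY f.user_id;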
The following parameters might also be helpful:
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
set hive.auto.convert.sortmerge.join.bigtable.selection.policy=org.apache.hadoop.hive.ql.optimizer.TableSizeBasedBigTableSelectorForAutoSMJ;
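As far as I understand, the sort-merge/bucket map join settings above only help when the joined tables are bucketed and sorted on the join key, e.g. (illustrative DDL):

-- illustrative dimension table bucketed and sorted on the join key
CREATE TABLE d_users (
  user_id BIGINT,
  country STRING
)
CLUSTERED BY (user_id) SORTED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;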