
Hive query error "Vertex failed" on a table partitioned by date. Any recommendation?

Contributor

Attachment: q3.tar.gz

Hello,

I am taking part in a PoC project where we are looking for a solution for interactive analytics (Tableau client).

1. Apache Hive (version 2.1.0.2.6.1.0-129); driver: Hive JDBC (version 1.2.1000.2.6.1.0-129).

2. We have configured a 3-node HDP cluster with Hive + LLAP. All our test tables are created in ORC format with the "orc.compress"="ZLIB" option.

3. The fact table is PARTITIONED BY (`date` date) with dynamic partitions (a DDL sketch follows after this list).

4. Column statistics were collected for all tables.
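
For reference, points 2-4 correspond roughly to the following DDL and statistics commands (a minimal sketch only; the real fact table is wider, and funnel_step, events and staging_daily_funnel are hypothetical names):

CREATE TABLE f_daily_funnel_report (
  funnel_step STRING,   -- hypothetical columns; the real table has more
  events BIGINT
)
PARTITIONED BY (`date` date)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='ZLIB');

-- dynamic-partition load; the partition column goes last in the select list
INSERT OVERWRITE TABLE f_daily_funnel_report PARTITION (`date`)
SELECT funnel_step, events, `date` FROM staging_daily_funnel;

-- column statistics, as in point 4
ANALYZE TABLE f_daily_funnel_report PARTITION (`date`) COMPUTE STATISTICS FOR COLUMNS;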

Unfortunately, some of our test queries fail with this error:

ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1507032990279_0050_1_11, diagnostics=[Task failed, taskId=task_1507032990279_0050_1_11_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1507032990279_0050_1_11_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row

The query runs with the following parameters specified explicitly:

set tez.queue.name=llap;
set hive.llap.execution.mode=all;
set hive.execution.engine=tez;
set mapred.reduce.tasks=-1;
set hive.exec.parallel=true;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode = nonstrict;
set hive.exec.max.dynamic.partitions.pernode=256;
set hive.exec.max.dynamic.partitions=10000;
set hive.optimize.sort.dynamic.partition=true;
set hive.enforce.sorting=true;
set optimize.sort.dynamic.partitioning=true;
set hive.tez.exec.print.summary=true;
set hive.optimize.ppd=true;
set hive.optimize.ppd.storage=true;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled = true;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.tez.auto.reducer.parallelism=true;
set hive.tez.max.partition.factor=20;
set hive.exec.reducers.bytes.per.reducer=128000000;

set hive.optimize.index.filter=true;
set hive.exec.orc.skip.corrupt.data=true;
set hive.exec.compress.output=true;
set tez.am.container.reuse.enabled=TRUE;
set hive.compute.query.using.stats=true;
set stats.reliable=true;
set hive.merge.tezfiles=true;

Our findings:

1. The query works well on non-partitioned tables.

2. The query works fine with Tez or MR configured, but fails with LLAP.

3. If I remove "CAST(DATE_ADD(NEXT_DAY(`f_daily_funnel_report`.`date`,'SU'),-7) AS DATE) AS `twk_calculation_1485062019336982529_ok`" from the select list and the group by list, the query starts working (a minimal reproduction of the pattern is sketched below).
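
The expression in finding 3 appears to compute the Sunday that starts the week of each date. Stripped down to the failing pattern, the query looks roughly like this (the aggregate and alias are simplified here, not the original q3.sql):

SELECT CAST(DATE_ADD(NEXT_DAY(`date`, 'SU'), -7) AS DATE) AS week_start,
       COUNT(*) AS cnt
FROM f_daily_funnel_report
GROUP BY CAST(DATE_ADD(NEXT_DAY(`date`, 'SU'), -7) AS DATE);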

In the attachment you will find the following files:

q3.sql - the original queries that failed

q3.err - the full execution log from the Beeline client

Any ideas?

Thank you,

3 REPLIES

Super Collaborator

Hi @Yevgen Shramko,

"date" is a reserve word in Hive, not sure what is the difference for the parameter "set hive.support.sql11.reserved.keywords=false/true" for LLAP and non-LLAP execution, hence to avoid the ambiguity, can you please change the column n name from "date" to something (like prepending with some prefix or post-pending with suffix).

Hope this helps!

Contributor

Hi @bkosaraju,

Thank you for your answer, but I don't believe that this is the case. Anyway, I tested a few cases today:

1. Renamed the column from `date` to `dt`.

2. Changed the column (partition key) type from date to timestamp.

3. Changed the column type from date to string.

4. Changed the partitioned ORC table storage property to 'orc.compress'='SNAPPY'.

Nothing has helped.

Meanwhile, on a non-partitioned table, if I use the "`date` date" column specification the queries also fail, until I change the column type to timestamp. With timestamp on a non-partitioned table it works.
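
For reference, the non-partitioned variant that works looks roughly like this (everything except the date-to-timestamp change is a hypothetical placeholder):

CREATE TABLE f_daily_funnel_report_np (   -- hypothetical name for the non-partitioned copy
  `date` timestamp,                       -- with a date type here the query fails; timestamp works
  funnel_step STRING,                     -- hypothetical columns
  events BIGINT
)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='ZLIB');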

Thank you,

Yevgen

Contributor

Finally, I found the solution.

set hive.auto.convert.join.noconditionaltask = true; 
set hive.auto.convert.join.noconditionaltask.size = 2000000; 

By tuning hive.auto.convert.join.noconditionaltask.size I got adequate performance; too low a value causes performance degradation.
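
These settings only take effect when map-join auto conversion is enabled (hive.auto.convert.join, usually on by default), and the size value is a byte threshold on the combined size of the small tables. A quick way to check whether a join actually gets converted is EXPLAIN; a sketch with a hypothetical dimension table d_funnel and join key funnel_id:

set hive.auto.convert.join=true;   -- prerequisite for the noconditionaltask settings

EXPLAIN
SELECT f.`date`, COUNT(*) AS cnt
FROM f_daily_funnel_report f
JOIN d_funnel d ON f.funnel_id = d.funnel_id
GROUP BY f.`date`;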

The following parameters might also be helpful (see the note on bucketing after the list):

set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
set hive.auto.convert.sortmerge.join.bigtable.selection.policy=org.apache.hadoop.hive.ql.optimizer.TableSizeBasedBigTableSelectorForAutoSMJ;
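
Keep in mind that the bucket map join and sort-merge options above only apply when both sides of the join are bucketed (and, for sort-merge, sorted) on the join key into compatible bucket counts. A DDL sketch with hypothetical names:

CREATE TABLE d_funnel_bucketed (
  funnel_id BIGINT,    -- hypothetical join key
  funnel_name STRING
)
CLUSTERED BY (funnel_id) SORTED BY (funnel_id) INTO 16 BUCKETS
STORED AS ORC;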