
Hive query error with Vertex failed on partitioned table by date. Any recommendation?

Contributor

q3.tar.gz

Hello,

I am taking part in a PoC project where we are looking for a solution for interactive analytics (with Tableau as the client).

1. Apache Hive (version 2.1.0.2.6.1.0-129) Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)

2. We have configured a 3-node HDP cluster with Hive + LLAP. All our test tables are created in ORC format with the "orc.compress"="ZLIB" option.

3. Fact table PARTITIONED BY (`date` date) with dynamic partitions.

4. Column statistics were collected for all tables.
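For reference, a minimal sketch of the kind of table setup described above. Only the table name `f_daily_funnel_report`, the `date` partition column, and the ORC/ZLIB properties come from this post; the data columns and the staging table are assumptions for illustration:

```sql
-- Hypothetical DDL sketch: column list is illustrative, only the
-- partition spec and ORC properties mirror the actual setup.
CREATE TABLE f_daily_funnel_report (
  user_id BIGINT,
  funnel_step STRING,
  events BIGINT
)
PARTITIONED BY (`date` DATE)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB");

-- Dynamic-partition insert, as enabled by the
-- hive.exec.dynamic.partition* settings quoted in this post.
INSERT OVERWRITE TABLE f_daily_funnel_report PARTITION (`date`)
SELECT user_id, funnel_step, events, event_date AS `date`
FROM staging_funnel_events;
```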

Unfortunately, some of our test queries failed with the following error:

ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1507032990279_0050_1_11, diagnostics=[Task failed, taskId=task_1507032990279_0050_1_11_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1507032990279_0050_1_11_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row

The query runs with the following parameters specified explicitly:

set tez.queue.name=llap;
set hive.llap.execution.mode=all;
set hive.execution.engine=tez;
set mapred.reduce.tasks=-1;
set hive.exec.parallel=true;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=256;
set hive.exec.max.dynamic.partitions=10000;
set hive.optimize.sort.dynamic.partition=true;
set hive.enforce.sorting=true;
set hive.tez.exec.print.summary=true;
set hive.optimize.ppd=true;
set hive.optimize.ppd.storage=true;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.tez.auto.reducer.parallelism=true;
set hive.tez.max.partition.factor=20;
set hive.exec.reducers.bytes.per.reducer=128000000;

set hive.optimize.index.filter=true;
set hive.exec.orc.skip.corrupt.data=true;
set hive.exec.compress.output=true;
set tez.am.container.reuse.enabled=TRUE;
set hive.stats.reliable=true;
set hive.merge.tezfiles=true;

Our findings:

1. The query works well on non-partitioned tables.

2. The query works fine with Tez or MR configured, but fails with LLAP.

3. If I remove "CAST(DATE_ADD(NEXT_DAY(`f_daily_funnel_report`.`date`,'SU'),-7) AS DATE) AS `twk_calculation_1485062019336982529_ok`" from the select list and the group by list, the query starts working.
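For context, that expression computes the start of the week for each row: NEXT_DAY(d,'SU') returns the first Sunday after d, so subtracting 7 days gives the most recent Sunday on or before d. A minimal standalone reproduction (assumed, reduced from the quoted expression) might look like:

```sql
-- Sketch: isolates the Tableau-generated week-start expression
-- that appears to trigger the vertex failure on the partitioned table.
SELECT
  CAST(DATE_ADD(NEXT_DAY(`date`, 'SU'), -7) AS DATE) AS week_start,
  COUNT(*) AS cnt
FROM f_daily_funnel_report
GROUP BY CAST(DATE_ADD(NEXT_DAY(`date`, 'SU'), -7) AS DATE);
```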

In the attachment you will find the following files:

q3.sql - original queries that failed

q3.err - full execution log from beeline client

Any ideas?

Thank you,

3 REPLIES

Super Collaborator

Hi @Yevgen Shramko,

"date" is a reserved word in Hive, and I'm not sure whether the parameter "set hive.support.sql11.reserved.keywords=false/true" behaves differently between LLAP and non-LLAP execution. To avoid the ambiguity, can you please rename the column from "date" to something else (for example, by prepending a prefix or appending a suffix)?
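One way to sidestep the reserved word entirely would be to recreate the table with a renamed partition column. A sketch, assuming an illustrative column list and the renamed column `report_dt`:

```sql
-- Hypothetical sketch: same table layout, but the partition column
-- is renamed from the reserved word `date` to report_dt.
CREATE TABLE f_daily_funnel_report_v2 (
  user_id BIGINT,
  funnel_step STRING,
  events BIGINT
)
PARTITIONED BY (report_dt DATE)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB");
```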

Hope this helps !!

Contributor

Hi @bkosaraju,

Thank you for your answer, but I don't believe that is the case. In any case, I tested a few scenarios today:

1. Renamed the column from `date` to `dt`.

2. Changed the column (partition key) type from date to timestamp.

3. Changed the column type from date to string.

4. Changed the ORC partitioned table storage properties to 'orc.compress'='SNAPPY'.

None of these helped.
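In case it helps anyone reproduce this, the storage-property change from item 4 can be applied in place on an existing table; a sketch:

```sql
-- Switch ORC compression to SNAPPY on the existing table (sketch);
-- this affects newly written files only, existing files keep ZLIB.
ALTER TABLE f_daily_funnel_report
  SET TBLPROPERTIES ('orc.compress'='SNAPPY');
```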

Meanwhile, on a non-partitioned table, if I use the "`date` date" column specification the queries also fail, until I change the column type to timestamp. With timestamp on a non-partitioned table, it works.

Thank you,

Yevgen

Contributor

Finally, I found the solution.

set hive.auto.convert.join.noconditionaltask = true; 
set hive.auto.convert.join.noconditionaltask.size = 2000000; 

By tuning hive.auto.convert.join.noconditionaltask.size I got adequate performance; setting it too low degrades performance.

The following parameters might also be helpful:

set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
set hive.auto.convert.sortmerge.join.bigtable.selection.policy=org.apache.hadoop.hive.ql.optimizer.TableSizeBasedBigTableSelectorForAutoSMJ;