Post HDP 2.5.6 upgrade, few Hive (1.2.1) SELECT queries are not working

Background: My HDP dev cluster was recently upgraded from HDP 2.3.2 to HDP 2.5.6.
Since then, I have been hitting a strange error when executing a very simple Hive SELECT query.

Before the upgrade, this SELECT query worked fine in the dev environment.
It still runs every day in production, which is on HDP 2.3.2 and, thankfully, has not been upgraded yet :)

Hive Version (Dev server):

/etc/hive/conf> hive --version 
Hive 1.2.1000. Subversion git:// -r 4db29b14fe24027bc28690a25058192019075e02 
Compiled by jenkins on Mon Jun 26 09:28:06 UTC 2017 From source with checksum 021897025d7879bcef4a8c2f0f8b0949

The SELECT query that fails is as follows:

hive (XYZ_DATABASE)> SELECT count(1) FROM xyz_database.v_fmtrade_big_view v1 inner join 
(select max(breakreportingdate) as maxdate FROM xyz_database.crhs_fmtrade_big_table) t1 
on v1.breakreportingdate = t1.maxdate; 
FAILED: IndexOutOfBoundsException Index: 75, Size: 75

The odd part is that if I split this joined query into two separate SELECT queries (Query #1 and Query #2 below), each one works fine. The joined query above, however, stopped working after the HDP upgrade. I searched online for a Hive parameter that might need to be enabled or disabled, but couldn't find anything.

Has anyone faced this kind of error? I'm fairly sure it has to be tackled through a parameter setting, at either the session level or the cluster level.
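
To illustrate the kind of session-level experiment I have in mind, here is a sketch only. It assumes the failure comes from the optimizer, since the stack trace in hive.log fails inside ColumnPruner during query compilation. hive.cbo.enable is a standard Hive property, but I have not confirmed that toggling it avoids this error.

```sql
-- Sketch only: turn off cost-based optimization for the current session, then retry.
-- Assumption: the plan built without CBO may take a different path through column pruning.
SET hive.cbo.enable=false;

SELECT count(1)
FROM xyz_database.v_fmtrade_big_view v1
INNER JOIN (
  SELECT max(breakreportingdate) AS maxdate
  FROM xyz_database.crhs_fmtrade_big_table
) t1
ON v1.breakreportingdate = t1.maxdate;
```

A SET issued this way applies only to the current Hive session, so it should be safe to try without touching the cluster-level configuration.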

Query #1:

hive (XYZ_DATABASE)> SELECT count(1) FROM XYZ_DATABASE.v_fmtrade_big_view v1;
Time taken: 0.067 seconds, Fetched: 1 row(s)

Query #2:

hive (XYZ_DATABASE)> select max(breakreportingdate) as maxdate FROM XYZ_DATABASE.crhs_fmtrade_big_table;
Time taken: 12.059 seconds, Fetched: 1 row(s)
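
For comparison, a two-step form that should give the same count as the joined query, without using the join, could look like the sketch below. I have not run it on the upgraded cluster. The date value is a hypothetical placeholder for whatever Query #2 returns, and quoting it as a string assumes breakreportingdate is a string or date column.

```sql
-- Step 1: fetch the latest reporting date (same as Query #2).
SELECT max(breakreportingdate) AS maxdate
FROM xyz_database.crhs_fmtrade_big_table;

-- Step 2: store that value in a Hive variable and filter the view on it.
-- 2017-01-01 is a hypothetical placeholder; replace it with the value from step 1.
SET hivevar:maxdate=2017-01-01;

SELECT count(1)
FROM xyz_database.v_fmtrade_big_view v1
WHERE v1.breakreportingdate = '${hivevar:maxdate}';
```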

The error stack trace below is from the /tmp/<userid>/hive.log file:

( - FAILED: IndexOutOfBoundsException Index: 75, Size: 75
java.lang.IndexOutOfBoundsException: Index: 75, Size: 75
    at java.util.ArrayList.rangeCheck(
    at java.util.ArrayList.get(
    at org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory$ColumnPrunerSelectProc.process(
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(
    at org.apache.hadoop.hive.ql.optimizer.ColumnPruner$ColumnPrunerWalker.walk(
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(
    at org.apache.hadoop.hive.ql.optimizer.ColumnPruner.transform(
    at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(
    at org.apache.hadoop.hive.ql.Driver.compile(
    at org.apache.hadoop.hive.ql.Driver.compile(
    at org.apache.hadoop.hive.ql.Driver.compileInternal(
    at org.apache.hadoop.hive.ql.Driver.runInternal(
    at
    at
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(
    at org.apache.hadoop.hive.cli.CliDriver.processLine(
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(
    at
    at org.apache.hadoop.hive.cli.CliDriver.main(
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(

Re: Post HDP 2.5.6 upgrade, few Hive (1.2.1) SELECT queries are not working

Adding more info:

I came across this Q&A page. According to the last comment on that page (by @gdavy), Hive is with version and the Hive schema version is 2.1. Could this be a problem for me?