Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Failures when querying on hive tables using bucketed ORC format

Failures when querying on hive tables using bucketed ORC format

New Contributor

I am receiving some errors when I create tables utilizing the ORC format and set them to using a bucket of 1, but receive no errors when I make the tables non-bucketed/non-transactional.

Table Creation Query:

CREATE TABLE IF NOT EXISTS {database}.{table_name} (
<column1> STRING, <column2> STRING, <column3> STRING, <column4> BIGINT) 
CLUSTERED BY (<column1>) INTO 1 BUCKETS 
STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY", "transactional"="true")

I opted to use 1 Bucket and transactional=true because I wanted the ability to delete rows from this table if necessary (particularly important as I am still testing this code)

The tables are created just fine, and I was able to insert data into them. However, I noticed that selecting <column1> in the following statement:

SELECT <column1> FROM <database>.<table> where <column1> = '<some_value>' limit 1

caused this error:

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1488404443875_0035_6_00, diagnostics=[Vertex vertex_1488404443875_0035_6_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: delta_stats initializer failed, vertex=vertex_1488404443875_0035_6_00 [Map 1], java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1273)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1300)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 1
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1268)
    ... 15 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSargColumnNames(OrcInputFormat.java:358)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setSearchArgument(OrcInputFormat.java:392)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1011)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2000(OrcInputFormat.java:838)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:992)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:989)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:989)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:838)
    ... 4 more
]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1488404443875_0035_6_00, diagnostics=[Vertex vertex_1488404443875_0035_6_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: delta_stats initializer failed, vertex=vertex_1488404443875_0035_6_00 [Map 1], java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1273)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1300)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 1
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1268)
    ... 15 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSargColumnNames(OrcInputFormat.java:358)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setSearchArgument(OrcInputFormat.java:392)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1011)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2000(OrcInputFormat.java:838)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:992)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:989)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:989)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:838)
    ... 4 more
]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0

while selecting all values, as per the following query, returned the full row as expected:

SELECT * FROM <database>.<table> where <column1> = '<some_value>' limit 1;

Both of the queries listed above failed when the table was empty. I deleted all tables in this hive database, and attempted to recreate them using my queries. The tables were created normally, and then an INSERT INTO ... SELECT ... query of mine failed to insert data into the tables, with a similar OrcInput cause by ArrayIndexOutOfBounds:

Vertex failed, vertexName=Map 1, vertexId=vertex_1488404443875_0036_1_00, diagnostics=» Vertex vertex_1488404443875_0036_1_00 » Map 1
 killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: delta_new initializer failed, vertex=vertex_1488404443875_0036_1_00 » Map 1
, java.lang.RuntimeException: serious problem
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1273)
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1300)
  at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
  at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
  at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
  at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
  at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
  at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
  at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 2
  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1268)
  ... 15 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSargColumnNames(OrcInputFormat.java:358)
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setSearchArgument(OrcInputFormat.java:392)
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1011)
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2000(OrcInputFormat.java:838)
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:992)
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:989)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:989)
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:838)
  ... 4 more


Vertex killed, vertexName=Reducer 2, vertexId=vertex_1488404443875_0036_1_01, diagnostics=» Vertex received Kill in INITED state., Vertex vertex_1488404443875_0036_1_01 » Reducer 2
 killed/failed due to:OTHER_VERTEX_FAILURE

DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

This one came as a surprise, because during my testing I had deleted rows and reinserted data multiple times. So the issue seems to have cropped up all of a sudden and, from what I can see, behaves erratically. I modified the table creation query to not use bucketing/not be transactional:

CREATE TABLE IF NOT EXISTS {database}.{table_name} (
<column1> STRING, <column2> STRING, <column3> STRING, <column4> BIGINT
) 
STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY")

And I can no longer seem to reproduce the errors. This imposes a limitation of never being able to delete rows from this table, short of dropping the table entirely. So it's less than ideal in my situation. Is there something obvious that I seem to be doing wrong/not understanding? Am I hitting some kind of known bug? Any help is greatly appreciated!

2 REPLIES 2

Re: Failures when querying on hive tables using bucketed ORC format

New Contributor
@Josh Giangrande

Which version of the product are you using? Also for the ArrayIndexOutOfBounds exception, could you execute "set hive.enforce.bucketing;" on your client and ensure it is set to true?

Highlighted

Re: Failures when querying on hive tables using bucketed ORC format

New Contributor

Also ensure all of the below ACID properties are set accordingly:

"hive.support.concurrency": true,

"hive.enforce.bucketing": true,

"hive.exec.dynamic.partition.mode": "nonstrict",

"hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",

"hive.compactor.initiator.on": true.

Don't have an account?
Coming from Hortonworks? Activate your account here