Error on concatenating ORC Hive table (merge files)

Contributor

I have a frequently running Spark job that populates a Hive table backed by ORC files. Spark generates many small files, and even with coalesce the output does not fill the large HDFS blocks efficiently. The best solution I found was to schedule a job that concatenates the Hive table periodically.

The ALTER TABLE statement actually works, but it raises the exception shown below:

CREATE TABLE my_test(id String) STORED AS ORC;

ALTER TABLE my_test CONCATENATE;

Loading data to table default.my_test
Table default.my_test stats: [numFiles=0, totalSize=0]
FAILED: Hive Internal Error: java.lang.IllegalArgumentException(No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE)
java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE
    at java.lang.Enum.valueOf(Enum.java:238)
    at org.apache.hadoop.hive.ql.plan.HiveOperation.valueOf(HiveOperation.java:23)
    at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1522)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

I also don't understand two things:

  1. What is the relation between running a simple Hive shell query and Atlas?
    at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)
  2. Why is Hive trying to use the ALTER_TABLE_MERGE enum constant, which is actually implemented as ALTERTABLE_MERGEFILES? See https://github.com/hortonworks/hive-release/blob/HDP-2.3.2.0-tag/ql/src/java/org/apache/hadoop/hive/...
1 ACCEPTED SOLUTION

Master Guru

As a workaround, you should be able to remove the Atlas hook from the hive.exec.pre/post/failure.hooks parameters in Ambari/Hive/Config/AdvancedConfig, if it really is what causes the error. Perhaps a bug?

Notifying @Alan Gates

Regarding 1): if Atlas is installed, it hooks into Hive's post-execution phase (hive.exec.post.hooks). The hook ensures that queries and changes are recorded in Atlas for data governance.

Contributor

I had a quick look at the Atlas HiveHook. There shouldn't be a bug there, because it simply takes the generated operation name string from the hook context: hookContext.getOperationName()

https://github.com/hortonworks/atlas-release/blob/HDP-2.3.2.0-tag/addons/hive-bridge/src/main/java/o...
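
To illustrate why an enum lookup by operation name can fail this way, here is a minimal Java sketch. It assumes, based on the error message and the question above, that the constant ALTERTABLE_MERGEFILES reports "ALTER_TABLE_MERGE" as its operation name. DemoOperation and fromOperationName are hypothetical stand-ins for illustration, not the real HiveOperation API.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical demo, not Hive or Atlas code.
class OperationLookupDemo {
    public static void main(String[] args) {
        // The string a hook would receive as the operation name (taken from the error above).
        String operationName = "ALTER_TABLE_MERGE";

        // Searching by the operation-name field finds the constant.
        System.out.println(DemoOperation.fromOperationName(operationName));

        // Enum.valueOf only matches constant names, so this lookup throws.
        try {
            DemoOperation.valueOf(operationName);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}

// Stand-in enum: the constant name and the operation name differ (assumption).
enum DemoOperation {
    ALTERTABLE_MERGEFILES("ALTER_TABLE_MERGE");

    private final String operationName;

    DemoOperation(String operationName) {
        this.operationName = operationName;
    }

    String getOperationName() {
        return operationName;
    }

    // Tolerant lookup: returns an empty Optional instead of throwing on unknown names.
    static Optional<DemoOperation> fromOperationName(String name) {
        return Arrays.stream(values())
                .filter(op -> op.operationName.equals(name))
                .findFirst();
    }
}
```

If the real enum is shaped like this, any caller that passes the operation name to valueOf will hit the IllegalArgumentException whenever the two strings differ, which would match the stack trace in the question.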

Contributor

I think this is a bug, and the workaround proposed by @Benjamin Leonhardi is the only way to fix the issue so far.

For the record, as the hiveserver2.log excerpt below shows, the MR/Tez execution completes and ATSHook finishes successfully. The post-execution HiveHook then fails because of the incompatible operation name.

The workaround is to remove "org.apache.atlas.hive.hook.HiveHook" from "hive.exec.post.hooks".

....
2016-01-12 11:00:21,188 INFO  [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=task.STATS.Stage-2 from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,189 INFO  [HiveServer2-Background-Pool: Thread-180]: ql.Driver (Driver.java:launchTask(1653)) - Starting task [Stage-2:STATS] in serial mode
2016-01-12 11:00:21,189 INFO  [HiveServer2-Background-Pool: Thread-180]: exec.StatsTask (StatsTask.java:execute(86)) - Executing stats task
2016-01-12 11:00:21,189 INFO  [HiveServer2-Background-Pool: Thread-180]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 3: get_table : db=default tbl=my_test
2016-01-12 11:00:21,190 INFO  [HiveServer2-Background-Pool: Thread-180]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - ugi=anonymous    ip=unknown-ip-addr    cmd=get_table : db=default tbl=my_test    
2016-01-12 11:00:21,203 WARN  [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,203 WARN  [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,209 INFO  [HiveServer2-Background-Pool: Thread-180]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 3: alter_table: db=default tbl=my_test newtbl=my_test
2016-01-12 11:00:21,209 INFO  [HiveServer2-Background-Pool: Thread-180]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - ugi=anonymous    ip=unknown-ip-addr    cmd=alter_table: db=default tbl=my_test newtbl=my_test    
2016-01-12 11:00:21,224 WARN  [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,224 WARN  [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,244 INFO  [HiveServer2-Background-Pool: Thread-180]: hive.log (MetaStoreUtils.java:updateUnpartitionedTableStatsFast(217)) - Updating table stats fast for my_test
2016-01-12 11:00:21,244 INFO  [HiveServer2-Background-Pool: Thread-180]: hive.log (MetaStoreUtils.java:updateUnpartitionedTableStatsFast(219)) - Updated size of table my_test to 0
2016-01-12 11:00:21,253 INFO  [HiveServer2-Background-Pool: Thread-180]: exec.Task (SessionState.java:printInfo(951)) - Table default.my_test stats: [numFiles=0, totalSize=0]
2016-01-12 11:00:21,254 INFO  [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=runTasks start=1452592809214 end=1452592821254 duration=12040 from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,254 INFO  [HiveServer2-Background-Pool: Thread-180]: hooks.ATSHook (ATSHook.java:<init>(84)) - Created ATS Hook
2016-01-12 11:00:21,254 INFO  [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=PostHook.org.apache.hadoop.hive.ql.hooks.ATSHook from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,255 INFO  [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=PostHook.org.apache.hadoop.hive.ql.hooks.ATSHook start=1452592821254 end=1452592821255 duration=1 from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,255 INFO  [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=PostHook.org.apache.atlas.hive.hook.HiveHook from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,255 ERROR [HiveServer2-Background-Pool: Thread-180]: ql.Driver (SessionState.java:printError(960)) - FAILED: Hive Internal Error: java.lang.IllegalArgumentException(No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE)
java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE
    at java.lang.Enum.valueOf(Enum.java:238)
    at org.apache.hadoop.hive.ql.plan.HiveOperation.valueOf(HiveOperation.java:23)
    at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1522)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

2016-01-12 11:00:21,255 INFO  [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=Driver.execute start=1452592809211 end=1452592821255 duration=12044 from=org.apache.hadoop.hive.ql.Driver>
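
The workaround discussed in this thread comes down to editing a comma-separated property value. Below is a minimal Java sketch of that edit. It assumes hive.exec.post.hooks currently lists the two post hooks that appear in the log above (ATSHook and the Atlas HiveHook); the actual value on a given cluster may differ. PostHooksEdit and removeHook are hypothetical helper names, not part of Hive or Ambari.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
class PostHooksEdit {
    // Returns the comma-separated hook list with the given class name removed.
    static String removeHook(String hooks, String hookClass) {
        return Arrays.stream(hooks.split(","))
                .map(String::trim)
                .filter(h -> !h.isEmpty() && !h.equals(hookClass))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Assumed current value: the two post hooks seen in the log excerpt.
        String current = "org.apache.hadoop.hive.ql.hooks.ATSHook,org.apache.atlas.hive.hook.HiveHook";

        // Prints: org.apache.hadoop.hive.ql.hooks.ATSHook
        System.out.println(removeHook(current, "org.apache.atlas.hive.hook.HiveHook"));
    }
}
```

The resulting string is what would go back into the hive.exec.post.hooks field. Other hooks, such as ATSHook, stay in place.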