Created 12-14-2015 01:30 PM
I have a Spark job that runs frequently and populates a Hive table backed by ORC files. Spark generates many small files, and even coalesce can't fill the large HDFS blocks efficiently. The best solution I found was to schedule a job that periodically concatenates the Hive table.
The ALTER TABLE statement actually works, but it raises an exception, as you can see below:
CREATE TABLE my_test(id String) STORED AS ORC;
ALTER TABLE my_test CONCATENATE;
Loading data to table default.my_test
Table default.my_test stats: [numFiles=0, totalSize=0]
FAILED: Hive Internal Error: java.lang.IllegalArgumentException(No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE)
java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE
    at java.lang.Enum.valueOf(Enum.java:238)
    at org.apache.hadoop.hive.ql.plan.HiveOperation.valueOf(HiveOperation.java:23)
    at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1522)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I also don't understand two things:
1) Why this frame appears in the stack trace:
at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)
Created 12-16-2015 11:12 AM
As a workaround, you should be able to remove it from the hive.exec.pre/post/failure.hooks parameters in Ambari/Hive/Config/AdvancedConfig, if this hook is really what causes the error. Perhaps a bug?
Created 12-15-2015 02:07 PM
Notifying @Alan Gates
Created 12-15-2015 02:41 PM
Regarding 1): if you have Atlas installed, it hooks into Hive's post-execution step (hive.exec.post.hooks). This hook makes sure that queries and changes are recorded in Atlas for data governance.
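Hive runs every class listed in hive.exec.post.hooks, in order, after the statement's tasks have finished; the hiveserver2.log later in this thread shows ATSHook and then the Atlas HiveHook running this way, and the exception from the second hook turning the whole statement into FAILED. The sketch below is a hypothetical, self-contained model of that chain: PostHook, runPostHooks and the lambda hooks are illustrative stand-ins, not Hive's real hook interface or the actual ATSHook/HiveHook classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a post-execution hook chain. PostHook is a stand-in
// for Hive's hook interface; the hooks in main() are simulations only.
public class PostHookChainDemo {

    interface PostHook {
        void run(String operationName) throws Exception;
    }

    // Runs each configured hook in order. The first hook that throws makes the
    // statement report FAILED, even though its tasks already completed.
    static String runPostHooks(List<PostHook> hooks, String operationName) {
        for (PostHook hook : hooks) {
            try {
                hook.run(operationName);
            } catch (Exception e) {
                return "FAILED: " + e.getMessage();
            }
        }
        return "OK";
    }

    public static void main(String[] args) {
        List<PostHook> hooks = new ArrayList<>();
        // Simulated first hook: always succeeds.
        hooks.add(op -> { });
        // Simulated second hook: rejects an operation name it does not know.
        hooks.add(op -> {
            if (op.equals("ALTER_TABLE_MERGE")) {
                throw new IllegalArgumentException("No enum constant for " + op);
            }
        });
        System.out.println(runPostHooks(hooks, "QUERY"));             // OK
        System.out.println(runPostHooks(hooks, "ALTER_TABLE_MERGE")); // FAILED: No enum constant for ALTER_TABLE_MERGE
    }
}
```

The point of the model is that a post hook failing is enough to fail a statement whose real work (here, the concatenation) has already been done.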
Created 12-17-2015 07:19 AM
I had a quick look at the Atlas HiveHook. There shouldn't be a bug there, because it simply takes the generated operation name string from the hook context via hookContext.getOperationName().
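The stack trace shows HiveHook.run (HiveHook.java:151) calling HiveOperation.valueOf, so the string "ALTER_TABLE_MERGE" handed over by the hook context is being converted to an enum that has no constant with that name. A minimal sketch of that failure mode follows; MiniHiveOperation and its constants are a hypothetical stand-in for org.apache.hadoop.hive.ql.plan.HiveOperation, and lenientLookup only illustrates how a hook could tolerate unknown names, not what Atlas actually does.

```java
import java.util.Optional;

// Hypothetical demo: converting a name string to an enum with valueOf()
// fails when the enum has no constant with that name.
public class EnumLookupDemo {

    // Illustrative stand-in, NOT Hive's real HiveOperation enum.
    enum MiniHiveOperation { QUERY, CREATETABLE }

    // Strict lookup, like calling valueOf() directly: throws
    // IllegalArgumentException("No enum constant ...") for unknown names.
    static MiniHiveOperation strictLookup(String name) {
        return MiniHiveOperation.valueOf(name);
    }

    // Lenient alternative: returns Optional.empty() instead of throwing.
    static Optional<MiniHiveOperation> lenientLookup(String name) {
        for (MiniHiveOperation op : MiniHiveOperation.values()) {
            if (op.name().equals(name)) {
                return Optional.of(op);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(strictLookup("QUERY"));                         // QUERY
        System.out.println(lenientLookup("ALTER_TABLE_MERGE").isPresent()); // false
        try {
            strictLookup("ALTER_TABLE_MERGE");
        } catch (IllegalArgumentException e) {
            // The message starts with "No enum constant", as in the Hive error.
            System.out.println(e.getMessage());
        }
    }
}
```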
Created 01-20-2016 10:58 AM
I think this is a bug, and the workaround proposed by @Benjamin Leonhardi is the only way around the issue so far!
For the record, as you can see in the hiveserver2.log below, the MR/Tez execution completes and the ATSHook post hook finishes successfully. It is the Atlas HiveHook post hook that runs next and fails, because of an incompatible operation name.
The workaround is to remove "org.apache.atlas.hive.hook.HiveHook" from "hive.exec.post.hooks".
....
2016-01-12 11:00:21,188 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=task.STATS.Stage-2 from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,189 INFO [HiveServer2-Background-Pool: Thread-180]: ql.Driver (Driver.java:launchTask(1653)) - Starting task [Stage-2:STATS] in serial mode
2016-01-12 11:00:21,189 INFO [HiveServer2-Background-Pool: Thread-180]: exec.StatsTask (StatsTask.java:execute(86)) - Executing stats task
2016-01-12 11:00:21,189 INFO [HiveServer2-Background-Pool: Thread-180]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 3: get_table : db=default tbl=my_test
2016-01-12 11:00:21,190 INFO [HiveServer2-Background-Pool: Thread-180]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - ugi=anonymous ip=unknown-ip-addr cmd=get_table : db=default tbl=my_test
2016-01-12 11:00:21,203 WARN [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,203 WARN [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,209 INFO [HiveServer2-Background-Pool: Thread-180]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 3: alter_table: db=default tbl=my_test newtbl=my_test
2016-01-12 11:00:21,209 INFO [HiveServer2-Background-Pool: Thread-180]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - ugi=anonymous ip=unknown-ip-addr cmd=alter_table: db=default tbl=my_test newtbl=my_test
2016-01-12 11:00:21,224 WARN [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,224 WARN [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,244 INFO [HiveServer2-Background-Pool: Thread-180]: hive.log (MetaStoreUtils.java:updateUnpartitionedTableStatsFast(217)) - Updating table stats fast for my_test
2016-01-12 11:00:21,244 INFO [HiveServer2-Background-Pool: Thread-180]: hive.log (MetaStoreUtils.java:updateUnpartitionedTableStatsFast(219)) - Updated size of table my_test to 0
2016-01-12 11:00:21,253 INFO [HiveServer2-Background-Pool: Thread-180]: exec.Task (SessionState.java:printInfo(951)) - Table default.my_test stats: [numFiles=0, totalSize=0]
2016-01-12 11:00:21,254 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=runTasks start=1452592809214 end=1452592821254 duration=12040 from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,254 INFO [HiveServer2-Background-Pool: Thread-180]: hooks.ATSHook (ATSHook.java:<init>(84)) - Created ATS Hook
2016-01-12 11:00:21,254 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=PostHook.org.apache.hadoop.hive.ql.hooks.ATSHook from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,255 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=PostHook.org.apache.hadoop.hive.ql.hooks.ATSHook start=1452592821254 end=1452592821255 duration=1 from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,255 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=PostHook.org.apache.atlas.hive.hook.HiveHook from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,255 ERROR [HiveServer2-Background-Pool: Thread-180]: ql.Driver (SessionState.java:printError(960)) - FAILED: Hive Internal Error: java.lang.IllegalArgumentException(No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE)
java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE
    at java.lang.Enum.valueOf(Enum.java:238)
    at org.apache.hadoop.hive.ql.plan.HiveOperation.valueOf(HiveOperation.java:23)
    at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1522)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-01-12 11:00:21,255 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=Driver.execute start=1452592809211 end=1452592821255 duration=12044 from=org.apache.hadoop.hive.ql.Driver>
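The workaround of removing "org.apache.atlas.hive.hook.HiveHook" from "hive.exec.post.hooks" amounts to editing one comma-separated property value. A minimal sketch, assuming the current value lists the two post hooks seen in the log (ATSHook, then the Atlas HiveHook); check the actual value on your own cluster before changing it. removeHook is a hypothetical helper, not part of Hive or Ambari.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper: drops one hook class from a comma-separated
// hive.exec.post.hooks value and keeps the remaining hooks in order.
public class RemovePostHook {

    static String removeHook(String hooksValue, String hookClass) {
        return Arrays.stream(hooksValue.split(","))
                .map(String::trim)
                .filter(h -> !h.isEmpty() && !h.equals(hookClass))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Assumed current value, based on the two post hooks in the log.
        String current = "org.apache.hadoop.hive.ql.hooks.ATSHook,org.apache.atlas.hive.hook.HiveHook";
        System.out.println(removeHook(current, "org.apache.atlas.hive.hook.HiveHook"));
        // prints org.apache.hadoop.hive.ql.hooks.ATSHook
    }
}
```

Keeping ATSHook in place preserves the part of the chain that already works. Keep in mind that without the Atlas hook, Hive queries and changes are no longer recorded in Atlas.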