Error on concatenating ORC Hive table (merge files)
Created ‎12-14-2015 01:30 PM
I have a frequently running Spark job that populates a Hive table backed by ORC files. Spark generates many small files, and even coalesce does not help fill the large HDFS blocks efficiently. The best solution I found was to schedule a job that concatenates the Hive table periodically.
The ALTER TABLE statement itself works fine, but it raises an exception, as you can see below:
CREATE TABLE my_test(id String) STORED AS ORC;
ALTER TABLE my_test CONCATENATE;
Loading data to table default.my_test
Table default.my_test stats: [numFiles=0, totalSize=0]
FAILED: Hive Internal Error: java.lang.IllegalArgumentException(No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE)
java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE
    at java.lang.Enum.valueOf(Enum.java:238)
    at org.apache.hadoop.hive.ql.plan.HiveOperation.valueOf(HiveOperation.java:23)
    at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1522)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I also don't understand two things:
- What is the relation between running a simple Hive shell query and Atlas?
at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)
- Why is Hive trying to use the ALTER_TABLE_MERGE enum constant, which is actually implemented as ALTERTABLE_MERGEFILES in https://github.com/hortonworks/hive-release/blob/HDP-2.3.2.0-tag/ql/src/java/org/apache/hadoop/hive/... ?
Created ‎12-16-2015 11:12 AM
And you should be able to remove the Atlas hook from the hive.exec.pre/post/failure.hooks parameters in Ambari/Hive/Config/AdvancedConfig as a workaround, if this is really what causes the error. Perhaps a bug?
Created ‎12-15-2015 02:07 PM
Notifying @Alan Gates
Created ‎12-15-2015 02:41 PM
Regarding 1): if you have Atlas installed, it hooks into Hive's post-execution procedure (hive.exec.post.hooks). This hook ensures that queries and changes are recorded in Atlas (=> data governance).
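To illustrate why a post-execution hook can turn an otherwise successful statement into a FAILED one, here is a minimal sketch in plain Java. It is not Hive's Driver code: the PostHook interface, the execute method and the event strings are all hypothetical names made up for this illustration. The only assumption taken from the thread is that hooks run after the actual work and that an exception thrown by a hook surfaces as a statement failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PostHookDemo {
    // Hypothetical stand-in for a post-execution hook; not Hive's real interface.
    interface PostHook {
        void run(String operationName);
    }

    // Does the statement's work first, then runs every configured hook in order.
    // An exception from any hook makes the whole statement report a failure,
    // even though the work itself has already been done.
    static List<String> execute(String operationName, List<PostHook> hooks) {
        List<String> events = new ArrayList<>();
        events.add("work done");
        try {
            for (PostHook hook : hooks) {
                hook.run(operationName);
                events.add("hook ok");
            }
            events.add("OK");
        } catch (RuntimeException e) {
            events.add("FAILED: " + e.getMessage());
        }
        return events;
    }

    public static void main(String[] args) {
        // First hook accepts any operation name; second one rejects it.
        PostHook timeline = op -> { };
        PostHook governance = op -> {
            throw new IllegalArgumentException("unknown operation " + op);
        };
        System.out.println(execute("ALTER_TABLE_MERGE", Arrays.asList(timeline, governance)));
    }
}
```

In this toy model the table has already been modified by the time the second hook throws, which would be consistent with the observation that the concatenation itself works while the statement still ends in an exception.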
Created ‎12-17-2015 07:19 AM
I had a quick look at the Atlas HiveHook. There shouldn't be a bug there, because it simply gets the generated operation name string from the hook context: hookContext.getOperationName()
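For anyone wondering how a plain string ends up as "No enum constant": the stack trace shows Enum.valueOf being called on the operation name, and valueOf throws IllegalArgumentException for any name that is not a declared constant. A small self-contained sketch follows. DemoOperation is a made-up, simplified enum (it is not Hive's HiveOperation and contains only the constant name quoted in the question plus one filler), and the lenient method is just one possible defensive variant, not what Atlas or Hive actually does.

```java
import java.util.Optional;

final class OperationLookup {
    // Simplified, hypothetical stand-in for an operation enum.
    enum DemoOperation {
        ALTERTABLE_MERGEFILES,
        QUERY
    }

    // Strict lookup: throws IllegalArgumentException for unknown names,
    // like the Enum.valueOf call at the top of the stack trace.
    static DemoOperation strict(String operationName) {
        return DemoOperation.valueOf(operationName);
    }

    // Lenient lookup: returns an empty Optional instead of throwing.
    static Optional<DemoOperation> lenient(String operationName) {
        try {
            return Optional.of(DemoOperation.valueOf(operationName));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(lenient("ALTERTABLE_MERGEFILES").isPresent());
        System.out.println(lenient("ALTER_TABLE_MERGE").isPresent());
        try {
            strict("ALTER_TABLE_MERGE");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

So the exception only tells us that the name string and the enum constants disagree; it does not by itself say which side produced the wrong name.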
Created ‎01-20-2016 10:58 AM
I think this is a bug, and the workaround proposed by @Benjamin Leonhardi is the only way around the issue so far.
For the record, as you can see in the hiveserver2.log below, the MR/Tez execution completed and ATSHook finished successfully, but the Atlas HiveHook post-execution hook then failed because of the incompatible operation name.
The workaround is to remove "org.apache.atlas.hive.hook.HiveHook" from "hive.exec.post.hooks".
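If you want to script the edit rather than do it by hand, the property value is a comma-separated list of class names, so removing one entry is a simple string operation. A rough sketch is below; HookListEditor and removeHook are hypothetical helper names, and the sample value in main is only an assumption based on the two post hooks that appear in the log (ATSHook and the Atlas HiveHook). Your actual value may differ.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class HookListEditor {
    // Returns the comma-separated hook list without the given class name.
    // Whitespace around entries is trimmed and empty entries are dropped.
    static String removeHook(String hooksValue, String hookClass) {
        return Arrays.stream(hooksValue.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty() && !name.equals(hookClass))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Assumed example value; check your own hive.exec.post.hooks setting.
        String current = "org.apache.hadoop.hive.ql.hooks.ATSHook, org.apache.atlas.hive.hook.HiveHook";
        System.out.println(removeHook(current, "org.apache.atlas.hive.hook.HiveHook"));
    }
}
```

Keep in mind that with the hook removed, Atlas no longer records these Hive operations.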
....
2016-01-12 11:00:21,188 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=task.STATS.Stage-2 from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,189 INFO [HiveServer2-Background-Pool: Thread-180]: ql.Driver (Driver.java:launchTask(1653)) - Starting task [Stage-2:STATS] in serial mode
2016-01-12 11:00:21,189 INFO [HiveServer2-Background-Pool: Thread-180]: exec.StatsTask (StatsTask.java:execute(86)) - Executing stats task
2016-01-12 11:00:21,189 INFO [HiveServer2-Background-Pool: Thread-180]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 3: get_table : db=default tbl=my_test
2016-01-12 11:00:21,190 INFO [HiveServer2-Background-Pool: Thread-180]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - ugi=anonymous ip=unknown-ip-addr cmd=get_table : db=default tbl=my_test
2016-01-12 11:00:21,203 WARN [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,203 WARN [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,209 INFO [HiveServer2-Background-Pool: Thread-180]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 3: alter_table: db=default tbl=my_test newtbl=my_test
2016-01-12 11:00:21,209 INFO [HiveServer2-Background-Pool: Thread-180]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - ugi=anonymous ip=unknown-ip-addr cmd=alter_table: db=default tbl=my_test newtbl=my_test
2016-01-12 11:00:21,224 WARN [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,224 WARN [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,244 INFO [HiveServer2-Background-Pool: Thread-180]: hive.log (MetaStoreUtils.java:updateUnpartitionedTableStatsFast(217)) - Updating table stats fast for my_test
2016-01-12 11:00:21,244 INFO [HiveServer2-Background-Pool: Thread-180]: hive.log (MetaStoreUtils.java:updateUnpartitionedTableStatsFast(219)) - Updated size of table my_test to 0
2016-01-12 11:00:21,253 INFO [HiveServer2-Background-Pool: Thread-180]: exec.Task (SessionState.java:printInfo(951)) - Table default.my_test stats: [numFiles=0, totalSize=0]
2016-01-12 11:00:21,254 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=runTasks start=1452592809214 end=1452592821254 duration=12040 from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,254 INFO [HiveServer2-Background-Pool: Thread-180]: hooks.ATSHook (ATSHook.java:<init>(84)) - Created ATS Hook
2016-01-12 11:00:21,254 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=PostHook.org.apache.hadoop.hive.ql.hooks.ATSHook from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,255 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=PostHook.org.apache.hadoop.hive.ql.hooks.ATSHook start=1452592821254 end=1452592821255 duration=1 from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,255 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=PostHook.org.apache.atlas.hive.hook.HiveHook from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,255 ERROR [HiveServer2-Background-Pool: Thread-180]: ql.Driver (SessionState.java:printError(960)) - FAILED: Hive Internal Error: java.lang.IllegalArgumentException(No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE)
java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE
    at java.lang.Enum.valueOf(Enum.java:238)
    at org.apache.hadoop.hive.ql.plan.HiveOperation.valueOf(HiveOperation.java:23)
    at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1522)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-01-12 11:00:21,255 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=Driver.execute start=1452592809211 end=1452592821255 duration=12044 from=org.apache.hadoop.hive.ql.Driver>
