<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Error on concatenating ORC Hive table (merge files) in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-on-concatenating-ORC-Hive-table-merge-files/m-p/99126#M12408</link>
    <description>&lt;P&gt;I have a frequently running Spark job that populates a Hive table backed by ORC files. Spark generates many small files, and even coalesce does not help fill the large HDFS blocks efficiently.
The best solution I found was to schedule a job that concatenates the Hive table periodically.&lt;/P&gt;&lt;P&gt;The ALTER TABLE statement itself works fine, but it then raises the exception shown below:&lt;/P&gt;&lt;P&gt;CREATE TABLE my_test(id String) STORED AS ORC;&lt;/P&gt;&lt;P&gt;ALTER TABLE my_test CONCATENATE;&lt;/P&gt;&lt;PRE&gt;Loading data to table default.my_test
Table default.my_test stats: [numFiles=0, totalSize=0]
FAILED: Hive Internal Error: java.lang.IllegalArgumentException(No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE)
java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE
    at java.lang.Enum.valueOf(Enum.java:238)
    at org.apache.hadoop.hive.ql.plan.HiveOperation.valueOf(HiveOperation.java:23)
    at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1522)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)&lt;/PRE&gt;&lt;P&gt;I also don't understand 2 things:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;What is the relation between running a simple Hive shell query and Atlas? &lt;PRE&gt;at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)&lt;/PRE&gt;
&lt;/LI&gt;&lt;LI&gt;Why is Hive trying to use the enum constant ALTER_TABLE_MERGE, when the operation is actually implemented as ALTERTABLE_MERGEFILES? See &lt;A href="https://github.com/hortonworks/hive-release/blob/HDP-2.3.2.0-tag/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java"&gt;https://github.com/hortonworks/hive-release/blob/HDP-2.3.2.0-tag/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java&lt;/A&gt;&lt;/LI&gt;&lt;/OL&gt;</description>
    <pubDate>Mon, 14 Dec 2015 21:30:18 GMT</pubDate>
    <dc:creator>mahan</dc:creator>
    <dc:date>2015-12-14T21:30:18Z</dc:date>
    <item>
      <title>Error on concatenating ORC Hive table (merge files)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-on-concatenating-ORC-Hive-table-merge-files/m-p/99126#M12408</link>
      <description>&lt;P&gt;I have a frequently running Spark job that populates a Hive table backed by ORC files. Spark generates many small files, and even coalesce does not help fill the large HDFS blocks efficiently.
The best solution I found was to schedule a job that concatenates the Hive table periodically.&lt;/P&gt;&lt;P&gt;The ALTER TABLE statement itself works fine, but it then raises the exception shown below:&lt;/P&gt;&lt;P&gt;CREATE TABLE my_test(id String) STORED AS ORC;&lt;/P&gt;&lt;P&gt;ALTER TABLE my_test CONCATENATE;&lt;/P&gt;&lt;PRE&gt;Loading data to table default.my_test
Table default.my_test stats: [numFiles=0, totalSize=0]
FAILED: Hive Internal Error: java.lang.IllegalArgumentException(No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE)
java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE
    at java.lang.Enum.valueOf(Enum.java:238)
    at org.apache.hadoop.hive.ql.plan.HiveOperation.valueOf(HiveOperation.java:23)
    at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1522)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)&lt;/PRE&gt;&lt;P&gt;I also don't understand 2 things:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;What is the relation between running a simple Hive shell query and Atlas? &lt;PRE&gt;at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)&lt;/PRE&gt;
&lt;/LI&gt;&lt;LI&gt;Why is Hive trying to use the enum constant ALTER_TABLE_MERGE, when the operation is actually implemented as ALTERTABLE_MERGEFILES? See &lt;A href="https://github.com/hortonworks/hive-release/blob/HDP-2.3.2.0-tag/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java"&gt;https://github.com/hortonworks/hive-release/blob/HDP-2.3.2.0-tag/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java&lt;/A&gt;&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Mon, 14 Dec 2015 21:30:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-on-concatenating-ORC-Hive-table-merge-files/m-p/99126#M12408</guid>
      <dc:creator>mahan</dc:creator>
      <dc:date>2015-12-14T21:30:18Z</dc:date>
    </item>
    <item>
      <title>Re: Error on concatenating ORC Hive table (merge files)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-on-concatenating-ORC-Hive-table-merge-files/m-p/99127#M12409</link>
      <description>&lt;P&gt;Notifying &lt;A rel="user" href="https://community.cloudera.com/users/521/gates.html" nodeid="521"&gt;@Alan Gates&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 15 Dec 2015 22:07:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-on-concatenating-ORC-Hive-table-merge-files/m-p/99127#M12409</guid>
      <dc:creator>SQLShaw</dc:creator>
      <dc:date>2015-12-15T22:07:26Z</dc:date>
    </item>
    <item>
      <title>Re: Error on concatenating ORC Hive table (merge files)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-on-concatenating-ORC-Hive-table-merge-files/m-p/99128#M12410</link>
      <description>&lt;P&gt;Regarding 1): if you have Atlas installed, it hooks into Hive's post-execution phase (hive.exec.post.hooks). This hook ensures that queries and changes are recorded in Atlas for data governance.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Dec 2015 22:41:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-on-concatenating-ORC-Hive-table-merge-files/m-p/99128#M12410</guid>
      <dc:creator>jstraub</dc:creator>
      <dc:date>2015-12-15T22:41:51Z</dc:date>
    </item>
    <item>
      <title>Re: Error on concatenating ORC Hive table (merge files)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-on-concatenating-ORC-Hive-table-merge-files/m-p/99129#M12411</link>
      <description>&lt;P&gt;As a workaround, you should be able to remove it from the hive.exec.pre/post/failure.hooks parameters in Ambari/Hive/Config/AdvancedConfig, if the hook really is causing the error. Perhaps it is a bug?&lt;/P&gt;</description>
      <pubDate>Wed, 16 Dec 2015 19:12:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-on-concatenating-ORC-Hive-table-merge-files/m-p/99129#M12411</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2015-12-16T19:12:29Z</dc:date>
    </item>
    <item>
      <title>Re: Error on concatenating ORC Hive table (merge files)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-on-concatenating-ORC-Hive-table-merge-files/m-p/99130#M12412</link>
      <description>&lt;P&gt;I had a quick look at the Atlas HiveHook. It shouldn't be a bug there, because the hook simply takes the generated operation name string from the hook context: hookContext.getOperationName()&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/hortonworks/atlas-release/blob/HDP-2.3.2.0-tag/addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java"&gt;https://github.com/hortonworks/atlas-release/blob/HDP-2.3.2.0-tag/addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 17 Dec 2015 15:19:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-on-concatenating-ORC-Hive-table-merge-files/m-p/99130#M12412</guid>
      <dc:creator>mahan</dc:creator>
      <dc:date>2015-12-17T15:19:24Z</dc:date>
    </item>
    <item>
      <title>Re: Error on concatenating ORC Hive table (merge files)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-on-concatenating-ORC-Hive-table-merge-files/m-p/99131#M12413</link>
      <description>&lt;P&gt;I think this is a bug, and the workaround proposed by &lt;A rel="user" href="https://community.cloudera.com/users/168/bleonhardi.html" nodeid="168"&gt;@Benjamin Leonhardi&lt;/A&gt; is the only fix so far.&lt;/P&gt;&lt;P&gt;For the record, as the hiveserver2.log excerpt below shows, the MR/Tez execution completes and ATSHook finishes successfully, but the post-execution HiveHook then fails because of the incompatible operation name.&lt;/P&gt;&lt;P&gt;The workaround is to remove "org.apache.atlas.hive.hook.HiveHook" from "hive.exec.post.hooks".&lt;/P&gt;&lt;PRE&gt;....
2016-01-12 11:00:21,188 INFO  [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - &amp;lt;PERFLOG method=task.STATS.Stage-2 from=org.apache.hadoop.hive.ql.Driver&amp;gt;
2016-01-12 11:00:21,189 INFO  [HiveServer2-Background-Pool: Thread-180]: ql.Driver (Driver.java:launchTask(1653)) - Starting task [Stage-2:STATS] in serial mode
2016-01-12 11:00:21,189 INFO  [HiveServer2-Background-Pool: Thread-180]: exec.StatsTask (StatsTask.java:execute(86)) - Executing stats task
2016-01-12 11:00:21,189 INFO  [HiveServer2-Background-Pool: Thread-180]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 3: get_table : db=default tbl=my_test
2016-01-12 11:00:21,190 INFO  [HiveServer2-Background-Pool: Thread-180]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - ugi=anonymous    ip=unknown-ip-addr    cmd=get_table : db=default tbl=my_test    
2016-01-12 11:00:21,203 WARN  [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,203 WARN  [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,209 INFO  [HiveServer2-Background-Pool: Thread-180]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 3: alter_table: db=default tbl=my_test newtbl=my_test
2016-01-12 11:00:21,209 INFO  [HiveServer2-Background-Pool: Thread-180]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - ugi=anonymous    ip=unknown-ip-addr    cmd=alter_table: db=default tbl=my_test newtbl=my_test    
2016-01-12 11:00:21,224 WARN  [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,224 WARN  [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,244 INFO  [HiveServer2-Background-Pool: Thread-180]: hive.log (MetaStoreUtils.java:updateUnpartitionedTableStatsFast(217)) - Updating table stats fast for my_test
2016-01-12 11:00:21,244 INFO  [HiveServer2-Background-Pool: Thread-180]: hive.log (MetaStoreUtils.java:updateUnpartitionedTableStatsFast(219)) - Updated size of table my_test to 0
2016-01-12 11:00:21,253 INFO  [HiveServer2-Background-Pool: Thread-180]: exec.Task (SessionState.java:printInfo(951)) - Table default.my_test stats: [numFiles=0, totalSize=0]
2016-01-12 11:00:21,254 INFO  [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - &amp;lt;/PERFLOG method=runTasks start=1452592809214 end=1452592821254 duration=12040 from=org.apache.hadoop.hive.ql.Driver&amp;gt;
2016-01-12 11:00:21,254 INFO  [HiveServer2-Background-Pool: Thread-180]: hooks.ATSHook (ATSHook.java:&amp;lt;init&amp;gt;(84)) - Created ATS Hook
2016-01-12 11:00:21,254 INFO  [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - &amp;lt;PERFLOG method=PostHook.org.apache.hadoop.hive.ql.hooks.ATSHook from=org.apache.hadoop.hive.ql.Driver&amp;gt;
2016-01-12 11:00:21,255 INFO  [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - &amp;lt;/PERFLOG method=PostHook.org.apache.hadoop.hive.ql.hooks.ATSHook start=1452592821254 end=1452592821255 duration=1 from=org.apache.hadoop.hive.ql.Driver&amp;gt;
2016-01-12 11:00:21,255 INFO  [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - &amp;lt;PERFLOG method=PostHook.org.apache.atlas.hive.hook.HiveHook from=org.apache.hadoop.hive.ql.Driver&amp;gt;
2016-01-12 11:00:21,255 ERROR [HiveServer2-Background-Pool: Thread-180]: ql.Driver (SessionState.java:printError(960)) - FAILED: Hive Internal Error: java.lang.IllegalArgumentException(No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE)
java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE
    at java.lang.Enum.valueOf(Enum.java:238)
    at org.apache.hadoop.hive.ql.plan.HiveOperation.valueOf(HiveOperation.java:23)
    at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1522)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

2016-01-12 11:00:21,255 INFO  [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - &amp;lt;/PERFLOG method=Driver.execute start=1452592809211 end=1452592821255 duration=12044 from=org.apache.hadoop.hive.ql.Driver&amp;gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 20 Jan 2016 18:58:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-on-concatenating-ORC-Hive-table-merge-files/m-p/99131#M12413</guid>
      <dc:creator>mahan</dc:creator>
      <dc:date>2016-01-20T18:58:47Z</dc:date>
    </item>
  </channel>
</rss>

