<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: INSERT INTO TABLE failing with error while moving files from org.apache.hadoop.hive.ql.exec.Task in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126797#M89525</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/5386/christianguegi.html" nodeid="5386"&gt;@Christian Guegi&lt;/A&gt; &lt;/P&gt;&lt;P&gt;You could write your audits to an NFS share with a shell action. Every few minutes you can load all the audits from that folder into HDFS. This is an example of a microbatching strategy.&lt;/P&gt;&lt;P&gt;You can also try publishing JMS messages and using storm-jms to direct them to HDFS. This is a streaming approach.&lt;/P&gt;</description>
    <pubDate>Tue, 03 May 2016 11:53:12 GMT</pubDate>
    <dc:creator>ravi1</dc:creator>
    <dc:date>2016-05-03T11:53:12Z</dc:date>
    <item>
      <title>INSERT INTO TABLE failing with error while moving files from org.apache.hadoop.hive.ql.exec.Task</title>
      <link>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126794#M89522</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;We have a
Hive action within an Oozie workflow that throws the following error
occasionally:&lt;/P&gt;&lt;PRE&gt;2016-04-29 16:16:19,129 INFO  [main] ql.Driver (Driver.java:launchTask(1604)) - Starting task [Stage-0:MOVE] in serial mode
73997 [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - Loading data to table audit.aud_tbl_validation_result from hdfs://&amp;lt;nameservice&amp;gt;/tmp/hive/&amp;lt;user&amp;gt;/6218606e-a08c-4912-ad02-6a147165b7d7/hive_2016-04-29_16-15-51_649_3401400148024571575-1/-ext-10000
2016-04-29 16:16:19,129 INFO  [main] exec.Task (SessionState.java:printInfo(824)) - Loading data to table audit.aud_tbl_validation_result from hdfs://&amp;lt;nameservice&amp;gt;/tmp/hive/&amp;lt;user&amp;gt;/6218606e-a08c-4912-ad02-6a147165b7d7/hive_2016-04-29_16-15-51_649_3401400148024571575-1/-ext-10000
76263 [main] INFO  hive.ql.metadata.Hive  - Renaming src:hdfs://&amp;lt;nameservice&amp;gt;/tmp/hive/&amp;lt;user&amp;gt;/6218606e-a08c-4912-ad02-6a147165b7d7/hive_2016-04-29_16-15-51_649_3401400148024571575-1/-ext-10000/000000_0;dest: hdfs://&amp;lt;nameservice&amp;gt;/apps/hive/warehouse/audit.db/aud_tbl_validation_result/000000_0_copy_348;Status:false
2016-04-29 16:16:21,395 INFO  [main] metadata.Hive (Hive.java:renameFile(2461)) - Renaming src:hdfs://&amp;lt;nameservice&amp;gt;/tmp/hive/&amp;lt;user&amp;gt;/6218606e-a08c-4912-ad02-6a147165b7d7/hive_2016-04-29_16-15-51_649_3401400148024571575-1/-ext-10000/000000_0;dest: hdfs://&amp;lt;nameservice&amp;gt;/apps/hive/warehouse/audit.db/aud_tbl_validation_result/000000_0_copy_348;Status:false
76274 [main] ERROR org.apache.hadoop.hive.ql.exec.Task  - Failed with exception copyFiles: error while moving files!!! Cannot move hdfs://&amp;lt;nameservice&amp;gt;/tmp/hive/&amp;lt;user&amp;gt;/6218606e-a08c-4912-ad02-6a147165b7d7/hive_2016-04-29_16-15-51_649_3401400148024571575-1/-ext-10000/000000_0 to hdfs://&amp;lt;nameservice&amp;gt;/apps/hive/warehouse/audit.db/aud_tbl_validation_result/000000_0_copy_348
org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error while moving files!!! Cannot move hdfs://&amp;lt;nameservice&amp;gt;/tmp/hive/&amp;lt;user&amp;gt;/6218606e-a08c-4912-ad02-6a147165b7d7/hive_2016-04-29_16-15-51_649_3401400148024571575-1/-ext-10000/000000_0 to hdfs://&amp;lt;nameservice&amp;gt;/apps/hive/warehouse/audit.db/aud_tbl_validation_result/000000_0_copy_348
 at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2536)
 at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:673)
 at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1571)
 at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:288)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1606)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1367)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1179)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1006)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:996)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:247)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:345)
 at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:443)
 at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:459)
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:739)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
 at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:323)
 at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:284)
 at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
 at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Cannot move hdfs://&amp;lt;nameservice&amp;gt;/tmp/hive/&amp;lt;user&amp;gt;/6218606e-a08c-4912-ad02-6a147165b7d7/hive_2016-04-29_16-15-51_649_3401400148024571575-1/-ext-10000/000000_0 to hdfs://&amp;lt;nameservice&amp;gt;/apps/hive/warehouse/audit.db/aud_tbl_validation_result/000000_0_copy_348
 at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2530)
 ... 36 more&lt;/PRE&gt;&lt;P&gt;The NameNode logs reveal more details --&amp;gt; &lt;STRONG&gt;destination exists!&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;hadoop-hdfs-namenode-&amp;lt;host&amp;gt;.log.3:2016-04-29 16:16:21,394 WARN  hdfs.StateChange (FSDirectory.java:unprotectedRenameTo(540)) - DIR* FSDirectory.unprotectedRenameTo: failed to rename /tmp/hive/&amp;lt;user&amp;gt;/6218606e-a08c-4912-ad02-6a147165b7d7/hive_2016-04-29_16-15-51_649_3401400148024571575-1/-ext-10000/000000_0 to /apps/hive/warehouse/audit.db/aud_tbl_validation_result/000000_0_copy_348 because destination exists&lt;/PRE&gt;&lt;P&gt;Cross-checking HDFS, the file is in the Hive warehouse directory.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="3925-hdfs.png" style="width: 1066px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/22627iAE2F8C157520C9AA/image-size/medium?v=v2&amp;amp;px=400" role="button" title="3925-hdfs.png" alt="3925-hdfs.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Stack/Settings:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;HDP 2.2.4, Kerberos enabled, NN HA, Hive ACID disabled&lt;/P&gt;&lt;P&gt;Hive
statements used in the Oozie workflow:&lt;/P&gt;&lt;PRE&gt;INSERT INTO TABLE AUDIT.AUD_TBL_BATCH_RUN_LOG VALUES(${Batch_ID},"${Business_DT}", ...);

INSERT INTO TABLE AUDIT.AUD_TBL_VALIDATION_RESULT VALUES(${Batch_ID},"${Job_ID}","${Status}", ...);&lt;/PRE&gt;&lt;P&gt;Hive DDL:&lt;/P&gt;&lt;PRE&gt;CREATE TABLE aud_tbl_batch_run_log (
    AUD_Batch_ID BIGINT,
    AUD_JOB_ID STRING,
    ... )
INTO 10 BUCKETS stored as orc TBLPROPERTIES ('transactional'='false');

CREATE TABLE aud_tbl_batch_validation_result (
    AUD_Batch_ID BIGINT,
    AUD_JOB_ID STRING,
    AUD_STATUS STRING,
    ... )
INTO 10 BUCKETS stored as orc TBLPROPERTIES ('transactional'='false');&lt;/PRE&gt;&lt;P&gt;We see this error occasionally for table aud_tbl_batch_run_log as well as aud_tbl_batch_validation_result.&lt;/P&gt;&lt;P&gt;Why does the file sometimes already exist? How does INSERT INTO TABLE work internally?&lt;/P&gt;&lt;P&gt;Any hints to solve this are highly appreciated.&lt;/P&gt;&lt;P&gt;Thank you &amp;amp; best regards, Chris&lt;/P&gt;</description>
      <pubDate>Mon, 19 Aug 2019 10:03:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126794#M89522</guid>
      <dc:creator>chri</dc:creator>
      <dc:date>2019-08-19T10:03:06Z</dc:date>
    </item>
    <item>
      <title>Re: INSERT INTO TABLE failing with error while moving files from org.apache.hadoop.hive.ql.exec.Task</title>
      <link>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126795#M89523</link>
      <description>&lt;P&gt;It looks like you are executing one insert per row; I see you have 900-byte files in HDFS. The most likely reason for your error is two workflows running in parallel and trying to insert into the table.&lt;/P&gt;&lt;P&gt;Even if you get the insert flow right, 900-byte files in HDFS will create a performance hit for Hive and overload the NameNode. You should change your Oozie workflow and consider microbatching or streaming your data into HDFS/Hive.&lt;/P&gt;&lt;P&gt;You could write your audits to an NFS share with a shell action. Every few minutes you can load all the audits from that folder into HDFS. This is an example of a microbatching strategy.&lt;/P&gt;&lt;P&gt;You can also try publishing JMS messages and using storm-jms to direct them to HDFS. This is a streaming approach.&lt;/P&gt;</description>
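The microbatching idea could be sketched in HiveQL roughly as follows. The staging path and staging table are hypothetical, and the sketch assumes a shell action has already pushed the accumulated audit files from the NFS share into an HDFS staging directory; it is an illustration of the strategy, not the exact statements:

```sql
-- Hypothetical sketch: instead of one INSERT per audit row, batch-load
-- all audit files accumulated in a staging directory. A shell action is
-- assumed to have copied the NFS files there first, e.g.
--   hdfs dfs -put /mnt/nfs/audit/* /tmp/audit_staging/

-- Move the batch into a plain-text staging table (LOAD DATA only moves
-- files, so the staging table's format must match the files).
LOAD DATA INPATH '/tmp/audit_staging'
INTO TABLE audit.aud_tbl_batch_run_log_stg;

-- Insert the whole batch into the ORC audit table in one statement,
-- so only one writer touches the table per batch.
INSERT INTO TABLE audit.aud_tbl_batch_run_log
SELECT * FROM audit.aud_tbl_batch_run_log_stg;
```

Run every few minutes from a coordinator, this turns hundreds of tiny per-row inserts into one move task per batch.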
      <pubDate>Tue, 03 May 2016 10:49:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126795#M89523</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2016-05-03T10:49:24Z</dc:date>
    </item>
    <item>
      <title>Re: INSERT INTO TABLE failing with error while moving files from org.apache.hadoop.hive.ql.exec.Task</title>
      <link>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126796#M89524</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/216/ravi.html" nodeid="216"&gt;@Ravi Mutyala&lt;/A&gt;, thank you for your response. It's even worse: in total we have 8 Oozie workflows containing 500+ sub-workflows for loading source tables with a Sqoop action. Each sub-workflow contains the above-mentioned Hive action for auditing, which produces two INSERT INTO TABLE statements in Hive for each loaded source table.&lt;/P&gt;&lt;P&gt;We hit the error when running those 8 workflows in parallel. Would a sequential execution of the workflows help in this case?&lt;/P&gt;&lt;P&gt;What exactly do you mean by microbatching?&lt;/P&gt;&lt;P&gt;Thanks, Chris&lt;/P&gt;</description>
      <pubDate>Tue, 03 May 2016 11:28:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126796#M89524</guid>
      <dc:creator>chri</dc:creator>
      <dc:date>2016-05-03T11:28:05Z</dc:date>
    </item>
    <item>
      <title>Re: INSERT INTO TABLE failing with error while moving files from org.apache.hadoop.hive.ql.exec.Task</title>
      <link>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126797#M89525</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/5386/christianguegi.html" nodeid="5386"&gt;@Christian Guegi&lt;/A&gt; &lt;/P&gt;&lt;P&gt;You could write your audits to an NFS share with a shell action. Every few minutes you can load all the audits from that folder into HDFS. This is an example of a microbatching strategy.&lt;/P&gt;&lt;P&gt;You can also try publishing JMS messages and using storm-jms to direct them to HDFS. This is a streaming approach.&lt;/P&gt;</description>
      <pubDate>Tue, 03 May 2016 11:53:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126797#M89525</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2016-05-03T11:53:12Z</dc:date>
    </item>
    <item>
      <title>Re: INSERT INTO TABLE failing with error while moving files from org.apache.hadoop.hive.ql.exec.Task</title>
      <link>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126798#M89526</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/5386/christianguegi.html" nodeid="5386"&gt;@Christian Guegi&lt;/A&gt; Do you see this issue only when running multiple workflows in parallel?&lt;/P&gt;</description>
      <pubDate>Fri, 20 May 2016 23:21:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126798#M89526</guid>
      <dc:creator>vgumashta</dc:creator>
      <dc:date>2016-05-20T23:21:54Z</dc:date>
    </item>
    <item>
      <title>Re: INSERT INTO TABLE failing with error while moving files from org.apache.hadoop.hive.ql.exec.Task</title>
      <link>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126799#M89527</link>
      <description>&lt;P&gt;Can you please check whether you are using KMS? In that scenario you cannot copy data from one encryption zone (EZ) to another EZ, which gives the same kind of error. To avoid this error, use a scratch directory, which will resolve your issue.&lt;/P&gt;</description>
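For the encryption-zone case mentioned here, the usual workaround is to place Hive's scratch directory in the same encryption zone as the target table, so the final MoveTask is a same-zone rename rather than a copy across zones. The path below is purely illustrative:

```sql
-- Illustrative only: put the Hive scratch directory inside the same
-- encryption zone as the warehouse, so intermediate results are moved
-- (renamed) within one zone instead of copied between zones.
SET hive.exec.scratchdir=/apps/hive/warehouse/.hive-staging;
```

(As the follow-up below confirms, KMS was not in use in this thread, so this setting would not have applied here.)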
      <pubDate>Mon, 23 May 2016 05:52:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126799#M89527</guid>
      <dc:creator>Sreedhar_ch</dc:creator>
      <dc:date>2016-05-23T05:52:40Z</dc:date>
    </item>
    <item>
      <title>Re: INSERT INTO TABLE failing with error while moving files from org.apache.hadoop.hive.ql.exec.Task</title>
      <link>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126800#M89528</link>
      <description>&lt;P&gt;There is no KMS used in those scenarios.&lt;/P&gt;</description>
      <pubDate>Tue, 31 May 2016 16:55:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126800#M89528</guid>
      <dc:creator>chri</dc:creator>
      <dc:date>2016-05-31T16:55:34Z</dc:date>
    </item>
    <item>
      <title>Re: INSERT INTO TABLE failing with error while moving files from org.apache.hadoop.hive.ql.exec.Task</title>
      <link>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126801#M89529</link>
      <description>&lt;P&gt;Yes, we see this issue only when running multiple Oozie workflows in parallel.&lt;/P&gt;</description>
      <pubDate>Tue, 31 May 2016 16:57:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126801#M89529</guid>
      <dc:creator>chri</dc:creator>
      <dc:date>2016-05-31T16:57:40Z</dc:date>
    </item>
    <item>
      <title>Re: INSERT INTO TABLE failing with error while moving files from org.apache.hadoop.hive.ql.exec.Task</title>
      <link>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126802#M89530</link>
      <description>&lt;P&gt;Below are our findings:&lt;/P&gt;&lt;P&gt;As shown in the DDL above, bucketing is used on the problematic tables. The bucket number is decided by a hashing algorithm: out of the 10 buckets, each insert writes the actual data file into 1 bucket, while the other 9 buckets get the same file name with zero size. During this hash calculation a race condition occurs when a new row is inserted into the bucketed table by multiple threads/processes, so 2 or more threads/processes try to create the same bucket file.&lt;/P&gt;&lt;P&gt;In addition, as discussed here, the current architecture is not really recommended, as over time there would be millions of files on HDFS, which would create extra overhead on the NameNode. Also, a SELECT * statement would take a lot of time, as it has to merge all the files from the buckets.&lt;/P&gt;&lt;P&gt;Solutions which solved both issues:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Removed the buckets from the two problematic tables, hence the probability of race conditions is much lower&lt;/LI&gt;&lt;LI&gt;Added hive.support.concurrency=true before the insert statements&lt;/LI&gt;&lt;LI&gt;Added a weekly Oozie workflow that runs the Hive concatenate command on both tables to mitigate the small-files problem&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;FYI &lt;A rel="user" href="https://community.cloudera.com/users/216/ravi.html" nodeid="216"&gt;@Ravi Mutyala&lt;/A&gt;&lt;/P&gt;</description>
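The fix outlined in the bullets above might look roughly like this in HiveQL. The column list is abbreviated just as in the original DDL, so this is a sketch of the approach rather than the exact statements used:

```sql
-- Sketch: re-create the audit table without bucketing, removing the
-- hash-based bucket-file naming that caused the race condition
-- (remaining columns omitted as in the original DDL).
CREATE TABLE aud_tbl_batch_run_log (
    AUD_Batch_ID BIGINT,
    AUD_JOB_ID STRING
)
STORED AS ORC
TBLPROPERTIES ('transactional'='false');

-- Enable concurrency support before running the inserts.
SET hive.support.concurrency=true;

-- Weekly compaction: merge the small ORC files of the table in place.
ALTER TABLE aud_tbl_batch_run_log CONCATENATE;
```

CONCATENATE works here because the tables are stored as ORC; for a partitioned table it would be run per partition instead.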
      <pubDate>Tue, 05 Jul 2016 16:24:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/INSERT-INTO-TABLE-failing-with-error-while-moving-files-from/m-p/126802#M89530</guid>
      <dc:creator>chri</dc:creator>
      <dc:date>2016-07-05T16:24:07Z</dc:date>
    </item>
  </channel>
</rss>

