<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: "Load data into table" behavior is different between HDP 2.6.2 and HDP 2.3.0 in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/quot-Load-data-into-table-quot-behavior-is-different-between/m-p/185923#M73364</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/11773/vgarg.html" nodeid="11773"&gt;@vgarg&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Thanks for checking and for the reply.&lt;/P&gt;&lt;P&gt;I have opened a JIRA issue to report this.&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/HIVE-18563" target="_blank"&gt;https://issues.apache.org/jira/browse/HIVE-18563&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I can use the workaround for this issue.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Jun&lt;/P&gt;</description>
    <pubDate>Mon, 29 Jan 2018 17:43:32 GMT</pubDate>
    <dc:creator>odajun</dc:creator>
    <dc:date>2018-01-29T17:43:32Z</dc:date>
    <item>
      <title>"Load data into table" behavior is different between HDP 2.6.2 and HDP 2.3.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/quot-Load-data-into-table-quot-behavior-is-different-between/m-p/185921#M73362</link>
      <description>&lt;P&gt;After upgrading HDP from 2.3.2.0 to 2.6.2.0, the behavior of "load data into table" changed.&lt;/P&gt;&lt;P&gt;The input data is hourly, and every file has the same name.&lt;/P&gt;&lt;PRE&gt;/user/user1/logs/yyyymmdd/00/part-r-00000.gz
/user/user1/logs/yyyymmdd/01/part-r-00000.gz
/user/user1/logs/yyyymmdd/02/part-r-00000.gz
/user/user1/logs/yyyymmdd/03/part-r-00000.gz
...
/user/user1/logs/yyyymmdd/22/part-r-00000.gz
/user/user1/logs/yyyymmdd/23/part-r-00000.gz
&lt;/PRE&gt;&lt;P&gt;Before upgrade (HDP 2.3.2.0)&lt;/P&gt;&lt;PRE&gt;HQL
hive&amp;gt; load data inpath '/user/user1/logs/yyyymmdd/*/*.gz' into table sample_db.sample_tbl partition (dt='yyyymmdd');


Result
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_1.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_10.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_11.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_12.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_13.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_14.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_15.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_16.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_17.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_18.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_19.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_2.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_20.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_21.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_22.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_23.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_3.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_4.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_5.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_6.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_7.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_8.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_9.gz
&lt;/PRE&gt;&lt;P&gt;Every file except part-r-00000.gz was renamed to part-r-00000_copy_*.gz, so all files were kept.&lt;/P&gt;&lt;P&gt;After upgrade (HDP 2.6.2.0)&lt;/P&gt;&lt;PRE&gt;HQL
hive&amp;gt; load data inpath '/user/user1/logs/yyyymmdd/*/*.gz' into table sample_db.sample_tbl partition (dt='yyyymmdd');

Result
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000.gz
&lt;/PRE&gt;&lt;P&gt;Only part-r-00000.gz is present.&lt;/P&gt;&lt;P&gt;This file is identical to part-r-00000_copy_23.gz from the HDP 2.3.2.0 result.&lt;/P&gt;&lt;P&gt;When I load the files one by one, all of them are loaded, as in the HDP 2.3.2.0 environment.&lt;/P&gt;&lt;P&gt;Why does the behavior differ between 2.3.2.0 and 2.6.2.0?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;OS : CentOS6&lt;/LI&gt;&lt;LI&gt;JDK : 1.8.0_152 (Oracle)&lt;/LI&gt;&lt;LI&gt;HDP : 2.3.2.0 and 2.6.2.0&lt;/LI&gt;&lt;LI&gt;Hive : 1.2.1.2.3.2.0-2950 and 1.2.1000.2.6.2.0-205&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Thu, 11 Jan 2018 18:49:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/quot-Load-data-into-table-quot-behavior-is-different-between/m-p/185921#M73362</guid>
      <dc:creator>odajun</dc:creator>
      <dc:date>2018-01-11T18:49:05Z</dc:date>
    </item>
    <item>
      <title>Re: "Load data into table" behavior is different between HDP 2.6.2 and HDP 2.3.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/quot-Load-data-into-table-quot-behavior-is-different-between/m-p/185922#M73363</link>
      <description>&lt;P&gt;This looks like a bug (regression). I can reproduce the same behavior on the latest Hive master, though I haven't confirmed that it worked in the previous version. Feel free to open a JIRA to report this.&lt;/P&gt;&lt;P&gt;EDIT: I dug further into the code and found a workaround. Use set hive.mv.files.thread=0. This disables the parallel loading of directories, and LOAD should then be able to load all directories by renaming the files.&lt;/P&gt;&lt;P&gt;This is definitely a bug that needs to be fixed. Please go ahead with the JIRA report if you can; otherwise let me know and I'll file one.&lt;/P&gt;</description>
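      <content:encoded>&lt;P&gt;A minimal sketch of the workaround described above, assuming the same input path, table, and partition as in the original question. The set command is expected to affect only the current Hive session, so it should be issued before each affected LOAD.&lt;/P&gt;&lt;PRE&gt;-- Workaround from this reply: disable the multi-threaded file move
set hive.mv.files.thread=0;

-- Same statement as in the question; path, table, and partition are taken from there
load data inpath '/user/user1/logs/yyyymmdd/*/*.gz' into table sample_db.sample_tbl partition (dt='yyyymmdd');
&lt;/PRE&gt;&lt;P&gt;Listing the partition directory afterwards is a simple way to check that the part-r-00000_copy_*.gz files are present again.&lt;/P&gt;</content:encoded>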
      <pubDate>Tue, 23 Jan 2018 07:59:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/quot-Load-data-into-table-quot-behavior-is-different-between/m-p/185922#M73363</guid>
      <dc:creator>vgarg</dc:creator>
      <dc:date>2018-01-23T07:59:18Z</dc:date>
    </item>
    <item>
      <title>Re: "Load data into table" behavior is different between HDP 2.6.2 and HDP 2.3.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/quot-Load-data-into-table-quot-behavior-is-different-between/m-p/185923#M73364</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/11773/vgarg.html" nodeid="11773"&gt;@vgarg&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Thanks for checking and for the reply.&lt;/P&gt;&lt;P&gt;I have opened a JIRA issue to report this.&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/HIVE-18563" target="_blank"&gt;https://issues.apache.org/jira/browse/HIVE-18563&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I can use the workaround for this issue.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Jun&lt;/P&gt;</description>
      <pubDate>Mon, 29 Jan 2018 17:43:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/quot-Load-data-into-table-quot-behavior-is-different-between/m-p/185923#M73364</guid>
      <dc:creator>odajun</dc:creator>
      <dc:date>2018-01-29T17:43:32Z</dc:date>
    </item>
  </channel>
</rss>

