<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Bug in Hive (HDP 2.1), Timestamp data overwritten during INSERT processing in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98435#M61548</link>
    <description>&lt;P&gt;Thanks to &lt;A rel="user" href="https://community.cloudera.com/users/222/deepesh.html" nodeid="222"&gt;@Deepesh&lt;/A&gt; for the workaround. Also wanted to add (for info) that these steps will not be required after HDP upgrade. We will use &lt;/P&gt;&lt;PRE&gt; ALTER TABLE activeTable CONCATENATE;
&lt;/PRE&gt;&lt;P&gt;to combine the many smaller ORC files into fewer larger ones (possible from Hive 0.14+).&lt;/P&gt;&lt;P&gt;&lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionConcatenate"&gt;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionConcatenate&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 11 Dec 2015 13:37:26 GMT</pubDate>
    <dc:creator>emilysharpe</dc:creator>
    <dc:date>2015-12-11T13:37:26Z</dc:date>
    <item>
      <title>Bug in Hive (HDP 2.1), Timestamp data overwritten during INSERT processing</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98429#M61542</link>
      <description>&lt;P&gt;
	We have experienced an issue where (re)processing data in Hive overwrites timestamp data.
This occurs with HDP 2.1, but not 2.3.&lt;/P&gt;&lt;P&gt;
	We are using Hive to run an ad hoc 'reorg' or 'reprocess' on existing Hive tables to reduce the number of files stored, which improves query performance and reduces pressure on the cluster (there is a nice explanation from @david.streever here 
	&lt;A href="https://community.hortonworks.com/questions/4024/how-many-files-is-too-many-on-a-modern-hdp-cluster.html"&gt;https://community.hortonworks.com/questions/4024/how-many-files-is-too-many-on-a-modern-hdp-cluster.html&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;
	The active Hive table is added to daily, creating at least one ORC file per day. The schema contains several timestamp columns (e.g. created_timestamp for when each record was originally created on the source system).&lt;/P&gt;&lt;P&gt;
	We then create a reorgTable with a schema identical to activeTable and copy the data from activeTable to reorgTable, which combines many of the smaller daily files and reduces the overall number.&lt;/P&gt;&lt;P&gt;
	However, this process edits/overwrites timestamp data (and does not touch other columns):&lt;/P&gt;&lt;P&gt;
	1. Contents of activeTable&lt;/P&gt;&lt;P&gt;
	ID    created_timestamp&lt;/P&gt;&lt;P&gt;
	01    2000-01-01 13:08:21.110&lt;/P&gt;&lt;P&gt;
	02    1970-01-01 01:02:03.450&lt;/P&gt;&lt;P&gt;
	03    1990-10-08 03:09:02.780&lt;/P&gt;&lt;P&gt;
		2. Copy data from activeTable to reorgTable&lt;/P&gt;
&lt;PRE&gt;INSERT INTO TABLE reorgTable SELECT * FROM activeTable;
&lt;/PRE&gt;&lt;P&gt;
		3. Contents of the reorgTable&lt;/P&gt;&lt;P&gt;
		ID    created_timestamp&lt;/P&gt;&lt;P&gt;
		01    1990-10-08 03:09:02.780&lt;/P&gt;&lt;P&gt;
		02    1990-10-08 03:09:02.780&lt;/P&gt;&lt;P&gt;
		03    1990-10-08 03:09:02.780&lt;/P&gt;&lt;P&gt;
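A query along these lines lists the rows whose timestamp changed during the copy. It is only a sketch and assumes that ID is unique in both tables:&lt;/P&gt;
&lt;PRE&gt;-- Sketch only: assumes ID uniquely identifies a row in both tables.
SELECT a.id,
       a.created_timestamp AS original_ts,
       r.created_timestamp AS copied_ts
FROM activeTable a
JOIN reorgTable r ON a.id = r.id
WHERE a.created_timestamp != r.created_timestamp;
&lt;/PRE&gt;&lt;P&gt;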
Has anyone else experienced this? Is there a solution other than upgrading?
Or an alternative way to reprocess the data that might not have the same effect?
	&lt;/P&gt;&lt;P&gt;
	Thank you!
		&lt;/P&gt;</description>
      <pubDate>Thu, 10 Dec 2015 12:36:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98429#M61542</guid>
      <dc:creator>emilysharpe</dc:creator>
      <dc:date>2015-12-10T12:36:52Z</dc:date>
    </item>
    <item>
      <title>Re: Bug in Hive (HDP 2.1), Timestamp data overwritten during INSERT processing</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98430#M61543</link>
      <description>&lt;P&gt;Hi Emily, which 2.1.x maintenance version are you using? Also, what's the data type of the "created_timestamp" column? I assume the DDL for activeTable is identical to that of reorgTable?&lt;/P&gt;</description>
      <pubDate>Thu, 10 Dec 2015 21:18:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98430#M61543</guid>
      <dc:creator>SQLShaw</dc:creator>
      <dc:date>2015-12-10T21:18:55Z</dc:date>
    </item>
    <item>
      <title>Re: Bug in Hive (HDP 2.1), Timestamp data overwritten during INSERT processing</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98431#M61544</link>
      <description>&lt;P&gt;Can you try the same with hive.vectorized.execution.enabled = false?&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 04:22:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98431#M61544</guid>
      <dc:creator>deepesh1</dc:creator>
      <dc:date>2015-12-11T04:22:53Z</dc:date>
    </item>
    <item>
      <title>Re: Bug in Hive (HDP 2.1), Timestamp data overwritten during INSERT processing</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98432#M61545</link>
      <description>&lt;P&gt;Hi Scott, it's HDP 2.1.11 (Hive 0.13.1), and the data type is "timestamp". The DDLs are identical. I am trying to avoid storing the data as a different type, but can do so until an upgrade if necessary.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 07:31:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98432#M61545</guid>
      <dc:creator>emilysharpe</dc:creator>
      <dc:date>2015-12-11T07:31:02Z</dc:date>
    </item>
    <item>
      <title>Re: Bug in Hive (HDP 2.1), Timestamp data overwritten during INSERT processing</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98433#M61546</link>
      <description>&lt;P&gt;Hi Deepesh, gave this a try - worked perfectly! Thank you!&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 08:28:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98433#M61546</guid>
      <dc:creator>emilysharpe</dc:creator>
      <dc:date>2015-12-11T08:28:46Z</dc:date>
    </item>
    <item>
      <title>Re: Bug in Hive (HDP 2.1), Timestamp data overwritten during INSERT processing</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98434#M61547</link>
      <description>&lt;P&gt;Just elaborating on my comment above, which &lt;A rel="user" href="https://community.cloudera.com/users/1104/emilysharpe.html" nodeid="1104"&gt;@Emily Sharpe&lt;/A&gt; has already verified as the workaround.&lt;/P&gt;&lt;P&gt;The issue is in the vectorization code path; see Apache Hive JIRA &lt;A href="https://issues.apache.org/jira/browse/HIVE-8197"&gt;HIVE-8197&lt;/A&gt;. It should be fixed in both HDP 2.2.x and HDP 2.3.x.&lt;/P&gt;&lt;P&gt;The workaround is to disable vectorization by setting hive.vectorized.execution.enabled = false.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 08:57:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98434#M61547</guid>
      <dc:creator>deepesh1</dc:creator>
      <dc:date>2015-12-11T08:57:04Z</dc:date>
    </item>
    <item>
      <title>Re: Bug in Hive (HDP 2.1), Timestamp data overwritten during INSERT processing</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98435#M61548</link>
      <description>&lt;P&gt;Thanks to &lt;A rel="user" href="https://community.cloudera.com/users/222/deepesh.html" nodeid="222"&gt;@Deepesh&lt;/A&gt; for the workaround. Also wanted to add (for info) that these steps will not be required after HDP upgrade. We will use &lt;/P&gt;&lt;PRE&gt; ALTER TABLE activeTable CONCATENATE;
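 -- Sketch only: if activeTable were partitioned by a hypothetical column
 -- load_date (an assumption, not part of our schema), the documented
 -- PARTITION clause would merge the files of one partition at a time.
 ALTER TABLE activeTable PARTITION (load_date='2015-12-11') CONCATENATE;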
&lt;/PRE&gt;&lt;P&gt;to combine the many smaller ORC files into fewer larger ones (possible from Hive 0.14+).&lt;/P&gt;&lt;P&gt;&lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionConcatenate"&gt;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionConcatenate&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 13:37:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bug-in-Hive-HDP-2-1-Timestamp-data-overwritten-during-INSERT/m-p/98435#M61548</guid>
      <dc:creator>emilysharpe</dc:creator>
      <dc:date>2015-12-11T13:37:26Z</dc:date>
    </item>
  </channel>
</rss>

