<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Issue with String to Timestamp Conversion in Hive and Spark Resulting in Unexpected Hour Changes in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Issue-with-String-to-Timestamp-Conversion-in-Hive-and-Spark/m-p/391590#M247682</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/82698"&gt;@smruti&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've validated it with the user, and here's what happens.&lt;/P&gt;&lt;P&gt;The date field comes in string format and it only comes with 8 fields "Yyyy-mm-dd" and it was converting to time stamp which in this case aggregates the time fields, and that's where the divergence was.&lt;/P&gt;&lt;P&gt;Is there anything in the tool that can be done to standardize this or do I need to change the data engineering?&lt;/P&gt;</description>
    <pubDate>Fri, 09 Aug 2024 18:03:48 GMT</pubDate>
    <dc:creator>evanle96</dc:creator>
    <dc:date>2024-08-09T18:03:48Z</dc:date>
    <item>
      <title>Issue with String to Timestamp Conversion in Hive and Spark Resulting in Unexpected Hour Changes</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Issue-with-String-to-Timestamp-Conversion-in-Hive-and-Spark/m-p/391312#M247559</link>
      <description>&lt;P&gt;&lt;SPAN&gt;We are experiencing an issue with converting data from string to timestamp in Hive, where the timestamps undergo unexpected hour changes during conversion and processing. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;We would like to verify if this could be a bug in the tool or if there are additional configurations required. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Process and Steps to Reproduce the Issue: Loading Data into HDFS:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;1. We use the hdfs put command to load a positional file into HDFS and store it in a directory accessible via an external table ex_sa_beneficios_mantidos where the data is in string format. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;2. We create an external table in Hive to access the data as strings. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;3. Converting String to Timestamp and Writing to ORC Table: When converting the string to timestamp, we notice an alteration in the hour of the dt_afast_trabalho field. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;4. Verifying the Conversion and Data Reading: When reading the data from the ORC table, the hours in the dt_afast_trabalho field were different from expected. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;5. Counting Timestamps by Hour: We performed a count by hour of the timestamps to verify the changes. Spark Configuration in Zeppelin: When initializing Hive in Zeppelin, we used the following parameters: &lt;/SPAN&gt;&lt;/P&gt;&lt;LI-SPOILER&gt;%livy.spark from pyspark_llap import HiveWarehouseSession hive = HiveWarehouseSession.session(spark).build() spark.conf.set("spark.datasource.hive.warehouse.read.mode", value="SECURE_ACCESS") spark.conf.set("spark.datasource.hive.warehouse.load.staging.dir", value="hdfs://pcdpclusterdatalakennha:8020/dados/datalake/zona_temporaria/dadosprovisorios/hwc") spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNoemptyLocation", value="true") spark.conf.set("spark.sql.session.timeZone", "America/Sao_Paulo")&lt;/LI-SPOILER&gt;&lt;P&gt;&lt;SPAN&gt;Expected Behavior: We expect the hour values of the dt_afast_trabalho field to remain consistent throughout the conversion and processing stages, respecting the America/Sao_Paulo timezone. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Current Behavior: We observe that the hour values of the dt_afast_trabalho field are being altered during the process of converting from string to timestamp and subsequent processing in Hive and Spark.&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="thumbnail_image-20240801-204117.png" style="width: 665px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/41344iCD6B5296D2FE59A2/image-size/large?v=v2&amp;amp;px=999" role="button" title="thumbnail_image-20240801-204117.png" alt="thumbnail_image-20240801-204117.png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="thumbnail_image-20240801-204102.png" style="width: 646px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/41342iDDAD3591A7EDB463/image-size/large?v=v2&amp;amp;px=999" role="button" title="thumbnail_image-20240801-204102.png" alt="thumbnail_image-20240801-204102.png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="thumbnail_image-20240801-204045.png" style="width: 770px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/41343i3C0F07FA1B7EB138/image-size/large?v=v2&amp;amp;px=999" role="button" title="thumbnail_image-20240801-204045.png" alt="thumbnail_image-20240801-204045.png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="thumbnail_image-20240801-204011.png" style="width: 999px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/41345iF9B368EDC1877EC8/image-size/large?v=v2&amp;amp;px=999" role="button" title="thumbnail_image-20240801-204011.png" alt="thumbnail_image-20240801-204011.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Aug 2024 14:49:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Issue-with-String-to-Timestamp-Conversion-in-Hive-and-Spark/m-p/391312#M247559</guid>
      <dc:creator>evanle96</dc:creator>
      <dc:date>2024-08-02T14:49:05Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with String to Timestamp Conversion in Hive and Spark Resulting in Unexpected Hour Changes</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Issue-with-String-to-Timestamp-Conversion-in-Hive-and-Spark/m-p/391352#M247579</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/113842"&gt;@evanle96&lt;/a&gt;,&amp;nbsp;Welcome to our community! To help you get the best possible answer, I have tagged our Hive/Spark experts&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/78612"&gt;@RangaReddy&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/71090"&gt;@asish&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/82698"&gt;@smruti&lt;/a&gt;&amp;nbsp; who may assist you further.&lt;BR /&gt;&lt;BR /&gt;Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Aug 2024 09:59:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Issue-with-String-to-Timestamp-Conversion-in-Hive-and-Spark/m-p/391352#M247579</guid>
      <dc:creator>VidyaSargur</dc:creator>
      <dc:date>2024-08-05T09:59:28Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with String to Timestamp Conversion in Hive and Spark Resulting in Unexpected Hour Changes</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Issue-with-String-to-Timestamp-Conversion-in-Hive-and-Spark/m-p/391356#M247581</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/113842"&gt;@evanle96&lt;/a&gt;&amp;nbsp;Hive would treat the &lt;A href="https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/release-guide/topics/hive-cast.html" target="_blank" rel="noopener"&gt;timestamp data to be in UTC format&lt;/A&gt;, and when we read it, it converts it from UTC to local time. But I understand that the actual data is already in&amp;nbsp;&lt;SPAN&gt;America/Sao_Paulo tz. It'd be interesting to know how do we transform the epoch data in string format to timestamp in ORC?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;from_utc_timestamp(to_utc_timestamp(dt_afast_trabalho, 'America/Sao_Paulo'), 'America/Sao_Paulo') AS sao_paulo_time_corrected&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Can we try reading with the above transformation and see if the house field shows up as expected?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 05 Aug 2024 10:31:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Issue-with-String-to-Timestamp-Conversion-in-Hive-and-Spark/m-p/391356#M247581</guid>
      <dc:creator>smruti</dc:creator>
      <dc:date>2024-08-05T10:31:46Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with String to Timestamp Conversion in Hive and Spark Resulting in Unexpected Hour Changes</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Issue-with-String-to-Timestamp-Conversion-in-Hive-and-Spark/m-p/391553#M247670</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/82698"&gt;@smruti&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here's a printout of a query in the data source table in string format, I'm going to provide a query in ORC format&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="08ddc420-429f-48e7-a272-041cfc7bf91a.png" style="width: 999px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/41402iC057132EE7F234A0/image-size/large?v=v2&amp;amp;px=999" role="button" title="08ddc420-429f-48e7-a272-041cfc7bf91a.png" alt="08ddc420-429f-48e7-a272-041cfc7bf91a.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 08 Aug 2024 15:38:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Issue-with-String-to-Timestamp-Conversion-in-Hive-and-Spark/m-p/391553#M247670</guid>
      <dc:creator>evanle96</dc:creator>
      <dc:date>2024-08-08T15:38:51Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with String to Timestamp Conversion in Hive and Spark Resulting in Unexpected Hour Changes</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Issue-with-String-to-Timestamp-Conversion-in-Hive-and-Spark/m-p/391590#M247682</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/82698"&gt;@smruti&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've validated it with the user, and here's what happens.&lt;/P&gt;&lt;P&gt;The date field comes in string format and it only comes with 8 fields "Yyyy-mm-dd" and it was converting to time stamp which in this case aggregates the time fields, and that's where the divergence was.&lt;/P&gt;&lt;P&gt;Is there anything in the tool that can be done to standardize this or do I need to change the data engineering?&lt;/P&gt;</description>
      <pubDate>Fri, 09 Aug 2024 18:03:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Issue-with-String-to-Timestamp-Conversion-in-Hive-and-Spark/m-p/391590#M247682</guid>
      <dc:creator>evanle96</dc:creator>
      <dc:date>2024-08-09T18:03:48Z</dc:date>
    </item>
  </channel>
</rss>

