<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive with spark table schema changes sensitivity in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hive-with-spark-table-schema-changes-sensitivity/m-p/362862#M238852</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/103235"&gt;@hades_63146&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you are creating a Managed table in Hive via Spark, you need to use HiveWarehouseConnector.&lt;BR /&gt;&lt;A href="https://docs.cloudera.com/cdp-private-cloud-base/7.1.3/integrating-hive-and-bi/topics/hive_hivewarehouseconnector_for_handling_apache_spark_data.html" target="_blank" rel="noopener"&gt;https://docs.cloudera.com/cdp-private-cloud-base/7.1.3/integrating-hive-and-bi/topics/hive_hivewarehouseconnector_for_handling_apache_spark_data.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you are already using HWC, and it's failing, please share the code here and we can try to check what is missing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Good Luck.&lt;/P&gt;</description>
    <pubDate>Wed, 01 Feb 2023 20:59:11 GMT</pubDate>
    <dc:creator>Shmoo</dc:creator>
    <dc:date>2023-02-01T20:59:11Z</dc:date>
    <item>
      <title>Hive with spark table schema changes sensitivity</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-with-spark-table-schema-changes-sensitivity/m-p/362843#M238844</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I work with Cloudera 7.4.4, and our solution runs Hive over Spark.&lt;/P&gt;&lt;P&gt;When I load text files into Hive, the schema can change in three ways:&lt;/P&gt;&lt;P&gt;1. The source data has an added column. This causes data loss on insert until the column is added in Hive.&lt;/P&gt;&lt;P&gt;2. The source data has omitted a column. The insert fails because that column was not dropped in Hive.&lt;/P&gt;&lt;P&gt;3. The data type has widened to a different type. If Hive is not updated to the new type (for example, int to bigint), the result is null.&lt;/P&gt;&lt;P&gt;In addition, Spark's inferSchema may change numeric fields to alphanumeric and vice versa.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there a reliable way to make a non-external Hive table comply with these changes?&lt;/P&gt;&lt;P&gt;I did manage to write a program that fills omitted columns in the DataFrame, automatically adds new columns, and widens the data types, but is there a built-in method?&lt;/P&gt;&lt;P&gt;For changes from alphanumeric to numeric and vice versa, I don't have a solution.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Or would you suggest defining the Hive table as an external table over HBase/Mongo/Cassandra (or anything else that is better?), and would a "refresh" of the structure then be an instant update of the structure, or would it lock my table until the data is rebalanced?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The attachment shows that I have an initial schema and need to update it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Feb 2023 16:48:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-with-spark-table-schema-changes-sensitivity/m-p/362843#M238844</guid>
      <dc:creator>hades_63146</dc:creator>
      <dc:date>2023-02-01T16:48:39Z</dc:date>
    </item>
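The three kinds of change described in the question (added columns, omitted columns, widened types) can be detected by comparing the table schema with the incoming file schema. The following is only a minimal sketch in plain Python, not a built-in Hive or Spark feature. The names `WIDENING_ORDER`, `wider_type`, and `merge_schemas` are hypothetical, and the widening order itself is an assumption: falling back to string when a numeric column turns alphanumeric is one possible policy, and bigint to double can lose precision for very large values.

```python
# Assumed widening order (hypothetical policy): a type later in the list is
# used as the fallback when two schemas disagree on a column's type.
WIDENING_ORDER = ["int", "bigint", "double", "string"]


def wider_type(old_type, new_type):
    """Return whichever of the two types comes later in WIDENING_ORDER."""
    return max(old_type, new_type, key=WIDENING_ORDER.index)


def merge_schemas(table_schema, file_schema):
    """Merge two ordered lists of (column, type) pairs.

    Returns (merged_schema, added_columns, widened_columns).
    Columns missing from the file are kept, so they can be filled with NULL.
    """
    file_types = dict(file_schema)
    table_names = {name for name, _ in table_schema}
    merged, widened = [], []
    for name, old_type in table_schema:
        new_type = wider_type(old_type, file_types.get(name, old_type))
        if new_type != old_type:
            widened.append((name, old_type, new_type))
        merged.append((name, new_type))
    # Columns that exist only in the file are appended at the end.
    added = [(n, t) for n, t in file_schema if n not in table_names]
    return merged + added, added, widened
```

Under these assumptions, each entry in the added list would correspond to an `ALTER TABLE ... ADD COLUMNS` statement and each entry in the widened list to an `ALTER TABLE ... CHANGE COLUMN` statement, issued before the insert. Whether Hive accepts a given type change depends on its configuration, so that part should be verified on the target cluster.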
    <item>
      <title>Re: Hive with spark table schema changes sensitivity</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-with-spark-table-schema-changes-sensitivity/m-p/362862#M238852</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/103235"&gt;@hades_63146&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you are creating a Managed table in Hive via Spark, you need to use HiveWarehouseConnector.&lt;BR /&gt;&lt;A href="https://docs.cloudera.com/cdp-private-cloud-base/7.1.3/integrating-hive-and-bi/topics/hive_hivewarehouseconnector_for_handling_apache_spark_data.html" target="_blank" rel="noopener"&gt;https://docs.cloudera.com/cdp-private-cloud-base/7.1.3/integrating-hive-and-bi/topics/hive_hivewarehouseconnector_for_handling_apache_spark_data.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you are already using HWC, and it's failing, please share the code here and we can try to check what is missing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Good Luck.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Feb 2023 20:59:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-with-spark-table-schema-changes-sensitivity/m-p/362862#M238852</guid>
      <dc:creator>Shmoo</dc:creator>
      <dc:date>2023-02-01T20:59:11Z</dc:date>
    </item>
    <item>
      <title>Re: Hive with spark table schema changes sensitivity</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-with-spark-table-schema-changes-sensitivity/m-p/363117#M238915</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;My question is not about the connector. My question is how I can work dynamically with a Spark DataFrame that has to handle multiple different schemas.&lt;/P&gt;&lt;P&gt;Please look at the attachment.&lt;/P&gt;&lt;P&gt;Let me add some insight: in Spark 3, unionByName has the allowMissingColumns parameter. What is the equivalent in Spark 2?&lt;/P&gt;</description>
      <pubDate>Mon, 06 Feb 2023 06:00:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-with-spark-table-schema-changes-sensitivity/m-p/363117#M238915</guid>
      <dc:creator>hades_63146</dc:creator>
      <dc:date>2023-02-06T06:00:40Z</dc:date>
    </item>
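For the Spark 2 question above, one possible workaround is to reshape every DataFrame to the same target schema and then use a plain positional `union`. This is a hedged sketch, not a built-in equivalent of `allowMissingColumns`. The helper name `aligned_select_exprs` and the example DataFrames `df_old` and `df_new` are hypothetical; `selectExpr` and `union` are existing DataFrame methods. Column matching is done case-insensitively here, which assumes Spark's default case-insensitive name resolution.

```python
def aligned_select_exprs(df_columns, target_schema):
    """Build selectExpr strings that reshape a DataFrame to target_schema.

    df_columns: column names present in the DataFrame (df.columns).
    target_schema: ordered list of (column, spark_sql_type) pairs.
    """
    present = {c.lower() for c in df_columns}
    exprs = []
    for name, sql_type in target_schema:
        if name.lower() in present:
            # Existing column: cast it so both sides of the union agree on type.
            exprs.append("CAST(`{0}` AS {1}) AS `{0}`".format(name, sql_type))
        else:
            # Missing column: fill it with a typed NULL.
            exprs.append("CAST(NULL AS {1}) AS `{0}`".format(name, sql_type))
    return exprs


# Usage with PySpark (not executed here; df_old and df_new are hypothetical):
#   target = [("id", "bigint"), ("name", "string"), ("city", "string")]
#   combined = df_old.selectExpr(*aligned_select_exprs(df_old.columns, target)) \
#       .union(df_new.selectExpr(*aligned_select_exprs(df_new.columns, target)))
```

Because both sides are selected in the same column order and with the same types, the positional `union` lines the columns up correctly. Columns that exist in a DataFrame but not in the target schema are dropped by this sketch, so the target schema should list every column that needs to be kept.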
    <item>
      <title>Re: Hive with spark table schema changes sensitivity</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-with-spark-table-schema-changes-sensitivity/m-p/373906#M241803</link>
      <description>&lt;P&gt;&lt;SPAN&gt;If my understanding is correct, the schema differs between input files, which implies that the data itself lacks a fixed schema.&lt;BR /&gt;&lt;BR /&gt;Given the frequent schema changes, it is advisable to store the data in a column-oriented system such as HBase.&lt;BR /&gt;&lt;BR /&gt;The same HBase data can be accessed through Spark using the HBase-Spark Connector.&lt;BR /&gt;&lt;BR /&gt;Ref -&amp;nbsp;&lt;A href="https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/accessing-hbase/topics/hbase-example-using-hbase-spark-connector.html" target="_blank"&gt;https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/accessing-hbase/topics/hbase-example-using-hbase-spark-connector.html&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Jul 2023 09:39:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-with-spark-table-schema-changes-sensitivity/m-p/373906#M241803</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2023-07-14T09:39:58Z</dc:date>
    </item>
  </channel>
</rss>

