<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark cannot read hive orc table in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-cannot-read-hive-orc-table/m-p/351196#M236188</link>
    <description>What is your HDP version? If it is HDP 3.x, you need to use the Hive Warehouse Connector (HWC).</description>
    <pubDate>Wed, 31 Aug 2022 06:18:10 GMT</pubDate>
    <dc:creator>RangaReddy</dc:creator>
    <dc:date>2022-08-31T06:18:10Z</dc:date>
    <item>
      <title>Spark cannot read hive orc table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-cannot-read-hive-orc-table/m-p/348155#M235328</link>
      <description>&lt;P&gt;Hello all,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I cannot read data from a Hive ORC table into a Spark DataFrame. If anyone knows the cause, could you help me fix it? My script is below:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;from pyspark.sql import SparkSession&lt;/P&gt;
&lt;P&gt;# enableHiveSupport() connects the session to the Hive metastore, so separate HiveContext/SQLContext objects are not needed&lt;BR /&gt;spark = SparkSession.builder.appName("Testing....").enableHiveSupport().getOrCreate()&lt;/P&gt;
&lt;P&gt;# Query the Hive table through the session itself&lt;BR /&gt;df_pgw = spark.sql("select * from orc_table")&lt;/P&gt;
&lt;P&gt;Hive Session ID = 79c9e6c0-1649-41dc-9aea-493c0f62d046&lt;BR /&gt;22/07/20 11:50:52 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.&lt;BR /&gt;22/07/20 11:50:56 WARN HiveMetastoreCatalog: Unable to infer schema for table orc_table from file format ORC (inference mode: INFER_AND_SAVE). Using metastore schema.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;df_pgw.show()&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;=&amp;gt; The result is empty; no data is shown.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jul 2022 08:48:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-cannot-read-hive-orc-table/m-p/348155#M235328</guid>
      <dc:creator>mala_etl</dc:creator>
      <dc:date>2022-07-20T08:48:56Z</dc:date>
    </item>
    <item>
      <title>Re: Spark cannot read hive orc table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-cannot-read-hive-orc-table/m-p/351140#M236166</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/93771"&gt;@mala_etl&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You didn't mention whether you are running the application on CDH, HDP, or CDP. Could you please share your Hive script and check that you are using the Hive catalog rather than the in-memory catalog?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 30 Aug 2022 11:33:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-cannot-read-hive-orc-table/m-p/351140#M236166</guid>
      <dc:creator>RangaReddy</dc:creator>
      <dc:date>2022-08-30T11:33:56Z</dc:date>
    </item>
    <item>
      <title>Re: Spark cannot read hive orc table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-cannot-read-hive-orc-table/m-p/351192#M236184</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/78612"&gt;@RangaReddy&lt;/a&gt;&amp;nbsp;, I am running on Hortonworks, and the Hive table is in ORC format.&lt;/P&gt;&lt;P&gt;What do you mean by the Hive catalog versus the in-memory catalog?&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2022 01:45:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-cannot-read-hive-orc-table/m-p/351192#M236184</guid>
      <dc:creator>mala_etl</dc:creator>
      <dc:date>2022-08-31T01:45:00Z</dc:date>
    </item>
    <item>
      <title>Re: Spark cannot read hive orc table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-cannot-read-hive-orc-table/m-p/351193#M236185</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/93771"&gt;@mala_etl&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can find information about the catalogs at the link below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/59894454/spark-and-hive-in-hadoop-3-difference-between-metastore-catalog-default-and-spa" target="_blank"&gt;https://stackoverflow.com/questions/59894454/spark-and-hive-in-hadoop-3-difference-between-metastore-catalog-default-and-spa&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please confirm whether the table is an internal or external table in Hive, and also verify that the data is present in Hive?&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2022 01:49:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-cannot-read-hive-orc-table/m-p/351193#M236185</guid>
      <dc:creator>RangaReddy</dc:creator>
      <dc:date>2022-08-31T01:49:30Z</dc:date>
    </item>
    <item>
      <title>Re: Spark cannot read hive orc table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-cannot-read-hive-orc-table/m-p/351195#M236187</link>
      <description>&lt;P&gt;It is an internal table. The data in Hive is fine: I can select/update/delete through OPENQUERY in SQL Server, and I can query it from DBeaver.&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2022 05:05:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-cannot-read-hive-orc-table/m-p/351195#M236187</guid>
      <dc:creator>mala_etl</dc:creator>
      <dc:date>2022-08-31T05:05:04Z</dc:date>
    </item>
    <item>
      <title>Re: Spark cannot read hive orc table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-cannot-read-hive-orc-table/m-p/351196#M236188</link>
      <description>What is your HDP version? If it is HDP 3.x, you need to use the Hive Warehouse Connector (HWC).</description>
      <pubDate>Wed, 31 Aug 2022 06:18:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-cannot-read-hive-orc-table/m-p/351196#M236188</guid>
      <dc:creator>RangaReddy</dc:creator>
      <dc:date>2022-08-31T06:18:10Z</dc:date>
    </item>
  </channel>
</rss>

