<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Error in reading database through hive using pyspark in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Error-in-reading-database-through-hive-using-pyspark/m-p/292009#M215834</link>
    <description>&lt;P&gt;I am using Spark 2.3.2 and am trying to read tables from a database. I have established the Spark connection.&lt;/P&gt;
&lt;P&gt;But I am unable to read the database tables from Hue (Cloudera), and I cannot query them in PySpark either.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is my code:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import findspark
findspark.init(r'C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7')

import pandas as pd
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("hive.metastore.uris", "thrift://10.1.1.70:8888") \
    .enableHiveSupport() \
    .getOrCreate()
# spark.catalog.listTables("tp_policy_operation")
sc = spark.sparkContext

from pyspark import SparkContext
from pyspark.sql import SQLContext
sql_sc = SQLContext(sc)
SparkContext.setSystemProperty("hive.metastore.uris", "thrift://10.1.1.70:8888")
spark.sql("SELECT * FROM tp_policy_operation")&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The error I am getting:&lt;/P&gt;
&lt;LI-CODE&gt;Traceback (most recent call last):
  File "&amp;lt;ipython-input-4-8f0aa5852b01&amp;gt;", line 16, in &amp;lt;module&amp;gt;
    spark.sql("SELECT * FROM tp_policy_operation")  ## Database ?
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\session.py", line 710, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException;'&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kindly help me resolve the issue or suggest changes to the code above.&lt;/P&gt;</description>
    <pubDate>Tue, 21 Apr 2026 11:31:58 GMT</pubDate>
    <dc:creator>Logica</dc:creator>
    <dc:date>2026-04-21T11:31:58Z</dc:date>
    <item>
      <title>Error in reading database through hive using pyspark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Error-in-reading-database-through-hive-using-pyspark/m-p/292009#M215834</link>
      <description>&lt;P&gt;I am using Spark 2.3.2 and am trying to read tables from a database. I have established the Spark connection.&lt;/P&gt;
&lt;P&gt;But I am unable to read the database tables from Hue (Cloudera), and I cannot query them in PySpark either.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is my code:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import findspark
findspark.init(r'C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7')

import pandas as pd
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("hive.metastore.uris", "thrift://10.1.1.70:8888") \
    .enableHiveSupport() \
    .getOrCreate()
# spark.catalog.listTables("tp_policy_operation")
sc = spark.sparkContext

from pyspark import SparkContext
from pyspark.sql import SQLContext
sql_sc = SQLContext(sc)
SparkContext.setSystemProperty("hive.metastore.uris", "thrift://10.1.1.70:8888")
spark.sql("SELECT * FROM tp_policy_operation")&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The error I am getting:&lt;/P&gt;
&lt;LI-CODE&gt;Traceback (most recent call last):
  File "&amp;lt;ipython-input-4-8f0aa5852b01&amp;gt;", line 16, in &amp;lt;module&amp;gt;
    spark.sql("SELECT * FROM tp_policy_operation")  ## Database ?
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\session.py", line 710, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException;'&lt;/LI-CODE&gt;
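&lt;P&gt;A TTransportException from the metastore client usually means the thrift URI is pointing at something that is not the Hive metastore: port 8888 is typically Hue's web UI port, while the metastore's default thrift port is 9083. A rough sketch of the connection (the host and table name are the ones from this thread, so treat the whole URI as a placeholder for your environment):&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from pyspark.sql import SparkSession

# Placeholder URI: same host as above, but the metastore's default thrift port.
metastore_uri = "thrift://10.1.1.70:9083"

spark = (SparkSession.builder
         .config("hive.metastore.uris", metastore_uri)
         .enableHiveSupport()
         .getOrCreate())

spark.sql("USE default")  # select the database before querying
spark.sql("SELECT * FROM tp_policy_operation LIMIT 10").show()&lt;/LI-CODE&gt;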
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kindly help me resolve the issue or suggest changes to the code above.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 11:31:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Error-in-reading-database-through-hive-using-pyspark/m-p/292009#M215834</guid>
      <dc:creator>Logica</dc:creator>
      <dc:date>2026-04-21T11:31:58Z</dc:date>
    </item>
    <item>
      <title>Re: Error in reading database through hive using pyspark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Error-in-reading-database-through-hive-using-pyspark/m-p/292015#M215840</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/75663"&gt;@Logica&lt;/a&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please check whether a database is selected before running the query.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is sample code for reading a Hive table:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.conf import SparkConf
from pyspark.context import SparkContext
from pyspark.sql import HiveContext
sc = SparkContext('local', 'example')
hc = HiveContext(sc)

# optional: read a sample file from HDFS
tf1 = sc.textFile("/user/BigData/nooo/SparkTest/train.csv")

# read a Hive table from pyspark
hc.sql("use default")  # select the database first
spf = hc.sql("SELECT * FROM tempaz LIMIT 100")
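# Note: HiveContext is deprecated in Spark 2.x; the same query can be run
# through a SparkSession built with Hive support, e.g.:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.enableHiveSupport().getOrCreate()
#   spf = spark.sql("SELECT * FROM tempaz LIMIT 100")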
spf.show(5)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;HadoopHelp&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Mar 2020 08:42:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Error-in-reading-database-through-hive-using-pyspark/m-p/292015#M215840</guid>
      <dc:creator>HadoopHelp</dc:creator>
      <dc:date>2020-03-18T08:42:12Z</dc:date>
    </item>
    <item>
      <title>Re: Error in reading database through hive using pyspark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Error-in-reading-database-through-hive-using-pyspark/m-p/292016#M215841</link>
      <description>&lt;P&gt;I changed the port number from 8888 to 9083 and the connection now works, but when I try to show the query result it fails with:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE&gt;df.show()
Traceback (most recent call last):
  File "&amp;lt;ipython-input-12-1a6ce2362cd4&amp;gt;", line 1, in &amp;lt;module&amp;gt;
    df.show()
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\dataframe.py", line 350, in show
    print(self._jdf.showString(n, 20, vertical))
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
IllegalArgumentException: 'java.net.UnknownHostException: quickstart.cloudera'&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you help me with this, &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/32608"&gt;@HadoopHelp&lt;/a&gt;?&lt;/P&gt;</description>
      <pubDate>Wed, 18 Mar 2020 08:48:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Error-in-reading-database-through-hive-using-pyspark/m-p/292016#M215841</guid>
      <dc:creator>Logica</dc:creator>
      <dc:date>2020-03-18T08:48:45Z</dc:date>
    </item>
    <item>
      <title>Re: Error in reading database through hive using pyspark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Error-in-reading-database-through-hive-using-pyspark/m-p/292026#M215848</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/75663"&gt;@Logica&lt;/a&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I think you need to put the hive-site.xml file into Spark's configuration directory.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please follow the steps below for running a Hive query or accessing a Hive table through PySpark:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A title="access-hive-tables-to-spark" href="https://acadgild.com/blog/how-to-access-hive-tables-to-spark-sql" target="_self"&gt;https://acadgild.com/blog/how-to-access-hive-tables-to-spark-sql&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;HadoopHelp&lt;/P&gt;</description>
      <pubDate>Wed, 18 Mar 2020 09:39:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Error-in-reading-database-through-hive-using-pyspark/m-p/292026#M215848</guid>
      <dc:creator>HadoopHelp</dc:creator>
      <dc:date>2020-03-18T09:39:38Z</dc:date>
    </item>
  </channel>
</rss>

