<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Issue when using PySpark with Impala via JDBC in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Issue-when-using-PySpark-with-Impala-via-JDBC/m-p/409084#M252807</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/124890"&gt;@leoeiji&lt;/a&gt;, could you please share how you resolved this issue? I am facing the same problem.&lt;/P&gt;</description>
    <pubDate>Sun, 01 Jun 2025 10:23:23 GMT</pubDate>
    <dc:creator>akb2025</dc:creator>
    <dc:date>2025-06-01T10:23:23Z</dc:date>
    <item>
      <title>Issue when using PySpark with Impala via JDBC</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Issue-when-using-PySpark-with-Impala-via-JDBC/m-p/404338#M252311</link>
      <description>&lt;P&gt;Because of data masking, I can't read the tables directly with 'vanilla' Spark. My workaround is to connect Spark to Impala via JDBC, but this causes a problem: &lt;STRONG&gt;when a query uses reserved words or operations like `+ INTERVAL 1 DAY`, Impala returns the column names as values in the DataFrame.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;This is how I start the Spark session:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import SparkSession

spark = (
    SparkSession
    .builder
    .config("spark.jars", "/home/cdsw/ImpalaJDBC42.jar")
    .getOrCreate()
)&lt;/LI-CODE&gt;&lt;P&gt;And this is how I query the data:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;(
    spark
    .read
    .format("jdbc")
    .option("driver", "com.cloudera.impala.jdbc.Driver")
    .option("url", "jdbc:impala://MY_IMPALA_HOST:443/default;AuthMech=3;transportMode=http;httpPath=cliservice;ssl=1")
    .option("PWD", "MY_PASSWORD")
    .option("UID", "MY_USERNAME")
    .option("query", "SELECT 'a' AS index FROM MY_TABLE")
    .load()
    .show()
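    # Possible cause (an assumption, not confirmed anywhere in this thread):
    # with the "query" option, Spark's generic JDBC dialect wraps the query in
    # a subquery and quotes each selected column in double quotes, roughly:
    #   SELECT "index" FROM (SELECT 'a' AS index FROM MY_TABLE) SPARK_GEN_SUBQ_0
    # Impala also accepts double-quoted text as a string literal. That would
    # make every row hold the literal 'index' instead of the column value.
    # A commonly suggested workaround is a custom Spark JdbcDialect that quotes
    # identifiers with backticks. It would be written in Scala or Java and
    # registered with JdbcDialects.registerDialect. It is not shown here.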
)&lt;/LI-CODE&gt;&lt;P&gt;That's what I get:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;+-----+
|index|
+-----+
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
+-----+&lt;/LI-CODE&gt;&lt;P&gt;Other errors stem from this one. For example, running the query:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT current_date() + interval 1 day FROM MY_TABLE&lt;/LI-CODE&gt;&lt;P&gt;raises the exception:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;java.sql.SQLDataException: [Cloudera][JDBC](10140) Error converting value to Date.&lt;/LI-CODE&gt;&lt;P&gt;This happens because Spark expects a date value to parse, but Impala returns the column name instead. Casting the expression to a string shows the value that actually comes back:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT CAST(current_date() + interval 1 day AS STRING) FROM MY_TABLE&lt;/LI-CODE&gt;&lt;LI-CODE lang="markup"&gt;+-----------------------------------------------+
|cast(current_date() + interval 1 day as string)|
+-----------------------------------------------+
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
+-----------------------------------------------+&lt;/LI-CODE&gt;&lt;P&gt;Can someone help me? I searched for a while and found &lt;A href="https://community.cloudera.com/t5/Support-Questions/Spark-sql-with-impala-on-kerberos-returning-only-column/td-p/69544" target="_self"&gt;others who hit the same issue some years ago&lt;/A&gt;. Is there a solution yet?&lt;/P&gt;</description>
      <pubDate>Tue, 18 Mar 2025 19:46:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Issue-when-using-PySpark-with-Impala-via-JDBC/m-p/404338#M252311</guid>
      <dc:creator>leoeiji</dc:creator>
      <dc:date>2025-03-18T19:46:17Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when using PySpark with Impala via JDBC</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Issue-when-using-PySpark-with-Impala-via-JDBC/m-p/404409#M252323</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/124890"&gt;@leoeiji&lt;/a&gt;&amp;nbsp;Welcome to the Cloudera Community!&lt;BR /&gt;&lt;BR /&gt;To help you get the best possible solution, I have tagged our Impala experts&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/81584"&gt;@jAnshula&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/92797"&gt;@Saurabhatiyal&lt;/a&gt;&amp;nbsp;&amp;nbsp;who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please keep us updated on your post, and we hope you find a satisfactory solution to your query.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Mar 2025 17:13:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Issue-when-using-PySpark-with-Impala-via-JDBC/m-p/404409#M252323</guid>
      <dc:creator>DianaTorres</dc:creator>
      <dc:date>2025-03-19T17:13:49Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when using PySpark with Impala via JDBC</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Issue-when-using-PySpark-with-Impala-via-JDBC/m-p/404418#M252324</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Here is the reply from 6 years ago, which I still find relevant:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;Running Impala queries through the driver from Spark is not currently supported by Cloudera. Why don't you just use Spark SQL instead? Why add an extra Impala layer here?&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Mar 2025 19:39:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Issue-when-using-PySpark-with-Impala-via-JDBC/m-p/404418#M252324</guid>
      <dc:creator>Boris G</dc:creator>
      <dc:date>2025-03-19T19:39:31Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when using PySpark with Impala via JDBC</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Issue-when-using-PySpark-with-Impala-via-JDBC/m-p/404483#M252331</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/8448"&gt;@Boris G&lt;/a&gt;, I literally started my thread by explaining why I need Impala. Problem solved, by the way.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Mar 2025 12:20:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Issue-when-using-PySpark-with-Impala-via-JDBC/m-p/404483#M252331</guid>
      <dc:creator>leoeiji</dc:creator>
      <dc:date>2025-03-20T12:20:54Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when using PySpark with Impala via JDBC</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Issue-when-using-PySpark-with-Impala-via-JDBC/m-p/409084#M252807</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/124890"&gt;@leoeiji&lt;/a&gt;, could you please share how you resolved this issue? I am facing the same problem.&lt;/P&gt;</description>
      <pubDate>Sun, 01 Jun 2025 10:23:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Issue-when-using-PySpark-with-Impala-via-JDBC/m-p/409084#M252807</guid>
      <dc:creator>akb2025</dc:creator>
      <dc:date>2025-06-01T10:23:23Z</dc:date>
    </item>
  </channel>
</rss>

