- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Issue when using PySpark with Impala via JDBC
- Labels:
-
Apache Impala
-
Apache Spark
Created 03-18-2025 12:46 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Due to data masking, I can't read tables directly using 'vanilla' Spark. The workaround is connecting Spark to Impala via JDBC and the problem is: when I use reserved words or some operations like `+ INTERVAL 1 DAY` Impala returns the column names as values in the DataFrame.
That's how I start the Spark session:
spark = (
SparkSession
.builder
.config("spark.jars", "/home/cdsw/ImpalaJDBC42.jar")
.getOrCreate()
)
and how I query data:
(
spark
.read
.format("jdbc")
.option("driver", "com.cloudera.impala.jdbc.Driver")
.option("url", "jdbc:impala://MY_IMPALA_HOST:443/default;AuthMech=3;transportMode=http;httpPath=cliservice;ssl=1")
.option("PWD", "MY_PASSWORD")
.option("UID", "MY_USERNAME")
.option("query", "SELECT 'a' AS index FROM MY_TABLE")
.load()
.show()
)
That's what I get:
+-----+
|index|
+-----+
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
+-----+
Other errors are derived from this one. For example, when running the query:
SELECT current_date() + interval 1 day FROM MY_TABLE
raises the exception:
java.sql.SQLDataException: [Cloudera][JDBC](10140) Error converting value to Date.
This happens because Spark is expecting a date to be parsed but Impala returns the column name as a value. We can see the returned value by casting to string:
SELECT CAST(current_date() + interval 1 day AS STRING) FROM MY_TABLE
+-----------------------------------------------+
|cast(current_date() + interval 1 day as string)|
+-----------------------------------------------+
| cast(current_date...|
| cast(current_date...|
| cast(current_date...|
| cast(current_date...|
| cast(current_date...|
| cast(current_date...|
| cast(current_date...|
| cast(current_date...|
| cast(current_date...|
| cast(current_date...|
| cast(current_date...|
| cast(current_date...|
| cast(current_date...|
| cast(current_date...|
| cast(current_date...|
| cast(current_date...|
+-----------------------------------------------+
Can someone help me? I searched for a while and found some people facing this issue some years ago. Is there a solution already?
Created 03-19-2025 12:39 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello-
Pasting here the reply from 6 yrs ago, which I still find relevant:
Running Impala query over driver from Spark is not currently supported by Cloudera. Why don't you just use SparkSQL instead? Why need to have extra layer of impala here?
Created 03-19-2025 10:13 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@leoeiji Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our Impala experts @jAnshula @Saurabhatiyal who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres,Community Moderator
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:
Created 03-19-2025 12:39 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello-
Pasting here the reply from 6 yrs ago, which I still find relevant:
Running Impala query over driver from Spark is not currently supported by Cloudera. Why don't you just use SparkSQL instead? Why need to have extra layer of impala here?
Created 03-20-2025 05:20 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Boris G, I literaly started my thread explaining why I need Impala. Problem solved by the way.
