Support Questions


Some Impala built-in functions don't work in the Spark data source reader

New Contributor

I want to read a subquery against an Impala table as a Dataset.

Code like this:

String subQuery = "(select to_timestamp(unix_timestamp(now())) as ts from my_table) t";
Dataset<Row> ds = spark.read().jdbc(myImpalaUrl, subQuery, prop);

But it fails with this error:

Caused by: java.sql.SQLDataException: [Cloudera][JDBC](10140) Error converting value to Timestamp.

The to_timestamp() function fails, but unix_timestamp() and now() work fine.

P.S. I found another problem: using a Hive UDF through the jdbc API also fails.

Can anyone help me?
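One workaround that may be worth trying (an assumption on my part, not a confirmed fix): cast the expression to STRING inside the Impala subquery so the Cloudera JDBC driver returns plain text instead of attempting its own timestamp conversion, then convert back on the Spark side. The helper name `asString` and the table `my_table` are illustrative:

```java
// Sketch of a possible workaround (assumption, not a confirmed fix): cast the
// value to STRING in the subquery so the JDBC driver never has to convert it
// to a Timestamp; Spark can re-cast the string column afterwards.
public class CastWorkaround {
    // Wrap an expression so the driver sees a plain STRING column.
    static String asString(String expr, String alias) {
        return "cast(" + expr + " as string) as " + alias;
    }

    public static void main(String[] args) {
        String subQuery = "(select "
                + asString("to_timestamp(unix_timestamp(now()))", "ts")
                + " from my_table) t";
        System.out.println(subQuery);
        // Then, back in Spark:
        //   Dataset<Row> ds = spark.read().jdbc(myImpalaUrl, subQuery, prop)
        //       .withColumn("ts", functions.to_timestamp(functions.col("ts")));
    }
}
```

This sidesteps the driver's Timestamp conversion entirely, at the cost of one extra cast on each side.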

4 REPLIES

New Contributor

I'm getting the same conversion issue for the schema below.

root
 |-- instance: string (nullable = true)
 |-- count(*): long (nullable = true)

Error Message:

java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to long.
	at com.cloudera.exceptions.ExceptionConverter.toSQLException(Unknown Source)
	at com.cloudera.utilities.conversion.TypeConverter.toLong(Unknown Source)
	at com.cloudera.jdbc.common.SForwardResultSet.getLong(Unknown Source)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:409)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:408)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:330)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:312)

 

Can anyone please help resolve this issue in Spark with Java?
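Two things may be interacting here (this is a guess, not a confirmed diagnosis): the special column name `count(*)` and the driver's long conversion. A possible mitigation is to alias the aggregate to a plain name and cast it to STRING in the subquery, then re-cast in Spark. The alias `cnt` and table name `my_table` are illustrative:

```java
// Hypothetical sketch: alias count(*) to a plain column name and cast it to
// STRING, so the JDBC driver returns a value it can hand over untouched and
// Spark sees a normal column name instead of "count(*)".
public class CountAliasWorkaround {
    static String buildSubQuery(String table) {
        return "(select instance, cast(count(*) as string) as cnt from "
                + table + " group by instance) t";
    }

    public static void main(String[] args) {
        System.out.println(buildSubQuery("my_table"));
        // Then, back in Spark:
        //   Dataset<Row> ds = spark.read().jdbc(url, buildSubQuery("my_table"), prop)
        //       .withColumn("cnt", functions.col("cnt").cast("long"));
    }
}
```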

 

New Contributor

Hey Cloudera, @Nathan @Shushruth, has this issue been addressed? I'm getting the same error with Python 3 and PySpark 2.3 while fetching data from Impala using Spark JDBC.

Expert Contributor

Hello,

Are you trying to connect to Impala from Spark via JDBC?

If yes, this feature isn't supported yet. Please refer to the document below.

https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_621_unsupported_fea...

New Contributor

@ShankerSharma Is Cloudera planning to include this functionality anytime soon? If not, what other ways do we have to read tables from Impala using PySpark 2.3?
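One alternative that avoids JDBC altogether (a sketch under assumptions: the table is managed in the default CDH Hive warehouse location and stored as Parquet, and the names `mydb`/`my_table` are placeholders): since Impala and Spark can share the same metastore and HDFS data, Spark can read the table's files directly, or query it through the metastore with `spark.sql("select * from mydb.my_table")`.

```java
// Hypothetical sketch: derive the HDFS location of an Impala/Hive-managed
// table (assumes the default warehouse directory and Parquet storage) and
// read it directly with Spark, bypassing the JDBC driver entirely.
public class DirectReadSketch {
    static String tablePath(String warehouseDir, String db, String table) {
        return warehouseDir + "/" + db + ".db/" + table;
    }

    public static void main(String[] args) {
        String path = tablePath("/user/hive/warehouse", "mydb", "my_table");
        System.out.println(path);
        // Then: Dataset<Row> ds = spark.read().parquet(path);
        // or, with a shared metastore: spark.sql("select * from mydb.my_table");
    }
}
```

Note that reading files directly skips Impala's SQL layer, so Impala-specific functions like to_timestamp() would have to be replaced with their Spark SQL equivalents.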