
spark jdbc netezza bad value exception.


New Contributor

Hello,

 

We are getting the following exception while reading a table from Netezza using Spark:

py4j.protocol.Py4JJavaError: An error occurred while calling o287.count.

: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 10.0 failed 4 times, most recent failure: Lost task 0.3 in stage 10.0 (TID 13, executor 10): org.netezza.error.NzSQLException: netezza.bad.value

        at org.netezza.sql.NzResultSet.getDbosTimestamp(NzResultSet.java:4053)

        at org.netezza.sql.NzResultSet.getTimestamp(NzResultSet.java:1578)

        at org.netezza.sql.NzResultSet.getTimestamp(NzResultSet.java:1528)

Attaching the full stack trace.

We are using nzjdbc3.jar for the Netezza-Spark connection, and the connection string is below:

input_df = spark.read.format('jdbc').options(
    url='jdbc:netezza://server_name:port/dbname',
    user='', password='',
    driver='org.netezza.Driver',
    dbtable="(select * from schema_name.table_name limit 100) as t"
).load()

I am able to print the schema of the dataframe, but when I perform an action such as show() or count(), it fails on the timestamp column for certain tables; other tables work fine. I am also able to select all columns other than the timestamp columns.

We tried the workaround below:

1) Converting the timestamp to StringType(); it is still failing.
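For reference, one common way to express that StringType attempt is through the JDBC reader's customSchema option. This is a hypothetical sketch (the column name event_ts is assumed, not from the actual table), and as noted above this kind of override still hit the same error for us:

```python
# Hypothetical sketch: ask Spark to map the timestamp column (assumed name:
# event_ts) to STRING via the JDBC reader's customSchema option.
# Note: as described above, this kind of override still failed in our case.

def jdbc_options(url, table, ts_col):
    """Build the option map for the Spark JDBC reader."""
    return {
        'url': url,
        'driver': 'org.netezza.Driver',
        'dbtable': '(select * from {} limit 100) as t'.format(table),
        # Map the problem column to a plain string in the Spark schema.
        'customSchema': '{} STRING'.format(ts_col),
    }

opts = jdbc_options('jdbc:netezza://server_name:port/dbname',
                    'schema_name.table_name', 'event_ts')
# df = spark.read.format('jdbc').options(**opts).load()  # needs a live server
print(opts['customSchema'])  # event_ts STRING
```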

 

What will be the fix for this issue?

 

Thanks


1 REPLY

Re: spark jdbc netezza bad value exception.

Cloudera Employee

Hi,

 

This seems to be a type-conversion issue in the timestamp field. Did you try casting the timestamp to a string while populating the Spark dataframe, and then converting that string back to Spark's timestamp datatype? That is, after fetching the value from the query, convert the timestamp value to a string in the Spark dataframe, then reconvert that string to a Spark timestamp, instead of pushing values directly from Netezza to Spark. Going through a string avoids the datatype compatibility issue, so this should work.
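A minimal sketch of that suggestion, with hypothetical names (event_ts for the problem column, and an assumed timestamp format): cast to varchar inside the pushdown query so the JDBC driver returns a string rather than calling getTimestamp(), then rebuild the timestamp in Spark with to_timestamp:

```python
# Hypothetical sketch: cast the timestamp to varchar on the Netezza side so
# the driver never calls getTimestamp(), then rebuild the timestamp in Spark.

def pushdown_query(table, ts_col):
    """Wrap the table in a subquery that ships the timestamp as a string."""
    return ("(select cast({ts} as varchar(29)) as {ts}_str "
            "from {tbl} limit 100) as t").format(ts=ts_col, tbl=table)

q = pushdown_query('schema_name.table_name', 'event_ts')
print(q)

# With a live Netezza connection (format string below is an assumption):
# df = (spark.read.format('jdbc')
#       .options(url='jdbc:netezza://server_name:port/dbname',
#                driver='org.netezza.Driver', dbtable=q)
#       .load())
# from pyspark.sql import functions as F
# df = df.withColumn('event_ts',
#                    F.to_timestamp('event_ts_str', 'yyyy-MM-dd HH:mm:ss.SSSSSS'))
```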

 

Thanks

AKR
