Cast exception while reading through HiveContext in spark-shell

New Contributor

An external Hive table was created over ORC files. The underlying ORC files store columns with string data types, while the Hive schema declares different data types such as int and string.

Reading the underlying ORC files directly and showing the data succeeds, but reading the table through hiveContext as a DataFrame fails when the data is shown.

Example:

Hive schema:
 |-- c1: integer (nullable = true)
 |-- c2: string (nullable = true)
 |-- c3: string (nullable = true)
 |-- c4: date (nullable = true)
 |-- c5: date (nullable = true)
 |-- c6: string (nullable = true)
 |-- c7: string (nullable = true)
 |-- c8: string (nullable = true)
 |-- c9: string (nullable = true)
 |-- c10: string (nullable = true)

ORC schema:
 |-- c1: string (nullable = true)
 |-- c2: string (nullable = true)
 |-- c3: string (nullable = true)
 |-- c4: date (nullable = true)
 |-- c5: date (nullable = true)
 |-- c6: string (nullable = true)
 |-- c7: string (nullable = true)
 |-- c8: string (nullable = true)
 |-- c9: date (nullable = true)
 |-- c10: string (nullable = true)
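
To make the difference between the two schemas above explicit, here is a minimal sketch in plain Scala that compares them column by column. The column names and types are copied from the listings above; the helper `typeMismatches` is a hypothetical name introduced only for illustration. It reports c1 (integer in Hive, string in ORC) and c9 (string in Hive, date in ORC); the c1 mismatch may be what surfaces as the `getInt` failure in the stack trace, though that is an assumption.

```scala
// Column name -> type, copied from the Hive and ORC schemas listed above.
val hiveSchema = Seq(
  "c1" -> "integer", "c2" -> "string", "c3" -> "string", "c4" -> "date", "c5" -> "date",
  "c6" -> "string", "c7" -> "string", "c8" -> "string", "c9" -> "string", "c10" -> "string")
val orcSchema = Seq(
  "c1" -> "string", "c2" -> "string", "c3" -> "string", "c4" -> "date", "c5" -> "date",
  "c6" -> "string", "c7" -> "string", "c8" -> "string", "c9" -> "date", "c10" -> "string")

// Hypothetical helper: returns (column, hiveType, orcType) for every column
// that exists in both schemas but with a different type.
def typeMismatches(
    hive: Seq[(String, String)],
    orc: Seq[(String, String)]): Seq[(String, String, String)] = {
  val orcTypes = orc.toMap
  hive.collect {
    case (name, hiveType) if orcTypes.get(name).exists(_ != hiveType) =>
      (name, hiveType, orcTypes(name))
  }
}

val mismatches = typeMismatches(hiveSchema, orcSchema)
mismatches.foreach { case (name, h, o) => println(s"$name: Hive=$h, ORC=$o") }
```

In a live spark-shell session the same kind of (name, type) pairs could be taken from each DataFrame's `dtypes` instead of being typed in by hand.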

Reading through the Hive context fails with the error below:

val resultDF = sqlContext.sql("select * from hive_table_name")
 19/11/29 17:19:35 WARN TaskSetManager: Lost task 0.0 in stage 11.0 (TID 71, <host>, executor 2): java.lang.NullPointerException
        at org.apache.spark.sql.execution.datasources.orc.OrcColumnVector.getInt(OrcColumnVector.java:132)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)

1 REPLY

Super Guru
@arunkumarc

Is there any particular reason you need to keep the Hive and ORC schemas out of sync? Can you ALTER the Hive table schema to match the ORC data? Spark may be stricter than Hive about this check.

Cheers
Eric
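
As a follow-up to the suggestion above, the statements needed to align the table definition with the files can be generated rather than typed by hand. The sketch below is plain Scala and only builds HiveQL strings; it does not run them. It assumes the two mismatched columns visible in the schemas from the question (c1 and c9) and uses the placeholder table name from the post. `alterStatements` is a hypothetical helper name. Whether Hive accepts a particular type change depends on its version and configuration, so treat the output as a starting point to review and run in Hive itself.

```scala
// Columns whose Hive type differs from the ORC file type, per the schemas
// in the question: (column name, type found in the ORC files).
val orcTypesToApply = Seq("c1" -> "string", "c9" -> "date")

// Hypothetical helper: builds one HiveQL "ALTER TABLE ... CHANGE COLUMN"
// statement per column, keeping the column name and switching its type.
def alterStatements(table: String, cols: Seq[(String, String)]): Seq[String] =
  cols.map { case (name, orcType) =>
    s"ALTER TABLE $table CHANGE COLUMN $name $name $orcType"
  }

// "hive_table_name" is the placeholder used in the original post.
val statements = alterStatements("hive_table_name", orcTypesToApply)
statements.foreach(println)
```

Generating the statements from a list keeps the change reviewable before anything touches the metastore.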