An external Hive table has been created over ORC files. The underlying ORC files store some columns with different data types than the Hive schema declares (for example, string in the ORC files but int in Hive).
Reading the ORC files directly and showing the data succeeds, but reading the table through hiveContext as a DataFrame fails when the data is shown.
Example:
Hive schema:
|-- c1: integer (nullable = true)
|-- c2: string (nullable = true)
|-- c3: string (nullable = true)
|-- c4: date (nullable = true)
|-- c5: date (nullable = true)
|-- c6: string (nullable = true)
|-- c7: string (nullable = true)
|-- c8: string (nullable = true)
|-- c9: string (nullable = true)
|-- c10: string (nullable = true)
ORC schema:
|-- c1: string (nullable = true)
|-- c2: string (nullable = true)
|-- c3: string (nullable = true)
|-- c4: date (nullable = true)
|-- c5: date (nullable = true)
|-- c6: string (nullable = true)
|-- c7: string (nullable = true)
|-- c8: string (nullable = true)
|-- c9: date (nullable = true)
|-- c10: string (nullable = true)
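To make the differences between the two listings explicit, here is a minimal plain-Scala sketch (no Spark required) that compares them column by column. The object name `SchemaDiff` and the helper `mismatches` are made up for illustration; the column names and types are copied from the two schemas above.

```scala
// Column name -> type, copied from the two printSchema listings above.
// SchemaDiff and mismatches are hypothetical names used only for this sketch.
object SchemaDiff {
  val hiveSchema: Seq[(String, String)] = Seq(
    "c1" -> "integer", "c2" -> "string", "c3" -> "string", "c4" -> "date",
    "c5" -> "date", "c6" -> "string", "c7" -> "string", "c8" -> "string",
    "c9" -> "string", "c10" -> "string")

  val orcSchema: Seq[(String, String)] = Seq(
    "c1" -> "string", "c2" -> "string", "c3" -> "string", "c4" -> "date",
    "c5" -> "date", "c6" -> "string", "c7" -> "string", "c8" -> "string",
    "c9" -> "date", "c10" -> "string")

  // Returns (column, hiveType, orcType) for every column whose types differ.
  def mismatches(
      hive: Seq[(String, String)],
      orc: Seq[(String, String)]): Seq[(String, String, String)] = {
    val orcTypes = orc.toMap
    hive.collect {
      case (name, hiveType) if orcTypes.get(name).exists(_ != hiveType) =>
        (name, hiveType, orcTypes(name))
    }
  }

  def main(args: Array[String]): Unit =
    mismatches(hiveSchema, orcSchema).foreach { case (name, h, o) =>
      println(s"$name: Hive=$h, ORC=$o")
    }
}
```

For these two listings the mismatched columns are c1 (integer in Hive, string in ORC) and c9 (string in Hive, date in ORC).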
Reading through the Hive context fails with the error below:
val resultDF = sqlContext.sql("select * from hive_table_name")
19/11/29 17:19:35 WARN TaskSetManager: Lost task 0.0 in stage 11.0 (TID 71, <host>, executor 2): java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.orc.OrcColumnVector.getInt(OrcColumnVector.java:132)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
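Since the direct ORC read works, one possible workaround is to read the files directly and cast the mismatched columns to the types the Hive schema declares. The following is only a sketch under assumptions, not a confirmed fix: the path `/path/to/orc/files` is a placeholder, the object name `OrcCastWorkaround` is made up, and the type map covers only c1 and c9 because those are the two columns that differ between the schemas above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical sketch: names and the path below are assumptions, not from the original job.
object OrcCastWorkaround {
  // Placeholder for the location of the ORC files behind the external table.
  val orcPath = "/path/to/orc/files"

  // Types declared in the Hive schema for the two columns that differ from the ORC files.
  val hiveTypes: Map[String, String] = Map("c1" -> "int", "c9" -> "string")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-cast-workaround")
      .getOrCreate()

    // A direct read uses the schema stored in the ORC files themselves.
    val orcDF = spark.read.orc(orcPath)

    // Cast only the mismatched columns; every other column passes through unchanged.
    val aligned = orcDF.select(orcDF.columns.map { name =>
      hiveTypes.get(name)
        .map(t => col(name).cast(t).as(name))
        .getOrElse(col(name))
    }: _*)

    aligned.printSchema()
    aligned.show(5)
    spark.stop()
  }
}
```

Note that in Spark 2.x any c1 value that is not a valid integer becomes null after the cast, so it is worth checking the null count of c1 before and after.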