
Cast exception while reading through HiveContext in spark-shell

New Contributor

An external Hive table was created over ORC files whose columns are stored as strings, while the Hive schema declares different data types such as int and string.

Reading and showing the data from the underlying ORC files directly succeeds, but reading the table as a DataFrame through hiveContext fails when the data is shown.

Example:

Hive schema
 |-- c1: integer (nullable = true)
 |-- c2: string (nullable = true)
 |-- c3: string (nullable = true)
 |-- c4: date (nullable = true)
 |-- c5: date (nullable = true)
 |-- c6: string (nullable = true)
 |-- c7: string (nullable = true)
 |-- c8: string (nullable = true)
 |-- c9: string (nullable = true)
 |-- c10: string (nullable = true)
ORC schema
 |-- c1: string (nullable = true)
 |-- c2: string (nullable = true)
 |-- c3: string (nullable = true)
 |-- c4: date (nullable = true)
 |-- c5: date (nullable = true)
 |-- c6: string (nullable = true)
 |-- c7: string (nullable = true)
 |-- c8: string (nullable = true)
 |-- c9: date (nullable = true)
 |-- c10: string (nullable = true)
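
For reference, the two listings above can be reproduced in spark-shell roughly like this (a sketch; the path is a placeholder for the table's actual LOCATION):

// Schema as declared in the Hive metastore
sqlContext.table("hive_table_name").printSchema()

// Schema embedded in the ORC files themselves
sqlContext.read.orc("/path/to/hive_table_name").printSchema()
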
When reading through the Hive context, I get the error below:

val resultDF = sqlContext.sql("select * from hive_table_name")
 19/11/29 17:19:35 WARN TaskSetManager: Lost task 0.0 in stage 11.0 (TID 71, <host>, executor 2): java.lang.NullPointerException
        at org.apache.spark.sql.execution.datasources.orc.OrcColumnVector.getInt(OrcColumnVector.java:132)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
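
The top frame suggests Spark is applying the metastore schema to the file data: the generated code calls OrcColumnVector.getInt for c1 (int in Hive), but the ORC column actually holds strings. Reading the files directly and casting should work as a temporary workaround (a sketch; again the path is a placeholder for the table's LOCATION):

import org.apache.spark.sql.functions.col

// Read the files with their own schema, then cast to the Hive types
val raw = sqlContext.read.orc("/path/to/hive_table_name")
val typed = raw
  .withColumn("c1", col("c1").cast("int"))
  .withColumn("c9", col("c9").cast("string"))
typed.show()
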
1 REPLY

Super Guru
@arunkumarc

Is there a particular reason you need to keep the Hive and ORC schemas out of sync? Can you ALTER the Hive table schema to match the ORC data? Spark can be stricter than Hive when checking this.
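
For example, something along these lines should line the metadata up. This is a sketch assuming the table and column names from your post; ALTER TABLE ... CHANGE only rewrites the metadata, not the files, and exact syntax can vary by Hive version. Run it from the Hive shell or beeline:

-- Align the Hive column types with what the ORC files contain
ALTER TABLE hive_table_name CHANGE c1 c1 STRING;
ALTER TABLE hive_table_name CHANGE c9 c9 DATE;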

Cheers
Eric