Spark SQL insert overwrite with avro format?

I'm having issues running INSERT OVERWRITE in Spark SQL when the target is an Avro-format table. No matter what I do, I get the same error. It happens on queries as simple as "INSERT OVERWRITE TABLE avroTable SELECT * FROM avroTable" and on anything more complex, as long as the target table is Avro. Does anyone know of a solution?

Here is the error in question:

An error occurred while calling o43.sql.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1020.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1020.0 (TID 4790): org.apache.hadoop.hive.serde2.SerDeException: Encountered exception determining schema. Returning signal schema to indicate problem: null
	at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:523)
	at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:97)
	at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:88)
	at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:81)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:92)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Re: Spark SQL insert overwrite with avro format?

This issue is fixed in Spark 2.2.1. See the tracking issue below:

https://issues.apache.org/jira/browse/SPARK-17920?page=com.atlassian.jira.plugin.system.issuetabpane...
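If you are not sure whether your cluster already has the fix, a quick check is to compare the running Spark version against 2.2.1, the release named in the reply. The helper below is a hypothetical sketch, not part of Spark. It assumes that any version at or above 2.2.1 includes the fix and ignores vendor suffixes such as ".cloudera1". Vendor distributions may backport fixes into older version numbers, so treat a False result as "check your vendor's release notes", not as proof the bug is present.

```python
import re

# Assumption: per the reply, the fix tracked in SPARK-17920 ships in Spark 2.2.1,
# so every later release is treated as fixed too.
FIXED_IN = (2, 2, 1)


def parse_spark_version(version: str) -> tuple:
    """Return the leading (major, minor, patch) numbers of a Spark version string.

    Vendor builds often append suffixes (e.g. "2.1.0.cloudera1"), so only the
    first three dot-separated numeric components are kept. Missing components
    default to 0, so "2.2" becomes (2, 2, 0).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Spark version string: {version!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def has_avro_insert_overwrite_fix(version: str) -> bool:
    """Hypothetical helper: True if `version` is at or above 2.2.1."""
    return parse_spark_version(version) >= FIXED_IN
```

In a PySpark shell you could pass it `sc.version` (or `spark.version` on Spark 2.x). Comparing tuples keeps the check numeric, so "2.10.0" correctly sorts above "2.2.1", which a plain string comparison would get wrong.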