Hive insert query failed in HDP 3.1

New Contributor

Hi

 

I've been developing a Spark application that inserts data (registered as a temp view from a Spark DataFrame) into a Hive external table on HDP 3.1.0.

 

The problem is that when the application is just about to insert the data, an error is thrown and the insertion fails:

( org.apache.hadoop.hive.ql.metadata.HiveException: Directory hdfs://platform/warehouse/tablespace/external/hive/sample_test/basis_ymd=20201022 could not be cleaned up. )

 

amount.createOrReplaceTempView("temp_sample_test")
// ERROR is thrown here; note the trailing space after "sample_test" so the
// concatenated SQL does not become "sample_testPARTITION"
sparkSession.sql("INSERT OVERWRITE TABLE sample_test " +
  "PARTITION(basis_ymd='" + rightYmd + "') " +
  "SELECT product_id, amount, playtime, play_rate FROM temp_sample_test")

 

The error output is shown below:

 

         client token: N/A
         diagnostics: User class threw exception: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Directory hdfs://platform/warehouse/tablespace/external/hive/sample_test/basis_ymd=20201022 could not be cleaned up.;
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
        at org.apache.spark.sql.hive.HiveExternalCatalog.loadPartition(HiveExternalCatalog.scala:843)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:248)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:99)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:115)
        at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
        at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
        at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)

        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
        ... 23 more
Caused by: java.io.FileNotFoundException: File hdfs://platform/warehouse/tablespace/external/hive/sample_test/basis_ymd=20201022 does not exist.

 

 

I submitted the Spark application in cluster mode (master=yarn).

 

I can't understand why the INSERT OVERWRITE TABLE query failed. Shouldn't the query have succeeded regardless of whether the partition is empty or not?

 

This code used to run on Hive 1, but now it fails on Hive 3. Is there something I missed?

 

If anyone has any ideas, please help me.

Thank you.

 

1 REPLY

Master Mentor

@dooby 

There is a Jira for this issue; see the solution at

https://issues.apache.org/jira/browse/SPARK-32536
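
For reference, the "Caused by: java.io.FileNotFoundException" in the stack trace suggests that Hive 3 tries to clean up the existing partition directory during INSERT OVERWRITE and fails when that directory does not exist. Below is a minimal sketch of a common workaround, reusing the sparkSession, rightYmd, and warehouse path from the question above; it is an assumption-based sketch, not the official fix, so check the Jira for the actual resolution. The idea is to pre-create the partition directory before running the insert:

import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch only: pre-create the partition directory so the cleanup step in
// Hive 3 finds an existing path instead of throwing FileNotFoundException.
// The path and rightYmd come from the question; adjust for your cluster.
val partitionDir = new Path(
  s"hdfs://platform/warehouse/tablespace/external/hive/sample_test/basis_ymd=$rightYmd")
val fs = FileSystem.get(sparkSession.sparkContext.hadoopConfiguration)
if (!fs.exists(partitionDir)) {
  fs.mkdirs(partitionDir)
}

// The original INSERT OVERWRITE should now get past the cleanup step.
sparkSession.sql("INSERT OVERWRITE TABLE sample_test " +
  "PARTITION(basis_ymd='" + rightYmd + "') " +
  "SELECT product_id, amount, playtime, play_rate FROM temp_sample_test")

If your Spark/HDP build already includes a fix from that Jira, upgrading is cleaner than creating directories by hand.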