Spark 3.1.1: insertInto with dynamic partitions fails for an external table

Explorer
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

dataset.write.mode("overwrite").insertInto("table") fails with dynamic partitions.

 

I tried this both in a Spark program and in spark3-shell.

ERROR Hive: Exception when loading partition with parameters  partPath=hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/.hive-staging_hive_2020-08-05_14-38-00_715_3629476922121193803-1/-ext-10000/dt=2001,  table=id_name2,  partSpec={dt=2001},  loadFileType=REPLACE_ALL,  listBucketingLevel=0,  isAcid=false,  resetStatistics=false
org.apache.hadoop.hive.ql.metadata.HiveException: Directory hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001 could not be cleaned up.
    at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:4666)
    at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:4597)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:2132)
    at org.apache.hadoop.hive.ql.metadata.Hive$5.call(Hive.java:2588)
    at org.apache.hadoop.hive.ql.metadata.Hive$5.call(Hive.java:2579)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001 does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1053)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
    at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1113)
    at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1110)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910)
    at org.apache.hadoop.hive.ql.metadata.Hive.cleanUpOneDirectoryForReplace(Hive.java:4681)
    at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:4661)
    ... 8 more
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Exception when loading 1 in table id_name2 with loadPath=hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/.hive-staging_hive_2020-08-05_14-38-00_715_3629476922121193803-1/-ext-10000;

 

Cloudera Employee

Hi @DURAISAM 

 

Is it only the "Insert Into" command that fails, or are other operations, such as "Delete", failing as well?

 

Can you check whether the user that updates or queries the table has the required permissions on it?

Explorer

Hi @jAnshula ,

 

I tried and checked the following:

  • Folder permissions in HDFS are all fine
  • spark.sql.files.ignoreMissingFiles=true is set
  • spark.sql.sources.partitionOverwriteMode=DYNAMIC is set
  • The same code works fine in Spark 2.4

 

I am facing the issue in Spark 3.1.1:

dataset.write.mode("overwrite").insertInto("external_table"); this should remove the existing partitions and persist the new data, right?

Actual: on a rerun, the data has already been removed from HDFS but the partition details are still in the table metadata, so the write tries to remove the data again and throws FileNotFoundException.

 

To reproduce this quickly, please try it in spark3-shell.
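
A minimal sketch of such a spark3-shell session is below. It is only an illustration under assumptions: the table name tmp.repro_ext, its columns, and the LOCATION path are hypothetical placeholders, the tmp database is assumed to exist, and whether the second write actually fails may depend on the cluster and Hive version.

```scala
// Paste into spark3-shell; `spark` is the session the shell provides.
import spark.implicits._

// Settings mentioned in this thread.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "DYNAMIC")

// Hypothetical external table partitioned by dt; the LOCATION is a placeholder.
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS tmp.repro_ext (id INT, name STRING)
  PARTITIONED BY (dt STRING)
  STORED AS PARQUET
  LOCATION '/tmp/repro_ext'
""")

// insertInto matches columns by position, so the partition column goes last.
val dataset = Seq((1, "a", "2001")).toDF("id", "name", "dt")

// First run writes partition dt=2001.
dataset.write.mode("overwrite").insertInto("tmp.repro_ext")

// Rerun: the step where the error in this thread was reported.
dataset.write.mode("overwrite").insertInto("tmp.repro_ext")
```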

 

 

 

Cloudera Employee

Which Ambari/CM version are you using? Also, can you share the HMS logs for this time range?

Explorer

Hi,

I am going with a workaround for this issue for now.

 

Workaround: explicitly drop the partition from the table metadata in code.
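
The exact code was not posted, so the following is only a sketch of how this workaround might look, not the original implementation. The object and method names are hypothetical, a single partition column is assumed, and partition values are assumed to contain no quote characters. For an external table, dropping the partition is expected to remove only the metastore entry, so the following overwrite should no longer try to clean up a directory that is already gone.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper, not taken from the thread.
object DropPartitionsBeforeOverwrite {

  // Builds the DDL that removes one partition from the table metadata.
  // Assumes the value contains no single quotes.
  def dropPartitionSql(table: String, column: String, value: String): String =
    s"ALTER TABLE $table DROP IF EXISTS PARTITION ($column='$value')"

  // Drops every partition present in the incoming data, then overwrites.
  def overwriteAfterDroppingPartitions(
      spark: SparkSession,
      dataset: DataFrame,
      table: String,
      partitionColumn: String): Unit = {
    // Distinct, non-null partition values found in the data to be written.
    val values = dataset
      .select(partitionColumn)
      .na.drop()
      .distinct()
      .collect()
      .map(_.get(0).toString)

    // Remove the matching partitions from the metastore first.
    values.foreach(v => spark.sql(dropPartitionSql(table, partitionColumn, v)))

    // insertInto matches columns by position; partition column must be last.
    dataset.write.mode("overwrite").insertInto(table)
  }
}
```

In spark3-shell this could be called as DropPartitionsBeforeOverwrite.overwriteAfterDroppingPartitions(spark, dataset, "tmp.id_name2", "dt"), using the table and partition column that appear in the error log above.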

 

 
