Created on 07-26-2021 05:54 AM - edited 07-26-2021 06:32 AM
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
dataset.write.mode("overwrite").insertInto("table") with dynamic partitions
Tried in a Spark program and also in spark3-shell.
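For reference, a minimal self-contained sketch of what was tried. The database/table name tmp.id_name2 and the dt partition column are taken from the error log below; the sample rows and the table location are made up:

import spark.implicits._  // already in scope inside spark3-shell

spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

// Hypothetical external, dt-partitioned table matching the error log
spark.sql("CREATE DATABASE IF NOT EXISTS tmp")
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS tmp.id_name2 (id INT, name STRING)
  PARTITIONED BY (dt STRING)
  STORED AS PARQUET
  LOCATION '/tmp/id_name2'
""")

// insertInto resolves columns by position, so the partition column goes last
val dataset = Seq((1, "a", "2001"), (2, "b", "2002")).toDF("id", "name", "dt")
dataset.write.mode("overwrite").insertInto("tmp.id_name2")

Running this failed with: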
ERROR Hive: Exception when loading partition with parameters partPath=hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/.hive-staging_hive_2020-08-05_14-38-00_715_3629476922121193803-1/-ext-10000/dt=2001, table=id_name2, partSpec={dt=2001}, loadFileType=REPLACE_ALL, listBucketingLevel=0, isAcid=false, resetStatistics=false
org.apache.hadoop.hive.ql.metadata.HiveException: Directory hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001 could not be cleaned up.
    at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:4666)
    at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:4597)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:2132)
    at org.apache.hadoop.hive.ql.metadata.Hive$5.call(Hive.java:2588)
    at org.apache.hadoop.hive.ql.metadata.Hive$5.call(Hive.java:2579)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001 does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1053)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
    at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1113)
    at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1110)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910)
    at org.apache.hadoop.hive.ql.metadata.Hive.cleanUpOneDirectoryForReplace(Hive.java:4681)
    at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:4661)
    ... 8 more
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Exception when loading 1 in table id_name2 with loadPath=hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/.hive-staging_hive_2020-08-05_14-38-00_715_3629476922121193803-1/-ext-10000;
Created 07-26-2021 07:31 AM
Hi @DURAISAM
Are you unable to execute only the "Insert Into" command, or are other actions failing as well, say "Delete", etc.?
Can you check whether the user with which you are updating/querying the table has the required permissions on the table?
Created on 07-26-2021 09:09 PM - edited 07-26-2021 09:10 PM
Hi @jAnshula,
Tried and checked the permissions; the issue persists.
I am facing the issue in Spark 3.1.1:
dataset.write.mode("overwrite").insertInto("external_table"); this should remove the existing partitions and persist the new data, right?
Actual: on rerun, the data got removed from HDFS but the partition details are still in the table metadata, so the cleanup tries to remove the data again and throws FileNotFoundException.
To reproduce very quickly, please check once in spark3-shell; a hypothetical sequence follows.
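For example, reusing the table from the sketch above, running the same overwrite twice should surface the problem per the report:

val df = Seq((1, "a", "2001")).toDF("id", "name", "dt")
df.write.mode("overwrite").insertInto("tmp.id_name2")  // first run: succeeds
// Rerun: per the report, the dt=2001 directory is removed from HDFS while the
// partition entry survives in the metastore, and Hive's cleanup of the old
// path then fails with FileNotFoundException.
df.write.mode("overwrite").insertInto("tmp.id_name2")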
Created 07-27-2021 01:45 AM
What is the Ambari/CM version you are using? Also, can you share the HMS logs for this time range?
Created 08-16-2021 06:13 AM
Hi,
I am going with a workaround for this issue as of now.
Workaround: explicitly drop the stale partitions from the table metadata via code before rewriting, as sketched below.
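A sketch of that workaround, assuming the table and partition column from the earlier posts (the partition values here are illustrative; in practice you would derive them from the incoming dataset):

// Drop the stale partitions from the Hive metastore first; for an external
// table, DROP PARTITION removes only the metadata entry, not the data files.
Seq("2001", "2002").foreach { dt =>
  spark.sql(s"ALTER TABLE tmp.id_name2 DROP IF EXISTS PARTITION (dt='$dt')")
}
// The overwrite no longer finds a metastore partition whose HDFS directory
// has already been deleted.
dataset.write.mode("overwrite").insertInto("tmp.id_name2")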