Support Questions

DURAISAM · ‎07-26-2021

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nostrict;

dataset.write.mode("overwrite").insertInto("table") fails with dynamic partitions.

I tried this both in a Spark program and in spark3-shell.

ERROR Hive: Exception when loading partition with parameters  partPath=hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/.hive-staging_hive_2020-08-05_14-38-00_715_3629476922121193803-1/-ext-10000/dt=2001,  table=id_name2,  partSpec={dt=2001},  loadFileType=REPLACE_ALL,  listBucketingLevel=0,  isAcid=false,  resetStatistics=false
org.apache.hadoop.hive.ql.metadata.HiveException: Directory hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001 could not be cleaned up.
    at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:4666)
    at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:4597)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:2132)
    at org.apache.hadoop.hive.ql.metadata.Hive$5.call(Hive.java:2588)
    at org.apache.hadoop.hive.ql.metadata.Hive$5.call(Hive.java:2579)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001 does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1053)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
    at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1113)
    at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1110)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910)
    at org.apache.hadoop.hive.ql.metadata.Hive.cleanUpOneDirectoryForReplace(Hive.java:4681)
    at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:4661)
    ... 8 more
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Exception when loading 1 in table id_name2 with loadPath=hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/.hive-staging_hive_2020-08-05_14-38-00_715_3629476922121193803-1/-ext-10000;

jAnshula · ‎07-26-2021

Hi @DURAISAM

Is it only the "Insert Into" command that fails, or are you unable to perform other actions as well, such as "Delete"?

Can you check whether the user you are updating/querying the table with has the required permissions on the table?

DURAISAM · ‎07-26-2021

Hi @jAnshula ,

I tried that and checked:

Folder permissions are all fine in HDFS.
spark.sql.files.ignoreMissingFiles=true
spark.sql.sources.partitionOverwriteMode=DYNAMIC
It works fine in Spark 2.4.

I am facing the issue in Spark 3.1.1:

dataset.write.mode("overwrite").insertInto("external_table"); This should remove the existing partitions and persist the new data, right?

Actual: On a rerun, the data was removed from HDFS but the partition details are still in the table metadata, so the write tries to remove the data again and throws FileNotFoundException.

To reproduce this quickly, please try it in spark3-shell.

jAnshula · ‎07-27-2021

Which Ambari/CM version are you using? Also, can you share the HMS logs for this time range?

DURAISAM · ‎08-16-2021

Hi,

I am going with a workaround for this issue for now.

Workaround: Drop the partition explicitly from the metadata via code.
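
For anyone landing here later, this is a minimal sketch of what that workaround might look like. It is not the poster's actual code: the helper name `dropPartitionSql` and the object `DropPartitionWorkaround` are made up for illustration, and the table name `tmp.id_name2` and partition `dt=2001` are simply taken from the stack trace above. It assumes partition values contain no single quotes.

```scala
// Hypothetical helper (not from the original post): builds the DDL that
// removes a partition entry from the metastore before the overwrite runs.
object DropPartitionWorkaround {

  /** Builds an ALTER TABLE ... DROP IF EXISTS PARTITION statement.
    * `spec` lists partition columns and values, e.g. Seq("dt" -> "2001").
    * Assumption: values contain no single quotes, so no escaping is done. */
  def dropPartitionSql(table: String, spec: Seq[(String, String)]): String = {
    require(spec.nonEmpty, "partition spec must not be empty")
    val clause = spec
      .map { case (col, value) => s"$col='$value'" }
      .mkString(", ")
    s"ALTER TABLE $table DROP IF EXISTS PARTITION ($clause)"
  }
}

// Usage in spark3-shell, where `spark` is the session and `dataset` holds
// the rows to write (both assumed to exist already):
//
//   import DropPartitionWorkaround._
//   spark.sql(dropPartitionSql("tmp.id_name2", Seq("dt" -> "2001")))
//   dataset.write.mode("overwrite").insertInto("tmp.id_name2")
```

The `IF EXISTS` clause keeps the statement safe on a first run, when the partition has not been registered yet. Dropping the stale metadata entry first should mean the overwrite no longer tries to clean up a directory that is already gone, though whether this fully avoids the error in every setup is not confirmed in this thread.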

Cloudera Community

In Spark 3.1.1, insertInto with dynamic partitions fails for an external table