Member since: 07-26-2021
Posts: 8
Kudos Received: 0
Solutions: 0
11-11-2021
04:24 PM
While running a Spark job we are intermittently hitting "com.esotericsoftware.kryo.KryoException: Buffer underflow." Logic: read a few tables via Spark SQL and write them into HDFS using ds.insertInto(..); insertInto is the only action in the flow of the code. Analysis: from the Spark History UI, it happens while execution moves from one stage to the next. Env: CDH 7.x, Spark 2.x with Java.
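For reference, a minimal sketch of the job flow with the Kryo buffer sizes raised; the config values and table names are illustrative assumptions, not a confirmed fix:

import org.apache.spark.sql.SparkSession

// Sketch only: buffer sizes and table names below are placeholders, not a verified fix.
val spark = SparkSession.builder()
  .appName("kryo-underflow-check")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.kryoserializer.buffer", "1m")       // initial per-core Kryo buffer
  .config("spark.kryoserializer.buffer.max", "512m") // raise the ceiling to rule out buffer sizing
  .enableHiveSupport()
  .getOrCreate()

// Same flow as the failing job: read via Spark SQL, insertInto is the only action.
val ds = spark.sql("SELECT * FROM source_db.some_table")
ds.write.mode("overwrite").insertInto("target_db.some_table")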
Labels: Apache Spark
08-16-2021
06:13 AM
Hi, I am going with a workaround for now. Workaround: drop the partition explicitly from the metadata via code.
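A rough sketch of that workaround (the table name and partition spec are placeholders for our actual objects):

// Workaround sketch: explicitly drop the stale partition from the metastore before the rewrite.
spark.sql("ALTER TABLE tmp.id_name2 DROP IF EXISTS PARTITION (dt='2001')")
// ...then rerun the normal overwrite of that partition.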
07-28-2021
12:58 AM
Hi, "Are you manually removing the partitions?" Yes. We can't use "set hive.msck.path.validation=ignore", because if we run MSCK REPAIR it should automatically sync the HDFS folders and the table partitions, right? That is not happening, and there is no error. Our aim: keep the HDFS paths and the table partitions in sync under any condition.
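What we expect the sync step to look like (table name is a placeholder; on the Hive side, Hive 3's MSCK REPAIR TABLE ... SYNC PARTITIONS variant may also be an option, if it is available in this CDH 7.x build):

// Expectation sketch: after directories change in HDFS, repair should bring the
// metastore partition list back in line with what is actually on disk.
spark.sql("MSCK REPAIR TABLE tmp.id_name2")
spark.sql("SHOW PARTITIONS tmp.id_name2").show(false)   // should match the HDFS dirs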
07-28-2021
12:48 AM
Hi, the issue mostly occurs because of a mismatch between the partition paths in HDFS and the partitions defined on the table; if the partition columns and the HDFS paths match, this issue never occurs.
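For example, a quick check of that match (warehouse path and table name are placeholders): list the partition directories actually in HDFS and compare them with what the metastore has registered.

import org.apache.hadoop.fs.{FileSystem, Path}
// Match-check sketch: partitions in the metastore should map 1:1 to directories under the table location.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val hdfsDirs = fs.listStatus(new Path("/user/hive/warehouse/tmp.db/id_name2"))
  .filter(_.isDirectory).map(_.getPath.getName).toSet
val metastoreParts = spark.sql("SHOW PARTITIONS tmp.id_name2")
  .collect().map(_.getString(0)).toSet
println(metastoreParts.diff(hdfsDirs))   // partitions registered with no backing directory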
07-26-2021
09:09 PM
Hi @jAnshula, tried and checked; the folder permissions in HDFS are all fine. Settings used: spark.sql.files.ignoreMissingFiles=true and spark.sql.sources.partitionOverwriteMode=DYNAMIC. This works fine in Spark 2.4, but we are facing the issue in Spark 3.1.1 with dataset.write.mode("overwrite").insertInto("external_table"); this should remove the existing partitions and persist the new data, right? Actual: on rerun, the data is removed from HDFS but the partition details are still in the table metadata, so it tries to remove the data again and throws FileNotFoundException. To reproduce quickly, please check once in spark3-shell.
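A quick spark3-shell sketch of the rerun case (table, columns, and values are placeholders; it assumes a partitioned table tmp.id_name2 with dt as the partition column already exists):

// Rerun sketch: the second overwrite of the same partition is where we hit FileNotFoundException.
import spark.implicits._
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "DYNAMIC")
spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")

val df = Seq((1, "a", "2001"), (2, "b", "2001")).toDF("id", "name", "dt")
df.write.mode("overwrite").insertInto("tmp.id_name2")   // first run: fine
df.write.mode("overwrite").insertInto("tmp.id_name2")   // rerun: fails in our environment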
07-26-2021
06:14 AM
Use case:
- Delete the partitions from HDFS manually
- Run MSCK REPAIR
- The HDFS paths and the partitions in the metadata do not get back in sync.
Tried multiple times; they never sync after upgrading from CDH 6.x to CDH 7.x. Any solutions, please? (Steps as code shown below.)
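The steps expressed as code, for anyone trying to reproduce (paths and table name are placeholders):

import org.apache.hadoop.fs.{FileSystem, Path}
// Use-case sketch: manually delete one partition directory from HDFS, then run
// MSCK REPAIR and expect the metastore to drop/re-sync that partition.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.delete(new Path("/user/hive/warehouse/tmp.db/id_name2/dt=2001"), true)  // recursive delete
spark.sql("MSCK REPAIR TABLE tmp.id_name2")
spark.sql("SHOW PARTITIONS tmp.id_name2").show(false)   // dt=2001 is still listed for us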
Labels: Apache Hive
07-26-2021
05:54 AM
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

dataset.write.mode("overwrite").insertInto("table") with dynamic partitions. Tried in a Spark program and also in spark3-shell.

ERROR Hive: Exception when loading partition with parameters partPath=hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/.hive-staging_hive_2020-08-05_14-38-00_715_3629476922121193803-1/-ext-10000/dt=2001, table=id_name2, partSpec={dt=2001}, loadFileType=REPLACE_ALL, listBucketingLevel=0, isAcid=false, resetStatistics=false
org.apache.hadoop.hive.ql.metadata.HiveException: Directory hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001 could not be cleaned up.
at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:4666)
at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:4597)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:2132)
at org.apache.hadoop.hive.ql.metadata.Hive$5.call(Hive.java:2588)
at org.apache.hadoop.hive.ql.metadata.Hive$5.call(Hive.java:2579)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001 does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1053)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1113)
at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1110)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910)
at org.apache.hadoop.hive.ql.metadata.Hive.cleanUpOneDirectoryForReplace(Hive.java:4681)
at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:4661)
... 8 more
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Exception when loading 1 in table id_name2 with loadPath=hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/.hive-staging_hive_2020-08-05_14-38-00_715_3629476922121193803-1/-ext-10000;
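For completeness, the same write expressed as a spark3-shell sketch (source table and names are assumptions); in our environment this still hits the FileNotFoundException above on rerun:

// Sketch of the failing write with dynamic partitioning enabled; names are placeholders.
spark.sql("set hive.exec.dynamic.partition=true")
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")

val dataset = spark.sql("SELECT id, name, dt FROM tmp.id_name_src")  // hypothetical source table
dataset.write.mode("overwrite").insertInto("tmp.id_name2")           // fails while cleaning up dt=2001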
Labels: Apache Spark