Member since: 07-26-2021
Posts: 8
Kudos Received: 0
Solutions: 0
11-11-2021
04:24 PM
While running a Spark job we are intermittently hitting "com.esotericsoftware.kryo.KryoException: Buffer underflow." Logic: read a few tables via Spark SQL and write them into HDFS using ds.insertInto(..); insertInto is the only action in the flow of the code. Analysis: from the Spark History UI, it happens while execution moves from one stage to the next. Env: CDH 7.x, Spark 2.x with Java.
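For reference, a minimal sketch of the job flow with the Kryo buffer sizes raised; the config values and table names are illustrative assumptions, not a confirmed fix:

import org.apache.spark.sql.SparkSession

// Sketch only: buffer sizes and table names below are placeholders, not a verified fix.
val spark = SparkSession.builder()
  .appName("kryo-underflow-check")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.kryoserializer.buffer", "1m")       // initial per-core Kryo buffer
  .config("spark.kryoserializer.buffer.max", "512m") // raise the ceiling to rule out buffer sizing
  .enableHiveSupport()
  .getOrCreate()

// Same flow as the failing job: read via Spark SQL, insertInto is the only action.
val ds = spark.sql("SELECT * FROM source_db.some_table")
ds.write.mode("overwrite").insertInto("target_db.some_table")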
Labels: Apache Spark
08-16-2021
06:13 AM
Hi, I am going with a workaround for now. Workaround: drop the partition explicitly from the metadata via code.
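A rough sketch of that workaround (the table name and partition spec are placeholders for our actual objects):

// Workaround sketch: explicitly drop the stale partition from the metastore before the rewrite.
spark.sql("ALTER TABLE tmp.id_name2 DROP IF EXISTS PARTITION (dt='2001')")
// ...then rerun the normal overwrite of that partition.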
07-28-2021
12:58 AM
Hi, "Are you manually removing the partitions?" Yes. We can't use "set hive.msck.path.validation=ignore", because if we run MSCK REPAIR it should automatically sync the HDFS folders and the table partitions, right? That is not happening, and there is no error. Our aim: keep the HDFS paths and the table partitions in sync under any condition.
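What we expect the sync step to look like (table name is a placeholder; on the Hive side, Hive 3's MSCK REPAIR TABLE ... SYNC PARTITIONS variant may also be an option, if it is available in this CDH 7.x build):

// Expectation sketch: after directories change in HDFS, repair should bring the
// metastore partition list back in line with what is actually on disk.
spark.sql("MSCK REPAIR TABLE tmp.id_name2")
spark.sql("SHOW PARTITIONS tmp.id_name2").show(false)   // should match the HDFS dirs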
07-28-2021
12:48 AM
Hi, the issue mostly occurs because of a mismatch between the partition paths in HDFS and the partitions defined on the table; if the partition columns and the HDFS paths match, this issue never occurs.
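For example, a quick check of that match (warehouse path and table name are placeholders): list the partition directories actually in HDFS and compare them with what the metastore has registered.

import org.apache.hadoop.fs.{FileSystem, Path}
// Match-check sketch: partitions in the metastore should map 1:1 to directories under the table location.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val hdfsDirs = fs.listStatus(new Path("/user/hive/warehouse/tmp.db/id_name2"))
  .filter(_.isDirectory).map(_.getPath.getName).toSet
val metastoreParts = spark.sql("SHOW PARTITIONS tmp.id_name2")
  .collect().map(_.getString(0)).toSet
println(metastoreParts.diff(hdfsDirs))   // partitions registered with no backing directory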
07-26-2021
09:09 PM
Hi @jAnshula, tried and checked; the folder permissions in HDFS are all fine. Settings used: spark.sql.files.ignoreMissingFiles=true and spark.sql.sources.partitionOverwriteMode=DYNAMIC. This works fine in Spark 2.4, but we are facing the issue in Spark 3.1.1 with dataset.write.mode("overwrite").insertInto("external_table"); this should remove the existing partitions and persist the new data, right? Actual: on rerun, the data is removed from HDFS but the partition details are still in the table metadata, so it tries to remove the data again and throws FileNotFoundException. To reproduce quickly, please check once in spark3-shell.
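A quick spark3-shell sketch of the rerun case (table, columns, and values are placeholders; it assumes a partitioned table tmp.id_name2 with dt as the partition column already exists):

// Rerun sketch: the second overwrite of the same partition is where we hit FileNotFoundException.
import spark.implicits._
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "DYNAMIC")
spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")

val df = Seq((1, "a", "2001"), (2, "b", "2001")).toDF("id", "name", "dt")
df.write.mode("overwrite").insertInto("tmp.id_name2")   // first run: fine
df.write.mode("overwrite").insertInto("tmp.id_name2")   // rerun: fails in our environment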
07-26-2021
06:14 AM
Use case:
- Delete the partitions from HDFS manually
- Run MSCK REPAIR
- The HDFS paths and the partitions in the metadata do not get back in sync.
Tried multiple times; they never sync after upgrading from CDH 6.x to CDH 7.x. Any solutions, please? (Steps as code shown below.)
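The steps expressed as code, for anyone trying to reproduce (paths and table name are placeholders):

import org.apache.hadoop.fs.{FileSystem, Path}
// Use-case sketch: manually delete one partition directory from HDFS, then run
// MSCK REPAIR and expect the metastore to drop/re-sync that partition.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.delete(new Path("/user/hive/warehouse/tmp.db/id_name2/dt=2001"), true)  // recursive delete
spark.sql("MSCK REPAIR TABLE tmp.id_name2")
spark.sql("SHOW PARTITIONS tmp.id_name2").show(false)   // dt=2001 is still listed for us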
Labels: Apache Hive
07-26-2021
05:54 AM
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

dataset.write.mode("overwrite").insertInto("table") with dynamic partitions. Tried in a Spark program and also in spark3-shell.

ERROR Hive: Exception when loading partition with parameters partPath=hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/.hive-staging_hive_2020-08-05_14-38-00_715_3629476922121193803-1/-ext-10000/dt=2001, table=id_name2, partSpec={dt=2001}, loadFileType=REPLACE_ALL, listBucketingLevel=0, isAcid=false, resetStatistics=false
org.apache.hadoop.hive.ql.metadata.HiveException: Directory hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001 could not be cleaned up.
at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:4666)
at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:4597)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:2132)
at org.apache.hadoop.hive.ql.metadata.Hive$5.call(Hive.java:2588)
at org.apache.hadoop.hive.ql.metadata.Hive$5.call(Hive.java:2579)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/dt=2001 does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1053)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1113)
at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1110)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910)
at org.apache.hadoop.hive.ql.metadata.Hive.cleanUpOneDirectoryForReplace(Hive.java:4681)
at org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:4661)
... 8 more
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Exception when loading 1 in table id_name2 with loadPath=hdfs://nameservice/user/hive/warehouse/tmp.db/id_name2/.hive-staging_hive_2020-08-05_14-38-00_715_3629476922121193803-1/-ext-10000;
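For completeness, the same write expressed as a spark3-shell sketch (source table and names are assumptions); in our environment this still hits the FileNotFoundException above on rerun:

// Sketch of the failing write with dynamic partitioning enabled; names are placeholders.
spark.sql("set hive.exec.dynamic.partition=true")
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")

val dataset = spark.sql("SELECT id, name, dt FROM tmp.id_name_src")  // hypothetical source table
dataset.write.mode("overwrite").insertInto("tmp.id_name2")           // fails while cleaning up dt=2001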
Labels: Apache Spark