Member since: 10-03-2018
Posts: 6
Kudos Received: 0
Solutions: 0
11-30-2018
04:20 PM
Sorry, missed this. The issue here is that S3 isn't a "real" filesystem: there is no file/directory rename, so instead the committer has to list every file created and copy it over. That relies on the listings being correct, which S3, being eventually consistent, doesn't always guarantee. It looks like you've hit an inconsistency on a job commit.

To get consistent listings (HDP 3), enable S3Guard.

To avoid the slow rename process and the problems caused by inconsistency within a single query, switch to the "S3A committers" which come with Spark on HDP 3.0. These are specifically designed to write work safely into S3 (see the configuration sketch below).

If you can't do either of those, you cannot safely use S3 as a direct destination of work. Write into HDFS instead and, afterwards, copy the output to S3.
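For reference, here is a minimal Scala sketch of the two options above, assuming Spark with the hadoop-aws and spark-hadoop-cloud (PathOutputCommitProtocol) modules on the classpath. The bucket path is a placeholder, and exact property/class names can vary by HDP and Hadoop version, so treat this as illustrative rather than definitive:

import org.apache.spark.sql.SparkSession

// Sketch: enable S3Guard (consistent listings) and the S3A "directory" committer
// (commit without the slow, listing-dependent rename). Property names follow the
// Hadoop S3A documentation; availability depends on your Hadoop/Spark build.
object S3ACommitterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3a-committer-sketch")
      // 1) S3Guard: back S3 listings with a DynamoDB metadata store.
      .config("spark.hadoop.fs.s3a.metadatastore.impl",
              "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore")
      // 2) S3A committers: use the directory staging committer for job commits.
      .config("spark.hadoop.fs.s3a.committer.name", "directory")
      .config("spark.sql.sources.commitProtocolClass",
              "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
      .config("spark.sql.parquet.output.committer.class",
              "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
      .getOrCreate()

    // Example write straight to S3A; "your-bucket" is a placeholder.
    spark.range(1000).write.mode("overwrite")
      .parquet("s3a://your-bucket/output/table")

    spark.stop()
  }
}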
10-09-2018
02:20 PM
@Lakshmi Prathyusha, I'm not sure exactly how to do this in Scala, but Scala (and Spark's Scala API) has similar date/time functions, so you should be able to apply the same logic there.
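As a rough illustration only (the original logic isn't shown in this thread, so the column name "event_ts" and the transformations below are hypothetical), Spark's built-in date/time functions can be called from Scala like this:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{to_date, date_format, current_date, datediff}

// Sketch: a few of Spark's date/time functions used from the Scala API.
object DateFunctionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("date-funcs-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq("2018-10-09 14:20:00", "2018-11-30 16:20:00").toDF("event_ts")

    df.select(
        to_date($"event_ts").as("event_date"),                        // parse to DateType
        date_format($"event_ts", "yyyy/MM/dd").as("formatted"),        // reformat as string
        datediff(current_date(), to_date($"event_ts")).as("age_days")  // days since the date
      )
      .show(false)

    spark.stop()
  }
}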
07-22-2019
05:12 AM
I'm facing the same issue. Did anyone resolve it? Please post here how it was fixed.