Member since: 10-03-2018
Posts: 6
Kudos Received: 0
Solutions: 0
11-14-2018
05:01 PM
Hi all, I am trying to export DataFrame data to an S3 bucket, but it fails with the error below:

WARN FileOutputCommitter: Could not delete s3a://bucketname/Output/CheckResult/_temporary/0/_temporary/attempt_20181114215639_0002_m_000000_0
18/11/14 21:56:40 ERROR FileFormatWriter: Job job_20181114215639_0002 aborted.

I have tried the following code for testing:

res.coalesce(1).write.format("csv").save("s3a://bucketname/Output/CheckResult")

I am not sure what the issue is exactly. I have heard that Spark does not really support writes to non-distributed storage. Please help me understand how to achieve this. Many thanks.
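For reference, here is a minimal sketch of the kind of write I am attempting, assuming the hadoop-aws jar and a matching aws-java-sdk are on the classpath; the bucket name, input path, and credential values below are placeholders:

import org.apache.spark.sql.SparkSession

// Sketch only: bucket name, input path, and credentials are placeholders.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("WriteToS3")
  .getOrCreate()

val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", "<ACCESS_KEY>")
hadoopConf.set("fs.s3a.secret.key", "<SECRET_KEY>")

// res is the DataFrame to export; coalesce(1) produces a single part file.
val res = spark.read.option("header", "true").csv("s3a://bucketname/Input/")   // placeholder input
res.coalesce(1)
  .write
  .mode("overwrite")
  .option("header", "true")
  .csv("s3a://bucketname/Output/CheckResult")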
Labels: Apache Spark
10-09-2018
01:04 PM
Hi Aditya, thanks a lot for your help. Is it possible to do this in Scala? I don't have any knowledge of Python.
10-09-2018
03:52 AM
Hi all, I am trying to read files from an S3 bucket (which contains many subdirectories). At the moment I am giving the physical path to read the files. How can I read the files without hard-coded values?

File path: S3 bucket name/Folder/1005/SoB/20180722_zpsx3Gcc7J2MlNnViVp61/JPR_DM2_ORG/ *.gz files

The "S3 bucket name/Folder/" part of the path is fixed, and the client id (1005) has to be passed as a parameter. Under the SoB folder we have month-wise folders, and I have to take only the latest two months of data. Please help me read the data without hard-coding the path. Many thanks for your help.
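A rough sketch of one possible approach: build the SoB path from the client id and pick the two most recent month folders by listing them. The base path, client id, and folder names below are placeholders, and this assumes the month folder names sort chronologically:

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

// Sketch only: base path, client id, and folder layout are assumptions.
val spark = SparkSession.builder().appName("ReadLatestMonths").getOrCreate()

val basePath = "s3a://bucketname/Folder"   // fixed prefix (placeholder)
val clientId = "1005"                      // would be passed in as a parameter
val sobPath  = s"$basePath/$clientId/SoB"

// List the month-wise folders under SoB and keep the two most recent,
// assuming folder names (e.g. 20180722_...) sort chronologically.
val fs = FileSystem.get(new URI(sobPath), spark.sparkContext.hadoopConfiguration)
val latestTwo = fs.listStatus(new Path(sobPath))
  .filter(_.isDirectory)
  .map(_.getPath.toString)
  .sorted
  .takeRight(2)

// Read all *.gz files under the selected month folders.
val df = spark.read
  .option("delimiter", ",")
  .csv(latestTwo.map(p => s"$p/JPR_DM2_ORG/*.gz"): _*)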
Labels: Apache Spark
10-04-2018
03:11 AM
Hi Aditya, thanks for your reply. I have downloaded the hadoop-aws.jar file and aws-java-sdk-1.7.4.jar as well. I am using IntelliJ, and from IntelliJ I am trying to access the S3 bucket to read the data, but no luck. I have also configured the AWS access key and secret key in core-site.xml:
<configuration>
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>......</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>......</value>
</property>
<property>
<name>fs.s3a.awsAccessKeyId</name>
<value>......</value>
</property>
<property>
<name>fs.s3a.awsSecretAccessKey</name>
<value>......</value>
</property>
</configuration>
Many thanks for your help.
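From what I have read, the s3a connector may not read fs.s3a.awsAccessKeyId / fs.s3a.awsSecretAccessKey at all; it seems to expect fs.s3a.access.key and fs.s3a.secret.key instead. Here is a small sketch of what I am also going to try, setting them programmatically rather than in core-site.xml (the credential values and bucket path are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("S3AConfig").getOrCreate()

// s3a uses different property names than s3n; values are placeholders.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoopConf.set("fs.s3a.access.key", "<ACCESS_KEY>")
hadoopConf.set("fs.s3a.secret.key", "<SECRET_KEY>")

val df = spark.read.csv("s3a://bucketname/path/")   // placeholder path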
10-03-2018
05:31 PM
Hi all, I am new to Scala coding and am trying to access an AWS S3 bucket, but it fails with the error below:

Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found

I want to read multiple files (*.gz) from the S3 bucket and merge them all into a single CSV file, but I am unable to read the data and I get the exception shown above. Here is my code:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object ReadS3Files {
  def main(args: Array[String]) {
    val spark = SparkSession.builder.master("local[*]").appName("ReadS3Files").getOrCreate()
    val sc = spark.sparkContext
    val conf = new SparkConf().setAppName("ReadS3Files").setMaster("local[*]")
    val sqlContext = spark.sqlContext

    spark.sparkContext.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "AccessKey")
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "SecretKey")

    val df = spark.read.format("csv").option("delimiter", ",").load("s3n://bucketname/201808/1034/JPR_DM2_ORG/*.gz")

    df.count()
    spark.stop()
  }
}

Please help me with this issue. Many thanks for your help.
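From what I have gathered so far, the NativeS3FileSystem class ships in the hadoop-aws module, so the ClassNotFoundException suggests that jar is not on the runtime classpath. Below is a trimmed sketch of what I plan to try once the dependency is in place; the bucket path, credentials, and output location are placeholders:

import org.apache.spark.sql.SparkSession

object ReadS3FilesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("ReadS3FilesSketch").getOrCreate()

    val hadoopConf = spark.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3n.awsAccessKeyId", "<AccessKey>")      // placeholder
    hadoopConf.set("fs.s3n.awsSecretAccessKey", "<SecretKey>")  // placeholder

    // Spark decompresses *.gz transparently; coalesce(1) merges to one CSV part file.
    val df = spark.read
      .option("delimiter", ",")
      .csv("s3n://bucketname/201808/1034/JPR_DM2_ORG/*.gz")

    df.coalesce(1)
      .write
      .mode("overwrite")
      .csv("s3n://bucketname/201808/1034/merged")   // placeholder output path

    spark.stop()
  }
}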
Labels: Apache Spark