Member since: 10-03-2018
Posts: 6
Kudos Received: 0
Solutions: 0
11-14-2018
05:01 PM
Hi all, I am trying to export DataFrame data to an S3 bucket, but it fails with the error below:

WARN FileOutputCommitter: Could not delete s3a://bucketname/Output/CheckResult/_temporary/0/_temporary/attempt_20181114215639_0002_m_000000_0
18/11/14 21:56:40 ERROR FileFormatWriter: Job job_20181114215639_0002 aborted.

I have tried the following code for testing:

res.coalesce(1).write.format("csv").save("s3a://bucketname/Output/CheckResult")

I am not sure what the issue is exactly. I have heard that Spark does not really support writes to non-distributed storage. Please help me understand how to achieve this. Many thanks.
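For reference, here is a minimal sketch of the kind of write I am attempting, assuming the hadoop-aws jar and a matching aws-java-sdk are on the classpath; the bucket name, input path, and credential values below are placeholders:

import org.apache.spark.sql.SparkSession

// Sketch only: bucket name, input path, and credentials are placeholders.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("WriteToS3")
  .getOrCreate()

val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", "<ACCESS_KEY>")
hadoopConf.set("fs.s3a.secret.key", "<SECRET_KEY>")

// res is the DataFrame to export; coalesce(1) produces a single part file.
val res = spark.read.option("header", "true").csv("s3a://bucketname/Input/")   // placeholder input
res.coalesce(1)
  .write
  .mode("overwrite")
  .option("header", "true")
  .csv("s3a://bucketname/Output/CheckResult")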
Labels: Apache Spark
10-09-2018
01:04 PM
Hi Aditya, thanks a lot for your help. Is it possible to do this in Scala? I don't have any knowledge of Python.
10-09-2018
03:52 AM
Hi all, I am trying to read files from an S3 bucket (which contains many subdirectories). At the moment I am giving the physical path to read the files. How can I read the files without hard-coded values?

File path: S3 bucket name/Folder/1005/SoB/20180722_zpsx3Gcc7J2MlNnViVp61/JPR_DM2_ORG/ *.gz files

The "S3 bucket name/Folder/" part of the path is fixed, and the client id (1005) has to be passed as a parameter. Under the SoB folder we have month-wise folders, and I have to take only the latest two months of data. Please help me read the data without hard-coding the path. Many thanks for your help.
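A rough sketch of one possible approach: build the SoB path from the client id and pick the two most recent month folders by listing them. The base path, client id, and folder names below are placeholders, and this assumes the month folder names sort chronologically:

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

// Sketch only: base path, client id, and folder layout are assumptions.
val spark = SparkSession.builder().appName("ReadLatestMonths").getOrCreate()

val basePath = "s3a://bucketname/Folder"   // fixed prefix (placeholder)
val clientId = "1005"                      // would be passed in as a parameter
val sobPath  = s"$basePath/$clientId/SoB"

// List the month-wise folders under SoB and keep the two most recent,
// assuming folder names (e.g. 20180722_...) sort chronologically.
val fs = FileSystem.get(new URI(sobPath), spark.sparkContext.hadoopConfiguration)
val latestTwo = fs.listStatus(new Path(sobPath))
  .filter(_.isDirectory)
  .map(_.getPath.toString)
  .sorted
  .takeRight(2)

// Read all *.gz files under the selected month folders.
val df = spark.read
  .option("delimiter", ",")
  .csv(latestTwo.map(p => s"$p/JPR_DM2_ORG/*.gz"): _*)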
Labels: Apache Spark
10-04-2018
03:11 AM
Hi Aditya, thanks for your reply. I have downloaded the hadoop-aws.jar file and aws-java-sdk-1.7.4.jar as well. I am using IntelliJ, and from IntelliJ I am trying to access the S3 bucket to read the data, but no luck. I have also configured the AWS access key and secret key in core-site.xml:
<configuration>
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>......</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>......</value>
</property>
<property>
<name>fs.s3a.awsAccessKeyId</name>
<value>......</value>
</property>
<property>
<name>fs.s3a.awsSecretAccessKey</name>
<value>......</value>
</property>
</configuration>
Many thanks for your help.
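From what I have read, the s3a connector may not read fs.s3a.awsAccessKeyId / fs.s3a.awsSecretAccessKey at all; it seems to expect fs.s3a.access.key and fs.s3a.secret.key instead. Here is a small sketch of what I am also going to try, setting them programmatically rather than in core-site.xml (the credential values and bucket path are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("S3AConfig").getOrCreate()

// s3a uses different property names than s3n; values are placeholders.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoopConf.set("fs.s3a.access.key", "<ACCESS_KEY>")
hadoopConf.set("fs.s3a.secret.key", "<SECRET_KEY>")

val df = spark.read.csv("s3a://bucketname/path/")   // placeholder path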
10-03-2018
05:31 PM
Hi all, I am new to Scala coding and am trying to access an AWS S3 bucket, but it fails with the error below:

Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found

I want to read multiple files (*.gz) from the S3 bucket and merge them all into a single CSV file, but I am unable to read the data and I get the exception shown above. Here is my code:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object ReadS3Files {
  def main(args: Array[String]) {
    val spark = SparkSession.builder.master("local[*]").appName("ReadS3Files").getOrCreate()
    val sc = spark.sparkContext
    val conf = new SparkConf().setAppName("ReadS3Files").setMaster("local[*]")
    val sqlContext = spark.sqlContext

    spark.sparkContext.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "AccessKey")
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "SecretKey")

    val df = spark.read.format("csv").option("delimiter", ",").load("s3n://bucketname/201808/1034/JPR_DM2_ORG/*.gz")

    df.count()
    spark.stop()
  }
}

Please help me with this issue. Many thanks for your help.
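From what I have gathered so far, the NativeS3FileSystem class ships in the hadoop-aws module, so the ClassNotFoundException suggests that jar is not on the runtime classpath. Below is a trimmed sketch of what I plan to try once the dependency is in place; the bucket path, credentials, and output location are placeholders:

import org.apache.spark.sql.SparkSession

object ReadS3FilesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("ReadS3FilesSketch").getOrCreate()

    val hadoopConf = spark.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3n.awsAccessKeyId", "<AccessKey>")      // placeholder
    hadoopConf.set("fs.s3n.awsSecretAccessKey", "<SecretKey>")  // placeholder

    // Spark decompresses *.gz transparently; coalesce(1) merges to one CSV part file.
    val df = spark.read
      .option("delimiter", ",")
      .csv("s3n://bucketname/201808/1034/JPR_DM2_ORG/*.gz")

    df.coalesce(1)
      .write
      .mode("overwrite")
      .csv("s3n://bucketname/201808/1034/merged")   // placeholder output path

    spark.stop()
  }
}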
Labels: Apache Spark