Member since 10-03-2018

6 Posts | 0 Kudos Received | 0 Solutions

11-14-2018 05:01 PM

Hi All,

I am trying to export DataFrame data to an S3 bucket, but the write fails with the error below:

WARN FileOutputCommitter: Could not delete s3a://bucketname/Output/CheckResult/_temporary/0/_temporary/attempt_20181114215639_0002_m_000000_0
18/11/14 21:56:40 ERROR FileFormatWriter: Job job_20181114215639_0002 aborted.

I have tried the following code for testing:

res.coalesce(1).write.format("csv").save("s3a://bucketname/Output/CheckResult")

I am not sure what exactly the issue is here. I have heard that Spark does not really support writes to non-distributed storage. Could you please help me achieve this?

Many thanks.
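
A minimal Scala sketch of one common fix, assuming the job is simply missing S3 credentials on the Hadoop configuration (the bucket path is taken from the question; the environment variable names and the stand-in DataFrame are placeholders):

import org.apache.spark.sql.SparkSession

object WriteToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("WriteToS3").getOrCreate()

    // The S3A connector reads credentials from these Hadoop properties
    // (or from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY in the environment).
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    hadoopConf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Stand-in for the "res" DataFrame from the question.
    val res = spark.range(10).toDF("id")

    // coalesce(1) produces a single part file under the target prefix.
    res.coalesce(1)
      .write
      .mode("overwrite")
      .format("csv")
      .save("s3a://bucketname/Output/CheckResult")

    spark.stop()
  }
}

Spark does write to S3 through the s3a connector; the "Could not delete ... _temporary" warning usually points at missing credentials or insufficient list/delete permissions on the bucket rather than a lack of support.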
						
					
Labels: Apache Spark

10-09-2018 01:04 PM

Hi Aditya,

Thanks a lot for your help. Is it possible to do this in Scala? I don't have any knowledge of Python.
						
					
10-09-2018 03:52 AM

Hi all,

I am trying to read files from an S3 bucket that contains many sub-directories. At the moment I am passing the physical path to read the files, and I would like to avoid the hard-coded values.

File path: S3 bucket name/Folder/1005/SoB/20180722_zpsx3Gcc7J2MlNnViVp61/JPR_DM2_ORG/*.gz

The "S3 bucket name/Folder/" part of the path is fixed, and the client id (1005) has to be passed as a parameter. Under the SoB folder we have month-wise folders, and I have to take only the latest two months of data.

Please help me read this data without hard-coded paths.

Many thanks for your help.
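
One possible approach, sketched in Scala: list the month folders under the fixed prefix with the Hadoop FileSystem API, sort them, and keep the last two. The bucket and folder names below are placeholders built from the question, and the sketch assumes the monthly folder names sort chronologically as plain strings (e.g. 20180722_...):

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object ReadLatestTwoMonths {
  def main(args: Array[String]): Unit = {
    val clientId = args(0)  // e.g. "1005", passed in as a parameter
    val basePath = s"s3a://bucketname/Folder/$clientId/SoB"

    val spark = SparkSession.builder.appName("ReadLatestTwoMonths").getOrCreate()

    // List the month-wise sub-folders under .../SoB via the Hadoop FileSystem API.
    val fs = FileSystem.get(new URI(basePath), spark.sparkContext.hadoopConfiguration)
    val monthlyDirs = fs.listStatus(new Path(basePath))
      .filter(_.isDirectory)
      .map(_.getPath.toString)
      .sorted

    // Keep only the latest two months and read every *.gz file beneath them.
    val latestTwo = monthlyDirs.takeRight(2).map(dir => s"$dir/JPR_DM2_ORG/*.gz")
    val df = spark.read.format("csv").option("delimiter", ",").load(latestTwo: _*)

    df.show(5)
    spark.stop()
  }
}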
						
					
Labels: Apache Spark

10-04-2018 03:11 AM

Hi Aditya,

Thanks for your reply. I have downloaded hadoop-aws.jar and aws-java-sdk-1.7.4.jar. I am using IntelliJ, and from IntelliJ I am trying to access the S3 bucket to read the data, but with no luck. I have also configured the AWS access key and secret key in core-site.xml:

<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3a.awsAccessKeyId</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3a.awsSecretAccessKey</name>
    <value>......</value>
  </property>
</configuration>

Many thanks for your help.
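
One thing worth checking: the S3A connector does not read fs.s3a.awsAccessKeyId / fs.s3a.awsSecretAccessKey; its property names are fs.s3a.access.key and fs.s3a.secret.key (the awsAccessKeyId-style names belong to the older s3n connector). A small Scala sketch of setting both families of properties from code instead of core-site.xml, with placeholder key values:

import org.apache.spark.sql.SparkSession

object S3Credentials {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("S3Credentials").getOrCreate()
    val hadoopConf = spark.sparkContext.hadoopConfiguration

    // s3a:// paths read these property names.
    hadoopConf.set("fs.s3a.access.key", "<access-key>")  // placeholder
    hadoopConf.set("fs.s3a.secret.key", "<secret-key>")  // placeholder

    // s3n:// paths use the older awsAccessKeyId / awsSecretAccessKey names.
    hadoopConf.set("fs.s3n.awsAccessKeyId", "<access-key>")
    hadoopConf.set("fs.s3n.awsSecretAccessKey", "<secret-key>")

    spark.stop()
  }
}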
						
					
10-03-2018 05:31 PM

Hi All,

I am new to Scala coding and I am trying to access an AWS S3 bucket, but it fails. Please find the error below:

Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found

I want to read multiple files (*.gz) from the S3 bucket and merge them into a single CSV file, but I am unable to read the data and I get the exception shown above.

Here is my code:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object ReadS3Files {
  def main(args: Array[String]) {
    val spark = SparkSession.builder.master("local[*]").appName("ReadS3Files").getOrCreate()
    val sc = spark.sparkContext
    val conf = new SparkConf().setAppName("ReadS3Files").setMaster("local[*]")
    val sqlContext = spark.sqlContext

    spark.sparkContext.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "AccessKey")
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "SecretKey")

    val df = spark.read.format("csv").option("delimiter", ",").load("s3n://bucketname/201808/1034/JPR_DM2_ORG/*.gz")
    df.count()

    spark.stop()
  }
}

Please help me with this issue.

Many thanks for your help.
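
For what it's worth, this ClassNotFoundException usually means the hadoop-aws module (which contains NativeS3FileSystem since Hadoop 2.6) is not on the application classpath; downloading the jars is not enough unless they are declared as dependencies of the IntelliJ project. A build.sbt sketch, with version numbers as assumptions that should be aligned with the Hadoop version Spark was built against (hadoop-aws 2.7.x pairs with aws-java-sdk 1.7.4):

// build.sbt sketch; the Scala, Spark and Hadoop versions below are assumptions.
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"  % "2.3.2",
  // Provides NativeS3FileSystem / S3AFileSystem and pulls in aws-java-sdk 1.7.4.
  "org.apache.hadoop" %  "hadoop-aws" % "2.7.3"
)

When submitting with spark-submit instead, the same dependency can be supplied with --packages org.apache.hadoop:hadoop-aws:2.7.3.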
						
					
Labels: Apache Spark