Member since 07-14-2017
99 Posts
5 Kudos Received
4 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1887 | 09-05-2018 09:58 AM |
|  | 2512 | 07-31-2018 12:59 PM |
|  | 1978 | 01-15-2018 12:07 PM |
|  | 1722 | 11-23-2017 04:19 PM |

08-13-2018 02:14 PM
Hi,

I am receiving data from TCP as a JSON stream using pyspark. I want to save the messages to files (appending to a minute-based file named yyyyMMddHHmm, so all messages within one minute go to the corresponding file) and, in parallel, save the JSON to an ORC Hive table. I have two questions.

1. [path : '/folder/file'] When I receive data in a DStream, I flatMap with split("\n") and then repartition(1).saveAsTextFiles(path, "json"):

lines = ssc.socketTextStream("localhost", 9999)
flat_map = lines.flatMap(lambda x: x.split("\n"))
flat_map.repartition(1).saveAsTextFiles(path, "json")

The above saves to the given path, but instead of producing one file per minute in the folder, it creates three folders, each with a _SUCCESS file and a part-00000 file, which is not what I expected. Please help me solve this so I get the expected layout: one folder per day, and one file per minute under that folder.

2. If I want to save the JSON to an ORC Hive table, can I do it from a DStream, or do I have to convert the DStream to an RDD and then do some processing to save it as ORC?

As I am new to pyspark, please help with the above, ideally with some examples.
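For reference, the per-minute naming described above (one folder per day, one yyyyMMddHHmm file per minute, appending all of a minute's messages) can be sketched in plain Python; `minute_path` and `append_message` are hypothetical helper names for illustration, not part of the original post or any Spark API. In a DStream this kind of logic would typically run inside foreachRDD, since saveAsTextFiles always writes a Hadoop-style output directory (_SUCCESS plus part files) per batch.

```python
import os
from datetime import datetime

def minute_path(base_dir, ts):
    """Build base_dir/yyyyMMdd/yyyyMMddHHmm for a message timestamp:
    one folder per day, one file per minute inside it."""
    day = ts.strftime("%Y%m%d")
    minute = ts.strftime("%Y%m%d%H%M")
    return os.path.join(base_dir, day, minute)

def append_message(base_dir, message, ts=None):
    """Append one JSON line to the file for its minute (hypothetical helper)."""
    ts = ts or datetime.now()
    path = minute_path(base_dir, ts)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a") as f:
        f.write(message + "\n")
    return path
```

All messages carrying the same minute land in the same file, which is the append behaviour the question asks for.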
						
					
		
			
				
						
Labels:
- Apache Spark
    
	
		
		
08-10-2018 09:12 AM
@Veerendra Nath Jasthi What is the frequency of the files, and how big are the files in the given path? Also, could you please check your JVM heap memory (this is a guess, not a solution)?
						
					
    
	
		
		
08-10-2018 09:08 AM
@Felix Albani Can you help me with the pyspark version of the above, please?
						
					
    
	
		
		
07-31-2018 02:36 PM
@Veerendra Nath Jasthi Possibly you have a complicated computation (maybe a regex) running on GetFile, which is taking a lot of time to complete. Also check how many files it is picking up based on your regex; that should be fixed.
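As a rough illustration of that check, a standalone Python sketch (not NiFi code; the directory and pattern are made-up examples) that counts how many file names a File Filter regex would match in one directory listing, which is roughly what GetFile evaluates on each run:

```python
import os
import re

def count_matching_files(directory, file_filter):
    """Count regular files whose name fully matches the regex,
    mimicking a GetFile-style File Filter over one directory listing."""
    pattern = re.compile(file_filter)
    return sum(
        1 for name in os.listdir(directory)
        if pattern.fullmatch(name) and os.path.isfile(os.path.join(directory, name))
    )
```

If the count is unexpectedly large or unstable, that points at the regex rather than the processor.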
						
					
    
	
		
		
07-31-2018 01:51 PM
@Felix Albani Thank you for the quick response; I will go through the given info.
						
					
    
	
		
		
07-31-2018 01:45 PM
@Veerendra Nath Jasthi That should not be the case. Which processors are you using when you get the issue?
						
					
    
	
		
		
07-31-2018 01:15 PM
@veerendra If you are using NiFi below 1.7, the best way is to restart NiFi.
						
					
    
	
		
		
07-31-2018 01:10 PM
Hi All,

I am a beginner with Spark and want to do the following. Port 55500 sends JSON objects as a stream (ex: {"one":"1","two":"2"}{"three":"3","four":"4"}). I have an ORC table in Hive with the columns below:

one, two, three, four, spark_streaming_startingtime, spark_streaming_endingtime, partition_value

I want to load the streaming values into the Hive ORC table. Can you please guide me on how to achieve this? Thank you for your support.
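For the stream format shown above (JSON objects concatenated with no delimiter between them), the records have to be split apart before any Spark or Hive write. A minimal pure-Python sketch using json.JSONDecoder.raw_decode, which reports where each object ends; `parse_concatenated_json` is a hypothetical helper name:

```python
import json

def parse_concatenated_json(buffer):
    """Split a string like '{"one":"1"}{"two":"2"}' into a list of dicts.
    raw_decode returns (object, end_index), so we walk the buffer object by object."""
    decoder = json.JSONDecoder()
    records, idx = [], 0
    while idx < len(buffer):
        obj, end = decoder.raw_decode(buffer, idx)
        records.append(obj)
        idx = end
        # skip any whitespace between objects
        while idx < len(buffer) and buffer[idx].isspace():
            idx += 1
    return records
```

Each resulting dict could then be mapped onto the table columns and written out; this splitting step is what plain newline-based splitting would miss for back-to-back objects.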
						
					
		
			
				
						
Labels:
- Apache Spark
    
	
		
		
07-31-2018 12:59 PM
@Bryan Bende I checked nifi-app.log; the JVM heap was at its maximum, which was rejecting the connections and failing the processor. It was resolved once the heap size issue was fixed. Thank you for your support.
						
					
    
	
		
		
07-25-2018 12:31 PM
I am using the ListenSyslog processor (not on port 514). It was working fine; we upgraded NiFi to 1.5.0.3.1.1.7-2 and it worked fine after the upgrade, but for the last 3 days the processor has been throwing the error "failed to invoke @OnScheduled method". Can you please let me know how to get past this?
						
					
		
			
				
						
Labels:
- Apache NiFi
        













